AMA: Ajeya Cotra, researcher at Open Phil
[EDIT: Thanks for the questions everyone! Just noting that I’m mostly done answering questions, and there were a few that came in Tuesday night or later that I probably won’t get to.]
Hi everyone! I’m Ajeya, and I’ll be doing an Ask Me Anything here. I’ll plan to start answering questions Monday Feb 1 at 10 AM Pacific. I will be blocking off much of Monday and Tuesday for question-answering, and may continue to answer a few more questions through the week if there are ones left, though I might not get to everything.
About me: I’m a Senior Research Analyst at Open Philanthropy, where I focus on cause prioritization and AI. 80,000 Hours released a podcast episode with me last week discussing some of my work, and last September I put out a draft report on AI timelines which is discussed in the podcast. Currently, I’m trying to think about AI threat models and how much x-risk reduction we could expect the “last long-termist dollar” to buy. I joined Open Phil in the summer of 2016, and before that I was a student at UC Berkeley, where I studied computer science, co-ran the Effective Altruists of Berkeley student group, and taught a student-run course on EA.
I’m most excited about answering questions related to AI timelines, AI risk more broadly, and cause prioritization, but feel free to ask me anything!
Hi Ajeya! I’m a huge fan of your timelines report, it’s by far the best thing out there on the topic as far as I know. Whenever people ask me to explain my timelines, I say “It’s like Ajeya’s, except...”
My question is, how important do you think it is for someone like me to do timelines research, compared to other kinds of research (e.g. takeoff speeds, alignment, acausal trade...)?
I sometimes think that even if I managed to convince everyone to shift from median 2050 to median 2032 (an obviously unlikely scenario!), it still wouldn’t matter much because people’s decisions about what to work on are mostly driven by considerations of tractability, neglectedness, personal fit, importance, etc. and even that timelines difference would be a relatively minor consideration. On the other hand, intuitively it does feel like the difference between 2050 and 2032 is a big deal and that people who believe one when the other is true will probably make big strategic mistakes.
Bonus question: Murphyjitsu: Conditional on TAI being built in 2025, what happened? (i.e. how was it built, what parts of your model were wrong, what do the next 5 years look like, what do the 5 years after 2025 look like?)
Thanks so much, that’s great to hear! I’ll answer your first question in this comment and leave a separate reply for your Murphyjitsu question.
First of all, I definitely agree that the difference between 2050 and 2032 is a big deal and worth getting to the bottom of; it would make a difference to Open Phil’s prioritization (and internally we’re trying to do projects that could convince us of timelines significantly shorter than in my report). You may be right that it could have a counterintuitively small impact on many individual people’s career choices, for the reasons you say, but I think many others (especially early career people) would and should change their actions substantially.
I think there are roughly three types of reasons why Bob might disagree with Alice about a bottom line conclusion like TAI timelines, which correspond to three types of research or discourse contributions Bob could make in this space:
1. Disagreements can come from Bob knowing more facts than Alice about a key parameter, which can allow Bob to make “straightforward corrections” to Alice’s proposed value for that parameter. E.g., “You didn’t think much about hardware, but I did a solid research project into hardware and I think experts would agree that, because of optical computing, progress will be faster than you assumed; changing to the better values makes timelines shorter.” If Bob does a good enough job with this empirical investigation, Alice will often just say “Great, thanks!” and adopt Bob’s number.
2. Disagreements can come from Bob modeling out a part of the world in more mechanistic detail that Alice fudged or simplified, which can allow Bob to propose a better structure than Alice’s model. E.g., “You agree that earlier AI systems can generate revenue which can be reinvested into AI research but you didn’t explicitly model that and just made a guess about spending trajectory; I’ll show that accounting for this properly would make timelines shorter.” Alice may feel some hesitance adopting Bob’s model wholesale here, because Alice’s model may fudge/elide one thing in an overly-conservative direction which she feels is counterbalanced by fudging/eliding another thing in an overly-aggressive direction, but it will often be tractable to argue that the new model is better and Alice will often be happy to adopt it (perhaps changing some other fudged parameters a little to preserve intuitions that seemed important to her).
3. Finally, disagreements can come from differences in intuition about the subjective weight different considerations should get when coming up with values for the more debatable parameters (such as the different biological anchor hypotheses). It’s more difficult for Bob to make a contribution toward changing Alice’s bottom line here, because a lot of the action is in hard-to-access mental alchemy going on in Alice and Bob’s minds when they make difficult judgment calls. Bob can try to reframe things, offer intuition pumps, trace disagreements about one topic back to a deeper disagreement about another topic and argue about that, and so on, but he should expect it to be slow going and expect Alice to be pretty hard to move.
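The category 2 example above (a guessed spending trajectory versus one that explicitly models reinvested revenue) can be sketched as a toy calculation. This is only an illustration of the structural difference, not anything taken from the report: the function `years_to_threshold`, the starting spend, the threshold, the growth rate, and the revenue and reinvestment parameters are all hypothetical.

```python
def years_to_threshold(initial_spend, threshold, baseline_growth,
                       revenue_per_dollar=0.0, reinvest_share=0.0,
                       max_years=200):
    """Return the first year in which annual spending reaches `threshold`.

    Each year spending grows by `baseline_growth`. If the revenue terms are
    non-zero, a share of the revenue earned from that year's spending is
    added on top. Returns None if the threshold is not reached in time.
    """
    spend = initial_spend
    for year in range(max_years + 1):
        if spend >= threshold:
            return year
        revenue = revenue_per_dollar * spend  # hypothetical revenue model
        spend = spend * baseline_growth + reinvest_share * revenue
    return None


# "Alice's" model: a simple guess about the spending trajectory.
years_guess = years_to_threshold(
    initial_spend=1.0, threshold=1000.0, baseline_growth=1.2)

# "Bob's" model: same baseline, plus explicitly modeled reinvestment.
years_with_reinvestment = years_to_threshold(
    initial_spend=1.0, threshold=1000.0, baseline_growth=1.2,
    revenue_per_dollar=0.5, reinvest_share=0.2)

print("guessed trajectory:", years_guess, "years")
print("with reinvestment:", years_with_reinvestment, "years")
```

With any positive revenue and reinvestment share, the second trajectory grows faster, so the threshold is reached sooner. How much sooner depends entirely on the made-up parameters, which is where the real argument between Alice and Bob would be.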
In my experience, most large and persistent disagreements between people about big-picture questions like TAI timelines or the magnitude of risk from AI are mostly the third kind of disagreement, and these disagreements can be entangled with dozens of other differences in background assumptions / outlook / worldview. My sense is that your most major disagreements with me fall into the third category: you think that I’m overweighting the hypothesis that we’d need to do meta-learning in which the “inner loop” takes a long subjective time; you may also think that I’m underweighting the possibility of sudden takeoff or overweighting the efficiency of markets in a certain way, which leads me to lend too much credence to considerations like “Well if the low end of the compute range is actually right, we should probably be seeing more economic impact from the slightly-smaller AI systems right now.” If you were to change my mind on this, it might not even be from doing “timelines research”: maybe you do “takeoff speeds research” that convinces me to take sudden takeoff more seriously, which in turn causes me to take shorter timelines (which would imply more sudden takeoff) more seriously.
I’d say tackling category 3 disagreements is high risk and effort but has the possibility of high reward, and tackling category 1 disagreements is lower risk and effort with more moderate reward. My subjective impression is that EAs tend to under-invest in tackling categories 1 and 2 because they perceive category 3 as where the real action is. In some sense they’re right about that, but they may underestimate how hard it’ll be to change people’s minds there. For example, changing someone’s mind about a category 3 disagreement often greatly benefits from having a lot of face time with them, which isn’t very scalable, and arguments may be more particular to individuals: what finally convinces Alice may not move Charlie.
I think one potential way to get at a category 3 disagreement about a long-term forecast is by proposing bets about nearer-term forecasts, although I think this is often a lot harder than it sounds, because people are sensitive to the possibility of “losing on a technicality”: they were right about the big picture but wrong about how that big picture actually translates to a near-term prediction. Even making short-term bets often benefits from having a lot of face time to hash out the terms.
It occurred to me that another way to try to move someone on complicated category 3 disagreements might be to put together a well-constructed survey of a population that the person is inclined to defer to. This approach is definitely still tricky: you’d have to convince the person that the relevant population was provided with the strongest arguments for that person’s view in addition to your counterarguments, and that the individuals surveyed were thinking about it reasonably hard. But if done well, it could be pretty powerful.
Thanks, this was a surprisingly helpful answer, and I had high expectations!
This is updating me somewhat towards doing more blog posts of the sort that I’ve been doing. As it happens, I have a draft of one that is very much Category 3, let me know if you are interested in giving comments!
Your sense of why we disagree is pretty accurate, I think. The only thing I’d add is that I do think we should update downwards on low-end compute scenarios because of market efficiency considerations, just not as strongly as you perhaps, and moreover I also think that we should update upwards for various reasons (the surprising recent successes of deep learning, the fact that big corporations are investing heavily-by-historical-standards in AI, the fact that various experts think they are close to achieving AGI) and the upwards update mostly cancels out the downwards update IMO.
Update: The draft I mentioned is now a post!
An extension of Daniel’s bonus question:
If I condition on your report being wrong in an important way (either in its numerical predictions, or via conceptual flaws) and think about how we might figure that out today, it seems like two salient possibilities are inside-view arguments and outside-view arguments.
The former are things like “this explicit assumption in your model is wrong”. E.g. I count my concern about the infeasibility of building AGI using algorithms available in 2020 as an inside-view argument.
The latter are arguments that, based on the general difficulty of forecasting the future, there’s probably some upcoming paradigm shift or crucial consideration which will have a big effect on your conclusions (even if nobody currently knows what it will be).
Are you more worried about the inside-view arguments of current ML researchers, or outside-view arguments?
I generally spend most of my energy looking for inside-view considerations that might be wrong, because they are more likely to suggest a particular directional update (although I’m not focused only on inside view arguments specifically from ML researchers, and place a lot of weight on inside view arguments from generalists too).
It’s often hard to incorporate the most outside-view considerations into bottom line estimates, because it’s not clear what their implication should be. For example, the outside-view argument “it’s difficult to forecast the future and you should be very uncertain” may imply spreading probability out more widely, but that would involve assigning higher probabilities to TAI very soon, which is in tension with another outside view argument along the lines of “Predicting something extraordinary will happen very soon has a bad track record.”
Shouldn’t a combination of those two heuristics lead to spreading out the probability but with somewhat more probability mass on the longer-term rather than the shorter term?
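To make the interaction between those two heuristics concrete, here is a minimal sketch. It assumes, purely for illustration, that "years until TAI" follows a lognormal distribution; the 30-year median, the log-space widths of 0.5 and 1.0, and the 10-year horizon are hypothetical numbers, not values from the report. It shows that widening the distribution around a fixed median raises the probability of TAI very soon, and that the median has to move later if the wider distribution is to keep the same near-term probability.

```python
import math


def lognormal_cdf(x, median, sigma):
    """P(X <= x) for a lognormal with the given median and log-space std dev."""
    z = (math.log(x) - math.log(median)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def median_keeping_near_term_mass(horizon, old_median, old_sigma, new_sigma):
    """Median of a wider lognormal that leaves P(X <= horizon) unchanged."""
    z = (math.log(horizon) - math.log(old_median)) / old_sigma
    return math.exp(math.log(horizon) - z * new_sigma)


HORIZON = 10.0  # hypothetical "very soon" cutoff, in years

# Narrow starting distribution.
p_narrow = lognormal_cdf(HORIZON, median=30.0, sigma=0.5)

# "Be more uncertain": widen it but keep the median where it was.
p_wide_same_median = lognormal_cdf(HORIZON, median=30.0, sigma=1.0)

# Widen it, but shift the median later so near-term mass does not grow.
shifted_median = median_keeping_near_term_mass(
    HORIZON, old_median=30.0, old_sigma=0.5, new_sigma=1.0)
p_wide_shifted = lognormal_cdf(HORIZON, median=shifted_median, sigma=1.0)

print(f"narrow:             P(within {HORIZON:.0f}y) = {p_narrow:.3f}")
print(f"wide, same median:  P(within {HORIZON:.0f}y) = {p_wide_same_median:.3f}")
print(f"wide, median {shifted_median:.0f}y: P(within {HORIZON:.0f}y) = {p_wide_shifted:.3f}")
```

Under these assumptions, honoring both heuristics at once means a wider distribution with extra mass placed on later dates, which matches the suggestion in the comment above. A different distribution family would give different numbers.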
That’s fair, and I do try to think about this sort of thing when choosing e.g. how wide to make my probability distributions and where to center them; I often make them wider than feels reasonable to me. I didn’t mean to imply that I explicitly avoid incorporating such outside view considerations, just that returns to further thinking about them are often lower by their nature (since they’re often about unknown unknowns).
True. My main concern here is the lamppost issue (looking under the lamppost because that’s where the light is). If the unknown unknowns affect the probability distribution, then personally I’d prefer to incorporate that or at least explicitly acknowledge it. Not a critique—I think you do acknowledge it—but just a comment.
Just in case any readers would misinterpret that statement: I’m pretty sure that what Daniel is saying is unlikely is not that TAI will be built in 2032, but rather that he would be able to convince everyone to shift their median to that date. I think Daniel’s median for when TAI will be built is indeed somewhere around 2032 or perhaps sooner. (I think that based on conversations around October and this post. Daniel can of course correct me if I’m wrong!)
(Maybe no readers would’ve misinterpreted Daniel anyway and this is just a weird comment...)
Yep, my current median is something like 2032. It fluctuates depending on how I estimate it, sometimes I adjust it up or down a bit based on how I’m feeling in the moment and recent updates, etc.
On the object level, I think it would probably turn out to be the case that a) I was wrong about horizon length and something more like ~1 token was sufficient, b) I was wrong about model size and something more like ~10T parameters was sufficient. On a deeper level, it would mean I was wrong about the plausibility of ultra-sudden takeoff and shouldn’t have placed as much weight as I did on the observation that AI isn’t generating a lot of annual revenue right now and its value-added seems to have been increasing relatively smoothly so far.
I would guess that the model looks like a scaled-up predictive model (natural language and/or code), perhaps combined with simple planning or search. Maybe a coding model rapidly trains more-powerful successors in a pretty classically Bostromian / Yudkowskian way.
Since this is a pretty Bostromian scenario, and I haven’t thought deeply about those scenarios, I would default to guessing that the world after looks fairly Bostromian, with risks involving the AI forcibly taking control of most of the world’s resources, and the positive scenario involving cooperatively using the AI to prevent other x-risks (including risks from other AI projects).
Re why AI isn’t generating much revenue: have you considered the productivity paradox? It’s historically normal that productivity slows down before steeply increasing when a new general-purpose technology arrives.
See “Why Future Technological Progress Is Consistent with Low Current Productivity Growth” in “Artificial Intelligence and the Modern Productivity Paradox”
Not sure how relevant, but I saw that Gwern seems to think this comes from a bottleneck of people who can apply AI, not from current AI being insufficient:
And the lack of coders may rapidly disappear soon-ish, right? At least in Germany, studying ML has seemed very popular for a couple of years now.
In some sense I agree with gwern that the reason ML hasn’t generated a lot of value is because people haven’t put in the work (both coding and otherwise) needed to roll it out to different domains, but (I think unlike gwern) the main inference I make from that is that it wouldn’t have been hugely profitable to put in the work to create ML-based applications (or else more people would have been diverted from other coding tasks to the task of rolling out ML applications).
I mostly agree with that with the further caveat that I tend to think the low value reflects not that ML is useless but the inertia of a local optimum where the gains from automation are low because so little else is automated and vice-versa (“automation as colonization wave”). This is part of why, I think, we see the broader macroeconomic trends like big tech productivity pulling away: many organizations are just too incompetent to meaningfully restructure themselves or their activities to take full advantage. Software is surprisingly hard from a social and organizational point of view, and ML more so. A recent example is coronavirus/remote-work: it turns out that remote is in fact totally doable for all sorts of things people swore it couldn’t work for, at least when you have a deadly global pandemic solving the coordination problem...
As for my specific tweet, I wasn’t talking about making $$$ but just doing cool projects and research. People should be a little more imaginative about applications. Lots of people angst about how they can possibly compete with OA or GB or DM, but the reality is, as crowded as specific research topics like ‘yet another efficient Transformer variant’ may be, as soon as you add on a single qualifier like, ‘DRL for dairy herd management’ or ‘for anime’, you suddenly have the entire field to yourself. There’s a big lag between what you see on Arxiv and what’s out in the field. Even DL from 5 years ago, like CNNs, can be used for all sorts of things which they are not at present. (Making money or capturing value is, of course, an entirely different question; as fun as This Anime Does Not Exist may be, there’s not really any good way to extract money. So it’s a good thing we don’t do it for the money.)
Ah yeah, that makes sense—I agree that a lot of the reason for low commercialization is local optima, and also agree that there are lots of cool/fun applications that are left undone right now.
What type of funding opportunities related to AI Safety would OpenPhil want to see more of?
Is there anything else you can tell me about the funding situation with regard to AI Safety? I’m very confused about why more people and projects don’t get funded. Is it because there is not enough money, or is there some bottleneck related to evaluation and/or trust?
I primarily do research rather than grantmaking, but I can give my speculations about what grant opportunities people on the grantmaking side of the organization would be excited about. In general, I think it’s exciting when there is an opportunity to fund a relatively senior person with a strong track record who can manage or mentor a number of earlier-career people, because that provides an opportunity for exponential growth in the pool of people who are working on these issues. For example, this could look like funding a new professor who is aligned with our priorities in a sub-area and wants to mentor students to work on problems we are excited about in that sub-area.
In terms of why more people and projects don’t get funded: at least at Open Phil, grantmakers generally try not to evaluate large numbers of applications or inquiries from earlier-career people individually, because each evaluation can be fairly intensive but the grant size is often relatively small; grantmakers at Open Phil prefer to focus on investigations that could lead to larger grants. Open Phil does offer some scholarships for early career researchers (e.g. here and here), but in general we prefer that this sort of grantmaking be handled by organizations like EA Funds.
Looking at the mistakes you’ve made in the past, what fraction of your (importance-weighted) mistakes would you classify as:
Not being aware of the relevant empirical details/facts (that are both in principle and in practice within your ability to find), versus
Being wrong about stuff due to reasoning errors (that are both in principle and in practice within your ability to correct for)?
And what ratios would you assign to this for EAs/career EAs in general?
For context, a coworker and I recently had a discussion about, loosely speaking, whether it was more important for junior researchers within EA to build domain knowledge or general skills. Very very roughly, my coworker was more on the former case because he thought that EAs had an undersupply of domain knowledge over so-called “generalist skills.” However, I leaned more on the latter side of this debate because I weakly believe that more of my mistakes (and more of my most critical mistakes) were due to errors of cognition rather than insufficient knowledge of facts. (Obviously credit assignment is hard in both cases).
I think the inclusion of “in principle” makes the answer kind of boring—when we’re not thinking about practicality at all, I think I’d definitely prefer to know more facts (about e.g. the future of AI or what would happen in the world if we pursued strategy A vs strategy B) than to have better reasoning skills, but that’s not a very interesting answer.
In practice, I’m usually investing a lot more in general reasoning, because I’m operating in a domain (AI forecasting and futurism more generally) where it’s pretty expensive to collect new knowledge/facts, it’s pretty difficult to figure out how to connect facts about the present to beliefs about the distant future, and facts you could gather in 2021 are fairly likely to be obsoleted by new developments in 2022. So I would say most of my importance-weighted errors are going to be in the general reasoning domain. I think it’s fairly similar for most people at Open Phil, and most EAs trying to do global priorities research or cause prioritization, especially within long-termism. I think the more object-level your work is, the more likely it is that your big mistakes will involve being unaware of empirical details.
However, investing in general reasoning doesn’t often look like “explicitly practicing general reasoning” (e.g. doing calibration training, studying probability theory or analytic philosophy, etc). It’s usually incidental improvement that’s happening over the course of a particular project (which will often involve developing plenty of content knowledge too).
Interesting answer.
Given that, could you say a bit more about how “investing in general reasoning” differs from “just working on projects based on what I expect to be directly impactful / what my employer said I should do”, and from “trying to learn content knowledge about some domain(s) while forming intuitions, theories, and predictions about those domain(s)”?
I.e., concretely, what does your belief that “investing in general reasoning” is particularly valuable lead you to spend more or less time doing (compared to if you believed content knowledge was particularly valuable)?
Your other reply in this thread makes me think that maybe you actually think people should basically just spend almost all of their time directly working on projects they expect to be directly impactful, and trust that they’ll pick up both improvements in their general reasoning skills and content knowledge along the way?
For a concrete example: About a month ago, I started making something like 3-15 Anki cards a day as I do my research (as well as learning random things on the side, e.g. from podcasts), and I’m spending something like 10-30 minutes a day reviewing them. This will help with the specific, directly impactful things I’m working on, but it’s not me directly working on those projects—it’s an activity that’s more directly focused on building content knowledge. What would be your views on the value of that sort of thing?
(Maybe the general reasoning equivalent would be spending 10-30 minutes a day making forecasts relevant to the domains one is also concurrently doing research projects on.)
Personally, I don’t do much explicit, dedicated practice or learning of either general reasoning skills (like forecasts) or content knowledge (like Anki decks); virtually all of my development on these axes comes from “just doing my job.” However, I don’t feel strongly that this is how everyone should be—I’ve just found that this sort of explicit practice holds my attention less and subjectively feels like a less rewarding and efficient way to learn, so I don’t invest in it much. I know lots of folks who feel differently, and do things like Anki decks, forecasting practice, or both.
Oh, actually, that all mainly relates to just one underlying reason why the sort of question Linch and I have in mind matters, which is that it could inform how much time EA researchers spend on various different types of specific tasks in their day-to-day work, and what goals they set for themselves on the scale of weeks/months.
Another reason this sort of question matters is that it could inform whether researchers/orgs:
Invest time in developing areas of expertise based essentially around certain domains of knowledge (e.g., nuclear war, AI risk, politics & policy, consciousness), and try to mostly work within those domains (even when they notice a specific high-priority question outside of that domain which no one else is tackling, or when someone asks them to tackle a question outside of that domain, or similar)
Try to become skilled generalists, tackling whatever questions seem highest priority on the margin in a general sense (without paying too much attention to personal fit), or whatever questions people ask them to tackle, or similar, even if those questions are in domains they currently have very little expertise in
(This is of course really a continuum. And there could be other options that aren’t highlighted by the continuum—e.g. developing expertise in some broadly applicable skillsets like forecasting or statistics or maybe policy analysis, and then applying those skills wherever seems highest priority on the margin.)
So I’d be interested in your thoughts on that tradeoff as well. You suggesting that improving on general reasoning often (in some sense) matters more than improving on content knowledge would seem to maybe imply that you lean a bit more towards option 2 in many cases?
My answer to this one is going to be a pretty boring “it depends” unfortunately. I was speaking to my own experience in responding to the top level question, and since I do a pretty “generalist”-y job, improving at general reasoning is likely to be more important for me. At least when restricting to areas that seem highly promising from a long-termist perspective, I think questions of personal fit and comparative advantage will end up determining the degree to which someone should be specialized in a particular topic like machine learning or biology.
I also think that often someone who is a generalist in terms of topic areas still specializes in a certain kind of methodology, e.g. researchers at Open Phil will often do “back of the envelope calculations” (BOTECs) in several different domains, effectively “specializing” in the BOTEC skillset.
I’m the coworker in question, and to clarify a little, my position was more like “It’s probably quite useful to build expertise in some area or cluster of areas by building lots of content knowledge in that area/those areas. And this seems worth doing for a typical full-time EA researcher even at the cost of having less time available to work on building general reasoning skills.” And that in turn is partly because I’d guess that it’d be much harder for a typical full-time EA researcher to make substantial further progress on their general reasoning skills than on their content knowledge.
I’d agree there’s a major “undersupply” of general reasoning skills in the sense that all humans are way worse at general reasoning than would be ideal and than seems theoretically possible (if we stripped away all biases, added loads of processing power, etc.). I think Linch and I disagree more on how easy it is to make progress towards that ideal (for a typical full-time EA researcher), rather than on how valuable such progress would be.
(I think we also disagree on how important more content knowledge tends to be.)
And I don’t think I’d say this for most non-EAs. E.g., I think I might actually guess that most non-EAs would benefit more from either reading Rationality: AI to Zombies or absorbing the ideas from it in some other way more fitting for the person (e.g., workshops, podcasts, discussions), rather than spending the same amount of time learning facts and concepts from important domains. (Though I guess I feel unsure precisely what I’m saying or what it means. E.g., I’d feel tempted to put “learning some core concepts from economics and some examples of how they’re applied” in the “improving general reasoning” bucket in addition to the “improving content knowledge” bucket.)
In any case, all of my views here are vaguely stated and weakly held, and I’d be very interested to hear Ajeya’s thoughts on this!
In my reply to Linch, I said that most of my errors were probably in some sense “general reasoning” errors, and a lot of what I’m improving over the course of doing my job is general reasoning. But at the same time, I don’t think that most EAs should spend a large fraction of their time doing things that look like explicitly practicing general reasoning in an isolated or artificial way (for example, re-reading the Sequences, studying probability theory, doing calibration training, etc). I think it’s good to be spending most of your time trying to accomplish something straightforwardly valuable, which will often incidentally require building up some content expertise. It’s just that a lot of the benefit of those things will probably come through improving your general skills.
Apologies if I misrepresented your stance! Was just trying to give my own very rough overview of what you said. :)
Yeah, that makes sense, and no need to apologise. I think your question was already useful without me adding a clarification of what my stance happens to be. I just figured I may as well add that clarification.
I’d be keen to hear your thoughts about the (small) field of AI forecasting and its trajectory. Feel free to say whatever’s easiest or most interesting. Here are some optional prompts:
Do you think the field is progressing ‘well’, however you define ‘well’?
What skills/types of people do you think AI forecasting needs?
What does progress look like in the field? Eg. does it mean producing a more detailed report, getting a narrower credible interval, getting better at making near-term AI predictions...(relatedly, how do we know if we’re making progress?)
Can you make any super rough predictions like ‘by this date I expect we’ll be this good at AI forecasting’?
I know you asked Ajeya, but I’m going to add my own unsolicited opinion that we need more people with professional risk analysis backgrounds, and if we’re going to do expert judgment elicitations as part of forecasting then we need people with professional elicitation backgrounds. Properly done elicitations are hard. (Relevant background: I led an AI forecasting project for about a year.)
Hm, I think I’d say progress at this stage largely looks like being better able to cash out disagreements about big-picture and long-term questions in terms of disagreements about more narrow, empirical, or near-term questions, and then trying to further break down and ultimately answer these sub-questions to try to figure out which big picture view(s) are most correct. I think given the relatively small amount of effort put into it so far and the intrinsic difficulty of this project, returns have been pretty good on that front—it feels like people are having somewhat narrower and more tractable arguments as time goes on.
I’m not sure about what exact skillsets the field most needs. I think the field right now is still in a very early stage and could use a lot of disentanglement research, and it’s often pretty chaotic and contingent what “qualifies” someone for this kind of work. Deep familiarity with the existing discourse and previous arguments/attempts at disentanglement is often useful, and some sort of quantitative background (e.g. economics or computer science or math) or mindset is often useful, and subject matter expertise (in this case machine learning and AI more broadly) is often useful, but none of these things are obviously necessary or sufficient. Often it’s just that someone happens to strike upon an approach to the question that has some purchase, they write it up on the EA Forum or LessWrong, and it strikes a chord with others and results in more progress along those lines.
Interesting answer :)
That made me think to ask the following questions, which are sort-of a tangent and sort-of a generalisation of the kind of questions Alex HT asked:
(These questions are inspired by a post by Max Daniel.)
Do you think many major insights from longtermist macrostrategy or global priorities research have been found since 2015?
If so, what would you say are some of the main ones?
Do you think the progress has been at a good pace (however you want to interpret that)?
Do you think that this pushes for or against allocating more resources (labour, money, etc.) towards that type of work?
Do you think that this suggests we should change how we do this work, or emphasise some types of it more?
(Feel very free to just answer some of these, answer variants of them, etc.)
I think “major insights” is potentially a somewhat loaded framing; it seems to imply that only highly conceptual considerations that change our minds about previously-accepted big picture claims count as significant progress. I think very early on, EA produced a number of arguments and considerations which felt like “major insights” in that they caused major swings in the consensus of what cause areas to prioritize at a very high level; I think that probably reflected that the question was relatively new and there was low-hanging fruit. I think we shouldn’t expect future progress to take the form of “major insights” that wildly swing views about a basic, high-level question as much (although I still think that’s possible).
Since 2015, I think we’ve seen good analysis and discussion of AI timelines and takeoff speeds, discussion of specific AI risks that go beyond the classic scenario presented in Superintelligence, better characterization of multipolar and distributed AI scenarios, some interesting and more quantitative debates on giving now vs giving later and “hinge of history” vs “patient” long-termism, etc. None of these have provided definitive / authoritative answers, but they all feel useful to me as someone trying to prioritize where Open Phil dollars should go.
I’m not sure how to answer this; I think taking into account the expected low-hanging fruit effect, and the relatively low investment in this research, progress has probably been pretty good, but I’m very uncertain about the degree of progress I “should have expected” on priors.
I think ideally the world as a whole would be investing much more in this type of work than it is now. A lot of the bottleneck to this is that the work is not very well-scoped or broken into tractable sub-problems, which makes it hard for a large number of people to be quickly on-boarded to it.
Related to the above, I’d love for the work to become better-scoped over time—this is one thing we prioritize highly at Open Phil.
Thanks!
Yeah, to be clear, I don’t intend to imply that we should expect there to have been many “major insights” after EA’s early years, or that that’s the only thing that’s useful. Tobias Baumann said on Max’s post:
That’s basically my view too, and it sounds like your view is sort-of similar. Though your comment makes me notice that some things that don’t seem explicitly captured by either Max’s question or Tobias’s response are:
better framings and disentanglement, to lay the groundwork for future minor/major insights
I.e., things that help make topics more “well-scoped or broken into tractable sub-problems”
better framings, to help us just think through something or be able to form intuitions more easily/reliably
things that are more concrete and practical than what people usually think of as “insights”
E.g., better estimates for some parameter
(ETA: I’ve now copied Ajeya’s answer to these questions as an answer to Max’s post.)
One issue I feel the EA community has badly neglected is the probability, given various (including modest) civilizational backslide scenarios, of us still being able to develop (and *actually* developing) the economies of scale needed to become an interstellar species.
To give a single example, a runaway Kessler effect could make putting anything in orbit basically impossible unless governments overcome the global tragedy of the commons and mount an extremely expensive mission to remove enough debris to regain effective orbital access—in a world where we’ve lost satellite technology and everything that depends on it.
EAs so far seem to have treated ‘humanity doesn’t go extinct’ in scenarios like this as equivalent to ‘humanity reaches its interstellar potential’, which seems very dangerous to me—intuitively, it feels like there’s at least a 1% chance that we wouldn’t ever solve such a problem in practice, even if civilisation lasted for millennia afterwards. If so, then we should be treating it as (at least) 1/100th of an existential catastrophe—and a couple of orders of magnitude doesn’t seem like that big a deal, especially if there are many more such scenarios than there are extinction-causing ones.
Do you have any thoughts on how to model this question in a generalisable way that it could give a heuristic for non-literal-extinction GCRs? Or do you think one would need to research specific GCRs to answer it for each of them?
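To make the “1/100th of an existential catastrophe” arithmetic above concrete, here is a minimal sketch in Python. It assumes (my simplification, not anything claimed in the thread) that each scenario can be summarised by two probabilities, and that the scenarios are rare enough that their contributions can simply be added. The function names and all numbers are hypothetical placeholders, not estimates.

```python
from typing import Mapping, Tuple


def expected_existential_loss(p_scenario: float, p_no_recovery: float) -> float:
    """Probability-weighted fraction of an existential catastrophe.

    p_scenario: chance that the catastrophe happens at all.
    p_no_recovery: chance, given the catastrophe, that civilisation never
        regains the ability to reach its long-term potential.
    """
    for name, p in (("p_scenario", p_scenario), ("p_no_recovery", p_no_recovery)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")
    return p_scenario * p_no_recovery


def total_expected_loss(scenarios: Mapping[str, Tuple[float, float]]) -> float:
    """Sum the per-scenario losses (assumes rare, roughly exclusive scenarios)."""
    return sum(expected_existential_loss(p, q) for p, q in scenarios.values())


# Hypothetical placeholder inputs: (p_scenario, p_no_recovery).
example = {
    "outright extinction event": (0.001, 1.0),
    "collapse scenario A": (0.02, 0.01),
    "collapse scenario B": (0.03, 0.01),
}
for label, (p, q) in example.items():
    print(f"{label}: {expected_existential_loss(p, q):.5f}")
print(f"total: {total_expected_loss(example):.5f}")
```

The point of the sketch is only that a scenario with a small chance of permanent non-recovery can still matter if scenarios of that kind are much more likely, or much more numerous, than extinction-causing ones.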
Also a big fan of your report. :)
Historically, what has caused the subjectively biggest-feeling updates to your timelines views? (e.g. arguments, things you learned while writing the report, events in the world).
Thanks! :)
The first time I really thought about TAI timelines was in 2016, when I read Holden’s blog post. That got me to take the possibility of TAI soonish seriously for the first time (I hadn’t been explicitly convinced of long timelines earlier or anything, I just hadn’t thought about it).
Then I talked more with Holden and technical advisors over the next few years, and formed the impression that many technical advisors believed in a relatively simple argument: if a brain-sized model could be transformative, then there’s a relatively tight argument implying it would take X FLOP to train it, which would become affordable in the next couple decades. That meant that if we had a moderate probability on the first premise, we should have a moderate probability on TAI in the next couple decades. This made me take short timelines even more seriously because I found the biological analogy intuitively appealing, and I didn’t think that people who confidently disagreed had strong arguments against it.
Then I started digging into those arguments in mid-2019 for the project that ultimately became the report, and I started to be more skeptical again because it seemed that even conditional on assuming a brain-sized model would constitute TAI, there are many different hypotheses you could have about how much computation it would take to train it (what eventually became the biological anchors), and different technical advisors believed in different versions of this. In particular, it felt like the notion of a horizon length made sense and incorporating it into the argument(s) made timelines seem longer.
Then after writing up an earlier draft of the report, it felt like a number of people (including those who had longish timelines) felt that I was underweighting short and medium horizon lengths, which caused me to upweight those views some.
What cause-prioritization efforts would you most like to see from within the EA community?
I’m most interested in forecasting work that could help us figure out how much to prioritize AI risk over other x-risks, for example estimating transformative AI timelines, trying to characterize what the world would look like in between now and transformative AI, and trying to estimate the magnitude of risk from AI.
If a magic fairy gave you 10 excellent researchers from a range of relevant backgrounds who were to work on a team together to answer important questions about the simulation hypothesis, what are the top n research questions you’d be most excited to discover they are pursuing?
I’m afraid I don’t have crisp enough models of the simulation hypothesis and related sub-questions to have a top n list. My biggest question is something more like “This seems like a pretty fishy argument, and I find myself not fully getting or buying it despite not being able to write down a simple flaw. What’s up with that? Can somebody explain away my intuition that it’s fishy in a more satisfying way and convince me to buy it more wholeheartedly, or else can someone pinpoint the fishiness more precisely?” My second biggest question is something like “Does this actually have any actionable implications for altruists/philanthropists? What are they, and can you justify them in a way that feels more robust and concrete and satisfying than earlier attempts, like Robin Hanson’s How to Live in a Simulation?”
Thanks for doing this AMA!
I’d be interested to hear about what you or Open Phil include (and prioritise) within the “longtermism” bucket. In particular, I’m interested in things like:
When you/Open Phil talk about existential risk, are you (1) almost entirely concerned about extinction risk specifically, (2) mostly concerned about extinction risk specifically, or (3) somewhat similarly concerned about extinction risk and other existential risks (i.e., risks of unrecoverable collapse or unrecoverable dystopias)?
When you/Open Phil talk about longtermism, are you (1) almost entirely focused on things that directly or indirectly reduce existential risk or existential risk factors, or (2) also quite interested in causing/preventing other kinds of trajectory changes[1]?
When you/Open Phil talk about longtermism and/or existential risks, are you seeing that as human-centric or animal-inclusive? Or does that distinction seem irrelevant for your thinking on longtermism and/or existential risks?
Do you/Open Phil see totalism (in the population ethics sense) as an essential assumption required for you to prioritise longtermism / existential risk reduction? Or do you see it as just one (key) pathway to such a prioritisation?
If existing write-ups already address these points, feel very free to just point me in their direction!
(The specific trigger for these questions is some of your comments on your (very interesting!) recent 80k podcast appearance. I wrote up a few of my own thoughts on those comments here.)
[1] By “causing/preventing other kinds of trajectory changes”, I basically have in mind:
I’d say that we’re interested in all three of preventing outright extinction, preventing some other kind of existential catastrophe, and in trajectory changes such as moving probability mass from “okay” worlds to “very good” worlds; I would expect some non-trivial fraction of our impact to come from all of those channels. However, I’m unsure how much weight each of these scenarios should get—that depends on various complicated empirical and philosophical questions we haven’t fully investigated (e.g. “What is the probability civilization would recover from collapse of various types?” and “How morally valuable should we think it is if the culture which arises after a recovery from collapse is very different from our current culture, and that culture is the one which gets to determine the long-term future?”). In practice our grantmaking isn’t making fine-grained distinctions between these or premised on one particular channel of impact: biosecurity and pandemic preparedness grantmaking may help prevent both outright extinction and civilizational collapse scenarios, AI alignment grantmaking may help prevent outright extinction or help make an “ok” future into a “great” one, etc.
I’d say that long-termism as a view is inherently animal-inclusive (just as the animal-inclusive view inherently also cares about humans); the view places weight on humans and animals today, and humans / animals / other types of moral patients in the distant future. Often the fact that it’s animal-inclusive is less salient though, because it is concerned with the potential for creating large numbers of thriving digital minds in the future, which we often picture as more human-like than animal-like.
I think the total view on population ethics is one important route to long-termism but others are possible. For example, you could be very uncertain what you value, but reason that it would be easier to figure out what we value and realize our values if we are safer, wiser, and have access to more resources.
Thanks!
FWIW, I think that all matches my own views, with the minor exception that I think longtermism (as typically defined, e.g. by MacAskill) is consistent with human-centrism as well as with animal-inclusivity. (Just as it’s consistent with either intrinsically valuing only happiness and reductions in suffering or also other things like liberty and art, and consistent with weighting reducing suffering more strongly than increasing happiness or weighting them equally.)
Perhaps you meant that Open Philanthropy’s longtermist worldview is inherently animal-inclusive?
(Personally, I adopt an animal-inclusive longtermist view. I just think one can be a human-centric longtermist.)
Yes, I meant that the version of long-termism we think about at Open Phil is animal-inclusive.
How would you define a “cause area” and “cause prioritization”, in a way which extends beyond Open Phil?
I’d say that a “cause” is something analogous to an academic field (like “machine learning theory” or “marine biology”) or an industry (like “car manufacturing” or “corporate law”), organized around a problem or opportunity to improve the world. The motivating problem or opportunity needs to be specific enough and clear enough that it pays off to specialize in it by developing particular skills, reading up on a body of work related to the problem, trying to join particular organizations that also work on the problem, etc.
Like fields and industries, the boundaries around what exactly a “cause” is can be fuzzy, and a cause can have sub-causes (e.g. “marine biology” is a sub-field of “biology” and “car manufacturing” is a sub-industry within “manufacturing”). But some things are clearly too broad to be a cause: “doing good” is not a cause in the same way that “learning stuff” is not an academic field and “making money” is not an industry. Right now, the cause areas that long-termist EAs support are in their infancy, so they’re pretty broad and “generalist”; over time I expect sub-causes to become more clearly defined and deeper specialized expertise to develop within them (e.g. I think it’s fairly recently that most people in the community started thinking of “AI governance and policy” as a distinct sub-cause within “AI risk reduction”).
Both within Open Phil and outside it, I think “cause prioritization” is a type of intellectual inquiry trying to figure out how many resources (often money but sometimes time / human resources) we would want going into different causes within some set, given some normative assumptions (e.g. utilitarianism of some kind).
Hi Ajeya, that’s a wonderful idea—I have a couple of questions below that are more about how you find working as a Senior Research Analyst and in this area:
What do you love about your role / work?
What do you dislike about your role / work?
What’s blocking you from having the impact you’d like to have?
What is the most important thing you did to get to where you are? (e.g., network, trying out lots of jobs / internships, continuity at one job, a particular course etc.)
The thing I most love about my work is my relationships with my coworkers and manager; they are all deeply thoughtful, perceptive, and compassionate people who help me improve along lots of dimensions.
Like I discussed in the podcast, a demoralizing aspect of my work is that we’re often pursuing questions where deeply satisfying answers are functionally impossible and it’s extremely unclear when something is “done.” It’s easy to spend much longer on a project than you hoped, and to feel that you put in a lot of work to end up with an answer that’s still hopelessly subjective and extremely easy to disagree with.
I think I would do significantly better in my role if I were less sensitive about the possibility that someone (especially experts or fancy people) would think I’m dumb for missing some consideration, not having an excellent response to an objection, not knowing everything about a technical sub-topic, making a mistake, etc. It would allow me to make better judgment calls about when it’s actually worth digging into something more, and to write more freely without getting bogged down in figuring out exactly how to caveat something.
I think the most important thing I did before joining Open Phil was to follow GiveWell’s research closely and to attempt to digest EA concepts well enough to teach them to others; I think this helped me notice when there was a job opportunity at GiveWell and to perform well in the interview process. Once at Open Phil, I think it was good that I asked a lot of questions about everything and pretty consistently said yes to opportunities to work on something harder than what I had done before.
I’m also interested in hearing more of what Ajeya has to say on these questions.
People might also be interested in her answers to questions similar to at least the first and second of those questions on the 80k podcast, from around 2 hours 17 minutes onwards. (I also commented here about how parts of her answers resonated with my own experiences.)
Regarding forecasts on transformative AI:
I’d be really interested in hearing about the discussions you have with people that have earlier median estimates, and/or what you expect those discussions would revolve around. For example, I saw that the Metaculus crowd has a median estimate of 2035 for fully general AI. Skimming their discussions, they might rely more on recent ML progress than you.
Like Linch says, some of the reason the Metaculus median is lower than mine is probably because they have a weaker definition; 2035 seems like a reasonable median for “fully general AI” as they define it, and my best guess may even be sooner.
With that said, I’ve definitely had a number of conversations with people who have shorter timelines than me for truly transformative AI; Daniel Kokotajlo articulates a view in this space here. Disagreements tend to be around the following points:
People with shorter timelines than me tend to feel that the notion of “effective horizon length” either doesn’t make sense, or that training time scales sub-linearly rather than linearly with effective horizon length, or that models with short effective horizon lengths will be transformative despite being “myopic.” They generally prefer a model where a scaled-up GPT-3 constitutes transformative AI. Since I published my draft report, Guille Costa (an intern at Open Philanthropy) released a version of the model that explicitly breaks out “scaled up GPT-3” as a hypothesis, which would imply a median of 2040 if all my other assumptions are kept intact.
They also tend to feel that extrapolations of when existing model architectures will reach human-level performance on certain benchmarks, e.g. a recently-created multitask language learning benchmark, imply that “human-level” capability would be reached at ~1e13 or 1e14 FLOP/subj sec rather than ~1e16 FLOP/subj sec as I guessed in my report. I’m more skeptical of extrapolation from benchmarks because my guess is that the benchmarks we have right now were selected to be hard-but-doable for our current generation of models, and once models start doing extremely well at these benchmarks we will likely generate harder benchmarks with more work, and there may be multiple rounds of this process.
They tend to be less skeptical on priors of sudden takeoff, which leads them to put less weight on considerations like “If transformative AI is going to be developed in only 5-10 years, why aren’t we seeing much more economic impact from AI today?”
Some of them also feel that I underestimate the algorithmic progress that will be made over the next ~5 years: they may not disagree with my characterization of the current scaling behavior of ML systems, but they place more weight than I would on an influx of researchers (potentially working with ML-enabled tools) making new discoveries that shift us to a different scaling regime, e.g. one more like the “Lifetime Anchor” hypothesis.
Finally, some people with shorter timelines than me tend to expect that rollout of AI technologies will be faster and smoother than I do, and expect there to be less delay from “working out kinks” or making systems robust enough to deploy.
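As a rough illustration of how the first two disagreements above move the numbers, here is a toy sketch in Python. It assumes (a simplification for illustration, not the report’s actual model) that training compute is the product of the model’s FLOP per subjective second, the effective horizon length raised to some exponent, and a number of training samples. The ~1e16 and ~1e13 FLOP/subj sec figures come from the discussion above; the horizon length, sample count, and exponent are hypothetical placeholders.

```python
import math


def training_flop(flop_per_subj_sec, horizon_subj_sec, n_samples, horizon_exponent=1.0):
    """Toy estimate of total training compute in FLOP.

    horizon_exponent = 1.0 means compute scales linearly with effective
    horizon length; values below 1.0 model sub-linear scaling.
    """
    if min(flop_per_subj_sec, horizon_subj_sec, n_samples) <= 0:
        raise ValueError("all inputs must be positive")
    return flop_per_subj_sec * horizon_subj_sec ** horizon_exponent * n_samples


# Hypothetical placeholders: horizon of 1e6 subjective seconds, 1e12 samples.
linear = training_flop(1e16, 1e6, 1e12, horizon_exponent=1.0)
sublinear = training_flop(1e16, 1e6, 1e12, horizon_exponent=0.5)
smaller_model = training_flop(1e13, 1e6, 1e12, horizon_exponent=1.0)

print(f"OOMs saved by sub-linear horizon scaling: {math.log10(linear / sublinear):.1f}")
print(f"OOMs saved by lower FLOP/subj sec: {math.log10(linear / smaller_model):.1f}")
```

Under these made-up inputs, either disagreement on its own shifts the training compute estimate by several orders of magnitude, which is why they can translate into meaningfully shorter timelines.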
Thanks, super interesting! In my very premature thinking, the question of algorithmic progress is most load-bearing. My background is in cognitive science and my broad impression is that
human cognition is not *that* crazy complex,
that I wouldn’t be surprised at all if one of the broad architectural ideas I’ve seen floating around on human cognition could afford “significant” steps towards proper AGI
e.g. how Bayesian inference and Reinforcement Learning may be realized in the predictive coding framework was impressive to me, for example as fleshed out by Steve Byrnes on LessWrong
or e.g. rough sketches of different systems that fulfill specific functions like in the further breakdown of System 2 in Stanovich’s Rationality and the Reflective Mind
when thinking about how many “significant” steps or insights we still need until AGI, I think of a number on the order of less than ten
(I’ve heard the idea of “insight-based forecasting” from a Joscha Bach interview)
those insights might not be extremely expensive and, once had, cheap-ish to implement
e.g. the GANs story maybe fits this, they’re not crazy complicated, not crazy hard to implement, but very powerful
This all feels pretty freewheeling so far. Would be really interested in further thoughts or reading recommendations on algorithmic progress.
My approach to thinking about algorithmic progress has been to try to extrapolate the rate of past progress forward; I rely on two sources for this, a paper by Katja Grace and a paper by Danny Hernandez and Tom Brown. One question I’d think about when forming a view on this is whether arguments like the ones you make should lead you to expect algorithmic progress to be significantly faster than the trendline, or whether those considerations are already “priced in” to the existing trendline.
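A minimal sketch of what “extrapolating the rate of past progress forward” could look like, in Python. It assumes (my assumption, for illustration only) that algorithmic progress shows up as a constant halving time in the compute needed to reach a fixed level of performance. The function names and every number are hypothetical placeholders, not figures from either paper.

```python
import math


def compute_needed_after(years, initial_flop, halving_time_years):
    """Compute needed for a fixed capability after `years` of steady progress."""
    if halving_time_years <= 0:
        raise ValueError("halving_time_years must be positive")
    return initial_flop * 0.5 ** (years / halving_time_years)


def years_until_affordable(initial_flop, affordable_flop, halving_time_years):
    """Years of trendline progress until the requirement drops to `affordable_flop`."""
    if halving_time_years <= 0:
        raise ValueError("halving_time_years must be positive")
    if affordable_flop >= initial_flop:
        return 0.0
    return halving_time_years * math.log2(initial_flop / affordable_flop)


# Hypothetical placeholders: compare a baseline trend with a faster one.
for halving_time in (3.0, 1.5):
    wait = years_until_affordable(1e30, 1e27, halving_time)
    print(f"halving time {halving_time} years -> {wait:.1f} years to close 3 OOMs")
```

Arguments that progress will beat the trendline amount to arguing for a shorter halving time than the historical one; if those considerations are already “priced in”, the historical halving time is the right input.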
And yes, thanks, the point about thinking with trendlines in mind is really good.
Maybe those two developments could be relevant:
bigger number of recent ML/CogSci/Comp. Neuroscience graduates that academically grew up in times of noticeable AI progress and much more widespread aspirations to build AGI than the previous generation
related to my question about non-academic open-source projects: If there is a certain level of computation necessary to solve interesting general reasoning gridworld problems with new algorithms, then we might unlock a lot of work in the coming years
Thanks! :) I find Grace’s paper a little bit unsatisfying. From the outside, fields like SAT, factoring, scheduling and linear optimization seem only weakly analogous to the fields around developing general thinking capabilities. It seems to me that the former is about hundreds of researchers going very deep into very specific problems and optimizing a ton to produce slightly more elegant and optimal solutions, whereas the latter is more about smart and creative “pioneers” having new insights into how to frame the problem correctly and finding new relatively simple architectures that make a lot of progress.
What would be more informative for me?
by above logic maybe I would focus more on progress of younger fields within computer science
also maybe there is a way to measure how “random” practitioners perceive the field to be—maybe just asking them how surprised they are by recent breakthroughs is a solid measure of how many other potential breakthroughs are still out there
also I’d be interested in solidifying my very rough impression that breakthroughs like transformers or GANs are relatively simple algorithms in comparison with breakthroughs in other areas of computer science
evolution’s algorithmic progress would maybe also be informative to me, i.e. how much trial and error was roughly invested to make specific jumps
e.g. I’m reading Pearl’s The Book of Why and he makes a tentative claim that counterfactual reasoning is something that appeared at some point, and the first sign we can report of it is the lion-man from roughly 40,000 years ago
though of course evolution did not aim at general intelligence, e.g. saying “evolution took hundreds of millions of years to develop an AGI” in this context seems disanalogous
how big of a fraction of human cognition do we actually need for TAI? E.g. we might save about an order of magnitude by ditching vision and focussing on language?
Sherry et al. have a more exhaustive working paper about algorithmic progress in a wide variety of fields.
Note that the definition of “fully general AI” on that Metaculus question is considerably weaker than how Open Phil talks about “transformative AI.”
Thanks, I didn’t read that carefully enough!
Right, to be clear I think this is (mostly) not your fault.
Unfortunately others have made this and similar mistakes before, for both other questions and this specific question.
Obviously some of the onus is on user error, but I think the rest of us (the forecasting community and the Metaculus platform) should do better on having the intuitive interpretation of the headline question match the question specifications, and vice versa.
Imagine you win $10B in a donor lottery. What sort of interventions—that are unlikely to be funded by Open Phil in the near future—might you fund with that money?
There aren’t $10B worth of giving opportunities that I’d be excited about supporting now, for essentially the same reasons why Open Phil isn’t giving everything away over the next few years. Basically, we expect (and I agree) that there will be more, better giving opportunities in the medium-term future and so it makes sense to save the marginal dollar for future giving, at least right now. There would likely be some differences between what I would fund and what Open Phil is currently funding due to different intuitions about the most promising interventions to investigate with scarce capacity, but I don’t expect them to be large.
How much worldview-diversification and dividing capital into buckets do you have within each of the three main cause areas, if at all? For example, I could imagine a divide between short and long AI Timelines, or a divide between policy-oriented and research-oriented grants.
We don’t have firmly-articulated “worldview divisions” beyond the three laid out in that post, though as I mention towards the end of this section in my podcast, different giving opportunities within a particular worldview can perform differently on important but hard-to-quantify axes such as the strength of feedback loops, the risk of self-delusion, or the extent to which it feels like a “Pascal’s mugging”, and these types of considerations can affect how much we give to particular opportunities.
Thanks for the answer! I want to make sure that I get this clearly, if you are still taking questions :)
Are you making attempts to diversify grants based on these kinds of axes, in cases where there is no clear-cut position? My current understanding is that you do it but mostly implicitly
I’d be interested to hear whether you think eventually expanding beyond our solar system is necessary for achieving a long period with very low extinction risk (and, if so, your reasons for thinking that).
Context for this question (adapted from this comment):
As part of the discussion of “Effective size of the long-term future” during your recent 80k appearance, you and Rob discussed the barriers to and likelihood of various forms of space colonisation.
During that section, I got the impression that you were implicitly thinking that a stable, low-extinction-risk future would require some kind of expansion beyond our solar system. Though I don’t think you said that explicitly, so maybe I’m making a faulty inference? Perhaps what you actually had in mind was just that such expansion could be one way to get a stable, low-extinction-risk future, such that the likelihood of such expansion was one important question in determining whether we can get such a future, and a good question to start with?
I haven’t really thought about this before, but I think I’d guess that we could have a stable, low-extinction-risk future—for, let’s say, hundreds of millions of years—without expanding beyond our solar system. Such expansion could of course help[1], both because it creates “backups” and because there are certain astronomical extinction events that would by default happen eventually to Earth/our solar system. But it seems to me plausible that the right kind of improved technologies and institutions would allow us to reduce extinction risks to negligible levels just on Earth for hundreds of millions of years.
But I’ve never really directly thought about this question before, so I could definitely be wrong.
[1] I’m not saying it’d definitely help—there are ways it could be net negative. And I’m definitely not saying that trying to advance expansion beyond our solar system is an efficient way to reduce extinction risk.
Thanks Michael! I agree space colonization may not be strictly required for achieving a stable state of low x-risk, but because it’s the “canonical” vision of the stable low-risk future, I would feel significantly more uncertain if we were to rule out the possibility of expansion into space, and I would be inclined to be skeptical-by-default, particularly if we are picturing biological humans, because it seems like there are a large number of possible ways the environmental conditions needed for survival might be destroyed and it intuitively seems like “offense” would have an advantage over “defense” there. But I haven’t thought deeply about the technology that would be needed to preserve a state of low x-risk entirely on Earth and I’d expect my views would change a lot with only a few hours of thinking on this.
Apart from the biological anchors approach, what efforts in AI timelines or takeoff dynamics forecasting—both inside and outside Open Phil—are you most excited about?
I’m pretty excited about economic modeling-based approaches, either:
Estimating the value-added from machine learning historically and extrapolating it into the future, or
Doing a takeoff analysis that takes into account how AI progress relates to inputs such as hardware and software effort, and the extent to which AI of a certain quality level can allow hardware to substitute for software effort, similar to the “Intelligence Explosion Microeconomics” paper.
What instrumental goals have you pursued successfully?
In my work, I’ve gotten better at resisting the urge to investigate sub-questions more deeply and instead pulling back and trying to find short-cuts to answering the high-level question. In my personal life, I’ve gotten better at setting up my schedule so I’m having fun in the evenings and weekends instead of mindlessly browsing social media. (I have a long way to go on both of these though.)
Also, I got a university degree :)
For thinking about AI timelines, how do you go about choosing the best reference classes to use (see e.g., here and here)?
I don’t think I have a satisfying general answer to this question; in practice, the approaches I pursue first are heavily influenced by which approaches I happen to find some purchase on, since many theoretically appealing reference classes or high-level approaches to the question may be difficult to make progress on for whatever reason.
[I’m not sure if you’ve thought about the following sort of question much. Also, I haven’t properly read your report—let me know if this is covered in there.]
I’m interested in a question along the lines of “Do you think some work done before TAI is developed matters in a predictable way—i.e., better than 0 value in expectation—for its effects on the post-TAI world, in ways that don’t just flow through how the work affects the pre-TAI world or how the TAI transition itself plays out? If so, to what extent? And what sort of work?”
An example to illustrate: “Let’s say TAI is developed in 2050, and the ‘TAI transition’ is basically ‘done’ by 2060. Could some work to improve institutional decision-making be useful in terms of how it affects what happens from 2060 onwards, and not just via reducing x-risk (or reducing suffering etc.) before 2060 and improving how the TAI transition goes?”
But I’m not sure it’s obvious what I mean by the above, so here’s my attempt to explain:
The question of when TAI will be developed[1] is clearly very important to a whole bunch of prioritisation questions. One reason is that TAI (and probably the systems leading up to it) will very substantially change many aspects of how society works. Specifically, Open Phil has defined TAI as “AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution” (and Muehlhauser has provided some more detail on what is meant by that).
But I think some EAs implicitly assume something stronger, along the lines of:
But I don’t think that necessarily follows from how TAI is defined. E.g., various countries, religions, ideologies, political systems, technologies, etc., existed both before the Industrial Revolution and for decades/centuries afterwards. And it seems like some pre-Industrial-Revolution actions (e.g. those of people who pushed for democracy or the abolition of slavery) had effects on the post-Industrial-Revolution world that were probably predictably positive in advance and that weren’t just about affecting how the Industrial Revolution itself occurred.
(Though it may have still been extremely useful for people taking those actions to know that, when, where, and how the IR would occur, e.g. because then they could push for democracy and abolition in the countries that were about to become much more influential and powerful.)
So I’m tentatively inclined to think that some EAs are assuming that short timelines push against certain types of work more than they really do, and that certain (often “broad”) interventions could be in expectation useful for influencing the post-TAI world in a relatively “continuous” way. In other words, I’m inclined to think there might be less of an extremely abrupt “break” than some people seem to think, even if TAI occurs. (Though it’d still be quite extreme by many standards, just as the Industrial Revolution was.)
[1] Here I’m assuming TAI will be developed, which is questionable, though it seems to me pretty much guaranteed unless some existential catastrophe occurs beforehand.
I haven’t thought very deeply about this, but my first intuition is that the most compelling reason to expect to have an impact that predictably lasts longer than several hundred years without being washed out is because of the possibility of some sort of “lock-in”—technology that allows values and preferences to be more stably transmitted into the very long-term future than current technology allows. For example, the ability to program space probes with instructions for creating the type of “digital life” we would morally value, with error-correcting measures to prevent drift, would count as a technology that allows for effective lock-in in my mind.
A lot of people may act as if we can’t impact anything post-transformative AI because they believe technology that enables lock-in will be built very close in time after transformative AI (since TAI would likely cause R&D towards these types of tech to be greatly accelerated).
[Kind-of thinking aloud; bit of a tangent from your AMA]
Yeah, that basically matches my views.
I guess what I have in mind is that some people seem to:
round up “most compelling reason” to “only reason”
not consider the idea of trying to influence lock-in events that occur after a TAI transition, in ways other than influencing how the TAI transition itself occurs
Such ways could include things like influencing political systems in long-lasting ways
round up “substantial chance that technology that enables lock-in will be built very close in time after TAI” to “it’s basically guaranteed that...”
I think what concerns me about this is that I get the impression many people are doing this without noticing it. It seems like maybe some thought leaders recognised that there were questions to ask here, thought about the questions, and formed conclusions, but then other people just got a slightly simplified version of the conclusion without noticing there’s even a question to ask.
A counterpoint is that I think the ideas of “broad longtermism”, and some ideas that people like MacAskill have raised, kind-of highlight the questions I’m suggesting should be highlighted. But even those ideas seem to often be about what to do given the premise that a TAI transition won’t occur for a long time, or how to indirectly influence how a TAI transition occurs. So I think they’re still not exactly about the sort of thing I’m talking about.
To be clear, I do think we should put more longtermist resources towards influencing potential lock-in events prior to or right around the time of a TAI transition than towards non-TAI-focused ways of influencing events after a TAI transition. But it seems pretty plausible to me that some longtermist resources should go towards other things, and it also seems good for people to be aware that a debate could be had on this.
(I should probably think more about this, check whether similar points are already covered well in some existing writings, and if not write something more coherent than these comments.)
To the extent that you have “a worldview” (in scare quotes), what is a short summary of that worldview?
I don’t have an easily-summarizable worldview that ties together the different parts of my life. In my career, effective altruism (something like “Try to do as much good as possible, and think deeply about what that means and be open to counterintuitive answers”) is definitely dominant. In my personal life, I try to be “agenty” about getting what I want, and to be open to trying unusually hard or being “weird” when that’s what works for me and makes me happy. I think these are both evolving a lot in the specifics.
I’m curious about your take on prioritizing between science funding and other causes. In the 80k interview you said:
My question: Is funding in basic science less of a priority because there are compelling reasons to deprioritize funding more projects there generally, because there is less organizational comparative advantage (or not enough expertise yet) or something else?
Decisions about the size of the basic science budget are made within the “near-termist” worldview bucket, since we see the primary case for this funding as the potential for scientific breakthroughs to improve health and welfare over the next several decades; I’m not involved with that since my research focus is on cause prioritization within the “long-termist” worldview.
In terms of high-level principles, the decision would be made by comparing an estimate of the value of marginal science funding against an estimate of the value of the near-termist “last dollar”, but I’m not familiar with the specific numbers myself.
I really appreciated your 80K episode—it was one of my favorites! I created a discussion thread for it.
Some questions—feel free to answer as many as you want:
How much of your day-to-day work involves coding or computer science knowledge in general? I know you created a Jupyter notebook to go with your AI timelines forecast; is there anything else?
What are your thoughts on the public interest tech movement?
More specifically, I’ve been thinking about starting some meta research on using public interest tech to address the most pressing problems from an EA perspective. Do you think that would be useful?
Thanks, I’m glad you liked it so much!
I reasonably often do things like make models in Python, but the actual coding is a pretty small part of my work—something like 5%-10% of my time. I’ve never done a coding project for work that was more complicated than the notebook accompanying my timelines report, and most models I make are considerably simpler (usually implemented in spreadsheets rather than in code).
I’m not familiar with the public interest tech movement unfortunately, so I’m not sure what I think about that research project idea.
Any thoughts on the recent exodus of employees from OpenAI?
In your 80,000 Hours interview you talked about worldview diversification. You emphasized the distinction between total utilitarianism vs. person-affecting views within the EA community. What about diversification beyond utilitarianism entirely? How would you incorporate other normative ethical views into cause prioritization considerations? (I’m aware that in general this is basically just the question of moral uncertainty, but I’m curious how you and Open Phil view this issue in practice.)
Most people at Open Phil aren’t 100% bought into utilitarianism, but utilitarian thinking has an outsized impact on cause selection and prioritization because under a lot of other ethical perspectives, philanthropy is supererogatory, so those other ethical perspectives are not as “opinionated” about how best to do philanthropy. It seems that the non-utilitarian perspectives we take most seriously usually don’t provide explicit cause prioritization input such as “Fund biosecurity rather than farm animal welfare”, but rather provide input about what rules or constraints we should be operating under, such as “Don’t misrepresent what you believe even if it would increase expected impact in utilitarian terms.”
Hello! I really enjoyed your 80,000 Hours interview, and thanks for answering questions!
1 - Do you have any thoughts about the prudential/personal/non-altruistic implications of transformative AI in our lifetimes?
2 - I find fairness agreements between worldviews unintuitive but also intriguing. Are there any references you’d suggest on fairness agreements besides the OpenPhil cause prioritization update?
Thanks, I’m glad you enjoyed it!
I haven’t put a lot of energy into thinking about personal implications, and don’t have very worked-out views right now.
I don’t have a citation off the top of my head for fairness agreements specifically, but they’re closely related to “variance normalization” approaches to moral uncertainty, which are described here (that blog post links to a few papers).
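For readers unfamiliar with the idea, here is a minimal sketch of what a variance normalization approach could look like. It is an illustration under stated assumptions, not Open Phil's method and not a model of fairness agreements themselves: the option names, scores, and credences are made up, and `variance_normalize` and `credence_weighted_scores` are hypothetical helper names.

```python
from statistics import mean, pstdev

def variance_normalize(scores):
    """Rescale one worldview's scores to mean 0 and variance 1 across options."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation across the options
    if sigma == 0:
        # A worldview that is indifferent between all options contributes nothing.
        return {option: 0.0 for option in scores}
    return {option: (value - mu) / sigma for option, value in scores.items()}

def credence_weighted_scores(worldviews, credences):
    """Combine normalized scores, weighting each worldview by its credence."""
    normalized = {name: variance_normalize(s) for name, s in worldviews.items()}
    options = list(next(iter(worldviews.values())).keys())
    return {
        option: sum(credences[name] * normalized[name][option] for name in worldviews)
        for option in options
    }

# Made-up numbers: the two worldviews score the options on very different scales.
worldviews = {
    "worldview_1": {"A": 0, "B": 10, "C": 20},
    "worldview_2": {"A": 3, "B": 1, "C": 2},
}
credences = {"worldview_1": 0.5, "worldview_2": 0.5}

combined = credence_weighted_scores(worldviews, credences)
best = max(combined, key=combined.get)
print(combined, best)
```

The point of the rescaling step is that neither worldview dominates merely because it happens to express its scores in larger units.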
I’ve been increasingly hearing advice to the effect that “stories” are an effective way for an AI x-safety researcher to figure out what to work on, that drawing scenarios about how you think it could go well or go poorly and doing backward induction to derive a research question is better than traditional methods of finding a research question. Do you agree with this? It seems like the uncertainty when you draw such scenarios is so massive that one couldn’t make a dent in it, but do you think it’s valuable for AI x-safety researchers to make significant (i.e. more than 30% of their time) investments in both 1. doing this directly by telling stories and attempting backward induction, and 2. training so that their stories will be better/more reflective of reality (by studying forecasting, for instance)?
I would love to see more stories of this form, and think that writing stories like this is a good area of research to be pursuing for its own sake that could help inform strategy at Open Phil and elsewhere. With that said, I don’t think I’d advise everyone who is trying to do technical AI alignment to determine what questions they’re going to pursue based on an exercise like this—doing this can be very laborious, and the technical research route it makes the most sense for you to pursue will probably be affected by a lot of considerations not captured in the exercise, such as your existing background, your native research intuitions and aesthetic (which can often determine what approaches you’ll be able to find any purchase on), what mentorship opportunities you have available to you and what your potential mentors are interested in, etc.
Thanks for doing this and for doing the 80k podcast, I enjoyed the episode.
What are some longtermist cause areas other than AI, biorisk and cause prioritisation that you’d be keen to see more work on?
I gather that Open Phil has grown a lot recently. Can you say anything about the growth and hiring you expect for Open Phil over the next say 1-3 years? E.g. would you expect to hire lots more generalists, or maybe specialists in new cause areas, etc.
Thanks, I’m glad you enjoyed it!
This is fairly basic, but EA community building is definitely another cause I’d add to that list. I’m less confident in other potential areas, but I would also be curious about exploring some aspects of improving institutional decision-making as well.
The decision to open a hiring round is usually made at the level of individual focus areas and sub-teams, and we don’t have an organization-wide growth plan, so it’s fairly difficult to estimate exact numbers; with that said, I expect we’ll be doing some hiring of both generalists and program specialists over the next few years. (We have a new open positions page here.)
[The following question might just be confused, might not be important, and will likely be poorly phrased/explained.]
In your recent 80k appearance, you and Rob both say that the way the self-sampling assumption (SSA) leads to the doomsday argument seems sort-of “suspicious”. You then say that, on the other hand, the way the self-indication assumption (SIA) causes an opposing update also seems suspicious.
But I think all of your illustrations of how updates based on the SIA can seem suspicious involved infinities. And we already know that loads of things involving infinities can seem counterintuitive or suspicious. So it seems to me like this isn’t much reason to feel that SIA in particular can cause suspicious updates. In other words, it seems like maybe the “active ingredient” causing the suspiciousness in the examples you give is infinity, not SIA. Whereas the way the SSA leads to the doomsday argument doesn’t have to involve infinity, so there it seems like SSA is itself suspicious.
Does that sound correct to you? Do you think that makes SIA effectively less suspicious than SSA, and thereby pushes further against the doomsday argument?
(I obviously don’t think we should necessarily dismiss things just because they feel “suspicious”. But it could make sense to update a bit away from them for that reason, and, to the extent that that’s true, a difference in the suspiciousness of SSA vs SIA could matter.)
(Btw, although I’m still not sure I understand SSA and SIA properly, your explanation during the 80k interview caused me to feel like I probably at least understood the gist, for the first time, so thanks for that!)
Thanks, I’m glad you found that explanation helpful!
I think I broadly agree with you that SIA is somewhat less “suspicious” than SSA, with the small caveat that I think most of the weirdness can be preserved with a finite-but-sufficiently-giant world rather than a literally infinite world.
Hi Ajeya! :) What do you think about open source projects like https://www.eleuther.ai/ that replicate cutting-edge projects like GPT-3 or AlphaFold? Speaking as an outsider, I imagine that a lot of AI progress comes from “random” tinkering, and so I wondered if “Discord groups tinkering along” are relevant actors in your strategic landscape.
(I really enjoyed listening to the recent interview!)
I’m not very familiar with these open source implementations; they seem interesting! So far, I haven’t explicitly broken out different possible sources of algorithmic progress in my model, since I’m thinking about it in a very zoomed-out way (extrapolating big-picture quantitative trends in algorithmic progress). I’m not sure how much of the progress captured in these trends comes from traditional industry/academia sources vs open source projects like these.
Hi Ajeya, thank you for publishing such a massive and detailed report on timelines!! Like other commenters here, it is my go-to reference. Allowing users to adjust the parameters of your model is very helpful for picking out built-in assumptions and being able to update predictions as new developments are made.
In your report you mention that you discount the aggressive timelines in part due to lack of major economic applications of AI so far. I have a few questions along those lines.
Do you think TAI will necessarily be foreshadowed by incremental economic gains? If so, why? I personally don’t see the lack of such applications as a significant signal because the cost and inertia of deploying AI for massive economic benefit is debilitating compared to the current rate of research progress on AI capabilities. For example, I would expect that if a model like GPT-3 had existed for 50 years and was already integrated with the economy it would be ubiquitous in writing-based jobs and provide massive productivity gains. However, from where we are now, it seems likely that several generations of more powerful successors will be developed before the hypothetical benefits of GPT-3 are realized.
If a company like OpenAI heavily invested in productizing their new API (or DeepMind their AlphaFold models) and signaled that they saw it as key to the company’s success, would you update your opinion more towards aggressive timelines? Or would you see this as delaying research progress because of the time spent on deployment work?
More generally, how do you see (corporate) groups reorienting (if at all) as capabilities progress and we get close to TAI? Do you expect research to slow broadly as current theoretical, capabilities-driven work is replaced by implementation and deployment of existing methods? Do you see investment in alignment research increasing, including possibly an intentional reduction of pure capabilities work towards safer methods? On the other end of the spectrum, do you see an arms race as likely?
Finally, have you talked much to people outside the alignment/effective altruism communities about your report? How have reactions varied by background? Are you reluctant to publish work like this broadly? If so, why? Do you see risks of increasing awareness of these issues pushing unsafe capabilities work?
Apologies for the number of questions! Feel free to answer whichever are most interesting to you.
Thanks! I’ll answer your cluster of questions about takeoff speeds and commercialization in this comment and leave another comment responding to your questions about sharing my report outside the EA community.
Broadly speaking, I do expect that transformative AI will be foreshadowed by incremental economic gains; I generally expect gradual takeoff, meaning I would bet that at some point growth will be ~10% per year before it hits 30% per year (which was the arbitrary cut-off for “transformative” used in my report). I don’t think it’s necessarily the case; I just think it’ll probably work this way. On the outside view, that’s how most technologies seem to have worked. And on the inside view, it seems like there are lots of valuable-but-not-transformative applications of existing models on the horizon, and industry giants + startups are already on the move trying to capitalize.
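To give a feel for the two growth rates mentioned here, the following small sketch converts an annual growth rate into a doubling time. It assumes steady compound growth at a constant rate, which is a simplification for illustration only, and `doubling_time_years` is a hypothetical helper name rather than anything taken from the report.

```python
import math

def doubling_time_years(annual_growth_rate):
    """Years for output to double under constant compound annual growth."""
    return math.log(2) / math.log(1 + annual_growth_rate)

# The two rates discussed in the comment: ~10% and 30% per year.
for rate in (0.10, 0.30):
    years = doubling_time_years(rate)
    print(f"{rate:.0%} growth per year: output doubles in about {years:.1f} years")
```

Under that assumption, 10% annual growth doubles output in roughly 7.3 years and 30% annual growth in roughly 2.6 years.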
My views imply a roughly 10% probability that the compute to train transformative AI would be affordable in 10 years or less, which wouldn’t really leave time for this kind of gradual takeoff. One reason it’s a pretty low number is that it would imply sudden takeoff and I’m skeptical of that implication (though it’s not the only reason; I think there are separate reasons to be skeptical of the Lifetime Anchor and the Short Horizon Neural Network anchor, which drive short timelines in my model).
I don’t expect that several generations of more powerful successors to GPT-3 will be developed before we see significant commercial applications of GPT-3; I expect commercialization of existing models and scaleup to larger models to be happening in parallel. There are already various applications online, e.g. AI Dungeon (based on GPT-3), TabNine (based on GPT-2), and this list of other apps. I don’t think that evidence OpenAI was productizing GPT-3 would shift my timelines much either way, since I already expect them to be investing pretty heavily in this.
Relative to the present, I expect the machine learning industry to invest a larger share of its resources going forward into commercialization, as opposed to pure R&D: before this point a lot of the models studied in an R&D setting just weren’t very useful (with the major exception of vision models underlying self-driving cars), and now they’re starting to be pretty useful. But at least over the next 5-10 years I don’t think that would slow down scaling / R&D much in an absolute sense, since the industry as a whole will probably grow, and there will be more resources for both scaling R&D and commercialization.
I haven’t engaged much with people outside the EA and AI alignment communities, and I’d guess that very few people outside these communities have heard about the report. I don’t personally feel sold that the risks of publishing this type of analysis more broadly (in terms of potentially increasing capabilities work) outweigh the benefits of helping people better understand what to expect with AI and giving us a better chance of figuring out if our views are wrong. However, some other people in the AI risk reduction community who we consulted (TBC, not my manager or Open Phil as an institution) were more concerned about this, and I respect their judgment, so I chose to publish the draft report on LessWrong and avoid doing things that could result in it being shared much more widely, especially in a “low-bandwidth” way (e.g. just the “headline graph” being shared on social media).
To clarify, we are planning to seek more feedback from people outside the EA community on our views about TAI timelines, but we’re seeing that as a separate project from this report (and may gather feedback from outside the EA community without necessarily publicizing the report more widely).
Hi Ajeya, thanks for doing this and for your recent 80K interview! I’m trying to understand what assumptions are needed for the argument you raise in the podcast discussion on fairness agreements that a longtermist worldview should have been willing to trade up all its influence for ever-larger potential universes. There are two points I was wondering if you could comment on, in terms of if/how they align with your argument.
My intuition says that the argument requires a prior probability distribution on universe size that has an infinite expectation, rather than just a prior with non-zero probability on all possible universe sizes with a finite expectation (like a power-law distribution with k > 2).
But then I figured that even in a universe that was literally infinite but had a non-zero density of value-maximizing civilizations, the amount of influence over that infinite value that any one civilization or organization has might still be finite. So I’m wondering if what is needed to be willing to trade up for influence over ever larger universes is actually something like the expectation E[V/n] being infinite, where V = total potential value in universe and n = number of value-maximizing civilizations.
I agree that your prior would need to have an infinite expectation for the size of the universe for this argument to go through.
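As a worked illustration of the finite-versus-infinite expectation distinction in the question, here is a minimal sketch. It assumes a power-law (Pareto) density proportional to x^(-k) for x >= x_min, with k > 1 so that the density can be normalized; `power_law_mean` and `partial_mean` are hypothetical helper names, and the parameter values are arbitrary.

```python
import math

def power_law_mean(k, x_min=1.0):
    """Mean of a density proportional to x**(-k) on [x_min, infinity)."""
    if k <= 1:
        raise ValueError("k must exceed 1 for the density to be normalizable")
    if k <= 2:
        return math.inf  # the integral of x * p(x) diverges
    return (k - 1) / (k - 2) * x_min

def partial_mean(k, cap, x_min=1.0):
    """Integral of x * p(x) from x_min to cap (no renormalization)."""
    if k <= 1:
        raise ValueError("k must exceed 1 for the density to be normalizable")
    if k == 2:
        return x_min * math.log(cap / x_min)
    coeff = (k - 1) * x_min ** (k - 1)  # normalizing constant of the density
    return coeff * (cap ** (2 - k) - x_min ** (2 - k)) / (2 - k)

# With k = 3 the contribution from ever-larger sizes levels off;
# with k = 1.5 it keeps growing as the cap is raised.
for k in (3.0, 1.5):
    print(k, power_law_mean(k), [partial_mean(k, cap) for cap in (1e3, 1e6, 1e9)])
```

In this sketch every size above `x_min` has non-zero probability density in both cases, but only when k <= 2 does the expected size grow without bound as larger universes are included.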
I agree with the generalized statement that your prior over “value-I-can-affect” needs to have an infinite expectation, but I don’t think I agree with the operationalization of “value-I-can-affect” as V/n. It seems possible to me that even if there is a high density of value-maximizing civilizations out there, each one could have an infinite impact through e.g. acausal trade. I’m not sure what a crisp operationalization of “value-I-can-affect” would be.
I see, thank you!
What do you make of Ben Garfinkel’s work on scepticism towards AI’s capacity being separable from its goals/his broader skepticism of brain in a box scenarios?