I replaced the original comment with “goal-directed.” Each of them has some baggage and isn’t quite right, but on balance I think “goal-directed” is better. I’m not very systematic about this choice; it’s just a reflection of my mood that day.
Paul_Christiano
Quantitatively how large do you think the non-response bias might be? Do you have some experience or evidence in this area that would help estimate the effect size? I don’t have much to go on, so I’d definitely welcome pointers.
Let’s consider the 40% of people who put a 10% probability on extinction or similarly bad outcomes (which seems like what you are focusing on). Perhaps you are worried about something like: researchers concerned about risk might be 3x more likely to answer the survey than those who aren’t concerned about risk, and so in fact only 20% of people assign a 10% probability, not the 40% suggested by the survey.
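To make the hypothetical concrete, here is the arithmetic for how a differential response rate distorts the observed fraction (a sketch of my own; the function names are mine, not from the survey):

```python
# If the true fraction of concerned researchers is p, and concerned researchers
# are r times as likely to respond, the observed fraction among respondents is
#   observed = p * r / (p * r + (1 - p))

def observed_fraction(p_true, response_ratio):
    """Fraction of respondents who are concerned, given a response-rate multiplier."""
    weighted = p_true * response_ratio
    return weighted / (weighted + (1 - p_true))

def true_fraction(observed, response_ratio):
    """Invert the above: recover the true fraction from the observed one."""
    return observed / (observed + response_ratio * (1 - observed))

# A 3x selection effect inflates a true 20% to an observed ~43%:
print(round(observed_fraction(0.20, 3), 2))
# Equivalently, an observed 40% with a 3x effect implies a true fraction of ~18%:
print(round(true_fraction(0.40, 3), 2))
```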
Changing from 40% to 20% would be a significant revision of the results, but honestly that’s probably comparable to other sources of error and I’m not sure you should be trying to make that precise an inference.
But more importantly, a 3x selection effect seems implausibly large to me. The survey was presented as being about “progress in AI” and there’s not an obvious mechanism for huge selection effects on these questions. I haven’t seen literature that would help estimate the effect size, but based on a general sense of correlation sizes in other domains I’d be pretty surprised by a 3x or even 2x selection effect based on this kind of indirect association. (A 2x effect on response rate based on views about risks seems to imply a very serious piranha problem.)
The largest demographic selection effects were that some groups (e.g. academia vs industry, junior vs senior authors) were about 1.5x more likely to fill out the survey. Those small selection effects seem more like what I’d expect and are around where I’d set the prior (so: 40% being concerned might really be 30% or 50%).
many AI researchers just don’t seem too concerned about the risks posed by AI, so may not have opened the survey … the loaded nature of the content of the survey (meaning bias is especially likely),
I think the survey was described as about “progress in AI” (and mostly concerned progress in AI), and this seems like all people saw when deciding to take it. Once people started taking the survey it looks like there was negligible non-response at the question level. You can see the first page of the survey here, which I assume is representative of what people saw when deciding to take the survey.
I’m not sure if this was just a misunderstanding of the way the survey was framed. Or perhaps you think people have seen reporting on the survey in previous years and are aware that the question on risks attracted a lot of public attention, and therefore are much more likely to fill out the survey if they think risk is large? (But I think the mechanism and sign here are kind of unclear.)
specially when you account for the fact that it’s extremely unlikely other large surveys are compensating participants anywhere close to this well
If compensation is a significant part of why participants take the survey, then I think it lowers the scope for selection bias based on views (though increases the chances that e.g. academics or junior employees are more likely to respond).
I can see how other researchers citing these kinds of results (as I have!) may serve a useful rhetorical function, given readers of work that cites this work are unlikely to review the references closely
I think it’s dishonest to cite work that you think doesn’t provide evidence. That’s even more true if you think readers won’t review the citations for themselves. In my view the 15% response rate doesn’t undermine the bottom line conclusions very seriously, but if your views about non-response mean the survey isn’t evidence then I think you definitely shouldn’t cite it.
the fact that such a broad group of people were surveyed that it’s hard to imagine they’re all actually “experts” (let alone have relevant expertise),
I think the goal was to survey researchers in machine learning, and so it was sent to researchers who publish in the top venues in machine learning. I don’t think “expert” was meant to imply that these respondents had e.g. some kind of particular expertise about risk. In fact the preprint emphasizes that very few of the respondents have thought at length about the long-term impacts of AI.
Given my aforementioned concerns, I wonder whether the cost of this survey can be justified
I think it can easily be justified. This survey covers a set of extremely important questions, where policy decisions have trillions of dollars of value at stake and the views of the community of experts are frequently cited in policy discussions.
You didn’t make your concerns about selection bias quantitative, but I’m skeptical that they reduce the value of the information by much. And even if we think non-response is fatal for some purposes, it doesn’t interfere as much with comparisons across questions (e.g. which tasks people expect to be accomplished sooner or later, which risks they take more or less seriously) or with observing how the views of the community change over time.
I think there are many ways in which the survey could be improved, and it would be worth spending additional labor to make those improvements. I agree that sending a survey to a smaller group of recipients with larger compensation could be a good way to measure the effects of non-response bias (and might be more respectful of the research community’s time).
I am not inclined to update very much on what AI researchers in general think about AI risk on the basis of this survey
I think the main takeaway w.r.t. risk is that typical researchers in ML (like most of the public) have not thought about impacts of AI very seriously but their intuitive reaction is that a range of negative outcomes are plausible. They are particularly concerned about some impacts (like misinformation), particularly unconcerned about others (like loss of meaning), and are more ambivalent about others (like loss of control).
I think this kind of “haven’t thought about it” is a much larger complication for interpreting the results of the survey, although I think it’s fine as long as you bear it in mind. (I think ML researchers who have thought about the issue in detail tend if anything to be somewhat more concerned than the survey respondents.)
many AI researchers just don’t seem too concerned about the risks posed by AI
My impressions of academic opinion have been broadly consistent with these survey results. I agree there is large variation and that many AI researchers are extremely skeptical about risk.
Yes, I’d bet the effects are even smaller than what this study found. This study gives a small amount of evidence of an effect > 0.05 SD. But without a clear mechanism I think an effect of < 0.05 SD is significantly more likely. One of the main reasons we were expecting an effect here was a prior literature that is now looking pretty bad.
That said, this was definitely some evidence for a positive effect, and the prior literature is still some evidence for a positive effect even if it’s not looking good. And the upside is pretty large here since creatine supplementation is cheap. So I think this is good enough grounds for me to be willing to fund a larger study.
My understanding of the results: for the preregistered tasks you measured effects of 1 IQ point (for RAPM) and 2.5 IQ points (for BDS), with a standard error of ~2 IQ points. This gives weak evidence in favor of a small effect, and strong evidence against a large effect.
You weren’t able to measure a difference between vegetarians and omnivores. For the exploratory cognitive tasks you found no effect. (I don’t know if you’d expect those tests to be sensitive enough to notice such a small effect.)
At this point it seems a bit unlikely to me that there is a clinically significant effect, maybe I’d bet at 4:1 against the effect being >0.05 SD. That said I still think it would be worthwhile for someone to do a larger study that could detect a 0.1 SD effect, since that would be clinically significant and is very weakly suggested by this data (and would make supplementation worthwhile given how cheap it is).
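For reference, the unit conversions behind these numbers (IQ is normed to SD = 15; this is just my arithmetic, not the study’s own analysis):

```python
# IQ tests are normed to a standard deviation of 15 points,
# so 1 IQ point = 1/15 ~= 0.067 SD.
SD_IQ = 15

def iq_to_sd(points):
    return points / SD_IQ

# Preregistered tasks: ~1 IQ point (RAPM) and ~2.5 IQ points (BDS),
# each with a standard error of ~2 IQ points.
for label, effect_iq in [("RAPM", 1.0), ("BDS", 2.5)]:
    z = effect_iq / 2.0  # point estimate divided by its standard error
    print(f"{label}: {iq_to_sd(effect_iq):.3f} SD, z = {z:.2f}")

# A 0.1 SD effect is 1.5 IQ points -- within one standard error of both
# point estimates, which is why this is weak evidence either way.
print(f"0.1 SD = {0.1 * SD_IQ} IQ points")
```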
(See also gwern’s meta-analysis.)
I think the “alignment difficulty” premise was given higher probability by superforecasters, not lower probability.
Agree that it’s easier to talk about (change)/(time) rather than (time)/(change). As you say, (change)/(time) adds better. And agree that % growth rates are terrible for a bunch of reasons once you are talking about rates >50%.
I’d weakly advocate for “doublings per year:” (i) 1 doubling / year is more like a natural unit, that’s already a pretty high rate of growth, and it’s easier to talk about multiple doublings per year than a fraction of an OOM per year, (ii) there is a word for “doubling” and no word for “increased by an OOM,” (iii) I think the arithmetic is easier.
But people might find factors of 10 so much more intuitive than factors of 2 that OOMs/year is better. I suspect this is increasingly true as you are talking more to policy makers and less to people in ML, but might even be true in ML since people are so used to quoting big numbers in scientific notation.
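A sketch of the conversions between these units (my own illustration of the arithmetic, not anything from the original discussion):

```python
import math

def pct_to_doublings(pct_per_year):
    """Annual % growth rate -> doublings per year."""
    return math.log2(1 + pct_per_year / 100)

def doublings_to_ooms(doublings_per_year):
    """Doublings per year -> orders of magnitude per year (1 OOM ~= 3.32 doublings)."""
    return doublings_per_year / math.log2(10)

# 100% annual growth is exactly 1 doubling/year, or ~0.30 OOM/year:
print(pct_to_doublings(100))          # 1.0
print(round(doublings_to_ooms(1), 2))
# Doublings add where % rates don't: two years at 1 doubling/year is
# 2 doublings = 4x total, whereas naively adding "100% + 100%" suggests 3x.
```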
(I’d probably defend my definitional choice for slow takeoff, but that seems like a different topic.)
Yes, I’m not entirely certain Impossible meat is equivalent in taste to animal-based ground beef. However, I do find the evidence I cite in the second paragraph of this section somewhat compelling.
Are you referring to the blind taste test? It seems like that’s the only direct evidence on this question.
It doesn’t look like the preparations are necessarily analogous. At a minimum the plant burger had 6x more salt. All burgers were served with a “pinch” of salt but it’s hard to know what that means, and in any case the plant burger probably ended up at least 2x as salty.[1] You note this as a complicating factor, but salt has a huge impact on taste and it seems to me like it can easily dominate the results of a 2-3 bite taste test between vaguely comparable foods.
I also have no idea at all how good or bad the comparison burger was. Food varies a lot. (It’s kind of coincidental the salt happened to show up in the nutrition information—otherwise I wouldn’t even be able to make this concrete criticism). It seems really hard to draw conclusions about taste competitiveness of a meat substitute from this kind of n=1 study, beyond saying that you are in the same vague zone.
Have you compared these foods yourself? I eat both of them regularly. Taste competitiveness seemed plausible the first time I ate impossible ground beef, but at this point the difference feels obviously large. I seriously doubt that the typical omnivore would consider them equivalent after eating them a few times.
Overall, despite these caveats on taste, lots of plant-based meat was still sold, so it was “good enough” in some sense, but there was still potentially little resulting displacement of beef (although maybe somewhat more of chicken).
My conclusion would be: plant substitutes are good enough that some people will eat them, but bad enough that some people won’t. They are better than some foods and worse than others.
It feels like you are simultaneously arguing that high uptake is a sign that taste is “good enough,” and that low uptake is a sign that “good enough” taste isn’t sufficient to replace meat. I don’t think you can have it both ways, it’s not like there is a “good enough” threshold where sales jump up to the same level as if you had competitive taste. Better taste just continuously helps with sales.
I agree and discuss this issue some in the Taste section. In short, this is part of why I think informed taste tests would be more relevant than blind: in naturalistic settings, it is possible that people would report not liking the taste of PBM even though it passes a blind taste test. So I think this accurately reflects what we should expect in practice.
I disagree. Right now I think that plant-based meat substitutes have a reputation as tasting worse than meat largely because they actually taste worse. People also have memories of disliking previous plant-based substitutes they tried. In the past the gap was even larger and there is inertia in both of these.
If you had taste competitive substitutes, then I think their reputation and perception would likely improve over time. That might be wrong, but I don’t see any evidence here against the common-sense story.
- ^
The plant burger had about 330mg vs 66mg of salt. If a “pinch” is 200mg then it would end up exactly 2x as salty. But hard to know exactly what a pinch means, and also it matters if you cook salt into the beef or put a pinch on top, and so on.
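Spelled out (the 200mg pinch is the assumption above, so treat this as illustrative):

```python
# Salt per burger from the nutrition information, plus an assumed 200mg "pinch".
plant_salt_mg = 330
beef_salt_mg = 66
pinch_mg = 200  # assumption; the study only says a "pinch"

plant_total = plant_salt_mg + pinch_mg  # 530 mg
beef_total = beef_salt_mg + pinch_mg    # 266 mg
print(round(plant_total / beef_total, 2))  # almost exactly 2x as salty
```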
The linked LW post points out that nuclear power was cheaper in the past than it is today, and that today the cost varies considerably between different jurisdictions. Both of these seem to suggest that costs would be much lower if there was a lower regulatory burden. The post also claims that nuclear safety is extremely high, much higher than we expect in other domains and much higher than would be needed to make nuclear preferable to alternative technologies. So from that post I would be inclined to believe that overregulation is the main reason for a high cost (together with the closely related fact that we’ve stopped building nuclear plants and so don’t benefit from economies of scale).
I can definitely believe the linked post gives a misleading impression. But I think if you want to correct that impression it would be really useful to explain why it’s wrong. It would be even better to provide pointers to some evidence or analysis, but just a clear statement of disagreement would already be really helpful.
Do you think that greater adoption of nuclear power would be harmful (e.g. because the safety profile isn’t good, because it would crowd out investments in renewables, because it would contribute to nuclear proliferation, or something else)? That lowering regulatory requirements would decrease safety enough that nuclear would become worse than alternative power sources, even if it isn’t already? That regulation isn’t actually responsible for the majority of costs? A mixture of the above? Something else altogether?
My own sense is that using more nuclear would have been a huge improvement over the actual power mix we’ve ended up with, and that our failure to build nuclear was mostly a policy decision. I don’t fully understand the rationale, but it seems like the outcome was regulation that renders nuclear uncompetitive in the US, and it looks like this was a mistake driven in large part by excessive focus on safety. I don’t know much about this so I obviously wouldn’t express this opinion with confidence, and it would be great to get a link to a clear explanation of an alternative view.
I’m confused about your analysis of the field experiment. It seems like the three options are {Veggie, Impossible, Steak}. But wouldn’t Impossible be a comparison for ground beef, not for steak? Am I misunderstanding something here?
Beyond that, while I think Impossible meat is great, I don’t think it’s really equivalent on taste. I eat both beef and Impossible meat fairly often (>1x / week for both) and I would describe the taste difference as pretty significant when they are similarly prepared.
If I’m understanding you correctly then 22% of the people previously eating steak burritos switched to Impossible burritos, which seems like a really surprisingly large fraction to me.
(Even further, consumer beliefs are presumably anchored to their past experiences, to word of mouth, etc. and so even if you did have taste equivalence here I wouldn’t expect people’s decisions to be perfectly informed by that fact. If you produced a taste equivalent meat substitute tomorrow and were able to get 22% of people switching in your first deployment, that would seem like a surprisingly high success rate that’s very consistent with even a strong form of PTC, I wouldn’t expect consumers to switch immediately even if they will switch eventually. Getting those results with Impossible meat vs steak seems even more encouraging.)
I didn’t mean to imply that human-level AGI could do human-level physical labor with existing robotics technology; I was using “powerful” to refer to a higher level of competence. I was using “intermediate levels” to refer to human-level AGI, and assuming it would need cheap human-like bodies.
Though mostly this seems like a digression. As you mention elsewhere, the bigger crux is that it seems to me like automating R&D would radically shorten timelines to AGI and be amongst the most important considerations in forecasting AGI.
(For this reason I don’t often think about AGI timelines, especially not for this relatively extreme definition. Instead I think about transformative AI, or AI that is as economically impactful as a simulated human for $X, or something along those lines.)
My point in asking “Are you assigning probabilities to a war making AGI impossible?” was to emphasize that I don’t understand what 70% is a probability of, or why you are multiplying these numbers. I’m sorry if the rhetorical question caused confusion.
My current understanding is that 0.7 is basically just the ratio (Probability of AGI before thinking explicitly about the prospect of war) / (Probability of AGI after thinking explicitly about prospect of war). This isn’t really a separate event from the others in the list, it’s just a consideration that lengthens timelines. It feels like it would also make sense to list other considerations that tend to shorten timelines.
(I do think disruptions and weird events tend to make technological progress slower rather than faster, though I also think they tend to pull tiny probabilities up by adding uncertainty.)
That’s fair, this was some inference that is probably not justified.
To spell it out: you think brains are as effective as 1e20-1e21 flops. I claimed that humans use more than 1% of their brain when driving (e.g. our visual system is large and this seems like a typical task that engages the whole utility of the visual system during the high-stakes situations that dominate performance), but you didn’t say this. I concluded (but you certainly didn’t say) that a human-level algorithm for driving would not have much chance of succeeding using 1e14 flops.
Incidentally, I’m puzzled by your comment and others that suggest we might already have algorithms for AGI in 2023. Perhaps we’re making different implicit assumptions of realistic compute vs infinite compute, or something else. To me, it feels clear we don’t have the algorithms and data for AGI at present
I would guess that more or less anything done by current ML can be done by ML from 2013 but with much more compute and fiddling. So it’s not at all clear to me whether existing algorithms are sufficient for AGI given enough compute, just as it wasn’t clear in 2013. I don’t have any idea what makes this clear to you.
Given that I feel like compute and algorithms mostly trade off, hopefully it’s clear why I’m confused about what the 60% represents. But I’m happy for it to mean something like: it makes sense at all to compare AI performance vs brain performance, and expect them to be able to solve a similar range of tasks within 5-10 orders of magnitude of the same amount of compute.
But as we discuss in the essay, 20 years is not a long time, much easier problems are taking longer, and there’s a long track record of AI scientists being overconfident about the pace of progress (counterbalanced, to be sure, by folks on the other side who are overconfident about things that would not be achieved and subsequently were).
If 60% is your estimate for “possible with any amount of compute,” I don’t know why you think that anything is taking a long time. We just don’t get to observe how easy problems are if you have plenty of compute, and it seems increasingly clear that weak performance is often explained by limited compute. In fact, even if 60% is your estimate for “doable with similar compute to the brain,” I don’t see why you are updating from our failure to do tasks with orders of magnitude less compute than a brain (even before considering that you think individual neurons are incredibly potent).
Section 2: Likelihood of fast reinforcement training
I still don’t fully understand the claims being made in this section. I guess you are saying that there’s a significant chance that the serial time requirements will be large and that will lead to a large delay? Like maybe you’re saying something like: a 20% chance that it will add >20 years of delay, a 30% chance of 10-20 years of delay, a 40% chance of 1-10 years of delay, a 10% chance of <1 year of delay?
In addition to not fully understanding the view, I don’t fully understand the discussion in this section or why it justifies this probability. It seems like if you had human-level learning (as we are conditioning on from sections 1+3) then things would probably work in <2 years unless parallelization is surprisingly inefficient. And even setting aside the comparison to humans, such large serial bottlenecks aren’t really consistent with any evidence to date. And setting aside any concrete details, you are already assuming we have truly excellent algorithms, so there are lots of ways people could succeed. So I don’t buy the number, but that may just be a disagreement.
You seem to be leaning heavily on the analogy to self-driving cars but I don’t find that persuasive—you’ve already postulated multiple reasons why you shouldn’t expect them to have worked so far. Moreover, the difficulties there also just don’t seem very similar to the kind of delay from serial time you are positing here, they seem much more closely related to “man we don’t have algorithms that learn anything like humans.”
Section 3: Operating costs
I think I’ve somehow misunderstood this section.
It looks to me like you are trying to estimate the difficulty of automating tasks by comparing to the size of brains of animals that perform the task (and in particular human brains). And you are saying that you expect it to take about 1e7 flops for each synapse in a human brain, and then define a probability distribution around there. Am I misunderstanding what’s going on here or is that a fair summary?
(I think my comment about GPT-3 = small brain isn’t fair, but the reverse direction seems fair: “takes a giant human brain to do human-level vision” --> “takes 7 orders of magnitude larger model to do vision.” If that isn’t valid, then why is “takes a giant human brain to do job X” --> “takes 7 orders of magnitude larger model to automate job X” valid? Is it because you are considering the worst-case profession?)
Your biological analysis seems to hinge on the assertion that precise simulation of neurons is necessary to get similar levels of computational utility
We do not believe this.
I don’t think I understand where your estimates come from, unless we are just disagreeing about the word “precise.” You cite the computational cost of learning a fairly precise model of a neuron’s behavior as an estimate for the complexity per neuron. You also talk about some low level dynamics without trying to explain why they may be computationally relevant. And then you give pretty confident estimates for the useful computation done in a brain. Could you fill in the missing steps in that estimate a bit more, both for the mean (of 1e6 per neuron*spike) and for the standard deviation of the log (which seems to be about ~1 oom)?
build computers that operate at the Landauer limit (as you are apparently confident the brain does)
I think I misunderstood your claims somehow.
I think you are claiming that the brain does 1e20-1e21 flops of useful computation. I don’t know exactly how you are comparing between brains and floating point operations. A floating point operation is more like 1e5 bit erasures today and is necessarily at least 16 bit erasures at fp16 (and your estimates don’t allow for large precision reductions e.g. to 1 bit arithmetic). Let’s call it 1.6e21 bit erasures per second, I think quite conservatively?
I might be totally wrong about the Landauer limit, but I made this statement by looking at Wikipedia which claims 3e-21 J per bit erasure at room temperature. So if you multiply that by 1.6e21 bit erasures per second, isn’t that 5 W, nearly half the power consumption of the brain?
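Spelling out that multiplication (the Landauer figure is the Wikipedia number quoted above; the fp16 erasure count is the conservative assumption above):

```python
# Implied power draw if the brain did 1e20 flop/s at the Landauer limit.
landauer_j_per_bit = 3e-21  # J per bit erasure at room temperature (Wikipedia figure)
flops = 1e20                # lower end of the brain estimate under discussion
erasures_per_flop = 16      # at least 16 bit erasures per fp16 operation (assumption above)

power_watts = flops * erasures_per_flop * landauer_j_per_bit
print(power_watts)  # ~4.8 W, a large fraction of the brain's total power budget
```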
Is there a mistake somewhere in there? Am I somehow thinking about this differently from you?
To multiply these probabilities together, one cannot multiply their unconditional expectations; rather, one must multiply their cascading conditional probabilities. You may disagree with our probabilities, but our framework specifically addresses this point. Our unconditional probabilities are far lower for some of these events, because we believe they will be rapidly accelerated conditional on progress in AGI.
I understand this, but the same objection applies for normal distributions being more than 0. Talking about conditional probabilities doesn’t help.
Are you saying that e.g. a war between China and Taiwan makes it impossible to build AGI? Or that serial time requirements make AGI impossible? Or that scaling chips means AGI is impossible? It seems like each of these just makes it harder. These are factors you should be adding up. Some things can go wrong and you can still get AGI by 2043. If you want to argue you can’t build AGI if something goes wrong, that’s a whole different story. So multiplying probabilities (even conditional probabilities) for none of these things happening doesn’t seem right.
Lastly, can I kindly ask what your cascading conditional probabilities would be in our framework? (Let’s hold the framework constant for this question, even if you disagree with it.)
I don’t know what the events in your decomposition refer to well enough to assign them probabilities:
I still don’t know what “algorithms for AGI” means. I think you are somehow ignoring compute costs, but if so I don’t know on what basis you are making any kind of generalization from our experience with the difficulty of designing extremely fast algorithms. In most domains algorithmic issues are ~the whole game and that seems true in AI as well.
I don’t really know what “invent a way for AGI to learn faster than humans” means, as distinct from the estimates in the next section about the cost of AGI algorithms. Again, are you trying to somehow abstract out the compute costs of learning here? Then my probabilities are very high but uninteresting.
Taken on its own, it seems like the third probability (“AGI inference costs drop below $25/hr (per human equivalent)”) implies the conclusion. So I assume you are doing something where you say “Ignoring increases in demand and the possibility of supply chain disruptions and...” or something like that? So the forecast you are making about compute prices aren’t unconditional forecasts?
I don’t know what level of cheap, quality robots you refer to. The quality of robotics needed to achieve transformative AI depends completely on the quality of your AI: for powerful AI it can be done with existing robot bodies; for weak AI it would need wildly superhuman bodies; at intermediate levels it can be done if humanoid robots cost millions of dollars each. And conversely the previous points aren’t really defined unless you specify something about the robotic platform. I assume you address this in the section but I think it’s going to be hard to define enough that I can give a number.
I don’t know what massively scaling chips mean—again, it seems like this just depends crucially on how good your algorithms are. It feels more like you should be estimating multiple numbers and then seeing the probability that the product is large enough to be impactful.
I don’t know what “avoid derailment” means. It seems like these are just factors that affect the earlier estimates, so I guess the earlier quantities were supposed to be something like “the probability of developing AGI given that nothing weird happens in the world”? Or something? But weird stuff is guaranteed to be happening in the world. I feel like this is the same deal as above, you should be multiplying out factors.
From your comment, I think the biggest crux between us is the rate of AI self-improvement. If the rate is lower, the world may look like what we’re envisioning. If the rate is higher, progress may take off in a way not well predicted by current trends, and the world may look more like what you’re envisioning. This causes our conditional probabilities to look too low and too independent, from your point of view. Do you think that’s a fair assessment?
I think this seems right.
In particular, it seems like some of your estimates make more sense to me if I read them as saying “Well there will likely exist some task that AI systems can’t do.” But I think such claims aren’t very relevant for transformative AI, which would in turn lead to AGI.
By the same token, if the AIs were looking at humans they might say “Well there will exist some tasks that humans can’t do” and of course they’d be right, but the relevant thing is the single non-cherry-picked variable of overall economic impact. The AIs would be wrong to conclude that humans have slow economic growth because we can’t do some tasks that AIs are great at, and the humans would be wrong to conclude that AIs will have slow economic growth because they can’t do some tasks we are great at. The exact comparison is only relevant for assessing things like complementarity, which make large impacts happen strictly more quickly than they would otherwise.
(This might be related to me disliking AGI though, and then it’s kind of on OpenPhil for asking about it. They could also have asked about timelines to 100000x electricity production and I’d be making broadly the same arguments, so in some sense it must be me who is missing the point.)
I do think it reflects a decent mental model of how the world works, which leads to decent calibration for what’s 3% likely vs 30% likely. The main reason I mention it in the paper is just to help folks realize that we’re not wackos predicting 1% because we “really feel” confident. In many other situations (e.g., election forecasting, sports betting, etc.) I often find myself on the humble and uncertain side of the fence, trying to warn people that the world is more complicated and unpredictable than their gut is telling them.
That makes sense, and I’m ready to believe you have more calibrated judgments on average than I do. I’m also in the business of predicting a lot of things, but not as many and not with nearly as much tracking and accountability. That seems relevant to the question at hand, but still leaves me feeling very intuitively skeptical about this kind of decomposition.
You start off saying that existing algorithms are not good enough to yield AGI (and you point to the hardness of self-driving cars as evidence) and fairly likely won’t be good enough for 20 years. And also you claim that existing levels of compute would be a way too low to learn to drive even if we had human-level algorithms. Doesn’t each of those factors on its own explain the difficulty of self-driving? How are you also using the difficulty of self-driving to independently argue for a third conjunctive source of difficulty?
Maybe another related question: can you make a forecast about human-level self-driving (e.g. similar accident rates vs speed tradeoffs to a tourist driving in a random US city) and explain its correlation with your forecast about human-level AI overall? If you think full self-driving is reasonably likely in the next 10 years, that superficially appears to undermine the way you are using it as evidence for very unlikely AGI in 20 years. Conversely, if you think self-driving is very unlikely in the next 10 years, then it would be easier for people to update their overall views about your forecasts after observing (or failing to observe) full self-driving.
I think there is significantly more than a 50% chance that there will be human-level self-driving cars, in that sense, within 10 years. Maybe my chance is 80% though I haven’t thought about it hard. (Note that I already lost one bet about self-driving cars: in 2017 my median for # of US cities where a member of the public could hail a self-driving taxi in mid-2023 was 10-20, whereas reality turned out to be 0-1 depending on details of the operationalization in Phoenix. But I’ve won and lost 50-50 bets about technology in both the too-optimistic and too-pessimistic directions, and I’d be happy to bet about self-driving again.)
(Note that I also think this is reasonably likely to be preempted by explosive technological change driven by AI, which highlights an important point of disagreement with your estimate, but here I’m willing to try to isolate the disagreement about the difficulty of full self-driving.)
ETA: let me try to make the point about self-driving cars more sharply. You seem to think there’s a <15% chance that by 2043 we can do what a human brain can do even using 1e17 flops (a 60% chance of “having the algorithms” and a 20% chance of being 3 OOMs better than 1e20 flops). Driving uses quite a lot of the functions that human brains are well-adapted to perform—perception, prediction, planning, control. If we call it one tenth of a brain, that’s 1e16 flops. Whereas I think existing self-driving cars use closer to 1e14 flops. So shouldn’t you be shocked if self-driving cars could be made to work with so little computing hardware, no matter how much data they use? How can you be making meaningful updates from the fact that they don’t?
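To spell out the arithmetic in this consistency check, here is a quick sketch using only the numbers as stated above (the one-tenth-of-a-brain fraction and the 1e14 flops figure for deployed self-driving stacks are rough estimates from this comment, not figures from the report):

```python
# Back-of-envelope check on the self-driving consistency argument.
# All numbers are rough estimates taken from the surrounding discussion.

p_algorithms = 0.60      # report: chance of "having the algorithms" by 2043
p_3_oom_better = 0.20    # report: chance of being 3 OOMs better than 1e20 flops
p_brain_at_1e17 = p_algorithms * p_3_oom_better
print(p_brain_at_1e17)   # 0.12, i.e. the <15% implied by the report

brain_flops = 1e17           # report's implied human-brain flops requirement
driving_fraction = 0.1       # driving as ~1/10 of a brain (my rough assumption)
driving_flops_needed = brain_flops * driving_fraction   # 1e16

deployed_self_driving_flops = 1e14   # my rough estimate for current stacks
shortfall = driving_flops_needed / deployed_self_driving_flops
print(shortfall)             # ~100x less compute than the report implies is needed
```

On these numbers, working self-driving at 1e14 flops would already be a ~100x efficiency win over the report’s implied requirement.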
I don’t think I understand the structure of this estimate, or else I might understand and just be skeptical of it. Here are some quick questions and points of skepticism.
Starting from the top, you say:
We estimate optimistically that there is a 60% chance that all the fundamental algorithmic improvements needed for AGI will be developed on a suitable timeline.
This section appears to be an estimate of all-things-considered feasibility of transformative AI, and draws extensively on evidence about how lots of things go wrong in practice when implementing complicated projects. But then in subsequent sections you talk about how even if we “succeed” at this step there is still a significant probability of failing because the algorithms don’t work in a realistic amount of time.
Can you say what exactly you are assigning a 60% probability to, and why it’s getting multiplied with ten other factors? Are you saying that there is a 40% chance that by 2043 AI algorithms couldn’t yield AGI no matter how much serial time and compute they had available? (It seems surprising to claim that even by 2023!) Presumably not that, but what exactly are you giving a 60% chance?
(ETA: after reading later sections more carefully I think you might be saying 60% chance that our software is about as good as nature’s, and maybe implicitly assuming there is a ~0% chance of being significantly better than that or building TAI without that? I’m not sure if that’s right though, if so it’s a huge point of methodological disagreement. I’ll return to this point later.)
In section 2 you say:
Transformative AGI by 2043 depends critically on the development of non-sequential reinforcement learning training methods with no real human analogue.
And give this a 40% probability. I don’t think I understand this claim or its justification. (This is related to my uncertainty about what your “60%” in the last section was referring to.)
It seems to me that if you had human-like learning you would be able to produce transformative AGI by 2043:
In fact it looks like human-like learning would enable AI to learn human-level physical skills:
10 years is sufficient for humans to learn most physical skills from scratch, and you are talking about 20 year timelines. So why is the serial time for learning even a candidate blocker?
Humans learn new physical skills (including e.g. operating unfamiliar machinery) within tens of hours. This requires transfer from other things humans have learned, but those tasks are not always closely related (e.g. I learn to drive a car based on experience walking) and AI systems will have access to transfer from tasks that seem if anything more similar (e.g. prediction of the relevant physical environments, predictions of expert behavior in similar domains, closed-loop behavior in a wide range of simulated environments, closed-loop behavior on physical tasks with shorter timescales, behavior in virtual environments...).
We can easily run tens of thousands of copies of AI systems in parallel. Existing RL is massively parallelizable. Human evolution gives no evidence about the difficulty of parallelizing learning in this way. Based on observations of human learning it seems extremely likely to me that parallelization 10,000 fold can reduce serial time by at least 10x (which is all that is needed). Extrapolations of existing RL algorithms seem to suggest serial requirements more like 10,000 episodes, with almost all of the compute used to run a massive number of episodes in parallel, which would be 1 year even for a 1-hour task. It seems hard to construct physical tasks that don’t provide rich feedback after even shorter horizons than 1 hour (and therefore suitable for a gradient descent step given enough parallel samples) so this seems pretty conservative.
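The serial-time arithmetic in that last step can be made explicit. Assuming ~10,000 serial episodes (the rough extrapolation from existing RL mentioned above, with compute spent on massively parallel episodes within each step) and 1-hour episodes:

```python
# Serial wall-clock time under massively parallel RL, per the estimate above.
# 10,000 serial episodes is a rough extrapolation, not a measured figure.

serial_episodes = 10_000   # serial gradient steps; parallelism handles the rest
episode_hours = 1.0        # a deliberately long per-episode horizon

total_hours = serial_episodes * episode_hours
years = total_hours / (24 * 365)
print(round(years, 2))     # → 1.14, i.e. about a year of wall-clock time
```

So even with a conservative 1-hour feedback horizon, the serial bottleneck comes out to roughly a year, not decades.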
Regardless of learning physical tasks, humans are able to learn to do R&D after 20 years of experience. AI systems operate at 10x speed and most environments relevant to hardware and software R&D can be sped up by at least 10x. So it seems like AI systems could be human-level at a wide range of tasks, sufficient to accelerate further AI progress, even if they just used non-parallelized human learning over 2 years. If you really thought physical tasks were somehow impossibly difficult (which I don’t think is justified) then this becomes the dominant path to AGI. This is particularly important because multiple of your later points also seem to rest on the distinctive difficulty of automating physical tasks, which should just shift your probability further and further to an explosion of automated R&D which drives automation of physical labor.
I think you are disagreeing with these claims, but I’m not sure about that. For example, you mention parallelizable learning but seem to give it <10% probability despite the fact that it is the overwhelmingly dominant paradigm in current practice and you don’t say anything about why it might not work.
(This isn’t super relevant to my mainline view, since in fact I think AI is much worse at learning quickly than humans and will likely be transformative way before reaching parity. This is related to the general point about being unnecessarily conjunctive, but here I’m just trying to understand and express disagreement with the particular path you lay out and the probabilities you assign.)
In section 3 you say:
Software and hardware efficiencies combine to surpass current computation cost efficiency, and/or the efficiency of the human brain, by at least five orders of magnitude.
I think you claim that each synapse firing event requires about 1-10 million floating point operations (with some error bars), and that there is only a 16% chance that computers will be able to do enough compute for $25/hour.
This is probably the part of the report I am most skeptical of:
How do you square this with our experience in AI so far? Overall you seem to think it is possible that AI will be as effective as brains but unlikely to be much better. But if a biological neuron is probably ten million times more efficient than an artificial neuron, then aren’t we already much better than biology in tons of domains? Is there any task for which performance can be quantified and where you think this estimate provides a sane guideline to the inference-time compute required to solve the task? Shouldn’t you be putting significant probability on our algorithms being radically better than biology in many important ways?
Replicating the human visual cortex should take millions of times more compute than we have ever used, yet we can match human performance on a range of quantifiable perceptual tasks and are making rapid progress, and I’m actually not aware of tasks where it’s even plausible that we are 6 orders of magnitude away.
Learned policies for robotic control using only hundreds of thousands of neurons already seem to reach comparable competence to insects, whereas by your estimate they should be significantly worse than a nematode. Aren’t you surprised to observe successful grasping and walking?
Traditional control systems like those used by Boston Dynamics seem to produce more competent motor control than small animals despite using amounts of compute close to 1 flop per synapse. You focus on ML, but I don’t know why—isn’t classical control a more reasonable point of comparison to small animals that have algorithms designed directly by evolution rather than learned in a lifetime, and doesn’t your argument very strongly predict that it should be impossible?
Qualitatively it’s hard to compare GPT-3 to humans, but just to be clear you are saying that it should behave like a brain with ~1000 neurons. This is at least surprising (e.g. I think would have led to big misses if it had been used to make any qualitative predictions), and to me casts doubt on a story where you can’t get transformative AI using less than the analog of a hundred billion neurons.
Your biological analysis seems to hinge on the assertion that precise simulation of neurons is necessary to get similar levels of computational utility (and even from there the analysis is pretty conservative, e.g. by assuming that you need to perform a very expensive computation thousands of times a second). I don’t personally consider this plausible, and I think the main argument given for it is “if not, why would we have all these proteins?”, which I don’t find persuasive (since synapses are under a huge number of important constraints and serve many important functions beyond implementing computationally complex functions at inference time). I’ve seen zero candidates for useful purposes for such an incredible amount of local computation with negligible quantities of long-distance communication, and there are very few examples of human-designed computations structured in this way / it seems to involve an extremely implausible model of what neurons are doing (apparently some nearly-embarrassingly parallelizable task with work concentrated in individual neurons?). I don’t really want to argue with this at length, but I want to flag that you are very confident about it and it drives a large part of your estimate, whereas something like 50-50 seems more appropriate even before updating on the empirical success of ML.
In general you are making the case unnecessarily conjunctive—you are asking how likely it is that we will find algorithms as good as the brain, and then also build computers that operate at the Landauer limit (as you are apparently confident the brain does), and then also deploy AI in a way that is competitive at a $25/hour price point, and so on. But in fact any one of these areas can outperform your benchmark, compensating for shortfalls in the others (and if you are right in this section, then it’s definitely the case that we are radically more efficient than biology on many tasks already!), and it seems like you are dropping a lot of probability by ignoring that possibility. It’s like asking about the probability that a sum of 5 normal distributions will be above its mean, and estimating it’s 1/2^5 because each of the 5 normal distributions needs to be above its own mean.
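The normal-distribution analogy is easy to verify numerically: a shortfall in one term can be offset by an overshoot in another, so the true probability is 1/2, not 1/32. A quick Monte Carlo sketch:

```python
# Monte Carlo check: probability that a sum of 5 independent standard
# normals exceeds its mean (0). The conjunctive estimate would say
# (1/2)^5 ~= 0.031; the correct answer is 1/2, since the sum is itself
# a normal distribution centered at its mean.
import random

random.seed(0)
samples = 100_000
hits = sum(
    1 for _ in range(samples)
    if sum(random.gauss(0.0, 1.0) for _ in range(5)) > 0.0
)
p_sum_above_mean = hits / samples
print(p_sum_above_mean)   # ≈ 0.5, far from the conjunctive 0.031
```

The same structure applies to the report’s decomposition: multiplying per-factor probabilities is only valid if every factor must independently clear its bar, with no possibility of one factor overshooting.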
(ETA: this criticism of section 3 is unfair: you do discuss the prospect of much better than human performance in the 2-page section “On the computational intensity of AGI,” and indeed this plays a completely central role in your bottom line estimate. But I’m still left wondering what the earlier 60% and 40% (and all the other numbers!) are supposed to represent, given that you are apparently putting all the work of “maybe humans will design efficient algorithms that are as good as the brain” in this section. You also don’t really discuss existing experience, where your estimates already appear to be many orders of magnitude off in domains where it is easiest to make comparisons between biology and ML (like vision or classical control) and where I don’t see how to argue we aren’t already 1000x better than biology using your 10 million flops per synapse number. Aside from me disagreeing with your mean, you describe these as conservative error bars since they put 20% probability on 1000x improvements over biology, but I think that’s really not the case given that it includes uncertainty about the useful compute done by the brain (where you already disagree by >>3 OOMs with plausible estimates) as well as algorithmic progress (where 1000x improvements over 20 years seem common both within software and ML).)
I’ll stop here rather than going on to sections 4+, though I think I have a lot to object to along similar lines (primarily that the story is being made unreasonably conjunctive).
Overall your estimation strategy looks crazy to me, and I’m skeptical of the implicit claim that this kind of methodology would perform well in historical examples. That said, if this sort of methodology actually does work well in practice then I think that trumps some a priori speculation and would be an important thing for me to really absorb. Your personal forecasting successes seem like a big part of the evidence for that, so it might be helpful to understand what kinds of predictions were involved and how methodologically analogous they are. Superficially it looks like the SciCast technology forecasting tournament is by far the most relevant; is there a pointer to the list of questions (other info like participants and list of predictions would also be awesome if available)? Or do you think one of the other items is more relevant?
There was a related GiveWell post from 12 years ago, including a similar example where higher “unbiased” estimates correspond to lower posterior expectations.
That post is mostly focused on practical issues about being a human, and much less amusing, but it speaks directly to your question #2.
(Of course, I’m most interested in question #3!)
I agree. When I give numbers I usually say “We should keep the risk of AI takeover beneath 1%” (though I haven’t thought about it very much and mostly the numbers seem less important than the qualitative standard of evidence).
I think that 10% is obviously too high. I think that a society making reasonable tradeoffs could end up with 1% risk, but that it’s not something a government should allow AI developers to do without broader public input (and I suspect that our society would not choose to take this level of risk).
Yeah, the sentence cut off. I was saying: obviously a 10% risk is socially unacceptable. Trying to convince someone it’s not in their interest is not the right approach, because doing so requires you to argue that P(doom) is much greater than 10% (at least with some audiences who care a lot about winning a race). Whereas trying to convince policy makers and the public that they shouldn’t tolerate the risk requires meeting a radically lower bar, probably even 1% is good enough.
I mean that 90% or 99% seem like clearly reasonable asks, and 100% is a clearly unreasonable ask.
I’m just saying that the argument “this is a suicide race” is really not the way we should go. We should say the risk is >10% and that’s obviously unacceptable, because that’s an argument we can actually win.
ARC returned this money to the FTX bankruptcy estate in November 2023.