Former AI safety research engineer, now AI governance researcher at OpenAI. Blog: thinkingcomplete.blogspot.com
richard_ngo
@Linch, see the article I linked above, which identifies a bunch of specific bottlenecks where lobbying and/or targeted funding could have been really useful. I didn’t know about these when I wrote my comment above, but I claim prediction points for having a high-level heuristic that led to the right conclusion anyway.
The article I linked above has changed my mind back again. Apparently the RTS,S vaccine has been in clinical trials since 1997. So the failure here wasn’t just an abstract lack of belief in technology: the technology literally already existed the whole time that the EA movement (or anyone who’s been in this space for less than two decades) has been thinking about it.
An article on why we didn’t get a vaccine sooner: https://worksinprogress.co/issue/why-we-didnt-get-a-malaria-vaccine-sooner
This seems like significant evidence for the tractability of speeding things up. E.g. a single (unjustified) decision by the WHO in 2015 delayed the vaccine by almost a decade, four years of which were spent in fundraising. It seems very plausible that even 2015 EA could have sped things up by multiple years in expectation either lobbying against the original decision, or funding the follow-up trial.
Great comment, thank you :) This changed my mind.
This is a good point. The two other examples which seem salient to me:
Deutsch’s brand of techno-optimism (which comes through particularly clearly when he tries to reason about the future of AI by saying things like “AIs will be people, therefore...”).
Yudkowsky on misalignment.
Ah, I see. I think the two arguments I’d give here:
Founding 1DaySooner for malaria 5-10 years earlier is high-EV and plausibly very cheap; and there are probably another half-dozen things in this reference class.
We’d need to know much more about the specific interventions in that reference class to confidently judge that we made a mistake. But IMO if everyone in 2015-EA had explicitly agreed “vaccines will plausibly dramatically slash malaria rates within 10 years” then I do think we’d have done much more work to evaluate that reference class. Not having done that work can be an ex-ante mistake even if it turns out it wasn’t an ex-post mistake.
Hmm, your comment doesn’t really resonate with me. I don’t think it’s really about being monomaniacal. I think the (in hindsight) correct thought process here would be something like:
”Over the next 20 or 50 years, it’s very likely that the biggest lever in the space of malaria will be some kind of technological breakthrough. Therefore we should prioritize investigating the hypothesis that there’s some way of speeding up this biggest lever.”I don’t think you need this “move heaven and earth” philosophy to do that reasoning; I don’t think you need to focus on EA growth much more than we did. The mental step could be as simple as “Huh, bednets seem kinda incremental. Is there anything that’s much more ambitious?”
(To be clear I think this is a really hard mental step, but one that I would expect from an explicitly highly-scope-sensitive movement like EA.)
Makes sense, though I think that global development was enough of a focus of early EA that this type of reasoning should have been done anyway.
I’m more sympathetic about it not being done after, say, 2017.
A different BOTEC: 500k deaths per year, at $5000 per death prevented by bednets, we’d have to get a year of vaccine speedup for $2.5 billion to match bednets.
I agree that $2.5 billion to speed up development of vaccines by a year is tricky. But I expect that $2.5 billion, or $250 million, or perhaps even $25 million to speed up deployment of vaccines by a year is pretty plausible. I don’t know the details but apparently a vaccine was approved in 2021 that will only be rolled out widely in a few months, and another vaccine will be delayed until mid-2024: https://marginalrevolution.com/marginalrevolution/2023/10/what-is-an-emergency-the-case-of-rapid-malaria-vaccination.html
So I think it’s less a question of whether EA could have piled more money on and more a question of whether EA could have used that money + our talent advantage to target key bottlenecks.
(Plus the possibility of getting gene drives done much earlier, but I don’t know how to estimate that.)
That’s very useful info, ty. Though I don’t think it substantively changes my conclusion because:
Government funding tends to go towards more legible projects (like R&D). I expect that there are a bunch of useful things in this space where there are more funding gaps (e.g. lobbying for rapid vaccine rollouts).
EA has sizeable funding, but an even greater advantage in directing talent, which I think would have been our main source of impact.
There were probably a bunch of other possible technological approaches to addressing malaria that were more speculative and less well-funded than mRNA vaccines. Ex ante, it was probably a failure not to push harder towards them, rather than focusing on less scalable approaches which could never realistically have solved the full problem.
To be clear, I think it’s very commendable that OpenPhil has been funding gene drive work for a long time. I’m sad about the gap between “OpenPhil sends a few grants in that direction” and “this is a central example of what the EA community focuses on” (as bednets have been); but that shouldn’t diminish the fact that even the former is a great thing to have happen.
It currently seems likely to me that we’re going to look back on the EA promotion of bednets as a major distraction from focusing on scientific and technological work against malaria, such as malaria vaccines and gene drives.
I don’t know very much about the details of either. But it seems important to highlight how even very thoughtful people trying very hard to address a serious problem still almost always dramatically underrate the scale of technological progress.
I feel somewhat mournful about our failure on this front; and concerned about whether the same is happening in other areas, like animal welfare, climate change, and AI risk. (I may also be missing a bunch of context on what actually happened, though—please fill me in if so.)
More precisely, the cascade is:
- Probability of us developing TAGI, assuming no derailments
- Probability of us being derailed, conditional on otherwise being on track to develop TAGI without derailmentGot it. As mentioned I disagree with your 0.7 war derailment. Upon further thought I don’t necessarily disagree with your 0.7 “regulation derailment”, but I think that in most cases where I’m talking to people about AI risk, I’d want to factor this out (because I typically want to make claims like “here’s what happens if we don’t do something about it”).
Anyway, the “derailment” part isn’t really the key disagreement here. The key disagreement is methodological. Here’s one concrete alternative methodology which I think is better: a more symmetric model which involves three estimates:
Probability of us developing TAGI, assuming that nothing extreme happens
Probability of us being derailed, conditional on otherwise being on track to develop TAGI
Probability of us being rerailed, conditional on otherwise not being on track to develop TAGI
By “rerailed” here I mean roughly “something as extreme as a derailment happens, but in a way which pushes us over the threshold to be on track towards TAGI by 2043″. Some possibilities include:
An international race towards AGI, akin to the space race or race towards nukes
A superintelligent but expensive AGI turns out to good enough at science to provide us with key breakthroughs
Massive economic growth superheats investment into TAGI
Suppose we put 5% credence on each of these “rerailing” us. Then our new calculation (using your numbers) would be:
The chance of being on track assuming that nothing extreme happens: 0.6*0.4*0.16*0.6*0.46 = 1%
P(no derailment conditional on being on track) = 0.7*0.9*0.7*0.9*0.95 = 38%
P(rerailment conditional on not being on track) = 1 − 0.95*0.95*0.95 = 14%
P(TAGI by 2043) = 0.99*0.14 + 0.01*0.38 = 14.2%
That’s over 30x higher than your original estimate, and totally changes your conclusions! So presumably you must think either that there’s something wrong with the structure I’ve used here, or else that 5% is way too high for each of those three rerailments. But I’ve tried to make the rerailments as analogous to the derailments as possible. For example, if you think a depression could derail us, then it seems pretty plausible that the opposite of a depression could rerail us using approximately the same mechanisms.
You might say “look, the chance of being on track to hit all of events 1-5 by 2043 is really low. This means that in worlds where we’re on track, we’re probably barely on track; whereas in worlds where we’re not on track, we’re often missing it by decades. This makes derailment much easier than rerailment.” Which… yeah, conditional on your numbers for events 1-5, this seems true. But the low likelihood of being on track also means that even very low rerailment probabilities could change your final estimate dramatically—e.g. even 1% for each of the rerailments above would increase your headline estimate by almost an order of magnitude. And I do think that many people would interpret a headline claim of “<1%” as pretty different from “around 3%”.
Having said that, speaking for myself, I don’t care very much about <1% vs 3%; I care about 3% vs 30% vs 60%. The difference between those is going to primarily depend on events 1-5, not on derailments or rerailments. I have been trying to avoid getting into the weeds on that, since everyone else has been doing so already. So I’ll just say the following: to me, events 1-5 all look pretty closely related. “Way better algorithms” and “far more rapid learning” and “cheaper inference” and “better robotic control” all seem in some sense to be different facets of a single underlying trend; and chip + power production will both contribute to that trend and also be boosted by that trend. And so, because of this, it seems likely to me that there are alternative factorizations which are less disjoint and therefore get very different results. I think this was what Paul was getting at, but that discussion didn’t seem super productive, so if I wanted to engage more with it a better approach might be to just come up with my own alternative factorization and then argue about whether it’s better or worse than yours. But this comment is already too long so will leave it be for now.
If events 1-5 constitute TAGI, and events 6-10 are conditional on AGI, and TAGI is very different from AGI, then you can’t straightforwardly get an overall estimate by multiplying them together. E.g. as I discuss above, 0.3 seems like a reasonable estimate of P(derailment from wars) if the chip supply remains concentrated in Taiwan, but doesn’t seem reasonable if the supply of chips is on track to be “massively scaled up”.
One exchange that makes me feel particularly worried about Scenario 2 is this one here, which focuses on the concern that there’s:
No rigorous basis for that the use of mechanistic interpretability would “open up possibilities” to long-term safety. And plenty of possibilities for corporate marketers – to chime in on mechint’s hypothetical big breakthroughs. In practice, we may help AI labs again – accidentally – to safety-wash their AI products.
I would like to point to this as a central example of the type of thing I’m worried about in scenario 2: the sort of doom spiral where people end up actively opposed to the most productive lines of research we have, because they’re conceiving of the problem as being arbitrarily hard. This feels very reminiscent of the environmentalists who oppose carbon capture or nuclear energy because it might make people feel better without solving the “real problem”.
It looks like, on net, people disagree with my take in the original post. So I’d like to ask the people who disagree: do you have reasons to think that the sort of position I’ve quoted here won’t become much more common as AI safety becomes much more activism-focused? Or do you think it would be good if it did?
Unless we know that alignment is going to be easier, pushing forward on capabilities without an outsized alignment benefit seems needlessly risky.
I am not disputing this :) I am just disputing the factual claim that we know which is easier.
I’d say “alignment is harder than capabilities” seems almost certainly true
Are you making the claim that we’re almost certainly not in a world where alignment is easy? (E.g. only requires something like Debate/IA and maybe some rudimentary interpretability techniques.) I don’t see how you could know that.
Yepp, that seems right. I do think this is a risk, but I also think it’s often overplayed in EA spaces. E.g. I’ve recently heard a bunch of people talking about the capability infohazards that might arise from interpretability research. To me, it seems pretty unlikely that this concern should prevent people from doing or sharing interpretability research.
What’s the disagreement here? One part of it is just that some people are much more pessimistic about alignment research than I am. But it’s not actually clear that this by itself should make a difference, because even if they’re pessimistic they should “play to their outs”, and “interpretability becomes much better” seems like one of the main ways that pessimists could be wrong.
The main case I see for being so concerned about capability infohazards as to stop interpretability research is if you’re pessimistic about alignment but optimistic about governance. But I think that governance will still rely on e.g. a deep understanding of the systems involved. I’m pretty skeptical about strategies which only work if everything is shut down (and Scenario 2 is one attempt to gesture at why).
I put little weight on this analysis because it seems like a central example of the multiple stage fallacy. But it does seem worth trying to identify clear example of the authors not accounting properly for conditionals. So here are three concrete criticisms (though note that these are based on skimming rather than close-reading the PDF):
A lot of the authors’ analysis about the probability of war derailment is focused on Taiwan, which is currently a crucial pivot point. But conditional on chip production scaling up massively, Taiwan would likely be far less important.
If there is extensive regulation of AI, it will likely slow down both algorithmic and hardware progress. So conditional on the types of progress listed under events 1-5, the probability of extensive regulation is much lower than it would be otherwise.
The third criticism is more involved; I’ll summarize it as “the authors are sometimes treating the different events as sequential in time, and sometimes sequential in logical flow”. For example, the authors assign around 1% to events 1-5 happening before 2043. If they’re correct, then conditioning on events 1-5 happening before 2043, they’ll very likely only happen just before 2043. But this leaves very little time for any “derailing” to occur after that, and so the conditional probability of derailing should be far smaller than what they’ve given (62%).
The authors might instead say that they’re not conditioning on events 1-5 literally happening when estimating conditional probability of derailing, but rather conditioning on something more like “events 1-5 would have happened without the 5 types of disruption listed”. That way, their 10% estimate for a derailing pandemic could include a pandemic in 2025 in a world which was otherwise on track for reaching AGI. But I don’t think this is consistent, because the authors often appeal to the assumption that AGI already exists when talking about the probability of derailing (e.g. the probability of pandemics being created). So it instead seems to me like they’re explicitly treating the events as sequential in time, but implicitly treating the events as sequential in logical flow, in a way which significantly decreases the likelihood they assign to TAI by 2043.
I suspect that I have major disagreements with the way the authors frame events 1-5 as well, but don’t want to try to dig into those now.
My entry: The King and the Golem.
Indeed ,my comment was regarding the 99.999 percent of people ( including myself) who are not AI researchers. I completely agree that researchers should be working on the latest models and paying for chat GPT 4, but that wasn’t my point.
I’d extend this not just to include AI researchers, but people who are involved in AI safety more generally. But on the question of the wider population, we agree.
The environmentalists I know who don’t fly, don’t use it to virtue signal at all, they are doing it to help the world a little and show integrity with their lifestyles, which is admirable whether you agree it’s helpful or not.
“show integrity with their lifestyles” is a nicer way of saying “virtue signalling”, it just happens to be signalling a virtue that you agree with. I do think it’s an admirable display of non-selfishness (and far better than vice signalling, for example), but so too are plenty of other types of costly signalling like asceticism. A common failure mode for groups people trying to do good is “pick a virtue that’s somewhat correlated with good things and signal the hell out of it until it stops being correlated”. I’d like this not to happen in AI safety (more than it already has: I think this has already happened with pessimism-signalling, and conversely happens with optimism-signalling in accelerationist circles).
You say this as if there were ways to respond which would have prevented this. I’m not sure these exist, and in general I think “ignore it” is a really really solid heuristic in an era where conflict drives clicks.