Most of my stuff (even the stuff of interest to EAs) can be found on LessWrong: https://www.lesswrong.com/users/daniel-kokotajlo
Nice. Well, I guess we just have different intuitions then—for me, the chance of extinction or worse in the Octopcracy case seems a lot bigger than “small but non-negligible” (though I also wouldn’t put it as high as 99%).
Human groups struggle against each other for influence/power/control constantly; why wouldn’t these octopi (or AIs) also seek influence? You don’t need to be an expected utility maximizer to instrumentally converge; humans instrumentally converge all the time.
Oh also you might be interested in Joe Carlsmith’s report on power-seeking AI, it has a relatively thorough discussion of the overall argument for risk.
Nice analysis!
I think a main point of disagreement is that I don’t think systems need to be “dangerous maximizers” in the sense you described in order to predictably disempower humanity and then kill everyone. Humans aren’t dangerous maximizers, yet we’ve killed off many species of animals, the Neanderthals, and various other human groups (genocide, wars, oppression of populations by governments, etc.). Katja’s scenario sounds plausible to me except for the part where somehow it all turns out OK in the end for humans. :)
Another, related point of disagreement:
“look, LLMs distill human cognition, much of this cognition implicitly contains plans, human-like value judgements, etc.” I start from a place where I currently believe “future systems have human-like inductive biases” will be a better predictive abstraction than “randomly sample from the space of simplicity-weighted plans”. And … I just don’t currently see the argument for rejecting my current view?
I actually agree that current and future systems will have human-like concepts, human-like inductive biases, etc. -- relative to the space of all possible minds, at least. But their values will be sufficiently alien that humanity will be in deep trouble. (Analogy: Suppose we bred some octopi to be smarter and smarter, in an environment where they were e.g. trained with Pavlovian conditioning + artificial selection to be really good at reading internet text and predicting it, and then eventually writing it too... They would indeed end up a lot more human-like than regular wild octopi. But boy would it be scary if they started getting generally smarter than humans, being integrated deeply into lots of important systems, being trusted a lot by humans, etc.)
Alright, let’s make it happen! I’ll DM you + Timothy + anyone else who replies to this comment in the next few days, and we can arrange something.
Great list, thanks!
My current tentative expectation is that we’ll see a couple of things in 1, but nothing in 2+, until it’s already too late (i.e. until humanity is already basically in a game of chess with a superior opponent, and there’s no longer a realistic hope of humanity coordinating to stop the slide into oblivion, by contrast with today, where we are on a path to oblivion but there’s still a realistic possibility of changing course).
I definitely agree that near-term, non-agentic AI will cause a lot of chaos. I just don’t expect it to be so much chaos that the world as a whole feels significantly more chaotic than usual. But I also agree that might happen too.
I also agree that this sort of thing will have a warning-shot effect that makes a Covid-in-Feb-2020-type response plausible.
It seems we maybe don’t actually disagree that much?
Re: uncharitability: I think I was about as uncharitable as you were. That said, I do apologize—I should hold myself to a higher standard.
I agree they might be impossible. (If nanotech only ever finds some niche applications in medicine, that counts as impossible for present purposes, btw. Anything remotely similar to what Drexler described would be much more revolutionary than that.)
If they are possible, though, and it takes (say) 50 years for ordinary human scientists to figure it out starting now… then it’s quite plausible to me that it could take 2 OOMs less time than that, or possibly even 4 OOMs, for superintelligent AI scientists to figure it out starting whenever superintelligent AI scientists appear (assuming they have access to proper experimental facilities; I am very uncertain about how large such facilities would need to be). 2 OOMs less time would be 6 months; 4 OOMs would be Yudkowsky’s bathtub nanotech scenario (except not necessarily in a single bathtub; presumably it’s much more likely to be feasible if they have access to lots of laboratories). I also think it’s plausible that even for a superintelligence it would take at least 5 years, i.e. only 1 OOM of speedup over humans (again, conditional on it being possible at all and taking about 50 years for ordinary human scientists). A crux for me here would be if you could show that deciding what experiments to run and interpreting the results are both pretty easy for ordinary human scientists, and that the bottleneck is basically just getting the funding and time to run all the experiments.
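(For concreteness, here’s the arithmetic behind those figures as a quick illustrative sketch; the 50-year baseline is just the assumption stated above, not a real estimate.)

```python
# Illustrative arithmetic only: OOM speedups applied to an assumed
# 50-year baseline for ordinary human scientists (the figure used above).
baseline_years = 50

for ooms in [1, 2, 4]:
    speedup = 10 ** ooms                  # 1 OOM = one order of magnitude = 10x
    years = baseline_years / speedup
    print(f"{ooms} OOM(s) faster: {years:g} years (~{years * 12:g} months)")

# 1 OOM  -> 5 years
# 2 OOMs -> 0.5 years, i.e. ~6 months
# 4 OOMs -> 0.005 years, i.e. under 2 days (the 'bathtub nanotech' end of the range)
```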
To be clear I’m pretty uncertain about all this. I’m prompting you with stuff like this to try to elicit your expertise, and get you to give arguments or intuition pumps that might address my cruxes.
Oh, I thought you had much more intense things in mind than that. A malicious actor using LLMs in some hacking scheme to get security breaches seems probable to me.
But that wouldn’t cause instability to go above baseline. Things like this happen every year. Russia invaded Ukraine last year, for example—for the world to generally become less stable, there need to be either events that are a much bigger deal than that invasion, or events like that invasion happening every few months.
If you can convince me of the “many many years” claim, that would be an update. Other than that you are just saying things I already know and believe.
I never claimed that nanotech would be the best plan, nor that it would be Yudkowsky’s bathtub-nanotech scenario instead of a scenario involving huge amounts of experimentation. I was just reacting to your terrible leaps of logic, e.g. “nanobots are a terrible method for world destruction given that they have not been invented yet” and “making nanobots requires experimentation and resources, therefore AIs won’t do it.” (I agree that if it takes many many years, there will surely be a faster method than nanobots, but you haven’t really argued for that.)
I’d love to see some sort of quantitative estimate from you of how long it would take modern civilization to build nanotech if it really tried. Like, suppose nanotech became the new Hot Thing starting now and all the genius engineers currently at SpaceX and various other places united to make nanotech startups, funded by huge amounts of government funding and VC investment, etc. And suppose the world otherwise remains fairly static, so e.g. climate change doesn’t kill us, AGI doesn’t happen, etc. How many years until we have the sorts of things Drexler described? (Assume that they are possible)
It seems a lot of people are interested in this one! For my part, the answer is “Infohazards kinda, but mostly it’s just that I haven’t gotten around to it yet.” I was going to do it two years ago but never finished the story.
If there’s enough interest, perhaps we should just have a group video call sometime and talk it over? That would be easier for me than writing up a post, plus I have no idea what kinds of things you find plausible and implausible, so it’ll be valuable data for me to hear these things from you.
A superintelligent AI will be able to do significant amounts of experimentation and acquire significant amounts of resources.
Yeah, I should have taken more care to explain myself: I do think the sorts of large-but-not-catastrophic harms you are talking about might happen; I just think that, more likely than not, they won’t, because timelines are short. (My 50% mark for AGI, or, if you want to be more precise, for AI capable of disempowering humanity, is 2027.)
So, my answers to your questions would be:
1. It seems we are on the cusp of agentic AGI right now in 2023, and that godlike AGI will come around 2027 or so.
2. Unclear. Could be quite chaotic & dangerous, but I’m thinking it probably won’t be. Human governments and AI companies have a decent amount of control, at least up until about a year before godlike AGI, and they’ll probably use that control to maintain stability and peace rather than fight each other or sow chaos. I’m not particularly confident though.
3. I think it depends on the details of the bad thing that happened. I’d be interested to hear what sort of bad things you have in mind.
My guess is that the first number is too small, such that there’s always going to be someone willing to trade. However, I’m not confident in this stuff yet.
I agree that not all of the civilizations that care about what happens to us care in the ways we want them to care. For example, as you say, maybe there are some that want things to stay the same. I don’t have a good sense of the relative ratios/prevalence of different types of civilizations, though we can make some guesses, e.g. it’s probable that many more civilizations want us not to suffer than want us to suffer.
Yeah, it’s loosely analogous to how various bits of jungle are preserved because faraway Westerners care about preserving them and intercede on their behalf. If somewhere far away there are powerful AGIs that care about humanity and do ECL (evidential cooperation in large worlds), which is plausible since the universe is very big, and the unaligned AI we build does ECL such that it cooperates with faraway AGIs also doing ECL, then hopefully (and probably, IMO) the result of this cooperation will be some sort of protection and care for humans.
My own response is that AIs which can cause very bad things (but not human disempowerment) will indeed come before AIs which can cause human disempowerment, and if we had an indefinitely long period where such AIs were widely deployed and tinkered with by many groups of humans, such very bad things would come to pass. However, instead the period will be short, since the more powerful and more dangerous kind of AI will arrive soon.
(Analogy: “Surely before an intelligent species figures out how to make AGI, it’ll figure out how to make nukes and bioweapons. Therefore whenever AGI appears in the universe, it must be in the post-apocalyptic remnants of a civilization already wracked by nuclear and biological warfare.” Wrong! These things can happen, and maybe in the limit of infinite time they have to happen, but they don’t have to happen in any given relatively short time period; our civilization is a case in point.)
I think it’s pretty clear now that the default trajectory of AI development is taking us towards pretty much exactly the sorts of agentic AGI that MIRI et al. were worried about 11 years ago. We are not heading towards a world of AI tools by default; coordination is needed to not build agents.
If, in 5 more years, the state-of-the-art, most-AGI-ish systems are still basically autocomplete, not capable of taking long series of action-input-action-input-etc. with humans out of the loop, not capable of online learning, and this has nothing to do with humans coordinating to slow down progress towards agentic AGI, I’ll count myself as having been very wrong and very surprised.
I feel like it was only a year or so ago that the standard critique of the AI safety community was that they were too abstract, too theoretical, that they lacked hands-on experience, lacked contact with empirical reality, etc...
I think different humans would choose differently. According to various people in this comment section and elsewhere, childbirth is extremely painful and lasts on the order of an hour. Yet people still choose to have children, even though some of those children will grow up to experience childbirth. My own tentative answer is that I’d ask to experience the pain myself a bit first, and also want to get a clearer sense of what life would be like afterwards—if it’s a normal healthy late-20th-century middle class American life, I could see myself choosing 1, pending results from experiencing it myself for a bit.
Maybe I should follow in Ren’s footsteps and get a tattoo.
Nice post! My current guess is that the inter-civ selection effect is extremely weak and that the intra-civ selection effect is fairly weak. N=1, but in our civilization the people gunning for control of AGI seem more grabby than average but not drastically so, and it seems possible for this trend to reverse, e.g. if the US government nationalizes all the AGI projects.