Most of my stuff (even the stuff of interest to EAs) can be found on LessWrong: https://www.lesswrong.com/users/daniel-kokotajlo
kokotajlod
My understanding is that relatively few EAs are actual hardcore classic hedonist utilitarians. I think this is ~sufficient to explain why more haven’t become accelerationists.
Have you cornered a classic hedonist utilitarian EA and asked them? Have you cornered three? What did they say?
Thanks for discussing with me!
(I forgot to mention an important part of my argument, oops—You wouldn’t have said “at least 100 years off” you would have said “at least 5000 years off.” Because you are anchoring to recent-past rates of progress rather than looking at how rates of progress increase over time and extrapolating. (This is just an analogy / data point, not the key part of my argument, but look at GWP growth rates as a proxy for tech progress rates: According to this GWP doubling time was something like 600 years back then, whereas it’s more like 20 years now. So 1.5 OOMs faster.) Saying “at least a hundred years off” in 1600 would be like saying “at least 3 years off” today. Which I think is quite reasonable.)
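The arithmetic behind that analogy can be checked with a minimal sketch. It assumes only the two rough doubling times quoted above (about 600 years around 1600, about 20 years now); the function names are hypothetical, and GWP growth is only a proxy for tech progress, as noted.

```python
import math

# Rough figures quoted in the comment above, not precise data.
DOUBLING_TIME_1600_YEARS = 600.0  # approximate GWP doubling time around 1600
DOUBLING_TIME_NOW_YEARS = 20.0    # approximate GWP doubling time today


def speedup_factor(old_doubling_years: float, new_doubling_years: float) -> float:
    """How many times faster growth is now, measured by doubling time."""
    return old_doubling_years / new_doubling_years


def rescale_horizon(horizon_years: float, speedup: float) -> float:
    """Translate a forecast horizon from the old pace of progress to the new one."""
    return horizon_years / speedup


speedup = speedup_factor(DOUBLING_TIME_1600_YEARS, DOUBLING_TIME_NOW_YEARS)
ooms_faster = math.log10(speedup)
equivalent_years = rescale_horizon(100.0, speedup)

print(f"speedup: {speedup:.0f}x, i.e. about {ooms_faster:.1f} OOMs")
print(f"'100 years off' in 1600 is like '{equivalent_years:.1f} years off' today")
```

On these inputs the speedup is a factor of 30, which is where "1.5 OOMs" and "at least 3 years off" come from.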
I agree with the claims “this problem is extremely fucking hard” and “humans aren’t cracking this anytime soon” and I suspect Yudkowsky does too these days.
I disagree that nanotech has to predate taking over the world; that wasn’t an assumption I was making or a conclusion I was arguing for at any rate. I agree it is less likely that ASIs will make nanotech before takeover than that they will make nanotech while still on earth.
I like your suggestion to model a more earthly scenario but I lack the energy and interest to do so right now.
My closing statement is that I think your kind of reasoning would have been consistently wrong had it been used in the past—e.g. in 1600 you would have declared so many things to be impossible on the grounds that you didn’t see a way for the natural philosophers and engineers of your time to build them. Things like automobiles, flying machines, moving pictures, thinking machines, etc. It was indeed super difficult to build those things, it turns out—‘impossible’ relative to the R&D capabilities of 1600 -- but R&D capabilities improved by many OOMs, and the impossible became possible.
Cool. Seems you and I are mostly agreed on terminology then.
Yeah we definitely disagree about that crux. You’ll see. Happy to talk about it more sometime if you like.
Re: galaxy vs. earth: The difference is one of degree, not kind. In both cases we have a finite amount of resources and a finite amount of time with which to do experiments. The proper way to handle this, I think, is to smear out our uncertainty over many orders of magnitude. E.g. the first OOM gets 5% of our probability mass, the second OOM gets 5% of the remaining probability mass, and so forth. Then we look at how many OOMs of extra research and testing (compared to what humans have done) a million ASIs would be able to do in a year, and compare it to how many OOMs extra (beyond that level) a galaxy worth of ASI would be able to do in many years. And crunch the numbers.
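Here is a rough sketch of what "crunching the numbers" could look like under the 5%-of-the-remaining-mass-per-OOM prior described above. The two OOM counts are hypothetical placeholders chosen purely for illustration, not estimates from this discussion, and the function names are made up for the example.

```python
def mass_in_oom(k: int, p: float = 0.05) -> float:
    """Probability that the k-th extra OOM (1-indexed) is the one that suffices.

    Each OOM gets a fraction p of whatever probability mass is still left.
    """
    return p * (1.0 - p) ** (k - 1)


def cumulative_probability(k: int, p: float = 0.05) -> float:
    """Probability that at most k extra OOMs of research and testing suffice."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical placeholders: extra OOMs of research/testing beyond what
# humans have done so far. These are assumptions, not claims.
OOMS_MILLION_ASIS_ONE_YEAR = 10
OOMS_GALAXY_MANY_YEARS = 30

p_million = cumulative_probability(OOMS_MILLION_ASIS_ONE_YEAR)
p_galaxy = cumulative_probability(OOMS_GALAXY_MANY_YEARS)

# If the galaxy-scale effort would succeed, how likely is it that the
# smaller effort would have been enough too?
p_million_given_galaxy = p_million / p_galaxy

print(f"P(million ASIs, one year suffices) = {p_million:.2f}")
print(f"P(galaxy, many years suffices)     = {p_galaxy:.2f}")
print(f"P(million suffices | galaxy does)  = {p_million_given_galaxy:.2f}")
```

The answer is only as good as the OOM counts plugged in; the point of the sketch is that a prior spread over many OOMs makes the two cases differ in degree rather than in kind.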
What if he just said “Some sort of super-powerful nanofactory-like thing?”
He’s not citing some existing literature that shows how to do it, but rather citing some existing literature which should make it plausible to a reasonable judge that a million superintelligences working for a year could figure out how to do it. (If you dispute the plausibility of this, what’s your argument? We have an unfinished exchange on this point elsewhere in this comment section. Seems you agree that a galaxy full of superintelligences could do it; I feel like it’s pretty plausible that if a galaxy of superintelligences could do it, a mere million also could do it.)
I hope you are right.
I think the tech companies—and in particular the AGI companies—are already too powerful for such an informal public backlash to slow them down significantly.
I said IMO. In context it was unnecessary for me to justify the claim, because I was asking whether or not you agreed with it.
I take it that not only do you disagree, you agree it’s the crux? Or don’t you? If you agree it’s the crux (i.e. you agree that probably a million cooperating superintelligences with an obedient nation of humans would be able to make some pretty awesome self-replicating nanotech within a few years) then I can turn to the task of justifying the claim that such a scenario is plausible. If you don’t agree, and think that even such a superintelligent nation would be unable to make such things (say, with >75% credence), then I want to talk about that instead.
(Re: people tipping off, etc.: I’m happy to say more on this but I’m going to hold off for now since I don’t want to lose the main thread of the conversation.)
What part of the scenario would you dispute? A million superintelligences will probably exist by 2030, IMO; the hard part is getting to superintelligence at all, not getting to a million of them (since you’ll probably have enough compute to make a million copies).
I agree that the question is about the actual scenario, not the galaxy. The galaxy is a helpful thought experiment though; it seems to have succeeded in establishing the right foundations: How many OOMs of various inputs (compute, experiments, genius insights) will be needed? Presumably a galaxy’s worth would be enough. What about a solar system? What about a planet? What about a million superintelligences and a few years? Asking these questions helps us form a credence distribution over OOMs.
And my point is that our credence distribution should be spread out over many OOMs, but since a million superintelligences would be capable of many more OOMs of nanotech research in various relevant dimensions than all humanity has been able to achieve thus far, it’s plausible that this would be enough. How plausible? Idk I’m guessing 50% or so. I just pulled that number out of my ass, but as far as I can tell you are doing the same with your numbers.
I didn’t say they’d covertly be building it. It would probably be significantly harder if covert, they wouldn’t be able to get as many OOMs. But they’d still get some probably.
I don’t think using humans would mean going at a human pace. The humans would just be used as actuators. I also think making a specialized automated lab might take less than a year, or else a couple years, not more than a few years. (For a million superintelligences with an obedient human nation of servants, that is)
I also would like to see such breakdowns, but I think you are drawing the wrong conclusions from this example.
Just because Yudkowsky’s first guess about how to make nanotech, as an amateur, didn’t pan out, doesn’t mean that nanotech is impossible for a million superintelligences working for a year. In fact it’s very little evidence. When there are a million superintelligences they will surely be able to produce many technological marvels very quickly, and for each such marvel, if you had asked Yudkowsky to speculate about how to build it, he would have failed.
(Similarly, the technological marvels produced in the 20th century would not have been correctly guessed-how-to-build by people in the 19th century, yet they still happened, and someone in the 19th century could have predicted that many of them would happen despite not being able to guess how. E.g. heavier-than-air flight.)
Thanks for this thoughtful and detailed deep dive!
I think it misses the main cruxes though. Yes, some people (Drexler and young Yudkowsky) thought that ordinary human science would get us all the way to atomically precise manufacturing in our lifetimes. For the reasons you mention, that seems probably wrong.
But the question I’m interested in is whether a million superintelligences could figure it out in a few years or less, since that’s the situation we’ll actually be facing. (If it takes them, say, 10 years or longer, then probably they’ll have better ways of taking over the world.) To answer that question, we need to ask questions like:
(1) Is it even in principle possible? Is there some configuration of atoms that would be a general-purpose nanofactory, capable of making more of itself, that uses diamondoid instead of some other material? Or is there no such configuration?
Seems like the answer is “Probably, though not necessarily; it might turn out that the obstacles discussed are truly insurmountable. Maybe 80% credence.” If we remove the diamondoid criterion and allow it to be built of any material (but still require it to be dramatically more impressive and general-purpose / programmable than ordinary life forms) then I feel like the credence shoots up to 95%, the remaining 5% being model uncertainty.
(2) Is it practical for an entire galactic empire of superintelligences to build in a million years? (Conditional on 1, I think the answer to 2 is ‘of course.’)
(3) OK, conditional on the above, the question becomes what the limiting factor is—is it genius insights about clever binding processes or mini-robo-arm-designs exploiting quantum physics to solve the stickiness problems mentioned in this post? Is it mucking around in a laboratory performing experiments to collect data to refine our simulations? Is it compute & sim-algorithms, to run the simulations and predict what designs should in theory work? Genius insights will probably be pretty cheap to come by for a million superintelligences. I’m torn about whether the main constraint will be empirical data to fit the simulations, or compute to run the simulations.
(4) What’s our credence distribution over orders of magnitude of the following inputs: Genius, experiments, and compute, in each case assuming that it’s the bottleneck? Not sure how to think about genius, but it’s OK because I don’t think it’ll be the bottleneck. Our distributions should range over many orders of magnitude, and should update on our observation so far that however many experiments and simulations humans have done didn’t seem close to being enough.
I wildly guess something like 50% that we’ll see some sort of super powerful nanofactory-like thing. I’m more like 5% that it consists of diamondoid in particular; there are so many different material designs, and even if diamondoid is viable and in some sense theoretically the best, the theoretical best probably takes several OOMs more inputs to achieve than something else which is just merely good enough.
OK, so our credences aren’t actually that different after all. I’m actually at less than 65%, funnily enough! (But that’s for doom = extinction. I think human extinction is unlikely for reasons to do with acausal trade; there will be a small minority of AIs that care about humans, just not on Earth. I usually use a broader definition of “doom” as “About as bad as human extinction, or worse.”)
I am pretty confident that what happens in the next 100 years will straightforwardly translate to what happens in the long run. If humans are still well-cared-for in 2100 they probably also will be in 2,100,000,000.
I agree that if some AIs care about humans, or if all AIs care a little bit about humans, the situation looks proportionately better. Unfortunately that’s not what I expect to happen by default on Earth.
In most of these stories, including in Ajeya’s story IIRC, humanity just doesn’t seem to try very hard to reduce misalignment? I don’t think that’s a very reasonable assumption. (Charitably, it could be interpreted as a warning rather than a prediction.) I think that as systems get more capable, we will see a large increase in our alignment efforts and monitoring of AI systems, even without any further intervention from longtermists.
That’s not really an answer to my question: Ajeya’s argument is about how today’s alignment techniques (e.g. RLHF + monitoring) won’t work even if turbocharged with huge amounts of investment. It sounds like you are disagreeing, and saying that if we just spend lots of $$$ doing lots and lots of RLHF, it’ll work. Or when you say humanity will try harder, do you mean they’ll use some other technique than the ones Ajeya thinks won’t work? If so, which technique?
(Separately, I tend to think humanity will probably invest less in alignment than it does in her stories, but that’s not the crux between us I think.)
Those words were not yours, but you did say you agreed it was the main crux, and in context it seemed like you were agreeing that it was a crux for you too. I see now on reread that I misread you and you were instead saying it was a secondary crux. Here, let’s cut through the semantics and get quantitative:
What is your credence in doom conditional on AIs not caring for humans?
If it’s >50%, then I’m mildly surprised that you think the risk of accidentally creating a permanent pause is worse than the risks from not-pausing. I guess you did say that you think AIs will probably just be ethical if we train them hard enough to be… What is your response to the standard arguments that ‘just train them hard to be ethical’ won’t work? E.g. Ajeya Cotra’s writings on the training game.
Re: “I don’t see how the first part of that leads to the second part” Come on, of course you do, you just don’t see it NECESSARILY leading to the second part. On that I agree. Few things are certain in this world. What is your credence in doom conditional on AIs not caring for humans & there being multiple competing AIs?
IMO the “Competing factions of superintelligent AIs, none of whom care about humans, may soon arise, but even if so, humans will be fine anyway somehow” hypothesis is pretty silly and the burden of proof is on you to defend it. I could cite formal models as well as historical precedents to undermine the hypothesis, but I’m pretty sure you know about them already.
The question I’m asking is: why? You have told me what you expect to happen, but I want to see an argument for why you’d expect that to happen. In the absence of some evidence-based model of the situation, I don’t think speculating about specific scenarios is a reliable guide.
Why what? I answered your original question:
Why are rogue AI motives so much more likely to lead to disaster than rogue human motives? Yes, AIs will be more powerful than humans, but there are already many people who are essentially powerless (not to mention many non-human animals) who survive despite the fact that their interests are in competition with much more powerful entities.
with:
Powerless humans survive because of a combination of (a) many powerful humans actually caring about their wellbeing and empowerment, and (b) those powerful humans who don’t care, having incentives such that it wouldn’t be worth it to try to kill the powerless humans and take their stuff. E.g. if Putin started killing homeless people in Moscow and pawning their possessions, he’d lose way more in expectation than he’d gain. Neither (a) nor (b) will save us in the AI case (at least, keeping acausal trade and the like out of the picture) because until we make significant technical progress on alignment there won’t be any powerful aligned AGIs to balance against the unaligned ones, and because whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment, and what consideration it gives to humans will erode rapidly as the power differential grows.
My guess is that you disagree with the “whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment...” bit.
Why? Seems pretty obvious to me, I feel like your skepticism is an isolated demand for rigor.
But I’ll go ahead and say more anyway:
Giving humans equal treatment would be worse (for the AIs, which by hypothesis don’t care about humans at all) than other salient available options to them, such as having the humans be second-class in various ways or complete pawns/tools/slaves. Eventually, when the economy is entirely robotic, keeping humans alive at all would be an unnecessary expense.
Historically, if you look at relations between humans and animals, or between colonial powers and native powers, this is the norm. Cases in which the powerless survive and thrive despite none of the powerful caring about them are the exception, and happen for reasons that probably won’t apply in the case of AI. E.g. Putin killing homeless people would be bad for his army’s morale, and that would far outweigh the benefits he’d get from it. (Arguably this is a case of some powerful people in Russia caring about the homeless, so maybe it’s not even an exception after all)
Can you say more about what model you have in mind? Do you have a model? What about a scenario, can you spin a plausible story in which all the ASIs don’t care at all about humans but humans are still fine?
Wanna meet up sometime to talk this over in person? I’ll be in Berkeley this weekend and next week!
First of all, you are goal-post-moving if you make this about “confident belief in total doom by default” instead of the original “if you really don’t think unchecked AI will kill everyone.” You need to defend the position that the probability of existential catastrophe conditional on misaligned AI is <50%.
Secondly, “AI motives will generalize extremely poorly from the training distribution” is a confused and misleading way of putting it. The problem is that it’ll generalize in a way that wasn’t the way we hoped it would generalize.
Third, to answer your questions:
1. The difference in power will be great & growing rapidly, compared to historical cases. I support implementing things like model amnesty, but I don’t expect them to work, and anyhow we are not anywhere close to having such things implemented.
2. It’ll be AI vs. AI with humanity on the sidelines, yes. Humans will be killed off, enslaved, or otherwise misused as pawns. It’ll be like colonialism all over again but on steroids. Unless takeoff is fast enough that there is only one AI faction. Doesn’t really matter, either way humans are screwed.
3. Powerless humans survive because of a combination of (a) many powerful humans actually caring about their wellbeing and empowerment, and (b) those powerful humans who don’t care, having incentives such that it wouldn’t be worth it to try to kill the powerless humans and take their stuff. E.g. if Putin started killing homeless people in Moscow and pawning their possessions, he’d lose way more in expectation than he’d gain. Neither (a) nor (b) will save us in the AI case (at least, keeping acausal trade and the like out of the picture) because until we make significant technical progress on alignment there won’t be any powerful aligned AGIs to balance against the unaligned ones, and because whatever norms and society a bunch of competing unaligned AGIs set up between themselves, it is unlikely to give humans anything close to equal treatment, and what consideration it gives to humans will erode rapidly as the power differential grows.
Thanks!
I think this is evidence for a groupthink phenomenon amongst superforecasters. Interestingly my other experiences talking with superforecasters have also made me update in this direction (they seemed much more groupthinky than I expected, as if they were deferring to each other a lot. Which, come to think of it, makes perfect sense—I imagine if I were participating in forecasting tournaments, I’d gradually learn to reflexively defer to superforecasters too, since they genuinely would be performing well.)
Ironically, one of the two predictions you quote as examples of bad predictions is in fact an example of a good prediction: “The most realistic estimate for a seed AI transcendence is 2020.”
Currently it seems that AGI/superintelligence/singularity/etc. will happen sometime in the 2020s. Yudkowsky’s median estimate in 1999 was 2020 apparently, so he probably had something like 30% of his probability mass in the 2020s, and maybe 15% of it in the 2025-2030 period when IMO it’s most likely to happen.
Now let’s compare to what other people would have been saying at the time. They would almost all have been saying 0%, and then maybe the smarter and more rational ones would have been saying things like 1%, for the 2025-2030 period.
To put it in nonquantitative terms, almost everyone else in 1999 would have been saying “AGI? Singularity? That’s not a thing, don’t be ridiculous.” The smarter and more rational ones would have been saying “OK it might happen eventually but it’s nowhere in sight, it’s silly to start thinking about it now.” Yudkowsky said “It’s about 21 years away, give or take; we should start thinking about it now.” Now with the benefit of 24 years of hindsight, Yudkowsky was a lot closer to the truth than all those other people.
Also, you didn’t reply to my claim. Who else has been talking about AGI etc. for 20+ years and has a similarly good track record? Which of them managed to only make correct predictions when they were teenagers? Certainly not Kurzweil.
The XPT forecast about compute in 2030 still boggles my mind. I’m genuinely confused what happened there. Is anybody reading this familiar with the answer?
Fair, but still: In 2019 Microsoft invested a billion dollars in OpenAI, roughly half of which was compute (see “Microsoft invests billions more dollars in OpenAI, extends partnership,” TechCrunch).
And then GPT-3 happened, and was widely regarded to be a huge success and proof that scaling is a good idea etc.
So the amount of compute-spending that the most aggressive forecasters think could be spent on a single training run in 2032… is about 25% as much compute-spending as Microsoft gave OpenAI starting in 2019, before GPT-3 and before the scaling hypothesis. The most aggressive forecasters.
Also, if you do various searches on LW and Astral Codex Ten looking for comments I’ve made, you might see some useful ones maybe.
I agree that as time goes on states will take an increasing and eventually dominant role in AI stuff.
My position is that timelines are short enough, and takeoff is fast enough, that e.g. decisions and character traits of the CEO of an AI lab will explain more of the variance in outcomes than decisions and character traits of the US President.