How confident are you that alignment will be solved in time? What level of x-risk do you think is acceptable for the endgame to be happening in? How far do you think your contribution could go toward reducing x-risk to such an acceptable level? These are crucial considerations. I think that we are pretty close to the end game already (maybe 2 years), and that there’s very little chance for alignment to be solved / x-risk reduced to acceptable levels in time. Have you considered that the best strategy now is a global moratorium on AGI (hard as that may be)? I think having more alignment researchers openly advocating for this would be great. We need more time.
I think that we are pretty close to the end game already (maybe 2 years), and that there’s very little chance for alignment to be solved / x-risk reduced to acceptable levels in time.
I believe with 95-99.9% probability that this is purely hype, and that we will not in fact see AI that radically transforms the world, or that is essentially able to do every task needed to automate physical infrastructure, within 2 years.
Given this, I’d probably disagree with this:
Have you considered that the best strategy now is a global moratorium on AGI (hard as that may be)? I think having more alignment researchers openly advocating for this would be great. We need more time.
Or at least state it less strongly.
I see a massive claim here without much supporting evidence, and I’d like to see why you believe that we are so close to the endgame of AI.
You’re entitled to disagree with short-timelines people (and I do too) but I don’t like the use of the word “hype” here (and “purely hype” is even worse); it seems inaccurate, and kinda an accusation of bad faith. “Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias). None of those applies to Greg here, AFAICT. Instead, you can just say “he’s wrong” etc.
“Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias).
All of this seems to apply to AI-risk-worriers?
AI-risk-worriers are promoting a narrative that powerful AI will come soon
AI-risk-worriers are taken more seriously, have more job opportunities, get more status, get more of their policy proposals, etc, to the extent that this narrative is successful
My experience is that AI products are less impressive than the impression I would get from listening to AI-risk-worriers, and self-serving bias seems like an obvious explanation for this.
I generally agree that as a discourse norm you don’t want to go around accusing people of bad faith, but as a matter of truth-seeking my best guess is that a substantial fraction of short-timelines amongst AI-risk-worriers is in fact “hype”, as you’ve defined it.
Hmm. Touché. I guess another thing on my mind is the mood of the hype-conveyer. My stereotypical mental image of “hype” involves Person X being positive & excited about the product they’re hyping, whereas the imminent-doom-ers that I’ve talked to seem to have a variety of moods including distraught, pissed, etc. (Maybe some are secretly excited too? I dunno; I’m not very involved in that community.)
FWIW I am not seeking job opportunities or policy proposals that favour me financially. Rather—policy proposals that keep me, my family, and everyone else alive. My self-interest here is merely in staying alive (and wanting the rest of the planet to stay alive too). I’d rather this wasn’t an issue and just enjoy my retirement. I want to spend money on this (pay for people to work on Pause / global AGI moratorium / Shut It Down campaigns). Status is a trickier thing to untangle. I’d be lying, as a human, if I said I didn’t care about it. But I’m not exactly getting much here by being an “AI-risk-worrier”. And I could probably get more doing something else. No one is likely to thank me if a disaster doesn’t happen.
Re AI products being less impressive than the impression you get from AI-risk-worriers, what do you make of Connor Leahy’s take that LLMs are basically “general cognition engines” and will scale to full AGI in a generation or two (and with the addition of various plugins etc to aid “System 2” type thinking, which are freely being offered by the AutoGPT crowd)?
First off, let me say that I’m not accusing you specifically of “hype”, except inasmuch as I’m saying that for any AI-risk-worrier who has ever argued for shorter timelines (a class which includes me), if you know nothing else about that person, there’s a decent chance their claims are partly “hype”. Let me also say that I don’t believe you are deliberately benefiting yourself at others’ expense.
That being said, accusations of “hype” usually mean an expectation that the claims are overstated due to bias. I don’t really see why it matters if the bias is survival motivated vs finance motivated vs status motivated. The point is that there is bias and so as an observer you should discount the claims somewhat (which is exactly how it was used in the original comment).
what do you make of Connor Leahy’s take that LLMs are basically “general cognition engines” and will scale to full AGI in a generation or two (and with the addition of various plugins etc to aid “System 2” type thinking, which are freely being offered by the AutoGPT crowd)?
Could happen, probably won’t, though it depends what is meant by “a generation or two”, and what is meant by “full AGI” (I’m thinking of a bar like transformative AI).
(I haven’t listened to the podcast but have thought about this idea before. I do agree it’s good to think of LLMs as general cognition engines, and that plugins / other similar approaches will be a big deal.)
I guess you’re right that “hype” here could also come from being survival motivated. But surely the easier option is to just stop worrying so much? (I mean, it’s not like stress doesn’t have health effects). Read the best counter-arguments and reduce your p(doom) accordingly. Unfortunately, I haven’t seen any convincing counterarguments. I’m with Richard Ngo here when he says:
the counterarguments are *much* worse—I’ve never seen a plausible rebuttal to the core claims. That’s terrifying.
What are the best counter-arguments you are aware of?
I’m always a bit confused by people saying they have a p(doom|TAI) of 1-10%: like what is the mechanistic reason for expecting that the default, or bulk of the probability mass, is not doom? How is the (transformative) AI spontaneously becoming aligned enough to be safe!? It often reads to me as people (who understand the arguments for x-risk) wanting to sound respectable and not alarmist, rather than actually having a good reason to not worry so much.
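(An aside on notation, since conditional probabilities get used throughout this thread: here is a minimal sketch of how p(doom|TAI), p(non-doom|AGI) and unconditional p(doom) relate. The numbers are placeholders I picked for illustration, not anyone’s stated credences.)

```python
# Illustrative only: how the conditional figures discussed here combine.
p_tai = 0.5               # hypothetical p(TAI/AGI arrives in the relevant window)
p_doom_given_tai = 0.05   # a point in the 1-10% range mentioned above

p_nondoom_given_tai = 1 - p_doom_given_tai        # p(non-doom | TAI)
p_doom_unconditional = p_tai * p_doom_given_tai   # ignoring doom from non-TAI sources
print(p_nondoom_given_tai, p_doom_unconditional)  # 0.95 0.025
```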
depends what is meant by “a generation or two”
GPT-5 or GPT-6 (1 or 2 further generations of large AI model development).
and what is meant by “full AGI” (I’m thinking of a bar like transformative AI).
Yes, TAI, or PASTA, or AI that can do everything as well as the best humans (including AI Research Engineering).
Could happen, probably won’t
Would you be willing to put this in numerical form (% chance) as a rough expectation?
You could look at these older conversations. There’s also Where I agree and disagree with Eliezer (see also my comment), though I suspect that won’t be what you’re looking for.

Mostly though I think you aren’t going to get what you’re looking for, because it’s a complicated question that doesn’t have a simple answer.
(I think this regardless of whether you frame the question as “do we die?” or “do we live?”; if you think the case for doom is straightforward, I think you are mistaken. All the doom arguments I know of seem to me like they establish plausibility, not near-certainty, though I’m not going to defend that here.)
Would you be willing to put this in numerical form (% chance) as a rough expectation?
Idk, I don’t really want to make claims about GPT-5 / GPT-6, since that depends on OpenAI’s naming decisions. But I’m at < 5% (probably < 1%, but I’d want to think about it) on “the world will be transformed” (in the TAI sense) within the next 3 years.
Thanks. Regarding the conversations from 2019, I think we are in a different world now (post GPT-4 + AutoGPT/plugins). [Paul Christiano] “Perhaps there’s no problem at all”—saying this really doesn’t help! I want to know why that might be the case! “concerted effort by longtermists could reduce it”—seems less likely now given shorter timelines. “finding out that the problem is impossible can help; it makes it more likely that we can all coordinate to not build dangerous AI systems”—this could be a way out, but again, little time. We need a Pause first to have time to firmly establish impossibility. However, “coordinate to not build dangerous AI systems” is not part of p(non-doom|AGI) [I’m interested in why people think there won’t be doom, given we get AGI]. So far, Paul’s section does basically nothing to update me on p(doom|AGI).
[Rohin Shah] “A likely crux is that I think that the ML community will actually solve the problems, as opposed to applying a bandaid fix that doesn’t scale.”—yes, this is a crux for me. How do the fixes scale, with 0 failure modes in the limit of superintelligence? You mention interpretability as a basis for scalable AI-assisted alignment above this, but progress in interpretability remains far behind the scaling of the models, so doesn’t hold much hope imo. “I’m also less worried about race dynamics increasing accident risk”; “the Nash equilibrium is for all agents to be cautious”—I think this has been blown out of the water with the rush to connect GPT-4 to the internet and spread it far and wide as quickly as possible. As I said, we’re in a different world now. “If I condition on discontinuous takeoff… I… get a lot more worried about AI risk”—this also seems cruxy (and I guess we’ve discussed a bit above). What do you think the likelihood is of a model trained with 100x more compute (affordable by Microsoft or Google) being able to do AI Research Engineering as well as the median AI Research Engineer? To me it seems pretty high (given scaling so far). Imagining a million of them then working for a million years of subjective time within, say, the year 2025, a fast take-off seems pretty likely. If 100x GPT-4 compute isn’t enough, what about 1000x (affordable by a major state)? “most of my optimism comes from the more outside view type considerations: that we’ll get warning signs that the ML community won’t ignore”—well, I think we are getting warning signs now, and, whilst not ignoring them, the ML community is not taking them anywhere near seriously enough! We need to Pause. Now. “and that the AI risk arguments are not watertight.”—sure, but that doesn’t mean we’re fine by default! (Imo alignment needs to be watertight to say that by default we’re fine.) At least in your (Rohin’s) conversation from 2019, there are cruxes. I’m coming down on the side of doom on them though in our current world of 2023.
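To make the back-of-envelope in the paragraph above explicit, here is the multiplication being gestured at; every parameter is an illustrative assumption on my part, not a measured quantity.

```python
# Rough sketch of the "million research-engineer years" scenario sketched above.
num_copies = 1_000_000                    # parallel instances of an AI research engineer (assumed)
subjective_years_per_copy_per_year = 1    # assume each copy runs at roughly human speed
calendar_years = 1                        # e.g. within the year 2025, as suggested above

total_subjective_years = num_copies * subjective_years_per_copy_per_year * calendar_years
print(total_subjective_years)  # 1000000, i.e. a million subjective research-engineer years
```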
[Robin Hanson] “The current AI boom looks similar to previous AI booms, which didn’t amount to much in the past.”—GPT-4 is good evidence against this. “intelligence is actually a bunch of not-very-general tools that together let us do many things”—multimodal models are good evidence against this. Foundation transformer models seem to be highly general. “human uniqueness...it’s our ability to process culture (communicating via language, learning from others, etc).”—again, GPT-4 can basically do this. “principal-agent problems tend to be bounded”—this seems a priori unlikely to apply with superhuman AI, and you (Rohin) yourself say you disagree with this (and people are complaining they can’t find the literature Robin claims backs this up). “Effort is much more effective and useful once the problem becomes clear, or once you are working with a concrete design; we have neither of these right now”—what about now? Maybe after the release of Google DeepMind’s next big multimodal model this will be clear. I don’t find Robin’s reasons for optimism convincing (and I’ll also note that I find his vision of the future—Age of Em—horrifying, so his default “we’ll be fine” is actually also a nightmare.) [Rohin’s opinion] “once AI capabilities on these factors [ability to process culture] reach approximately human level, we will ‘suddenly’ start to see AIs beating humans on many tasks, resulting in a ‘lumpy’ increase on the metric of ‘number of tasks on which AI is superhuman’”—would you agree that this is happening with GPT-4?
[Adam Gleave] “as we get closer to AGI we’ll have many more powerful AI techniques that we can leverage for safety”—again this seems to suffer from the problem of grounding them in having a reliable AI in the first place (as Eliezer says, “getting the AI to do your alignment homework” isn’t a good strategy). “expect that AI researchers will eventually solve safety problems; they don’t right now because it seems premature to work on those problems”—certainly not premature now. But are we anywhere near on track to solving them in time? “would be more worried if there were more arms race dynamics, or more empirical evidence or solid theoretical arguments in support of speculative concerns like inner optimizers.”—well, we’ve got both now. “10-20% likely that AGI comes only from small variations of current techniques”—seems much higher to me now with GPT-4 and multimodal models on the way. “would see this as more likely if we hit additional milestones by investing more compute and data”—well, we have. Overall Adam’s 2019 conversation has done nothing to allay my 2023 doom concerns. I’m guessing that based on what is said, Adam himself has probably updated in the direction of doom.
Reading Paul’s more detailed disagreements with Eliezer from last year doesn’t really update me on doom either, given that he agrees with more than enough of Eliezer’s lethalities (i.e. plenty enough to make the case for high p(doom|AGI)). The same applies to the DeepMind alignment team’s response.
All the doom arguments I know of seem to me like they establish plausibility, not near-certainty, though I’m not going to defend that here.
I think I can easily just reverse this (i.e. it does depend on whether you frame the question as “do we die?” or “do we live?”, and you are doing the latter here). Although to be fair, I’d use “possible”, rather than “plausible”: all the “we’ll be fine” arguments I know of seem to me like they establish possibility, not near-certainty.
Overall, none of this has helped in reducing my p(doom|AGI); it’s not even really touching the sides, so to speak. Do you (or anyone else) have anything better? Note that I have also asked this question here.
Would appreciate it if the agreement downvoters could link to what they think are the best (pref detailed) explanations for why we should expect the default of no doom, given AGI. I want to be less doomy.
But I’m at < 5% (probably < 1%, but I’d want to think about it) on “the world will be transformed” (in the TAI sense) within the next 3 years.
Pedantic, but are you using the bio anchors definition? (“software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it)”)
I thought yes, but I’m a bit unhappy about that assumption (I forgot it was there). If you go by the intended spirit of the assumption (see the footnote) I’m probably on board, but it seems ripe for misinterpretation (“well if you had just deployed GPT-5 it really could have run an automated company, even though in practice we didn’t do that because we were worried about safety and/or legal liability and/or we didn’t know how to prompt it etc”).
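For concreteness, the quoted bio anchors threshold (a tenfold acceleration in world economic growth) cashes out roughly as follows; the ~3% baseline growth rate is my own ballpark assumption for recent world GDP growth.

```python
import math

def doubling_time_years(growth_rate: float) -> float:
    """Years for an economy growing at `growth_rate` per year to double: ln(2) / ln(1 + g)."""
    return math.log(2) / math.log(1 + growth_rate)

baseline_growth = 0.03              # assumed recent world growth rate
tai_growth = 10 * baseline_growth   # "tenfold acceleration" per the bio anchors definition

print(round(doubling_time_years(baseline_growth), 1))  # ~23.4 years to double
print(round(doubling_time_years(tai_growth), 1))       # ~2.6 years to double
```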
This. I generally also agree with your 3 observations; the reason I was focusing on truth-seeking is that my epistemic environment tends to reward worrying AI claims more than it probably should, due to negativity bias and to AI Twitter hype.
Also, reversing 95-99.9%, are you ok with a 0.1-5% x-risk?

No, at the higher end of probability. Things still need to be done. It does mean we should stop freaking out every time a new AI capability is released, and importantly it means we probably don’t need to go to extreme actions, at least right away.
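(Spelling out the arithmetic behind “reversing 95-99.9%”: the question treats the complement of the “purely hype” credence as the implied risk level, which is itself a simplification.)

```python
# Complement of the 95-99.9% "purely hype" credence stated earlier in the thread.
p_hype_low, p_hype_high = 0.95, 0.999

implied_risk_high = 1 - p_hype_low    # the "5%" end of the question
implied_risk_low = 1 - p_hype_high    # the "0.1%" end
print(round(implied_risk_low, 4), round(implied_risk_high, 4))  # 0.001 0.05
```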
The “Sparks” paper; ChatGPT plugins, AutoGPTs and other scaffolding to make LLMs more agent-like. Given these, I think there’s way too much risk for comfort of GPT-5 being able to make GPT-6 (with a little human direction that would be freely given), leading to a foom. Re physical infrastructure, to see how this isn’t a barrier, consider that a superintelligence could easily manipulate humans into doing things as a first (easy) step. And such an architecture, especially given the current progress on AI Alignment, would be default unaligned and lethal to the planet.
The reason I’m so confident in this right now is that I assess a significant probability that the AI developments starting with GPT-4 are a hype cycle; I am probably >50% confident that most of the flashiest stuff on AI will prove to be overhyped.

In particular, I am skeptical of the general hype on AI right now, partly because a lot of capability tests essentially test models on paper tests rather than on real-world tasks, and real-world tasks are much less Goodhartable than paper tests.

Now, conditional on the endgame being 2 years or less, I’d agree with you that surprisingly extreme actions would have to be taken, but even then I assess the current techniques for alignment as quite a bit better than you imply.

I am also 40% confident in a model where human-level AI capability in almost every human domain arrives in the 2030s.

Given this, I think you’re being overly alarmed right now, and that we probably have at least some chance of a breakthrough in AI alignment/safety comparable to what happened with the climate change problem.
Also, the evals ARC is doing are essentially a best-case scenario for the AI: it has access to the weights; most importantly, no human resistance is modeled (say, blocking the AI’s ability to get GPUs); and it is assumed that the AI can set up arbitrarily scalable ways of gaining power. We should expect the true capabilities of an AI attempting to gain control to be reliably less than what the ARC evals show.
What kind of a breakthrough are you envisaging? How do we get from here to 100% watertight alignment of an arbitrarily capable AGI? Climate change is very different in that the totality of all emissions reductions / clean tech development can all add up to solving the problem. AI Alignment is much more all or nothing. For the analogy to hold, it would be like emissions rising on a Moore’s Law (or faster) trajectory, and the threshold for runaway climate change reducing each year (cf. algorithm improvements / hardware overhang), to the point where even a single startup company’s emissions (OpenAI; X.AI) could cause the end of the world.

Re ARC Evals, on the flip side, they aren’t factoring in humans doing things that make things worse: ChatGPT plugins, AutoGPT, BabyAGI, ChaosGPT etc. all show that this is highly likely to happen!
We may never get a Fire Alarm of sufficient intensity to jolt everyone into high gear. But I think GPT-4 is it for me and many others. I think this is a Risk Aware Moment (Ram).
What kind of a breakthrough are you envisaging? How do we get from here to 100% watertight alignment of an arbitrarily capable AGI?
Scalable alignment is the biggest lever we have for aligning a smarter intelligence.

Now, Pretraining from Human Feedback showed that, at least for one of the subproblems of alignment (outer alignment), the AI becomes more aligned as it gets more data.
If this generalizes, it’s huge news, as it implies we can at least align an AI’s goals with human goals as we get more data. This matters because it means that scalable alignment isn’t as doomed as we thought.
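For readers who haven’t seen the paper: as I understand it, the core method is conditional training, where pretraining documents are scored by a reward model and tagged with control tokens (something like <|good|> / <|bad|>), and generation is then conditioned on the “good” token. The sketch below is my own simplification, with dummy stand-ins for the reward model and the training step, not the authors’ code.

```python
# Minimal, illustrative sketch of conditional pretraining ("Pretraining from Human Feedback"-style).
GOOD, BAD = "<|good|>", "<|bad|>"

def reward_model(doc: str) -> float:
    """Dummy preference score; the real thing is a learned model (e.g. for toxicity or helpfulness)."""
    return -1.0 if "rant" in doc else 1.0

def tag_document(doc: str, threshold: float = 0.0) -> str:
    """Prepend a control token based on the document's preference score."""
    return (GOOD if reward_model(doc) >= threshold else BAD) + doc

def pretrain(corpus):
    for doc in corpus:
        tagged = tag_document(doc)
        # A standard next-token-prediction update on `tagged` would go here;
        # the only change from ordinary pretraining is the prepended control token.
        print("training on:", tagged)

pretrain(["a helpful explanation", "an abusive rant"])
# At inference time, generation is conditioned on the desirable behaviour, e.g. GOOD + prompt.
```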
Re ARC Evals, on the flip side, they aren’t factoring in humans doing things that make things worse: ChatGPT plugins, AutoGPT, BabyAGI, ChaosGPT etc. all show that this is highly likely to happen!
The point is that the ARC evals will mostly be an upper bound, not a lower bound, given that they generally make very optimistic assumptions for the AI under testing. Maybe they’re right, but the key here is that they are closer to a maximum for an AI’s capabilities than to a minimum, which means the most likely bet is on reduced impact; still, there’s a possibility that the ARC evals are close to what happens in real life.

It’s possible for an early takeoff of AI to happen in 2 years; I just don’t consider that possibility very likely right now.
Generalizing is one thing, but how can scalable alignment ever be watertight? Have you seen all the GPT-4 jailbreaks!? How can every single one be patched using this paradigm? There needs to be an ever-decreasing number of possible failure modes, as the power level increases, down to the limit of 0 failure modes for a superintelligent AI. I don’t see how scalable alignment can possibly work that well.
OpenAI says in their GPT-4 release announcement that “GPT-4 responds to sensitive requests (e.g., medical advice and self-harm) in accordance with our policies 29% more often.” A 29% improvement. This is the opposite of reassuring when thinking about x-risk.
(And all this is not even addressing inner alignment!)