Thanks. Regarding the conversations from 2019, I think we are in a different world now (post GPT-4 + AutoGPT/plugins). [Paul Christiano] “Perhaps there’s no problem at all” - saying this really doesn’t help! I want to know why that might be the case! “concerted effort by longtermists could reduce it” - this seems less likely now, given shorter timelines. “finding out that the problem is impossible can help; it makes it more likely that we can all coordinate to not build dangerous AI systems” - this could be a way out, but again, there is little time. We need a Pause first to have time to firmly establish impossibility. However, “coordinate to not build dangerous AI systems” is not part of p(non-doom|AGI) [I’m interested in why people think there won’t be doom, given we get AGI]. So far, Paul’s section does basically nothing to update me on p(doom|AGI).
[Rohin Shah] “A likely crux is that I think that the ML community will actually solve the problems, as opposed to applying a bandaid fix that doesn’t scale.” - yes, this is a crux for me. How do the fixes scale, with 0 failure modes in the limit of superintelligence? You mention interpretability above this as a basis for scalable AI-assisted alignment, but progress in interpretability remains far behind the scaling of the models, so it doesn’t hold much hope imo. “I’m also less worried about race dynamics increasing accident risk”; “the Nash equilibrium is for all agents to be cautious” - I think this has been blown out of the water by the rush to connect GPT-4 to the internet and spread it far and wide as quickly as possible. As I said, we’re in a different world now. “If I condition on discontinuous takeoff… I… get a lot more worried about AI risk” - this also seems cruxy (and I guess we’ve discussed it a bit above). What do you think the likelihood is of a model trained with 100x more compute (affordable by Microsoft or Google) being able to do AI Research Engineering as well as the median AI Research Engineer? To me it seems pretty high (given scaling so far). Imagine a million of them then working for a million years of subjective time within, say, the year 2025: a fast take-off seems pretty likely. If 100x GPT-4 compute isn’t enough, what about 1000x (affordable by a major state)? “most of my optimism comes from the more outside view type considerations: that we’ll get warning signs that the ML community won’t ignore” - well, I think we are getting warning signs now, and, whilst not ignoring them, the ML community is not taking them anywhere near seriously enough! We need to Pause. Now. “and that the AI risk arguments are not watertight.” - sure, but that doesn’t mean we’re fine by default! (Imo alignment needs to be watertight for us to say we’re fine by default.) At least in your (Rohin’s) conversation from 2019, there are cruxes.
I’m coming down on the side of doom on them, though, in our current world of 2023.
[Robin Hanson] “The current AI boom looks similar to previous AI booms, which didn’t amount to much in the past.” - GPT-4 is good evidence against this. “intelligence is actually a bunch of not-very-general tools that together let us do many things” - multimodal models are good evidence against this. Foundation transformer models seem to be highly general. “human uniqueness...it’s our ability to process culture (communicating via language, learning from others, etc).” - again, GPT-4 can basically do this. “principal-agent problems tend to be bounded” - this seems a priori unlikely to apply with superhuman AI, and you (Rohin) yourself say you disagree with this (and people are complaining that they can’t find the literature Robin claims backs this up). “Effort is much more effective and useful once the problem becomes clear, or once you are working with a concrete design; we have neither of these right now” - what about now? Maybe after the release of Google DeepMind’s next big multimodal model this will be clear. I don’t find Robin’s reasons for optimism convincing (and I’ll also note that I find his vision of the future, Age of Em, horrifying, so his default “we’ll be fine” is actually also a nightmare). [Rohin’s opinion] “once AI capabilities on these factors [ability to process culture] reach approximately human level, we will ‘suddenly’ start to see AIs beating humans on many tasks, resulting in a ‘lumpy’ increase on the metric of ‘number of tasks on which AI is superhuman’” - would you agree that this is happening with GPT-4?
[Adam Gleave] “as we get closer to AGI we’ll have many more powerful AI techniques that we can leverage for safety” - again, this seems to suffer from the problem of grounding them in having a reliable AI in the first place (as Eliezer says, “getting the AI to do your Alignment homework” isn’t a good strategy). “expect that AI researchers will eventually solve safety problems; they don’t right now because it seems premature to work on those problems” - certainly not premature now. But are we anywhere near on track to solving them in time? “would be more worried if there were more arms race dynamics, or more empirical evidence or solid theoretical arguments in support of speculative concerns like inner optimizers.” - well, we’ve got both now. “10-20% likely that AGI comes only from small variations of current techniques” - this seems much higher to me now, with GPT-4 and multimodal models on the way. “would see this as more likely if we hit additional milestones by investing more compute and data” - well, we have. Overall, Adam’s 2019 conversation has done nothing to allay my 2023 doom concerns. Based on what is said, I’m guessing that Adam himself has probably updated in the direction of doom.
Reading Paul’s more detailed disagreements with Eliezer from last year doesn’t really update me on doom either, given that he agrees with more than enough of Eliezer’s lethalities (i.e. plenty to make the case for high p(doom|AGI)). The same applies to the DeepMind alignment team’s response.
All the doom arguments I know of seem to me like they establish plausibility, not near-certainty, though I’m not going to defend that here.
I think I can easily just reverse this (i.e. it does depend on whether you frame the question as “do we die?” or “do we live?”, and you are doing the latter here). Although, to be fair, I’d use “possible” rather than “plausible”: all the “we’ll be fine” arguments I know of seem to me like they establish possibility, not near-certainty.
Overall, none of this has helped in reducing my p(doom|AGI); it’s not even really touching the sides, so to speak. Do you (or anyone else) have anything better? Note that I have also asked this question here.
Would appreciate it if the agreement downvoters could link to what they think are the best (pref detailed) explanations for why we should expect the default of no doom, given AGI. I want to be less doomy.