I applaud the decision to take a big swing, but I think the reasoning is unsound and probably leads to worse worlds.
I think there are actions that look like “making AI go well” but that are actually worse than doing nothing at all, because things like “keeping humans in control of AI” can very easily lead to something like value lock-in, or at least leave it in the hands of immoral stewards. It’s plausible that if ASI is developed and remains controlled by humans, hundreds of trillions of animals will suffer, because humans will still want to eat meat from animals. I think it’s far from clear that factors like faster alternative protein development outweigh or outpace this risk; it’s plausible humans will always want animal meat over identical cultured meat, for similar reasons to why some prefer human-created art over AI-created art.
If society had positive valence, I think redirecting more resources to AI and minimising x-risk would be worth it: the “neutral” outcome is plausibly that things just scale up to galactic scales, which seems OK or good, and “doom” is worse than that. However, I think that when farmed animals are considered, civilisation’s valence is probably significantly negative. If the “neutral” scale-up occurs, astronomical suffering seems plausible. That seems worse than “doom”.
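To make the shape of that argument concrete, here’s a toy expected-value comparison with entirely made-up numbers (the valence and scale figures are illustrative assumptions, not estimates): if the present is net negative, a “neutral” scale-up multiplies a negative quantity, so it comes out below “doom”, which I’m treating as roughly zero future value.

```python
# Toy expected-value comparison with made-up, illustrative numbers only.
current_net_valence = -1.0   # assumption: net negative once farmed animals are counted
scale_up_factor = 1e6        # assumption: the "neutral" outcome hugely scales up the status quo

ev_doom = 0.0                                         # extinction: no future value, good or bad
ev_scale_up = current_net_valence * scale_up_factor   # a scaled-up copy of a net-negative present

print(f"EV(doom)     = {ev_doom}")
print(f"EV(scale-up) = {ev_scale_up}")
print("Scale-up worse than doom?", ev_scale_up < ev_doom)
```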
Meanwhile, in worlds where ASI isn’t achieved soon, or is achieved but doesn’t lead to explosive economic growth or other transformative outcomes, redirecting people towards AI and away from other cause areas probably isn’t very good.
Promoting a wider portfolio of career paths/cause areas seems more sensible, and more beneficial to the world.
One reason we use phrases like “making AGI go well”, rather than some alternatives, is that 80k is concerned about risks like lock-in of really harmful values, in addition to human disempowerment and extinction risk, so I sympathise with your worries here.
Figuring out how to avoid these kinds of risks is really important, and recognising that they might arise soon is definitely within the scope of our new strategy. We have written about ways the future can look very bad even if humans have control of AI, for example here, here, and here.
I think it’s reasonable to worry that not enough is being done about these kinds of concerns. That depends a lot on how likely they are and how tractable the solutions are, which I don’t have very settled views on.
You might also think that there’s nothing tractable to do about these risks, so it’s better to focus on interventions that pay off in the short term. But my view, at least, is that it’s worth putting more effort into figuring out what the solutions here might be.
Thanks Cody. I appreciate the thoughtfulness of the replies given by you and others. I’m not sure if you were expecting the community response to be as it is.
My expressed thoughts were a bit muddled. I have a few reasons why I think 80k’s change is not good. It’s unclear how AI will develop further, and multiple worlds seem plausible; some of my reasons apply to some worlds and not others, and that inconsistent overlap is perhaps what led to the lack of clarity. Here’s a more general description of the failure mode I was trying to point to.
I think that in cases where AGI does lead to explosive outcomes soon, it’s suddenly very unclear what is best, or even good. It’s something like a wicked problem, with lots of unexpected second-order effects and so on. I don’t think we have a good track record of thinking about this problem in a way that leads to solutions even at the level of first-order effects, as Geoffrey Miller highlighted earlier in the thread. In most of these worlds, what I expect will happen is something like this:
1) Thinkers and leaders in the movement have genuinely interesting ideas and insights about what AGI could imply at an abstract or cosmic level.
2) Other leaders start working out what this actually implies individuals and organisations should do. This doesn’t work, though, because we don’t know what we’re doing. Due to unknown unknowns, the most important things are missed, and because of the massive level of detail in reality, the things that are suggested are significantly wrong at load-bearing points. There are also suggestions in the spirit of “we’re not sure which of these directly opposing views, X or Y, is correct, and encourage careful consideration”, because it is genuinely hard.
3) People looking for career advice or organisational direction try to think carefully about things, but in the end most just use that advice to rationalise a messy choice between X and Y that they actually make based on factors like convenience, cost, and reputational risk.
I think the impact of most actions here is basically chaotic. There are some things that are probably good, like trying to ensure AGI isn’t controlled by a single individual. I also think “make the world better in meaningful ways in our usual cause areas before AGI is here” probably helps in many worlds, for reasons like AI maybe trying to copy our values, AI perhaps ending up controlled by the UN or similar (in which case it’s good to get as much moral progress in place beforehand as possible), or simply increasing the amount of morally aligned training data being used.
There are worlds where AGI doesn’t take off soon. I think that more serious consideration of the Existential Risk Persuasion Tournament leads one to conclude that wildly transformational outcomes just aren’t that likely in the short/medium term. I’m aware the XPT doesn’t ask about that specifically, but it seems like one of the better data points we have. I worry that focusing on things like expected value leads to some kind of Pascal’s mugging, which is a shame because the counterfactual—refusing to be mugged—is still good in this case.
I still think AI is an issue worth considering seriously, dedicating many resources to addressing, and so on. But I think the significant de-emphasis on other cause areas is not good. Depending on how long 80k keeps the change in place, it also plausibly leads to new people not entering other cause areas in significant numbers for quite some time, which is probably bad in movement-building ways that are greater than the sum of their parts (fewer people lead to feelings of defeat and stagnation, and fewer new people mean that better, newer ideas can’t take over).
I hope 80k reverse this change after the first year or two. I hope that, if they don’t, it’s worth it.
Thanks for the additional context! I think I understand your views better now and I appreciate your feedback.
Just speaking for myself here, I think I can identify some key cruxes between us. I’ll take them one by one:
I think the impact of most actions here is basically chaotic.
I disagree with this. I think it’s better if people have a better understanding of the key issues raised by the emergence of AGI. We don’t have all the answers, but we’ve thought about these issues a lot and have ideas about what kinds of problems are most pressing to address and what some potential solutions are. Communicating these ideas more broadly and to people who may be able to help is just better in expectation than failing to do so (all else equal), even though, as with any problem, you can’t be sure you’re making things better, and there’s some chance you make things worse.
I also think “make the world better in meaningful ways in our usual cause areas before AGI is here” probably helps in many worlds, for reasons like AI maybe trying to copy our values, AI perhaps ending up controlled by the UN or similar (in which case it’s good to get as much moral progress in place beforehand as possible), or simply increasing the amount of morally aligned training data being used.
I don’t think I agree with this. I think the value of doing work in areas like global health or helping animals lies largely in the direct impact of those actions, rather than in any effect they have on how the arrival of AGI goes. Even if, in an overwhelming success, we cut malaria deaths in half next year, I don’t think that would meaningfully increase the likelihood that AGI is aligned or that the training data reflects a better morality. It’s more likely that working directly to create beneficial AI will have those effects. Of course, the case for saving lives from malaria is still strong, because people’s lives matter and are worth saving.
I think that more serious consideration of the Existential Risk Persuasion Tournament leads one to conclude that wildly transformational outcomes just aren’t that likely in the short/medium term.
Recall that the XPT is from 2022, so a lot has happened since. Even so, here’s what Ezra Karger noted about the expectations of experts and forecasters when we interviewed him on the 80k podcast:
One of the pieces of this work that I found most interesting is that even though domain experts and superforecasters disagreed strongly, I would argue, about AI-caused risks, they both believed that AI progress would continue very quickly.
So we did ask superforecasters and domain experts when we would have an advanced AI system, according to a definition that relied on a long list of capabilities. And the domain experts gave a year of 2046, and the superforecasters gave a year of 2060.
My understanding is that the XPT used the definition of AGI from the Metaculus question cited in Niel’s original post (though see his comment for some caveats about the definition). In March 2022, that forecast was around 2056–2058; it’s now at 2030. The Metaculus question also has over 1,500 forecasters, whereas the XPT had around 30 superforecasters, I believe. So overall I wouldn’t consider the XPT strong evidence against short timelines.
I think there is some general “outside view” reason to be sceptical of short timelines. But I think there are good reasons to expect that kind of perspective to miss big changes like this, and there is enough reason to believe short timelines are plausible to take action on that basis.

Again, thanks for engaging with all this!
Thanks, I think you’ve done a decent job of identifying cruxes, and I appreciate the additional info too. Your comment about the XPT being from 2022 does update me somewhat.
One thing I’ll highlight and will be thinking about: there’s some tension between these two positions:
a) “recent AI developments are very surprising, so we should update our p(doom) to be significantly higher than the superforecasters’ 2022 estimates”, and
b) “in 2022, superforecasters thought AI progress would continue very quickly beyond then-current levels”.
This is potentially partially resolved by the statement:
c) “superforecasters thought AI progress would be fast, but it’s actually very fast, so we are right to update to be significantly higher”.
This is a sensible take, and it’s supported by things like the Metaculus forecast you cite. However, I think that if they already expected progress to be fast, and yet still assigned only a small chance of extinction in 2022, then recent developments would lead them to give a higher probability, but not a significantly higher one. The exact amount it has changed, and what counts as “significantly higher” versus marginally higher, has unfortunately been left as an exercise for the reader, and extinction isn’t the only risk, so I think I do understand your position.
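To illustrate that last point, here’s a toy calculation with purely hypothetical numbers (neither figure is anyone’s actual forecast): a small 2022 probability, even after a sizeable multiplicative update, stays small in absolute terms.

```python
# Hypothetical illustration of "higher, but not significantly higher".
p_doom_2022 = 0.005     # hypothetical small extinction probability from 2022
update_factor = 3.0     # hypothetical update after "fast" turned out to be "very fast"

p_doom_now = min(1.0, p_doom_2022 * update_factor)

print(f"2022 probability:    {p_doom_2022:.3f}")   # 0.005
print(f"Updated probability: {p_doom_now:.3f}")    # 0.015: higher, but still small in absolute terms
```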