I expect that your search for a “unified resource” will be unsatisfying. I think people disagree enough on their threat models/expectations that there is no real “EA perspective”.
Some things you could consider doing:
Have a dialogue with 1-2 key people you disagree with
Pick one perspective (e.g., Paul’s worldview, Eliezer’s worldview) and write about areas you disagree with it.
Write up a “Matthew’s worldview” doc that focuses more on explaining what you expect to happen and isn’t necessarily meant as a “counterargument” piece.
Among the questions you list, I’m most interested in these:
How bad human disempowerment would likely be from a utilitarian perspective
Whether there will be a treacherous turn event, during which AIs violently take over the world after previously having been behaviorally aligned with humans
How likely AIs are to kill every single human if they are unaligned with humans
How society is likely to respond to AI risks, and whether they’ll sleepwalk into a catastrophe
I agree there’s no single unified resource. Having said that, I found Richard Ngo’s “five alignment clusters” pretty helpful for bucketing different groups & arguments together. Reposting below:
MIRI cluster. Think that P(doom) is very high, based on intuitions about instrumental convergence, deceptive alignment, etc. Does work that’s very different from mainstream ML. Central members: Eliezer Yudkowsky, Nate Soares.
Structural risk cluster. Think that doom is more likely than not, but not for the same reasons as the MIRI cluster. Instead, this cluster focuses on systemic risks, multi-agent alignment, selective forces outside gradient descent, etc. Often does work that's fairly continuous with mainstream ML, but is willing to be unusually speculative by the standards of the field. Central members: Dan Hendrycks, David Krueger, Andrew Critch.
Constellation cluster. More optimistic than either of the previous two clusters. Focuses more on risk from power-seeking AI than the structural risk cluster, but does work that is more speculative or conceptually-oriented than mainstream ML. Central members: Paul Christiano, Buck Shlegeris, Holden Karnofsky. (Named after Constellation coworking space.)
Prosaic cluster. Focuses on empirical ML work and the scaling hypothesis, is typically skeptical of theoretical or conceptual arguments. Short timelines in general. Central members: Dario Amodei, Jan Leike, Ilya Sutskever.
Mainstream cluster. Alignment researchers who are closest to mainstream ML. Focuses much less on backchaining from specific threat models and more on promoting robustly valuable research. Typically more concerned about misuse than misalignment, although worried about both. Central members: Scott Aaronson, David Bau.
To return to the question “what is the current best single article (or set of articles) that provide a well-reasoned and comprehensive case for believing that there is a substantial (>10%) probability of an AI catastrophe this century?”, my guess is that these different groups would respond as follows:[1]
MIRI cluster: List of Lethalities, Sharp Left Turn, Superintelligence
Structural Risk cluster: Natural selection favours AIs, RAAP
Constellation cluster: Is Power-seeking AI an x-risk, some Cold Takes posts, Scheming AIs
Prosaic cluster: Concrete problems in AI safety, [perhaps something more recent?]
Mainstream cluster: Reform AI Alignment, [not sure—perhaps nothing arguing for >10%?]
But I could easily be misrepresenting these different groups' "core" arguments, and I haven't read all of these, so I could be misunderstanding them.
I agree that there is no real “EA perspective”, but it seems like there could be a unified doc that a large cluster of people end up roughly endorsing. E.g., I think that if Joe Carlsmith wrote another version of “Is Power-Seeking AI an Existential Risk?” in the next several years, then it’s plausible that a relevant cluster of people would end up thinking this basically lays out the key arguments and makes the right arguments. (I’m unsure what I currently think about the old version of the doc, but I’m guessing I’ll think it misses some key arguments that now seem more obvious.)