A lot of longtermist effort is going into AI safety at the moment. I think it’s hard to make the case that something in AI safety has legibly or concretely reduced AI risk, since (a) the field is still considered quite pre-paradigmatic, (b) the risk comes from systems that are more powerful than the ones we currently have, and (c) even in less speculative fields, research often takes several years before it is shown to legibly help anyone.
But with those caveats in mind, I think:
The community has made some progress in understanding possible risks and threats from advanced AI systems (see DeepMind’s review of alignment threat models).
Interpretability research seems relatively legible. The basic case “we’re building powerful models and it would be valuable to understand how they work” makes intuitive sense. There are also several more nuanced ways interpretability research could be helpful (see Neel’s longlist of theories for impact).
The fact that most of the major AGI labs have technical safety teams and governance teams seems quite concrete/legible. I’m not sure how much credit should go to the longtermist communities, but I think several of these teams have been inspired/influenced by ideas in the AI safety community. (To be fair, this might just be a case of “get lots of people to think seriously about reducing x-risk”, but I think it’s a bit more tangible/concrete.)
In AI governance, the structured access approach seems pretty common among major AGI labs (again, it’s a bit unclear how much credit should go to longtermists, but my guess is a non-negligible amount).
In AI governance, some work on reducing misuse risks and recognizing the dual-use nature of AI technologies seems somewhat legible. A lot of the people who did this research are now working at major AGI labs, and it seems plausible that they’re implementing some of the best practices they suggested (which would be especially legible, though I’m not aware of specific examples; that might be because labs keep a lot of this work confidential).
I think it’s also easy to make a case that longtermist efforts have increased existential risk from artificial intelligence, since the money and talent that grew some of the biggest hype machines in AI (DeepMind, OpenAI) came from longtermist places.
It’s possible that EA has shaved a couple of counterfactual years off the time to catastrophic AGI, compared to a world where the community wasn’t working on it.
Can you say more about which longtermist efforts you’re referring to?
I think a case can be made, but I don’t think it’s an easy (or clear) case.
My current impression is that Yudkowsky’s and Bostrom’s writings about AGI inspired the creation of OpenAI and DeepMind. And I believe FTX invested a lot in Anthropic, and OP invested a little (in relative terms) in OpenAI. Since then, EAs have made both capabilities advances and safety advances, and I don’t think it’s particularly clear which outweighs the other.
It seems unclear to me what the sign of these effects is. Like, maybe no one thinks about AGI for decades. Or maybe 3-5 years after Yudkowsky starts thinking about AGI, someone else much less safety-concerned starts thinking about AGI, and we get a world with AGI labs that are far less concerned about safety than the status quo.
I’m not advocating for this position, but I’m using it to illustrate how the case seems far-from-easy.
Is most of the AI capabilities work here causally downstream of Superintelligence, even if Superintelligence may have been (heavily?) influenced by Yudkowsky? Both Musk and Altman recommended Superintelligence, although Altman has also directly said that Yudkowsky has accelerated timelines the most:
https://twitter.com/elonmusk/status/495759307346952192?lang=en
https://blog.samaltman.com/machine-intelligence-part-1
https://twitter.com/sama/status/1621621724507938816
If things had stayed within the LW/Rat/EA community, that might have been best. If Yudkowsky hadn’t written about AI, there might not be much of an AI safety community at all now (it might just be MIRI quietly hacking away at it, and most of MIRI seems to have given up now), and doom would be more likely, just later. Someone had to write about AI safety publicly to build the community, but writing and promoting a popular book on the topic is much riskier, because it brings the subject to the attention of uncareful people, including entrepreneurial types.
I guess they might have tried to keep the public writing limited to academia, but the AI community has been pretty dismissive of AI safety, so it might have been too hard to build the community that way.
Did Superintelligence have a dramatic effect on people like Elon Musk? I can imagine Elon getting involved without it. That involvement might have been even more harmful (e.g. starting an AGI lab with zero safety concerns).
Here’s one notable quote about Elon (source), who started college over 20 years before Superintelligence:

In college, he thought about what he wanted to do with his life, using as his starting point the question, “What will most affect the future of humanity?” The answer he came up with was a list of five things: “the internet; sustainable energy; space exploration, in particular the permanent extension of life beyond Earth; artificial intelligence; and reprogramming the human genetic code.”
Overall, causality is multifactorial and tricky to analyze, so concepts like “causally downstream” can be misleading.
(Nonetheless, I do think it’s plausible that publishing Superintelligence was a bad idea, at least in 2014.)
Thanks for these!
I think my general feeling on these is that it’s hard for me to tell whether they actually reduced existential risk. Maybe this is just because I don’t understand the mechanisms for a global catastrophe from AI well enough. (Because of this, the link to Neel’s longlist of theories for impact was helpful, so thank you for that!)
For example, my impression is that some people with relevant knowledge think that technical safety work currently can’t achieve very much.
(Hopefully this response isn’t too annoying; I could put in the work to better understand the mechanisms for a global catastrophe from AI, and maybe I’ll get around to that someday.)