“Firstly, even with a misaligned AGI, its intended objectives would provide some disincentive to act in ways that lead to disastrous consequences for humanity.”
You aren’t factoring in mesa-optimisation or reward hacking in this section. Or the AI being much more powerful than humans (cf. we don’t trade with ants).
“Finally, even if the AGI initially acts in destructive ways, amending its programming to avoid causing unacceptable harm would not be particularly difficult, as long as we are able to appeal to its fundamental objectives.”
In this section it seems like you are assuming that corrigibility is solved, as well as outer alignment. These are major unsolved problems in AI Alignment. I don’t want to sound too mean, but I get the impression that your 300 hours of reading didn’t include the AGI Safety Fundamentals syllabus.
“the AGI would likely have to face global modern militaries and other non-general – though not necessarily less powerful – AI systems.”
I don’t think you’re really thinking of superintelligent AIs here. The speed at which the first one will be operating makes it unlikely that a second could ever catch up to it. And human militaries would pose no threat to a superintelligence, almost by definition (i.e. something more intelligent than all of humanity put together).
How are these established? You can’t train for them, as that would require breaking them (and you can’t do trial and error if you’re dead after the first try!). How would you hard-code them in? (You’re basically solving outer alignment if you can do this!)
Indeed, 4 and 5 are the weakest parts of the AI risk argument; they often seem to be based on an overly magical view of what computation/intelligence can achieve, and they neglect the fact that all intelligences are fallible. There is an over-reliance on making up science fiction scenarios without putting any effort into showing that said scenarios are likely or even possible (see Yudkowsky’s absurd “mixing proteins to make nanobots that kill everyone” scenario).
I’m working on a post elaborating on this in more depth, based on my experience as a computational physicist.
Thanks for your comment, which helps me to zoom in on claims 4 and 5 in my own thinking.
I was thinking of another point on the fallibility of intelligence: specifically, whether intelligence really allows the AGI to fully shape the future to its will. I was thinking along the lines of Laplace’s Demon, which asks: if there were a demon that knew the position of every atom in the universe and the direction in which each one travels, would it be able to predict (and hence shape) the future? I think it is not clear that it would. In fact, Heisenberg’s uncertainty principle suggests that it would not (at least at the quantum level). Similarly, it is not clear that the AGI would be able to do so even if it had complete knowledge of everything.
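To put the quantum point in concrete terms, the uncertainty principle sets a hard lower bound on how precisely a particle’s position and momentum can be known at the same time:

$$\Delta x \, \Delta p \;\ge\; \frac{\hbar}{2}$$

So, at least at the quantum scale, even a demon making the best measurements physics allows cannot pin down the exact initial conditions that perfect prediction would require.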
Happy to comment on your post before/when you publish it!
I encourage you to publish that post. I also feel that the AI safety argument leans too heavily on the DNA sequences → diamondoid nanobots scenario.
Consider entering your post in this competition: https://forum.effectivealtruism.org/posts/W7C5hwq7sjdpTdrQF/announcing-the-future-fund-s-ai-worldview-prize
“Humans have caused the extinction of some species. And chickens are typically disempowered due to the actions of humans.”
This sounds to me like an understatement. Before Homo sapiens, most of the world had the biodiversity of charismatic megafauna we still see today in Africa. 15,000 years ago, North America had mammoths, ground sloths, glyptodonts, giant camels, and a whole bunch of other things. Humans may not have been involved in all of those extinctions, but it is a good guess that we had something to do with many of them. It is even more plausible that we caused the extinction of every other Homo species. There were a few that had been doing reasonably well until we expanded into their areas.
Thanks for the comment! I wonder if you or @Derek Shiller know of any research on the number or proportion of extinctions caused by humans? I’m thinking it would be a useful number to use as a prior!
My impression is that it is very unclear. In the historical record, we see a lot of disappearances of species around when humans first arrived at an area, but it isn’t clear that humans always arrived before the extinctions occurred. Our understanding of human migration timing is imperfect. There were also other factors, such as temperature changes, that may have been sufficient for extinction (or at least significant depopulation). So I think the frequency of human-caused extinction is an open question. We shouldn’t be confident that it was relatively rare.
Has it been for Chai Research, following the suicide of one of their users, which was the direct result of talking to their unaligned AI chatbot?