Thanks for the reply! I think the intuitive core that I was arguing for is more-or-less just a more detailed version of what you say here:
“If we create AI systems that are, broadly, more powerful than we are, and their goals diverge from ours, this would be bad—because we couldn’t stop them from doing things we don’t want. And it might be hard to ensure, as we’re developing increasingly sophisticated AI systems, that there aren’t actually subtle but extremely important divergences in some of these systems’ goals.”
The key difference is that I don’t think the orthogonality thesis, instrumental convergence, or progress eventually becoming fast are wrong—you just need extra assumptions on top of them to get to the expectation that AI will cause a catastrophe.
My point in this comment (and follow-up) was that the Orthogonality Thesis, Instrumental Convergence, and eventual fast progress are essential for any argument about AI risk, even if you also need other assumptions alongside them—you need to know that the OT will apply to your method of developing AI, and you need more specific reasons to think that the particular goals of your system look like those that lead to instrumental convergence.
If you approach the classic arguments with that framing, then perhaps they begin to look less like a case of being mistaken and more like a case of a vague philosophical picture that then got filled in with more detailed considerations—that’s how I see the development over the last 10 years.
The only mistake was taking the vague initial picture for the whole argument—and that was a mistake, but it’s not the same kind of mistake as having completely false assumptions. You might compare it to the early development of a new scientific field. Perhaps seeing it that way might lead you to a different view about how much to update against trusting complicated conceptual arguments about AI risk!
“AI safety and alignment issues exist today. In the future, we’ll have crazy powerful AI systems with crazy important responsibilities. At least the potential badness of safety and alignment failures should scale up with these systems’ power and responsibility. Maybe it’ll actually be very hard to ensure that we avoid the worst-case failures.”
This is how Stuart Russell likes to talk about the issue, and I have a go at explaining that line of thinking here.
“The key difference is that I don’t think the orthogonality thesis, instrumental convergence, or progress eventually becoming fast are wrong—you just need extra assumptions on top of them to get to the expectation that AI will cause a catastrophe.”
Quick belated follow-up: I just wanted to clarify that I also don’t think the orthogonality thesis or the instrumental convergence thesis is incorrect, as traditionally formulated. I just think they’re not nearly sufficient to establish a high level of risk, even though, historically, many presentations of AI risk seemed to treat them as nearly sufficient. Insofar as there’s a mistake here, it concerns the way conclusions have been drawn from these theses; I don’t think the mistake is in the theses themselves. (I may not stress this enough in the interview/slides.)
On the other hand, progress/growth eventually becoming much faster might be wrong (this is an open question in economics). The ‘classic arguments’ also don’t just predict that growth/progress will become much faster. In the FOOM debate, for example, both Yudkowsky and Hanson start from the position that growth will become much faster; their disagreement is about how sudden, extreme, and localized the increase will be. If growth is actually unlikely to increase in a sudden, extreme, and localized fashion, then this would be a case of the classic arguments containing a “mistaken” (not just insufficient) premise.