I’m going to instead answer the question “What evidence would persuade you that further work on AI safety is low value compared to other things?”
Note that a lot of my beliefs here disagree substantially with those of my coworkers.
I’m going to split the answer into two steps: what situations could we be in such that I’d think we should deprioritize AI safety work, and, for each of those, what could I learn that would persuade me we were in them.
Situations in which AI safety work looks much less valuable:
We’ve already built superintelligence, in which case the problem is moot.
Seems like this would be pretty obvious if it happened.
We have clear plans for how to align AI that work even when it’s superintelligent, and we don’t think that we need to do more work in order to make these plans more competitive or easier for leading AGI projects to adopt.
What would persuade me of this:
I’m not sure what evidence would be required for me to be inside-view persuaded of this. I find it kind of hard to imagine being inside-view persuaded, for the same reason that I find it hard to imagine being persuaded that an operating system is secure.
But I can imagine what it might feel like to hear some “solutions to the alignment problem” which I feel pretty persuaded by.
I can imagine someone explaining a theory of AGI/intelligence/optimization that felt really persuasive, elegant, and easy to understand, and then building alignment within this theory.
When I think about alignment of ML systems, it’s much easier for me to imagine being persuaded that we’d solved outer alignment than that we’d solved inner alignment.
More generally, I feel like it’s hard to know what kinds of knowledge could exist in a field, so it’s hard to know what kind of result could persuade me here, but I think it’s plausible that the result might exist.
If a sufficient set of people whose opinions I respected all thought that alignment was solved, that would convince me to stop working on it. E.g. Eliezer, Paul Christiano, Nate Soares, and Dario Amodei would be sufficient (that list is biased towards people I know; it isn’t my list of the best AI safety people).
Humans no longer have a comparative advantage at doing AI safety work (compared to AI or whole brain emulations or something else).
Seems like this would be pretty obvious if it happened.
For some reason, the world is going to do enough AI alignment research on its own.
Possible reasons:
It turns out that AI alignment is really easy.
It turns out that you naturally end up needing to solve alignment problems as you try to improve AI capabilities, and so all the companies working on AI are going to do all the safety work that they’d need to.
The world is generally more reasonable than I think it is.
AI development is such that before we could build an AGI that would kill everyone, we would have had lots of warning shots where misaligned AI systems did things that were pretty bad but not at the level of a global catastrophic risk (GCR).
What would persuade me of this:
Some combination of developments in the field of AGI and developments in the field of alignment.
It looks like the world is going to be radically transformed somehow before AGI has a chance to radically transform it. Possible contenders here: whole brain emulation, other x-risks, maybe major GCRs which seem like they’ll mess up the structure of the world a lot.
What would persuade me of this:
Arguments that AGI timelines are much longer than I think. A big slowdown in ML would be a strong argument for longer timelines. If I thought there was a <30% chance of AGI within 50 years, I’d probably not be working on AI safety.
Arguments that one of these other things is much more imminent than I think.
I can also imagine being persuaded that AI alignment research is as important as I think but something else is even more important, like maybe s-risks or some kind of AI coordination thing.
Thanks, that’s really interesting! I was especially surprised by “If I thought there was a <30% chance of AGI within 50 years, I’d probably not be working on AI safety.”
Yeah, I think that a lot of EAs working on AI safety feel similarly to me about this.
I expect the world to change pretty radically over the next 100 years, and I probably want to work on the radical change that’s going to matter first. So compared to the average educated American, I have shorter AI timelines but also shorter timelines to the world becoming radically different for other reasons.
“If I thought there was a <30% chance of AGI within 50 years, I’d probably not be working on AI safety.”
“I expect the world to change pretty radically over the next 100 years.”
I find these statements surprising, and would be keen to hear more about this from you. I suppose that the latter goes a long way towards explaining the former. Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn’t heavily involved.
(For reference, while I assign more than 30% credence to AGI within 50 years, it’s not that much more.)
“I suppose that the latter goes a long way towards explaining the former.”
Yeah, I suspect you’re right.
“Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn’t heavily involved.”
I think there are a couple more radically transformative technologies that are reasonably likely over the next hundred years, e.g. whole brain emulation. And I suspect we disagree about the expected pace of change with bioengineering and maybe nanotech.
“I can also imagine being persuaded that AI alignment research is as important as I think but something else is even more important, like maybe s-risks or some kind of AI coordination thing.”
Huh, my impression was that the most plausible s-risks we can sort-of-specifically foresee are AI alignment problems—do you disagree? Or is this statement referring to s-risks as a class of black swans for which we don’t currently have specific imaginable scenarios, but if those scenarios became more identifiable you would consider working on them instead?
Most of them are related to AI alignment problems, but it’s possible that I should work specifically on them rather than other parts of AI alignment.
An s-risk could occur via a moral failure, which could happen even if we knew how to align our AIs.