The problem with general arguments is that they tell you very little about how to solve the problem
Agreed!
If I were producing key EA content/fellowships/etc, I would be primarily interested in getting people to solve the problem
I think this is true for some kinds of content/fellowships/etc, but not all. For those targeted at people who aren’t already convinced that AI safety/governance should be prioritised (which is probably the majority), it seems more important to present them with the strongest arguments for caring about AI safety/governance in the first place. This suggests presenting more general arguments.
Then, I agree that you want to get people to help solve the problem, which requires talking about specific failure modes. But I think that doing this prematurely can lead people to dismiss the case for shaping AI development for bad reasons.
Another way of saying this: for AI-related EA content/fellowships/etc, it seems worth separating motivation (“why should I care?”) and action (“if I do care, what should I do?”). This would get you the best of both worlds: people are presented with the strongest arguments, allowing them to make an informed decision about how much AI stuff should be prioritised, and then also the chance to start to explore specific ways to solve the problem.
I think this maybe applies to longtermism in general. We don’t yet have that many great ideas of what to do if longtermism is true, and I think that people sometimes (incorrectly) dismiss longtermism for this reason.
it seems worth separating motivation (“why should I care?”) and action (“if I do care, what should I do?”)
Imagine Alice, an existing AI safety researcher, having such a conversation with Bob, who doesn’t currently care about AI safety:
Alice: AGI is decently likely to be built in the next century, and if it is it will have a huge impact on the world, so it’s really important to deal with it now.
Bob: Huh, okay. It does seem like it’s pretty important to make sure that AGI doesn’t discriminate against people of color. And we better make sure that AGI isn’t used in the military, or all nations will be forced to do so thanks to Moloch, and wars will be way more devastating.
Alice: Great, I’m glad you agree.
Bob: Okay, so what can we do to shape AGI?
Alice: Well, we should ensure that AGIs don’t pursue goals that weren’t the ones we intended, so you could work on learning from human feedback.
Bob: <works on those topics>
I feel like something has gone wrong in this conversation; you have tricked Bob into working on learning from human feedback, rather than convincing him to do so. Based on how you convinced him to care, he should really be (say) advocating against the use of AI in the military.
(This feels very similar to a motte and bailey, where the motte is “AI will be a huge deal so we should influence it” and the bailey is “you should be working on learning from human feedback”.)
I think it’s more accurate to say that you’ve answered “why should I care about X?” and “if I do care about Y, what should I do?”, without noticing that X and Y are actually different.
I feel like something has gone wrong in this conversation; you have tricked Bob into working on learning from human feedback, rather than convincing him to do so.
(Apologies for my very slow reply.)
I agree with this. If people become convinced to work on AI stuff by specific argument X, then they should definitely go and try to fix X, not something else (e.g. what other people tell them needs doing in AI safety/governance).
I think when I said I wanted a more general argument to be the “default”, I meant something very general that doesn’t clearly imply any particular intervention, like the one in the most important century series, or the “AI is a big deal” argument (I especially like Max Daniel’s version of this).
Then it’s very important to think clearly about what will actually go wrong, and how to actually fix that. But I think it’s fine to do this once you’re already convinced, by some general argument, that you should work on AI.
I’d be really curious if you still disagree with this?
I agree with that, and that’s what I meant by this statement above: