Thanks, Locke, this is a series of great points. In particular, the point about even fewer people (~25) doing applied policy work is super important, to the extent that I think I should edit the post to significantly weaken certain claims. Likewise, the points about the relative usefulness of spending time learning technical stuff are well taken, though I think I put more value on technical understanding than you do; for example, while of course policy professionals can ask people they trust, they have to somehow be able to assess the judgment of these people on the object-level thing.
Also, while I think the idea of technical people being in short supply and high demand in policy is generally overrated, that seems like it could be an important consideration. Relatedly, it seems maybe easier to do costly fit-tests (like taking a first full-time job) in technical research and switch to policy than vice versa.
Edit: for the final point about risk models, I definitely don’t have state funding for safety research in mind; what I mean is that since I think it’s very unlikely that policy permanently stops AGI from being developed, success ultimately depends on the alignment problem being solved. I think there are many things governments and private decision-makers can do to improve the chances this happens before AGI, which is why I’m still planning on pursuing a governance career!
In particular, the point about even fewer people (~25) doing applied policy work is super important, to the extent that I think I should edit the post to significantly weaken certain claims.
I appreciate you taking this seriously! I do want to emphasize I’m not very confident in the ~25 number, and I think people with more expansive definitions of “policy” would reach higher numbers (e.g. I wouldn’t count people at FHI as doing “policy” work even if they do non-technical work, but my sense is that many EAs lump together all non-technical work under headings such as “governance”/”strategy” and implicitly treat this as synonymous with “policy”). To the extent that it feels crux-y to someone whether the true “policy” number is closer to 25 or 50 or 75, it might be worth doing a more thorough inventory. (I would be highly skeptical of any list that claims it’s >75, if you limit it to people who do government policy-related and reasonably high-quality work, but I could be wrong.)
while I think the idea of technical people being in short supply and high demand in policy is generally overrated, that seems like it could be an important consideration
I certainly agree that sometimes technical credentials can provide a boost to policy careers. However, that typically involves formal technical credentials (e.g. a CS graduate degree), and "three months of AISTR self-study" won't be of much use as career capital (and may even be a negative if it requires you to have a strange-looking gap on your CV or an affiliation with a weird-looking organization). A technical job taken for fit-testing purposes at a mainstream-looking organization could indeed, in some scenarios, help open doors to, or provide a boost for, some types of policy jobs. But I don't think that effect is true (or large) often enough for this to really reduce my concerns about opportunity costs for most individuals.
I definitely don’t have state funding for safety research in mind; what I mean is that since I think it’s very unlikely that policy permanently stops AGI from being developed, success ultimately depends on the alignment problem being solved.
This question is worth a longer investigation at some point, but I don’t see many risk scenarios where a technical solution to the AI alignment problem is sufficient to solve AGI-related risk. For accident-related risk models (in the sense of this framework) solving safety problems is necessary. But even when technical solutions are available, you still need all relevant actors to adopt those solutions, and we know from the history of nuclear safety that the gap between availability and adoption can be big — in that case decades. In other words, even if technical AI alignment researchers somehow “solve” the alignment problem, government action may still be necessary to ensure adoption (whether government-affiliated labs or private sector actors are the developers). In which case one could flip your argument: since it’s very unlikely AGI-related risks will be addressed solely through technical means, success ultimately depends on having thoughtful people in government working on these problems. In reality, for safety-related risks, both technical and policy solutions are necessary.
And for security-related risk models, i.e. bad actors using powerful AI systems to cause deliberate harm (potentially up to existential risks in certain capability scenarios), technical alignment research is neither a necessary nor a sufficient part of the solution, but policy is at least necessary. (In this scenario, other kinds of technical research may be more important, but my understanding is that most “safety” researchers are not focused on those sorts of problems.)
Edited the post substantially (and, hopefully, transparently, via strikethrough and “edit:” and such) to reflect the parts of this and the previous comment that I agree with.
Regarding this:
I don’t see many risk scenarios where a technical solution to the AI alignment problem is sufficient to solve AGI-related risk. For accident-related risk models (in the sense of this framework) solving safety problems is necessary. But even when technical solutions are available, you still need all relevant actors to adopt those solutions, and we know from the history of nuclear safety that the gap between availability and adoption can be big — in that case decades. In other words, even if technical AI alignment researchers somehow “solve” the alignment problem, government action may still be necessary to ensure adoption (whether government-affiliated labs or private sector actors are the developers).
I’ve heard this elsewhere, including at an event for early-career longtermists interested in policy where a very policy-skeptical, MIRI-type figure was giving a Q&A. A student asked: if we solved the alignment problem, wouldn’t we need to enforce its adoption? The MIRI-type figure said something along the lines of:
“Solving the alignment problem” probably means figuring out how to build an aligned AGI. The top labs all want to build an aligned AGI; they just think the odds of the AGIs they’re working on being aligned are much higher than I think they are. But if we have a solution, we can just go to the labs and say, here, this is how you build it in a way that we don’t all die, and I can prove that this makes us not all die. And if you can’t say that, you don’t actually have a solution. And they’re mostly reasonable people who want to build AGI and make a ton of money and not die, so they will take the solution and say, thanks, we’ll do it this way now.
So, was MIRI-type right? Or would we need policy levers to enforce adoption of the solution, even in this model? The post you cite chronicles how long it took for safety advocates to address glaring risks in the nuclear missile system. My initial model says that if the top labs resemble today's OpenAI and DeepMind, it would be much easier to convince them than the entrenched, securitized bureaucracy described in the post: the incentives are much better aligned, and the cultures are more receptive to suggestions of change. But this does seem like a cruxy question. If MIRI-type is wrong, this would justify a lot of investigation into what those levers would be, and how to prepare governments to develop and pull these levers. If not, this would support more focus on buying time in the first place, as well as on trying to make sure the top firms at the time the alignment problem is solved are receptive.
(E.g., if the MIRI-type model is right, a US lead over China seems really important: if we expect that the solution will come from alignment researchers in Berkeley, maybe it’s more likely that they implement it if they are private, “open”-tech-culture companies, who speak the same language and live in the same milieu and broadly have trusting relationships with the proponent, etc. Or maybe not!)
I put some credence on the MIRI-type view, but we can't simply assume that is how it will go down. What if AGI gets developed in the context of an active international crisis or conflict? Couldn't a government (the US, China, the UK, etc.) come in, take over the tech, and race to get there first? To the extent that there is some "performance" penalty to a safety implementation, or that implementing safety measures takes time that an opponent could use to deploy first, there will be contexts where not all safety measures are adopted automatically. You could imagine similar dynamics, though less extreme, in an inter-company or inter-lab race, where (depending on the perceived stakes) a government might need to step in to prevent premature deployment-for-profit.
The MIRI-type view bakes in a bunch of assumptions about several dimensions of the strategic situation, including: (1) it's going to be clear to everyone that the AGI system will kill everyone without the safety solution, (2) the safety solution is trusted by everyone and not seen as a potential act of sabotage by an outside actor with its own interests, and (3) the external context will allow for reasoned and lengthy conversation about these sorts of decisions. This view makes sense within one scenario in terms of the actors involved, their intentions and perceptions, the broader context, the nature of the tech, etc. It's not an impossible scenario, but betting all your chips on it in terms of where the community focuses its effort (I've similarly witnessed some MIRI staff's "policy-skepticism") strikes me as naive and irresponsible.
You can essentially think of it as two separate problems:
Problem 1: Conditional on us having a technical solution to AI alignment, how do we ensure the first AGI built implements it?
Problem 2: Conditional on us having a technical solution to AI alignment, how do we ensure no AGI is ever built that does NOT implement it, or some other equivalent solution?
I feel like you are talking about Problem 1, and Locke is talking about Problem 2. I agree with the MIRI-type that Problem 1 is easy to solve, and the hard part of that problem is having the solution. I do believe the existing labs working on AGI would implement a solution to AI alignment if we had one. That still leaves Problem 2 that needs to be solved—though at least if we’re facing Problem 2, we do have an aligned AGI to help with the problem.
Hmm. I don’t have strong views on unipolar vs multipolar outcomes, but I think MIRI-type thinks Problem 2 is also easy to solve, due to the last couple clauses of your comment.
Thanks!
Agreed; it strikes me that I've probably been over-anchoring on this model.