From a policy perspective, I think some of the claims here are too strong.
This post lays out some good arguments in favor of AISTR work but I don’t think it’s super informative about the comparative value of AISTR v. other work (such as policy), nor does it convince me that spending months on AISTR-type work is relevant even for policy people who have high opportunity costs and many other things they could (/need to) be learning. As Linch commented: “there just aren’t that many people doing longtermist EA work, so basically every problem will look understaffed, relative to the scale of the problem”.
Based on my experience in the field for a few years, a gut estimate for people doing relevant, high-quality AI policy work is in the low dozens (not counting people who do more high-level/academic “AI strategy” research, most of which is not designed for policy-relevance). I’m not convinced that on the current margin, for a person who could do both, the right choice is to become the 200th person going into AISTR rather than the 25th person going into applied AI policy.
Specific claims:
For people who eventually decide to do AI policy/strategy research, early exploration in AI technical material seems clearly useful, in that it gives you a better sense of how and when different AI capabilities might develop and helps you distinguish useful and “fake-useful” AI safety research, which seems really important for this kind of work. (Holden Karnofsky says “I think the ideal [strategy] researcher would also be highly informed on, and comfortable with, the general state of AI research and AI alignment research, though they need not be as informed on these as for the previous section [about alignment].”)
Re: the Karnofsky quote, I also think there’s a big difference between “strategy” and “policy”. If you’re doing strategy research to inform e.g. OP’s priorities, that’s pretty different from doing policy research to inform e.g. the US government’s decision-making. This post seems to treat them as interchangeable but there’s a pretty big distinction. I myself do policy so I’ll focus on that here.
For policy folks, understanding this material might be useful, but I think in many cases the opportunity costs are too high. My prior (not having thought about this very much) is that maybe 25% of AI policy researchers/practitioners should spend significant time on this (especially if their policy work is related to AI safety R&D and related topics), so that there are people literate in both fields. But overall it’s good to have a division of labor. I have not personally encountered a situation in 3+ years of policy work (not on AI safety R&D) where being able to personally distinguish between “useful and fake-useful” AI safety research would have been particularly helpful. And if I had, I would’ve just reached out to 3-5 people I trust rather than taking months to study the topic myself.
I argue that most … policy professionals should build some fairly deep familiarity with the field in order to do their jobs effectively
Per the above, I think “most” is too strong. I also think sequencing matters here. I would only encourage policy professionals to take time to develop strong internal models of AISTR once it has proven useful for their policy work, instead of assuming ex ante (before someone starts their policy career) that it will prove useful and so should/could be done beforehand. There are ways to upskill in technical fields while you’re already in policy.
Even for people who decide to follow a path like “accumulate power in the government/private actors to spend it at critical AI junctures,” it seems very good to develop views about timelines and key inputs; otherwise, I am concerned that they will not be focused on climbing the right ladders or will not know who to listen to. Spending a few months really getting familiar with the field, and then spending a few hours a week staying up to date, seems sufficient for this purpose.
Again, division of labor and a prudent approach to deference can get you the benefits without these opportunity costs. I also think that in many cases it’s simply not realistic to expect successful policy professionals to spend “a few hours a week staying up to date” with an abstract and technical literature. Every week, the list of things I want to read that have some chance of being decision/strategy-relevant is already >3x as long as what I have time for.
While other careers in the AI space — policy work … — can be very highly impactful, that impact is predicated on the technical researchers, at some point, solving the problems, and if a big fraction of our effort is not on the object-level problem, this seems likely to be a misallocation of resources.
I think this assumes a particular risk model from AI that isn’t the only risk model. Unless I am misreading you, this assumes policy success looks like getting more funding for AI technical research. But it could also look like affecting the global distribution of AI capabilities, slowing down/speeding up general AI progress, targeting deliberate threat actors (i.e. “security” rather than “safety”), navigating second-order effects from AI (e.g. destabilizing great power relations or nuclear deterrence) rather than direct threats from misaligned AI, and many other mechanisms. Another reason to focus on policy careers is that you can flexibly pivot between problems depending on how our threat models and prioritization evolve (e.g. doing both AI and bio work at the same time, countering different AI threat models at different times).
Thanks, Locke, this is a series of great points. In particular, the point about even fewer people (~25) doing applied policy work is super important, to the extent that I think I should edit the post to significantly weaken certain claims. Likewise, the points about the relative usefulness of spending time learning technical stuff are well taken, though I think I put more value on technical understanding than you do; for example, while of course policy professionals can ask people they trust, they have to somehow be able to assess the judgment of these people on the object-level thing.
Also, while I think the idea of technical people being in short supply and high demand in policy is generally overrated, that seems like it could be an important consideration. Relatedly, it seems maybe easier to do costly fit-tests (like taking a first full time job) in technical research and switch to policy than vice versa.
Edit: for the final point about risk models, I definitely don’t have state funding for safety research in mind; what I mean is that since I think it’s very unlikely that policy permanently stops AGI from being developed, success ultimately depends on the alignment problem being solved. I think there are many things governments and private decision-makers can do to improve the chances this happens before AGI, which is why I’m still planning on pursuing a governance career!
In particular, the point about even fewer people (~25) doing applied policy work is super important, to the extent that I think I should edit the post to significantly weaken certain claims.
I appreciate you taking this seriously! I do want to emphasize I’m not very confident in the ~25 number, and I think people with more expansive definitions of “policy” would reach higher numbers (e.g. I wouldn’t count people at FHI as doing “policy” work even if they do non-technical work, but my sense is that many EAs lump together all non-technical work under headings such as “governance”/”strategy” and implicitly treat this as synonymous with “policy”). To the extent that it feels crux-y to someone whether the true “policy” number is closer to 25 or 50 or 75, it might be worth doing a more thorough inventory. (I would be highly skeptical of any list that claims it’s >75, if you limit it to people who do government policy-related and reasonably high-quality work, but I could be wrong.)
while I think the idea of technical people being in short supply and high demand in policy is generally overrated, that seems like it could be an important consideration
I certainly agree that sometimes technical credentials can provide a boost to policy careers. However, that typically involves formal technical credentials (e.g. a CS graduate degree), and “three months of AISTR self-study” won’t be of much use as career capital (and may even be a negative if it requires you to have a strange-looking gap on your CV or an affiliation with a weird-looking organization). A technical job taken for fit-testing purposes at a mainstream-looking organization could indeed, in some scenarios, help open doors to/provide a boost for some types of policy jobs. But I don’t think that effect is true (or large) often enough for this to really reduce my concerns about opportunity costs for most individuals.
I definitely don’t have state funding for safety research in mind; what I mean is that since I think it’s very unlikely that policy permanently stops AGI from being developed, success ultimately depends on the alignment problem being solved.
This question is worth a longer investigation at some point, but I don’t see many risk scenarios where a technical solution to the AI alignment problem is sufficient to solve AGI-related risk. For accident-related risk models (in the sense of this framework), solving safety problems is necessary. But even when technical solutions are available, you still need all relevant actors to adopt those solutions, and we know from the history of nuclear safety that the gap between availability and adoption can be big (in that case, decades). In other words, even if technical AI alignment researchers somehow “solve” the alignment problem, government action may still be necessary to ensure adoption (whether government-affiliated labs or private sector actors are the developers). In which case one could flip your argument: since it’s very unlikely AGI-related risks will be addressed solely through technical means, success ultimately depends on having thoughtful people in government working on these problems. In reality, for safety-related risks, both technical and policy solutions are necessary.
And for security-related risk models, i.e. bad actors using powerful AI systems to cause deliberate harm (potentially up to existential risks in certain capability scenarios), technical alignment research is neither a necessary nor a sufficient part of the solution, but policy is at least necessary. (In this scenario, other kinds of technical research may be more important, but my understanding is that most “safety” researchers are not focused on those sorts of problems.)
Edited the post substantially (and, hopefully, transparently, via strikethrough and “edit:” and such) to reflect the parts of this and the previous comment that I agree with.
Regarding this:
I don’t see many risk scenarios where a technical solution to the AI alignment problem is sufficient to solve AGI-related risk. For accident-related risk models (in the sense of this framework), solving safety problems is necessary. But even when technical solutions are available, you still need all relevant actors to adopt those solutions, and we know from the history of nuclear safety that the gap between availability and adoption can be big (in that case, decades). In other words, even if technical AI alignment researchers somehow “solve” the alignment problem, government action may still be necessary to ensure adoption (whether government-affiliated labs or private sector actors are the developers).
I’ve heard this elsewhere, including at an event for early-career longtermists interested in policy where a very policy-skeptical, MIRI-type figure was giving a Q&A. A student asked: if we solved the alignment problem, wouldn’t we need to enforce its adoption? The MIRI-type figure said something along the lines of:
“Solving the alignment problem” probably means figuring out how to build an aligned AGI. The top labs all want to build an aligned AGI; they just think the odds of the AGIs they’re working on being aligned are much higher than I think they are. But if we have a solution, we can just go to the labs and say, here, this is how you build it in a way that we don’t all die, and I can prove that this makes us not all die. And if you can’t say that, you don’t actually have a solution. And they’re mostly reasonable people who want to build AGI and make a ton of money and not die, so they will take the solution and say, thanks, we’ll do it this way now.
So, was MIRI-type right? Or would we need policy levers to enforce adoption of the solution, even in this model? The post you cite chronicles how long it took for safety advocates to address glaring risks in the nuclear missile system. My initial model says that if the top labs resemble today’s OpenAI and DeepMind, it would be much easier to convince them than the entrenched, securitized bureaucracy described in the post: the incentives are much better aligned, and the cultures are more receptive to suggestions of change. But this does seem like a cruxy question. If MIRI-type is wrong, this would justify a lot of investigation into what those levers would be, and how to prepare governments to develop and pull them. If not, this would support more focus on buying time in the first place, as well as on trying to make sure the top firms at the time the alignment problem is solved are receptive.
(E.g., if the MIRI-type model is right, a US lead over China seems really important: if we expect that the solution will come from alignment researchers in Berkeley, maybe it’s more likely that they implement it if they are private, “open”-tech-culture companies, who speak the same language and live in the same milieu and broadly have trusting relationships with the proponent, etc. Or maybe not!)
I put some credence on the MIRI-type view, but we can’t simply assume that is how it will go down. What if AGI gets developed in the context of an active international crisis or conflict? Could not a government — the US, China, the UK, etc. — come in and take over the tech, and race to get there first? To the extent that there is some “performance” penalty to a safety implementation, or that implementing safety measures takes time that could be used by an opponent to deploy first, there are going to be contexts where not all safety measures are going to be adopted automatically. You could imagine similar dynamics, though less extreme, in an inter-company or inter-lab race situation, where (depending on the perceived stakes) a government might need to step in to prevent premature deployment-for-profit.
The MIRI-type view bakes in a bunch of assumptions about several dimensions of the strategic situation, including: (1) it’s going to be clear to everyone that the AGI system will kill everyone without the safety solution, (2) the safety solution is trusted by everyone and not seen as a potential act of sabotage by an outside actor with its own interest, (3) the external context will allow for reasoned and lengthy conversation about these sorts of decisions. This view makes sense within one scenario in terms of the actors involved, their intentions and perceptions, the broader context, the nature of the tech, etc. It’s not an impossible scenario, but to bet all your chips on it in terms of where the community focuses its effort (I’ve similarly witnessed some MIRI staff’s “policy-skepticism”) strikes me as naive and irresponsible.
You can essentially think of it as two separate problems:
Problem 1: Conditional on us having a technical solution to AI alignment, how do we ensure the first AGI built implements it?
Problem 2: Conditional on us having a technical solution to AI alignment, how do we ensure no AGI is ever built that does NOT implement it, or some other equivalent solution?
I feel like you are talking about Problem 1, and Locke is talking about Problem 2. I agree with the MIRI-type that Problem 1 is easy to solve, and the hard part of that problem is having the solution. I do believe the existing labs working on AGI would implement a solution to AI alignment if we had one. That still leaves Problem 2 that needs to be solved—though at least if we’re facing Problem 2, we do have an aligned AGI to help with the problem.
Hmm. I don’t have strong views on unipolar vs multipolar outcomes, but I think MIRI-type thinks Problem 2 is also easy to solve, due to the last couple clauses of your comment.
Agreed; it strikes me that I’ve probably been over-anchoring on this model.