FWIW, the way I now think about these scenarios is that there’s a tradeoff between technical ability and political ability:
- If you have infinite technical ability (one person can create an aligned Jupiter Brain in their basement), then you don’t need any political ability and can do whatever you want.
- If you have infinite political ability (Xi Jinping cures aging, leads the CCP to take over the world, and becomes God-Emperor of Man), you don’t need any technical ability and can just do whatever you want.
I don’t think either of those is plausible, and a realistic strategy will need both, in varying proportions; having less of one will demand more of the other. Some closely related ideas are:
- The weaker and less general an AI is, the safer it is to align and test. Potentially dangerous AIs should be as weak as possible while still doing the job, in the same way that Android apps should have as few permissions as are reasonably practical. A technique that reduces an AI’s abilities in some important way, while still fulfilling the main goal, is a net win. (Eg. scrubbing computer code from the training set of something like GPT-3.)
- Likewise, everything you might try for alignment will almost certainly fail if you turn up the AI power level *enough*, just as any system can be hacked into if you try infinitely hard. No alignment advance will “solve the problem”, but it may make a somewhat-more-powerful AI safer to run. (Eg. I doubt better interpretability would do much to help with a Jupiter Brain, but would help you understand smaller AIs.)
- An unexpected shock (eg. COVID-19, or the release of GPT-3) won’t make existing political actors smarter, but may make them change their priorities. (Eg. if, when COVID-19 happened, you had already met everyone at the FDA, had vaccine factories and supply chains built, and had emergency trial designs ready in advance, it would have been a lot easier to get rapid approval. Likewise, many random SWEs that I tell about PaLM or DALL-E now instinctively see them as dangerous and start thinking about safety; they don’t have a plan but now see one as important.)
(Plan to write more about this in the future; this is just a quick conceptual sketch.)
> the way I now think about these scenarios is that there’s a tradeoff between technical ability and political ability
I also like this, and appreciate you pointing out a tradeoff where the discourse was presenting an either-or decision. I’d actually considered a follow-up post on the Pareto boundary between unilaterally maximizing (altruistic) utility and multilaterally preserving coordination boundaries and consent norms.
Relating your ontology to mine, I’d say that in the AGI arena, technical ability contributes more to the former (unilaterally maximizing...) than the latter (multilaterally preserving...), and political ability contributes more to the latter than the former.
I really like your framing about the trade-off between some plans requiring more technical ability and some requiring more political ability.
What is it about these models that you think convinces them? Are these people that you tried to convince before?