Any further work on AI Safety Success Stories?
TL;DR: Are there any works similar to Wei_Dai's "AI Safety Success Stories" that provide a framework for thinking about the landscape of possible success stories & pathways humanity might take to survive misaligned AI?
I’ve been trying to think of systematic ways of assessing non-technical proposals for improving humanity’s odds of surviving misaligned AI.
Aside from the numerous frameworks for assessing technical alignment proposals, I haven’t seen many resources on non-technical proposals that provide a concrete framework for thinking about the question: “What technological/geopolitical/societal pathway will our civilization most likely take (or should ideally take) in order to survive AI?”
Having such a framework seems pretty valuable, since it would let us think about the exact alignment pathway & context in which [proposals that aim to help with alignment] would be effective.
For example, a pretty clear dimension along which people’s opinions differ is the necessity of pivotal acts, i.e. “pivotal act vs. gradual steering” (somewhat oversimplified). Here, any proposal’s theory of impact will necessarily depend on its author’s beliefs regarding (a) which position on the spectrum currently appears most likely by default, and (b) which position on the spectrum we should be aiming for.
If, say, my pessimistic containment strategy were about communicating AI risk to capabilities researchers in order to promote cooperation between AI labs, it would be incoherent for me to simultaneously be ultra-pessimistic about humanity’s chances of enacting any cooperative regulation in the future.
Or if I thought a Pivotal Act were the best option humanity has, and I wanted to suggest some proposal that would be a force-multiplier if that line of strategy does play out in the future, it would make sense for my proposal to consider the forms this unilateralist org’s AI might take:
Where will it be developed?
Will it be a corrigible AI whose safety features depend on human operators?
Will it be a CEV-type AI whose safety features won’t depend on humans?
How likely is it that the first AI capable of enacting a Pivotal Act will need to rely on human infrastructure, for how long, and could interventions help?
I’ve seen a lot of similar frameworks for technical alignment proposals, but not much on the pathways our civilization will actually take to survive (Wei_Dai’s post is similar, but it is mostly about the form the AI will end up taking, without saying much about the pathways by which we’ll arrive at that outcome).
Any resources I might be missing? (if there aren’t any, I might write one)