Like, what is the incentive for everyone using existing models to adopt and incorporate the new aligned AI?
Or is there a (spoken or unspoken) consensus that working on aligned AI means working on aligned superintelligent AI?
There are several plans for this scenario.
Low alignment tax + coordination around alignment: Having an aligned model is probably more costly than having a non-aligned model. This “cost of alignment” is also called the “alignment tax”. The goal in some agendas is to lower the alignment tax so far that it becomes reasonable to institute regulations mandating these alignment guarantees, much like the safety regulations that were introduced for cars, factory work and medicine. This approach works best in worlds where AI systems are relatively easy to align and don’t become much more capable quickly. Even if some systems are not aligned, we might have enough aligned systems that we are reasonably protected by them (especially since the aligned systems might be able to copy strategies that unaligned systems are using to attack humanity).
Exiting the acute risk period: If there is one aligned superintelligent AI system (or very few), we might simply ask it what the best strategy for achieving existential security is, and if the people in charge are at least slightly benevolent they will probably also ask how to help other people, especially at low cost. (I very much hope the policy people have something in mind to prevent malevolent actors from coming into possession of powerful AI systems, though I don’t remember seeing any such strategies.)
Pivotal act + aligned singleton: If abrupt takeoff scenarios are likely, then one possible plan is to perform a so-called pivotal act. Concretely, such an act would (1) prevent anyone else from building powerful AI systems and (2) allow the creators to think deeply enough about how to build AI that implements our mechanism for moral progress. Such a pivotal act might be to build an AI system that is powerful enough to e.g. “turn all GPUs into Rubik’s cubes” but not general enough to be very dangerous (for example by limiting its capacity for self-improvement), and then augment human intelligence so that the creators can figure out alignment and moral philosophy in full generality and depth. This strategy is useful in very pessimistic scenarios, where alignment is very hard, AIs become smarter through self-improvement very quickly, and people are very reckless about building powerful systems.
I hope this answers the question somewhat :-)
Thanks for taking the time to respond; I appreciate it.