Causal Foundations is probably 4-8 full-timers, depending on how you count the small-to-medium slices of time from various PhD students. Several of our 2023 outputs seem comparably important to the deception paper:
Towards Causal Foundations of Safe AGI, The Alignment Forum—the summary of everything we’re doing.
Characterising Decision Theories with Mechanised Causal Graphs, arXiv—the most formal treatment yet of TDT and UDT, together with CDT and EDT in a shared framework (a toy sketch of the mechanised-graph idea follows this list).
Human Control: Definitions and Algorithms, UAI—a paper arguing that corrigibility is not exactly the right target for ensuring good shutdown behaviour.
Discovering Agents, Artificial Intelligence Journal—an investigation of the “retargetability” notion of agency.
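To make the mechanised-graph idea concrete, here is a minimal Python sketch of Newcomb's problem. The encoding (a dict of parents), the variable names (D, P, U and their mech_* counterparts), and the helper functions are all illustrative assumptions, not the paper's notation; the point is only that CDT intervenes on the decision D while TDT/UDT-style reasoning intervenes on its mechanism mech_D, which the predictor's mechanism tracks.

```python
# Toy mechanised causal graph for Newcomb's problem. This is a hedged
# illustration of the framework's core move, not the paper's formalism:
# every object-level variable gets a mechanism variable, and decision
# theories differ in WHERE they intervene.

# Object-level: D (one-box?), P (predictor says one-box?), U (payoff).
# Mechanism-level: mech_D, mech_P, mech_U set how each variable responds.
GRAPH = {
    "D": ["mech_D"],
    "P": ["mech_P"],
    "U": ["D", "P", "mech_U"],
    "mech_D": [],
    "mech_P": ["mech_D"],  # mechanism-level edge: the prediction tracks the policy
    "mech_U": [],
}

def payoff(one_box: bool, predicted_one_box: bool) -> int:
    """Standard Newcomb payoffs: $1M in the opaque box iff one-boxing
    was predicted; two-boxing always adds the $1k transparent box."""
    total = 1_000_000 if predicted_one_box else 0
    if not one_box:
        total += 1_000
    return total

def cdt_value(one_box: bool, prior_prediction: bool) -> int:
    # CDT: intervene on D only; mech_P (and hence P) is held fixed,
    # so the prediction does not respond to the choice.
    return payoff(one_box, prior_prediction)

def policy_value(one_box: bool) -> int:
    # TDT/UDT-style: intervene on mech_D; via the mech_D -> mech_P edge,
    # the prediction covaries with the chosen policy.
    return payoff(one_box, predicted_one_box=one_box)

if __name__ == "__main__":
    # CDT two-boxes whatever the fixed prediction is...
    for predicted in (True, False):
        assert cdt_value(False, predicted) > cdt_value(True, predicted)
    # ...while intervening at the mechanism level favours one-boxing.
    assert policy_value(True) > policy_value(False)
    print("CDT (prediction fixed):",
          {p: (cdt_value(True, p), cdt_value(False, p)) for p in (True, False)})
    print("Mechanism-level:", (policy_value(True), policy_value(False)))
```

The mechanism-level edge mech_D -> mech_P is doing all the work here: delete it and the policy-level calculation collapses back to CDT, which is roughly how the shared framework lets the different decision theories be compared on one graph.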
excellent, thanks, will edit