Causal Foundations is probably 4-8 full-timers, depending on how you count the small-to-medium slices of time from various PhD students. Several of our 2023 outputs seem comparably important to the deception paper:
Towards Causal Foundations of Safe AGI, The Alignment Forum—the summary of everything we’re doing.
Characterising Decision Theories with Mechanised Causal Graphs, arXiv—the most formal treatment yet of TDT and UDT, together with CDT and EDT in a shared framework (a toy sketch of the mechanised-graph idea follows this list).
Human Control: Definitions and Algorithms, UAI—a paper arguing that corrigibility is not exactly the right target for ensuring good shutdown behaviour.
Discovering Agents, Artificial Intelligence Journal—an investigation of the “retargetability” notion of agency.
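To make the mechanised-graph idea concrete, here is a minimal Python sketch of Newcomb's problem. The encoding (a dict of parents), the variable names (D, P, U and their mech_* counterparts), and the helper functions are all illustrative assumptions, not the paper's notation; the point is only that CDT intervenes on the decision D while TDT/UDT-style reasoning intervenes on its mechanism mech_D, which the predictor's mechanism tracks.

```python
# Toy mechanised causal graph for Newcomb's problem. This is a hedged
# illustration of the framework's core move, not the paper's formalism:
# every object-level variable gets a mechanism variable, and decision
# theories differ in WHERE they intervene.

# Object-level: D (one-box?), P (predictor says one-box?), U (payoff).
# Mechanism-level: mech_D, mech_P, mech_U set how each variable responds.
GRAPH = {
    "D": ["mech_D"],
    "P": ["mech_P"],
    "U": ["D", "P", "mech_U"],
    "mech_D": [],
    "mech_P": ["mech_D"],  # mechanism-level edge: the prediction tracks the policy
    "mech_U": [],
}

def payoff(one_box: bool, predicted_one_box: bool) -> int:
    """Standard Newcomb payoffs: $1M in the opaque box iff one-boxing
    was predicted; two-boxing always adds the $1k transparent box."""
    total = 1_000_000 if predicted_one_box else 0
    if not one_box:
        total += 1_000
    return total

def cdt_value(one_box: bool, prior_prediction: bool) -> int:
    # CDT: intervene on D only; mech_P (and hence P) is held fixed,
    # so the prediction does not respond to the choice.
    return payoff(one_box, prior_prediction)

def policy_value(one_box: bool) -> int:
    # TDT/UDT-style: intervene on mech_D; via the mech_D -> mech_P edge,
    # the prediction covaries with the chosen policy.
    return payoff(one_box, predicted_one_box=one_box)

if __name__ == "__main__":
    # CDT two-boxes whatever the fixed prediction is...
    for predicted in (True, False):
        assert cdt_value(False, predicted) > cdt_value(True, predicted)
    # ...while intervening at the mechanism level favours one-boxing.
    assert policy_value(True) > policy_value(False)
    print("CDT (prediction fixed):",
          {p: (cdt_value(True, p), cdt_value(False, p)) for p in (True, False)})
    print("Mechanism-level:", (policy_value(True), policy_value(False)))
```

The mechanism-level edge mech_D -> mech_P is doing all the work here: delete it and the policy-level calculation collapses back to CDT, which is roughly how the shared framework lets the different decision theories be compared on one graph.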
excellent, thanks, will edit