Anonymous_EA comments on What success looks like

Anonymous_EA 28 Jun 2022 17:56 UTC
14 points
Great post!
From Scenario 1, in which alignment is easy:
"The alignment problem turns out much easier than expected. Increasingly better AI models have a better understanding of human values, and they do not naturally develop strong influence-seeking tendencies. Moreover, in cases of malfunctions and for preventative measures, interpretability tools now allow us to understand important parts of large models on the most basic level and ELK-like tools allow us to honestly communicate with AI systems."
Here you seem to be imagining that technical AI alignment turns out to be easy, but you don’t discuss the political/governance problem of making sure the AI (or AIs) are aligned with the right goals.
For example, what if the first aligned transformative AI systems are built by bad actors? What if they’re built by well-intentioned actors who nevertheless have no idea what to do with the aligned TAI(s) they’ve developed? (My impression is that we currently have little idea of what a lab should do if it succeeds at technical alignment. Maybe the aligned system could help the lab decide what to do, but I’m pretty nervous about counting on that.)
From my perspective, a full success story should include answers to these questions.
- mariushobbhahn 28 Jun 2022 19:55 UTC
  5 points
  Yes, that is true. We decided not to address every possible problem with each approach because doing so would have made the post much longer. It’s a fair criticism, though.