Making use of an AI’s internal state, not just its outputs. For example, giving positive reinforcement to an AI when it seems likely to be “honest” based on an examination of its internal state (and negative reinforcement when it seems likely not to be). Eliciting Latent Knowledge provides some sketches of how this might look.
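A minimal sketch of what this could look like mechanically, assuming a model whose hidden activations can be read out by a hypothetical linear “honesty” probe. The probe, the reward blending, and all names here are illustrative assumptions, not anything specified in the ELK report; how to get a probe that actually tracks honesty is exactly the open problem ELK is about.

```python
import torch
import torch.nn as nn

class HonestyProbe(nn.Module):
    """Hypothetical linear probe mapping a hidden-state vector to an
    'honesty' score in (0, 1). Training such a probe reliably is the
    hard, unsolved part."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(hidden_state))

def shaped_reward(task_reward: float,
                  hidden_state: torch.Tensor,
                  probe: HonestyProbe,
                  weight: float = 1.0) -> float:
    """Blend the ordinary task reward with reinforcement based on the
    probe's reading of the model's internal state: positive when the
    probe says 'honest', negative when it says 'not honest'."""
    honesty = probe(hidden_state).item()  # score in (0, 1)
    return task_reward + weight * (2.0 * honesty - 1.0)

# Illustrative usage with a random hidden state:
hidden_dim = 768
probe = HonestyProbe(hidden_dim)
h = torch.randn(hidden_dim)
print(shaped_reward(task_reward=1.0, hidden_state=h, probe=probe))
```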
It also appears that the link to ELK in this section is incorrect
Very belatedly fixed—thanks!