It’s possible that one key disagreement I have with Matthew is that I think the level of verification required for AI checks and balances isn’t clearly technically easy by default.
That is, the technical measures required for ‘we have several “different” AIs and sufficient safeguards such that if at least one AI doesn’t collude with the other AIs and instead works with the AI lab, we’d remain in control’ are non-trivial.
For instance, if we have serious ELK problems (black-box techniques like debate are insufficient and we can’t figure out something else that works), then we won’t be able to distinguish between an AI with our best interests in mind and an AI which is tampering with all of our measurements and deluding us. This holds even though both of these AIs are trying to present their best evidence that they are acting in our interests. Further, tampering with all of our measurements and deluding us could look better than actually acting in our best interests.
This certainly isn’t the only disagreement I have with Matthew, but it might explain a lot of differences in how we think about the situation.
Also, note that this still applies when trying to pay AIs for goods and services. (Unless humanity has already augmented its intelligence, but if so, how did this happen in a desirable way?)