Artir Kel (aka José Luis Ricón Fernández de la Puente) at Nintil wrote an essay broadly sympathetic to AI risk scenarios but doubtful of a particular step in the power-seeking stories Cotra, Gwern, and others have told. In particular, he has a hard time believing that a scaled-up version of present systems (e.g. Gato) would learn facts about itself (e.g. that it is an AI in a training process, what its trainers’ motivations are, etc.) and incorporate those facts into its planning (Cotra calls this “situational awareness”). Some AI safety researchers I’ve spoken to personally agree with Kel’s skepticism on this point.
Since incorporating this sort of self-knowledge into one’s plans is necessary for breaking out of training, initiating deception, etc., this seems like a pretty important disagreement. In fact, Kel claims that if he came around on this point, he would agree almost entirely with Cotra’s analysis.
Can she describe in more detail what situational awareness means? Could it be demonstrated with current or near-term models? And why does she think Kel (and others) find it so unlikely?
I wonder too!