Just brainstorming here, I have zero experience with actual psychology research:
- It might be interesting to try and identify some psychological traits that lead people to becoming EAs / becoming alignment researchers, in order to aid future recruitment/education/community-building efforts.
- This is a medium-term concern rather than about alignment itself, but I would be interested to get a clearer picture on how “botpocalypse” concerns will play out. (See this ACX article for more detail, as well as the relevantrecurringsection of theZvi’s AI newsletter—basically, how worried should I be that a world full of convincing AI chatbots will totally destroy our culture’s ability to have important collective conversations, have trust that we’re talking to real humans, etc?) Some psych experiments might be informative here, eg hosting a political debate on a controversial topic among 10 humans in a chatroom, versus among 5 humans and 5 AI instances in a chatroom. Can the AIs successfully derail the conversation? Or maybe people easily detect the AIs, but this nevertheless results in a lower-trust space and less productive conversation even between fellow humans? etc
- A related concern about medium-term AI is that people might find them much more persuasive or empathetic once AIs are capable of conversing via realistic zoom-call-esque CGI videos, rather than just chatting back and forth in text. It might be interesting to see how easily people are able to be swayed by an argument for “AI rights”, depending on whether it is presented in text, or as an interactive chat, or as AI-synthesized audio, or as AI-deepfaked video.
- Research organizations like “Ought” have run some rudimentary AI alignment experiments with human subjects in place of the AI, like seeing how well it works for a human non-expert to judge complex technical debates between a deceptive expert vs an honest expert. Perhaps there are other psych-style experiments that could be set up to test various alignment concepts? ie, imagine a situation where an expert computer programmer is being tasked by a novice programmer to write some code that accomplishes simple task X, but the expert programmer is also trying to sneak in “dangerous” functionality Y. You could experiment with equipping the novice programmer various different abilities (consult with another expert, run snippets of the code but not the whole thing, create unit-tests designed to detect dangerous functionality, etc) and see what helps detect deception.
Then there are a bunch of things that are basically more “focus testing” than psych experiments: - Figuring out which AI safety messages are most persuasive - Polling people about their attitudes towards AI regulation - Doing some research about different attitudes towards doom—how have people reacted in similar situations, eg, activists and researchers who get depressed about climate change, or people living during the Cuban missile crisis who thought nuclear war was probably imminent, or etc?
Just brainstorming here, I have zero experience with actual psychology research:
- It might be interesting to try and identify some psychological traits that lead people to becoming EAs / becoming alignment researchers, in order to aid future recruitment/education/community-building efforts.
- This is a medium-term concern rather than about alignment itself, but I would be interested to get a clearer picture on how “botpocalypse” concerns will play out. (See this ACX article for more detail, as well as the relevant recurring section of theZvi’s AI newsletter—basically, how worried should I be that a world full of convincing AI chatbots will totally destroy our culture’s ability to have important collective conversations, have trust that we’re talking to real humans, etc?) Some psych experiments might be informative here, eg hosting a political debate on a controversial topic among 10 humans in a chatroom, versus among 5 humans and 5 AI instances in a chatroom. Can the AIs successfully derail the conversation? Or maybe people easily detect the AIs, but this nevertheless results in a lower-trust space and less productive conversation even between fellow humans? etc
- A related concern about medium-term AI is that people might find them much more persuasive or empathetic once AIs are capable of conversing via realistic zoom-call-esque CGI videos, rather than just chatting back and forth in text. It might be interesting to see how easily people are able to be swayed by an argument for “AI rights”, depending on whether it is presented in text, or as an interactive chat, or as AI-synthesized audio, or as AI-deepfaked video.
- Research organizations like “Ought” have run some rudimentary AI alignment experiments with human subjects in place of the AI, like seeing how well it works for a human non-expert to judge complex technical debates between a deceptive expert vs an honest expert. Perhaps there are other psych-style experiments that could be set up to test various alignment concepts? ie, imagine a situation where an expert computer programmer is being tasked by a novice programmer to write some code that accomplishes simple task X, but the expert programmer is also trying to sneak in “dangerous” functionality Y. You could experiment with equipping the novice programmer various different abilities (consult with another expert, run snippets of the code but not the whole thing, create unit-tests designed to detect dangerous functionality, etc) and see what helps detect deception.
Then there are a bunch of things that are basically more “focus testing” than psych experiments:
- Figuring out which AI safety messages are most persuasive
- Polling people about their attitudes towards AI regulation
- Doing some research about different attitudes towards doom—how have people reacted in similar situations, eg, activists and researchers who get depressed about climate change, or people living during the Cuban missile crisis who thought nuclear war was probably imminent, or etc?
Jackson—great ideas; thanks so much for your thoughtful and creative suggestions here!