Here’s my attempt to reflect on the topic: https://forum.effectivealtruism.org/posts/PWKWEFJMpHzFC6Qvu/alignment-is-hard-communicating-that-is-harder
Eleni_A
Talking EA to my philosophy friends
Why did I misunderstand utilitarianism so badly?
[Question] Slowing down AI progress?
[Question] AI risks: the most convincing argument
“Normal accidents” and AI systems
Deception as the optimal: mesa-optimizers and inner alignment
Alignment’s phlogiston
Who ordered alignment’s apple?
Alignment is hard. Communicating that, might be harder
But what are *your* core values?
An Epistemological Account of Intuitions in Science
Three scenarios of pseudo-alignment
A New York Times article on AI risk
It’s (not) how you use it
I don’t think it’s restricted to agentic technologies; my model covers all technologies that involve risk. My toy example is that even producing a knife requires the designer to think about its dangers in advance and propose precautions.
There is no royal road to alignment
Both Redwood and Anthropic run labs and do empirical work. Here is another example of experimental work: https://twitter.com/Karolis_Ram/status/1540301041769529346
Five types of people on AI risks:
1. Wants AGI as soon as possible and ignores safety.
2. Wants AGI, but primarily cares about alignment.
3. Doesn’t understand AGI, or doesn’t think it will happen in her lifetime; thinks about robots that might take people’s jobs.
4. Understands AGI, but thinks the timelines are long enough not to worry about it right now.
5. Doesn’t worry about AGI; thinks being locked into our choices and “normal accidents” are both more important/riskier/scarier.