Regarding the sharp left turn, Byrnes’ opinionated review is the best argument for worrying about it that I’m aware of, but he isn’t talking about today’s LLMs and their descendants, which rules out your last paragraph’s pointer to current work. Roger Dearnaley’s intuition pump behind his take that the sharp left turn might not be as hopeless as it seems resonates with me, but his description seems vibes-based, so I can’t tell whether he’s misunderstanding the sharp left turn. I do think Dearnaley’s personal “full-stack” attempt at assessing alignment progress is the sort of answer I’d want to your question re: what sort of work would be good evidence, although my impression is you disagree for high-level generator reasons that would be ~intractable to resolve within the margins of EA Forum comments…
By empirical evidence I meant anything empirical at all, including things like emergent misalignment and what might come out of Jacob Steinhardt’s interpretability program and what Ryan Greenblatt says here and whatever the right value-analogue of Anthropic’s functional emotions paper is (below) and so on, not just observable behavior. Maybe I’m conflating things or overloading “empirical”, in which case my apologies.