Why do you think AI Safety is tractable?
Related, have we made any tangible progress in the past ~5 years that a significant consensus of AI Safety experts agree is decreasing P(doom) or prolonging timelines?
Edit: I hadn’t noticed that there was already a similar question.
I share this concern and would like to see more responses to this question. Importance and neglectedness seem very high, but tractability is harder to justify.
Given 100 additional talented people (especially people anywhere close to the level of someone like Paul Christiano) working on new research directions, it sounds intuitively absurd (to me) to say the probability of an AI catastrophe would not meaningfully decrease. But the only justification I can give is that people generally make progress when they work on research questions in related fields (e.g. math, physics, and computer science, though the case is weaker for philosophy).
“Significant consensus of AI Safety experts agree” is a high bar. Personally, I’m more excited about work that a smaller group of experts (e.g. Nate Soares) agree is actually useful. People disagree on what work is helpful in AI safety. Some might say a key achievement was that reward learning from human feedback gained prominence years before it would have otherwise (I think I saw Richard Ngo write this somewhere). Others might say it was really important to clarify and formalize concepts related to inner alignment. I encourage you to read an overview of all the research agendas here (or this shorter cheat sheet) and come to your own conclusions.
Throughout, I’ve been answering through the lens of “AI Safety = technical AI alignment research.” My answer completely ignores some other important categories of work like AI governance and AI safety field building, which are relevant to prolonging timelines.