Yarrow, thank you for this sharp and clarifying discussion.
You have completely convinced me that my earlier arguments from “investment as a signal” or “LHC/Pascal’s Wager” were unrigorous, and I concede those points.
I think I can now articulate my one, non-speculative crux.
The “so what” of Toby Ord’s (excellent) analysis is that it provides a perfect, rigorous, “hindsight” view of the last paradigm—what I’ve been calling “Phase 1” RL for alignment.
My core uncertainty isn’t speculative “what-if” hope. It’s that the empirical ground is shifting.
The very recent papers we discussed (Khatri et al. on the “art” of scaling, and Tan et al. on math reasoning) are, for me, the first public, rigorous evidence for a “Phase 2” capability paradigm.
• They provide a causal mechanism for why the old, simple scaling data may be an unreliable predictor.
• They show this “Phase 2” regime is different: it is not a simple power law but a complex, recipe-dependent “know-how” problem (Khatri), and it has different efficiency dynamics (Tan). (A toy sketch of the contrast follows this list.)
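To make the contrast concrete, here is a minimal sketch of the “Phase 1” picture, assuming a single smooth power law in training compute; the coefficients are invented for illustration and are not taken from either paper:

```python
# Hypothetical "Phase 1" picture: loss falls as one smooth power law in
# training compute, L(C) = a * C**(-b), so two fitted constants are enough
# to extrapolate far beyond the runs you have actually measured.
# The coefficients below are made up for illustration; they are not from
# Khatri et al. or Tan et al.
a, b = 10.0, 0.05

def power_law_loss(compute_flops: float) -> float:
    """Predicted loss under a single power law in training compute."""
    return a * compute_flops ** (-b)

for c in (1e21, 1e23, 1e25):
    print(f"compute = {c:.0e} FLOPs -> predicted loss = {power_law_loss(c):.3f}")
```

The “Phase 2” claim is that no single curve like this governs the new regime: outcomes depend on the training recipe, the “know-how,” so a clean fit to yesterday’s runs need not predict tomorrow’s.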
This, for me, is the action-relevant dilemma.
We are no longer in a state of “pure speculation”. We are in a state of grounded, empirical uncertainty, in which public research is only now documenting a new, more complex scaling regime that the private labs have been pursuing in secret.
Given that the lead time for any serious safety work is measured in years, and that the breakthrough takes the form of a proprietary “recipe,” a “wait for public proof” strategy seems non-robust.
That’s the core of my concern. I’m now much clearer on the crux of the argument, and I can’t thank you enough for pushing me to be more rigorous. This has been incredibly helpful, and I’ll leave it there.
Hello, Matt. Let me just say I really appreciate your friendly, supportive, and positive approach to this conversation. It’s very nice. Discussions on the EA Forum can get pretty sour sometimes, and I’m probably not entirely blameless in that myself.
You don’t have to reply if you don’t want to, but I wanted to follow up in case you still did.
Can you explain what you mean about the data efficiency of the new RL techniques in the papers you mentioned? You say it’s more complex, but that doesn’t help me understand.
By the way, did you use an LLM like Claude or ChatGPT to help write your comment? It has some of the hallmarks of LLM writing to me. I’m just saying this to help you: you may not realize how much LLMs’ writing style sticks out like a sore thumb (depending on how you use them), and it will likely discourage people from engaging with you if they detect it. I keep encouraging people to trust themselves as writers and trust their own voice, and reassuring them that the imperfections of their writing don’t make us, the readers, like it less; they make us like it more.