Messy practical reasons.
I agree with Larks that most of us would press a magic button to slow down AI progress on dangerous paths.
But we can’t, which raises two problems:
Tractability.
Effectiveness of moderate success. If you get a non-global slowdown, or a slowdown that ends too early, or a slowdown regime that’s evadable, or if you differentially slow cautious labs, or even if you just differentially slow leading labs, the effect is likely net-negative. (Increasing multipolarity among labs + differentially boosting less-cautious actors + compute overhang enabling rapidly scaling up training compute. See Slowing AI: Foundations.)
(I’d be excited to talk about proposals more specific than ‘push for a pause,’ or outcomes more specific than ‘pause until proven <0.1% doom.’ Who is doing the pausing; what are the rules? Or maybe you don’t have specific proposals/outcomes in mind, in which case I support you searching for new great ideas, but it’s not like others haven’t tried and failed already.)
Thanks for the comment Zach.
1. Can you elaborate on your comment “Tractability”?
2. I’m less worried about multipolarity because the leading labs are so far ahead AND I have short timelines (~ 10 years). My guess is if you had short timelines, you might agree?
3. If we had moderate short term success, my intuition is that we’ve actually found an effective strategy that could then be scaled. I worry that your thinking is basically pointing to ‘it needs to be an immediately perfect strategy or don’t bother!’
1. Pushing a magic button would be easy; affecting the real world is hard. Even if slowing is good, we should notice whether there exist tractable interventions (or: notice interventions’ opportunity cost).
2. Nope, my sense is that DeepMind, OpenAI, and Anthropic do and will have a small lead over Meta, Inflection, and others, such that I would be concerned (re increasing multipolarity among labs) about slowing DeepMind, OpenAI, and Anthropic now. (And I have 50% credence on human-level AI [noting this is underspecified] within 9 years.)
3. Yeah, maybe, depending. I’m relatively excited about “short term success” that seems likely to support the long-term policy regimes I’m excited about, like global monitoring of compute and oversight of training runs with model evals for dangerous capabilities and misalignment, maybe plus a compute cap. I fear that most pause-flavored examples of “short term success” won’t really support great long-term plans. (Again, I’d be excited to talk about specific proposals/outcomes/interventions.)