I didn’t say “blanket ban on ML,” I said “a blanket ban on very large models.”
Why? Because I have not seen, and don’t think anyone can make, a clear criterion for “too high risk of doom,” both because there are value judgements involved, and because we don’t know how capabilities arise; so there needs to be some cutoff past which we ban everything, until we have an actual regulatory structure in place to evaluate models. Is that different from the “conditional ban” you’re advocating? If so, how?
> I didn’t say “blanket ban on ML,” I said “a blanket ban on very large models.”
I know? I never said you asked for a blanket ban on ML?
> Because I have not seen, and don’t think anyone can make, a clear criterion for “too high risk of doom,”
My post discusses “pause once an agent passes 10 or more of the ARC Evals tasks”. I think this is too weak a criterion and I’d argue for a harder test, but I think this is already better than a blanket ban on very large models.
Anthropic just committed to a conditional pause.

ARC Evals is pushing responsible scaling, which is a conditional pause proposal.

But also, the blanket ban on very large models is implicitly saying that “more powerful than GPT-4” is the criterion for “too high risk of doom”, so I really don’t understand at all where you’re coming from.