I think the answer is ‘yes’ for a general layperson’s understanding of ‘pushing capabilities’, but the emerging EA discourse on this seems to be at risk of conflating several questions:
Has Claude-3 shown better capability than other models? Yes, under certain specific conditions and benchmarks.
Do those benchmarks matter, in the sense of actually capturing the performance of interest? No, in my opinion. I’d recommend reading Melanie Mitchell’s takes on this.
Do Claude-3’s extra capabilities make it more likely to cause an x-risk event? No, or at least the probability that the current frontier AI model will cause an x-risk event has gone from ~epsilon to ~epsilon.
Will Claude-3’s release increase or decrease x-risk? Very difficult to say; I don’t know how people get over cluelessness objections to these questions.
So I guess in your post ‘frontier’ is covering two separate concepts: the ‘frontier’ in terms of published benchmarks and the ‘frontier’ in terms of marginal x-risk increase. In my opinion, Claude-3 may be an interesting case where these come apart.