Super interesting to see this analysis, especially the table of current capabilities—thank you!
I have interpreted [feasible] as, one year after the forecasted date, have AI labs achieved these milestones, and disclosed this publicly?
It seems to me that this ends up being more conservative than the original “Ignore the question of whether they would choose to”, which presumably makes the expert forecasts worse than they seem to be here.
For example, a task like “win angry birds” seems pretty achievable to me; it’s just that no one’s thinking about Angry Birds these days, so it probably hasn’t been attempted. Does that sound right to you?
I’m curious if you have a rough estimate of how many of these tasks would be achievable within a year if top labs attempted them?
Thanks Adam :)
I have a rough (i.e. considered for <15 minutes) take: if top labs had attempted these particular milestones one year ago, and had the same policies on disclosing capabilities as they currently seem to, then there’s a 40-50% chance they would have achieved 2 of Angry Birds, Atari fifty, Laundry and Go low by now. But I don’t put much weight on my prediction, whereas I put a lot more weight on my analysis of what has happened (though this is also somewhat subjective!).
I agree though that checking what has actually happened ends up being more conservative than the original “Ignore the question of whether they would choose to”, which makes the expert forecasts worse than they seem to be here. This is a weakness of this analysis! And of the resolvability of the original survey.
Do you have an estimate of how many of the tasks would have been achieved by now if labs tried a year ago?