This is a really great exchange, and thank you for responding to the post.
I just wanted to leave a quick comment to say: It seems crazy to me that someone would say the "slow" scenario has "already been achieved"!
Unless I'm missing something, the "slow" scenario says that half of all freelance software engineering jobs taking <8 hours can be fully automated, that any task a competent human assistant can do in <1 hour can be fully automated with no drop in quality (what if I ask my human assistant to solve some ARC-2 problems for me?), that the majority of customer complaints in a typical business will be fully resolved by AI in those businesses that use it, and that AI will be capable of writing hit songs (at least if humans aren't made aware that they are AI-generated)?
I suppose the scenario is framed only to say that AI is capable of all of the above, rather than that it is being used like this in practice. That still seems like an incorrect summary of current capability to me, but it is slightly more understandable. In that case, though, it seems the scenario should just have been framed that way: "Slow progress: No significant improvement in AI capabilities from 2025, though possibly a significant increase in adoption". There could then be a separate question about where people think current capabilities stand.
Otherwise, disagreements about current capabilities and disagreements about progress get blurred together in a single question. Describing the "slow" scenario as "slow" and putting it at the extreme end of the spectrum inevitably primes people to think about current capabilities in a certain way. I'm still struggling to understand the point of view that says this is an acceptable way to frame the question.
Thanks for the thoughts! First, the question is indeed framed as being about capabilities and not adoption, and this is absolutely central.
Second, people have a wide range of views on any given topic, and surveys reflect this distribution. I think this is a feature, not a bug. Additionally, if you take any noisy measurement (which all surveys are), reading too much into the tails can lead one astray (I don't think that's happening in this specific instance, but I want to guard against the view that the existence of noise implies the nonexistence of signal). Nevertheless, I do appreciate the careful read.
Your comments here are part of why I think the third disclaimer we added, which allows for jagged capabilities, is important. Additionally, we don't require that all capabilities are achieved, hence the "best matching" qualifier, rather than looking at the minimum across the capability space.
We indeed developed/tested versions of this question which included a section on current capabilities. Survey burden is another source of noise/bias in surveys, so such modifications are not costless. I absolutely agree that current views of progress will impact responses to this question.
I'll reiterate that LEAP is a portfolio of questions, and I think we have other questions where disagreement about current capabilities is less of an issue because the target is much less dependent on subjective assessment, but those questions will sacrifice some degree of being complete pictures of AI capabilities. Lastly, any expectation of the future necessarily includes some model of the present.
Always happy to hear suggestions for a new question or revised version of this question!