What I just said: AI systems acting like a toddler or a cat would make me think AGI might be developed soon.
I'm not sure FrontierMath is any more meaningful than any other benchmark, including those on which LLMs have already gotten high scores. But I don't know.
I asked about genuine research creativity, not AGI, but I don't think this conversation is going anywhere at this point. It seems obvious to me that "does stuff mathematicians say makes up the building blocks of real research" is meaningful evidence that the chance that models will do research-level maths in the near future is not ultra-low, given that capabilities do increase with time. I don't think this is analogous to IQ tests or the bar exam, and for other benchmarks, I would really need to see what you're claiming is the equivalent of the transfer from FrontierMath Tier 4 to real math that was intuitive but failed.
What percentage probability would you assign to your ability to accurately forecast this particular question?
I'm not sure why you're interested in getting me to forecast this. I haven't ever made any forecasts about AI systems' ability to do math research. I haven't made any statements about AI systems' current math capabilities. I haven't said that evidence of AI systems' ability to do math research would affect how I think about AGI. So, what's the relevance? Does it have a deeper significance, or is it just a random tangent?
If there is a connection to the broader topic of AGI or AI capabilities, I already gave a bunch of examples of evidence I would consider to be relevant and that would change my mind. Math wasn't one of them. I would be happy to think of more examples as well.
I think a potentially good counterexample to your argument about FrontierMath → original math research is natural language processing → replacing human translators. Surely you would agree that LLMs have mastered the basic building blocks of translation? So, 2-3 years after GPT-4, why is demand for human translators still growing? One analysis claims that growth is counterfactually less than it would have been without the increase in the usage of machine translation, but demand is still growing.
I think this points to the difficulty of making these sorts of predictions. If, back in 2015, someone had described to you the capabilities and benchmark performance of GPT-4 in 2023, as well as the rate of scaling of new models and of progress on benchmarks, would you have thought that demand for human translators would continue to grow for at least the next 2-3 years?
I don't have any particular point other than that what seems intuitively obvious in the realm of AI capabilities forecasting may in fact be false, and that I am skeptical of hazy extrapolations.
The most famous example of a failed prediction of this sort is Geoffrey Hinton's prediction in 2016 that radiologists' jobs would be fully automated by 2021. Almost ten years after that prediction, both the number of radiologists and their salaries are still growing. AI tools that assist in interpreting radiology scans exist, but the evidence is mixed on whether they actually help or hinder radiologists (and possibly harm patients).