So, to be clear, you think that if LLMs continue to complete software engineering tasks of exponentially increasing length at an exponentially decreasing risk of failure, then that tells us nothing about whether LLMs will reach AGI?
I expect most EAs who have enough money to consider investing it are already investing it in index funds, which, by design, are already long the Magnificent Seven.
I'm not sure if you're asking about the METR graph on task length or about the practical usefulness of AI coding assistants, which the METR study found is currently negative (experienced developers completed tasks more slowly with an AI assistant than without).
If I understand it correctly, the METR graph doesn't measure an exponentially decreasing failure rate; it holds the failure rate fixed at 50% and plots how long the tasks completed at that rate are. (There's also a version of the graph with a 20% failure rate, but that's not the one people typically cite.)
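As I understand it, a 50% time horizon of that kind is estimated by fitting a curve relating success probability to (log) human task length and reading off the length at which predicted success crosses 50%. Here is a minimal sketch of that idea, with made-up data and no claim to match METR's actual code or numbers:

```python
# Minimal illustration (made-up data): estimate a "50% time horizon" by fitting
# success probability as a logistic function of log(human task length).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-task results: how long each task takes a human (minutes)
# and whether the model succeeded on it.
task_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])
succeeded = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 0])

# Fit P(success) as a logistic function of log task length.
X = np.log(task_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, succeeded)

# Predicted success crosses 50% where the logistic's linear term is zero.
b0, b1 = clf.intercept_[0], clf.coef_[0][0]
horizon_minutes = np.exp(-b0 / b1)
print(f"Estimated 50% time horizon: about {horizon_minutes:.0f} minutes")
```

The failure rate is fixed by construction; what the graph tracks over time is the task length at which that fixed rate is reached.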
I also think automatically graded tasks used in benchmarks don't usually deserve to be called "software engineering" or anything that implies that the actual tasks the LLM is doing are practically useful, economically valuable, or could actually substitute for tasks that humans get paid to do.
I think many of these LLM benchmarks measure problems so narrow and so toy-like, seemingly selected largely to make the benchmarks easier for LLMs, that they aren't particularly meaningful.
Studies of real-world performance, like METR's study of human coders using an AI coding assistant, are much more interesting and important. Although I find most LLM benchmarks practically meaningless for measuring AGI progress, I think practical performance in economically valuable contexts is much more meaningful.
My point in the above comment was just that an unambiguously useful AI coding assistant would not by itself be strong evidence for near-term AGI. AI systems mastering games like chess and Go is impressive and interesting and probably tells us something about AGI progress, but if someone had pointed to AlphaGo beating Lee Sedol as strong evidence that AGI would be created within 7 years of that point, they would have been wrong.
In other words, progress in AI probably tells us something about AGI progress, but taking impressive results in AI and saying they imply AGI within 7 years isn't correct, or at least it's unsupported. Why 7 years and not 17 years, 77 years, or 177 years?
If you can assume whatever rate of progress you like, you can support any timeline you like with any evidence you like, but, in my opinion, that's no way to make an argument.
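To make that concrete with a toy calculation (all numbers below are hypothetical, chosen only to show how sensitive the extrapolation is to the assumed rate): starting from the same current capability level, different assumed doubling times imply very different arrival dates.

```python
# Toy extrapolation (hypothetical numbers): how the assumed doubling time of
# task horizons drives the implied "years until AGI-level tasks".
import math

current_horizon_hours = 1.0     # assumed current 50% task horizon
target_horizon_hours = 2000.0   # assumed horizon treated as "AGI-level" (~a work-year)

doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)
for doubling_months in (4, 7, 14, 28):  # assumed doubling times
    years = doublings_needed * doubling_months / 12
    print(f"doubling every {doubling_months:>2} months -> ~{years:.0f} years to target")
```

With those made-up inputs, the same starting point implies anywhere from roughly 4 to 26 years depending only on the doubling time you assume, which is why the choice of "7 years" needs its own justification.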
On the topic of betting and investing, it's true that index funds have exposure to AI, and indeed personally I worry about how much exposure the S&P 500 has (global index funds that include small-cap stocks have less, but I don't know how much less). My argument in the comment above is simply that if someone thought it was rational to bet some amount of money on AGI arriving within 7 years, then surely it would be rational to invest that same amount of money in a 100% concentrated investment in AI and not, say, the S&P 500.