Whereas many people in EA seem to think the probability of AGI being created within the next 7 years is 50% or more, I think that probability is significantly less than 0.1%.
Are you willing to bet on this?
In principle, yes, but in a typical bet structure, there is no upside for the person taking the other side of that bet, so what would be the point of it for them? I would gladly accept a bet where someone has to pay me an amount of money on January 1, 2033 if AGI isn’t created by then (and vice versa), but why would they accept that bet? There’s only downside for them.
Sometimes these bets are structured as loans. As in, I would loan someone money and they would promise to pay me that money back plus a premium after 7 years. But I don’t want to give a stranger from another country a 7-year loan that I wouldn’t be able to compel them to repay once the time is up. From my point of view, that would just be me giving a cash gift to a stranger for no particularly good reason.
There is Long Bets, which is a nice site, but since everything goes to charity, it’s largely symbolic. (Also, the money is paid up by both sides in advance, and the Long Now Foundation just holds onto it until the bet is resolved. So, it’s a little bit wasteful in that respect. The money is tied up for the duration of the bet and there is a time value of money.)
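To put a rough number on that last point, here is an illustrative calculation with made-up figures (a hypothetical $1,000 stake and an assumed 4% risk-free rate):

```python
# Rough illustration (made-up numbers): the opportunity cost of money
# escrowed for the duration of a Long Bets-style wager.
stake = 1_000          # each side's stake in dollars (hypothetical)
years = 7              # duration of the bet
risk_free_rate = 0.04  # assumed annual risk-free return

# What the stake would have grown to if invested instead of escrowed.
future_value = stake * (1 + risk_free_rate) ** years
opportunity_cost = future_value - stake

print(f"Foregone growth on a ${stake:,} stake over {years} years: "
      f"${opportunity_cost:,.2f}")  # about $316 at 4%/year
```

So each side effectively pays a few hundred dollars per $1,000 staked just to keep the bet open for 7 years.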
Right, I’d forgotten that betting on this is hard. I was thinking about whether one could do a sort of cross-over between an end-of-the-world bet and betting a specific proportion of one’s net worth. This is the most fleshed-out proposal I’ve seen so far.
“But I don’t want to give a stranger from another country a 7-year loan that I wouldn’t be able to compel them to repay once the time is up.”

I wonder if this could be solved via a trusted third party who knows both bettors. (I think there are possible solutions here via blockchains, e.g. an escrow that either party can unilaterally destroy, but I’d guess that would become quite complicated, wouldn’t be worth the setup, and would rely on a technology I suspect you’re skeptical of anyway.)
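To make the mechanics concrete, here is a toy sketch of that escrow idea. The class and its methods are hypothetical, a state-machine sketch rather than real smart-contract code:

```python
# Toy escrow sketch (hypothetical API, not real smart-contract code):
# both bettors deposit up front, a trusted arbiter releases the pot to
# the winner, and either bettor can unilaterally destroy the escrow so
# that neither side can profit by stonewalling.
class BetEscrow:
    def __init__(self, bettor_a: str, bettor_b: str, arbiter: str, stake: int):
        self.parties = {bettor_a, bettor_b}
        self.arbiter = arbiter
        self.pot = 2 * stake  # both sides fund the escrow in advance
        self.destroyed = False

    def release(self, caller: str, winner: str) -> int:
        # Only the arbiter can pay out, and only to one of the bettors.
        if self.destroyed:
            raise RuntimeError("escrow was destroyed")
        if caller != self.arbiter or winner not in self.parties:
            raise PermissionError("only the arbiter can release, to a bettor")
        pot, self.pot = self.pot, 0
        return pot

    def destroy(self, caller: str) -> None:
        # Either bettor can burn the pot unilaterally; this removes any
        # incentive to refuse to settle, since defecting gains nothing.
        if caller not in self.parties:
            raise PermissionError("only a bettor can destroy the escrow")
        self.pot, self.destroyed = 0, True

# Usage: both parties fund it, then the arbiter settles (or a bettor burns it).
escrow = BetEscrow("alice", "bob", "carol", stake=1_000)
payout = escrow.release(caller="carol", winner="alice")  # 2000 to the winner
```

The unilateral destroy is what substitutes for legal enforcement: neither bettor can walk away with the other’s money, and the worst case for an honest party is losing their own stake.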
You could bet on shorter-term indicators, e.g. whether the METR trend will stop or accelerate.
Are you referring to the length of tasks that LLMs are able to complete with a 50% success rate? I don’t see that as a meaningful indicator of AGI. Indeed, I would say it’s practically meaningless. It just doesn’t make sense as an indicator of progress toward AGI, and I find it strange that anyone thinks otherwise. Why should we see that metric as indicating AGI progress any more than, say, the length of LLMs’ context windows?
I think a much more meaningful indicator from METR would be the rate at which AI coding assistants speed up coding tasks for human coders. Currently, METR’s finding is that they slow coders down by 19%. But this evidence is asymmetric. Failing to clear a low bar, like being an unambiguously useful coding assistant in such tests, is strong evidence against models nearing human-level capabilities, but clearing a low bar is not strong evidence for models nearing human-level capabilities. By analogy, we might take an AI system being bad at chess as evidence that it has much less than human-level general intelligence. But we shouldn’t take an AI system (such as Deep Blue or AlphaZero) being really good at chess as evidence that it has human-level or greater general intelligence.
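To see why the evidence is asymmetric, here is a toy Bayesian calculation. All the likelihoods are made up; only the direction of the update matters:

```python
# Toy Bayesian illustration of the asymmetry (all likelihoods made up).
# H = "models are nearing human-level capability".
# E = "the model clears the low bar of being a clearly useful coding assistant".
p_E_given_H = 0.99      # a near-human-level system almost surely clears a low bar
p_E_given_not_H = 0.60  # but many far-from-human-level systems clear it too

def posterior(prior: float, lik_h: float, lik_not_h: float) -> float:
    """Posterior P(H | evidence) by Bayes' rule."""
    joint_h = prior * lik_h
    joint_not_h = (1 - prior) * lik_not_h
    return joint_h / (joint_h + joint_not_h)

prior = 0.10
# Clearing the low bar: weak evidence for H (posterior barely moves).
print(posterior(prior, p_E_given_H, p_E_given_not_H))          # ~0.155
# Failing the low bar: strong evidence against H (posterior collapses).
print(posterior(prior, 1 - p_E_given_H, 1 - p_E_given_not_H))  # ~0.003
```

Clearing a bar that far-from-human-level systems also clear moves the posterior very little, but failing a bar that a human-level system would almost certainly clear moves it a lot.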
If I wanted to settle for an indirect proxy for progress toward AGI, I could short companies like Nvidia, Microsoft, Google, or Meta (e.g. see my recent question about this), but, of course, those companies’ stock prices don’t directly measure AGI progress. Conversely, someone who wanted to take the other side of the bet could take a long position in those stocks. But then this isn’t much of an improvement on the above. If LLMs became much more useful coding assistants, that could help justify these companies’ stock prices, but it wouldn’t say much about progress toward AGI. Likewise for other repetitive, text-heavy tasks, like customer support via web chat.
It seems like the flip side should be different: if you do think AGI is very likely to be created within 7 years, shouldn’t that imply a long position in stocks like Nvidia, Microsoft, Google, or Meta would be lucrative? In principle, you could believe that LLMs are some number of years away from being able to make a lot of money and at most 7 years away from progressing to AGI, and that the market will give up on LLMs making a lot of money just a few years too soon. But I would find this to be a strange and implausible view.
So, to be clear, you think that if LLMs continue to complete software engineering tasks of exponentially increasing length at an exponentially decreasing risk of failure, then that tells us nothing about whether LLMs will reach AGI?
I expect most EAs who have enough money to consider investing to already be investing it in index funds, which, by design, are already long the Magnificent Seven.
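As a rough illustration (the index weight here is an assumption for the sake of the example, not a live figure):

```python
# Illustrative only: how much Magnificent Seven exposure an index holding
# gives, versus a fully concentrated AI position. The weight is assumed.
portfolio = 100_000          # dollars invested (hypothetical)
mag7_weight_in_index = 0.30  # assumed Mag7 share of a cap-weighted US index

index_exposure = portfolio * mag7_weight_in_index  # $30,000
concentrated_exposure = portfolio                  # $100,000

print(f"Mag7 exposure via the index fund: ${index_exposure:,.0f}")
print(f"Exposure via a concentrated bet:  ${concentrated_exposure:,.0f}")
```

So an index holder is already making a sizeable implicit bet on these companies, just not a maximal one.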
I’m not sure if you’re asking about the METR graph on task length or about the practical use of AI coding assistants, which the METR study found is currently negative.
If I understand it correctly, the METR graph doesn’t measure an exponentially decreasing failure rate, just the task length at a fixed 50% failure rate. (There’s also a version of the graph with a 20% failure rate, but that’s not the one people typically cite.)
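To illustrate what the graph does and doesn’t measure, here is a toy model with made-up parameters, where success probability falls off logistically in log task length. The “horizon” is just where that curve crosses whichever threshold you pick:

```python
import math

# Toy model (made-up parameters): success probability falls off
# logistically in log task length. A METR-style "horizon" is the task
# length at which success crosses a chosen threshold; a 50% threshold
# and an 80% threshold give different horizons for the same model.
def p_success(task_minutes: float, mid: float = 60.0, slope: float = 1.5) -> float:
    """Success probability under a logistic curve in log task length."""
    return 1 / (1 + math.exp(slope * math.log(task_minutes / mid)))

def horizon(threshold: float, mid: float = 60.0, slope: float = 1.5) -> float:
    """Task length (minutes) at which success probability equals `threshold`."""
    return mid * math.exp(math.log(1 / threshold - 1) / slope)

print(p_success(60))  # 0.5 by construction
print(horizon(0.5))   # 60.0  (the 50%-success horizon)
print(horizon(0.8))   # ~23.8 (the 80%-success horizon is much shorter)
```

The failure rate at the reported horizon is fixed at 50% by definition; what grows over time in the METR graph is the task length at which that fixed failure rate occurs.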
I also think automatically graded tasks used in benchmarks don’t usually deserve to be called “software engineering” or anything that implies that the actual tasks the LLM is doing are practically useful, economically valuable, or could actually substitute for tasks that humans get paid to do.
I think many of these LLM benchmarks measure such narrow capabilities on such toy problems (largely selected, it seems, to make the benchmarks easier for LLMs) that they aren’t particularly meaningful.
Studies of real-world performance, like METR’s study of human coders using an AI coding assistant, are much more interesting and important. Although I find most LLM benchmarks practically meaningless for measuring AGI progress, I think practical performance in economically valuable contexts is much more meaningful.
My point in the above comment was just that an unambiguously useful AI coding assistant would not by itself be strong evidence for near-term AGI. AI systems mastering games like chess and Go is impressive and interesting and probably tells us something about AGI progress, but if someone had pointed to AlphaGo beating Lee Sedol as strong evidence that AGI would be created within 7 years of that point, they would have been wrong.
In other words, progress in AI probably tells us something about AGI progress, but taking impressive results in AI and saying they imply AGI within 7 years isn’t correct, or at least it’s unsupported. Why 7 years and not 17 years or 77 years or 177 years?
If you assume whatever rate of progress you like, that will support any timeline you like based on any evidence you like, but, in my opinion, that’s no way to make an argument.
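A toy calculation makes the point. The size of the capability gap and the candidate rates below are all made-up numbers:

```python
import math

# Toy calculation: if "AGI" requires closing some capability gap, the
# assumed rate of progress alone determines the arrival date.
gap_factor = 1_000  # assume AGI needs tasks ~1,000x longer than today's horizon

for doubling_time_years in (0.5, 1, 2, 5, 10):
    years_to_agi = math.log2(gap_factor) * doubling_time_years
    print(f"doubling every {doubling_time_years:>4} yr -> AGI in {years_to_agi:5.1f} yr")

# log2(1000) is about 10 doublings, so the same starting point yields
# ~5, ~10, ~20, ~50, or ~100 years depending entirely on the assumed rate.
```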
On the topic of betting and investing, it’s true that index funds have exposure to AI, and indeed personally I worry about how much exposure the S&P 500 has (global index funds that include small-cap stocks have less, but I don’t know how much less). My argument in the comment above is simply that if someone thought it was rational to bet some amount of money on AGI arriving within 7 years, then surely it would be rational to invest that same amount of money in a 100% concentrated investment in AI and not, say, the S&P 500.
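To spell out that implication with a toy Kelly-criterion calculation (all figures hypothetical): someone who genuinely assigned a 50% probability to transformative AI within 7 years, and expected an AI-concentrated portfolio to multiply in that scenario, should on this logic hold a very large concentrated position:

```python
# Toy Kelly sketch (all numbers made up): the growth-optimal allocation
# for someone who believes p = 0.5 that the AI scenario pans out, with an
# AI-concentrated portfolio gaining +300% in that world and losing 80%
# otherwise.
p = 0.5     # believer's probability of the AI scenario (hypothetical)
gain = 3.0  # portfolio multiple gained in the AI scenario (+300%)
loss = 0.8  # fraction of the position lost if the thesis fails (-80%)

# Kelly fraction for a bet with asymmetric partial gain/loss:
# maximize p*ln(1 + f*gain) + (1-p)*ln(1 - f*loss) => f* = p/loss - (1-p)/gain
f_star = p / loss - (1 - p) / gain
print(f"Kelly-optimal fraction of wealth: {f_star:.1%}")  # ~45.8%
```

Nothing close to that allocation follows from simply holding an index fund, which is the tension I’m pointing at.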