I’m also a little surprised you think that modeling when we will have systems using similar compute as the human brain is very helpful for modeling when economic growth rates will change. (Like, for sure someone should be doing it, but I’m surprised you’re concentrating on it much.) As you note, the history of automation is one of smooth adoption. And, as I think Eliezer said (roughly), there don’t seem to be many cases where new tech was predicted based on when some low-level metric would exceed the analogous metric in a biological system. The key threshold for recursive feedback loops (*especially* compute-driven ones) is how well they perform on the relevant tasks, not all tasks. And the way in which machines perform tasks usually looks very different than how biological systems do it (bird vs. airplanes, etc.).
If you think that compute is the key bottleneck/driver, then I would expect you to be strongly interested in what the automation of the semiconductor industry would look like.
I’m also a little surprised you think that modeling when we will have systems using similar compute as the human brain is very helpful for modeling when economic growth rates will change.
In this post, when I mentioned human brain FLOP, it was mainly used as a quick estimate of AGI inference costs. However, different methodologies produce similar results (generally within 2 OOMs). A standard approximation puts a forward pass at about 2*N FLOP per token, where N is the number of parameters (the more familiar 6*N figure also counts the backward pass, so it applies to training). Currently the largest language models are estimated to have between 100 billion and 1 trillion parameters, which would work out to roughly 2e11 to 2e12 FLOP per forward pass, or 6e11 to 6e12 if you use the 6*N figure as a conservative upper bound.
The Chinchilla scaling law suggests that inference costs will grow at about half the rate of training compute costs in log terms, since the compute-optimal parameter count scales roughly with the square root of training compute. If we take the estimate of 10^32 training FLOP for TAI (in 2023 algorithms) that I gave in the post, which was itself partly based on the Direct Approach, then we’d expect inference costs to grow to something like 1e15-1e16 FLOP per forward pass. I expect subsequent algorithmic progress will bring this figure down, depending on how much of that progress translates into data efficiency vs. parameter efficiency. A remaining uncertainty here is how a single forward pass for a TAI model will compare to one second of inference for humans, although I’m inclined to think that they’ll be fairly similar.
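As a rough sanity check on these figures, here is a minimal Python sketch. It assumes the common rules of thumb that training compute is C ≈ 6*N*D and that Chinchilla-optimal training uses about D ≈ 20*N tokens. Neither constant comes from the post itself, and the function names are purely illustrative. It reports per-token cost under both a 2*N (forward pass only) and a 6*N (forward plus backward) convention.

```python
import math

# Assumptions (rules of thumb, not taken from the post):
TRAIN_FLOP_PER_PARAM_TOKEN = 6.0  # training compute C ~ 6 * N * D
TOKENS_PER_PARAM = 20.0           # Chinchilla-optimal data D ~ 20 * N


def compute_optimal_params(train_flop):
    """Solve C = 6 * N * (20 * N) for N, the compute-optimal parameter count."""
    return math.sqrt(train_flop / (TRAIN_FLOP_PER_PARAM_TOKEN * TOKENS_PER_PARAM))


def flop_per_token(n_params, flop_per_param):
    """Per-token cost: use 2.0 for forward only, 6.0 for forward plus backward."""
    return flop_per_param * n_params


if __name__ == "__main__":
    # Current large models: 100 billion to 1 trillion parameters.
    for n in (1e11, 1e12):
        print(f"N={n:.0e}: 2N={flop_per_token(n, 2.0):.1e}, 6N={flop_per_token(n, 6.0):.1e}")

    # Hypothetical TAI training run of 1e32 FLOP.
    n_tai = compute_optimal_params(1e32)
    print(f"N={n_tai:.2e}: 2N={flop_per_token(n_tai, 2.0):.1e}, 6N={flop_per_token(n_tai, 6.0):.1e}")

    # Square-root scaling: 10x more training compute gives ~3.16x more parameters.
    print(f"growth per 10x compute: {compute_optimal_params(1e33) / n_tai:.2f}x")
```

Under these assumptions, 10^32 training FLOP corresponds to roughly 9e14 parameters, which puts the per-token cost at roughly 2e15 to 5e15 FLOP, inside the 1e15-1e16 range. A 10x increase in training compute raises the parameter count by only about 3.2x, which is the "half the rate" point in log terms.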
And, as I think Eliezer said (roughly), there don’t seem to be many cases where new tech was predicted based on when some low-level metric would exceed the analogous metric in a biological system. [...] And the way in which machines perform tasks usually looks very different than how biological systems do it (bird vs. airplanes, etc.).
From Birds, Brains, Planes, and AI:
This data shows that Shorty [hypothetical character introduced earlier in the post] was entirely correct about forecasting heavier-than-air flight. (For details about the data, see appendix.) Whether Shorty will also be correct about forecasting TAI remains to be seen.
In some sense, Shorty has already made two successful predictions: I started writing this argument before having any of this data; I just had an intuition that power-to-weight is the key variable for flight and that therefore we probably got flying machines shortly after having comparable power-to-weight as bird muscle. Halfway through the first draft, I googled and confirmed that yes, the Wright Flyer’s motor was close to bird muscle in power-to-weight. Then, while writing the second draft, I hired an RA, Amogh Nanjajjar, to collect more data and build this graph. As expected, there was a trend of power-to-weight improving over time, with flight happening right around the time bird-muscle parity was reached.
I listed this example in my comment, it was incorrect by an order of magnitude, and it was a retrodiction. “I didn’t look up the data on Google beforehand” does not make it a prediction.
Yeah sorry, I didn’t mean to say this directly contradicted anything you said. It just felt like a good reference that might be helpful to you or other people reading the thread. (In retrospect, I should have said that and/or linked it in response to the mention in your top-level comment instead.)
(Also, personally, I do care about how much effort and selection is required to find good retrodictions like this, so in my book “I didn’t look up the data on Google beforehand” is relevant info. But it would have been way more impressive if someone had been able to pull that off in 1890, and I agree this shouldn’t be confused for that.)
Re “it was incorrect by an order of magnitude”: that seems fine to me. If we could get that sort of precision for predicting TAI, that would be awesome and outperform any other prediction method I know about.