On footnote 16, you write “For example, the application of Laplace’s law described below implies that there was a 50% chance of AGI being developed in the first year of effort”. But historically, participants in the Dartmouth conference were gloriously optimistic:
“We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”
When you write “I also find that pr(AGI by 2036) from Laplace’s law is too high,” what outside-view consideration are you basing that on? Also, is it really too high?
If you rule out AGI until 2028 (as you do in your report), the Laplace prior gives you 1 - (1 - 1/((2028-1956)+1))^(2036-2028) ≈ 10.4% ≈ 10%, which is well within your range of 1% to 18% and really close to your estimate of 8%.
The point that Laplace’s prior depends on the unit of time chosen is really interesting, but it ends up not mattering once a bit of time has passed. For example, if we use days instead of years, with (days since June 18 1956 = 23660, days until Mar 29 2028 = 2557, days until Jan 1 2036 = 5391), then Laplace’s rule gives a probability of AGI by 2036 of 1 - (1 - 1/(23660+2557+1))^(5391-2557) ≈ 10.2% ≈ 10%, pretty much the same as above.
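A quick sketch of the two calculations above, for anyone who wants to check them. The function names are made up for illustration. The first function freezes the per-trial probability at 1/(trials elapsed + 1), as the formulas above do; the second is the standard rule-of-succession result with a uniform prior, included only for comparison.

```python
def pr_constant_rate(trials_elapsed: int, trials_in_window: int) -> float:
    """Approximation used above: hold the per-trial probability fixed at
    1 / (trials_elapsed + 1) for every trial in the window."""
    p = 1.0 / (trials_elapsed + 1)
    return 1.0 - (1.0 - p) ** trials_in_window


def pr_rule_of_succession(trials_elapsed: int, trials_in_window: int) -> float:
    """Laplace's rule with a uniform prior, updating after each failure:
    pr(no success in next k | n failures) = (n + 1) / (n + k + 1)."""
    n, k = trials_elapsed, trials_in_window
    return 1.0 - (n + 1) / (n + k + 1)


# Trials counted in years: 1956 to 2028 elapsed, 2028 to 2036 in the window.
years_approx = pr_constant_rate(2028 - 1956, 2036 - 2028)
# Trials counted in days, using the day counts quoted above.
days_approx = pr_constant_rate(23660 + 2557, 5391 - 2557)
# Same yearly setup, but with the per-trial probability updated each year.
years_exact = pr_rule_of_succession(2028 - 1956, 2036 - 2028)

print(f"yearly trials (constant rate): {years_approx:.3f}")
print(f"daily trials  (constant rate): {days_approx:.3f}")
print(f"yearly trials (sequential):    {years_exact:.3f}")
```

All three come out at roughly 10%, so the choice of time unit makes little difference at this point.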
It’s fun to see that (1-(1/x))^x converges to 1/e pretty quickly, and that changing from years to days is equivalent to changing from ~(1-(1/x))^(x*r) to ~(1-(1/(365*x)))^(365*x*r), where x is the time passed in years and x*r is the time remaining in years. Both converge pretty quickly to (1/e)^r.
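Written out, the limit behind this observation is as follows. This is a sketch using the same x and r, and taking the per-trial probability of failure to be 1 - 1/x for yearly trials and 1 - 1/(365x) for daily trials.

```latex
\[
\left(1-\frac{1}{x}\right)^{x r}
  = \left[\left(1-\frac{1}{x}\right)^{x}\right]^{r}
  \;\longrightarrow\; \left(e^{-1}\right)^{r} = e^{-r},
\qquad
\left(1-\frac{1}{365x}\right)^{365 x r}
  = \left[\left(1-\frac{1}{365x}\right)^{365x}\right]^{r}
  \;\longrightarrow\; e^{-r}
\qquad (x \to \infty).
\]
\[
\text{So in both cases } \Pr(\text{success in the window}) \approx 1 - e^{-r}.
\quad \text{With } x = 73,\ r = 8/73:\ 1 - e^{-8/73} \approx 0.10 .
\]
```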
It is not clear to me that by adjusting the Laplace prior down when you categorize AGI as a “highly ambitious but feasible technology” you are not updating twice: once on the actual passage of time and again on the fact that AGI seems “highly ambitious”. But one knows that AGI is “highly ambitious” because it hasn’t been solved in the first 65 years.
Given that, I’d still be tempted to go with the Laplace prior for this question, though I haven’t really digested the report yet.
Thanks for these thoughts! You raise many interesting points.
On footnote 16, you write “For example, the application of Laplace’s law described below implies that there was a 50% chance of AGI being developed in the first year of effort”. But historically, participants in the Dartmouth conference were gloriously optimistic
I’m not sure whether the participants at Dartmouth would have assigned 50% to creating AGI within a year and >90% within a decade, as implied by the Laplace prior. But either way I do think these probabilities would have been too high. It’s very rare, perhaps unprecedented, for such transformative tech progress to be made with so little effort. Even listing some of the best examples of quick and dramatic tech progress, I found the average time for a milestone to be achieved was >50 years, and the list omits the many failed projects.
That said, I agree that the optimism before Dartmouth is some reason to use a high first-trial probability (though I don’t think as high as 50%).
The point that Laplace’s prior depends on the unit of time chosen is really interesting, but it ends up not mattering once a bit of time has passed.
Agreed! (Interestingly, it stops mattering only once enough time has passed that Laplace strongly expects AGI to have already happened.) Still, Laplace’s predictions about the initial years of effort do depend on the trial definition: defining a ‘trial’ as 1 day, 1 year, or 30 years gives very different results. I think this shows something is wrong with the rule more generally. The root of the problem is that Laplace assigns a 50% probability to the first trial succeeding no matter how we define a trial. I think my alternative rule, where you choose the trial definition and the first-trial probability in tandem, addresses this issue.
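To illustrate how much the trial definition matters early on, here is a minimal sketch assuming Laplace’s rule with a uniform prior, under which the probability of at least one success in the first n trials is 1 - 1/(n+1). The function name and the 365-day year are illustrative choices.

```python
def pr_success_within(n_trials: int) -> float:
    """Laplace's rule with a uniform prior over the per-trial success
    probability: pr(at least one success in the first n trials)."""
    return 1.0 - 1.0 / (n_trials + 1)


# pr(AGI within the first 30 years of effort) under three trial definitions.
with_30_year_trials = pr_success_within(1)         # one 30-year trial
with_1_year_trials = pr_success_within(30)         # thirty 1-year trials
with_1_day_trials = pr_success_within(30 * 365)    # about 10,950 1-day trials

# pr(AGI within the first decade) when a trial is one year.
first_decade_yearly = pr_success_within(10)

print(f"30-year trials: {with_30_year_trials:.4f}")
print(f"1-year trials:  {with_1_year_trials:.4f}")
print(f"1-day trials:   {with_1_day_trials:.4f}")
print(f"first decade, 1-year trials: {first_decade_yearly:.4f}")
```

The same 30-year period gets a probability of 50%, about 97%, or more than 99.99% depending only on how a trial is defined.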
If you rule out AGI until 2028 (as you do in your report), the Laplace prior gives you 1 - (1 - 1/((2028-1956)+1))^(2036-2028) ≈ 10.4% ≈ 10%, which is well within your range of 1% to 18% and really close to your estimate of 8%.
My estimate of 8% only rules out AGI by the end of 2020. If I rule out AGI by the end of 2028, it becomes ~4%. This is quite a lot smaller than the 10% from Laplace.
The top of my range would be 9%, which is close to Laplace. However, this high end is driven by forecasting that the inputs to AI R&D will grow faster than their historical average, so more trials occur per year. I don’t think such high values would be reasonable without taking these forecasts into account.
When you write “I also find that pr(AGI by 2036) from Laplace’s law is too high,” what outside-view consideration are you basing that on? Also, is it really too high?
I find it too high mostly because it follows from aggressive assumptions about the chance of success in the first few years of effort, but also because of the reference classes discussed in the report.
Another way to justify ruling out Laplace is that if you had a hyper-prior, putting some weight on Laplace and some on more conservative rules, you would put extremely little weight on Laplace by now. (Although I personally wouldn’t put much weight on Laplace even in an initial hyper-prior.)
There’s a counter-intuitive example that illustrates this hyper-prior behaviour nicely. Suppose you assigned 20% to “AGI impossible” and 80% to another prior. If the other prior is Laplace, then your weight on “AGI impossible” rises to 92% by 2020, and you only assign 8% to Laplace. Your pr(AGI by 2036) is 1.6%. By contrast, if you reduce the first-trial probability in Laplace down to 1⁄100 then your weight on “AGI impossible” only rises to 29% by 2020 and your pr(AGI by 2036) is 6.3%. So having a lower first-trial probability ends up increasing pr(AGI by 2036).
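Here is a minimal sketch of this kind of hyper-prior update. It rests on assumptions that are not spelled out above: one trial per calendar year starting in 1956 (64 failed trials by the end of 2020, 16 more by 2036), and a Beta(1, β) prior over the per-trial success probability with β chosen so that the first-trial probability is 1/(1+β). The function and parameter names are hypothetical.

```python
def hyper_prior_update(first_trial_prob: float,
                       prior_impossible: float = 0.2,
                       failed_trials: int = 2020 - 1956,
                       future_trials: int = 2036 - 2020):
    """Return (posterior weight on 'AGI impossible', pr(AGI in future window)).

    Assumes a Beta(1, beta) prior on the per-trial success probability,
    so that pr(n failures in a row) = beta / (beta + n).
    """
    beta = 1.0 / first_trial_prob - 1.0
    # Likelihood of the observed failures if AGI is possible under this rule.
    # (If AGI is impossible, the failures have likelihood 1.)
    like_rule = beta / (beta + failed_trials)
    post_impossible = prior_impossible / (
        prior_impossible + (1.0 - prior_impossible) * like_rule
    )
    # pr(success in the future window | rule, failures so far).
    pr_future_given_rule = 1.0 - (beta + failed_trials) / (
        beta + failed_trials + future_trials
    )
    pr_agi = (1.0 - post_impossible) * pr_future_given_rule
    return post_impossible, pr_agi


post_laplace, pr_laplace = hyper_prior_update(first_trial_prob=1 / 2)
post_100, pr_100 = hyper_prior_update(first_trial_prob=1 / 100)

print(f"Laplace: impossible={post_laplace:.2f}, pr(AGI by 2036)={pr_laplace:.3f}")
print(f"1/100:   impossible={post_100:.2f}, pr(AGI by 2036)={pr_100:.3f}")
```

Under these assumptions the 1⁄100 case reproduces the 29% and 6.3% figures. The Laplace case comes out at roughly 94% and 1.1%, slightly different from the numbers quoted above, so the original calculation presumably counts trials somewhat differently. The qualitative conclusion is the same: the lower first-trial probability gives the higher pr(AGI by 2036).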
It is not clear to me that by adjusting the Laplace prior down when you categorize AGI as a “highly ambitious but feasible technology” you are not updating twice
This is an interesting idea, thanks. I think the description “highly ambitious” would have been appropriate in 1956: AGI would allow automation of ~all labour. In addition, it did seem hard to me to find reference classes supporting first-trial probability values above 1⁄50, and some reference classes I looked into suggest lower values.
That said, it’s possible that my favoured range for the first-trial probability [1⁄100, 1⁄1000] was influenced by my knowledge that we failed to develop AGI. If so, this would have made the range too conservative.