[Some of my high-level views on AI risk.]
[I wrote this for an application a couple of weeks ago, but thought I might as well dump it here in case someone was interested in my views. / It might sometimes be useful to be able to link to this.]
[In this post I generally state what I think before updating on other people’s views – i.e., what’s sometimes known as ‘impressions’ as opposed to ‘beliefs.’]
Summary
Transformative AI (TAI) – the prospect of AI having impacts at least as consequential as the Industrial Revolution – would plausibly (~40%) be our best lever for influencing the long-term future if it happened this century, which I consider to be unlikely (~20%) but worth betting on.
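For concreteness, these two estimates jointly imply an unconditional credence of roughly 8% that TAI both arrives this century and is our best lever:

$$P(\text{TAI this century}) \times P(\text{best lever} \mid \text{TAI this century}) \approx 0.2 \times 0.4 = 0.08$$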
The value of TAI depends not just on the technological options available to individual actors, but also on the incentives governing the strategic interdependence between actors. Policy could affect both the amount and quality of technical safety research and the ‘rules of the game’ under which interactions between actors will play out.
Why I’m interested in TAI as a lever to improve the long-run future
I expect my perspective to be typical of someone who has become interested in TAI through their engagement with the effective altruism (EA) community. In particular,
My overarching interest is to make the lives of as many moral patients as possible go as well as possible, no matter where or when they live; and
I think that in the world we find ourselves in – it could have been otherwise – this goal entails strong longtermism, i.e. the claim that “the primary determinant of the value of our actions today is how those actions affect the very long-term future.”
Some less standard (though not highly unusual within EA) high-level views I hold more tentatively:
The indirect long-run impacts of our actions are extremely hard to predict and don’t ‘cancel out’ in expectation. In other words, I think that what Greaves (2016) calls complex cluelessness is a pervasive problem. In particular, evidence that an action will have desirable effects in the short term generally is not a decisive reason to believe that this action would be net positive overall, and neither will we be able to establish the latter through any other means.
Increasing the relative influence of longtermist actors is one of the very few strategies we have good reasons to consider net positive. Shaping TAI is a particularly high-leverage instance of this strategy, where the main mechanism is reaping an ‘epistemic rent’ from having anticipated TAI earlier than other actors. I take this line of support to be significantly more robust than any particular story about how TAI might pose a global catastrophic risk, including even broad operationalizations of the ‘value alignment problem.’
My empirical views on TAI
I think the strongest reasons to expect TAI this century are relatively outside-view-based (I talk about this century just because I expect that later developments are harder to predictably influence, not because I think a century is a particularly meaningful time horizon or because I think TAI would be less important later):
We’ve been able to automate an increasing number of tasks (with increasing performance and falling cost), and I’m not aware of a convincing argument for why we should be highly confident that this trend will stop short of full automation – i.e., AI systems being able to perform all tasks more cost-effectively than humans – despite moderate scientific and economic incentives to find and publish one.
Independent types of weak evidence such as trend extrapolation and expert surveys suggest we might achieve full automation this century.
Incorporating full automation into macroeconomic growth models predicts – at least under some assumptions – a sustained higher rate of economic growth (e.g. Hanson 2001, Nordhaus 2015, Aghion et al. 2017); such growth arguably was the main driver of the welfare-relevant effects of the Industrial Revolution. (See the toy sketch after this list.)
Accelerating growth this century is consistent with extrapolating historical growth rates, e.g. Hanson (2000[1998]).
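To make the growth-model point concrete, here is a minimal toy sketch – my own illustration, not the model of any of the cited papers, with arbitrary parameter values – contrasting a standard Cobb-Douglas economy, where diminishing returns to capital cap long-run growth, with an ‘AK’ economy in which AI capital can substitute for all labor:

```python
# Toy contrast: a Solow-style economy with diminishing returns to capital
# vs. an "AK" economy in which AI capital substitutes for all labor.
# This is an illustrative sketch only -- NOT the model of Hanson (2001),
# Nordhaus (2015), or Aghion et al. (2017); all parameters are arbitrary.

def simulate(years=100, s=0.25, delta=0.05, A=1.0, alpha=0.35, L=1.0,
             fully_automated=False):
    """Simulate output Y_t under capital accumulation K' = K + s*Y - delta*K."""
    K = 1.0
    path = []
    for _ in range(years):
        if fully_automated:
            # Full automation: capital performs all tasks, so output is
            # (roughly) linear in capital -- the "AK" production function.
            Y = A * K
        else:
            # Standard Cobb-Douglas production: diminishing returns to K.
            Y = A * K ** alpha * L ** (1 - alpha)
        path.append(Y)
        K += s * Y - delta * K
    return path

baseline = simulate()
automated = simulate(fully_automated=True)

# Diminishing returns: Y converges to a steady state.
# AK economy: Y grows at a sustained rate of roughly s*A - delta per year.
for t in (0, 50, 99):
    print(f"t={t:3d}  baseline Y={baseline[t]:7.2f}  automated Y={automated[t]:12.2f}")
```

In the first regime output converges to a steady state, while in the AK regime it grows at roughly s·A − δ per year indefinitely – the qualitative pattern such models predict under (some assumptions about) full automation.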
I think there are several reasons to be skeptical, but that the above succeeds in establishing a somewhat robust case for TAI this century not being wildly implausible.
My impression is that I’m less confident than the typical longtermist EA in various claims around TAI, such as:
uninterrupted technological progress would eventually result in TAI;
TAI will happen this century;
we can currently anticipate any specific way of positively shaping the impacts of TAI;
if the above three points were true then shaping TAI would be the most cost-effective way of improving the long-term future.
My guess is that this is due to different priors, and to my frequently having found extant specific arguments for TAI-related claims (including by staff at FHI and Open Phil) less convincing than I would have predicted. I still think that work on TAI is among the few best shots for current longtermists.
Awesome post, Max, many thanks for this. I think it would be good if these difficult questions were discussed more on the forum by leading researchers like yourself.
I think you should post this as a normal post; it’s far too good and important to be hidden away on the shortform.
I second Stefan’s suggestion to share this as a normal post – I realize I should have read your shortform much sooner.
Thanks for putting your thoughts together – I only stumbled on this by accident, and I think it would make a great post, too.
I was really surprised by your giving ~20% for TAI this century, and am still curious about your reasoning, because it seems to diverge strongly from that of your peers. Why do you find inside-view-based arguments less convincing? I’ve updated pretty strongly on the deep (reinforcement) learning successes of recent years, and on our growing computational- and algorithmic-level understanding of the human mind. I’ve found AI Impacts’ collection of inside- and outside-view arguments against current AI leading to AGI fairly unconvincing; e.g., the list of “lacking capacities” seems to me (as someone following CogSci, ML, and AI-safety-related blogs) to get a lot of productive research attention.