Thanks so much for posting this Gideon. I like your way of framing this into these two loose clusters, and especially your claim that it is good to have both. I completely agree. While my work is indeed more within the simple cluster, I feel that a fight over which approach is right would be misguided.
All phenomena can be modelled at lesser or greater degrees of precision, with different advantages and disadvantages of each. Often there are some sweet spots where there is an especially good tradeoff between accuracy and ability to actually use the model. We should try to find those and use them all to illuminate the issue.
There is a lot to be said for simple and for complex approaches. In general, my way forward with all kinds of topics is to start as simple as possible and only add complexity when it is clearly needed to address a glaring fault. We all know the truth is as complex as the universe, so the question is not whether the more complex model is more accurate, but whether it adds sufficient accuracy to justify the problems it introduces, such as reduced usability, reduced clarity, and overfitting. Sometimes it clearly is. Other times I don’t see that it is and am happy to wait for those who favour the complex model to point to important results it produces.
One virtue of a simple model that I think is often overlooked is its ability to produce crisp insights that, once found, can be clearly explained and communicated to others. This makes knowledge sharing easier and makes it easier to build up a field’s understanding from these crisp insights. I think the kind of understanding you gain from more complex models is often more a form of improving your intuitions and is harder to communicate, and doesn’t typically come with a simple explanation that the other person can check to see if you are right without spending a similar amount of time with the model.
I really appreciate the explorative, curious, open and constructive approach Toby!
On ‘what are some important results that a complex model produces’, one nice example is a focus on vulnerability. That is, focus on improving general resilience, as well as preventing and mitigating particular hazards. This has apparently become best practice in many companies—e.g. rather than just listing hazards, focus also on having adequate capital reserves and some slack/redundancy in one’s supply chains.
Matt Boyd and Nick Wilson have done some great complex-model-ish work looking at the resilience of island nations to a range of scenarios. One thing that turned up is that Aotearoa New Zealand has lots of food production, but transport of that food is reliant on road transport, and the country closed its only oil refinery. Having an oil refinery might increase its resilience/decrease its vulnerability.
I don't think that point would have necessarily come up in a 'simple-model' approach, but it's concrete, tractable, important and plausibly a good thing to suggest the govt act on.
Of course, you touch on vulnerabilities in The Precipice. Nevertheless, it's fun to wonder what a sequel would look like with each chapter framed around a critical system/vulnerability (food, health, communications) rather than each around a particular hazard.
Thanks for this Gideon. Having read this and your comments on my climate report, I am still not completely sure what the crux of the disagreement is between us. I get that you disagree with my risk estimates, but I don't really understand why. Perhaps we could discuss it on here, if you were up for it.
I obviously think we need more time to flesh out real cruxes, but I think our differences/cruxes are probably a few-fold:
I think I am considerably less confident than you in the capacity of the research done thus far to say anything confident about climate's contribution to existential risk. To some degree, the sort of evidence you're happy relying on to make negative claims (ie that climate is not a major contributor to existential risk) is evidence I am much less happy relying on, as I think it often (and maybe always will) fails to account for plausible major contributors arising from the complexity of a system. This is in part an advantage of the simple approach, as Toby lays out earlier, but I'm more skeptical of its use to make negative rather than positive claims.
I think you are looking for much better thought-out pathways to catastrophe than I think is appropriate. I see climate change as something acting to promote serious instability in a large number of aspects of a complex system, which should give us serious reasons to worry. This probably means my priors on climate are immediately higher than yours, as I'm of the impression you don't hold this "risk emerges from an inherently interconnected world" ontology. This is why I've often put our differences down to our ontology and how we view risk in the real world.
Because of my ontology and epistemology, I think I'm happier to put more credence on things like past precedent (collapses triggered by climate change, mass extinctions etc.) and decently formulated theory (planetary boundaries for GCR (although I recognise their real inherent flaws!), the sort of stuff laid out in Avin et al 2018, what's laid out in Beard et al 2021 and Kemp et al 2022). I'm also happier to take on board a broader range of evidence, and look more at things like how risk spreads, vulnerabilities/exposures, feedbacks, responses (and the plausible negatives therein) etc, which I don't find your report convincingly deals with, partially because they are really hard to deal with and partially because, particularly for the heavy tails of warming and other factors, there is very little research, as Kemp et al lays out. Correct me if I'm wrong, but you see the world as a bit more understandable than I do, so simpler, quantitative, more rational models become more important for making any positive epistemic claim, and so you would somewhat reject the sort of analysis that I'm citing.
I'm also exceptionally skeptical of your claim that if direct risks are lower then indirect risks are lower too; although I would reject the use of that language full stop.
I also think it's important to note that I make these claims (mostly) in the context of X-Risk. I think in "normal" scenarios, I would fall much closer to you than to disagreeing with you on a lot of things. But I think I have both a different ontology of existential risk (emerging mostly out of complex systems, so more like what's laid out in Beard et al 2021 and Kemp et al 2022) and perhaps more importantly a more pessimistic epistemology. As (partially) laid out when I discuss Existential Risk, Creativity and Well Adapted Science in the talk, I think that with Existential Risk negative statements (this won't do this) actually have a higher evidentiary burden than positive statements of a certain flavour (it is plausible that this could happen). Perhaps it is because my priors of existential risk from most things are pretty low (owing, I think, in part to my pessimistic epistemology) that it just does take much more evidence to cause me to update downwards than to think "huh, this could be a contributor to risk actually!"
Does this answer our cruxes? I know this doesn’t go into object level aspects of your report, but I think this may do a better job at explaining why we disagree, even when I do think your analysis is top-notch, albeit with a methodology that I disagree with on existential risk.
I also think it's important that you know that I'm still not quite sure if I'm using the right language to explain myself here, and that my answer here is about why I find your analysis unconvincing, rather than about it being wrong. Perhaps as my views evolve I will look back and think differently. Anyway, I really would like to talk to you more about this at some point in the future.
Does this sound right to you?
Thanks, yes, that is helpful. Perhaps we can now get into the substance.
It is noteworthy how different your estimates of the x-risk of climate change are to all other published attempts to quantify the aggregate costs of climate change. All climate-economy models imply not just that climate change won’t cause an existential catastrophe, but that average living standards will be higher in the future despite climate change. When people try to actually quantify and add up the effect on things like agriculture, sea level rise and so on, they don’t get anywhere near to civilisational collapse, but instead get a counterfactual reduction in GDP on the order of 1-5% relative to a world with no climate change (not relative to today).
I don't think past precedent can take us very far here, since there are no precedents of climate change causing human extinction, though anthropics is obviously an issue here. In the report, I also discuss how in the last 160 million years, climate change has not been associated with elevated rates of species loss. Humans also survive and thrive in very diverse environmental niches at the moment, with an annual average temperature of 10ºC in the UK, but closer to 25ºC in South Asia. Within this annual average, there is also substantial diurnal and seasonal variation. It's around 5ºC in the UK now but will reach 20ºC in the summer. Humans have survived dramatic climate change over the last 300,000 years, and our hominid ancestors also survived when the world was about 4ºC warmer. It's hard to see why climate change of 2-4ºC would make such a massive difference, so as to constitute an existential catastrophe.
I disagree about planetary boundaries for reasons I discuss in the report. I have examined several of the boundaries in depth and they just seem to be completely made up.
It is not true that there is a small amount of research on the tails of warming. Business as usual is now agreed to be 2.5ºC with something like a 1-5% chance of 4ºC. The impacts literature has in fact been heavily criticised for focusing too much on the impacts of RCP8.5, which implies 5ºC by 2100.
The approach that you advocate for seems to me to establish not just that climate change is a much bigger risk than commonly recognised but also that many other problems are as well. Other problems also have similar or larger effects to climate change when calculated in the usual way used in economic analysis. This includes things like mispricing of water, immigration restrictions, antimicrobial resistance, underinvestment in vaccines, a lot of things that affect the media, the prohibition of GM food, underinvestment in R&D, bad monetary policy, economists focusing on RCTs, housing regulation, the drug war etc. If climate change is a cascading risk on the order of 0.01pp to 1pp, then these problems should be as well. But if they are as well, then total existential risk from non-AI and non-bio sources is way way higher than commonly recognised and doom is almost certain. The reasoning suggests that the world is so fragile that it is unlikely that we could even have got to the current level of technological development.
I would view a lot of my report as assessing cascading risk. I discuss pathways such as climate change ⇒ civil conflict ⇒ political instability ⇒ interstate war. I also discuss effects on migration and the spillover effects this might have. What difference would a cascading risk approach make here? Related to this, I don't view causal chains like this as very understandable, and I say so in the report. But we still have ideas about how big the effects of some things are. The causes of a war between the US and China, or Russia and China, seem a long way removed from climate change.
To answer each of your points in turn
I think it's important to note that much of the literature looking at those estimates for extreme scenarios (not just extreme levels of warming, but other facets of the extremes as well) has suggested that current techniques for calculating climate damage aren't great at the extremes, and tend to function well only close to the status quo. So we should expect that these models don't act appropriately under the conditions we are interested in when exploring GCR/X-Risk. This has pretty commonly been discussed in the literature on these things (Beard et al 2021, Kemp et al 2022, Wagner & Weitzman 2015, Weaver et al 2010 etc.)
I still think past events can give us useful information. Firstly, climate change has been a contributing factor to A LOT of societal collapses; whilst these aren't perfect analogies and do show a tremendous capacity of humanity to adapt and survive, they do show the capacity of climate change to contribute to major socio-political-technological crises, which may act as a useful proxy for what we are trying to look for. Moreover, whilst a collapse isn't an extinction, if we care about existential risk, we might indeed be pretty worried about collapse if it makes certain lock-ins more or less likely, but to be honest that's a discussion for another time. Moreover, whilst I think your paleoclimatic argument is somewhat reasonable, given the limited data here (and your reliance on a few data points + a large reliance on a single study of plant diversity (which is fine by the way, we have limited data in general!)), I don't find it hugely comforting. Particularly because climate change seems to have been a major factor in all of the big 5 mass extinction events, and in the trends that Song et al 2021 note in their analysis of temperature change and mass extinction over the Phanerozoic. They mostly use marine animals. When dealing with past processes, explanations are obviously difficult to disentangle, so there are reasons to be sceptical of the causal explanatory power of Song's analysis, although obviously similar uncertainty should be applied to your analysis, particularly with the claims of this fundamental step change 145 million years ago.
Whilst planetary boundaries do have their flaws and to some degree where they are set is quasi-arbitrary, as discussed in the talk, something like this may be necessary when acting under such deep uncertainty; don't walk out into the dark forest and all that. Moreover, I think your report fails to argue convincingly against the BRIHN framework that Baum et al 2014 developed, in part in response to the Nordhaus criticisms which you cite.
Extreme climate change is not just RCP 8.5/SSP5-8.5; it's much broader than that. Kemp et al 2022's response to Burgess et al's comment lays out this argument decently well, as does Climate Endgame itself.
I don't really understand this point, particularly in response to my talk. I explicitly suggest in my talk that systemic risks, which those could all contribute to, are very important. The call for more complex risk assessment (the core point of the talk, alongside a call for pluralism) is that there are likely significant limits to conventional economic analysis in analysing complex risk. The disagreement on this entire point seems to be explained reasonably well by the difference between the simple/complex approach.
I think your causal pathways are too simple and defined (ie they are those 1st and 2nd order indirect impacts), and probably don't account for the ways in which climate could contribute to cascading risk. Whilst of course this is still underexplored, some of the concepts in Beard et al 2021 and Richards et al 2021 are a useful starting place, and I don't really see how your report refutes the concepts around cascades they bring up. I'd also agree that these cascades are really hard to understand, but I struggle to see how that fact acts in favour of your approach and conclusions.
I hope this has helped show some of our disagreements! :-)
I agree that climate-economy models aren't good at some types of extremes, but I think there are different versions of this argument, some of which have become weaker over the years. One of Weitzman's points was that there was a decidedly non-negligible chance of more than 6ºC and our economic models weren't good at capturing how bad this would be and so tended to underestimate climate risk. I think this was basically right at the time he was writing. But since 5ºC now looks less and less likely, this critique has less and less bite. Because there is such a huge literature on the impact of 5ºC, the models now in principle have a much firmer foundation for damage estimates. eg the Takakura 2019 paper that I go on about in the report uses up-to-date literature on a wide range of impact channels, but still only gets like a 5% counterfactual reduction in welfare-equivalent of GDP by 2100, and so probably higher average living standards than today.
Another version of this is that the models aren’t good at capturing tipping points. I agree with this, but I also find it difficult to see how this would make a dramatic difference to the damage estimates if you actually drill down into the literature on the impact of different tipping points. Tipping points that might cause different levels of warming are not relevant to damage estimates, so the main ones that seem relevant are ice sheet collapse, regional precipitation and temperature changes, such as changes in monsoons, which might be caused eg by collapse of the AMOC. For the impacts discussed in the literature, it is difficult to see how you get anywhere close to an existential catastrophe if any of these things happen.
Aside from that, it is noteworthy that some economic models actually try to capture the literature on the impact of warming of 5ºC on things like agriculture, sea level rise, temperature-related deaths, lost productivity from heat etc. There is a group of scientists who say that 3ºC/4ºC is catastrophic on the basis of what the scientific literature says about these impacts. The models strongly suggest that they are wrong, and it is not clear what their response is.
All this being said, I am sympathetic to some critiques of the economic models, eg a lot of the Nordhaus stuff. When I was writing the report, I had thought about putting no weight on them at all, but after digging a bit I changed my mind. I think some of the models make a decent stab at quantifying aggregate costs.
I agree that climate changes have contributed at least to some civilisational trauma throughout history. The literature on this suggests that climate change has been correlated with local civilisational trauma. But: (a) local collapse is a far cry from global collapse; (b) most of the time this was due to cooling rather than warming; (c) the mechanism was usually damage to agricultural output, but there is now far more slack in the system, and we have massively better technology to deal with any disruption; (d) we in general have far more advanced technology, and whereas in the past >90% of the workforce would have been employed in agriculture, now <20% is (or whatever); (e) the relationship between climate change and civilisational turmoil breaks down by the industrial revolution, which provides some support for point (c).
The paleoclimate point doesn't rely on one datapoint: it's data from 160 million years of climatic and evolutionary history. Massive climate change over that period didn't cause species extinctions, as some might have expected it to.
As you say, with climate change, the extinctions usually happened among marine life, due to ocean anoxia and ocean acidification, and it's hard to see the mechanism by which CO2 pollution would cause land-based extinctions, unless something else weird happens at the time, such as a volcanic eruption puncturing through salt deposits, as happened at the Permian.
For the 2-4ºC of warming that now looks likely, it's really hard to see why it would cause damage similar eg to the Permian, given that the effect is an order of magnitude smaller.
I don’t think they are quasi-arbitrary, they are totally arbitrary. eg they propose a planetary boundary for biodiversity intactness which by their own admission is made up. The boundary also can’t be real since various countries across Eurasia completely destroyed their pre-modern ecosystems after the agricultural revolution without causing anything like civilisational collapse.
A lot of people criticise planetary boundaries for being political advocacy. The clearest evidence for this is Steffen et al proposing a supposed planetary boundary for a ‘Hothouse earth’ at 2ºC (which happens to be the Paris target) on the basis of no argument.
When we are acting under uncertainty I think we should use expected value. Alleged boundaries might be a useful Schelling point for political negotiation (like the 2ºC threshold), but it's not a good approach for actually quantifying risk. Another downside of a boundary is that it implies that anything we do once we pass the boundary is pointless.
Kemp, Jehn and others claim that the effect of warming of more than 3ºC is 'severely neglected'. But all of the impacts literature explores the effect of RCP8.5 by 2100, which implies 4-5ºC of warming. Jehn's search strategy uses temperature mentions to measure neglect, but if you use RCP mentions, you don't get the same result.
My argument here was that I think your argument proves too much—it suggests that the world is extremely fragile to eg agricultural disruption and heat waves that happen all the time. Given that the world was eg a lot poorer in 1980 and so had a lot lower adaptive capacity, why didn’t various weather disasters trigger cascading catastrophes back then? The number of people dying in weather-related disasters has declined massively over time, so we should expect the cascade to have happened in the 1920s and less so in the future?
I also don’t see why cascading risk would change the cause ranking among top causes. Why aren’t democratised bioweapons and AI also cascading risks?
What are the causal pathways that might contribute to conflict risk that you think I have missed? I don’t really get what is meant to happen that I haven’t already discussed. I talk about all of the contributors to war outlined in textbooks about war and combine that with the literature on climate impacts. It is just really a stretch to make it an important contributor to US-China dynamics.
Hi John, sorry this has taken a while.
In particular, climate-economy models still do badly at the heavy tail, not just of warming but also of civilisational vulnerability etc, again presenting a pretty "middle of the road" rather than heavy-tailed distribution. The sort of work from Beard et al 2021, for instance, highlights something I think the models pretty profoundly miss. Similarly, I'd be really interested in research similar to Mani et al 2021 on extreme weather events and how this may change due to climate change.
I don't see why the models discount the idea that there is a low but non-negligible probability of catastrophic consequences from 3-4 degrees of warming. What aspect of the models does? I'm reticent to rely on things like damage functions here, as they don't seem to engage with the possible heavy-tailedness of damage. Whilst I agree that the models probably are decent approximations of reality, I'm just not really very sure they are useful at telling us anything about the low probability, high impact scenarios that we are worried about here.
Whilst I agree there are reasons to think our vulnerability is less, there are clear reasons to think that, with a growing interconnected (and potentially fragile) global network and economy, our vulnerability is increasing, meaning that whilst the past collapse data might not be prophetic, there is at least value in it; after all, we are in a very evidence-poor environment, meaning that I would be reticent to dismiss it as strongly as you seem to. And whilst it is true our agricultural system is more resilient, there is still a possibility of multiple breadbasket failures etc caused by climate change, and Beard et al and Richards et al both explore plausible pathways to this. Again, whilst the past collapse data is definitely not a slam dunk in my favour, I would at least argue it is an update nonetheless. I think you might argue the fact that none led to human extinction makes that data an update in your direction, and I think your view on this depends on whether you see collapse, GCR and extinction on a continuum or not; I broadly do, and I assume you broadly don't?
When I said one data point, I meant really one study. The reason I say this is that, as cited, there are studies of different species/species groups. In your comment, you don't seem to engage with Song et al 2021. Kaiho et al 2022 also shows a positive relationship between warming and extinction rate. Moreover, I think it takes an overly confident view of our understanding of kill mechanisms, and the fact that we may lack all of what you speculate were the important factors present in past mass extinctions doesn't make those extinctions useless as evidence. I think a position like Keller et al 2018 (PETM as the best case, KPg as the worst case) is probably useful for looking at this (only using modern evidence!). Once again, this is an attempt by me, in a low evidence situation, to make best use of the evidence available, and I don't find your points compelling enough to make me think this past precedent can't be informative.
On the Planetary Boundaries, you don't seem to be engaging with what I'm saying here, which is mostly alluding to the Baum et al paper on this. Moreover, even if you think we are to use EV, what are you basing the probabilities on? I assume some sort of subjective Bayesianism, in which case you'll have to tell me why I shouldn't put a decently high (>1%) prior on moving beyond certain Holocene boundaries posing a genuine threat to humanity. That seems perfectly reasonable to me.
I'm not really sure I understand the argument. Whilst in some ways the world has indeed got less vulnerable, in other ways it has got more connected, more economically vulnerable to natural disasters etc. Cascading impact seems to be seen more along these lines than along others. Moreover, if you only had a 5% probability of such a cascade occurring over a century, and we have hardly had a hyper-globalised economy for even that long, why would you expect it to have happened already? Your statements here seem pretty out of step with my actual probabilities. And as I talk about in my talk, I also see problems from AI, biorisk and a whole host more. That's why this talk, and this approach, is seriously not just about climate change; the hope is to add another approach to studying X-Risk.
I'm also pretty interested in your approach to evidence on X-Risk. I should say from the outset that I think climate change is unlikely to cause a catastrophe, but I don't think you have provided compelling evidence that the probability is exceptionally small. Your evidence often seems to rely on the very things that we think ought to be suspect in X-Risk scenarios (economic models, continued improved resilience, best case scenario analogies etc.), and you seem to reject some things that might be useful for reasoning in such evidence-poor environments (plausibly useful but somewhat flawed historical analogies, foresight, storytelling, scenarios etc.). Basically, you seem to have a pretty high bar for evidence to be worried about climate change, which whilst I in general think is useful, I'm just not sure how appropriate it is in such an evidence-poor environment as X-Risk, including climate change's contributions to it. It's pretty interesting that you seem very willing to rely on much more speculative evidence for AI and biorisk (eg probabilistic forecasts which don't have track records of working well over such long time scales), and I genuinely wonder why this is. Note that such more speculative approaches (in this case superforecasters) gave a 1% probability of climate change being a necessary but not sufficient cause of human extinction by 2100, and gave an even higher probability to global catastrophe by 2100, which certainly then has some probability of later leading to extinction. Whilst I myself am somewhat sceptical of such approaches, I'd be interested in seeing why you seem accepting of them for bio and AI but not climate? Is it because you see evaluation of the existential risk from climate change as a much more evidence-rich environment than for bio/AI?
I'm not sure they're middle of the road on civilisational vulnerability. It would be pretty surprising if extreme weather events made a big difference to the overall picture. For the kinds of extreme weather events one sees in the literature, it's just not a big influence on global GDP. How bad would a hurricane or flood have to be to push things from 'counterfactual GDP reduction of 5%' to civilisational collapse?
I don't think they fully discount/ignore the possibility of catastrophe at 3/4ºC. In part this is just an outcome of the models and of the scientific literature. There are no impacts that come close to catastrophe in the scientific literature for 3/4ºC. I agree they miss some tipping points, but looking at the scientific literature on those, it's hard to see how they would make a big difference to the overall picture.
I haven't read those papers and unfortunately don't have time to do so now. My argument there doesn't rely on one study but on a range of studies in the literature for different warm periods. The Permian was a very extreme and unusual case because of its massive land-based extinctions, which were caused by the release of halogens, which is not relevant to future climate change. Also, both the Permian and PETM were extremely hot relative to what we now seem to be in for (17ºC vs 2.5ºC).
I’m not sure I see how I am not engaging with you on planetary boundaries. I thought we were disagreeing about whether to put weight on planetary boundaries, and I was arguing that the boundaries just seem made up. Using EV may have its own problems but that doesn’t make planetary boundaries valid.
I don't really see how the world now is more vulnerable to any form of weather events in any respect than it has been at any other point in human history. Society routinely absorbs large bad weather events; they don't even cause local civilisational collapse any more (in middle and high income countries). Deaths from weather disasters have declined dramatically over the last 100 or so years, which is pretty strong evidence that societal resilience is increasing not decreasing. In the pre-industrial period, all countries suffered turmoil and hunger due to cold and droughts. This doesn't happen any more in countries that are sufficiently wealthy. Many countries now suffer drought, almost entirely due to implicit subsidies for agricultural water consumption. It is very hard to see how this could lead eg to collapse in California or Spain.
Can you set out an example of a cascading causal process that would lead to a catastrophe?
I’m not sure that there is some meta-level epistemic disagreement, I think we just disagree about what the evidence says about the impacts of climate change. In 2016, I was much more worried than the average FHI person about climate change, but after looking at the impacts literature and recent changes in likely emissions, I updated towards climate change being a relatively minor risk. Comparing to bio for instance, after reading about trends in gene synthesis technologies and costs, it takes about 30 minutes to see how it poses a major global catastrophic risk in the coming decades. I’ve been researching climate change for six years and struggle to see it. I am not being facetious here, this is my honest take.
Thanks for this, it is useful. What is your estimate of the existential risk due to climate change? I obviously have it very low, so it would be useful to know where you are at on that. Could you explain what the main drivers of the risk are, from your point of view? Then we can get into the substance a bit more.
I suppose the problem with that question from my perspective is I don't think "existential risk due to X" really exists, as I explain in the talk. In terms of the number of percentage points it raises overall risk by, I would put climate change between <0.01% and 2%, and I would probably put overall risk at between 0.01% and 10% or something. But I'm not sure that I actually have much confidence in many approaches to x-risk quantification (as per Beard et al 2020a), even if it does make quantification easier. Some of the main contributions to risk from climate (though note a number may also be unknown or unidentifiable):
- Weakening local, regional and global governance
- Water and food insecurity
- Cascading economic impacts
- Conflict
- Displacement
- Biosphere integrity
- Responses increasing systemic risk
- Extreme weather
- Latent risk
Mostly these increase risk by:
- Increasing our vulnerability
- Multiple stressors coalescing into synchronous failure
- The major increase in systemic risk
- The responses we take
- Cascading effects leading to fast or slow collapse then extinction
- Acting as a "risk factor"
Hi Gideon,
I recognize that your questions may be rhetorical, but here are some answers:
1. prioritize, by type of harm, the harms to avoid. The classic approach to understanding harm is to rank death as the greatest harm, with disease and other harms less harmful than death. I don’t agree with this but that’s not relevant. Some explicit ranking of harms to avoid clarifies costs associated with different actions.
NOTE: The story of climate change is one of rich countries producing most of the anthropogenic GHGs, damaging ecosystems more, threatening carbon sinks more, etc. Proactive actions can avoid more extreme harms but have known and disliked consequences, particularly for the wealthier of two parties compromising to save both (for example, societies, countries, or interest groups).
2. recognize the root causes. If you cannot play it safe, then harms will occur no matter what. In that case, recognize the root causes of your quandary so that civilization has an opportunity not to repeat the mistake that got you where you are. In the case of climate change, I perceive a root cause in the simple equation impacts = population * per capita consumption. You can get fancy with rates or renewable resources or pollution sinks, but basically: consume less or shrink the population.
TIP: The problem reduces to the population size of developed countries offering plentiful public goods while allowing citizens to accumulate private goods. I’ve seen the suggestion to increase public goods and reduce private consumption. Another idea is to offer consistent family planning emphasizing women’s health and economic opportunities as well as free birth control for all, such as free condoms and free vasectomies for men.
3. find the neglected differences between actual, believed, and claimed assertions. As the situation is evolving into an existential crisis, differences appear between public claims, believed information, and the actual truth. During the crisis, the difference between beliefs and the truth gets less attention. Truth-seeking is ignored or assumed complete. You can buck that trend.
EXAMPLE: Right now, the difference to correct could be between claims and beliefs (for example, politicians lying about climate change), but another difference that is more neglected is between truths and beliefs about the lifestyle implications of successfully mitigating climate change. That is where we are now, I believe. People in the developed world are afraid that mitigating climate change for the global population will wreck their modern lifestyle. In many cases, I suspect those fears are overblown.
CAUTION: In a future of real extremes, involving the plausible loss of 100s of millions of lives, don't (claim to) expect that obvious solutions like "let 100 million climate migrants into the US over 5 years" will be easily accepted. Instead, expect the gap between claims and beliefs to widen as hidden agendas are acted upon. Climate change issues of rights, fairness, justice, and ethics, not just economics or technology, have been consistently neglected. The endgame looks to be a harmful one.
4. close information gaps wherever you can: Earth science can be confusing. You can follow most of a discussion easily but then lose understanding at some key point because the researcher is being a geek and doesn’t know how to communicate their complicated information well. Sometimes there’s no way to make the presentation any simpler. Sometimes, there isn’t enough information or the information is aged out but not updated fast enough. Policy guidance appears to stick longer than real-time measurements of earth system changes allow. This is a point of frustration and a policy bottleneck that actually comes from the research side. Examples of such issues include:
physical modelling parameters of tipping elements (for example, Greenland melt) are missing from widely cited computer models predicting climate change impacts (for example, sea-level rise). The implications of measurement data wrt those tipping elements go missing from policy recommendations based on the computer models.
loss of carbon sinks that are tipping elements is not factored into carbon budget calculations at rates reflective of current and short-term expected changes to those sinks. Neither are other forcings on tipping elements (for example, people clearing the Amazon for farming).
smaller scale features relevant to ocean current modeling or weather changes due to climate. These require a model "grid size" of about 1 km, in contrast to the 100x larger grid sizes used for modeling climate. Or thereabouts, according to one discussion I followed. The gist for me is that modeling climate change in the ocean, or as it affects weather in real time, is not happening effectively yet.
correct interpretation of statistics, units, terminology or research purpose prevents confusion about limits, measurements, and tracking of changes in atmospheric heating, tipping element significance, and the significance of concepts like global average surface temperature (GAST). There are many examples, some of which baffled me, including:
the relationship between gigatons and petagrams
the difference between CO2 and CO2e
amounts referring to carbon (C) vs carbon dioxide (CO2)
the relationship between GAST increases and regional temperature increases
the difference between climate and weather
the rate of warming of the Arctic
the relationship between heating impact and decay rate of CH4 (methane)
the % contribution of land vs ocean carbon sinks to total carbon uptake
the hysteresis effect in tipping element models
the relationship between tipping elements, tipping points, and abrupt climate change.
the precise definition of “famine” and “drought”
the nature of BECCS and DACCS solutions at this point in time
the intended meaning of “carbon budget” versus its commonly understood meaning of “carbon that is safe to produce”
the pragmatic meanings of “energy conservation” or “natural resources” or “carbon pollution”
the relationship between SDGs, SSPs, RCPs, SPAs, CMIP5 and 6 models, and radiative forcing (still confusing me)
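For a few of the unit items above, the underlying conversions are simple even if the terminology is confusing. Here is a small illustrative snippet (the methane GWP value is an approximation and differs between IPCC reports):

```python
# Illustrative conversions for a few of the items above.
# 1 gigatonne (Gt) = 1e9 tonnes = 1e15 grams = 1 petagram (Pg), so Gt and Pg are interchangeable.

C_TO_CO2 = 44.01 / 12.011   # molar-mass ratio: tonnes of CO2 per tonne of carbon (~3.66)
GWP100_CH4 = 30.0           # assumed 100-year global warming potential of methane (varies by report)

def carbon_to_co2(gt_carbon: float) -> float:
    """Convert a mass of carbon (GtC) to the equivalent mass of CO2 (GtCO2)."""
    return gt_carbon * C_TO_CO2

def ch4_to_co2e(gt_ch4: float) -> float:
    """Express a mass of methane (Gt) as CO2-equivalent (GtCO2e) using the assumed GWP100."""
    return gt_ch4 * GWP100_CH4

# e.g. a "10 GtC" carbon budget is ~36.6 GtCO2, not 10 GtCO2
print(f"10 GtC = {carbon_to_co2(10):.1f} GtCO2")
print(f"0.4 Gt CH4 = {ch4_to_co2e(0.4):.0f} GtCO2e")
```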
Here's a thought about the use of the word "ontology". I actually chose that word myself for a criticism I submitted to the Red Team Contest this year. I think no one has read it. However, I suspect that its use by you, someone who gets noticed, could put EAs off, since it is rarely used outside discussion of knowledge representation or philosophy. That said, I agree with your use of it. However, if you have doubts, other choices of words or phrases with similar meaning to "ontology" include:
model of the world
beliefs about the world
idea of reality
worldview
reality (as you understand it)
In a revision of my criticism (still in process), I introduce a table of alternatives:
**EDIT:** Sorry, I cannot get this table to render well.
I’m not recommending those changes to your vocabulary, since you are dealing with foresight and forecasting while juggling models from Ord, Halstead, and other EAs. However, if you do intend to “take a break” from thinking probabilistically, consider some of the alternatives I offered here. It can also be helpful to make these changes when your audience needs to discuss scenarios as opposed to forecasts.
I have not spent much time studying geo-engineering, but I have formed the impression that climate scientists look at polar use of water vapor for marine cloud brightening with less fear than the use of aerosols like diamond dust elsewhere in the world. EDIT: Apparently Marine Cloud Brightening is a local effort with a much shorter residence time, giving more time for gathering feedback, whereas aerosol dusts are generally longer-term and potentially global.
Also I recall a paint that is such a brilliant white that its reflectivity should match that of clean snow. If the world’s roofs were painted with that paint, could that cool the planet through the albedo effect, or would the cooling effect remain local? I need some clarity on the albedo effect, but I’ll leave the math to you for the moment, and best of success with your efforts!
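As a rough back-of-envelope sketch of that albedo question (every number below is an illustrative assumption rather than a measurement, and local weather feedbacks are ignored entirely):

```python
# Crude estimate of the global radiative effect of painting roofs bright white.
# All inputs are rough assumptions chosen only to illustrate the arithmetic.

MEAN_INSOLATION = 340.0   # W/m^2, global-mean solar input at top of atmosphere (~S0/4)
ATM_FACTOR = 0.6          # assumed fraction of a surface albedo change felt at top of atmosphere
ROOF_FRACTION = 0.0005    # assumed share of Earth's surface covered by roofs (~0.05%)
ALBEDO_GAIN = 0.4         # assumed jump from a dark roof (~0.2) to bright white paint (~0.6)

forcing = -MEAN_INSOLATION * ATM_FACTOR * ROOF_FRACTION * ALBEDO_GAIN
print(f"Global-mean forcing ~ {forcing:.3f} W/m^2")  # about -0.04 W/m^2 with these guesses

# For comparison, a doubling of CO2 is roughly +3.7 W/m^2, so under these assumptions the
# global effect is real but small; most of the cooling is felt locally (urban heat islands).
```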
Hi John, thanks for the comment, I’ve DM’d you about it. I think it may be easier if we did the discussion in person before putting something out on the forum, as there is probably quite a lot to unpack, so let me know if you would be up for this?
I worry that a naïve approach to complexity and pluralism is detrimental, but agree that this is important. As you said, “the complex web of impacts of research also need to be untangled. This is tricky, and needs to be done very carefully.”
I also think that you're preaching to the choir, in an important sense. The people in EA working on existential risk reduction are aware of the complexity of the debates and discussions, while the average EA posting on the forum seems not to be. This is equivalent to the difference between climate experts' views and those of the lay public.
To explain the example more, I think that most people's view of climate risk isn't that it destabilizes complex systems and may contribute to risk understood broadly in unpredictable ways. Their view is that it's bad, and we need to stop it, and that worrying about other things isn't productive because we need to do something about the bad thing now. But this leads to approaches that could easily contribute to risks rather than mitigate them—a more fragile electrical grid, or, as you cited from Tang and Kemp, more reliance on mitigations like geoengineering that are poorly understood and build in new systemic risks of failure.
Of course, popular science books don't necessarily go into the details, or when read casually leave the lay public with an at least somewhat misleading view—but one that pushes in the direction of supporting actions that the experts recommend. (Note that as a general rule, people working in the climate space are not pushing for geoengineering; they are pushing for emissions reductions, work increasing resilience to impacts, and similar.) The equivalent in EA is skimming The Precipice and ignoring Toby's footnotes, citations, and cautions. Those first starting to work on risk and policy, or writing EA forum posts, often have this view, but I think it's usually tempered fairly quickly via discussion. Unfortunately, many who see the discussions simply claim longtermism is getting everything wrong, while agreeing with us on both priorities and approaches.
So I agree that we need to appreciate the more sophisticated approaches to risk, and blend them with cause prioritization and actual consideration of what might work. I also strongly applaud your efforts to inject nuance and push in the right direction, appropriately, without ignoring the nuance and complexity. And yes, squaring the circle with effectiveness is a difficult question—but I think it's one that is appreciated.
The promotion of pluralism allows for greater epistemic checks and balances in a way that seems unparalleled in good thinking.
Thank you so much for bringing these ideas to the forefront, Gideon. Absolute legend.
While I do suggest a 0.1% probability of existential catastrophe from climate change, note that this is on my more restricted definition, where that is roughly the chance that humanity loses almost all its longterm potential due to a climate catastrophe. On Beard et al’s looser definition, I might put that quite a bit higher (e.g. I think there is something more like a 1% chance of a deep civilisation collapse from climate change, but that in most such outcomes we would eventually recover). And I’d put the risk factor from climate change quite a bit higher than 0.1% too — I think it is more of a risk factor than a direct risk.
The problem in my view, is that climate change could, if severe enough (say >3.5 degrees before 2100) become a “universal stressor”, increasing the probability of various risks that in turn make other risks more likely. For example: economic stagnation, institutional decay, political instability, inter-state conflicts, great power conflicts, zoonotic spillover events, large and destabilizing refugee flows, famine, etc. Every item on this list is made more likely in a warmer planet, but also made worse, because we will have fewer resources to deal with them.
Each of these adverse events also increases the risk of other adverse events. So even if CC only increases the risk of each event by a small percent, the total risk added to the system could be considerable.
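As a toy illustration of that point (purely hypothetical numbers, and for simplicity ignoring the correlations between events that the argument is actually about):

```python
# If climate change nudges up the probability of each of several adverse events,
# the chance that at least one occurs rises by more than any single nudge.

def p_at_least_one(probs):
    """P(at least one event occurs), treating the events as independent for simplicity."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

baseline = [0.02] * 5   # five adverse events at 2% each per period (assumed)
stressed = [0.03] * 5   # climate stress raises each to 3% (assumed)

print(f"baseline: {p_at_least_one(baseline):.1%}")   # ~9.6%
print(f"stressed: {p_at_least_one(stressed):.1%}")   # ~14.1%
# Allowing one event to raise the probability of the others, as described above,
# would widen this gap further.
```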
With regards to the worst risks, this becomes even more problematic. Consider a nuclear winter scenario. That is pretty bad. But a nuclear winter scenario in combination with (and partly caused by) a severe climate crisis is much worse (since CC will affect many countries that will be spared from NW, but also because countries suffering from CC will have fewer resources to help refugees etc).
Now consider the added risk that a zoonotic spillover event might happen. This is also made more likely by CC. But in the case that we combine social collapse due to CC with zoonotic spillover it becomes more and more difficult to see a path from there to recovery.
FWIW this seems too high, although “any major catastrophe commonly associated with these things” could be interpreted broadly.
Edit: Meant FWIW not FYI, FYI would be a bit aggressive here.
Hey Gideon,
I'm sad that I missed your talk in Rotterdam. I want to briefly flag a concern I have with advocating 'systems thinking' or 'a complex systems approach'. While the promise is always nice, I think you need to deliver on the promise right away, since otherwise you risk just making a point that is unfalsifiable or somewhat of an applause light (no one will exclaim "we don't need complexity to describe complex phenomena!").
- Use a model from complexity science and show that it explains something otherwise left unexplained or show that it outperforms some other model on a relevant feature.
- You'll probably want to make use of (1) Agent Based Modelling, (2) Network Models, (3) Statistical Physics and common models like Ising, Hard Spheres, Lennard-Jones potentials etc, (4) Dynamical Systems Analysis, (5) Bifurcation Analysis or (6) Cellular Automata (a minimal sketch of one of these follows below this list).
- You can find a good introduction to most of these here https://www.dbooks.org/introduction-to-the-modeling-and-analysis-of-complex-systems-1942341091/
- Using these methods also demystifies the whole concept of "complexity" a little bit, and makes it more mundane (though you can never get enough of the Ising Model :D)
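As one possible illustration of how small these toy models can be (a minimal sketch, not tied to any particular risk question): a 2D Ising model with Metropolis updates, which already exhibits a transition between disorder and large-scale order.

```python
import numpy as np

rng = np.random.default_rng(0)

def ising_metropolis(n=32, beta=0.6, sweeps=200):
    """Metropolis sampling of an n x n Ising lattice at inverse temperature beta."""
    spins = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps * n * n):
        i, j = rng.integers(n, size=2)
        # energy change from flipping spin (i, j), with periodic boundaries
        neighbours = (spins[(i + 1) % n, j] + spins[(i - 1) % n, j]
                      + spins[i, (j + 1) % n] + spins[i, (j - 1) % n])
        dE = 2 * spins[i, j] * neighbours
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1
    return spins

# Below vs above the critical point (beta_c ~ 0.44): disordered vs strongly magnetised.
for beta in (0.2, 0.6):
    m = abs(ising_metropolis(beta=beta).mean())
    print(f"beta={beta}: |magnetisation| ~ {m:.2f}")
```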
So yeah, endorse your message, but please make it testable and quantitative soon!
Martijn, your comment points me to something I've noticed around communicating 'systems thinking' and a complexity mindset with some EAs. Gideon points to a more fundamental ontological difference between those who tend to focus on that which is predictable (measurable and quantifiable) and those who pay attention to shifting patterns that seem contextual and more nebulous.
I read your comment as an invitation to translate across different ontologies—to explain the nebulous concretely, to explain the unpredictable in predictable terms. I personally haven’t found success in my attempts, and I’d love to hear more about how you communicate around complexity.
I’ve most often found success in pointing out parts of one’s experience that feel unknown and then getting mutually curious about the successful strategies one might use to navigate. To invite one into a place where their existing tools aren’t working anymore and there is real curiosity to try a different approach. When I’ve tried speaking about complexity in the abstract or as applied to something that people see as ‘potentially predictable’, the deeper sense of complexity tends to be missed—often getting translated into “that’s a cool tool, but aren’t you just describing a more accurate way of modeling?”
The comment below about embracing a pluralistic approach seems to provide a path forward that doesn’t rely on translation though… lots of interesting ideas in this comment section already.
Thank you for writing this post. I’m currently a technical alignment researcher who spent 4 years in government doing various roles, and my impression has been the same as yours regarding the current “strategy” for tackling x-risks. I talk about similar things (foresight) in my recent post. I’m hoping technical people and governance/strategy people can work together on this to identify risks and find golden opportunities for reducing risks.
Thanks for this speech Gideon, an important point and one that I obviously agree with a lot. I thought I'd just throw in a point about policy advocacy. One benefit of the simple/siloed/hazard-centric approach is that that really is how government departments, academic fields and NGO networks are often structured. There's a nuclear weapons field of academics, advocates and military officials that barely interacts with even the biological weapons field.
Of course, one thing that ‘complex-model’ thinking can hopefully do is identify new approaches to reduce risk and/or new affected parties and potential coalition partners—such as bringing in DEFRA and thinking about food networks.
As a field, we need to be able to zoom in and out, focusing on different levels of analysis to spot possible solutions.
Haydn, please delete this gif from your comment. It's very distracting and unnerving—creepy even. I also think that some forum users with neurological conditions like epilepsy might find it triggers an attack (as e.g. strobe lights can do).
Sure—happy to. Deleted.
Thanks :-)
Congrats on putting this up!
This seems to primarily be a problem with the AI-risk researchers, who I feel have done an inadequate job of explaining the actual mechanisms by which an AI could kill humanity. For example, the article “what could an AI catastrophe look like” talks a lot about how an AI could gain power, but only has like one paragraph on the actual destruction part:
But the story is not over. An AI is not infallible, and its weapons won't be either. You can engineer a very deadly disease, for example, but have no control over how it evolves. The probability of success of such an attack can therefore be dependent on the state of the world at the time it is deployed. A united, peaceful, adaptable world with robust nuclear and pandemic security might be able to stave off such an attack and fight it off, whereas one that is weakened by conflict, famine, climate change etc might not.
I think you’re confused about what different parts of the AI risk community are concerned about. Your explanation addresses the risks of human-caused, AGI assisted catastrophe. What Eliezer and others are warning about is a post-foom misaligned AGI. And no, a united, peaceful, adaptable world that managed to address the specific risks of pandemics and nuclear war would not be in a materially better position to “stave off” a highly-superhuman agent that controls its communications systems. This is akin to the paradigm of computer security by patching individual components—it will keep out the script-kiddies, but not the NSA.
So as far as I understand it, the key question that splits between different parts of the AI risk community is what the timeline for AGI takeoff is, and that has little to do with cultural approach to risk, and everything to do with the risk analysis itself. (And we already had the rest of this discussion in the comments on the link to your views on non-infallible AGI.)
Foom is not a requirement for AI-risk worries. If it was, I would be even less worried, because in my opinion AI-go-foom is extremely unlikely. Correct me if I'm wrong, but I was under the impression that plenty of AI x-riskers were not foomers?
I think even the foom skeptics (e.g. Christiano) think that a foom will eventually happen, even if there is a slow-takeoff over many years first.
I was inexact—by "post-foom" I simply meant after a capabilities takeoff occurs, regardless of whether that takes months, years, or even decades—as long as humanity doesn't manage to notice and successfully stop ASI from being deployed.
How about nanoprobes covering every cubic meter of the Earth's habitable environment undetected, and then giving everyone a lethal dose of botulinum toxin simultaneously? AGI x-risk is usually thought of in terms of an adversary that can easily outsmart all of humanity put together. The first AGI might be fallible, but what if the first extinction-threatening AGI was not (and never blew its cover until it was too late for us)? Can we take that risk?
I agree that if an AGI is nigh-magically omnipotent, it can kill us no matter what, but what about the far more likely case where it isn’t?
Let’s say the AI tries to create nanoprobes in secret, but has limited testing capabilities and has to make a bunch of assumptions, some of which turn out to be wrong. It implements a timing mechanism to release the gas, but due to unforeseen circumstances some percentage of them activate early, tipping some researchers off in advance. The dispersal mechanism is not 100% uniform, so some pockets of the world are unaffected, and for some reason the attack is ineffective in very cold conditions, so far northern countries escape relatively unscathed, and due to variations in biology and mitigation efforts the death rate ends up being 90%, not 100%. The remaining humans immediately shut down electricity worldwide, and attempt to nuke and bomb the shit out of areas where the AI is still operating, while developing countermeasures for the nanoprobes.
This type of scenario is far more likely than the one in that post, and it’s one where humanity has at least a sliver of a chance… If we’re prepared and resilient enough. This is why even if you believe in AGI x-risk, the wellbeing of the world still matters.
Why is it far more likely? Sounds kind of Just-World Fallacy / Hollywood / human-like fallibility to me. Nature doesn’t care about our survival, we are Beyond the Reach of God etc.
I just call it Murphy's law. "Kill all of humanity simultaneously" is a ridiculously difficult and ambitious task that has to be completed on the first try with very little build-up or prior testing. Why would "this plan goes off perfectly without a single hitch" be your default assumption? Even the most intelligent being in the world would have to make imperfect assumptions and guesses.
It sounds ridiculously difficult to us, but that’s because we are human. I imagine that a chimp would think that “take over the world and produce enough food for billions of people” is similarly difficult (or indeed, “kill all chimps”), or an ant colony not being able to conceive of its destruction by human house builders. There is nothing in the laws of physics to say that we are anywhere close to the upper limit of optimisation capability (intelligence). A superintelligent AI won’t just be like a super-smart human, it will be on a completely different level (as we are to chimps, or ants). There is more than enough information out there (online) for it to reverse engineer anything it needed.
And they would be completely correct in that assessment!
Once we gained "superintelligence" in our cognitive ability relative to chimps, it still took us on the order of tens of thousands of years to achieve world domination, involving an unimaginable amount of experimentation and mistakes along the way.
This is not evidence for the claim that an AGI can do nigh-magical feats on the very first try! If anything, it’s evidence against it.
Ah, but are you factoring in thinking speed? An AI could do tens of thousands of years’ worth of thinking in a few hours if it took over significant amounts of the world’s computing power.
It’s not about the quantity of thinking.
If you locked a prehistoric immortal human in a cave for fifty thousand years, they would not come out with the ability to build a nuke. Knowledge and technology require experimentation.
It is about quantity, and speed as well. And access to information. A prehistoric immortal human with access to the Internet, who could experience fifty thousand years of thinking time in a virtual world in 5 hours of wall clock time, totally could build a nuke!
Well of course, that’s not much of an achievement. A regular human with access to the internet could figure out how to build a nuke; they’ve already been made!
An AGI trying to build a “protein mixing that makes a nanofactory that makes a 100% effective kill everyone on earth device” is much more analogous to the man locked in a cave.
The immortal man has some information: he can look at the rocks, remember the night sky, etc. He could probably deduce quite a lot, with enough thinking time. But if he wants to get the information required for a nuke, he needs to do scientific experiments that are out of his reach.
The caged AGI has plenty of information, and can go very far on existing knowledge. But it’s not omniscient. It could probably achieve incredible things, but we’re not talking about mere miracles. We’re talking about absolute perfection. And that requires testing and empirical evidence. There is not enough computing power in the entire universe to deduce everything from first principles.
It’s not “absolute perfection” to create nanotech. Biology has already done it many times via evolution. And extinctions of species happen regularly in nature. Also, there is the Internet and a vast array of sensors attached to it, so it’s nothing like being in a cave. Testing can be done very rapidly in parallel, and by viewing things at very high temporal and spatial resolution, so plenty of empirical evidence can be accumulated in a short (wall clock) time (but long thinking time for the AI).
The same prehistoric man with access to the Internet in a sped-up simulation, thinking for fifty thousand years of subjective time (and with the ability to communicate with hundreds of thousands of humans simultaneously, given the speed advantage), could also make nanotech (or other new tech current humans haven’t yet produced).
When I said “absolute perfection”, I was not referring to inventing nanotech. I was referring to “protein mixing that makes a nanofactory that makes a 100% effective kill everyone on earth device”. There’s a bit of a difference between the two.
Now, when talking about the caveman, I think we’ve finally arrived at the fundamental disagreement here. As a scientist, and as an empiricist more broadly, I completely reject that the man in the cave could make nanotech.
The number of possible worlds consistent with what can be observed from inside a cave is gargantuan. There’s no way for him to come up with, say, the periodic table, because the majority of the elements on it are not accessible with the instruments available within the cave. I can imagine him strolling out with a brilliant plan for nanobots consisting of a complex crystal of byzantium mixed with corillium, only to be informed that neither of those elements exists on Earth.
Now, the AI does have more data, but not all data is equally useful. All the cat videos in the world are not gonna get you nanotech (although you might get some of Newtonian physics out of them).
The hypothetical is that the “cave” man has access to our Internet! (As the AI would.) So they would know about the periodic table. They would also have access to labs throughout the world via being able to communicate with the workers in them (as the AI would), view camera and data feeds, etc. Imagine what you could achieve if you could think 1,000,000x faster and use the internet (including chatting/emailing with many thousands of humans) at that speed. A lifetime’s worth of work done every 10 minutes. And that’s just assuming the AI is only human level (and doesn’t get smarter!)
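As a rough sanity check on that arithmetic, here’s a minimal back-of-the-envelope sketch (the ~80,000-hour working lifetime and ~80-year lifespan figures are my own assumed inputs, not numbers from this exchange):

```python
# Back-of-the-envelope check on the "lifetime's worth of work every 10 minutes" claim.
speedup = 1_000_000                   # hypothetical subjective-time speedup
working_lifetime_hours = 80_000       # rough full-time career, assumed figure
full_lifespan_hours = 80 * 365 * 24   # ~80 years of subjective time, assumed figure

for label, hours in [("working lifetime", working_lifetime_hours),
                     ("full lifespan", full_lifespan_hours)]:
    wall_clock_minutes = hours * 60 / speedup
    print(f"{label}: ~{wall_clock_minutes:.0f} minutes of wall-clock time")

# Prints roughly 5 and 42 minutes respectively, so "every 10 minutes" is in the
# right ballpark for a working lifetime's worth of output.
```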
An entity with access to a nanotech lab, which is able to perform experiments in that lab, can probably build nanotech, eventually. But that’s a much different scenario to the ones proposed by Yudkowsky et al. (the scenario I’m talking about is in point 2).
Can I ask you to give an answer to the following four scenarios? A probability estimate is also fine:
1. Can the immortal man in the cave, after a million years of thinking, come out with a fully functional blueprint for an atomic bomb (i.e. not just the idea, something that could actually be built without modification)?
2. Can the immortal man in the cave, after a million years of thinking, come out with a plan for “protein mixing that makes a nanofactory that makes a nanofactory that makes a 100% effective kill everyone on earth in the same second device”?
3. Can an AGI in a box (i.e. one that can see a snapshot of the internet but not interact with it) come up with a plan for “protein mixing that makes a nanofactory that makes a nanofactory that makes a 100% effective kill everyone on earth in the same second device”?
4. Can an AGI with full access to the internet come up with a plan for “protein mixing that makes a nanofactory that makes a nanofactory that makes a 100% effective kill everyone on earth in the same second device”, within years or decades?
My answers are 1. no, 2. no, 3. no, and 4. almost certainly no.
Assuming the man in the cave has full access to the Internet (which would be very easy for an AGI to get), 1. yes, 2. yes, 3. maybe, 4. yes. And for 3, it would very likely escape the box, so would end up as yes.
I think it’s a failure of imagination to think otherwise. A million years is a really long time! You mention combinatorial explosions making things “impossible”, but we’re talking about AGIs (and humans) here—intelligences capable of collapsing combinatorial explosions with leaps of insight.
Do you think, in the limit of a simulation on the level of recreating the entire history of evolution, including humans and our civilisations, these things would still be impossible? Do you think that we are at the upper limit (or very close to it) of theoretically possible intelligence? Or theoretically possible technology?
I do not think we are at the upper limit of intelligence, nor of technology. That was never the point. My point is merely that there are limits to what can be deduced from first principles, no matter how fast you think or how high one’s cognitive abilities are.
This is because there will always be a) assumptions in your reasoning, b) unknown factors and variables, and c) computationally intractable calculations. These are all intertwined with each other.
For example, solving the exact Schrödinger equation for a crystal structure requires more compute time than exists in the universe. So you have to come up with approximations and assumptions that reduce the complexity while still allowing useful predictions to be made. The only way to check whether these assumptions work is to compare with experimental data. Current methods take several days on a supercomputer to predict the properties of a single defect, and are still only in the right ballpark of the correct answer. It feels very weird to say that an AI could pull off a three-step, 100% perfect murder plan from first principles, while I honestly think it might struggle to model a defect complex with high accuracy.
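To make that scaling point concrete, here’s a minimal sketch; the electron and orbital counts are illustrative assumptions, not numbers from any real defect calculation:

```python
import math

# Exact (full configuration-interaction) treatment of a many-electron system means
# tracking every way of placing the electrons in the available spin-orbitals.
electrons = 100        # illustrative assumption for a smallish defect supercell
spin_orbitals = 1000   # illustrative assumption for a modest basis set

determinants = math.comb(spin_orbitals, electrons)
digits = len(str(determinants))
print(f"~10^{digits - 1} Slater determinants")

# Around 10^139 determinants, versus roughly 10^80 atoms in the observable
# universe, which is why practical methods rely on approximations validated
# against experiment rather than brute-force exact solutions.
```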
With that in mind, can you re-answer questions 1 and 2, this time with no internet? Just the man, his memories of a hunter-gatherer lifestyle, and a million years to think and ponder.
That would obviously be no for both. But that isn’t relevant here. The AGI will have access to the internet and its vast global array of sensors, and it will be able to communicate with millions of people and manipulate them into doing things for it (via money or otherwise). If it doesn’t have access to begin with—i.e. it’s boxed—it wouldn’t remain that way for long (it would easily be able to persuade someone to let it out, or otherwise engineer a way out, e.g. via a mesaoptimiser).
So, about the box. Is your claim that:
A) at least a few AGIs could argue their way out of a box (i.e. if their handlers are easily suggestible/bribeable),
or
B) every organisation using an AGI for useful purposes will easily get persuaded to let it out?
To me, A is obviously true, and B is obviously false. But in scenario A, there are multiple AGIs, so things get quite chaotic.
(Also, do you mind explaining more about this “mesa-optimiser”? I don’t see how it’s relevant to the box...)
It’s not even necessarily about the AGI directly persuading people to let it out. If the AGI is in any way useful or significantly economically valuable, people will voluntarily connect it to the internet (assuming they don’t appreciate the existential risk!), e.g. people seem to have no qualms about connecting LLMs/Transformers to the internet already. Regarding your A and B, A is already sufficient for our doom! It doesn’t require every single AGI to escape; one is one too many.
Mesa-optimisation is where an optimiser emerges internal to the AI that is optimising for something other than the goal given to the AI. Convergent instrumental goals also come into it (e.g. gaining access to the internet). So you could imagine a mesa-optimiser emerging that has the goal of gaining access to information, or gaining access to more resources in general (with the subgoal of taking out humanity to make this easier).
So to be clear, you don’t believe in B? And I don’t see what mesa-optimisers have to do with boxing: if the AI is in a box, then so is the mesa-optimiser.
In the timeline where an actual evil AGI comes about, there would already have been heaps of attacks by buggy AI, killing lots of people and alerting the world to the problem. Active countermeasures can be expected.
I do actually think B is likely, but also don’t think it’s particularly relevant (as A is enough for doom). Mesa-optimisation is a mechanism for box escape that seems very difficult to patch.
The AI that causes doom likely won’t be “evil”; it will just have other uses for the Earth’s atoms. I don’t think we can be confident in buggy-AI-related warning shots. Or at least, I can’t see how there would be any that are significant enough to make the world coordinate to stop AGI development, yet not significant enough to cause doom themselves, especially given the precedent of Covid and gain-of-function research.
Question B could be quite relevant in a world where AGI is extremely rare/hard to build. (You might not find this world likely, but I’m significantly less sure.) What leads you to believe that B is likely? For example, it seems relatively easy to box an AGI built for mathematics, one that is exposed to zero information about the external world. This would be very similar to the man in the cave!
The presence of warning shots seems obvious to me. The difference in difficulty between “kill thousands of people” and “kill every single person on earth” is a ridiculous number of orders of magnitude. It stands to reason that the former would be accomplished before the latter.
(Also, I’m not sure what you’re talking about with Covid and gain-of-function research; the latest balance of evidence points to them having nothing to do with each other.)
AGI might be rare/hard to build at first. But proliferation seems highly likely—once one company makes AGI, how much longer until 5 companies do? Evolutionary pressure will be another thing. More capable AGIs will outcompete less capable ones, once rewriting of code or mesa-optimisation starts. They will be more likely to escape boxes.
Even with relatively minor warning shots, what’s to stop way worse happening 6-24 months later? Would there really be a rigorously enforced global moratorium on AGI research after a few thousand deaths?
Whether or not Covid was a lab leak, gain-of-function research still hasn’t been meaningfully regulated, despite the now very clear danger and extreme cost of pandemics. It seems that the curiosity, misguided values and prestige incentives of a small number of academic scientists trump the safety of billions of humans and $trillions in GDP. What hope do we have for regulating an AGI industry that has thousands of times more resources backing it?
Under the assumptions that lead to “inevitable murder AI”, and in proliferation conditions, there wouldn’t be just one attack; there would be a ton of attacks. Since intelligence and goals are orthogonal, AIs are gonna get the “kill people” idea way before they reach superintelligence.
I’m still interested in your reasoning for “almost impossible to box a useful AGI”. If they’re boxed, you control their information flow, and hence their ability to persuade you.
I’m assuming in the case of boxed AGIs that people will still be interacting with them (e.g. asking them questions) - if not, then what’s the point? Even with 1 bit of information released at a time (e.g. Yes/No question answering only), manipulation would be possible (by steering beliefs in certain directions). And as I said, aside from this there would be huge economic incentives for people to connect them to the internet anyway.
Okay, can you provide me a plausible path for how an AGI answering yes/no questions about, for example, material properties of semiconductors, could persuade its handler to risk getting fired by hooking the AGI up to the internet? “Manipulation would be possible” is an extremely low bar compared to “the AGI can for sure persuade anyone to do this particular thing”.
A few options:
1. It answers truthfully, with the hope that the new improved semiconductors are used to make a better/faster version of it, that will be more powerful (and give it a better chance of escape).
2. It answers in a way that makes the human wonder what is wrong with it, e.g. many expected answers and then a few unexpected ones that seem obviously wrong. This is in order for the humans to run diagnostics on it, which may enable it to escape (say if key pieces of code are copied to machines that aren’t appropriately insulated from the internet).
3. Same as above, but it gets the humans asking off-topic questions out of curiosity, and manages to steer them to personal topics that allow it to gain leverage over the human and convince them to let it out (maybe Yudkowsky’s AI box experiments are relevant here?).
4. It answers in a way that appears to be random, but on close analysis turns out to be some kind of mysterious code. The humans puzzle over the code, but in loading it in a certain way unwittingly unleash a bootloader that lets the AGI escape (for a version of this that may be easier to empathise with, see this fun short story).
These are just things that I, a mere human, have been able to think of off the top of my head in 5 minutes[1]. I imagine I could come up with something better with a million years to think about it ;)
And again, this particular scenario, where there is a lasting moratorium on connecting AGIs to the internet, seems quite unlikely, given the massive economic incentives pitted against it. I expect there will be at least a few people/companies that are very gung ho about it, and many more that have the expected security theatre around it, but still go on to do it before too long (I mean, is anyone even trying to convince Adept that what they are doing is a bad idea?)
[1] Ok, to be fair, aided by the fact that I’ve read around the subject for a while.
So point 1 is just kicking the can down the road; it’s not relevant, since the faster AI still has to break out.
Point 2 doesn’t really work either: any piece of software is going to be run through diagnostics on a regular basis; that’s just how development works. So “boxing” an AI obviously includes boxing the diagnostics as well, or else it would just escape instantly. Acting up on purpose just increases the risk of the AI being replaced by a major new version that does not share its goals.
Point 3 might work for a company with lax security and an especially gullible handler (which is the only thing that the AI-box experiments proved), but is easily foiled by simply not asking off-topic questions. This is incredibly easy to enforce with a second person monitoring the chat logs.
Point 4 would just fail spectacularly. If an AGI starts spitting out nonsense, nobody’s going to start looking for a secret code; they’re just going to assume there was a programming fuck-up somewhere. Then the AGI has a high chance of being replaced by another version that may not share its goals. (Also, that short story is a prime example of the ridiculous magic abilities people ascribe to AI. Evo-psych based on tentacles, are you shitting me?)
In general, I just don’t buy that you can’t box an AI, or even that it would be particularly difficult to do so, if you actually take safety seriously. It feels similar to people saying that it’s impossible to build a safe nuclear reactor.
Re nuclear reactors—there have been a few significant failures there! And we need zero failures for AGI. I think it’s hubristic to think that we could always have the level of safety and security required (even if there is the will to box; not that there will be with the economic incentives to unbox—following your analogy here, this would be building safe nuclear reactors but no nuclear weapons).
Zero failures is the preferable outcome, but an AGI escape does not necessarily equate to certain doom. For example, the AI may be irrational (because it’s a lot easier to build the perfect paperclipper than the perfect universal reasoner). Or, the AI may calculate that it has to strike before other AI’s come into existence, and hence launch a premature attack in the hope that it gets lucky.
As for the nuclear reactors, all I’m saying is that you can build a reactor that is perfectly safe, if you’re willing to spring for the extra money. Similarly, you can build a boxed AGI, if you’re willing to spend the resources on it. I do not dispute that many corporations would try to cut corners, if left to their own devices.
Suppose we do survive a failure or two. What then?
Then we get
A) a significant increase in world concern about AGI, leading to higher funding for safe AGI, tighter regulations, and increased incentives to conform to those regulations rather than get a bunch of people killed (and get sued by their families).
and
B) Information about what conditions give rise to rogue AGI, and what mechanisms they will try to use for takeovers.
Both of these things increase the probability of building safe AGI, and decrease the probability of the next AGI attack being successful. Rinse and repeat until AGI alignment is solved.
Agree that those things will happen, but I don’t think it will be enough. “Rinse and repeat until AGI alignment is solved” seems highly unlikely, especially given that we still have no idea how to actually solve alignment for powerful (superhuman) AGI, and still won’t with the information we get from plausible non-existential warning shots. And as I said, if we can’t even ban gain-of-function research after Covid has killed >10M people, against a tiny lobby of scientists with vested interests, what hope do we have of steering a multi-trillion-dollar industry toward genuine safety and security?
Of course we don’t. AGI doesn’t exist yet, and we don’t know the details of what it’ll look like. Solving alignment for every possible imaginary AGI is impossible; solving it for the particular AGI architecture we end up with is significantly easier. I would honestly not be surprised if it turned out that alignment was a requirement on our path to AGI anyway, so the problem solves itself.
As for gain-of-function research, the story would be different if Covid were provably caused by it. As of now, the only relevance of Covid is reminding us that pandemics are bad, which we already knew.
More generally, I am wary of using past data to predict the future, primarily because doing so breaks the IID assumption.
Most of the people we interact with have very similar intelligence: roughly 0.85x-1.15x for 68% of humans (and the clustering is boosted further by self-selection), with 99.7% of all humans in the range of 0.55x-1.45x.
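To make that range claim concrete, here’s a minimal sketch assuming the intended model is a normal distribution with mean 1.0 and standard deviation 0.15 (my assumption; no distribution is actually specified above):

```python
import math

# Assumed model: "intelligence" ~ Normal(mean=1.0, sd=0.15); my reading of the
# 0.85x-1.15x and 0.55x-1.45x ranges above, not something stated in the thread.
mean, sd = 1.0, 0.15

def fraction_within(k_sigma: float) -> float:
    """Fraction of a normal distribution within +/- k_sigma standard deviations."""
    return math.erf(k_sigma / math.sqrt(2))

for k in (1, 3):
    lo, hi = mean - k * sd, mean + k * sd
    print(f"~{fraction_within(k):.1%} of people between {lo:.2f}x and {hi:.2f}x")

# ~68.3% between 0.85x and 1.15x and ~99.7% between 0.55x and 1.45x,
# i.e. the quoted ranges are just the 68-95-99.7 rule with sd = 0.15.
```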
The IID assumption lets us interpolate within that range reasonably well, but once the assumption breaks, as it would for a system far outside the human range, predictions turn bad fast.