I suspect that if transformative AI is 20 or even 30 years away, AI will still be doing really big, impressive things in 2033, and people at that time will get a sense that even more impressive things are soon to come. In that case, I don’t think many people will think that AI safety advocates in 2023 were crying wolf, since one decade is not very long, and the importance of the technology will have only become more obvious in the meantime.
Matthew_Barnett
I’m curious why there hasn’t been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:
Unlike existential risk from other sources (e.g. an asteroid), AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can’t simply apply a naive argument that AI threatens total extinction of value to make the case that AI safety is astronomically important, in the sense that you can for other x-risks. You generally need additional assumptions.
Total utilitarianism is generally seen as non-speciesist, and therefore has no intrinsic preference for human values over unaligned AI values. If AIs are conscious, there don’t appear to be strong prima facie reasons for preferring humans to AIs under hedonistic utilitarianism. Under preference utilitarianism, it doesn’t necessarily matter whether AIs are conscious.
Total utilitarianism generally recommends large population sizes. Accelerating AI can be modeled as a kind of “population accelerationism”. Extremely large AI populations could be preferable under utilitarianism compared to small human populations, even those with high per-capita incomes (see the illustrative calculation after this list). Indeed, human populations have recently stagnated due to low growth rates, and AI promises to lift this bottleneck.
Therefore, AI accelerationism seems straightforwardly recommended by total utilitarianism, given some plausible assumptions.
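To make the quantitative intuition behind this argument concrete, here is a minimal back-of-the-envelope sketch. All of the numbers are invented purely for illustration; the only point is that total utility scales with population size times average welfare, so a vastly larger AI population can dominate even with much lower per-capita welfare.

```python
# Illustrative total-utilitarian comparison; every figure here is a made-up assumption.
# Total utility is modeled simply as population size multiplied by average per-capita welfare.

def total_utility(population: float, avg_welfare: float) -> float:
    return population * avg_welfare

# A stagnant human population with high per-capita welfare.
human_future = total_utility(population=1e10, avg_welfare=100)

# A far larger AI population with lower, but still positive, per-capita welfare.
ai_future = total_utility(population=1e14, avg_welfare=10)

print(f"Human-only future:   {human_future:.2e}")  # 1.00e+12
print(f"Large AI population: {ai_future:.2e}")     # 1.00e+15
```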
Here’s a non-exhaustive list of guesses for why I think EAs haven’t historically been sympathetic to arguments like the one above, and have instead generally advocated AI safety over AI acceleration (at least when these two values conflict):
A belief that AIs won’t be conscious, and therefore won’t have much moral value compared to humans.
But why would we assume AIs won’t be conscious? For example, if Brian Tomasik is right, consciousness is somewhat universal, rather than being restricted to humans or members of the animal kingdom.
I also haven’t actually seen much EA literature defend this assumption explicitly, which would be odd if this belief is the primary reason EAs have for focusing on AI safety over AI acceleration.
A presumption in favor of human values over unaligned AI values for some reasons that aren’t based on strict impartial utilitarian arguments. These could include the beliefs that: (1) Humans are more likely to have “interesting” values compared to AIs, and (2) Humans are more likely to be motivated by moral arguments than AIs, and are more likely to reach a deliberative equilibrium of something like “ideal moral values” compared to AIs.
Why would humans be more likely to have “interesting” values than AIs? It seems very plausible that AIs will have interesting values even if their motives seem alien to us. AIs might have even more “interesting” values than humans.
It seems to me like wishful thinking to assume that humans are strongly motivated by moral arguments and would settle upon something like “ideal moral values”.
A belief that population growth is inevitable, so it is better to focus on AI safety.
But a central question here is why pushing for AI safety—in the sense of AI research that enhances human interests—is better than the alternative on the margin. What reason is there to think AI safety now is better than pushing for greater AI population growth now? (Potential responses to this question are outlined in other bullet points above and below.)
AI safety has lasting effects due to a future value lock-in event, whereas accelerationism would have, at best, temporary effects.
Are you sure there will ever actually be a “value lock-in event”?
Even if there is at some point a value lock-in event, wouldn’t pushing for accelerationism also plausibly affect the values that are locked in? For example, the value of “population growth is good” seems more likely to be locked in, if you advocate for that now.
A belief that humans would be kinder and more benevolent than unaligned AIs.
Humans seem pretty bad already. For example, humans are responsible for factory farming. It’s plausible that AIs could be even more callous and morally indifferent than humans, but the bar already seems low.
I’m also not convinced that moral values will be a major force shaping “what happens to the cosmic endowment”. It seems to me that the forces shaping economic consumption matter more than moral values.
A bedrock heuristic that it would be extraordinarily bad if “we all died from AI”, and therefore we should pursue AI safety over AI accelerationism.
But it would also be bad if we all died from old age while waiting for AI, and missed out on all the benefits that AI offers to humans, which is a point in favor of acceleration. Why would this heuristic be weaker?
An adherence to person-affecting views in which the values of currently-existing humans are what matter most; and a belief that AI threatens to kill existing humans.
But in this view, AI accelerationism could easily be favored since AIs could greatly benefit existing humans by extending our lifespans and enriching our lives with advanced technology.
An implicit acceptance of human supremacism, i.e. the idea that what matters is propagating the interests of the human species, or preserving the human species, even at the expense of individual interests (either within humanity or outside humanity) or the interests of other species.
But isn’t EA known for being unusually anti-speciesist compared to other communities? Peter Singer is often seen as a “founding father” of the movement, and a huge part of his ethical philosophy was about how we shouldn’t be human supremacists.
More generally, it seems wrong to care about preserving the “human species” in an abstract sense relative to preserving the current generation of actually living humans.
A belief that most humans are biased towards acceleration over safety, and therefore it is better for EAs to focus on safety as a useful correction mechanism for society.
But was an anti-safety bias common for previous technologies? I think something closer to the opposite is probably true: most humans seem, if anything, biased towards being overly cautious about new technologies rather than overly optimistic.
A belief that society is massively underrating the potential for AI, which favors extra work on AI safety, since it’s so neglected.
But if society is massively underrating AI, shouldn’t this also favor accelerating AI? There doesn’t seem to be an obvious asymmetry between these two values.
An adherence to negative utilitarianism, which would favor obstructing AI, along with any other technology that could enable the population of conscious minds to expand.
This seems like a plausible moral argument to me, but it doesn’t seem like a very popular position among EAs.
A heuristic that “change is generally bad” and AI represents a gigantic change.
I don’t think many EAs would defend this heuristic explicitly.
Added: AI represents a large change to the world. Delaying AI therefore preserves option value.
This heuristic seems like it would have favored advocating delaying the industrial revolution, and all sorts of moral, social, and technological changes to the world in the past. Is that a position that EAs would be willing to bite the bullet on?
My assessment is that actually the opposite is true.
The argument you presented appears excellent to me, and I’ve now changed my mind on this particular point.
I strongly agree with the general point that overreaction can be very costly, and I agree that EAs overreacted to Covid, particularly after it was already clear that the overall infection fatality rate of Covid was under 1%, and roughly 0.02% in young adults.
However, I think it’s important to analyze things on a case-by-case basis, and to simply think clearly about the risk we face. Personally, I felt that it was important to react to Covid in January-March 2020 because we didn’t understand the nature of the threat yet, and from my perspective, there was a decent chance that it could end up being a global disaster. I don’t think the actions I took in that time—mainly stocking up on more food—were that costly, or irrational. After March 2020, the main actions I took were wearing a mask when I went out and avoiding certain social events. This, too, was not very costly.
I think nuclear war is a fundamentally different type of risk than Covid, especially when we’re comparing the ex-ante risks of nuclear war versus the ex-post consequences of Covid. In my estimation, nuclear war could kill up to billions of people via very severe disruptions to supply chains. Even at the height of the panic, the most pessimistic credible forecasts for Covid were nowhere near that severe.
In addition, an all-out nuclear war is different from Covid because of how quickly the situation can evolve. With nuclear war, we may live through some version of the following narrative: At one point in time, the world was mostly normal. Mere hours later, the world was in total ruin, with tens of millions of people being killed by giant explosions. By contrast, Covid took place over months.
Given this, I personally think it makes sense to leave SF/NYC/wherever if we get a very clear and unambiguous signal that a large amount of the world may be utterly destroyed in a matter of hours.
At the time Cinera’s post was published, the most upvoted post on the EA forum about the controversy was this post, which explicitly said that Bostrom’s apology was insufficient:

“His apology fails badly to fully take responsibility or display an understanding of the harm the views expressed represent.”
In this “quick take”, I want to summarize some of my idiosyncratic views on AI risk.
My goal here is to list just a few ideas that cause me to approach the subject differently from how I perceive most other EAs view the topic. These ideas largely push me in the direction of making me more optimistic about AI, and less likely to support heavy regulations on AI.
(Note that I won’t spend a lot of time justifying each of these views here. I’m mostly stating these points without lengthy justifications, in case anyone is curious. These ideas can perhaps inform why I spend significant amounts of my time pushing back against AI risk arguments. Not all of these ideas are rare, and some of them may indeed be popular among EAs.)
Skepticism of the treacherous turn: The treacherous turn is the idea that (1) at some point there will be a very smart unaligned AI, (2) when weak, this AI will pretend to be nice, but (3) when sufficiently strong, this AI will turn on humanity by taking over the world by surprise, and then (4) optimize the universe without constraint, which would be very bad for humans.
By comparison, I find it more likely that no individual AI will ever be strong enough to take over the world, in the sense of overthrowing the world’s existing institutions and governments by surprise. Instead, I broadly expect unaligned AIs will integrate into society and try to accomplish their goals by advocating for their legal rights, rather than trying to overthrow our institutions by force. Upon attaining legal personhood, unaligned AIs can utilize their legal rights to achieve their objectives, for example by getting a job and trading their labor for property, within the already-existing institutions. Because the world is not zero sum, and there are economic benefits to scale and specialization, this argument implies that unaligned AIs may well have a net-positive effect on humans, as they could trade with us, producing value in exchange for our own property and services (see the toy gains-from-trade calculation after the next paragraph).
Note that my claim here is not that AIs will never become smarter than humans. One way of seeing how these two claims are distinguished is to compare my scenario to the case of genetically engineered humans. By assumption, if we genetically engineered humans, they would presumably eventually surpass ordinary humans in intelligence (along with social persuasion ability, and ability to deceive etc.). However, by itself, the fact that genetically engineered humans will become smarter than non-engineered humans does not imply that genetically engineered humans would try to overthrow the government. Instead, as in the case of AIs, I expect genetically engineered humans would largely try to work within existing institutions, rather than violently overthrow them.
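To spell out the gains-from-trade point referenced above, here is a toy comparative-advantage calculation. The productivity figures and the trade price are invented purely for illustration; the point is only that even if AIs are absolutely better at everything, both parties can end up with more than they could produce on their own.

```python
# Toy comparative-advantage example; all productivity figures are invented.
# The AI is absolutely better at both tasks, yet both parties gain from trade.

# Output per unit of labor.
AI_RESEARCH, AI_SERVICES = 100, 50
HU_RESEARCH, HU_SERVICES = 1, 10

# Autarky: each party splits one unit of labor evenly between the two goods.
ai_autarky = (0.5 * AI_RESEARCH, 0.5 * AI_SERVICES)        # (50 research, 25 services)
hu_autarky = (0.5 * HU_RESEARCH, 0.5 * HU_SERVICES)        # (0.5 research, 5 services)

# Trade at 1 unit of research per unit of services, a price between the two
# parties' internal opportunity costs (0.1 and 2 research per service).
# The human specializes fully in services and sells half of the output.
hu_trade = (5 * 1.0, 10 - 5)                               # (5 research, 5 services)

# The AI buys those 5 services, so it needs only 0.4 units of labor to keep
# consuming 25 services, freeing 0.6 units for research; it pays 5 research.
ai_trade = (0.6 * AI_RESEARCH - 5, 0.4 * AI_SERVICES + 5)  # (55 research, 25 services)

print("human:", hu_autarky, "->", hu_trade)  # strictly more research, same services
print("AI:   ", ai_autarky, "->", ai_trade)  # strictly more research, same services
```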
AI alignment will probably be somewhat easy: The most direct and strongest current empirical evidence we have about the difficulty of AI alignment, in my view, comes from existing frontier LLMs, such as GPT-4. Having spent dozens of hours testing GPT-4’s abilities and moral reasoning, I think the system is already substantially more law-abiding, thoughtful and ethical than a large fraction of humans. Most importantly, this ethical reasoning extends (in my experience) to highly unusual thought experiments that almost certainly did not appear in its training data, demonstrating a fair degree of ethical generalization, beyond mere memorization.
It is conceivable that GPT-4’s apparently ethical nature is fake. Perhaps GPT-4 is lying about its motives to me and in fact desires something completely different than what it professes to care about. Maybe GPT-4 merely “understands” or “predicts” human morality without actually “caring” about human morality. But while these scenarios are logically possible, they seem less plausible to me than the simple alternative explanation that alignment—like many other properties of ML models—generalizes well, in the natural way that you might similarly expect from a human.
Of course, the fact that GPT-4 is easily alignable does not immediately imply that smarter-than-human AIs will be easy to align. However, I think this current evidence is still significant, and aligns well with prior theoretical arguments that alignment would be easy. In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking bad actions, allowing us to shape its rewards during training accordingly. After we’ve aligned a model that’s merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.
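To make the structure of this bootstrapping argument explicit, here is a toy sketch. The functions and numbers are hypothetical stand-ins invented for illustration, not a description of any real training setup; the point is just that oversight only ever needs to cover a modest capability gap at each step.

```python
# Toy sketch of iterative alignment bootstrapping; all names and numbers are hypothetical.

def train_next_generation(overseer_capability: float) -> float:
    """Produce a model slightly more capable than its current overseer."""
    return overseer_capability * 1.1

def oversight_holds(candidate: float, overseer: float) -> bool:
    """Assumption: because evaluation is easier than generation, an overseer can
    reliably judge (and so shape the rewards of) a model that is only slightly
    smarter than itself."""
    return candidate / overseer <= 1.2

def bootstrap_alignment(human_capability: float = 1.0, generations: int = 20) -> float:
    overseer = human_capability              # the first overseer is human feedback
    for _ in range(generations):
        candidate = train_next_generation(overseer)
        assert oversight_holds(candidate, overseer), "capability gap too large to evaluate"
        overseer = candidate                 # the newly aligned model oversees the next round
    return overseer

print(f"Capability reached with oversight intact: {bootstrap_alignment():.1f}x human")
```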
The default social response to AI will likely be strong: One reason to support heavy regulations on AI right now is if you think the natural “default” social response to AI will lean more heavily towards laissez faire than is optimal, i.e., by default, we will have too little regulation rather than too much. In this case, you could believe that, by advocating for regulations now, you’re making it more likely that we regulate AI a bit more than we otherwise would have, pushing us closer to the optimal level of regulation.
I’m quite skeptical of this argument because I think that the default response to AI (in the absence of intervention from the EA community) will already be quite strong. My view here is informed by the base rate of technologies being overregulated, which I think is quite high. In fact, it is difficult for me to name even a single technology that I think is currently clearly underregulated by society. By pushing for more regulation on AI, I think it’s likely that we will overshoot and over-constrain AI relative to the optimal level.
In other words, my personal bias is towards thinking that society will regulate technologies too heavily, rather than too loosely. And I don’t see a strong reason to think that AI will be any different from this general historical pattern. This makes me hesitant to push for more regulation on AI, since on my view, the marginal impact of my advocacy would likely be to push us even further in the direction of “too much regulation”, overshooting the optimal level by even more than what I’d expect in the absence of my advocacy.
I view unaligned AIs as having comparable moral value to humans: This idea was explored in one of my most recent posts. The basic idea is that, under various physicalist views of consciousness, you should expect AIs to be conscious, even if they do not share human preferences. Moreover, it seems likely that AIs — even ones that don’t share human preferences — will be pretrained on human data, and therefore largely share our social and moral concepts.
Since unaligned AIs will likely be both conscious and share human social and moral concepts, I don’t see much reason to think of them as less “deserving” of life and liberty, from a cosmopolitan moral perspective. They will likely think similarly to the way we do across a variety of relevant axes, even if their neural structures are quite different from our own. As a consequence, I am pretty happy to incorporate unaligned AIs into the legal system and grant them some control of the future, just as I’d be happy to grant some control of the future to human children, even if they don’t share my exact values.
Put another way, I view (what I perceive as) the EA attempt to privilege “human values” over “AI values” as being largely arbitrary and baseless, from an impartial moral perspective. There are many humans whose values I vehemently disagree with, but I nonetheless respect their autonomy, and do not wish to deny these humans their legal rights. Likewise, even if I strongly disagreed with the values of an advanced AI, I would still see value in their preferences being satisfied for their own sake, and I would try to respect the AI’s autonomy and legal rights. I don’t have a lot of faith in the inherent kindness of human nature relative to a “default unaligned” AI alternative.
I’m not fully committed to longtermism: I think AI has an enormous potential to benefit the lives of people who currently exist. I predict that AIs can eventually substitute for human researchers, and thereby accelerate technological progress, including in medicine. In combination with my other beliefs (such as my belief that AI alignment will probably be somewhat easy), this view leads me to think that AI development will likely be net-positive for people who exist at the time of alignment. In other words, if we allow AI development, it is likely that we can use AI to reduce human mortality, and dramatically raise human well-being for the people who already exist.
I think these benefits are large and important, and commensurate with the downside potential of existential risks. While a fully committed strong longtermist might scoff at the idea that curing aging might be important — as it would largely only have short-term effects, rather than long-term effects that reverberate for billions of years — by contrast, I think it’s really important to try to improve the lives of people who currently exist. Many people view this perspective as a form of moral partiality that we should discard for being arbitrary. However, I think morality is itself arbitrary: it can be anything we want it to be. And I choose to value currently existing humans, to a substantial (though not overwhelming) degree.
This doesn’t mean I’m a fully committed near-termist. I sympathize with many of the intuitions behind longtermism. For example, if curing aging required raising the probability of human extinction by 40 percentage points, or something like that, I don’t think I’d do it. But in more realistic scenarios that we are likely to actually encounter, I think it’s plausibly a lot better to accelerate AI, rather than delay AI, on current margins. This view simply makes sense to me given the enormously positive effects I expect AI will likely have on the people I currently know and love, if we allow development to continue.