Thanks for sharing the post, Zed :) Like titotal says, I hope you consider staying around. I think AI-risk (AIXR) sceptic posts should be welcomed on the Forum. I'm someone who'd probably count as AIXR sceptic for the EA community (but not for the wider world/public). It's clearly an area where you think EA as a whole is making a mistake, so I've read the post and recent comments and have some thoughts that I hope you might find useful:
I think there are some good points you made:
I really appreciate posts that push against the 'EA Orthodoxy' on the Forum and start off useful discussions. I think 'red-teaming' ideas is a great example of necessary error-correction, so regardless of how much I agree or not, I want to give you plaudits for that.
On humility in long-term forecasts: I completely agree here. I'm sure you've come across it, but Tetlock's recent forecasting tournament deals with this question and does indeed find that forecasters place lower odds on AIXR than subject-matter experts do.[1] But I'd still say that a roughly ~1% risk of extinction is worth taking seriously and investigating further, wouldn't you?
I think your scepticism on very short timelines is directionally very valid. I hope that those who have made very, very short timeline predictions on Metaculus are willing to update if those dates[2] come and go without AGI. I think one way out of the poor state of the AGI debate is for more people to make concrete falsifiable predictions.
While I disagree with your reasoning about what the EA position on AIXR is (see below), I think it's clear that many people think that is the position, so I'd really like to hear how you've come to this impression and what EA or the AIXR community could do to present a more accurate picture of itself. I think reducing this gap would be useful for all sides.
Some parts that I didn't find convincing:
You view Hanson's response as a knock-down argument, but he only addresses the 'foom' cases and only does so heuristically, not with any technical arguments. I think more credible counterarguments are being presented by experts such as Belrose & Pope, whom you might find convincing (though I think they have non-trivial subjective estimates of AIXR too, fwiw).
I really don't like the move to psychoanalyse people in terms of bias. Is bias at play? Of course; it's at play for all humans, but it is therefore just as likely for those who are super optimistic a priori as for those who are pessimistic. I think once something breaks through enough to be deemed 'worthy of consideration', we ought to do most of our evaluation on the merits of the arguments given. You even say this at the end of the 'fooling oneself' section! I guess I think the questions of 'are AIXR concerns valid?' and 'if not, why are they so prominent?' are probably worth two separate posts, imo. Similarly, I think you sometimes conflate the questions 'are AIXR concerns valid?' and 'if they are, what would an appropriate policy response look like?' I think your latest comment to Hayven is where your strongest objections are (which makes sense to me, given your background and expertise), but again that is different from the pure question of whether AIXR concern is valid.
Framing those concerned with AIXR as 'alarmists' - I think you're perhaps over-indexing on MIRI here as representative of AI Safety as a whole? From my vague sense, MIRI doesn't hold the dominant position in the AI Safety space that it perhaps did 10-20 years ago. I don't think that ~90%+ belief in doom is an accurate depiction of EA, and similarly I don't think that an indefinite global pause is the default EA view of the policies that ought to be adopted. You mention Anthropic and CHAI as two good institutions, and they're both highly EA-coded and sincerely concerned about AIXR. I think a potential disambiguation here is between 'concern about AIXR' and 'certainty of doom from AIXR'?
But also some bad ones:
Saying that EA's focus on x-risk lacks 'common sense' - I actually think x-risk is something the general public would think makes a lot of sense, though they'd think that EA gets the source of that risk wrong (an empirical question). I think a lot of people would say that trying to reduce the risk of human extinction from nuclear war or climate change is an unambiguously good cause and potentially a good use of marginal resources.
Viewing EA, let alone AIXR, as motivated by 'nonsense utilitarianism' about 'trillions of theoretical future people'. Most EA spending goes to global health causes in the present. Many AIXR advocates don't identify as longtermists at all; they're often, if not mostly, concerned about risk to humans alive today - themselves and those they care about. Concern about AIXR could also be motivated by non-utilitarian frameworks, though I'd concede that this probably isn't the standard EA position.
I know this is a super long comment, so feel free to respond only to the bits you find useful, or not at all. Alternatively, we could try out the new dialogue feature to talk through this a bit more? In any case, thanks again for the post; it got me thinking about where and why I disagree both with AI 'doomers' and with your position in this post.
[1] Roughly 0.4% for superforecasters vs 2.1% for AI experts by 2100.
[2] Currently March 14th, 2026, at the time of writing.
Love this thoughtful response!
Good feedback - I see the logic of your points and don't find fault with any of them.
On AIXR being valid and what the response would be, you're right; I emphasize the practical nature of the policy recommendation because otherwise the argument can veer into the metaphysical. To use an analogy, if I claim there's a 10% chance another planet could collide with Earth and destroy it in the next decade, you might begrudgingly accept the premise to move the conversation on to the practical aspect of my forecast. Even if that were true, what would my policy intervention look like? Build interstellar lifeboats? Is that feasible in the absence of concrete evidence?
Agreed - armchair psychoanalysis isn't really useful. What is useful is understanding how heuristics and biases work at the population level. If we know that, in general, projects run over budget and take longer than expected, we can adjust our estimates. If we know experts mis-forecast x-risk, we can adjust for that too. That's far from psychoanalysis.
I don't really know what the median view on AIXR within EA communities truly is. One thing's for certain: the public narrative around the issue tilts heavily towards the 'pause AI' camp and the Yudkowskys out there.
On the common sense of x-risk - one of the neat offices that few people know of at the State Department is the Nuclear Risk Reduction Center, or NRRC. It's staffed 24/7 and has foreign-language-designated positions, meaning at least someone in the room speaks Russian, etc. The office is tasked with staying in touch with other nations to reduce the odds of a miscalculation and nuclear war. That makes tons of sense. Thinking about big problems that could end the world makes sense in general - disease, asteroids, etc.
What I find troubling is the propensity to assign odds to distant right-tail events, and then to take the second step of making costly and questionable policy recommendations. I don't think these are EA consensus positions, but they certainly receive outsized attention.
I'm glad you found my comment useful. I think then, with respect, you should consider retracting some of your previous comments, or at least reframing them to be more circumspect and make clear that you're taking issue with a particular framing/subset of the AIXR community as opposed to EA as a whole.
As for the points in your comment, there's a lot of good stuff here. I think a post about the NRRC, or even an insider's view into how the US administration thinks about and handles nuclear risk, would be really useful content on the Forum, and also incredibly interesting! Similarly, I think a post on how a community handles making 'right-tail recommendations' when those recommendations may erode its collective and institutional legitimacy[1] would be really valuable. (Not saying that you should write these posts; they're just examples off the top of my head. In general, I think you have a professional perspective a lot of EAs could benefit from.)
I think one thing where we agree is that there's a need to ask and answer a lot more questions, some of which you mention here (beyond 'is AIXR valid'):
What policy options do we have to counteract AIXR if true?
How does the effectiveness of these policy options change as our estimate of the risk changes?
What is the median view in the AIXR/broader EA/broader AI communities on risk?
And so on.
[1] Some people in EA might write this off as 'optics', but I think that's wrong.
These are all great suggestions! As for my objections to EA as a whole versus a subset: it reminds me a bit of a defense that folks employ whenever a larger organization is criticised - the kind of defense one hears from Republicans in the US, for example: 'It's not all of us, just a vocal subset!' That might be true, but I think it misses the point. It's hard to soul-search and introspect as an organization or a movement if we collectively say 'not all EA' when someone points to the enthusiasm around SBF and ideas like buying up coal mines.
Hey, welcome to the EA forum! I hope you stick around.
I pretty much agree with this post. The argument put forward by AI risk doomers is generally flimsy and weak, with core weaknesses involving unrealistic assumptions about what AGI would actually be capable of, given the limitations of computational complexity and the physical difficulty of technological advancement, and a lack of justification for assuming AIs will be fanatical utility-function maximisers. I think the chances of human extinction from AI are extremely low, and that estimates around here are inflated by subtle groupthink, poor probabilistic treatment of speculative events, and a few just straight-up wrong ideas that were made up a long time ago and not updated sufficiently for the latest developments in AI.
That being said, AI advancements could have a significant effect on the world. I think it's fairly likely that if AI is misused, there may be a body count, perhaps a significant one. I don't think it's a bad idea to be proactive and think ahead about how to manage the risks involved. There is a middle ground between no regulation and bombing data centers.
I'm curious about the somewhat hedging word choices in the second paragraph, like 'could' and 'perhaps'. The case for great, even extreme, harm from AI misuse seems a lot more straightforward than AI doom. Misuse of new, very powerful technologies has caused at least significant harm (including body counts) in the past with some consistency, so I would assume the pattern would follow with AI as well.
I'm allowing for the possibility that we hit another AI winter, and the new powerful technology just doesn't arrive in our lifetime. Or that the technology is powerful for some things, but remains too unreliable for use in life-critical situations and is kept out of them.
I think it's likely that AI will have at least an order of magnitude or two greater body count than it has now, but I don't know how high it will be.
I once worked on a program with the DoD to help buy up loose MANPADS in Libya. There's a linear causal relationship between portable air-defense systems and harm. Other ordnance has a similar relationship.
The relationship is tenuous when we move from the world of atoms to bits. I struggle to see how new software could pose novel risks to life and limb. That doesn't mean developers of self-driving vehicles or autopilot functions in aircraft should ignore safety in their software design; what I'm suggesting is that those considerations are not novel.
If someone advocates that we treat neural networks unlike any other system in existence today, I would imagine the burden of proof would be on them to justify this new approach.
Hi Zed! Thanks for your post. A couple of responses:
"As critics of the long-termist viewpoint have noted, the base-rate for human extinction is zero."
Yes, but this is tautologically true: only in worlds where humanity hasn't gone extinct could you make that observation in the first place. (For a discussion of this and some tentative probabilities, see https://www.nature.com/articles/s41598-019-47540-7)
"Instead of outlandish ideas of a new global government capable of unilaterally curtailing compute power or some other factor through force, we should focus on what is practically achievable today. Encouraging firms like OpenAI to red-team their models before release, for example, is practical and limits negative externalities."
Why are the two mutually exclusive? I think you're setting up a false dichotomy - as far as I know, x-risk-oriented folks are among the leading voices calling for red teams, or are even engaging in this work themselves. (See also: https://forum.effectivealtruism.org/posts/Q4rg6vwbtPxXW6ECj/we-are-fighting-a-shared-battle-a-call-for-a-different)
"Let's assume for a moment that domain experts who warn of imminent threats to humanity's survival from AI are acting in good faith and are sincere in their convictions."
The way you phrase this makes it sound like we have reason to doubt their sincerity. I'd love to hear what makes you think we do!
"For example, a global pause in model training that many advocated for made no reference to the idea's inherent weakness - that is, it sets up a prisoner's dilemma in which the more AI firms voluntarily agree to pause research, the greater the incentive for any one group to defect from the agreement and gain a competitive edge. It makes no mention of practical implementation, nor does it explain how it arrived on its pause time-duration; nor does it recognize the improbability of enforcing a global treaty on AI."
My understanding is that even strong advocates of a pause are aware of its shortcomings and communicate these uncertainties rather transparently - I have yet to meet someone who sees a pause as a panacea. Granted, the questions you ask need to be answered, but the fact that an idea is thorny and potentially difficult to implement doesn't make it a bad one per se.
"A strict international regime dedicated to preventing proliferation still failed to prevent India, Israel, Pakistan, North Korea, and, likely, Iran from acquiring weapons."
Are you talking about the NPT or the IAEA here? My expertise on this is limited (~90 hours of engagement), but I authored a case study on IAEA safeguards this summer and my overall takeaway was that domain experts like Carl Robichaud still consider these regimes success stories. I'd be curious to hear where you disagree! :)
Thanks for the thoughtful response.
On background extinction rates: rather than go down that rabbit hole, I think my point still stands - any estimate of human extinction risk needs to be rooted in some historical analysis, whether that is a one-in-87,000 chance of Homo sapiens going extinct in any given year, as the Nature piece suggests, or something revised up or down from there.
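As a rough illustration of what anchoring on that base rate implies, here is a minimal sketch; the only input is the 1-in-87,000 annual figure cited above, the horizons are arbitrary, and the constant-rate assumption is of course contestable:

```python
# Illustrative only: compound the ~1-in-87,000 annual background
# extinction rate from the Nature piece over longer horizons,
# assuming (unrealistically) that the rate stays constant.
annual_rate = 1 / 87_000

for years in (10, 100, 1_000):
    prob = 1 - (1 - annual_rate) ** years
    print(f"{years:>5} years: {prob:.4%}")
# prints roughly 0.0115%, 0.1149%, and 1.1429% respectively
```

Whether such a historical rate should be revised up or down for AI is exactly the question at issue, but this is what "rooted in historical analysis" cashes out to numerically.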
On false dichotomies - I'd set aside individual behavior for a moment and look at the macro picture. We know from political science basics that elites can meaningfully shift public opinion on issues of low salience. According to Pew, we've seen a 15-point shift in the share of the general public expressing 'more concern than excitement' over AI in the United States. Rarely do we see such a marked shift in opinion on any particular issue in such a divided electorate.
Let's put it this way: in a literal sense, yes, one could loudly espouse a belief that AI could destroy humanity within a decade and at the same time advocate for rudimentary red-teaming to keep napalm recipes out of an LLM's responses. In practice, though, this seems to defy common sense and ignores the effect on public opinion.
Imagine we're engineers at a new electric vehicle company. At an all-hands meeting, we discuss one of the biggest issues with the design: the automatic trunk release. We're afraid people might get their hands caught in it. An engineer pipes up and says, 'While we're talking about flaws, I think there's a chance that the car might explode and take out a city block.' Now, there's nothing stopping us from looking at the trunk release and investigating spontaneous combustion, but in practice, I struggle to imagine those processes happening in parallel in a meaningful way.
Coming back to public opinion, we've seen what happens when a novel technology gains motivated opponents, from nuclear fission to genetic engineering, geoengineering, stem-cell research, gain-of-function research, autonomous vehicles, and on. Government policy responds to voter sentiment, not elite opinion. And fear of the unknown is a much more powerful driver of behavior than a vague sense of productivity gains. My sense is that if we continue to see elites writing op-eds on how the world will end soon, we'll see public opinion treat AI the way it treats GMO fruits and vegetables.
My default is to assume folks are sincere in their convictions (and data shows most people are). I should have clarified that line; it was in reference to claims that outfits calling for AI regulation are cynically putting up barriers to entry and are on a path to rent-seeking.
On the pause being a bad idea: my point here is that the very conception is foolish at the strategic level, not that it has practical implementation difficulties. First, what would change in six months? And second, why would creating a prisoner's dilemma lead to better outcomes? It would be like soft-drink makers asking for a non-binding pause on advertising - it only works if there's consensus and an enforcement mechanism that would impose a penalty on defectors; otherwise, it's even better for me if you stop advertising and I continue, stealing your market share.
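To spell out the prisoner's-dilemma structure of the advertising analogy, here is a toy two-firm sketch; the payoff numbers are made-up illustrations, not estimates of anything:

```python
# Toy two-firm "voluntary pause" game. Payoffs are invented for
# illustration; the only feature that matters is that defecting
# while the other firm pauses pays best.
payoffs = {
    # (firm_a, firm_b): (payoff_a, payoff_b)
    ("pause", "pause"): (2, 2),   # both honour the non-binding pause
    ("pause", "train"): (0, 3),   # A honours the pause, B defects and gains share
    ("train", "pause"): (3, 0),   # A defects and gains share
    ("train", "train"): (1, 1),   # both keep racing
}

def best_response(opponent_action: str) -> str:
    """Firm A's payoff-maximising reply to firm B's action."""
    return max(("pause", "train"),
               key=lambda a: payoffs[(a, opponent_action)][0])

# Training dominates regardless of what the other firm does, so a
# non-binding pause unravels without an enforcement mechanism.
print(best_response("pause"))  # train
print(best_response("train"))  # train
```

An enforcement mechanism that penalizes defectors enough to flip those payoffs is what would make 'pause' the stable outcome, which is the consensus-plus-penalty point above.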
The IAEA and NPT are their own can of worms, but in general, my broader point here is that even a global attempt to limit the spread of nuclear weapons failed. What is the likelihood of imposing a similar regime on a technology that is much simpler to work with? No centrifuges, no radiation, just code and compute power? I struggle to see how creating an IAEA for AI would have a different outcome.
Do you think a permanent* ban on AI research and development would be a better path than a pause? I agree a six-month pause is likely not to do anything, but far-reaching government legislation banning AI just might - especially if we can get the U.S., China, the EU, and Russia all on board (easier said than done!).
*Nothing is truly permanent, but I would feel much more comfortable with a more socially just and morally advanced human society having the AI discussion ~200 years from now than with the tech existing today. Humanity today shouldn't be trusted to develop AI for the same reason 10-year-olds shouldn't be trusted to drive trucks: it lacks the knowledge, experience, and development to do it safely.
Let's look at the history of global bans:
- They don't work for doping in the Olympics.
- They don't work for fissile material.
- They don't prevent luxury goods from entering North Korea.
- They don't work against cocaine or heroin.
We could go on. And those bans are much easier to implement - there's global consensus and law enforcement trying to stop the drug trade, but the economics of the sector mean an escalating war with cartels only leads to greater payoffs for new market entrants.
Setting aside practical limitations, we ought to think carefully before weaponizing the power of central governments against private individuals. When we can identify a negative externality, we have some justification to internalize it. No one wants firms polluting rivers or scammers selling tainted milk.
Generative AI hasn't shown externalities that would necessitate something like a global ban.
Trucks: we know what the externalities of a poorly piloted vehicle are. So we minimize those risks by requiring competence.
And on a morally advanced society - yes, I'm certain a majority of folks, if asked, would say they'd like a more moral and ethical world. But that's not the question. The question is: who gets to decide what we can and cannot do? And what criteria are they using to make these decisions? Real risk, as demonstrated by data, or theoretical risk? The latter was used to halt interest in nuclear fission. Should we expect the same for generative AI?
The question of 'who gets to do what' is fundamentally political, and I really try to stay away from politics, especially when dealing with the subject of existential risk. This isn't to discount the importance of politics, only to say that while political processes are helpful in determining how we manage x-risk, they don't in and of themselves directly relate to the issue. Global bans would also be political, of course.
You may well be right that the existential risk of generative AI, and eventually AGI, is low or indeterminate, and theoretical rather than actual. I don't think we should wait until we have an actual x-risk on our hands to act, because then it may be too late.
You're also likely correct that AI development is unstoppable at this point. Mitigation plans are needed should unfriendly outcomes occur, especially with an AGI, and I think we can both agree on that.
Maybe I'm too cautious when it comes to the subject of AI, but part of what motivates me is the idea that, should catastrophe occur, I could at least know that I did everything in my power to oppose that risk.
These are all very reasonable positions, and one would struggle to find fault with them.
Personally, I'm glad there are smart folks out there thinking about what sorts of risks we might face in the near future. Biologists have been talking about the next big pandemic for years. It makes sense to think these issues through.
Where I vehemently object is on the policy side. To use the pandemic analogy, it's the difference between a research-led investigation into future pandemics and a call to ban the use of CRISPR. The latter is impractical and, from a policy perspective, questionable.
The conversation around AI within EA is framed as 'we need to stop AI progress before we all die.' It seems tough to justify such an extreme policy position.
Welcome to the EA Forum, and thanks for the post, Zed!
The link you have here is broken.
Ah yes, sorry, here it is: https://link.springer.com/chapter/10.1007/978-3-662-45704-7_2