Hi Zed! Thanks for your post. A couple of responses:
“As critics of the long-termist viewpoint have noted, the base-rate for human extinction is zero.”
Yes, but this is tautologically true: Only in worlds where humanity hasn’t gone extinct could you make that observation in the first place. (For a discussion of this and some tentative probabilities, see https://www.nature.com/articles/s41598-019-47540-7)
“Instead of outlandish ideas of a new global government capable of unilaterally curtailing compute power or some other factor through force, we should focus on what is practically achievable today. Encouraging firms like OpenAI to red-team their models before release, for example, is practical and limits negative externalities.”
Why are the two mutually exclusive? I think you’re opening a false dichotomy—as far as I know, x-risk oriented folks are amongst the leading voices calling for red teams or even engaging in this work themselves. (See also: https://forum.effectivealtruism.org/posts/Q4rg6vwbtPxXW6ECj/we-are-fighting-a-shared-battle-a-call-for-a-different)
“Let’s assume for a moment that domain experts who warn of imminent threats to humanity’s survival from AI are acting in good faith and are sincere in their convictions.”
The way you phrase this makes it sound like we have reason to doubt their sincerity. I’d love to hear what makes you think we do!
“For example, a global pause in model training that many advocated for made no reference to the idea’s inherent weakness—that is, it sets up a prisoner’s dilemma in which the more AI firms voluntarily agree to pause research, the greater the incentive for any one group to defect from the agreement and gain a competitive edge. It makes no mention of practical implementation, nor does it explain how it arrived on its pause time-duration; nor does it recognize the improbability of enforcing a global treaty on AI.”
My understanding is that even strong advocates of a pause are aware of its shortcomings and communicate these uncertainties rather transparently—I have yet to meet someone who sees a pause as a panacea. Granted, the questions you ask need to be answered, but the fact that an idea is thorny and potentially difficult to implement doesn’t make it a bad one per se.
“A strict international regime dedicated to preventing proliferation still failed to prevent India, Israel, Pakistan, North Korea, and, likely, Iran from acquiring weapons.”
Are you talking about the NPT or the IAEA here? My expertise on this is limited (~90 hours of engagement), but I authored a case study on IAEA safeguards this summer and my overall takeaway was that domain experts like Carl Robichaud still consider these regimes success stories. I’d be curious to hear where you disagree! :)
Thanks for the thoughtful response.
On background extinction rates: rather than go down that rabbit hole, I think my point still stands that any estimate of human extinction risk needs to be rooted in some historical analysis, whether that is a one-in-87,000 chance of Homo sapiens going extinct in any given year, as the Nature piece suggests, or something revised up or down from there.
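To make the figure concrete, here is a back-of-the-envelope sketch of what a 1-in-87,000 annual base rate implies over longer horizons. The constant, independent annual-rate assumption is mine, purely for illustration; the Nature paper itself models this more carefully.

```python
# Back-of-the-envelope: cumulative extinction risk implied by a constant
# annual base rate. The 1-in-87,000 figure comes from the Nature piece
# cited above; treating it as a fixed, independent annual probability is
# a simplifying assumption for illustration only.

ANNUAL_RISK = 1 / 87_000

def cumulative_risk(years: int, annual_risk: float = ANNUAL_RISK) -> float:
    """Probability of at least one extinction-level event within `years`,
    assuming an identical, independent risk each year."""
    return 1 - (1 - annual_risk) ** years

for horizon in (100, 1_000, 10_000):
    print(f"{horizon:>6} years: {cumulative_risk(horizon):.2%}")
```

Even under this crude model, the century-scale risk stays small (roughly a tenth of a percent), which is consistent with the point that historical base rates alone can't justify very high near-term extinction estimates.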
On false dichotomies—I’d set aside individual behavior for a moment and look at the macro picture. We know from political science basics that elites can meaningfully shift public opinion on issues of low salience. According to Pew, we’ve seen a 15-point shift in the general public expressing “more concern than excitement” over AI in the United States. Rarely do we see such a marked shift in opinion on any particular issue in such a divided electorate.
Let’s put it this way—in a literal sense, yes, one could loudly espouse a belief that AI could destroy humanity within a decade and at the same time, advocate for rudimentary red-teaming to keep napalm recipes out of an LLM’s response, but, in practice, this seems to defy common sense and ignores the effect on public opinion.
Imagine we’re engineers at a new electric vehicle company. At an all-hands meeting, we discuss one of the biggest issues with the design: the automatic trunk release. We’re afraid people might get their hands caught in it. An engineer pipes up and says, “while we’re talking about flaws, I think there’s a chance that the car might explode and take out a city block.” Now, there’s nothing stopping us from looking at the trunk release and investigating spontaneous combustion, but in practice, I struggle to imagine those processes happening in parallel in a meaningful way.
Coming back to public opinion, we’ve seen what happens when novel technology gains motivated opponents, from nuclear fission to genetic engineering, to geoengineering, to stem cell research, to gain-of-function research, to autonomous vehicles, and so on. Government policy responds to voter sentiment, not elite opinion. And fear of the unknown is a much more powerful driver of behavior than a vague sense of productivity gains. My sense is that if we continue to see elites writing op-eds on how the world will end soon, we’ll see public opinion treat AI like it treats GMO fruits and veg.
My default is to assume folks are sincere in their convictions (and data shows most people are). I should have clarified that line; it was in reference to claims that outfits calling for AI regulation are cynically putting up barriers to entry and on a path to rent-seeking.
On the pause being a bad idea: my point here is that the very conception is foolish at the strategic level, not that it has practical implementation difficulties. First, what would change in six months? And second, why would creating a prisoner’s dilemma lead to better outcomes? It would be like soft drink makers asking for a non-binding pause on advertising—it only works if there’s consensus and an enforcement mechanism that would impose a penalty on defectors; otherwise, it’s even better for me if you stop advertising, and I continue, stealing your market share.
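The defection incentive above can be made explicit with a toy payoff matrix. The numbers here are hypothetical, chosen only so that “train” strictly dominates “pause” for each firm regardless of what the other does, which is the structural point:

```python
# A minimal sketch of the pause-as-prisoner's-dilemma argument. The payoff
# values are hypothetical and chosen purely for illustration.

PAYOFFS = {  # (firm_a_action, firm_b_action) -> (firm_a_payoff, firm_b_payoff)
    ("pause", "pause"): (3, 3),  # mutual restraint
    ("pause", "train"): (0, 4),  # B defects and gains a competitive edge
    ("train", "pause"): (4, 0),  # A defects and gains a competitive edge
    ("train", "train"): (1, 1),  # race dynamics: worst joint outcome
}

def best_response(my_options, opponent_action):
    """Firm A's payoff-maximizing action, given firm B's fixed action."""
    return max(my_options, key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Whatever the other firm does, defecting ("train") pays more:
for other in ("pause", "train"):
    print(f"If the other firm chooses {other!r}, "
          f"best response is {best_response(('pause', 'train'), other)!r}")
```

Without an enforcement mechanism that changes these payoffs (a penalty on defectors), each firm’s best response is to keep training, exactly as in the soft-drink advertising analogy.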
The IAEA and NPT are their own can of worms, but in general, my broader point here is that even a global attempt to limit the spread of nuclear weapons failed. What is the likelihood of imposing a similar regime on a technology that is much simpler to work with? No centrifuges, no radiation, just code and compute power? I struggle to see how creating an IAEA for AI would have a different outcome.
Do you think a permanent* ban on AI research and development would be a better path than a pause? I agree a six-month pause is unlikely to accomplish anything, but far-reaching government legislation banning AI just might—especially if we can get the U.S., China, EU, and Russia all on board (easier said than done!).
*nothing is truly permanent, but I would feel much more comfortable with a more socially just and morally advanced human society having the AI discussion ~200 years from now, than for the tech to exist today. Humanity today shouldn’t be trusted to develop AI for the same reason 10-year-olds shouldn’t be trusted to drive trucks: it lacks the knowledge, experience, and development to do it safely.
Let’s look at the history of global bans:
- They don’t work for doping in the Olympics.
- They don’t work for fissile material.
- They don’t prevent luxury goods from entering North Korea.
- They don’t work against cocaine or heroin.
We could go on. And those examples are much easier to implement—there’s global consensus and law enforcement trying to stop the drug trade, but the economics of the sector mean an escalating war with cartels only leads to greater payoffs for new market entrants.
Setting aside practical limitations, we ought to think carefully before weaponizing the power of central governments against private individuals. When we can identify a negative externality, we have some justification to internalize it. No one wants firms polluting rivers or scammers selling tainted milk.
Generative AI hasn’t shown externalities that would necessitate something like a global ban.
Trucks: we know what the externalities of a poorly piloted vehicle are. So we minimize those risks by requiring competence.
And on a morally advanced society—yes, I’m certain a majority of folks, if asked, would say they’d like a more moral and ethical world. But that’s not the question. The question is: who gets to decide what we can and cannot do? And what criteria are they using to make these decisions? Real risk, as demonstrated by data, or theoretical risk? The latter was used to halt interest in nuclear fission. Should we expect the same for generative AI?
The question of “who gets to do what” is fundamentally political, and I really try to stay away from politics, especially when dealing with the subject of existential risk. This isn’t to discount the importance of politics, only to say that while political processes are helpful in determining how we manage x-risk, they don’t in and of themselves directly relate to the issue. Global bans would also be political, of course.
You may well be right that the existential risk of generative AI, and eventually AGI, is low or indeterminate, and theoretical rather than actual. Still, I don’t think we should wait until we have an actual x-risk on our hands to act, because then it may be too late.
You’re also likely correct on AI development being unstoppable at this point. Mitigation plans are needed should unfriendly outcomes occur, especially with an AGI, and I think we can both agree on that.
Maybe I’m too cautious when it comes to the subject of AI, but part of what motivates me is the idea that, should the catastrophic occur, I could at least know that I did everything in my power to oppose that risk.
These are all very reasonable positions, and one would struggle to find fault with them.
Personally, I’m glad there are smart folks out there thinking about what sorts of risks we might face in the near future. Biologists have been talking about the next big pandemic for years. It makes sense to think these issues through.
Where I vehemently object is on the policy side. To use the pandemic analogy, it’s the difference between a research-led investigation into future pandemics and a call to ban the use of CRISPR. It’s impractical and, from a policy perspective, questionable.
The conversation around AI within EA is framed as “we need to stop AI progress before we all die.” It seems tough to justify such an extreme policy position.