Seems hard to prevent s-risks if nobody knows they're a potential problem.
Reminds me of this argument I made a while back.
The linked argument seems to talk about default outcomes, and I think it makes sense for x-risk. For s-risks, I guess it depends on what one expects the default outcomes to look like; it could make sense depending on the outlook.
My view sees the severe s-risks (strong value pessimization) as tail risks, which one could hyperstition into becoming more probable. I'm sympathetic to seeing some milder/less-severe s-risks as having non-trivial probability (although still not the default).
What dynamics do you have in mind specifically?
There's always a strong unilateralist's curse with infohazard stuff, haha.
I think it is reasonably based, and there is a lot to be said about hype, infohazards, and the strange futurist-x-risk-warning-to-product-company pipeline. It may even be especially potent, or especially likely to bite, in exactly the EA milieu.
I find the Waluigi idea a bit of a stretch, given that "what if the robot became evil" is already a trope, and so is the Christian devil, for example. "Evil" seems at least adjacent to "strong value pessimization".
Maybe the thought is that a literal bit-flip utility minimizer is rare (outside of, e.g., extortion), and talking about it would spread the meme and some cultist or confused billionaire would try to build it, that sort of thing?
I don't have any specific pathway I think is particularly likely. Some pathways could be things like "simulator AI latches onto the evil-AI story trope", or "AI controllers start threatening each other", or "psychotic AI controller on drugs decides to do Y, where Y is a concept sampled from things the AI controller knows about". The specific pathway is hard to predict, and there is a general principle underlying all of them which is more relevant to pay attention to.
The abstract principle at play is that a system which has low complexity for X also has low complexity for not-X.
If there aren't other reasons that the system has low complexity for not-X, then the dominant effect on the complexity of not-X is directly downstream (through inversion) of the system's complexity for X.
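A rough way to make this precise (my own gloss, not something the original comment spells out): in description-length / Kolmogorov-complexity terms, specifying the inverted goal costs at most a constant more than specifying the original, because the inversion itself is a short instruction.

$$K(\neg X) \;\le\; K(X) + c$$

Here $K(\cdot)$ is description length and $c$ is the small fixed cost of the "flip the objective" step; for the bit-flip utility-minimizer case, read it as $K(\text{minimize } U) \le K(\text{maximize } U) + c$.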
Ending up with not-X when you wanted X is more likely the more foolishly you go about things, and the general public, taken collectively, is not a wise, careful entity.
Ya, I think that's right. I think making bad stuff more salient can make it more likely in certain contexts.
For example, I can imagine it being naive to constantly transmit all sorts of detailed information, media, and discussion about specific weapons platforms, raising awareness of capabilities you really hope the bad guys don't develop because they might make them too strong. I just read "Power to the People: How Open Technological Innovation Is Arming Tomorrow's Terrorists" by Audrey Kurth Cronin, and I think it has a really relevant vibe here. Sometimes I worry about EAs doing unintentional advertising for, e.g., bioweapons and superintelligence.
On the other hand, I think that topics like s-risk are already salient enough for other reasons. Extreme cruelty and torture have arisen independently many times throughout history and nature, and there is already ages' worth of pretty unhinged torture-porn material that people have written, spread across a lot of the internet and older sources: the Christian conception of hell or horror fiction, for example.
This seems sufficient to say that we are unlikely to significantly increase the likelihood of "blind grabs from the memeplex" leading to mass suffering. Even cruel torture is already pretty salient, and suffering is in some sense simple if it is just "the opposite of pleasure" or whatever; utilitarians commonly talk in these terms already.
I will agree that it is sometimes not good to carelessly spread memes about specific bad stuff. I don't always know how to navigate the trade-offs here; probably there is at least some material broadly related to GCRs and s-risks which is better left unsaid. But a lot of what is relevant to s-risk is there whether you acknowledge it or not. I submit to you that surely some level of "raise awareness so that more people and resources can be used on mitigation" is necessary/good?