Another comment, specifically about AGI capabilities:
If someone wants to advance AI capabilities, they can already get prospective funding by opening a regular for-profit startup.
No?
Right. But without an impact market it can be impossible to profit from, say, publishing a post with a potentially transformative insight about AGI development. (See this post as a probably-harmless version of the type of posts I’m talking about here.)
I acknowledge this could be bad, but (as with most of my comments here), this is not a new problem.
Also, today, if someone publishes such a post on the Alignment Forum, I hope the moderators would take it down, whether the author expects to make money from it or not.
Or is your worry something like “there will be 10x more such posts and the moderation will be overloaded”?
It’s just an example of how a post on the Alignment Forum can be net-negative, and of how hard it can be to judge whether it’s net-negative. For any net-negative intervention that impact markets would incentivize: if people can carry it out without funding, then the incentive to do impressive things can already push them to carry it out. In those cases, impact markets make those interventions more likely to be carried out.
I hope I’m not strawmanning your claim and please call me out if I am,
but,
Seems like you are arguing for making more likely [a risk] that, as you point out, already happened, that the AF could have addressed at almost no cost, and that they chose not to address.
..right?
So.. why do you think it’s a big problem?
Or at least.. seems like the AF disagrees about this being a problem.. no?
(Please say if this is an unfair question somehow)
seems like the AF disagrees about this being a problem.. no?

(Not an important point [EDIT: meaning the text you are reading in these parentheses], but I don’t think that a karma of 18 points proves that; maybe the people who took the time to go over that post and vote are mostly amateurs who found the topic interesting. Also, as an aside, if someone one day publishes a brilliant insight about how to develop AGI much faster, taking the post down can be net-negative due to the Streisand effect.)
I’m confident that almost all the alignment researchers on Earth will agree with the following statement: conditional on such a post having a transformative impact, it is plausible [EDIT: >10% credence] that the post will end up having an extremely harmful impact. [EDIT: “transformative impact” here means impact that is either extremely negative or extremely positive.] I argue that we should be very skeptical about potential funding mechanisms that incentivize people to treat “extremely harmful impact” here as if it were “neutral impact”. A naive impact market is such a funding mechanism.
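To make that incentive concrete, here is a toy calculation (a sketch with made-up numbers, not a model of any specific market design): a retro-funding rule that pays for realized positive impact but never charges for harm can make the author’s expected payout positive even when the expected value to the world is zero or negative.

```python
# Toy model, all numbers made up: expected value to the world of
# publishing a potentially transformative post, vs. the author's
# expected payout under a "naive" retro-funding rule that rewards
# positive outcomes but never claws anything back for harm.

p_good = 0.5          # chance the transformative impact is extremely positive
p_bad = 0.5           # chance it is extremely harmful
impact_good = +100.0  # value to the world if it goes well (arbitrary units)
impact_bad = -100.0   # value to the world if it goes badly

# The world's expected value prices in the downside:
ev_world = p_good * impact_good + p_bad * impact_bad
print(ev_world)   # 0.0 with these symmetric numbers

# A naive retro funder pays for positive impact and pays zero (rather
# than a negative amount) for harm, so the author's payout treats
# "extremely harmful impact" as if it were "neutral impact":
ev_author = p_good * impact_good + p_bad * max(impact_bad, 0.0)
print(ev_author)  # 50.0: publishing is profitable even though ev_world is 0
```

Note that making the downside worse (say, impact_bad = -1000.0) collapses ev_world while leaving ev_author unchanged, which is exactly the mismatch I’m worried about.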
You changed my mind!

I think the missing part, for me, is a public post saying “this is what I’m going to do, but I haven’t started yet”, which is what the prospective funder sees, and which would let the retro funder say “hey, you shouldn’t have funded this plan”.
I think.
I’ll think about it.
I think you’re missing the part where, if such a marketplace were materially changing the incentives and behavior of the Alignment Forum, people could get an impact certificate for counterbalancing externalities, e.g. for critiquing, flagging, or moderating a harmful AGI capabilities post, possibly motivating more curation than a small moderation team could handle on its own.
That’s not to say that in that equilibrium there couldn’t be an even stronger force of distributionally mismatched positivity bias, e.g. upvote-brigading if retro funders have Goodhart-prone incentives to fund posts in proportion to their karma; but the existence of that counterbalancing incentive is at least suggestive.
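To gesture at that Goodhart concern numerically, here is a back-of-the-envelope check (a purely hypothetical payout rule with made-up numbers, not any proposed design) of when karma-proportional retro funding would make upvote-brigading profitable:

```python
# Hypothetical payout rule: retro funders pay a fixed dollar amount per
# karma point. Brigading is profitable exactly when buying an upvote
# costs less than the payout that upvote unlocks.
payout_per_karma = 5.0  # $ retro-funded per karma point (made up)
cost_per_upvote = 2.0   # $ to brigade one extra upvote (made up)
print(cost_per_upvote < payout_per_karma)  # True: brigading pays here
```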