One exchange that makes me feel particularly worried about Scenario 2 is this one here, which focuses on the concern that there’s:
No rigorous basis for the claim that the use of mechanistic interpretability would “open up possibilities” for long-term safety. And plenty of possibilities for corporate marketers to chime in on mech-interp’s hypothetical big breakthroughs. In practice, we may again accidentally help AI labs to safety-wash their AI products.
I would like to point to this as a central example of the type of thing I’m worried about in scenario 2: the sort of doom spiral where people end up actively opposed to the most productive lines of research we have, because they’re conceiving of the problem as being arbitrarily hard. This feels very reminiscent of the environmentalists who oppose carbon capture or nuclear energy because it might make people feel better without solving the “real problem”.
It looks like, on net, people disagree with my take in the original post. So I’d like to ask the people who disagree: do you have reasons to think that the sort of position I’ve quoted here won’t become much more common as AI safety becomes much more activism-focused? Or do you think it would be good if it did?
It looks like, on net, people disagree with my take in the original post.
I just disagreed with the OP because it’s a false dichotomy; we could just agree with the true things that activists believe, and not the false ones, and not go based on vibes. We desire to believe that mech-interp is mere safety-washing iff it is, and so on.
The problem here is safety R&D at AI labs that is insufficient to actually make their systems safe, yet sufficient to let the labs market themselves as seriously caring about safety, and thus to present their ML products as good for release.
You need to consider that, especially since you work at an AI lab.
Slightly conflicted agree-vote: your model here offloads so much to judgment calls that fall on people who are vulnerable to perverse incentives (for instance, alignment/capabilities as a binary distinction is a bad frame, but anyone unusually well suited to thinking clearly about its alternatives makes more money and has a less stressful life if their beliefs fall some ways rather than others).
Other than that, I’m aware that no one’s really happy about the way they trade off “you could Copenhagen-ethics your way out of literally any action in the limit” against “saying that the counterfactual a-hole would do it worse if I didn’t is not a good argument”. It seems like a law-of-opposite-advice situation, maybe: some people in the blasé / unilateralist / power-hungry camp could stand to be nudged one way, and some people in the scrupulous camp could stand to be nudged the other.
It also matters that the environmentalists who “oppose carbon capture or nuclear energy because it might make people feel better without solving the ‘real problem’” have very low epistemic standards, even when you condition on their being environmentalists. That doesn’t mean their position can’t be memetically adaptive and therefore influential, but it might be tactically important (i.e., you may have a messaging problem rather than a more virtuous actually-trying-to-think-clearly problem).