I’m not very compelled by this response.

It seems to me you have two points on the content of this critique. The first point:
I think it’s bad to criticize labs that pursue hits-based research approaches for their early output (I also think this applies to your critique of Redwood), because the entire point is that you don’t find a lot until you hit.
I’m pretty confused here. How exactly do you propose that funding decisions get made? If some random person says they are pursuing a hits-based approach to research, should EA funders be obligated to fund them?
Presumably you would want to say “the team will be good at hits-based research such that we can expect a future hit, for X, Y and Z reasons”. I think you should actually say those X, Y and Z reasons so that the authors of the critique can engage with them; I assume that the authors are implicitly endorsing a claim like “there aren’t any particularly strong reasons to expect Conjecture to do more impactful work in the future”.
The second point:
Your statements about the VCs seem unjustified to me. How do you know they are not aligned? [...] I haven’t talked to the VCs either, but I’ve at least asked people who work(ed) at Conjecture.
Hmm, it seems extremely reasonable to me to take as a baseline prior that the VCs are profit-motivated, and the authors explicitly say:
We have heard credible complaints of this from their interactions with funders. One experienced technical AI safety researcher recalled Connor saying that he will tell investors that they are very interested in making products, whereas the predominant focus of the company is on AI safety.
The fact that people who work(ed) at Conjecture say otherwise means that (probably) someone is wrong, but I don’t see a strong reason to believe that it’s the OP who is wrong.
At the meta level you say:
I do not understand where the confidence with which you write the post (or at least how I read it) comes from.
And in your next comment:
I think we should really make sure that we say true things when we criticize people, quantify our uncertainty, differentiate between facts and feelings and do not throw our epistemics out of the window in the process.
But afaict, the only point where you actually disagree with a claim made in the OP (excluding recommendations) is in your assessment of VCs? (And in that case I feel very uncompelled by your argument.)
In what way has the OP failed to say true things? Where should they have had more uncertainty? What things did they present as facts which were actually feelings? What claim have they been confident about that they shouldn’t have been confident about?
(Perhaps you mean to say that the recommendations are overconfident. There I think I just disagree with you about the bar for evidence for making recommendations, including ones as strong as “alignment researchers shouldn’t work at organization X”. I’ve given recommendations like this to individual people who asked me for a recommendation in the past, on less evidence than collected in this post.)
Good comment, consider cross-posting to LW?

Meta: maybe my comment on the critique reads as stronger than intended (see my comment with clarifications), and I do agree with some of the criticisms and some of the statements you made. I’ll reflect on where I should have phrased things differently and try to clarify below.
Hits-based research: Obviously, results are one evaluation criterion for scientific research. However, especially for hits-based research, I think there are other factors that cannot be neglected. To give a concrete example, if I were asked whether to give $10M in grant funding to a unit under your supervision, I would obviously look back at your history of results, but a lot of my judgment would be based on my belief in your ability to find meaningful research directions in the future. To a large extent, the funding would be a bet on you and the research process you introduce to a team, and much less on previous results. Obviously, your prior research output is a result of your previous process, but especially in early organizations the two can diverge quite a bit. Therefore, I think it is fair to say both that a) the output of Conjecture so far has not been that impressive IMO, and that b) the way they updated on early results, iterating faster and looking for more hits, is actually positive evidence about their expected future output.
Of course, VCs are interested in making money. However, especially when they are angel investors rather than institutional VCs, ideological considerations often play a large role in their investments. In this case, the VCs I’m aware of (not all of whom are mentioned in the post, and I’m not sure I can share the others) actually seem fairly aligned by VC standards. Furthermore, the way I read the critique is something like “Connor didn’t tell the VCs about the alignment plans, or neglects them in conversation”. However, my impression from conversations with (ex-)staff was that Connor was very direct about their motive to reduce x-risk. I think it’s clear that products are part of their way of addressing alignment, but to the best of my knowledge, every VC who invested was very aware of what they were getting into. At this point, it’s really hard for me to judge, because I think that a) on priors, VCs are profit-seeking, and b) different sources said different things, some of which are mutually exclusive. I don’t have enough insight to confidently say who is right here. I’m mainly saying that the OP’s confidence surprised me, given my previous discussions.
On recommendations: I have also privately recommended that people not work at specific organizations. However, this was always conditional on their circumstances. For example, people often aren’t aware of what exactly different safety teams are working on, so conditional on their preferences they should probably not work for lab X. Secondly, I think there is a difference between saying something like this in private, even if it is unconditional, and saying it in public. In public, the audience is much larger and has much less context, so I feel like the burden of proof is much higher.
lmk if that makes my position and disagreements clearer.
On hits-based research: I certainly agree there are other factors to consider in making a funding decision. I’m just saying that you should talk about those directly instead of criticizing the OP for looking at whether their research was good or not.
(In your response to OP you talk about a positive case for the work on simulators, SVD, and sparse coding—that’s the sort of thing that I would want to see, so I’m glad to see that discussion starting.)
On VCs: Your position seems reasonable to me (though so does the OP’s position).
On recommendations: Fwiw I also make unconditional recommendations in private. I don’t think this is unusual, e.g. I think many people make unconditional recommendations not to go into academia (though I don’t).
I don’t really buy that the burden of proof should be much higher in public. Reversing the position, do you think the burden of proof should be very high for anyone to publicly recommend working at lab X? If not, what’s the difference between a recommendation to work at org X vs an anti-recommendation (i.e. recommendation not to work at org X)? I think the three main considerations I’d point to are:
(Pro-recommendations) It’s rare for people to do things (relative to not doing things), so we differentially want recommendations over anti-recommendations, making it easier for orgs to start up and do things.
(Anti-recommendations) There are strong incentives to recommend working at org X (obviously org X itself will do this), but no incentives to make the opposite recommendation (and in fact usually anti-incentives). Similarly, I expect that inaccuracies in the case for not working at org X will be pointed out (by org X), whereas inaccuracies in the case for working there will not be. So we differentially want to encourage anti-recommendations by lowering their “burden of proof”, in order to get both sides of the story.
(Pro-recommendations) Recommendations have a nice effect of getting people excited and positive about the work done by the community, which can make people more motivated, whereas the same is not true of anti-recommendations.
Overall I think point 2 feels most important, and so I end up thinking that the burden of proof on critiques / anti-recommendations should be lower than the burden of proof on recommendations—and the burden of proof on recommendations is approximately zero. (E.g. if someone wrote a public post recommending Conjecture without any concrete details of why—just something along the lines of “it’s a great place doing great work”—I don’t think anyone would say that they were using their power irresponsibly.)
I would actually prefer a higher burden of proof on recommendations, but given the status quo, if I’m only allowed to affect the burden of proof on anti-recommendations, I’d probably want it to go down to ~zero. Certainly I’d want it to be well below the level that this post meets.
Hmm, yeah. I actually think you changed my mind on the recommendations. My new position is something like:
1. There should not be a higher burden on anti-recommendations than pro-recommendations.
2. Both pro- and anti-recommendations should come with caveats and conditionals whenever they make a difference to the target audience.
3. I’m now more convinced that the anti-recommendation of OP was appropriate.
4. I’d probably still phrase it differently than they did, but my overall belief went from “this was unjustified” to “they should have used different wording”, which is a substantial change in position.
5. In general, the context in which you make a recommendation still matters. For example, if you make a public comment saying “I’d probably not recommend working for X” the severity feels different than “I collected a lot of evidence and wrote this entire post and now recommend against working for X”. But I guess that just changes the effect size and not really the content of the recommendation.
:) I’m glad we got to agreement!
(Or at least significantly closer, I’m sure there are still some minor differences.)