Max H comments on Announcing the Winners of the 2023 Open Philanthropy AI Worldviews Contest

Max H 30 Sep 2023 19:44 UTC
5 points
3 ∶ 4
“In some cases, the judges liked that an entry crisply argued for a conclusion the judges did not agree with—the clear articulation of an argument makes it easier for others to engage. One does not need to find a piece wholly persuasive to believe that it usefully contributes to the collective debate about AI timelines or the threat that advanced AI systems might pose.”
Facilitating useful engagement seems like a fine judging criterion, but was there any engagement or rebuttal to the winning pieces that the judges found particularly compelling? It seems worth mentioning such commentary if so.
Neither of the two winning pieces significantly updated my own views, and (to my eye) both look sufficiently rebutted that observers taking a more outside view might similarly hesitate to update on any AI x-risk claims without taking the commentary into account.

On the EMH piece, I think Zvi’s post is a good rebuttal on its own and a good summary of some other rebuttals.

On the Evolution piece, lots of the top LW comments raise good points. My own view is that the piece is a decent argument that AI systems produced by current training methods are unlikely to undergo a sharp left turn (SLT). But the actual SLT argument applies to systems in the human-level regime and above; current training methods do not result in systems anywhere near human-level in the relevant sense. So even if true, the claim that current methods are disanalogous to evolution isn’t directly relevant to the x-risk question, unless you already accept that current methods and trends related to below-human-level AI will scale to human-level AI and beyond in predictable ways. But that’s exactly what the actual SLT argument is intended to argue against!
- Quintin Pope 1 Oct 2023 5:33 UTC
  7 points
  3 ∶ 0
  Parent
  Speaking as the author of Evolution provides no evidence for the sharp left turn, I find your reaction confusing because the entire point of the piece is to consider rapid capabilities gains from sources other than SGD. Specifically, it consists of two parts:
  1. Argues that human evolution provides no evidence for spikiness in AI capabilities gains, because the human spike in capabilities was due to human evolution-specific details which do not appear in the current AI paradigm (or plausible future paradigms).
  2. Considers two scenarios for AI-specific sudden capabilities gains (neither due to SGD directly, and both of which would likely involve human or higher levels of AI capabilities), and argues that they’re manageable from an alignment perspective.
  - Max H 1 Oct 2023 15:06 UTC
    1 point
    1 ∶ 3
    Parent
    On the first point, my objection is that the human-level regime is special (because human-level systems are capable of self-reflection, deception, etc.) regardless of which methods ultimately produce systems in that regime, or how “spiky” they are.
    A small, relatively gradual jump in the human-level regime is plausibly more than enough to enable an AI to outsmart / hide from / deceive humans, via e.g. a few key insights gleaned from reading a corpus of neuroscience, psychology, and computer security papers, over the course of a few hours of wall-clock time.
    The second point is exactly what I’m saying is unsupported, unless you already accept the SLT argument as untrue. You say in the post you don’t expect catastrophic interference between current alignment methods, but you don’t consider that a human-level AI will be capable of reflecting on those methods (and their actual implementation, which might be buggy).
    
    Similarly, elsewhere in the piece you say:
    
    “Once you condition on this specific failure mode of evolution, you can easily predict that humans would undergo a sharp left turn at the point where we could pass significant knowledge across generations. I don’t think there’s anything else to explain here, and no reason to suppose some general tendency towards extreme sharpness in inner capability gains.”
    
    And:
    
    “In my frame, we’ve already figured out and applied the sharp left turn to our AI systems, in that we don’t waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization.”
    
    But again, the actual SLT argument is not about “extreme sharpness” in capability gains. It’s an argument which applies to the human-level regime and above, so we can’t already be past it no matter what frame you use. The version of the SLT argument you argue against is a strawman, which is what my original LW comment was pointing out.
    I think readers can see this for themselves if they just re-read the SLT post carefully, particularly footnotes 3-5, and then re-read the parts of your post where you talk about it.
    [edit: I also responded further on LW here.]