Thanks for this very thoughtful reply!

I have a lot to say about this, much of which boils down to two points:
1. I don't think Jeremy is a good example of unnecessary polarization.
2. I think "avoid unnecessary polarization" is a bad heuristic for policy research (which, related to my first point, is what Jeremy was responding to in Dislightenment), at least if it means anything other than practicing the traditional academic virtues of acknowledging limitations, noting contrary opinion, being polite, being willing to update, inviting disagreement, etc.
The rest of your comment I agree with.
I realize that point (1) may seem like nitpicking, and that I am also emotionally invested in it for various reasons. But this is all in the spirit of something like avoiding reasoning from fictional evidence: if we want to have a good discussion of avoiding unnecessary polarization, we should reason from clear examples of it. If Jeremy is not a good example of it, we should not use him as a stand-in.
> I was just using Jeremy as a stand-in for the polarisation of Open Source vs AI Safety more generally.
Right, this is in large part where our disagreement lies: whether Jeremy is good evidence for, or an example of, unnecessary polarization. I simply don't think that Jeremy is a good example of unnecessary (more on this below) polarization, because I think that he, explicitly and somewhat understandably, just finds the idea of approval regulation for frontier AI abhorrent. So to use Jeremy as evidence or an example of unnecessary polarization, we have to ask what he was reacting to, and whether something unnecessary was done to polarize him against us.
Dislightenment "started out as a red team review" of FAIR, and FAIR is the most commonly referenced policy proposal in the piece, so I think that Jeremy's reaction in Dislightenment is best understood as, primarily, a reaction to FAIR. (More generally, I don't know what else he would have been reacting to, because in my mind FAIR was fairly catalytic in this whole debate, though it's possible I'm overestimating its importance. And in any case I wasn't on Twitter at the time, so I may lack important context that he's importing into the conversation.) In which case, in order to support your general claim about unnecessary polarization, we would need to ask whether FAIR did unnecessary things to polarize him.
Which brings us to the question of what exactly "unnecessary polarization" means. My sense is that avoiding unnecessary polarization would, in practice, mean that policy researchers write and speak extremely defensively to avoid making any unnecessary enemies. This would entail falsifying not just their own personal beliefs about optimal policy, but also, crucially, their predictions about what optimal policy is given the set of preferences that the public already holds. It would lead to positive proposals shot through with diligent and pervasive reputation management, full of unnecessary and confusing hedges and disjunctive asides. I think pieces like that can be good, but it would be very bad if every piece were like that.
Instead, I think it is reasonable and preferable for discourse to unfold like this: Policy researchers write politely about the things that they think are true, explain their reasoning, acknowledge limitations and uncertainties, and invite further discussion. People like Jeremy then enter the conversation, bringing a useful different perspective, which is exactly what happened here. We can then update policy proposals over time, giving more or less weight to different considerations in light of new arguments, political evidence (what do people think is riskier: too much centralization or too much decentralization?), and technical evidence. And then maybe eventually there is enough consensus to overcome the vetocratic inertia of our political system and make new policy. Or maybe a consensus is reached that this is not necessary. Or maybe no consensus is ever reached, in which case the default is that nothing happens.
Contrast this with what I think the "reduce unnecessary polarization" approach would tend to recommend, which is something closer to starting the conversation with an attempt at a compromise position. It is sometimes useful to do this. But I think that, in terms of actual truth discovery, laying out the full case for one's own perspective is productive and necessary. Without full-throated policy proposals, policy will tend too much either towards an unprincipled centrism (wherein all perspectives are seen as equally valid and therefore worthy of compromise) or towards the perspectives of those who defect from the "start at compromise" approach. When the stakes are really high, this seems bad.
To be clear, I don't think you're advocating for this "compromise-only" position. But in the case of Jeremy and Dislightenment specifically, I think this is what it would have taken to avoid polarization (and I doubt even that would have worked): writing FAIR with a much mushier, "who's to say?" perspective.
In retrospect, I think it's perfectly reasonable to believe that we should have talked about centralization concerns more in FAIR. In fact, I endorse that proposition. And of course it was in some sense unnecessary to write it with the exact discussion of centralization that we did. But I nevertheless do not think that we can be said to have caused Jeremy to unnecessarily polarize against us, because I think him polarizing against us on the basis of FAIR is in fact not reasonable.
> On "elite panic" and "counter-enlightenment", he's not directly comparing FAIR to it, I think. He's saying that previous attempts to avoid democratisation of power in the Enlightenment tradition have had these flaws.
I disagree with this as a textual matter. Here are some excerpts from Dislightenment (emphases added):
> Proposals for stringent AI model licensing and surveillance will . . . potentially roll[] back the societal gains of the Enlightenment.

> bombing data centers and global surveillance of all computers is the only way[!!!] to ensure the kind of safety compliance that FAR proposes.

> FAR briefly considers this idea, saying "for frontier AI development, sector-specific regulations can be valuable, but will likely leave a subset of the high severity and scale risks unaddressed." But it . . . promote[s] an approach which, as we've seen, could undo centuries of cultural, societal, and political development.
He fairly consistently paints FAIR (or licensing more generally, which is a core part of FAIR) as the main policy he is responding to.
> I think, from Jeremy's PoV, that centralization of power is the actual ballgame and what Frontier AI Regulation should be about. So one mention on page 31 probably isn't good enough for him.
It is definitely fair for him to think that we should have talked about decentralization more! But I don't think it's reasonable for him to polarize against us on that basis. That seems like the crux of the issue.
Jeremy's reaction is most sympathetic if you model the FAIR authors specifically, or the TAI governance community more broadly, as a group of people totally unsympathetic to distribution-of-power concerns. The problem is that that is not true. My first main publication in this space was on the risk of excessively centralized power from AGI; another lead FAIR coauthor was on that paper too. Other coauthors have also written about this issue: e.g., 1; 2; 3 at 46–48; 4; 5; 6. It's a very central worry in the field, dating back to the first research agenda. So I really don't think polarization against us on the grounds that we have failed to give centralization concerns a fair shake is reasonable.
I think the actual explanation is that Jeremy, and the group he is representative of, have a very strong prior in favor of open-sourcing things, and find it morally outrageous to propose restrictions thereon. While I think a prior in favor of OS is reasonable (and indeed correct), I do not think it reasonable for them to polarize against people who think there should be exceptions to the right to OS things. I think this outrage generally stems from an improper attachment to a specific method of distributing power, without really thinking through the limits of that justification, or acknowledging that there even could be such limits.
You can see this dynamic at work very explicitly with Jeremy. In the seminar you mention, we tried to push Jeremy on whether, if a certain AI system turns out to be more like an atom bomb and less like voting, he would still think it's good to open-source it. His response was that AI is not like an atomic bomb.
Again, a perfectly fine proposition to hold on its own. But it completely fails to either (a) consider what the right policy would be if he is wrong, or (b) acknowledge that there is substantial uncertainty or disagreement about whether any given AI system will be more bomb-like or voting-like.
> That's a fine reaction to me, just as it's fine for you and Marcus to disagree on the relative costs/benefits and write the FAIR paper the way you did.
I agree! But I guess I'm not sure where the room for Jeremy's unnecessary polarization comes in here. Do reasonable people get polarized against reasonable takes? No.
I know you're not necessarily saying that FAIR was an example of unnecessarily polarizing discourse. But my claim is that either (a) FAIR was in fact unnecessarily polarizing, or (b) Jeremy's reaction is not good evidence of unnecessary polarization, because it was a reaction to FAIR.
> There's probably a difference between ~July23-Jeremy and ~Nov23-Jeremy.
I think all of the opinions of his that we're discussing are from July '23? Am I missing something?
> On the actual points though, I actually went back and skim-listened to the webinar on the paper in July 2023, which Jeremy (and you!) participated in, and man, I am so much more receptive and sympathetic to his position now than I was back then, and I don't really find Marcus and you to be that convincing in rebuttal.
A perfectly reasonable opinion! But one thing that is not evident from the recording is that Jeremy showed up something like 10–20 minutes into the webinar, and so in fact missed a large portion of our presentation. Again, I think this is more consistent with some story other than unnecessary polarization. I don't think any reasonable panelist would think it appropriate to participate in a panel where they missed the presentation of the other panelists, though maybe he had some good excuse.