I agree with a lot of what you say! I still want to move EA in the direction of “people just say what’s on their mind on the EA Forum, without trying to dot every i and cross every t; and then others say what’s on their mind in response; and we have an actual back-and-forth that isn’t carefully choreographed or extremely polished, but is more like a real conversation between peers at an academic conference”.
(Another way to achieve many of the same goals is to encourage more EAs who disagree with each other to regularly talk to each other in private, where candor is easier. But this scales a lot more poorly, so it would be nice if some real conversation were happening in public.)
A lot of my micro-decisions in making posts like this are connected to my model of “what kind of culture and norms are likely to result in EA solving the alignment problem (or making a lot of progress)?”, since I think that’s the likeliest way that EA could make a big positive difference for the future. In that context, I think building conversations around heavily polished, “final” (rather than in-process) cognition tends to be insufficient for fast and reliable intellectual progress:
Highly polished content tends to obscure the real reasons and causes behind people’s views, in favor of reasons that are more legible, respectable, impressive, etc. (See Beware defensibility.)
AGI alignment is a pre-paradigmatic proto-field where making good decisions will probably depend heavily on people having good technical intuitions, intuiting patterns before they know how to verbalize those patterns, and generally becoming adept at noticing what their gut says about a topic and putting their gut in contact with useful feedback loops so it can update and learn.
In that context, I’m pretty worried about an EA where everyone is hyper-cautious about saying anything that sounds subjective, “feelings-ish”, hard-to-immediately-transmit-to-others, etc. That might work if EA’s path to improving the world is via donating more money to AMF or developing better vaccine tech, but it doesn’t fly if making (and fostering) conceptual progress on AI alignment is the path to impact.
Ideally, it shouldn’t merely be the case that EA technically allows people to candidly blurt out their imperfect, in-process thoughts about things. Rather, EA as a whole should be organized around making this the expected and default culture (at least to the degree that EAs agree with me about AI being a top priority), and this should be reflected in a thousand small ways in how we structure our conversation. Normal EA Forum conversations should look more like casual exchanges between peers at an academic conference, and less like polished academic papers (because polished academic papers are too inefficient a vehicle for making early-stage conceptual progress).
I think this is not only true for making direct AGI alignment progress, but is also true for converging on key macrostrategy questions (hard vs. soft takeoff; overall difficulty of the alignment problem; probability of a sharp left turn; impressiveness of GPT-3; etc.). Insofar as we haven’t already converged a lot on these questions, I think a major bottleneck is that we’ve tried too hard to make our reasoning sound academic-paper-ish before it’s actually ready for that format, with the result that we confuse ourselves about our real cruxes, and people end up updating a lot less than they would in a normal back-and-forth.
Highly polished, heavily privately reviewed and edited content tends to reflect the beliefs of larger groups, rather than the beliefs of a specific individual.
This often results in deference cascades, double-counting evidence, and herding: everyone is trying (to some degree) to bend their statements in the direction of what everyone else thinks. I think it also often creates “phantom updates” in EA, where there’s a common belief that X is widely believed, but the belief is wrong to some degree (at least until everyone updates their outside views because they think other EAs believe X).
It also has various directly distortionary effects (e.g., a belief might seem straightforwardly true to all the individuals at an org, but doesn’t feel like “the kind of thing” an organization writ large should endorse).
In principle, it’s not impossible to push EA in those directions while also passing drafts a lot more in private. But I hope it’s clearer why that doesn’t seem like the top priority to me (and why it could be at least somewhat counter-productive) given that I’m working with this picture of our situation.
I’m happy to heavily signal-boost replies from DM and Anthropic staff (including editing the OP), especially if it shows that MIRI was just flatly wrong about how much of a plan those orgs already have. And I endorse people docking MIRI points insofar as we predicted wrongly here; I’d prefer the world where people knew our first-order impressions of where the field’s at in this case, and were able to dock us some points if we turn out to be wrong, over the world where everything happens in private.
(I think I still haven’t communicated fully why I disagree here, but hopefully the pieces I have been able to articulate are useful on their own.)