What sort of substantial value would you expect to be added? It sounds like we either have a different belief about the value-add, or a different belief about the costs.
I’d be very surprised if the actual amount of big-picture strategic thinking at either organisation was “very little”. I’d be less surprised if they didn’t have a consensus view about big-picture strategy, or a clearly written document spelling it out. If I’m right, I think the current content is misleading-ish. If I’m wrong and little thinking has actually been done, there’s some chance they’ll say “we’re focused on identifying and tackling near-term problems”, which would be interesting to me given what I currently believe. If I’m wrong and something clear has been written, then making it visible (or pointing out its existence) would also be a useful update for me.
Polished vs sloppy
Here are some dimensions I think of as distinguishing sloppy from polished:
Vague hunches <-> precise theories
First impressions <-> thorough search for evidence/prior work
Hard <-> easy to understand
Vulgar <-> polite
Unclear <-> clear account of robustness, pitfalls and so forth
All else equal, I don’t think the left side is epistemically superior. It can be faster, and that might be worth it, but there are obvious epistemic costs to relying on vague hunches, first impressions, failures of communication and overlooked pitfalls (politeness is perhaps neutral here). I think these costs are particularly high in, as you say, domains that are uncertain and disagreement-heavy.
I think it is sloppy to stay too close to the left if you think the issue is important and you have time to address it properly. You have to manage your time, but I don’t think there are additional reasons to promote sloppy work.
You say that there are epistemic advantages to exposing thought processes, and you give the example of dialogues. I agree there are pedagogical advantages to exposing thought processes, but exposing thoughts clearly also requires polish, and I don’t think pedagogy is a high priority most of the time. I’d be way more excited to see more theory from MIRI than more dialogues.
If my reasoning process is actually flawed, then I want other EAs to be aware of that, so they can have an accurate model of how much weight to put on my views.
I don’t think it’s realistic to expect Lightcone forums to do serious reviews of difficult work. That takes a lot of individual time and dedication; maybe you occasionally get lucky, but you should mostly expect not to.
I agree that I’m not a paradigmatic example of the EAs who most need to hear this lesson [of exposing the thought process]; but I think non-established EAs heavily follow the example set by established EAs, so I want to set an example that’s closer to what I actually want to see more of.
Maybe I’ll get into this more deeply one day, but I just don’t think sharing your thoughts freely is a particularly effective way to encourage other people to share theirs. I think you’ve been pretty successful at getting the “don’t worry about being polite to OpenAI” message across, less so the higher-level stuff.
I agree with a lot of what you say! I still want to move EA in the direction of “people just say what’s on their mind on the EA Forum, without trying to dot every i and cross every t; and then others say what’s on their mind in response; and we have an actual back-and-forth that isn’t carefully choreographed or extremely polished, but is more like a real conversation between peers at an academic conference”.
(Another way to achieve many of the same goals is to encourage more EAs who disagree with each other to regularly talk to each other in private, where candor is easier. But this scales a lot more poorly, so it would be nice if some real conversation were happening in public.)
A lot of my micro-decisions in making posts like this are connected to my model of “what kind of culture and norms are likely to result in EA solving the alignment problem (or making a lot of progress)?”, since I think that’s the likeliest way that EA could make a big positive difference for the future. In that context, I think building conversations around heavily polished, “final” (rather than in-process) cognition tends to be insufficient for fast and reliable intellectual progress:
Highly polished content tends to obscure the real reasons and causes behind people’s views, in favor of reasons that are more legible, respectable, impressive, etc. (See Beware defensibility.)
AGI alignment is a pre-paradigmatic proto-field where making good decisions will probably depend heavily on people having good technical intuitions, intuiting patterns before they know how to verbalize those patterns, and generally becoming adept at noticing what their gut says about a topic and putting their gut in contact with useful feedback loops so it can update and learn.
In that context, I’m pretty worried about an EA where everyone is hyper-cautious about saying anything that sounds subjective, “feelings-ish”, hard-to-immediately-transmit-to-others, etc. That might work if EA’s path to improving the world is via donating more money to AMF or developing better vaccine tech, but it doesn’t fly if making (and fostering) conceptual progress on AI alignment is the path to impact.
Ideally, it shouldn’t merely be the case that EA technically allows people to candidly blurt out their imperfect, in-process thoughts about things. Rather, EA as a whole should be organized around making this the expected and default culture (at least to the degree that EAs agree with me about AI being a top priority), and this should be reflected in a thousand small ways in how we structure our conversation. Normal EA Forum conversations should look more like casual exchanges between peers at an academic conference, and less like polished academic papers (because polished academic papers are too inefficient a vehicle for making early-stage conceptual progress).
I think this is not only true for making direct AGI alignment progress, but is also true for converging on key macrostrategy questions (hard vs. soft takeoff; overall difficulty of the alignment problem; probability of a sharp left turn; impressiveness of GPT-3; etc.). Insofar as we haven’t already converged a lot on these questions, I think a major bottleneck is that we’ve tried too hard to make our reasoning sound academic-paper-ish before it’s really in that form, with the result that we confuse ourselves about our real cruxes, and people end up updating a lot less than they would in a normal back-and-forth.
Highly polished content that has been heavily reviewed and edited in private tends to reflect the beliefs of larger groups, rather than the beliefs of a specific individual.
This often results in deference cascades, double-counting evidence, and herding: everyone is trying (to some degree) to bend their statements in the direction of what everyone else thinks. I think it also often creates “phantom updates” in EA, where there’s a common belief that X is widely believed, but the belief is wrong to some degree (at least until everyone updates their outside views because they think other EAs believe X).
It also has various directly distortionary effects (e.g., a belief might seem straightforwardly true to all the individuals at an org, but doesn’t feel like “the kind of thing” an organization writ large should endorse).
In principle, it’s not impossible to push EA in those directions while also passing drafts around a lot more in private. But I hope it’s clearer why that doesn’t seem like the top priority to me (and why it could be at least somewhat counter-productive), given that I’m working with this picture of our situation.
I’m happy to heavily signal-boost replies from DM and Anthropic staff (including by editing the OP), especially if it shows that MIRI was just flatly wrong about the extent to which those orgs already have a plan. And I endorse people docking MIRI points insofar as we predicted wrongly here; and I’d prefer the world where people knew our first-order impressions of where the field’s at in this case, and were able to dock us some points if we turn out to be wrong, as opposed to the world where everything happens in private.
(I think I still haven’t communicated fully why I disagree here, but hopefully the pieces I have been able to articulate are useful on their own.)