Some biases and selection effects in AI risk discourse


These are some selection effects impacting what ideas people tend to get exposed to and what they’ll end up believing, in ways that make the overall epistemics worse. These have mostly occurred to me about AI discourse (alignment research, governance, etc.), mostly on LessWrong. (They might not be exclusive to discourse on AI risk.)

(EDIT: I’ve reordered the sections in this post so that fewer people get stuck on what was the first section and so they have a better chance of reading the other two sections.)

Outside-view is overrated

In AI discourse, outside-view (basing one’s opinion on other people’s and on (things that seem like) precedents), as opposed to inside-view (having an actual gears-level understanding of how things work), is being quite overrated for a variety of reasons.

  • There’s the issue of outside-view double-counting, as in this comic I drew. When building an outside-view, people don’t particularly check whether 10 people say the same thing because they each came up with it independently, or because 9 of them heard it from the 1 person who came up with it, and those 9 mostly stuck to outside-view themselves.

  • I suspect that outside-view is being over-valued because it feels safe — if you just follow what you believe to be consensus and/​or an authority, then it can feel less like it’s “your fault” if you’re wrong. You can’t really just rely on someone else’s opinion on something, because they might be wrong, and to know if they’re wrong you need an inside-view yourself. And there’s a fundamental sense in which developing your own inside-view of AI risk is contributing to the research, whereas just reusing what exists is neutral, and {reusing what exists + amplifying it based on what has status or memetic virulence} is doing damage to the epistemic commons, due to things like outside-view double-counting.

  • There’s occasionally a tendency to adopt the positions held by authority figures/organizations in order to appeal to them, to get resources/status, and/or generally to fit in. (Similarly, be wary of the opposite as well: having a wacky opinion in order to get quirkiness/interestingness points.)

  • “Precedents”-based ideas are pretty limited — there isn’t much that looks similar to {us building things that are smarter than us and as-flexible-as-software}. The comparison with {humans as mesa-optimizers relative to evolution} has been taken way outside of its epistemic range.
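To make the double-counting point from the first bullet concrete, here is a toy calculation. It is only a sketch under made-up assumptions: the prior (0.5) and the per-report likelihood ratio (2.0) are hypothetical numbers chosen for illustration, and it assumes the 9 people who merely repeat the claim add no evidence at all, which is a simplification.

```python
def posterior(prior: float, likelihood_ratio: float, n_independent: int) -> float:
    """Posterior probability of a claim after n_independent reports.

    Each report is assumed to carry the same likelihood ratio and to be
    independent of the others (hypothetical simplification).
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** n_independent
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers, chosen only for illustration.
PRIOR = 0.5             # credence before hearing anyone
LIKELIHOOD_RATIO = 2.0  # evidential weight of one independent opinion

# Naive outside-view: treat all 10 people as independent sources.
naive = posterior(PRIOR, LIKELIHOOD_RATIO, n_independent=10)

# Corrected: 9 of the 10 just repeated the 1 person who came up with it,
# so there is really only one independent report.
corrected = posterior(PRIOR, LIKELIHOOD_RATIO, n_independent=1)

print(f"naive (10 'independent' reports): {naive:.4f}")
print(f"corrected (1 independent report): {corrected:.4f}")
```

With these numbers, the naive update lands at roughly 0.999 while the corrected one lands at roughly 0.667: the same 10 voices, but a very different level of justified confidence depending on where they got the idea.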

Arguments about P(doom) are filtered for nonhazardousness

Some of the best arguments for high P(doom) /​ short timelines that someone could make would look like this:

It’s not that hard to build an AI that kills everyone: you just need to solve [some problems] and combine the solutions. Considering how easy it is compared to what you thought, you should increase your P(doom) /​ shorten your timelines.

But obviously, if people had arguments of this shape, they wouldn’t mention them, because they make it easier for someone to build an AI that kills everyone. This is great! Carefulness about exfohazards is better than the alternative here.

But people who strongly rely on outside-view for their P(doom) /​ timelines should be aware that their arguments are being filtered for nonhazardousness. Note that this plausibly applies to other topics than P(doom) /​ timelines.

Note that beyond not-being-mentioned, such arguments are also anthropically filtered against: in worlds where such arguments have been out there for longer, we died a lot quicker, so we’re not there to observe those arguments having been made.

Confusion about the problem often leads to useless research

People enter AI risk discourse with various confusions, such as:

  • What are human values?

  • Aligned to whom?

  • What does it mean for something to be an optimizer?

  • Okay, unaligned ASI would kill everyone, but how?

  • What about multipolar scenarios?

  • What counts as AGI, and when do we achieve that?

Those questions about the problem do not particularly need fancy research to be resolved; they’re either already solved or there’s a good reason why thinking about them is not useful to the solution. For these examples:

These answers (or reasons-why-answering-is-not-useful) usually make sense if you’re familiar with rationality and alignment, but some people are still missing a lot of the basics of rationality and alignment, and by repeatedly voicing these confusions they cause people to think that those confusions are relevant and should be researched, causing lots of wasted time.

It should also be noted that some things are correct to be confused about. If you’re researching a correlation or concept-generalization which doesn’t actually exist in the territory, you’re bound to get pretty confused! If you notice you’re confused, ask yourself whether the question is even coherent/​true, and ask yourself whether figuring it out helps save the world.

Crossposted from LessWrong (22 points, 21 comments)