Intellectual Diversity in AI Safety

There’s an undercurrent in the way I hear people talk about everyone not already inside the AI-safety umbrella: an implication that outsiders aren’t worth talking to until they understand all the basic premises, where “basic premises” means something like “all of Superintelligence and some of Yudkowsky”. If you talk to these AI safety people, they’re generally willing to acknowledge some version of this pretty explicitly.

No one wants to rehash the same arguments a million times. (“So, like Skynet? Killer robots? Come on, you can just unplug it.”) But if everyone has to be more-or-less on board with some mandatory reading as the price of entry, you’re going to get a more homogeneous field than you otherwise could have gotten.

Why do I think that drawing in a wide variety of viewpoints is important?

The less varied the intellectual pedigree of AI safety is, the more likely it is that everyone is making correlated mistakes.

In my opinion, the landscape of AI’s future is dominated by unknown unknowns. We have not yet even thought of all of the ways it could go, let alone which are more likely or how to deal with them.

In part, I think the homogeneity of people’s background worldviews is an effect of the small number of people who, quite recently, drew a reasonably large group’s attention to the issue, which is only to their credit (otherwise, there might be no conversation to speak of, homogeneous or otherwise). But if you’re trying to do creative work and come up with as many possibilities as you can, you want intellectual diversity in the people who are thinking about the problem. If everyone’s first exposure to AI safety involved foom, for instance, they’re going to be thinking very different thoughts from someone who’s never heard of it. Even if they disagree with it, it might color their later intuitions.

It seems to me that AI safety has already allowed weak, confused, or just plain incorrect arguments to stand due to insufficient questioning of shared assumptions. Ben Garfinkel argues in “On Classic Arguments for AI Discontinuities” that the classic arguments fail to adequately distinguish between a sudden jump to AGI and a sudden jump from AGI to superintelligent systems. By arguing for the latter while assuming the former, they overestimate the likelihood of a catastrophic jump from AGI to superintelligence.

That’s one set of assumptions that someone has put in effort to untangle. I would be very surprised if there weren’t a lot more buried in our fundamental understanding of the issues.


The obvious counter-argument is that most fields do not work like this and seem to be the better for it. No one’s going to take a biologist seriously if they’re running around quoting Lamarck. Deriving your own physics from first principles is the domain of crackpots. In general, discarding the work of previous thinkers wholesale is not often a good idea.

Why do I think it’s worth trying here? AI safety is a pre-paradigmatic science that is much newer than biology and physics. As it stands, it is also much less grounded in testable facts. A lot of intellectual progress in the basic underpinnings seems to be made when someone says “I thought of a way that AI could go, here’s a blog post about why I think so”. If it’s a good, persuasive-seeming argument, some people integrate it into their worldviews and treat it as a scenario that needs to be prepared for.

Other downsides:

  • It is harder to talk to someone who doesn’t have your shared concepts.

  • There is a low ratio of interesting outsider opinions to arguments that have well-established counter-arguments or are otherwise very obviously flawed.

  • Not being introduced to previous work means that people will spend a lot of time rederiving existing concepts.


I don’t think all the existing arguments are bad, or that we should jettison everything and start over, or anything so dramatic. The current state of knowledge is the work of a lot of very smart people who have created something very valuable. But I do think it would be helpful to aim for a wider variety of viewpoints.

Some possible actions:

  • I’m not sure how you would get people from outside the existing structures on board with the basic program without exposing them to the existing arguments, but it seems like an interesting experiment to try. What do you get when you take some intelligent ML people or evolutionary biologists or economists or philosophers, or whatever you think is an interesting background, and ask them to start thinking through the problem without priming them with the large number of established concepts already floating around?

Obviously this is kind of dumb as presented. No one does math by teaching people basic algebra and then going “okay, now rederive modern mathematics”. But I suspect there’s a better-thought-out version of this proposal that might have interesting results.

  • Someone thinking within the existing intellectual framework might benefit from talking through their ideas with someone they respect who hasn’t engaged with it much.

  • It might be worth people’s time to try to really understand the criticism of well-informed outsiders, and to see whether the disagreement comes down to fundamental assumptions.