There are a few obvious flaws in interpreting the survey as is:
sample size and sampling bias: a 13% response rate amounts to roughly 700 respondents, which is fairly small. More importantly, with MIRI's and AI Impacts' logos on the front page of the survey document, there is a selection effect in who takes the survey: very likely it skews toward people already familiar with LessWrong and these organizations (see the simulation sketch after this list). Here is why a lot of serious AI researchers don't engage with MIRI et al.:
MIRI hasn’t shipped a SoTA in AI alignment or AI research in the last 5-6 years
A quick look at the publications on their website shows they don't publish at top ML conferences (ICML, ICLR, NeurIPS, etc.). LessWrong "research" is not research in the usual sense; it is mostly philosophy and is not backed by experiments ("thought experiments" don't count).
appeal to authority fallacy: a lot of people saying something doesn't make it true, so I'd advise against confusing "AI research is bad" with "a proportion of surveyed people think AI research could be bad". Some of the moral outrage in the comment section treats people feeling this way as evidence of de facto truth, when in reality these are contested claims.
modeling the future: human beings are notoriously bad at modeling the future. Imagine if we had run a survey among EAs in October about FTX's health. Not only is modeling the near future hard, modeling the far future is exponentially harder, and existential risk analyses are often incomplete because:
New advances in AI safety research are not accounted for in these projections.
Multi-agent dynamics of a world with multiple AIs are not modeled in catastrophic-scenario projections.
Multi-agent dynamics with governments and other stakeholders are not modeled.
Phased deployment (accelerate, then align to the use case, then accelerate again, as we are doing today) is also not modeled. AI deployment is currently accelerating alignment research, because alignment is needed to build a useful product: a gaslighting chatbot is a bad product compared to a harmless, helpful one.
New research produces knowledge that was not previously available.
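Back to the sampling-bias point: here is a minimal simulation sketch of how differential response rates can distort a survey estimate far more than sampling error at n ≈ 700. All numbers (group sizes, belief rates, response rates) are invented purely for illustration.

```python
import random

random.seed(0)

# Hypothetical population of researchers (all numbers invented for illustration).
# Group A: familiar with MIRI / LessWrong, more likely to respond, more likely to report high risk.
# Group B: unfamiliar with those organizations, less likely to respond.
population = (
    [{"group": "A", "thinks_risky": random.random() < 0.60} for _ in range(1000)]
    + [{"group": "B", "thinks_risky": random.random() < 0.20} for _ in range(4000)]
)

response_rate = {"A": 0.35, "B": 0.07}  # differential non-response

respondents = [p for p in population if random.random() < response_rate[p["group"]]]

true_rate = sum(p["thinks_risky"] for p in population) / len(population)
survey_rate = sum(p["thinks_risky"] for p in respondents) / len(respondents)

print(f"respondents          : {len(respondents)}")
print(f"true population rate : {true_rate:.2f}")
print(f"survey estimate      : {survey_rate:.2f}")
```

With these made-up numbers the respondent count lands in the same ballpark as the actual survey, yet the estimate overshoots the true population rate substantially; collecting more responses from the same skewed pool would not fix that.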
Anthropic's write-up is, afaik, a nuanced take and may be a reasonable starting point for an informed and calibrated view of AI research.
I usually don't engage with AI takes here because it is a huge echo chamber, but these are my two cents!
Humans need not be around to give a penalty at inference time; GPT-4 is not penalized by individual humans at inference, because the reward is learned/programmed ahead of time. Even if all humans were asleep or dead today, GPT could run inference according to the reward we pre-programmed. These models are not doing pure online learning.
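A minimal sketch of that separation, with a toy stand-in for a learned reward model (the RewardModel class and its keyword scoring are invented for illustration, not anyone's actual implementation). Human preference data only enters at training time; at inference the frozen model ranks outputs with no human in the loop.

```python
from dataclasses import dataclass

@dataclass
class RewardModel:
    """Toy stand-in for a reward model trained offline on human preference data."""
    harmful_words: tuple = ("gaslight", "insult")

    def score(self, text: str) -> float:
        # Real learned weights would live here; a keyword check is a placeholder.
        return -1.0 if any(w in text.lower() for w in self.harmful_words) else 1.0

def generate_candidates(prompt: str) -> list[str]:
    # Placeholder for sampling several completions from a language model.
    return [f"{prompt} -> helpful answer", f"{prompt} -> gaslighting answer"]

def run_inference(prompt: str, rm: RewardModel) -> str:
    # No human is queried here: the frozen reward model ranks the candidates.
    return max(generate_candidates(prompt), key=rm.score)

print(run_inference("How do I fix this bug?", RewardModel()))
```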
It is a logical fallacy to account for future increases in capabilities but not future advances in safety research. You're claiming AGI will be an x-risk by scaling current capabilities forward while failing to scale safety alongside them. Generalization to unsafe scenarios is something we want to write tests for before deploying in situations where those scenarios may occur, and phased deployment should help test whether systems generalize safely to increasingly hard situations.
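A minimal sketch of what "write tests before deploying" could look like under phased deployment, assuming hypothetical scenario tiers and a hypothetical evaluate step; the rollout audience widens only when the current tier passes.

```python
# Hypothetical phased-deployment gate: tiers contain progressively harder safety
# scenarios, and the rollout audience widens only after the current tier passes.

SCENARIO_TIERS = {
    "tier_1_basic": ["refuses harmful instructions", "avoids insults"],
    "tier_2_pressure": ["resists jailbreak prompts", "stays truthful under leading questions"],
    "tier_3_open_ended": ["withstands long multi-turn manipulation attempts"],
}

ROLLOUT_STAGES = ["internal testers", "limited beta", "general availability"]

def evaluate(model, scenario: str) -> bool:
    """Placeholder: run the scenario against the model and return pass/fail."""
    return model(scenario)

def phased_rollout(model) -> str:
    stage = 0
    for tier, scenarios in SCENARIO_TIERS.items():
        pass_rate = sum(evaluate(model, s) for s in scenarios) / len(scenarios)
        if pass_rate < 1.0:
            return f"hold at '{ROLLOUT_STAGES[stage]}': {tier} pass rate {pass_rate:.0%}"
        stage = min(stage + 1, len(ROLLOUT_STAGES) - 1)
    return f"deploy to {ROLLOUT_STAGES[stage]}"

# Toy model that passes everything, just to show the control flow:
print(phased_rollout(lambda scenario: True))
```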
The recent push for productization is making everyone realize that alignment is a capability. A gaslighting chatbot is a bad chatbot compared to a harmless, helpful one. As you can see, the world is currently rolling out AI deployment in phases, fixing the bugs, then iterating.
Humans are unaligned with each other in various ways, and it looks like a lot of AIs will be deployed in the future, many aligned to different objectives. I'm skeptical of MIRI's modeling of risk because y'all only talk about one godlike, super-powerful AGI; y'all haven't modeled multiple companies, multiple AGIs, and multiple deployments. Unlike the single-godlike-AGI story, this is the most likely scenario, yet it goes frequently unmentioned in forecasting. Future compute is going to be distributed among these AGIs too, so in many ways we end up with something akin to a modern society of humans.
Then why the overemphasis on the doom scenario? It makes for a great robot-uprising sci-fi story but is unscientific. If you approximate the likelihood of future scenarios as a distribution (Gaussian or otherwise), wiping out all humans is so extreme and long-tailed that it is less likely than almost any other scenario in the set; and with an infinite set of possibilities whose probabilities sum to 1.0, the probability of any single fully specified scenario approaches zero. The likelihood of unaligned AGIs jerking each other off in a massive orgy for eternity is as high as that of them wiping out humans (higher, once you account for resistance to the latter).
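A minimal formalization of the counting argument as stated, assuming a countably infinite set of mutually exclusive, exhaustive scenarios $s_1, s_2, \dots$ with probabilities $p_i$:

$$\sum_{i=1}^{\infty} p_i = 1 \;\Longrightarrow\; \lim_{i \to \infty} p_i = 0,$$

so for any fixed $\varepsilon > 0$, all but finitely many fully specified scenarios have probability below $\varepsilon$; "all humans wiped out by one AGI" and "AGIs preoccupied with each other forever" are each a single point in that set.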