How does Rationalist Community Attention/Consensus compare? I’d like to mention a paper of mine published at the top AI theory conference which proves that when a certain parameter of a certain agent is set sufficiently high, the agent will not aim to kill everyone, while still achieving at least human-level intelligence. This follows from Corollary 14 and Corollary 6. I am quite sure most AI safety researchers would have confidently predicted that no such theorems would ever appear in the academic literature. And yet there are no traces of any minds being blown. The associated Alignment Forum post only has 22 upvotes and one comment, and I bet you’ve never heard any of your EA friends discuss it. It hasn’t appeared, to my knowledge, in any AI safety syllabuses. People don’t seem to bother investigating or discussing whether their concerns with the proposal are surmountable. I’m reluctant to bring up this example since it has the air of a personal grievance, but I think the disinterest from the Rationality Community is enough of an error that it calls for an autopsy. (To be clear, I’m not saying everyone should be hailing this as an answer to AI existential risk, only that it should definitely be of significant interest.)
I’m someone who has read your work (this paper and FGOIL, the latter of which I have included in a syllabus), and who would like to see more work in a similar vein, as well as more formalism in AI safety. I say this to establish my bona fides, the way you established your AI safety bona fides.
I don’t think this paper is mind-blowing, and I would call it representative of one of the ways in which tailoring theoretical work for the peer-review process can go wrong. In particular, you don’t show that “when a certain parameter of a certain agent is set sufficiently high, the agent will not aim to kill everyone”; you show something more like “when you can design and implement an agent that acts and updates its beliefs in a certain way, can restrict its initial beliefs to a set containing the desired ones, and can incorporate into the process a human who has access to the ground truth of the universe, then you can set a parameter high enough that the agent will not aim to kill everyone” [edit: Michael disputes this last point, see his comment below and my response], which is not at all the same thing. The standard academic failure mode is to make a number of assumptions for tractability that severely lower the relevance of the results (and the more pernicious failure mode is to hide those assumptions).
You’d be right if you said that most AI safety people did not read the paper and come to that conclusion themselves, and even if you said that most weren’t even aware of it. Very few in the community have the relevant background for it (and I would like to see a shift in that direction), especially the newcomers who are the targets of syllabi. All that said, I’m confident that you got enough qualified eyes on it that, if you had shown what you said in your summary, it would have had an impact similar in scale to what you think is appropriate.
This comment is somewhat of a digression from the main post, but I am concerned that if someone took your comments about the paper at face value, they would come away with an overly negative perception of how the AI safety community engages with academic work.
I thought the paper itself was poorly argued, largely as a function of biting off too much at once. Several times the case against the TUA was not actually argued but merely asserted to exist, along with one or two citations that are hard to evaluate as representative of any consensus. Then, while I thought the original description of the TUA was accurate, the TUA response to criticisms was entirely ignored. Statements like “it is unclear why a precise slowing and speeding up of different technologies...across the world is more feasible or effective than the simpler approach of outright bans and moratoriums” were egregious, and made it seem like you did not do your research. You spoke to 20+ reviewers, half of whom were sought out to disagree with you, and not a single one could make the case for differential technological development? Not a single mention of the difficulty of incorporating future generations into the democratic process?
Ultimately, I think the paper would have been better served by focusing on a single section and leaving the rest to future work. The style of asserting rather than arguing, and of skipping over potential responses, comes across as more polemical than evidence-seeking. I believe that was the major contributor to the blowback you have received.
I agree that more diversity in funders would be beneficial. It is harmful to all researchers if access to future funding depends on the results of their work. Overall, though, the actual extent of the blowback is unclear from your post. What does “tried to prevent the paper being published” mean? Is the threat of withdrawn funding real or imagined? Were the authors whose work was criticized angry, and did they take any action to retaliate?
Finally, I would like to abstract away from this specific paper. If criticism of the dominant paradigm limits future funding and career opportunities, that is a sign of terrible epistemics in a field. However, if poor criticism of the dominant paradigm limits future funding and career opportunities, that is completely valid. The one line you wrote that I think all EAs would agree with is “This is not a game. Fucking it up could end really badly”. If wrong arguments would cause harm when believed, it is not only the right but the responsibility of funders to reduce their reach. Of course, the difficulty lies in distinguishing criticisms that are wrong from criticisms that merely run against the current paradigm, while evaluating them from within that paradigm. The responsibility of the researcher is to make their case as bulletproof as possible and to design it to convince believers in the current paradigm. Otherwise, even if their claims are correct, they won’t make an impact. The “Effective” part of EA includes making the right arguments to convince the right people, rather than the arguments that are cathartic to unleash.