Thanks for writing this post; it’s always useful to hear people’s experiences! For others considering a PhD, I just wanted to chime in and say that my experience in a PhD program has been quite different (I’m a 4th-year ML PhD student at UC Berkeley). I don’t know how much of that is down to the field, the program, or just my personality, but I’d encourage everyone to seek a range of perspectives: PhDs are far from uniform.
I hear the point about academic incentives being bad a lot, but it doesn’t really resonate with me. A summary of my view: incentives are misaligned everywhere, not just in academia. Rather than seeking a place with good incentives in general, first figure out what you want to do, and then find a place where the incentives happen to be compatible with that (even if for the “wrong” reasons).
I’ve worked in quant finance, industry AI labs, and academic AI research. There were serious problems with incentives in all three. I found this particularly unforgivable in quantitative finance, where the goal is pretty clear: make money. You can even measure day to day whether you’re making money! But getting the details right is hard. At one place I’m aware of, people were paid based on their group’s profitability divided by how risky their strategies were. This seems reasonable: profit good, risk bad. The problem was that it measured the risk of your strategy in isolation—not how it affected the whole firm’s risk levels. So different groups colluded to swap strategies, which made each of them seem less risky in isolation (so they could get paid more), without changing the firm’s overall strategy, and hence its overall risk, at all!
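To make the mechanism concrete, here’s a minimal sketch with made-up numbers (not from the actual firm in the anecdote): two groups run negatively correlated strategies, “risk” is measured as the standard deviation of each group’s own P&L, and swapping half of each book lowers every group’s measured risk while leaving the firm’s total risk unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily P&L for two strategies that partially hedge each other.
cov = [[1.0, -0.8],
       [-0.8, 1.0]]
pnl = rng.multivariate_normal(mean=[0.1, 0.1], cov=cov, size=100_000)
a, b = pnl[:, 0], pnl[:, 1]

# Before the swap: each group runs one strategy, and risk is measured per group.
print("Group risks before swap:", a.std(), b.std())   # ~1.0 each
print("Firm risk before swap:  ", (a + b).std())      # ~0.63

# After the swap: each group holds half of each strategy.
group_1 = 0.5 * a + 0.5 * b
group_2 = 0.5 * a + 0.5 * b
print("Group risks after swap: ", group_1.std(), group_2.std())  # ~0.32 each
print("Firm risk after swap:   ", (group_1 + group_2).std())     # still ~0.63
```

Under the pay formula, each group’s measured risk drops by more than half after the swap, so both groups look far better on paper, even though the firm is running exactly the same book as before.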
Incentivizing research is an unusually hard problem. Agendas can take years to pay off. The best agendas are often really high-variance, so someone might fail several times but still be doing great work in expectation. Given this backdrop, a PhD actually seems pretty reasonable.
It’s pretty hard to get fired doing a PhD, and some (by no means all) advisors will let you work on pretty much whatever you want. So, you have a 3-5 year runway to just work on whatever topics you think are best. At the end of those 3-5 years, you have to convince a panel of experts (who you get to hand-pick!) that you did something that’s “worth” a PhD.
As incentive structures go, this is incredibly flexible, as evidenced by the large number of people who goof off during their PhD. (That is the pitfall of weak incentives.) It also seems like a pretty reasonable incentive. If after 5 years of work you can’t convince people that what you did was good, it might be that the work is incredibly ahead of its time, but more likely you either need to communicate it better or the work just wasn’t that great by the standards of the field.
The “by the standards of the field” qualifier is the key issue here. Some high-impact work just doesn’t fit the taste of a particular field. Perhaps it falls between disciplinary boundaries. Or it’s more about distilling existing research, so it isn’t novel enough. That sucks, and academic research is probably the wrong venue for pursuing it—but that doesn’t make academic incentives bad per se. Just bad for that kind of research.
I think the bigger issue is the tacit social pressure to publish and make a name for yourself. This matters a fair bit for the job market, so it’s a real pressure. But I think analogous or equal pressures exist outside of academia. If you work at an industry lab, there might be pressure to deliver flashy results or products. If you work as an independent researcher, funders will want to see publications or other signs of progress.
I’d love to see better incentives, but I think it’s important to acknowledge that mechanism design for research is a hard problem, not just that academia is screwing it up uniquely badly.
First of all, I’m sorry to hear you found the process of writing the paper so emotionally draining. Having rigorous debate on foundational issues in EA is clearly of the utmost importance. For what it’s worth, when I’m making grant recommendations I’d view criticizing orthodoxy (in EA or other fields) as a strong positive, so long as it’s well argued. While I do not wholly agree with your paper, it’s clearly an important contribution, and it has made me question a few implicit assumptions I was carrying around.
The most important updates I got from the paper:
Put less weight on technological determinism. In particular, defining existential risk in terms of a society reaching “technological maturity” without falling prey to some catastrophe frames technological development as largely inevitable. But I’d argue that even under the “techno-utopian” view, many technological developments are not needed for “technological maturity”, or at least not for a very long time. While I still tend to view the development of things like advanced AI systems as hard to stop (lots of economic pressure, geographically dispersed R&D, no expert consensus on whether it’s good to slow down or accelerate), I’d certainly like to see more research into how we can affect the development of new technologies, beyond just differential technological development.
“Existential risk” is ambiguous and therefore hard to study formally, so we might want to replace it with more precise terms like “extinction risk” that are downstream of some visions of existential risk. I’m not sure how decision-relevant this ends up being: I think disagreement about how the world will unfold explains more of the disagreement on x-risk probabilities than definitions of x-risk do. But it does seem worth trying to pin the concept down more precisely.
“Direct” vs “indirect” x-risk is a crude categorization, as most hazards lead to risks via a variety of pathways. Taking AI: there are some very “direct” risks such as a singleton AI developing some superweapon, but also some more “indirect” risks such as an economy of automated systems gradually losing alignment with collective humanity.
My main critiques:
I expect a fairly broad range of worldviews end up with similar conclusions to the “techno-utopian approach” (TUA). The key beliefs seem to be: (a) substantially more value is present in the future than exists today; and (b) we have a moral obligation to safeguard that value. The TUA is a very strong version of this, where the future holds many orders of magnitude more value (transhumanism, total utilitarianism) and our moral obligations to future and present people are equal (strong longtermism). But a non-transhumanist who wants 8 billion non-modified, biological humans to continue happily living on Earth for the next 100,000 years, and who values future generations at 1% of current generations, would for many practical purposes make the same decisions.
I frequently found myself unsure if there was actually a concrete disagreement between your views and those in the x-risk community, including those you criticize, beyond a choice of framing and emphasis. I understand it can be hard to nail down a disagreement, but this did leave me a little unsatisfied. For example, I’m still unsure what it really means to “democratise research and decision-making in existential risk” (page 26). I think almost all x-risk researchers would welcome more researchers from complementary academic disciplines or philosophical bents, and conversely I expect you would not suggest that random citizen juries should start actively participating in research. One concrete question I had is what axes you’d be most excited for the x-risk research field to become more diverse on at the margin: academic discipline, age, country, ethnicity, gender, religion, philosophical views, …?
Related to the above, it frequently felt like the paper was arguing against uncharitable versions of other people’s views—VWH is an example others have brought up. On reflection, I think there is value in this, as many people may hold those simplified versions of a view even if its originator has a more nuanced perspective. But it did often make me react “but I subscribe to <view X> and don’t believe <supposed consequence Y>”! One angle you could consider in future work is to start by explaining your most core disagreements with a particular view, and then go on to elaborate on problems with commonly held adjacent positions.
I’d also suggest that strong longtermism is a meaningfully different assumption from, e.g., transhumanism and total utilitarianism. In particular, the case for existential or extinction risk research seems many orders of magnitude weaker under a near-termist worldview than under a strong longtermist one. Provided you think strong longtermism is at least credible, it seems reasonable to assume it when doing x-risk research, even though you should discount the impact of such interventions by your credence in longtermism when making a final decision on where to allocate resources. If a risk seems very likely to occur (e.g. AI, bio), such that addressing it is plausible on both near-termist and longtermist grounds, then perhaps it makes sense to drop this assumption; but even then I suspect it is often easier to just run two separate analyses, given the different outcome metrics of concern (e.g. % x-risk averted vs QALYs saved).