I also think average utilitarianism doesn't seem very plausible. I was just using it as an example of a non-linear theory (though, as Will notes, if value for any individual is linear in resources then so is value for the world as a whole, just with a smaller derivative).
Interesting, is this the sort of thing you have in mind? It at least seems similar to me, and I remember thinking that post got at something important.
A bull case for convergence:
Factory farming, and to a lesser extent global poverty, persist because there are some costs to ending them, and the rich aren't altruistic enough (or the altruists aren't rich enough) to end them. Importantly, it will not just be that factory farming itself ends, but due to cognitive dissonance, people's moral views towards nonhumans will likely change a lot too once ~no-one is eating animals. So there will predictably be convergence on viewing c. 2025 treatment of animals as terrible.
There is an ongoing homogenization of global culture which will probably continue. As the educational and cultural inputs to people converge, it seems likely their beliefs (including moral beliefs) will also converge at least somewhat.
Some fraction of current disagreements about economic/political/moral questions are caused just by people not being sufficiently informed/rational. So those disagreements would go away when we have ~ideal post-human reasoners.
A more ambitious version of the above is that perhaps post-humans will take epistemic humility very seriously, and they will know that all their peers are also very rational, so they will treat their own moral intuitions as little evidence of what the true/best/idealised-upon-reflection moral beliefs are. Then, everyone just defers very heavily to the annual survey of all of (post)humanity's views on e.g. population axiology rather than backing their own intuition.
(Arguably this doesn't count as convergence if people's intuitions still differ, but I think if people's all-things-considered beliefs, and therefore their actions, converge, that is enough.)
But I agree we shouldn't bank on convergence!
It felt surprisingly hard to come up with important examples of this, I think because there is some (suspicious?) convergence: improving the caution and wisdom with which we transition to ASI helps with both extinction prevention and trajectory changes. It makes extinction less likely (through more focus on alignment and control work, and perhaps by slowing capabilities progress or differentially accelerating safety-oriented AI applications) and improves the value of surviving futures (by making human takeovers, suffering digital minds, etc. less likely).
But maybe this is just focusing on the wrong resolution. Breaking down "making the ASI transition wiser": AI control looks especially promising if we are mainly focused on extinction, but less so otherwise. Digital sentience and rights work looks better if trajectory changes dominate, though not entirely. Improving company and government (especially USG) understanding of the relevant issues seems good for both.
Obviously, work on asteroids, supervolcanoes, etc. looks worse if preventing extinction is less important.
Biorisk I'm less sure about: non-AI-mediated extinction from bio seems very unlikely, but what would a GCR-level pandemic do to future values? Probably ~neutral in expectation, but plausibly it could lead to the demise of liberal democratic institutions (bad), or to a post-recovery world that is more scared and more committed to global cooperation to prevent a recurrence (good).
Bostrom discusses things like this in Deep Utopia, under the label of "interestingness" (where even if we edit post-humans to never be subjectively bored, maybe they run out of "objectively interesting" things to do and this leads to value not being nearly as high as it could otherwise be). I don't think he takes a stance on whether or how much interestingness actually matters, but I am only ~halfway through the book so far.
(I have not read all of your sequence.) I'm confused about how being even close to 100% on something like this is appropriate; my sense is generally just that population ethics is hard, humans have somewhat weak minds in the space of possible minds, and our later post-human views on ethics might be far more subtle or quite different.
Notably, the extinction event in this scenario is non-AI-related, I assume? And it needs to occur before we have created self-sufficient AIs.
If the true/best/my subjective axiology is linear in resources (e.g. total utilitarianism), lots of "good" futures will probably capture a very small fraction of how good the optimal future could have been. Conversely, if axiology is not linear in resources (e.g. intuitive morality, average utilitarianism), good futures seem more likely to be nearly optimal. Therefore whether axiology is linear in resources is one of the cruxes for the debate week question.
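To make the crux concrete, here is a toy formalisation (the particular concave functional form is just an illustrative assumption, not something anyone here is committed to):

$$V_{\text{linear}}(R) = kR, \qquad V_{\text{concave}}(R) = V_{\max}\left(1 - e^{-R/R_0}\right)$$

Under the linear form, a future that captures a fraction $f$ of attainable resources captures exactly a fraction $f$ of attainable value, so $f \ll 1$ means near-total value loss. Under the concave form, once $R \gg R_0$, even a future with a modest share of resources gets close to $V_{\max}$.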
Discuss.
A broader coalition of actors will be motivated to pursue extinction prevention than longtermist trajectory changes.[1] This means:
Extinction risk reduction work will be more tractable, by virtue of having broader buy-in and more allies.
Values change work will be more neglected.[2]
Is this a reasonable framing? If so, which effect dominates, or how can we reason through this?
I think a fair bit might come down to what we mean by "judgement calls".
Let's take an example: predicting who would win the 2024 US presidential election. Reasonable, well-informed people could and did disagree about what the fair market price for such prediction contracts was. There are many important considerations on either side. If two people were perfect rationalist Bayesians, they would pool their collective evidence (including hard-to-explain intuitions) and both end up with the same joint probability estimate.
So to take it back to your example: maybe Alice and I are both reasonable people, and after discussing thoroughly we both update towards each other. But I don't see why we would need to end up at 50%. I suppose if by "judgement call" we mean "there is room for reasonable disagreement" then I agree with you, but if we mean the far stronger "rational predictors should be at 50% on the question", that seems unwarranted. And it seems to me that for cluelessness to bind, we need the strong 50% version? Otherwise we can just act on the balance of probabilities, while also trying to gain more relevant information.
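As a minimal sketch of the weaker claim, using one simple aggregation rule (geometric mean of odds) and made-up numbers of 70% for Alice and 40% for me:

```python
import math

def pool_geometric_odds(p1: float, p2: float) -> float:
    """Pool two probability estimates via the geometric mean of their odds.
    This is just one common aggregation rule, not a claim about ideal Bayesian pooling."""
    o1, o2 = p1 / (1 - p1), p2 / (1 - p2)
    pooled_odds = math.sqrt(o1 * o2)
    return pooled_odds / (1 + pooled_odds)

# Hypothetical numbers: Alice is at 70%, I am at 40%.
print(round(pool_geometric_odds(0.7, 0.4), 3))  # ~0.555: we converge, but not to 50%
```

The point is just that two people updating towards each other can land anywhere; nothing forces the pooled estimate to 50%.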
Interesting, I think I would expect more objections to P1 than to P2. P2 seems pretty solid to me.
For P1, I agree that X-risk being bad is not as trivial to show as most of us might naively think. But maybe there are other interventions that are more robustly good in expectation (or at least robustly slightly better than 50% good). E.g. what about these sorts of interventions, which do not try to make claims about what the longterm future should be like, but rather try to improve civilisational wisdom:
Find the most altruistic person you know, and direct their attention towards crucial considerations about moral patienthood, population ethics, decision theory etc.
The effect size is probably pretty small, but having altruistic people learn more about longtermism seems good in expectation.
Find the most competent/powerful/intelligent person you know and try to make them more altruistic (especially regarding the far future).
Again, maybe not very tractable, but all else equal it seems better for agents to value reducing suffering and so forth.
Something to do with improved institutional decision-making or making humans more cooperative and pro-social generally?
Has fuzzy consequences, but seems positive in most worlds.
Pablo and I were trying to summarise the top page of Habryka's comments that he linked to (~13k words), not this departure post itself.
Hmm, true. I gave it the whole Greater Wrong page of comments; maybe it just didn't quote from those for some reason.
FYI, for anyone like me who doesn't have lots of the backstory here and doesn't want to read through Habryka's extensive corpus of EAF writings, here is Claude 3.7 Sonnet's summary based on the first page of comments Habryka links to.
Based on Habryka's posts, I can provide a summary of his key disagreements with EA leadership and forum administrators that ultimately led to his decision to leave the community.
Key Disagreements
Leadership and Accountability: Habryka repeatedly expresses concern about what he sees as a "leaderless" EA community. He believes the community has shifted from being driven by independent intellectual contributors to being determined by "a closed-off set of leaders with little history of intellectual contributions." He argues that almost everyone who was historically in leadership positions has stepped back and abdicated their roles.
Institutional Integrity: He criticizes EA organizations, particularly CEA (Centre for Effective Altruism), for prioritizing growth, prestige, and public image over intellectual integrity. In his posts, he describes personal experiences at CEA where they "deployed highly adversarial strategies" to maintain control over EA's public image and meaning.
FTX Situation: Habryka was particularly critical of how EA leadership handled Sam Bankman-Fried (SBF) and FTX. He claims to have warned people about SBF's reputation for dishonesty, but these warnings were not heeded. He criticizes Will MacAskill and others for their continued endorsement of SBF despite red flags, and was frustrated by the lack of transparency and open discussion after FTX's collapse.
Risk-Aversion and PR Focus: He repeatedly criticizes what he perceives as excessive risk-aversion and PR-mindedness among EA organizations. He argues this approach prevents honest discussion of important issues and contributes to a culture of conformity.
Funding Centralization: Habryka expresses concern about EA funding being increasingly centralized through a single large foundation (likely referring to Open Philanthropy), arguing this concentration of resources creates unhealthy power dynamics.
Community Culture: He criticizes the shift in EA culture away from what he describes as "a thriving and independent intellectual community, open to ideas and leadership from any internet weirdo" toward something more institutional and conformist.
Failure to Create Change: Habryka states that he no longer sees "a way for arguments, or data, or perspectives explained on this forum to affect change in what actually happens with the extended EA community," particularly in domains like AI safety research and community governance.
His departure post suggests a deep disillusionment with the direction of the EA community, expressing that while many of the principles of EA remain important, he believes "EA at large is causing large harm for the world" with "no leadership or accountability in-place to fix it." He recommends others avoid posting on the EA Forum as well, directing them to alternatives like LessWrong.
I think I would have found this more interesting/informative if the scenarios (or other key parts of the analysis) came with quantitative forecasts. I realise of course this is hard, but without this I feel like we are left with many things being "plausible". And then do seven "plausible"s sum to make a "likely"? Hard to say! That said, I think this could be a useful intro to arguments for short timelines to people without much familiarity with this discourse.
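To illustrate why it is hard to say (with purely made-up numbers): if "plausible" meant roughly 70% and the seven claims were independent steps that all need to hold, the conjunction would only be about 0.7^7 ≈ 0.08; if instead they are seven semi-independent lines of evidence each nudging the same conclusion upwards, the combined case could easily end up "likely". Which regime we are in is exactly the kind of thing quantitative forecasts would help pin down.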
Good points; I agree with this, and trends 1 and 3 seem especially important to me. As you note, though, the competitive (and safety) reasons for secrecy and research automation probably dominate.
Another thing current trends in AI progress suggest, though, is that it seems (far) less likely that the first AGIs will be brain emulations. This in turn makes it less likely that AIs will be moral patients (I think). Which I am inclined to think is good, at least until we are wise and careful enough to create flourishing digital minds.
Two quibbles:
"Given the amount of money invested in the leading companies, investors are likely to want to take great precautions to prevent the theft of their most valuable ideas." This would be nice, but companies are generally only incentivised to prevent low-resourced actors from stealing their models. Putting in enough effort to make it hard for sophisticated attackers (e.g. governments) to steal the models is a far heavier lift, and probably not something AI companies will do of their own accord. (Possibly you already agree with this though.)
"The power of transformer-based LLMs was discovered collectively by a number of researchers working at different companies." I thought it was just Google researchers who invented the Transformer? It is a bit surprising that they published it; I suppose they just didn't realise how transformative it would be, and there was a culture of openness in the AI research community.
My sense is that, of the many EAs who have taken EtG jobs, quite a few have remained fairly value-aligned? I don't have any data on this and am just going on vibes, but I would guess significantly more than 10%. That is some reason to think the same would be the case for AI companies. Though plausibly the finance company's values are only orthogonal to EA, while the AI company's values (or at least plans) might be more directly opposed.
The comment that Ajeya is replying to is this one from Ryan, who says his timelines are roughly the geometric mean of Ajeya's and Daniel's original views in the post. That is sqrt(4*13) ≈ 7.2 years from the time of the post, so roughly 6 years from now.
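Spelling out that geometric-mean arithmetic (nothing beyond what is already stated above):

```python
import math

# Geometric mean of the two original median timelines (4 and 13 years from the post).
geo_mean = math.sqrt(4 * 13)
print(round(geo_mean, 1))  # 7.2 years from the time of the post
```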
As Josh says, the timelines in the original post were answering the question "Median Estimate for when 99% of currently fully remote jobs will be automatable".
So I think it was a fair summary of Ajeya's comment.
There is some discussion of strategy 4 on LW at the moment: https://www.lesswrong.com/posts/JotRZdWyAGnhjRAHt/tail-sp-500-call-options
I think there are a lot of thorny definitional issues here that mean this set of issues doesn't boil down that nicely to a 1D spectrum. But overall, extinction prevention will likely have a far broader coalition supporting it, while making the future large and amazing is far less popular, since most people aren't very ambitious about spreading flourishing through the universe (though I tentatively am).