my fave is @Duncan Sabien’s ‘Deadly by Default’
Long-term risks from ideological fanaticism
Good post, and I strongly agree. My preferred handle for what you’re pointing at is ‘integrity’. Quoting @Habryka (2019):
I think of integrity as a more advanced form of honesty […] Where honesty is the commitment to not speak direct falsehoods, integrity is the commitment to speak truths that actually ring true to yourself, not ones that are just abstractly defensible to other people. It is also a commitment to act on the truths that you do believe, and to communicate to others what your true beliefs are.
(In this frame, What We Owe the Future, for example, was honest but low integrity.)
There was some discussion of this a few months ago: see here.
Although, maybe your main point—which I agree the existing discussion doesn’t really have answers to—is, “How, if at all, should we be getting ahead of things and proactively setting a framing for the social media conversation that will surely follow (as opposed to just forming some hypotheses over what that conversation will be like, but not particularly doing anything yet)? Who within our community should lead these efforts? How high priority is this compared to other forms of improving the EA brand?[1]”
- ^
Fwiw, I personally mostly disagree with efforts to improve EA’s brand/reputation. I endorse embodying our principles and not focusing too much on how that is perceived; IMO, this is an indirect/long-game way of ~optimising for reputation. I’m generally suspicious of actors who try to control their reputations. (Will MacAskill said something similar here.)
[sorry I’m late to this thread]
@William_MacAskill, I’m curious which (if any) of the following is your position?
1.
“I agree with Wei that an approach of ‘point AI towards these problems’ and ‘listen to the results the AIs produce’ has a real (>10%? >50%?) chance of ending in moral catastrophe (because ‘aligned’ AIs will end up (unintentionally) corrupting human values or otherwise leading us to incorrect conclusions).
And if we were living in a sane world, then we’d pause AI development for decades, alongside probably engaging in human intelligence enhancement, in order to solve the deep metaethical and metaphilosophical problems at play here. However, our world isn’t sane, and an AI pause isn’t in the cards: the best we can do is to differentially advance AIs’ philosophical competence,[1] and hope that that’s enough to avoid said catastrophe.”
2.
“I don’t buy the argument that aligned AIs can unintentionally corrupt human values. Furthermore, I’m decently confident that my preferred metaethical theory (e.g., idealising subjectivism) is correct. If intent alignment goes well, then I expect a fairly simple procedure like ‘give everyone a slice of the light cone, within which they can do anything they want (modulo some obvious caveats), and facilitate moral trade’ will result in a near-best future.”
3.
“Maybe aligned AIs can unintentionally corrupt human values, but I don’t particularly think this matters since it won’t be average humans making the important decisions. My proposal is that we fully hand off questions re. what to do with the light cone to AIs (rather than have these AIs boost/amplify humans). And I don’t buy that there is a metaphilosophical problem here: If we can train AIs to be at least as good as the best human philosophers at the currently in-distribution ethical+philosophical problems, then I see no reason to think that these AIs will misgeneralise out of distribution any more than humans would. (There’s nothing special about the conclusions human philosophers would reach, and so even if the AIs reach different conclusions, I don’t see that as a problem. Indeed, if anything, the humans are more likely to make random mistakes, thus I’d trust the AIs’ conclusions more.)
(And then, practically, the AIs are much faster than humans, so they will make much more progress over the crucial crunch time months. Moreover, above we were comparing AIs to the best human philosophers / to a well-organised long reflection, but the actual humans calling the shots are far below that bar. For instance, I’d say that today’s Claude has better philosophical reasoning and better starting values than the US president, or Elon Musk, or the general public. All in all, best to hand off philosophical thinking to AIs.)”
@Wei Dai, I understand that your plan A is an AI pause (+ human intelligence enhancement), and I agree with you that this is the best course of action. Nonetheless, I’m interested in what you see as plan B: If we don’t get an AI pause, is there any version of ‘hand off these problems to AIs’ / ‘let ’er rip’ that you feel optimistic about? Or one which you at least think will result in lower p(catastrophe) than other versions? If you have $1B to spend on AI labour during crunch time, what do you get the AIs to work on?
(I’m particularly interested in your plan B re. solving (meta)philosophy, since I’m exploring starting a grantmaking programme in this area. Although, I’m also interested if your answer goes in another direction.)
Possible cruxes:
Human-AI safety problems are >25% likely to be real and important
Giving the average person god-like powers (via an intent-aligned ASI) to reshape the universe and themselves is >25% likely to result in the universe becoming optimised for more-or-less random values—which isn’t too dissimilar to misaligned AI takeover
If we attempt to idealise a human-led reflection and hand it off to AIs, then the outcome will be at least as good as a human-led reflection (under various plausible (meta)ethical frameworks, including ones in which getting what humans-in-particular would reflectively endorse is important)
Sufficient (but not necessary) condition: advanced AIs can just perfectly simulate ideal human deliberation
‘Default’ likelihood of getting an AI pause
Tractability of pushing for an AI pause, including/specifically through trying to legibilise currently-illegible problems
Items that I don’t think should be cruxes for the present discussion, but which might be causing us to talk past each other:
In practice, human-led reflection might be kinda rushed and very far away from an ideal long reflection
For the most important decisions happening in crunch time, either it’s ASIs making the decisions, or non-reflective and not-very-smart humans
Political leaders often make bad decisions, and this is likely to get worse when the issues become more complicated (if they’re not leveraging advanced AI)
An advanced AI could be much better than any current political leader along all the character traits we’d want in a political leader (e.g., honesty, non-self-interest, policymaking capability)
- ^
“For instance, via my currently-unpublished ‘AI for philosophical progress’ and ‘Guarding against mind viruses’ proposals.”
Nice!
One quibble: IMO, the most important argument within ‘economic dominance,’ which doesn’t appear in your list (nor really in the body of your text), is Wei Dai’s ‘AGI will drastically increase economies of scale’.
Mod here. It looks like this thread has devolved into a personal dispute with only tangential relevance to EA. I’m therefore locking the thread.
Those involved, please don’t try to resurrect the dispute elsewhere on this forum; we may issue bans if we see that happening.
Richard’s ‘Coercion is an adaptation to scarcity’ post and follow-up comment talk about this (though ofc maybe there’s more to Richard’s view than what’s discussed there). Select quotes:
What if you think, like I do, that we live at the hinge of history, and our actions could have major effects on the far future—and in particular that there’s a significant possibility of existential risk from AGI? I agree that this puts us in more of a position of scarcity and danger than we otherwise would be (although I disagree with those who have very high credence in catastrophe). But the more complex the problems we face, the more counterproductive scarcity mindset is. In particular, AGI safety requires creative paradigm-shifting research, and large-scale coordination; those are both hard to achieve from a scarcity mindset. In other words, coercion at a psychological or community level has strongly diminishing marginal returns when dealing with scarcity at a civilizational level.
…
AI is a danger on a civilizational level; but the best way to deal with danger on a civilizational level is via cultivating abundance at the level of your own community, since that’s the only way it’ll be able to make a difference at that higher level.
FYI readers, here is Habryka’s response to this post over on LessWrong, if you haven’t seen it.
Relatedly, @MichaelDickens shallow-reviewed Horizon just under a year ago—see here.[1] Tl;dr: Michael finds that Horizon’s work isn’t very relevant to x-risk reduction; Michael believes Horizon is net-negative for the world (credence: 55%).
(On the other hand, it was Eth, Perez and Greenblatt—i.e., people whose judgement I respect—who recommended donating to Horizon in that post Mikhail originally commented on. So, I overall feel unsure about what to think.)
Fyi, the Forum team has experimented with LLMs for tagging posts (and for automating some other tasks, like reviewing new users), but so far none have been accurate enough to rely on. Nonetheless, I appreciate your comment, since we weren’t really tracking the transparency/auditing upside of using LLMs.
Beyond the specifics (which Vasco goes into in his reply): These tweets are clearly not serious/principled/good-faith criticisms. If we constantly moderate what we say so as never to give trolls any possible ammunition, then our discourse is forever at the mercy of those most hostile to the idea of doing good better. That’s not a good situation to be in. Far better, I say, to ignore the trolling.
I agree with ‘within dedicated discussions and not on every single animal welfare post,’ and I think Vasco should probably take note, here.
However, I’m not really sure what you mean by reputational risk—whose reputation is at risk?
Generally speaking, I very much want people to be saying what they honestly believe, both on this forum and elsewhere. Vasco honestly believes that soil animal welfare outweighs farmed animal welfare, and he has considered arguments for why he believes this, and so I think it’s valuable for him to say the things he says (so long as he’s not being spammy about it). If people are constantly self-censoring out of fear of reputational risks, and the like, then it’s ~impossible for us to collectively figure out what’s true, and we will thus fail to rectify moral atrocities.
And, like, the core of Vasco’s argument—that if soil animals are conscious, then, given how numerous they are, their total moral weight must be very high—is really quite straightforward. I’m skeptical that readers will go away feeling confused, or thinking that Vasco (and, by extension, animal welfare folks in general?) is crazy, such that they somehow end up caring less about animals?
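To spell out that structure as a toy expected-value sketch (the numbers below are purely illustrative placeholders, not Vasco’s actual estimates):

$$E[\text{total moral weight}] = N \times p_{\text{sentience}} \times w$$

where $N$ is the soil-animal population, $p_{\text{sentience}}$ is the probability that they’re conscious, and $w$ is the per-individual moral weight conditional on consciousness. Even with tiny per-individual values (say, $p_{\text{sentience}} = 10^{-3}$ and $w = 10^{-6}$ human-equivalents), a population of $N = 10^{20}$ comes out to $10^{11}$ human-equivalents in expectation: the population term does almost all of the work.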
Just want to quickly flag that you seem to have far more faith in superforecasters’ long-range predictions than do most people who have worked full-time in forecasting, such as myself.
@MichaelDickens’ ‘Is It So Much to Ask?’ is the best public writeup I’ve seen on this (specifically, on the problems with Metaculus’ and FRI XPT’s x-risk/extinction forecasts, which are cited in the main post above). I also very much agree with:
Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions
Here are some reasons why we might expect longer-term predictions to be more difficult:
No fast feedback loops for long-term questions. You can’t get that many predict/check/improve cycles, because questions many years into the future, tautologically, take many years to resolve. There are shortcuts, like this past-casting app, but they are imperfect.
It’s possible that short-term forecasters might acquire habits and intuitions that are good for forecasting short-term events, but bad for forecasting longer-term outcomes. For example, “things will change more slowly than you think” is a good heuristic to acquire for short-term predictions, but might be a bad heuristic for longer-term predictions, in the same sense that “people overestimate what they can do in a week, but underestimate what they can do in ten years”. This might be particularly insidious to the extent that forecasters acquire intuitions which they can see are useful, but can’t tell where they come from. In general, it seems unclear to what extent short-term forecasting skills would generalize to skill at longer-term predictions.
“Predict no change” in particular might do well, until it doesn’t. Consider a world which has a 2% probability of seeing a worldwide pandemic, or some other large catastrophe. Then on average it will take 50 years for one to occur. But at that point, those predicting a 2% will have a poorer track record compared to those who are predicting a ~0%.
In general, we have been in a period of comparative technological stagnation, and forecasters might be adapted to that, in the same way that e.g., startups adapted to low interest rates.
Sub-sampling artifacts within good short-term forecasters are tricky. For example, my forecasting group Samotsvety is relatively bullish on transformative technological change from AI, whereas the Forecasting Research Institute’s pick of forecasters for their existential risk survey was more bearish.
How much weight should we give to these aggregates?
My personal tier list for how much weight I give to AI x-risk forecasts to the extent I defer:
Individual forecasts from people who seem to generally have great judgment, and have spent a ton of time thinking about AI x-risk forecasting e.g. Cotra, Carlsmith
Samotsvety aggregates presented here
A superforecaster aggregate (I’m biased re: quality of Samotsvety vs. superforecasters, but I’m pretty confident based on personal experience)
Individual forecasts from AI domain experts who seem to generally have great judgment, but haven’t spent a ton of time thinking about AI x-risk forecasting (this is the one I’m most uncertain about, could see anywhere from 2-4)
Everything else I can think of I would give little weight to.[1][2]
Separately, I think you’re wrong about UK AISI not putting much credence on extinction scenarios? I’ve seen job adverts from AISI that talk about loss of control risk (i.e., AI takeover), and I know people working at AISI who—last I spoke to them—put ≫10% on extinction.
- ^
Why do I give little weight to Metaculus’s views on AI? Primarily because of the incentives to make very shallow forecasts on a ton of questions (e.g. probably <20% of Metaculus AI forecasters have done the equivalent work of reading the Carlsmith report), and secondarily that forecasts aren’t aggregated from a select group of high performers but instead from anyone who wants to make an account and predict on that question.
- ^
Why do I give little weight to AI expert surveys such as When Will AI Exceed Human Performance? Evidence from AI Experts? I think most AI experts have incoherent and poor views on this because they don’t think of it as their job to spend time thinking and forecasting about what will happen with very powerful AI, and many don’t have great judgment.
There’s an old (2006) Bostrom paper on ~this topic, as well as Yudkowsky’s ‘Anthropic Trilemma’ (2009) and Wei Dai’s ‘Moral Status of Independent Identical Copies’ (2009). Perhaps you’re remembering one of them?
(Bostrom disagrees with the second paragraph you cite, as far as I can tell. He writes: ‘If a brain is duplicated so that there are two brains in identical states, are there then two numerically distinct phenomenal experiences or only one? There are two, I argue.’)
I don’t know much about nematodes, mites or springtails in particular, but I agree that, when thinking about animal welfare interventions, one should be accounting for effects on wild animals.
(As Vasco says, these effects plausibly reverse the sign of factory farming—especially cattle farming—from negative to positive. I’m personally quite puzzled as to why this isn’t a more prominent conversation/consideration amongst the animal welfare community. (Aside from Vasco’s recent work, has ~any progress been made in the decade since Shulman and Tomasik first talked about the problem? If not, why not? Am I missing something?))
This post did generate a lot of pushback. It has more disagree votes than agree votes, the top comment by karma argues against some of its claims and is heavily upvoted and agree-voted, and it led to multiple response posts including one that reaches the opposite conclusion and got more karma & agree votes than this one.
I agree that this somewhat rebuts what Raemon says. However, I think a large part of Raemon’s point—which your pushback doesn’t address—is that Bentham’s post still received a highly positive karma score (85 when Raemon came upon it).
My sense is that karma shapes the Forum incentive landscape pretty strongly—i.e., authors are incentivised to write the kind of post that they expect will get upvoted. (I remember Lizka[1] mentioning, somewhere, that she/the Forum team found (via user interviews?) that authors tend to care quite a lot about karma.) So, considering how Bentham’s posts are getting upvoted, I kind of expect them to continue writing similar posts with similar reasoning. (Further, I kind of expect others to see Bentham’s writing+reasoning style as a style that ‘works,’ and to copy it.)
The question then becomes: Is this a good outcome? Do we want Forum discourse to look more like this type of post? Is the ‘wisdom of the EA Forum voting crowd’ where we want it to be? (Or, conversely, might there be an undesirable dynamic going on, such as tyranny of the marginal voter?) I have my own takes, here. I invite readers to likewise reflect on these questions, and to perhaps adjust your voting behaviour accordingly.
- ^
our former Forum Khaleesi
[resolved]
Meta: I see that this poll has closed after one day. I think it would make sense for polls like this to stay open for seven days, by default, rather than just one?[1] I imagine this poll would have received another ~hundred votes, and generated further good discussion, had it stayed open for longer (especially since it was highlighted in the Forum Digest just two hours prior).
- ^
I’m unsure if OP meant for this poll to close so soon. Last month, when I ran some polls, I found that a bunch of them ended up closing after the default one day even after I thought I’d set them to stay open for longer.
Yeah, thanks for pointing this out. With the benefit of hindsight, I’m seeing that there are really three questions I want answers to:
1. Have you been voting in line with the guidelines (whether or not you’ve literally read them)?
2a. Have you literally read the guidelines? (In other words, have we succeeded in making you aware of the guidelines’ existence?)
2b. If you have read the guidelines, to what extent can you accurately recall them? (In other words, conditional on you knowing the guidelines exist, to what extent have we succeeded at drilling them into you?)
Where Isaac’s interpretation is towards 1, and your interpretation is towards 2.
The poll I’ve ended up running is essentially the above three questions rolled into one, with ~unknown amounts of each contributing to the results. This isn’t ideal (my bad!), but I think the results will still be useful, and there are already lots of votes (thank you, everyone, for voting!), so it’s too late to turn back now. I advise people to continue voting under whichever interpretation makes sense to them; the mods will have fun untangling the results.
Nod. Plus, another advantage of your ‘consensus ASI’ approach—which is essentially a values handshake—over the deal types outlined by the OP is that a combined US-China bloc presents a unified front if and when third-party alien civilizations are encountered.
(A ‘unified front’ is an advantage if military power, and thus bargaining power, scales superlinearly in the deep future. Which seems >50% likely to me.)