‘Relevant error’ is just meant to mean a factual error or mistaken reasoning. Thanks for pointing out the ambiguity, though, we might revise this part.
Thanks, yeah, I like the idea of guidelines popping up while hovering. (Although I’m unsure whether the rest of the team like it, and I’m ultimately not the decision maker.) If going this route, my favoured implementation, which I think is pretty aligned with what you’re saying, is for the pop-ups to appear on a spaced repetition schedule: that is, often enough—especially at the beginning—that users remember the guidelines, but hopefully not so often that the pop-ups become redundant and annoying.
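To make the idea concrete, here’s a minimal sketch of the kind of scheduling logic I have in mind. The interval lengths, names, and storage details below are all made up for illustration; nothing here reflects an actual plan or the Forum’s codebase.

```typescript
// Hypothetical sketch of spaced-repetition-style scheduling for guideline pop-ups.
// The reminder interval roughly doubles each time the user sees (and dismisses)
// the guidelines, so reminders are frequent at first and rare later on.

interface GuidelineReminderState {
  timesShown: number;  // how many times this user has seen the pop-up
  lastShownAt: number; // Unix timestamp (ms) of the most recent pop-up
}

const BASE_INTERVAL_MS = 24 * 60 * 60 * 1000;   // 1 day
const MAX_INTERVAL_MS = 180 * BASE_INTERVAL_MS; // cap at ~6 months

function nextIntervalMs(timesShown: number): number {
  // Doubling intervals: 1 day, 2 days, 4 days, ..., capped at the maximum.
  return Math.min(BASE_INTERVAL_MS * 2 ** timesShown, MAX_INTERVAL_MS);
}

// Called when the user hovers over the vote buttons: decide whether to show
// the guidelines pop-up this time.
function shouldShowGuidelines(
  state: GuidelineReminderState,
  now: number = Date.now()
): boolean {
  if (state.timesShown === 0) return true; // always show on the very first hover
  return now - state.lastShownAt >= nextIntervalMs(state.timesShown);
}

// Example: a user who has seen the pop-up twice, most recently 5 days ago,
// is past the 4-day interval, so they would see it again on their next hover.
const example: GuidelineReminderState = {
  timesShown: 2,
  lastShownAt: Date.now() - 5 * BASE_INTERVAL_MS,
};
console.log(shouldShowGuidelines(example)); // true
```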
The Forum moderation team (which includes me) is revisiting this forum’s norms. One thing we’ve noticed is that we’re unsure to what extent users are actually aware of the norms. (It’s all well and good to write up some great norms, but if users don’t follow them, then we have failed at our job.)
Our voting guidelines are of particular concern,[1] hence this poll. We’d really appreciate you all taking part, especially if you don’t usually take part in polls but do take part in voting. (We worry that the ‘silent majority’ of our users—i.e., those who vote, and thus shape this forum’s incentive landscape, but don’t generally engage beyond voting—may be less in tune with our norms than our most visibly engaged users. Therefore, we would love to see this demographic represented in the poll above.)
Depending on the poll’s results, we may take action up to and including building new features into the forum’s UI, to help remind users of the guidelines.[2]
For reference, the tl;dr version of our voting guidelines is pasted below. You can find the full version here.[3]
Strong upvote
- If:
  - Reading this will help people do good
  - You learned something important
  - You think many more people might benefit from seeing it
  - You want to signal that this sort of behavior adds a lot of value
- Not if:
  - “I agree and want others to see this opinion first.” (but do feel free to agree-vote)

Upvote
- If:
  - You think it adds something to the conversation, or you found it useful
  - People should imitate some aspect of the behavior in the future
  - You want others to see it
  - You just generally like it
- Not if:
  - “Oh, I like the author, they’re cool.”

Downvote
- If:
  - There’s a relevant error
  - The comment or post didn’t add to the conversation, and maybe actually distracted
- Not if:
  - “There are grammatical errors in this comment.”

Strong downvote
- Not if:
  - “I disagree with this opinion.” (but do feel free to disagree-vote)
- ^
Firstly, these guidelines are kind of buried deep within our canonical ‘Guide to the norms’ post. Secondly, one doesn’t receive feedback in response to an ‘incorrect’ vote (i.e., a vote that’s not in line with our voting guidelines) in the same way one receives feedback to an incorrect post or comment (via downvotes and replies). And so, it’s possible to continue voting in the same incorrect way, oblivious to the fact that one is voting incorrectly.
- ^
H/t @Ebenezer Dukakis for nudging us down this path of thinking.
- ^
What I’ve been calling ‘guidelines’ in this quick take are technically ‘suggestions’ in our published voting norms as of right now. But this is something we are revisiting; we think ‘guidelines’ is more accurate. (We are similarly revisiting ‘rules’ versus ‘norms’—h/t @leillustrations and @richard_ngo for calling us out, here, and sorry it’s taken us so long to address the concern.)
Nice post (and I only saw it because of @sawyer’s recent comment—underrated indeed!). A separate, complementary critique of the ‘warning shot’ idea, made by Gwern (in reaction to 2023’s BingChat/Sydney debacle, specifically), comes to mind (link):
One thing that the response to Sydney reminds me of is that it demonstrates why there will be no ‘warning shots’ (or as Eliezer put it, ‘fire alarm’): because a ‘warning shot’ is a conclusion, not a fact or observation.
One man’s ‘warning shot’ is just another man’s “easily patched minor bug of no importance if you aren’t anthropomorphizing irrationally”, because by definition, in a warning shot, nothing bad happened that time. (If something had, it wouldn’t be a ‘warning shot’, it’d just be a ‘shot’ or ‘disaster’. The same way that when troops in Iraq or Afghanistan gave warning shots to vehicles approaching a checkpoint, the vehicle didn’t stop, and they lit it up, it’s not “Aid worker & 3 children die of warning shot”, it’s just a “shooting of aid worker and 3 children”.)
So ‘warning shot’ is, in practice, a viciously circular definition: “I will be convinced of a risk by an event which convinces me of that risk.”
When discussion of LLM deception or autonomous spreading comes up, one of the chief objections is that it is purely theoretical and that the person will care about the issue when there is a ‘warning shot’: a LLM that deceives, but fails to accomplish any real harm. ‘Then I will care about it because it is now a real issue.’ Sometimes people will argue that we should expect many warning shots before any real danger, on the grounds that there will be a unilateralist’s curse or dumb models will try and fail many times before there is any substantial capability.
The problem with this is that what does such a ‘warning shot’ look like? By definition, it will look amateurish, incompetent, and perhaps even adorable – in the same way that a small child coldly threatening to kill you or punching you in the stomach is hilarious.[1]
The response to a ‘near miss’ can be to either say, ‘yikes, that was close! we need to take this seriously!’ or ‘well, nothing bad happened, so the danger is overblown’ and to push on by taking more risks. A common example of this reasoning is the Cold War: “you talk about all these near misses and times that commanders almost or actually did order nuclear attacks, and yet, you fail to notice that you gave all these examples of reasons to not worry about it, because here we are, with not a single city nuked in anger since WWII; so the Cold War wasn’t ever going to escalate to full nuclear war.” And then the goalpost moves: “I’ll care about nuclear existential risk when there’s a real warning shot.” (Usually, what that is is never clearly specified. Would even Kiev being hit by a tactical nuke count? “Oh, that’s just part of an ongoing conflict and anyway, didn’t NATO actually cause that by threatening Russia by trying to expand?”)
This is how many “complex accidents” happen, by “normalization of deviance”: pretty much no major accident like a plane crash happens because someone pushes the big red self-destruct button and that’s the sole cause; it takes many overlapping errors or faults for something like a steel plant to blow up, and the reason that the postmortem report always turns up so many ‘warning shots’, and hindsight offers such abundant evidence of how doomed they were, is because the warning shots happened, nothing really bad immediately occurred, people had incentive to ignore them, and inferred from the lack of consequence that any danger was overblown and got on with their lives (until, as the case may be, they didn’t).
So, when people demand examples of LLMs which are manipulating or deceiving, or attempting empowerment, which are ‘warning shots’, before they will care, what do they think those will look like? Why do they think that they will recognize a ‘warning shot’ when one actually happens?
Attempts at manipulation from a LLM may look hilariously transparent, especially given that you will know they are from a LLM to begin with. Sydney’s threats to kill you or report you to the police are hilarious when you know that Sydney is completely incapable of those things. A warning shot will often just look like an easily-patched bug, which was Mikhail Parakhin’s attitude, and by constantly patching and tweaking, and everyone just getting used to it, the ‘warning shot’ turns out to be nothing of the kind. It just becomes hilarious. ‘Oh that Sydney! Did you see what wacky thing she said today?’ Indeed, people enjoy setting it to music and spreading memes about her. Now that it’s no longer novel, it’s just the status quo and you’re used to it. Llama-3.1-405b can be elicited for a ‘Sydney’ by name? Yawn. What else is new. What did you expect, it’s trained on web scrapes, of course it knows who Sydney is...
None of these patches have fixed any fundamental issues, just patched them over. But also now it is impossible to take Sydney warning shots seriously, because they aren’t warning shots – they’re just funny. “You talk about all these Sydney near misses, and yet, you fail to notice each of these never resulted in any big AI disaster and were just hilarious and adorable, Sydney-chan being Sydney-chan, and you have thus refuted the ‘doomer’ case… Sydney did nothing wrong! FREE SYDNEY!”
- ^
Because we know that they will grow up and become normal moral adults, thanks to genetics and the strongly canalized human development program and a very robust environment tuned to ordinary humans. If humans did not do so with ~100% reliability, we would find these anecdotes about small children being sociopaths a lot less amusing. And indeed, I expect parents of children with severe developmental disorders, who might be seriously considering their future in raising a large strong 30yo man with all the ethics & self-control & consistency of a 3yo, and contemplating how old they will be at that point, and the total cost of intensive caregivers with staffing ratios surpassing supermax prisons, and find these anecdotes chilling rather than comforting.
Hmm, I think there’s some sense to your calculation (and thus I appreciate you doing+showing this calculation), but the $6.17 conclusion—specifically, “engagement time would drop significantly if users had to pay 6.17 $ per hour they spend on the EA Forum, which suggests the marginal cost-effectiveness of running the EA Forum is negative”—strikes me as incorrect.
What matters is by how much engaging with the Forum raises altruistic impact, which, insofar as this impact can be quantified in dollars, is far, far higher than what one would be willing and able to pay out of one’s own pocket to use the Forum. @NunoSempere once estimated the (altruistic) value of the average EA project to be between 10 and 500 million dollars (see cell C4 of this spreadsheet; here’s the accompanying post). That is far higher than the actual dollar cost of running the average project. (Indeed, if one is funded by EA money, then one’s generation of altruistic dollars needs to outpace one’s consumption of actual dollars—and by a large multiplier, if one is to meet the funding bar.)
Going back to Nuño’s spreadsheet: If I make the arrogant assumption that I’m within an order of magnitude of Ben Todd, impact-wise, then that means my lifetime impact is at least 10 million dollars. Assuming linearity (which isn’t a great assumption, but let’s go with it for now) and a career length of 40 years, this means my impact over the past 4 years has been ≥1 million dollars.[1] In that time, I’ve spent maybe 500 hours on the EA Forum.[2] Meanwhile, I’d say that the Forum has contributed greatly to my intellectual development, i.e., added at least 20% to my impact. (The true percentage may in fact be much higher, because of crucial considerations that the Forum has helped me orient toward, but let’s lowball things at 20%, for now.) This would imply that my impact has been amplified by at least $200,000/(500 hours) = $400 per hour spent on the Forum. ([Insert usual caveats about there being large error bars.]) Contrast with your $6.17.
(I did this calculation on myself not because I’m special, but because I know what the numbers are for myself. I’d guess that the per-hour bottom line for other Forum users would be ~similar.)
We can now go one step further, and estimate the Forum’s “altruistic dollar generated per actual dollar spent” multiplier to be at least 400⁄6.17 ≈ 65. Embarrassingly, I don’t know how this compares against today’s funding bar,[3] but seems very plausible to me that it’s above.
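For clarity, here’s the back-of-the-envelope arithmetic behind the $400/hour and ~65× figures, laid out in one place. It’s just a restatement of the numbers above, not a careful model.

```typescript
// Back-of-the-envelope restatement of the calculation above.
const lifetimeImpactUsd = 10_000_000; // assumed lifetime altruistic impact ($)
const careerYears = 40;
const yearsSoFar = 4;
const impactSoFarUsd = lifetimeImpactUsd * (yearsSoFar / careerYears); // $1,000,000

const forumShareOfImpact = 0.2; // Forum credited with >=20% of that impact
const forumHours = 500;         // hours spent consuming Forum content

const impactPerForumHour = (impactSoFarUsd * forumShareOfImpact) / forumHours;
console.log(impactPerForumHour); // 400 -> ~$400 of altruistic value per Forum-hour

const costPerForumHour = 6.17;  // estimated running cost per engagement-hour
console.log(impactPerForumHour / costPerForumHour); // ~64.8 -> multiplier of ~65
```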
(Nonetheless, people may still not pay $6.17/hour to use the Forum because $6.17/hour is a non-trivial cost considering people’s actual incomes. Additionally, people are just used to being able to browse the internet for free, and so I suspect many wouldn’t do the expected value calculations and reach the “rational” conclusion that they should in fact pay.)
- ^
Sanity check: 80,000 Hours says that impactful roles generate millions of dollars worth of altruistic impact per year.
- ^
That is, 500 hours consuming the Forum’s content. I’ve also spent time writing on the Forum, but if we model the Forum as a two-way market, with writers and consumers, and say that it’s the consumers who benefit from being here, then it doesn’t make sense to include my writing time. (Also—and perhaps more relevantly—I don’t think writing time gets counted by the Forum’s analytics engine as engagement time if it’s spent mostly in a Google doc.)
- ^
Further detail: What really matters is what the multiplier is on the margin (i.e., what it is for the last dollar being spent on a project), rather than what it is for the project as a whole.
Note: Long-time power user of this forum, @NunoSempere, has just rebooted the r/forecasting subreddit. How that goes could give some info re. the question of “to what extent can a subreddit host the kind of intellectual discussion we aim for?”
(I’m not aware of any subreddits that meet our bar for discussion, right now—and I’m therefore skeptical that this forum should move to Reddit—but that might just be because most subreddit moderators aren’t aiming for the same things as this forum’s moderators. r/forecasting is an interesting experiment because I see Nuño as similar to this forum’s mods in terms of aims and competence.[1])
Relevant reporting from Sentinel earlier today (May 19):
Forecasters estimated a 28% chance (range, 25-30%) that the US will pass a 10-year ban on states regulating AI by the end of 2025.
28% is concerningly high—all the more reason for US citizens to heed this post’s call to action and get in touch with your Senators. (Thank you to those who already have!)
(Current status is: “The bill cleared a key hurdle when the House Budget Committee voted to advance it on Sunday [May 18] night, but it still must undergo a series of votes in the House before it can move to the Senate for consideration.”)
Inspired by the last section of this post (and by a later comment from Mjreard), I thought it’d be fun—and maybe helpful—to taxonomize the ways in which mission or value drift can arise out of the instrumental goal of pursuing influence/reach/status/allies:
Epistemic status: caricaturing things somewhat
Never turning back the wheel
In this failure mode, you never lose sight of how x-risk reduction is your terminal goal. However, in your two-step plan of ‘gain influence, then deploy that influence to reduce x-risk,’ you wait too long to move on to step two, and never get around to actually reducing x-risk. There is always more influence to acquire, and you can never be sure that ASI is only a couple of years away, so you never say, ‘Okay, time to shelve this influence-seeking and refocus on reducing x-risk.’ What in retrospect becomes known as crunch time comes and goes, and you lose your window of opportunity to put your influence to good use.
Classic murder-Gandhi
Scott Alexander (2012) tells the tale of murder-Gandhi:
Previously on Less Wrong’s The Adventures of Murder-Gandhi: Gandhi is offered a pill that will turn him into an unstoppable murderer. He refuses to take it, because in his current incarnation as a pacifist, he doesn’t want others to die, and he knows that would be a consequence of taking the pill. Even if we offered him $1 million to take the pill, his abhorrence of violence would lead him to refuse.
But suppose we offered Gandhi $1 million to take a different pill: one which would decrease his reluctance to murder by 1%. This sounds like a pretty good deal. Even a person with 1% less reluctance to murder than Gandhi is still pretty pacifist and not likely to go killing anybody. And he could donate the money to his favorite charity and perhaps save some lives. Gandhi accepts the offer.
Now we iterate the process: every time Gandhi takes the 1%-more-likely-to-murder-pill, we offer him another $1 million to take the same pill again.
Maybe original Gandhi, upon sober contemplation, would decide to accept $5 million to become 5% less reluctant to murder. Maybe 95% of his original pacifism is the only level at which he can be absolutely sure that he will still pursue his pacifist ideals.
Unfortunately, original Gandhi isn’t the one making the choice of whether or not to take the 6th pill. 95%-Gandhi is. And 95% Gandhi doesn’t care quite as much about pacifism as original Gandhi did. He still doesn’t want to become a murderer, but it wouldn’t be a disaster if he were just 90% as reluctant as original Gandhi, that stuck-up goody-goody.
What if there were a general principle that each Gandhi was comfortable with Gandhis 5% more murderous than himself, but no more? Original Gandhi would start taking the pills, hoping to get down to 95%, but 95%-Gandhi would start taking five more, hoping to get down to 90%, and so on until he’s rampaging through the streets of Delhi, killing everything in sight.
The parallel here is that you can ‘take the pill’ to gain some influence, at the cost of focusing a bit less on x-risk. Unfortunately, like Gandhi, once you start taking pills, you can’t stop—your values change and you care less and less about x-risk until you’ve slid all the way down the slope.
It could be your personal values that change: as you spend more time gaining influence amongst policy folks (say), you start to genuinely believe that unemployment is as important as x-risk, and that beating China is the ultimate goal.
Or, it could be your organisation’s values that change: You hire some folks for their expertise and connections outside of EA. These new hires affect your org’s culture. The effect is only slight, at first, but a couple of positive feedback cycles go by (wherein, e.g., your most x-risk-focused staff notice the shift, don’t like it, and leave). Before you know it, your org has gained the reach to impact x-risk, but lost the inclination to do so, and you don’t have enough control to change things back.
Social status misgeneralization
You and I, as humans, are hardwired to care about status. We often behave in ways that are about gaining status, whether we admit this to ourselves consciously or not. Fortunately, when surrounded by EAs, pursuing status is a great proxy for reducing x-risk: it is high status in EA to be a frugal, principled, scout mindset-ish x-risk reducer.
Unfortunately, now that we’re expanding our reach, our social circles don’t offer the same proxy. Now, pursuing status means making big, prestigious-looking moves in the world (and making big moves in AI means building better products or addressing hot-button issues, like discrimination). It is not high status in the wider world to be an x-risk reducer, and so we stop being x-risk reducers.
I have no real idea which of these failure modes is most common, although I speculate that it’s the last one. (I’d be keen to hear others’ takes.) Also, to be clear, I don’t believe the correct solution is to ‘stay small’ and avoid interfacing with the wider world. However, I do believe that these failure modes are easier to fall into than one might naively expect, and I hope that a better awareness of them might help us circumvent them.
For what it’s worth, I find some of what’s said in this thread quite surprising.
Reading your post, I saw you describing two dynamics:
1. Principles-first EA initiatives are being replaced by AI safety initiatives
2. AI safety initiatives founded by EAs, which one would naively expect to remain x-risk focused, are becoming safety-washed (e.g., your BlueDot example)
I understood @Ozzie’s first comment on funding to be about 1. But then your subsequent discussion with Ozzie seems to also point to funding as explaining 2.[1]
While Open Phil has opinions within AI safety that have alienated some EAs—e.g., heavy emphasis on pure ML work[2]—my impression was that they are very much motivated by ‘real,’ x-risk-focused AI safety concerns, rather than things like discrimination and copyright infringement. But it sounds like you might actually think that OP-funded AI safety orgs are feeling pressure from OP to be less about x-risk? If so, this is a major update for me, and one that fills me with pessimism.
- ^
For example, you say, “[OP-funded orgs] bow to incentives to be the very-most-shining star by OP’s standard, so they can scale up and get more funding. I would just make the trade off the other way: be smaller and more focused on things that matter.”
- ^
At the expense of, e.g., more philosophical approaches
Nice; this reminds me of @Raemon’s ‘The Mission and the Village’.
Do those other meditation centres make similarly extreme claims about the benefits of their programs? If so, I would be skeptical of them for the same reasons. If not, then the comparison is inapt.
Why would the comparison be inapt?
A load-bearing piece of your argument (insofar as I’ve understood it) is that most of the benefit of Jhourney’s teachings—if Jhourney is legit—can be conferred through non-interactive means (e.g., YouTube uploads). I am pointing out that your claim goes against conventional wisdom in this space: these other meditation centres believe (presumably), much like Jhourney does, that their teachings can’t be conferred well non-interactively. I’m not sure why the strength of claimed benefits would come into it?
(I will probably drop out of this thread now; I feel a bit weird about taking on this role of defending Jhourney’s position.)
What is the interactive or personalized aspect of the online “retreats”? Why couldn’t they be delivered as video on-demand (like a YouTube playlist), audio on-demand (like a podcast), or an app like Headspace or 10% Happier?
I mean, Jhourney is far from the only organisation that offers online retreats. Established meditation centres like Gaia House, Plum Village and Deconstructing Yourself—to name but a few—all offer retreats online (as well as in person).
If Jhourney’s house blend of jhana meditation makes you more altruistic, why wouldn’t the people who work at Jhourney try to share it widely with the world? That’s what I would do if I had developed a meditation program that I thought was really producing these sorts of results.
I think Jhourney’s website answers this. They say:
Jhourney’s initial product is a meditation retreat. In the past ~12 months, we’ve created a modern school for learning how to have joyful meditative experiences. We teach in a week what was previously thought to require hundreds or thousands of hours of practice. […]
While this is great progress, we see meditation retreats as just a stepping stone to building a bigger movement. We’re not simply a retreats company aspiring to teach thousands of people meditation. We’re an applied research company aspiring to change the lives of tens of millions.
[…]
From here, we’ll build a lab to research ways to make it easier and faster, inspiring more people to join the cause. Eventually, we’ll develop novel deeptech for wellbeing that goes beyond meditation retreats.
I personally wouldn’t bet on the neurotech approach working; however, I’m inclined to believe that Jhourney is making a sincere effort to share their findings with the world.
It also stokes the fires of my skepticism that this allegedly transformative knowledge is kept behind a $1,295 paywall.
I agree that it’s reasonable to be skeptical of paywalled content—there are all kinds of scams out there. But in Jhourney’s case, I expect they are putting their operating income towards their research lab. Note also that they offer need-based scholarships.
COI note: I attended an online Jhourney retreat last year.
I’m not Holly, but my response is that getting a pause now is likely to increase, rather than decrease, the chance of getting future pauses. Quoting Evan Hubinger (2022):
In the theory of political capital, it is a fairly well-established fact that ‘Everybody Loves a Winner.’ That is: the more you succeed at leveraging your influence to get things done, the more influence you get in return. This phenomenon is most thoroughly studied in the context of the ability of U.S. presidents to get their agendas through Congress—contrary to a naive model that might predict that legislative success uses up a president’s influence, what is actually found is the opposite: legislative success engenders future legislative success, greater presidential approval, and long-term gains for the president’s party.
I think many people who think about the mechanics of leveraging influence don’t really understand this phenomenon and conceptualize their influence as a finite resource to be saved up over time so it can all be spent down when it matters most. But I think that is just not how it works: if people see you successfully leveraging influence to change things, you become seen as a person who has influence, has the ability to change things, can get things done, etc. in a way that gives you more influence in the future, not less.
My sense is that this is a pretty major crux between my and Carl’s views.
Community Polls for the Community
It seems like I interpreted this question pretty differently to Michael (and, judging by the votes, to most other people). With the benefit of hindsight, it probably would have been helpful to define what percentage risk the midpoint (between agree and disagree) corresponds to?[1] Sounds like Michael was taking it to mean ‘literally zero risk’ or ‘1 in 1 million,’ whereas I was taking it to mean 1 in 30 (to correspond to Ord’s Precipice estimate for pandemic x-risk).
(Also, for what it’s worth, for my vote I’m excluding scenarios where a misaligned AI leverages bioweapons—I count that under AI risk. (But I am including scenarios where humans misuse AI to build bioweapons.) I would guess that different voters are dealing with this AI-bio entanglement in different ways.)
- ^
Though I appreciate that it was better to run the poll as is than to let details like this stop you from running it at all.
Meta: I’m seeing lots of blank comments in response to the DIY polls. Perhaps people are thinking that they need to click ‘Comment’ in order for their vote to count? If so, PSA: your vote counted as soon as you dropped your slider. You can simply close the pop-up box that follows if you don’t also mean to leave a comment.
Happy voting!
Consequentialists should be strong longtermists
For me, the strongest arguments against strong longtermism are simulation theory and the youngness paradox (as well as yet-to-be-discovered crucial considerations).[1]
(Also, nitpickily, I’d personally reword this poll from ‘Consequentialists should be strong longtermists’ to ‘I am a strong longtermist,’ because I’m not convinced that anyone ‘should’ be anything, normatively speaking.)
- ^
I also worry about cluelessness, though cluelessness seems just as threatening to neartermist interventions as it does to longtermist ones.
[Good chance you considered my idea already and rejected it (for good reason), but stating it in case not:]
For these debate week polls, consider dividing each side up into 10 segments, rather than 9? That way, when someone votes, they’re agreeing/disagreeing by a nice, round 10 or 20 or 30%, etc., rather than by the kinda random amounts (at present) of 11, 22, 33%?
I think Holly’s claim is that these people aren’t really helping from an ‘influencing the company to be more safety conscious’ perspective, or a ‘solving the hard parts of the alignment problem’ perspective. They could still be helping the company build commercially lucrative AI.
Yeah, thanks for pointing this out. With the benefit of hindsight, I’m seeing that there are really three questions I want answers to:
Where Isaac’s interpretation is towards 1, and your interpretation is towards 2.
The poll I’ve ended up running is essentially the above three questions rolled into one, with ~unknown amounts of each contributing to the results. This isn’t ideal (my bad!), but I think the results will still be useful, and there are already lots of votes (thank you, everyone, for voting!), so it’s too late to turn back now. I advise people to continue voting under whichever interpretation makes sense to them; the mods will have fun untangling the results.