I think your argument could go through if the person being praised were Holden, or Will MacAskill, or some other big name in EA. However, Michael seems pretty under the radar given the size of his contributions, so I don’t think your concerns check out in this case (and in fact this case might even align with your point about recognizing unsung contributors).
+1! I especially appreciate how Michael often writes very detailed responses as part of extended back-and-forth-type comment threads. These contributions aren’t rewarded so much karma-wise, but I think they’re extremely valuable: I personally owe a good deal of what I understand about welfare, moral uncertainty, infinite ethics and theories of consciousness to comments I’ve read by Michael.
Animal welfare getting so little[1] EA funding, at present, relative to global health, seems to be an artefact of Open Phil’s ‘worldview diversification,’ which imo is a lacklustre framework for decision-making, both in theory and (especially) in practice: see, e.g., Sempere (2022).
Cost-effectiveness analyses I’ve seen indicate that animal welfare interventions, like cage-free campaigns, are really excellent uses of money—orders of magnitude more effective than leading global health interventions.
Though not central to my argument, there’s also the meat-eater problem, which I think is under-discussed.
Responding to the bottom part of your second footnote:
To me, it seems pretty important for Forum Team members (especially the interim lead!) to be communicating with Forum users. I therefore think it’s a mistake for you to assign zero value to your posts and comments, relative to your other work.
How much value to assign to one of your posts or comments? I would crudely model this as:
(Size of your post’s or comment’s contribution)/(Size of all Forum contributions in a year) x (Forum’s total value per year)
You’ll have better-informed figures/estimates than I do, but I’d guess that the size of all Forum contributions, measured in karma,[1] in a year, is around 100,000, and that the value of the Forum per year is around $10M.[2] A thoughtful comment might get 10 karma, on average, and a thoughtful post might get 50.
I’d therefore roughly value a comment from you at (10 karma)/(100,000 karma) x $10M = $1000, and a post from you at $5000.
(My model may well be off in some way; I invite readers to improve on it.)
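For anyone who wants to tweak the numbers, here is the same BOTEC as a few lines of Python (the karma and dollar figures are just the rough assumptions above, not data):

```python
# Crude BOTEC: a contribution's value is its share of yearly Forum karma,
# multiplied by the Forum's assumed total yearly value.

TOTAL_FORUM_KARMA_PER_YEAR = 100_000   # assumed size of all Forum contributions in a year, in karma
FORUM_VALUE_PER_YEAR_USD = 10_000_000  # assumed total value of the Forum per year (~$10M)

def contribution_value_usd(karma: float) -> float:
    """Dollar value of a single post or comment, proportional to its karma share."""
    return karma / TOTAL_FORUM_KARMA_PER_YEAR * FORUM_VALUE_PER_YEAR_USD

print(contribution_value_usd(10))  # thoughtful comment -> 1000.0
print(contribution_value_usd(50))  # thoughtful post    -> 5000.0
```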
- ^
I don’t take karma to be a perfect measure of value by any means—see, e.g., ‘Karma overrates some topics’—but I think it’s a reasonable-enough measure for carrying out this BOTEC.
- ^
Why $10M? Mostly I’m working off the value of an ‘EA project’ as estimated in cell C4 of this spreadsheet by Nuño. (This was the accompanying post.)
+1. While I applaud the authors for doing this work at all, and share their hopes regarding automated forecasting, by my lights the opening paragraphs massively overstate their bot’s ability.
The moderators have reviewed the decision to ban @dstudiocode after users appealed the decision. Tl;dr: We are revoking the ban, and are instead rate-limiting dstudiocode and warning them to avoid posting content that could be perceived as advocating for major harm or illegal activities. The rate limit is due to dstudiocode’s pattern of engagement on the Forum, not simply because of their most recent post—for more on this, see the “third consideration” listed below.
More details:
Three moderators,[1] none of whom was involved in the original decision to ban dstudiocode, discussed this case.
The first consideration was “Does the cited norm make sense?” For reference, the norm cited in the original ban decision was “Materials advocating major harm or illegal activities, or materials that may be easily perceived as such” (under “What we discourage (and may delete or edit out)” in our “Guide to norms on the Forum”). The panel of three unanimously agreed that having some kind of Forum norm in this vein makes sense.
The second consideration was “Does the post that triggered the ban actually break the cited norm?” For reference, the post ended with the question “should murdering a meat eater be considered ‘ethical’?” (Since the post was rejected by moderators, users cannot see it.[2] We regret the confusion caused by us not making this point clearer in the original ban message.)
There was disagreement amongst the moderators involved in the appeal process about whether or not the given post breaks the norm cited above. I personally think that the post is acceptable since it does not constitute a call to action. The other two moderators see the post as breaking the norm; they see the fact that it is “just” a philosophical question as not changing the assessment.[3] (Note: The “meat-eater problem” has been discussed elsewhere on the Forum. Unlike the post in question, in the eyes of the given two moderators, these posts did not break the “advocating for major harm or illegal activities” norm because they framed the question as about whether to donate to save the life of a meat-eating person, rather than as about actively murdering people.)
Between the two appeals-panel moderators who see the post as norm-breaking, there was disagreement about whether the correct response would be a temporary ban or just a warning.
The third consideration was around dstudiocode’s other actions and general standing on the Forum. dstudiocode currently sits at −38 karma following 8 posts and 30 comments. This indicates that their contributions to the discourse have generally not been helpful.[4] Accordingly, all three moderators agreed that we should be more willing to (temporarily) ban dstudiocode for a potential norm violation.
dstudiocode has also tried posting very similar, low-quality (by our lights) content multiple times. The post that triggered the ban was similar to, though more “intense” than, this other post of theirs from five months ago. Additionally, they tried posting similar content through an alt account just before their ban. When a Forum team member asked them about their alt, they appeared to lie.[5] All three moderators agreed that this repeated posting of very similar, low-quality content warrants at least a rate limit (i.e., a cap on how much the user in question can post or comment).[6] (For context, eight months ago, dstudiocode published five posts in an eight-day span, all of which were low quality, in our view. We would like to avoid a repeat of that situation: a rate limit or a ban are the tools we could employ to this end.) Lying about their alt also makes us worried that the user is trying to skirt the rules.
Overall, the appeals panel is revoking dstudiocode’s ban, and is replacing the ban with a warning (instructing them to avoid posting content that could be perceived as advocating for major harm or illegal activities) and a rate limit. dstudiocode will be limited to at most one comment every three days and one post per week for the next three weeks—i.e., until their original ban would have ended. Moderators will be keeping an eye on their posting, and will remove their posting rights entirely if they continue to publish content that we consider sufficiently low quality or norm-bending.
We would like to thank @richard_ngo and @Neel Nanda for appealing the original decision, as well as @Jason and @dirk for contributing to the discussion. We apologize that the original ban notice was rushed, and failed to lay out all the factors that went into the decision.[7] (Reasoning along the lines of the “third consideration” given above went into the original decision, but we failed to communicate that.)
If anyone has questions or concerns about how we have handled the appeals process, feel free to comment below or reach out.
- ^
Technically, two moderators and one moderation advisor. (I write “three moderators” in the main text because that makes referring to them, as I do throughout the text, less cumbersome.)
- ^
The three of us discussed whether or not to quote the full version of the post that triggered the ban in this moderator comment, to allow users to see exactly what is being ruled on. By split decision (with me as the dissenting minority), we have decided not to do so: in general, we will probably avoid republishing content that is objectionable enough to get taken down in the first place.
- ^
I’m not certain, but my guess is that the disagreement here is related to the high vs. low decoupling spectrum (where high decouplers, like myself, are fine with entertaining philosophical questions like these, whereas low decouplers tend to see such questions as crossing a line).
- ^
We don’t see karma as a perfect measure of a user’s value by any means, but we do consider a user’s total karma being negative to be a strong signal that something is awry.
Looking through dstudiocode’s post and comment history, I do think that they are trying to engage in good faith (as opposed to being a troll, say). However, the EA Forum exists for a particular purpose, and has particular standards in place to serve that purpose, and this means that the Forum is not necessarily a good place for everyone who is trying to contribute. (For what it’s worth, I feel a missing mood in writing this.)
- ^
In response to our request that they stop publishing similar content from multiple accounts, they said: “Posted from multiple accounts? I feel it is possible that the same post may have been created because maybe the topic is popular?” However, we are >99% confident, based on our usual checks for multiple account use, that the other account that tried to publish this similar content is an alt controlled by them. (They did subsequently stop trying to publish from other accounts.)
- ^
We do not have an official policy on rate limits, at present, although we have used rate limits on occasion. We aim to improve our process here. In short, rate limits may be a more appropriate intervention than bans are for users who aren’t clearly breaking norms, but who are nonetheless posting low-quality content or repeatedly testing the edges of the norms.
- ^
Notwithstanding the notice we published, which was a mistake, I am not sure if the ban decision itself was a mistake. It turns out that different moderators have different views on the post in question, and I think the difference between the original decision to ban and the present decision to instead warn and rate limit can mostly be chalked up to reasonable disagreement between different moderators. (We are choosing to override the original decision since we spent significantly longer on the review, and we therefore have more confidence in the review decision being “correct”. We put substantial effort into the review because established users, in their appeal, made some points that we felt deserved to be taken seriously. However, this level of effort would not be tenable for most “regular” moderation calls—i.e., those involving unestablished or not-in-great-standing users, like dstudiocode—given the tradeoffs we face.)
For operations roles, and focusing on impact (rather than status), I notice that your view contrasts markedly with @abrahamrowe’s in his recent ‘Reflections on a decade of trying to have an impact’ post:
Impact Through Operations
I don’t really think my ops work is particularly impactful, because I think ops staff are relatively easy to hire for compared to other roles. However I have spent a lot of my time in EA doing ops work.
I was RP’s COO for 4 years, overseeing its non-research work (fiscal sponsorship, finance, HR, communications, fundraising, etc), and helping the organization grow from around 10 to over 100 staff within its legal umbrella.
Worked on several advising and consulting projects for animal welfare and AI organizations
I think the advising work is likely the most impactful ops work I’ve done, though I overall don’t know if I think ops is particularly impactful.
I see both Abraham and yourself as strong thinkers with expertise in this area, which makes me curious about the apparent disagreement. Meanwhile, the ‘correct’ answer to the question of an ops role’s impact relative to that of a research role should presumably inform many EAs’ career decisions, which makes the disagreement here pretty consequential. I wonder if getting to the ground truth of the matter is tractable? (I’m not sure how best to operationalize the disagreement / one’s starting point on the matter, but maybe something like “On the current margin, I believe that the ratio of early-career EAs aiming for operations vs. research roles should be [number]:1.”)
(I understand that you and Abraham overlapped for multiple years at the same org—Rethink Priorities—which makes me all the more curious about how you appear to have reached fairly opposite conclusions.)
Thank you for doing this work!
I’ve not yet read the full report—only this post—and so I may well be missing something, but I have to say that I am surprised at Figure E.1:
If I understand correctly, the figure says that experts think extinction is more than twice as likely if there is a warning shot compared to if there is not.
I accept that a warning shot happening probably implies that we are in a world in which AI is more dangerous, which, by itself, implies higher x-risk.[1] On the other hand, a warning shot could galvanize AI leaders, policymakers, the general public, etc., into taking AI x-risk much more seriously, such that the overall effect of a warning shot is to actually reduce x-risk.
I personally think it’s very non-obvious how these two opposing effects weigh up against each other, and so I’m interested in why the experts in this study are so confident that a warning shot increases x-risk. (Perhaps they expect the galvanizing effect will be small? Perhaps they did not consider the galvanizing effect? Perhaps there are other effects they considered that I’m missing?)
- ^
Though I believe the effect here is muddied by ‘treacherous turn’ considerations / the argument that the most dangerous AIs will probably be good at avoiding giving off warning shots.
For what it’s worth, I was reminded of Jessica Taylor’s account of collective debugging and psychoses as I read that part of the transcript. (Rather than trying to quote pieces of Jessica’s account, I think it’s probably best that I just link to the whole thing as well as Scott Alexander’s response.)
‘Five Years After AGI’ Focus Week happening over at Metaculus.
Inspired in part by the EA Forum’s recent debate week, Metaculus is running a “focus week” this week, aimed at trying to make intellectual progress on the issue of “What will the world look like five years after AGI (assuming that humans are not extinct)[1]?”
Leaders of AGI companies, while vocal about some things they anticipate in a post-AGI world (for example, bullishness in AGI making scientific advances), seem deliberately vague about other aspects. For example, power (will AGI companies have a lot of it? all of it?), whether some of the scientific advances might backfire (e.g., a vulnerable world scenario or a race-to-the-bottom digital minds takeoff), and how exactly AGI will be used for “the benefit of all.”
Forecasting questions for the week range from “Percentage living in poverty?” to “Nuclear deterrence undermined?” to “‘Long reflection’ underway?”
Those interested: head over here. You can participate by:
Forecasting
Commenting
Writing questions
There may well be some gaps in the admin-created question set.[4] We welcome question contributions from users.
The focus week will likely be followed by an essay contest, since a large part of the value in this initiative, we believe, lies in generating concrete stories for how the future might play out (and for what the inflection points might be). More details to come.[5]
- ^
This is not to say that we firmly believe extinction won’t happen. I personally put p(doom) at around 60%. At the same time, however, as I have previously written, I believe that more important trajectory changes lie ahead if humanity does manage to avoid extinction, and that it is worth planning for these things now.
- ^
Moreover, I personally take Nuño Sempere’s “Hurdles of using forecasting as a tool for making sense of AI progress” piece seriously, especially the “Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions” part.
With short-term questions on things like geopolitics, I think one should just basically defer to the Community Prediction. Conversely, with certain long-term questions I believe it’s important to interrogate how forecasters are reasoning about the issue at hand before assigning their predictions too much weight. Forecasters can help themselves by writing comments that explain their reasoning.
- ^
In addition, stakeholders we work with, who look at our questions with a view to informing their grantmaking, policymaking, etc., frequently say that they would find more comments valuable in helping bring context to the Community Prediction.
- ^
All blame on me, if so.
- ^
Update: I ended up leaving Metaculus fairly soon after writing this post. I think that means the essay contest is less likely to happen, but I guess stay tuned in case it does.
as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.
How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy [...] Secondly, prioritizing competence.
A third way to potentially mitigate the issue is to simply become more skilled at winning power struggles. Such an approach would be uncooperative, and therefore undesirable in some respects, but on balance, to me, seems worth pursuing to at least some degree.
… I realize that you, OP, have debated a very similar point before (albeit in a non-AI safety thread)—I’m not sure if you have additional thoughts to add to what you said there? (Readers can find that previous debate/exchange here.)
Oh, sorry, I see now that the numberings I used in my second comment don’t map onto how I used them in my first one, which is confusing. My bad.
Your last two paragraphs are very informative to me.
I think digital minds takeoff going well (again, for digital minds and with respect to existential risk) makes it more likely that alignment goes well. [...] In taking alignment going well to be sensitive to how takeoff goes, I am denying that alignment going well is something we should treat as given independently of how takeoff goes.
This is interesting; by my lights this is the right type of argument for justifying AI welfare being a longtermist cause area (which is something that I felt was missing from the debate week). If you have time, I would be keen to hear how you see digital minds takeoff going well as aiding in alignment.[1]
[stuff about nudging AIs away from having certain preferences, etc., being within the AI welfare cause area’s purview, in your view]
Okay, interesting, makes sense.
Thanks a lot for your reply, your points have definitely improved my understanding of AI welfare work!
- ^
One thing I’ve previously been cautiously bullish about as an underdiscussed wildcard is the kinda sci-fi approach of getting to human mind uploading (or maybe just regular whole brain emulation) before prosaic AGI, and then letting the uploaded minds—which could be huge in number and running much faster than wall clock time—solve alignment. However, my Metaculus question on this topic indicates that such a path to alignment is very unlikely.
I’m not sure if the above is anything like what you have in mind? (I realize that human mind uploading is different to the thing of LLMs or other prosaic AI systems gaining consciousness (and/or moral status), and that it’s the latter that is more typically the focus of digital minds work (and the focus of your post, I think). So, on second thoughts, I imagine your model for the relationship between digital minds takeoff and alignment will be something different.)
I had material developed for other purposes [...] But the material wasn’t optimized for addressing whether AI welfare should be a cause area, and optimizing it for that didn’t strike me as the most productive way for me to engage given my time constraints.
Sounds very reasonable. (Perhaps it might help to add a one-sentence disclaimer at the top of the post, to signpost for readers what the post is vs. is not trying to do? This is a weak suggestion, though.)
I don’t see how buying (1) and (2) undermines the point I was making. If takeoff going well makes the far future go better in expectation for digital minds, it could do so via alignment or via non-default scenarios.
I feel unsure about what you are saying, exactly, especially the last part. I’ll try saying some things in response, and maybe that helps locate the point of disagreement…
(… also feel free to just bow out of this thread if you feel like this is not productive…)
In the case that alignment goes well and there is a long reflection—i.e., (1) and (2) turn out true—my position is that doing AI welfare work now has no effect on the future, because all AI welfare stuff gets solved in the long reflection. In other words, I think that “takeoff going well makes the far future go better in expectation for digital minds” is an incorrect claim in this scenario. (I’m not sure if you are trying to make this claim.)
In the case that alignment goes well but there is no long reflection—i.e., (1) turns out true but (2) turns out false—my position is that doing AI welfare work now might make the far future go better for digital minds. (And thus in this scenario I think some amount of AI welfare work should be done now.[1]) Having said this, in practice, in a world in which (2), whether or not a long reflection happens, could go either way, I view trying to set up a long reflection as a higher-priority intervention than any one of the things we’d hope to solve in the long reflection, such as AI welfare or acausal trade.
In the case that alignment goes poorly, humans either go extinct or are disempowered. In this case, does doing AI welfare work now improve the future at all? I used to think the answer to this was “yes,” because I thought that better understanding sentience could help with designing AIs that avoid creating suffering digital minds.[2] However, I now believe that this basically wouldn’t work, and that something much hackier (and therefore lower cost) would work instead, like simply nudging AIs in their training to have altruistic/anti-sadistic preferences. (This thing of nudging AIs to be anti-sadistic is part of the suffering risk discourse—I believe it’s something that CLR works on or has worked on—and feels outside of what’s covered by the “AI welfare” field.)
- ^
Exactly how much should be done depends on things like how important and tractable digital minds stuff is relative to the other things on the table, like acausal trade, and to what extent the returns to working on each of these things are diminishing, etc.
- ^
Why would an AI create digital minds that suffer? One reason is that the AI could have sadistic preferences. A more plausible reason is that the AI is mostly indifferent about causing suffering, and so does not avoid taking actions that incidentally cause/create suffering. Carl Shulman explored this point in his recent 80k episode:
Rob Wiblin: Maybe a final question is it feels like we have to thread a needle between, on the one hand, AI takeover and domination of our trajectory against our consent — or indeed potentially against our existence — and this other reverse failure mode, where humans have all of the power and AI interests are simply ignored. Is there something interesting about the symmetry between these two plausible ways that we could fail to make the future go well? Or maybe are they just actually conceptually distinct?
Carl Shulman: I don’t know that that quite tracks. One reason being, say there’s an AI takeover, that AI will then be in the same position of being able to create AIs that are convenient to its purposes. So say that the way a rogue AI takeover happens is that you have AIs that develop a habit of keeping in mind reward or reinforcement or reproductive fitness, and then those habits allow them to perform very well in processes of training or selection. Those become the AIs that are developed, enhanced, deployed, then they take over, and now they’re interested in maintaining that favourable reward signal indefinitely.
Then the functional upshot is this is, say, selfishness attached to a particular computer register. And so all the rest of the history of civilisation is dedicated to the purpose of protecting the particular GPUs and server farms that are representing this reward or something of similar nature. And then in the course of that expanding civilisation, it will create whatever AI beings are convenient to that purpose.
So if it’s the case that, say, making AIs that suffer when they fail at their local tasks — so little mining bots in the asteroids that suffer when they miss a speck of dust — if that’s instrumentally convenient, then they may create that, just like humans created factory farming. And similarly, they may do terrible things to other civilisations that they eventually encounter deep in space and whatnot.
And you can talk about the narrowness of a ruling group and say, and how terrible would it be for a few humans, even 10 billion humans, to control the fates of a trillion trillion AIs? It’s a far greater ratio than any human dictator, Genghis Khan. But by the same token, if you have rogue AI, you’re going to have, again, that disproportion.
> It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority [...]
I think that this is wrong. The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority. I think it is vastly preferrable [sic] to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don’t start thinking about it soon, then we may be years behind when it happens.
I feel like you are talking past the critique. For an intervention to be a longtermist priority, there needs to be some kind of story for how it improves the long-term future. Sure, AI welfare may be a large-scale problem which takes decades to sort out (if tackled by unaided humans), but that alone does not mean it should be worked on presently. Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
(There is an argument going in the opposite direction that a long reflection might not happen following alignment success, and so doing AI welfare work now might indeed make a difference to what gets locked in for the long-term. I am somewhat sympathetic to this argument, as I wrote here, but I still don’t think it delivers a knockdown case for making AI welfare work a priority.)
Likewise, for an intervention to be a neartermist priority, there has to be some kind of quantitative estimate demonstrating that it is competitive—or will soon be competitive, if nothing is done—in terms of suffering prevented per dollar spent, or similar, with the current neartermist priorities. Factory farming seems like the obvious thing to compare AI welfare against. I’ve been surprised by how nobody has tried coming up with such an estimate this week, however rough. (Note: I’m not sure if you are trying to argue that AI welfare should be both a neartermist and longtermist priority, as some have.)
(Note also: I’m unsure how much of our disagreement is simply because of the “should be a priority” wording. I agree with JWS’s current “It is not good enough…” statement, but would think it wrong if the “should” were replaced with “could.” Similarly, I agree with you as far as: “The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously.”)
[ETA: On a second read, this comment of mine seems a bit more combative than I intended—sorry about that.]
A couple of reactions:
If digital minds takeoff goes well [...], would we expect a better far-future for digital minds? If so, then I’m inclined to think some considerations in the post are at least indirectly important to digital mind value stuff.
Here’s a position that some people hold:
1. If there is a long reflection or similar, then far-future AI welfare gets solved.
2. A long reflection or similar will most likely happen by default, assuming alignment goes well.
For what it’s worth, I buy (1)[1] but I’m not sold on (2), and so overall I’m somewhat sympathetic to your view, Brad. On the other hand, to someone who buys both (1) and (2)—as I think @Zach does—your argument does not go through.
If not, then I’m inclined to think digital mind value stuff we have a clue about how to positively affect is not in the far future.
There is potentially an argument here for AI welfare being a neartermist EA cause area. If you wanted to make a more robust neartermist argument, then one approach could be to estimate the number of digital minds in the takeoff, and the quantity of suffering per digital mind, and then compare the total against animal suffering in factory farms.
In general, I do wish that people like yourself arguing for AI welfare as a cause area were clearer about whether they are making a neartermist or longtermist case. Otherwise, it kind of feels like you are coming from a pet theory-ish position that AI welfare should be a cause, rather than arguing in a cause-neutral way. (This is something I’ve observed on the whole; I’m sorry to pick on your post+comment in particular.)
Yeah, I agree that it’s unclear how things get locked in in this scenario. However, my best guess is that solving the technological problem of designing and building probes that travel as fast as allowed by physics—i.e., just shy of light speed[1]—takes less time than solving the philosophical problem of what to do with the cosmos.
If one is in a race, then one is forced into launching probes as soon as one has solved the technological problem of fast-as-physically-possible probes (because delaying means losing the race),[2] and so in my best guess the probes launched will be loaded with values that one likely wouldn’t endorse if one had more time to reflect.[3]
Additionally, if one is in a race to build fast-as-physically-possible probes, then one is presumably putting most of one’s compute toward winning that race, leaving one with little compute for solving the problem of what values to load the probes with.[4]
Overall, I feel pretty pessimistic about a multipolar scenario going well,[5] but I’m not confident.
- ^
assuming that new physics permitting faster-than-light travel is ruled out (or otherwise not discovered)
- ^
There’s some nuance here: maybe one has a lead and can afford some delay. Also, the prize is continuous rather than discrete—that is, one still gets some of the cosmos if one launches late (although on account of how the probes reproduce exponentially, one does lose out big time by being second)*.
*From Carl Shulman’s recent 80k interview:
you could imagine a state letting loose this robotic machinery that replicates at a very rapid rate. If it doubles 12 times in a year, you have 4,096 times as much. By the time other powers catch up to that robotic technology, if they were, say, a year or so behind, it could be that there are robots loyal to the first mover that are already on all the asteroids, on the Moon, and whatnot. And unless one tried to forcibly dislodge them, which wouldn’t really work because of the disparity of industrial equipment, then there could be an indefinite and permanent gap in industrial and military equipment.
- ^
It’s very unclear to me how large this discrepancy is likely to be. Are the loaded values totally wrong according to one’s idealized self? Or are they basically right, such that the future is almost ideal?
- ^
There’s again some nuance here, like maybe one believes that the set of world-states/matter-configurations that would score well according to one’s idealized values is very narrow. In this case, the EV calculation could indicate that it’s better to take one’s time even if this means losing almost all of the cosmos, since a single probe loaded with one’s idealized values is worth more to one than a trillion probes loaded with the values one would land on through a rushed reflective process.
There are also decision theory considerations/wildcards, like maybe the parties racing are mostly AI-led rather than human-led (in a way in which the humans are still empowered, somehow), and the AIs—being very advanced, at this point—coordinate in an FDT-ish fashion and don’t in fact race.
- ^
On top of race dynamics resulting in suboptimal values being locked in, as I’ve focused on above, I’m worried about very bad, s-risky stuff like threats and conflict, as discussed in this research agenda from CLR.
This would be very weird: it requires that either the value-setters are very rushed or [...]
As an intuition pump: if the Trump administration,[1] or a coalition of governments led by the U.S., is faced all of a sudden—on account of intelligence explosion[2] plus alignment going well—with deciding what to do with the cosmos, will they proceed thoughtfully or kind of in a rush? I very much hope the answer is “thoughtfully,” but I would not bet[3] that way.
What about if we end up in a multipolar scenario, as forecasters think is about 50% likely? In this case, I think rushing is the default?
Pausing for a long reflection may be the obvious path to you or me or EAs in general if suddenly in charge of an aligned ASI singleton, but the way we think is very strange compared to most people in the world.[4] I expect that without a good deal of nudging/convincing, the folks calling the shots will not opt for such reflection.[5]
(Note that I don’t consider this a knockdown argument for putting resources towards AI welfare in particular: I only voted slightly in the direction of “agree” for this debate week. I do, however, think that many more EA resources should be going towards ASI governance / setting up a long reflection, as I have written before.)
This would be very weird: it requires that either the value-setters [...] or that they have lots of time to consult with superintelligent advisors but still make the wrong choice.
One thread here that feels relevant: I don’t think it’s at all obvious that superintelligent advisors will be philosophically competent.[6] Wei Dai has written a series of posts on this topic (which I collected here); this is an open area of inquiry that serious thinkers in our sphere are funding. In my model, this thread links up with AI welfare since welfare is in part an empirical problem, which superintelligent advisors will be great at helping with, but also in part a problem of values and philosophy.[7]
- ^
the likely U.S. presidential administration for the next four years
- ^
in this world, TAI has been nationalized
- ^
I apologize to Nuño, who will receive an alert, for not using “bet” in the strictly correct way.
- ^
All recent U.S. presidents have been religious, for instance.
- ^
My mainline prediction is that decision makers will put some thought towards things like AI welfare—in fact, by normal standards they’ll put quite a lot of thought towards these things—but they will fall short of the extreme thoughtfulness that a scope-sensitive assessment of the stakes calls for. (This prediction is partly informed by someone I know who’s close to national security, and who has been testing the waters there to gauge the level of openness towards something like a long reflection.)
- ^
One might argue that this is a contradictory statement, since the most common definition of superintelligence is an AI system (or set of systems) that’s better than the best human experts in all domains. So, really, what I’m saying is that I believe it’s very possible we end up in a situation in which we think we have superintelligence—and the AI we have sure is superhuman at many/most/almost-all things—but, importantly, philosophy is its Achilles heel.
(To be clear, I don’t believe there’s anything special about biological human brains that makes us uniquely suited to philosophy; I don’t believe that philosophically competent AIs are precluded from the space of all possible AIs. Nonetheless, I do think there’s a substantial chance that the “aligned” “superintelligence” we build in practice lacks philosophical competence, to catastrophic effect. (For more, see Wei Dai’s posts.))
- ^
Relatedly, if illusionism is true, then welfare is a fully subjective problem.
The closing sentence of this comment, “All in all, bad ideas, advocated by the intellectually weak, appealing mostly to the genetically subpar,” breaks our Forum norm against unnecessary rudeness or offensiveness.
The “genetically subpar” part is especially problematic. At best, it would appear that the commenter, John, is claiming that the post mainly appeals to the less intelligent—an unnecessarily rude and most likely false claim. A worse interpretation is that John is making a racist remark, which we view as strongly unacceptable.
Overall, we see this as an unpromising start to John’s Forum engagement—this is their first comment—and we have issued a one-month ban. If they return to the Forum then we’ll expect to see a higher standard of discourse.
As a reminder, bans affect the user, not the account.
If anyone has questions or concerns, feel free to reach out: if you think we made a mistake here, you can appeal the decision.
I’m not sure if your comment is an attempt to restate with examples some of what’s in the “What deep honesty is not” section, or if it’s you pointing out what you see as blind spots in the post. In case it’s the latter, here are some quotes from the post which cover similar ground:
Deep honesty is not a property of a person that you need to adopt wholesale. It’s something you can do more or less of, at different times, in different domains.
…
But blunt truths can be hurtful. It is often compatible with deep honesty to refrain from sharing things where it seems kinder to do so [...] And it’s of course important, if sharing something that might be difficult to hear, to think about how it can be delivered in a gentle way.
…
If the cashier at the grocery store asks how you’re doing, it’s not deeply honest to give the same answer you’d give to a therapist — it’s just inappropriate.
(Just pointing out that previous discussion of this paper on this forum can be found here.)