Stephen McAleese

Karma: 378

Software engineer interested in AI safety.

Stephen McAleese Apr 12, 2025, 10:56 PM
3 points
1 ∶ 4
on: Announcing our 2025 strategy
I would like to see a push towards increasing donations to x-risk reduction and longtermist charities. Last time I checked, only about 10% of GWWC donations were going to longtermist funds like the Long-Term Future Fund. Consequently, I think the x-risk and AI safety funding landscapes have been more reliant on big donors than they should be.

Stephen McAleese Mar 17, 2025, 10:28 PM
1 point
0 ∶ 0
on: Discussion Thread: Existential Choices Debate Week
I think avoiding existential risk is the most important thing. As long as we can do that and don’t have some kind of lock-in, then we’ll have time to think about and optimize the value of the future.

Stephen McAleese Mar 9, 2025, 1:21 PM
1 point
0 ∶ 1
in reply to: Marcus Abramovitch 🔸’s comment on: Could this be an unusually good time to Earn To Give?
£110k seems like it would probably be impactful, and that’s just one person giving, right? That’s probably at least one FTE. Also, SERI MATS only costs about £500k per year so it could be expanded substantially with that amount.

Stephen McAleese Mar 8, 2025, 1:29 PM
3 points
0 ∶ 0
in reply to: Davidmanheim’s comment on: How Can Average People Contribute to AI Safety?
Thank you for your comment.
- Regarding evals, I was referring specifically to evals focused on AI safety and risk-related behaviors like dangerous capabilities, deception, or situational awareness (I will edit the post). I think it’s important to measure and quantify these capabilities to determine when risk mitigation strategies are necessary. Otherwise we risk deploying models with hidden risks and insufficient safeguards.
- Exaggerating the risks of current AI models would be misleading so we should avoid that. The point I intended to communicate was that we should try to accurately inform everyone about both the risks and benefits of AI and the opinions of different experts. Given the potential future importance of AI, I believe the quantity and quality of discussion on the topic is too low and this problem is often worsened by the media which tends to focus on short-term events rather than what’s important in the long term.
More generally, while we should aim to avoid causing harm, avoiding all actions that have a non-zero risk of causing harm would lead to inaction.
If overly cautious individuals refrain from taking action, decision making and progress may then be driven by those who are less concerned about risks, potentially leading to a worse overall situation.
Therefore, a balanced approach that considers the risks and benefits of each action without stifling all action is needed to make meaningful progress.

How Can Average People Contribute to AI Safety?

Stephen McAleese Mar 6, 2025, 10:50 PM

14 points

4 comments · 8 min read · EA link

Stephen McAleese Jan 11, 2025, 2:38 PM
5 points
0 ∶ 0
in reply to: Imma🔸’s comment on: An Overview of the AI Safety Funding Situation
Now the post is updated with 2024 numbers :)
I didn’t include Longview Philanthropy because they’re a smaller funder and a lot of their funding seems to come from Open Philanthropy. There is a column called “Other” that serves as a catch-all for any funders I left out.
I took a look at Founders Pledge but their donations don’t seem that relevant to AI safety to me.

Stephen McAleese Dec 31, 2024, 5:10 PM
1 point
0 ∶ 0
on: If You’re Going To Eat Animals, Eat Beef and Dairy
Do you think wild animals such as tuna and deer are a good option too since they probably have a relatively high standard of living compared to farmed animals?

Stephen McAleese Dec 26, 2024, 10:25 AM
3 points
0 ∶ 0
on: Donation Celebration Post
LTFF, LessWrong.

Geoffrey Hinton on the Past, Present, and Future of AI

Stephen McAleese Oct 12, 2024, 4:41 PM

5 points

1 comment · 18 min read · EA link

Stephen McAleese Jul 25, 2024, 6:49 PM
5 points
0 ∶ 0
on: Climate Advocacy and AI Safety: Supercharging AI Slowdown Advocacy
I’ve never heard this idea proposed before so it seems novel and interesting.
As you say in the post, the AI risk movement could gain much more awareness by associating itself with the climate risk advocacy movement, which is much larger. Compute is arguably the main driver of AI progress, compute is correlated with energy usage, and energy use generally increases carbon emissions. Limiting carbon emissions from AI is therefore an indirect way of limiting the compute dedicated to AI and slowing down the AI capabilities race.
This approach seems viable in the near future until innovations in energy technology (e.g. nuclear fusion) weaken the link between energy production and CO2 emissions, or algorithmic progress reduces the need for massive amounts of compute for AI.
The question is whether this indirect approach would be more effective than or at least complementary to a more direct approach that advocates explicit compute limits and communicates risks from misaligned AI.

Stephen McAleese Jun 15, 2024, 1:40 PM
2 points
1 ∶ 0
on: Response to Aschenbrenner’s “Situational Awareness”
A recent survey of AI alignment researchers found that the most common opinion on the statement “Current alignment research is on track to solve alignment before we get to AGI” was “Somewhat disagree”. The same survey found that most AI alignment researchers also support pausing or slowing down AI progress.
Slowing down AI progress might be net-positive if you take ideas like longtermism seriously but it seems challenging to do given the strong economic incentives to increase AI capabilities. Maybe government policies to limit AI progress will eventually enter the Overton window when AI reaches a certain level of dangerous capability.

Stephen McAleese Jun 5, 2024, 4:44 PM
1 point
0 ∶ 0
on: Drexler’s Nanosystems is now available online
This is a cool project! Thanks for making it. Hopefully it makes the book more accessible.

Stephen McAleese May 22, 2024, 7:47 PM
6 points
0 ∶ 0
on: An Overview of the AI Safety Funding Situation
Update: the UK government has announced £8.5 million in funding for systemic AI safety research, and these grants will probably be distributed in 2025.

Stephen McAleese May 12, 2024, 9:38 AM
3 points
3 ∶ 0
on: MATS Winter 2023-24 Retrospective
Thanks for writing this! It’s interesting to see how MATS has evolved over time. I like all the quantitative metrics in the post as well.

Stephen McAleese Mar 15, 2024, 8:57 PM
2 points
0 ∶ 0
on: What happened to the ‘only 400 people work in AI safety/governance’ number dated from 2020?
I wrote a blog post in 2022 (1.5 years ago) estimating that there were about 400 people working on technical AI safety and AI governance.
In the same post, I also created a mathematical model which said that the number of technical AI safety researchers was increasing by 28% per year.
Using this model for all AI safety researchers, we can estimate that there are now $400 \times {1.28}^{1.5} \approx 580$ people working on AI safety.
I personally suspect that the number of people working on AI safety in academia has grown faster than the number of people in new EA orgs so the number could be much higher than this.
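The compound-growth estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name `estimate_headcount` is made up for illustration, and it assumes the 28% annual growth rate applies uniformly to the whole group of about 400 people.

```python
def estimate_headcount(base: float, annual_growth: float, years: float) -> float:
    """Project a headcount forward assuming constant compound annual growth."""
    return base * (1 + annual_growth) ** years


# Inputs taken from the comment: ~400 people in 2022, 28% growth per year,
# 1.5 years elapsed. The uniform growth rate is an assumption.
estimate = estimate_headcount(base=400, annual_growth=0.28, years=1.5)
print(f"Estimated headcount: {estimate:.0f}")  # roughly 579, i.e. about 580
```

If academic headcount has grown faster than 28% per year, as suspected above, this figure would be a lower bound rather than a central estimate.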

Stephen McAleese Jan 10, 2024, 6:27 PM
2 points
0 ∶ 0
on: Why can’t we accept the human condition as it existed in 2010?
One argument for continued technological progress is that our current civilization is not particularly stable or sustainable. One of the lessons from history is that seemingly stable empires such as the Roman or Chinese empires eventually collapse after a few hundred years. If there isn’t more technological progress so that our civilization reaches a stable and sustainable state, I think our current civilization will eventually collapse because of climate change, nuclear war, resource exhaustion, political extremism, or some other cause.

Stephen McAleese Jan 9, 2024, 3:46 PM
10 points
2 ∶ 0
on: Reflections on my first year of AI safety research
Thanks for the writeup. I like how it’s honest and covers all aspects of your experience. I think a key takeaway is that there is no obvious fixed plan or recipe for working on AI safety and instead, you just have to try things and learn as you go along. Without these kinds of accounts, I think there’s a risk of survivorship bias and positive selection effects where you see a nice paper or post published and you don’t get to see experiments that have failed and other stuff that has gone wrong.

Stephen McAleese Dec 14, 2023, 10:02 AM
8 points
2 ∶ 2
on: Funding case: AI Safety Camp
I’m sad to hear that AISC is lacking in funding and somewhat surprised given that it’s one of the most visible and well-known AI safety programs. Have you tried applying for grant money from Open Philanthropy since it’s the largest AI safety grant-maker?

Stephen McAleese Oct 16, 2023, 7:49 PM
1 point
0 ∶ 0
on: AI Pause Will Likely Backfire
“In brief, the book [Superintelligence] mostly assumed we will manually program a set of values into an AGI, and argued that since human values are complex, our value specification will likely be wrong, and will cause a catastrophe when optimized by a superintelligence”
Superintelligence describes exploiting hard-coded goals as one failure mode, which we would probably now call specification gaming. But the book is quite comprehensive: it describes other failure modes too, and I think it is still relevant.
For example, the book describes what we would now call deceptive alignment:
“A treacherous turn can result from a strategic decision to play nice and build strength while weak in order to strike later”
And reward tampering:
“The proposal fails when the AI achieves a decisive strategic advantage at which point the action which maximizes reward is no longer one that pleases the trainer but one that involves seizing control of the reward mechanism.”
And reward hacking:
“The perverse instantiation—manipulating facial nerves—realizes the final goal to a greater degree than the methods we would normally use.”
I don’t think incorrigibility due to the ‘goal-content integrity’ instrumental goal has been observed in current ML systems yet but it could happen given the robust theoretical argument behind it:
“If an agent retains its present goals into the future, then its present goals will be more likely to be achieved by its future self. This gives the agent a present instrumental reason to prevent alterations of its final goals.”

Stephen McAleese Oct 2, 2023, 10:02 PM
3 points
0 ∶ 1
on: An Overview of the AI Safety Funding Situation
Some information not included in the original post:
- In April 2023, the UK government announced £100m in initial funding for a new AI Safety Taskforce.
- In June 2023, UKRI awarded £31m to the University of Southampton to create a new responsible and trustworthy AI consortium named Responsible AI UK.
