Thanks for this in-depth writeup of what is clearly a very important factor in prioritising our work aimed at the AI transition. Your piece has built the argument for such prioritisation clearly enough that it has allowed me to put some previously inchoate responses into crisper form:
If we could tell with certainty which topics would receive >100x as much work as we could put in prior to when that work is needed, then I think your argument goes through. But I have a lot of uncertainty about that, and such uncertainty substantially weakens the prioritisation effect.
To see the effect easily, suppose for simplicity that for some piece of apparently late-stage strategy there is a 50% chance that >100x as much work gets done on it, obviating the need for us to work on it now, and a 50% chance that there is no appreciable extra work done (e.g. because the intelligence explosion is happening in a particular lab that doesn’t do this work, or because the work requires aspects of cognition that are improving more slowly, or because it turns out it was needed earlier in the explosion than expected).
In this case, the expected value of marginal work on that late-stage strategy gets roughly halved compared to if there weren’t going to be this AI-driven work later (50% chance of the naive estimate + 50% chance of <1% of that estimate). Given the fairly extreme distribution in the value of a particular person working on different topics, it isn’t that rare for the best thing to work on in one category to be >2x as good as the best thing in another, such that you shouldn’t switch categories even after downgrading the EV.
That would mean early-stage vs late-stage would be an important factor in choosing what to work on, but not any kind of filter. As the chance of large amounts of AI work on a topic increases, the factor gets stronger: e.g. it reaches 10x at a 90% chance, which is quite strong (though I think it is hard to reach or exceed a 90% chance here).
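The discounting above can be sketched in a few lines. This is my own illustrative formalisation, not something from the original piece: I assume work on an obviated topic retains some small residual value (here ~1% of the naive estimate), and that otherwise the naive estimate stands.

```python
def ev_discount(p_obviated, residual=0.01):
    """Expected fraction of the naive value retained, given probability
    p_obviated that later AI-driven work makes ours nearly redundant.
    `residual` is the (assumed) fraction of value that survives obviation."""
    return (1 - p_obviated) + p_obviated * residual

# 50% chance of obviation: value roughly halved (0.505 of naive)
print(ev_discount(0.5))
# 90% chance of obviation: roughly a 10x discount factor
print(1 / ev_discount(0.9))
```

With `residual = 0` the discount factor at a 90% chance is exactly 10x; the small residual term only nudges it slightly below that.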
So I think this can have a substantial effect on the choice of what to work on at the margin, but isn’t a filter.
What about its effect on the portfolio of research work aimed at the AI transition?
Suppose that there are logarithmic returns to the research work (which means that the marginal value of extra work is inversely proportional to aggregate work so far, which is a common neglectedness assumption). In that case, we should do 50% as much total work on equally-important things that we estimate to have a 50% chance of being obviated later, and 10% as much total work on those we estimate to have a 90% chance of being obviated later.
So late-stage topics would still receive quite a large share of our total work, even when we don’t think they are intrinsically more important. In the piece you suggested that we do at least some work on these topics, to avoid the possibility of being caught completely flat-footed if the anticipated AI work on them doesn’t happen, and I think the maths above suggests a larger amount of work than that (especially on topics that appear more important or more tractable).
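The allocation under logarithmic returns can also be made concrete. Under that assumption the marginal value of work on a topic is proportional to (its importance) x (its chance of not being obviated) / (work done so far), so equalising marginal value across equally important topics gives work shares proportional to (1 − p_obviated). The numbers below are illustrative, not from the piece:

```python
def allocation(p_obviated, total_work=1.0):
    """Work shares across equally important topics under log returns,
    where p_obviated[i] is the chance topic i is obviated by later AI work.
    Equalising marginal value gives shares proportional to (1 - p)."""
    weights = [1 - p for p in p_obviated]
    total_weight = sum(weights)
    return [total_work * w / total_weight for w in weights]

# A topic sure to need our work, one with a 50% chance of obviation,
# and one with a 90% chance: shares come out in ratio 1.0 : 0.5 : 0.1
print(allocation([0.0, 0.5, 0.9]))
```

This reproduces the figures in the paragraph above: relative to a topic we're sure won't be obviated, a 50%-obviation topic gets half as much work and a 90%-obviation topic a tenth as much.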
(Note that my simplifying assumption of no appreciable AI help vs an overwhelming amount might be doing some work here. I’m not sure what the best way to relax it is.)
Thanks, Owen. I also agree with your maths.
Re conditioning, I agree that this is the technically correct thing to do and that it isn’t clear what difference it makes to the simpler analysis. In some cases it is fairly easy to condition (e.g. when working on a late-stage topic, one can do the project imagining that lots of advanced AI advice won’t arrive in time), while at the prioritisation stage it feels a bit harder to do. Oh, and I very much agree that it could be important to act to change whether such AI analysis happens (something that is, if anything, a bit easier to see on a view that treats whether this happens as uncertain).
Re maximal reasonable probabilities, I still genuinely feel like it is hard to get >90% credence that very large amounts of AI analysis on a key issue will happen prior to the issue coming to a head. I think one could get there for some things, but not that many. This is due to there being a variety of defeaters for such high amounts of AI analysis, such as external people like us not having access to the tools, the analysis being needed earlier than expected (e.g. due to the need to socialise the ideas), or jaggedness in AI capabilities (e.g. where engineering abilities take off substantially before more conceptual, philosophical abilities). I think you are onto something re what you are imagining as default vs what I am.