Sorry for the very delayed reply to this. I meant to reply at the time and then it slipped my mind!
Yes, you've summarised my position perfectly, I like those diagrams!
I guess my deeper point was that I wasn't sure there was any meaningful way to say something like "X is twice as painful as Y" without defining it via choices among gambles or durations. You say for humans it seems real, but does it? I can definitely introspect and discover that X is more painful than Y, but I'm not sure I can introspect and discover that it is N times as painful. Where does that number come from?
Although as I was thinking more about how to justify this, I started thinking about other sensory experiences, like sound. Is it meaningful to say that "X feels twice as loud as Y", in a sense that doesn't have to line up with the intensity of the physical sound wave? And then I remembered my physics lessons from way back, and realised the answer might be yes. I was definitely taught that the reason we measure sound volume on a log scale (decibels) is that it lines up better with our sensory perception of it (roughly, you have to multiply the intensity of the sound wave by about ten, a 10 dB step, in order to double the perceived loudness). But if this is true then it means there is some sense in which we can introspect and say "X sounds twice as loud as Y", even though the underlying sound wave is far from twice as intense. And if that is the case then maybe this should also be true for pain.
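To make the "twice as loud" idea concrete, here's a tiny sketch. It assumes Stevens' power law for loudness (perceived loudness scaling roughly as intensity to the 0.3 power), which I believe is the standard psychoacoustic model behind the claim, though the exact exponent is an assumption on my part:

```python
def loudness_ratio(db_increase: float, exponent: float = 0.3) -> float:
    """Perceived-loudness ratio for a given dB increase, assuming
    Stevens' power law (loudness ~ intensity ** exponent)."""
    # Decibels are defined as 10 * log10(intensity ratio),
    # so invert that to recover the physical intensity ratio.
    intensity_ratio = 10 ** (db_increase / 10)
    return intensity_ratio ** exponent

# A 10 dB step is ten times the physical intensity, but only
# about twice the perceived loudness:
print(round(loudness_ratio(10), 2))  # ~2.0
# Doubling the physical intensity (+3 dB) sounds nowhere near twice as loud:
print(round(loudness_ratio(3), 2))   # ~1.23
```

So under this model "twice as loud" is a well-defined introspective judgement that deliberately doesn't track the physical wave, which is the sense in which the analogy to pain might carry over.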
I'm still very uncertain about this though. If I listened to different sounds and tried to place them on a numerical scale, I'm not really sure what it is that I'd actually be doing.
It's really cool that you've done this and released the code!
Am I understanding right that the GiveWell baseline you're trying to beat used GPT, while your approach uses Claude? How can you be sure that the improvements aren't due to the model choice, rather than the architecture?