This is all very interesting, thanks for following up!
tobycrisford 🔸
Something seems especially weird about offsetting your purchase of non-BCC chicken by donating to campaigns to get supermarkets to adopt the BCC.
I think one important consideration is missing here: supermarkets respond to campaigners by saying that customers want to buy non-BCC chicken, and that they are just doing what their customers want. If you buy non-BCC chicken from them, you make that argument stronger, and the campaigners' argument weaker.
And I don't think this is necessarily a negligible concern compared to the other effects being discussed here, since the mechanism by which your small donation is supposed to help chickens is also tipping the scales on some corporate campaign and getting a company like a supermarket to make a big change.
I don't deny that my "unlimited time, ink, and paper" caveat is doing a lot of work in my argument. But we started with a thought experiment that is impossible to implement in practice (simulating a modern digital computer with pen and paper), so I don't see why my reply can't do the same thing (even if it might require a lot more resources).
I think it's very unlikely that the human brain requires infinite time and memory to simulate. Even if the brain's dynamics are continuous, you could probably simulate them to arbitrary accuracy with a big enough discrete approximation. And the Bekenstein bound suggests there is a finite limit to the amount of information that can exist within a region of finite size and energy.
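To put a rough number on the Bekenstein bound, here is a minimal Python sketch. It uses the usual form of the bound for a sphere of radius R containing total energy E, I ≤ 2πRE/(ħc ln 2) bits, and counts the brain's entire rest-mass energy E = mc². The mass (1.5 kg) and radius (6.7 cm) are rough, assumed figures for a human brain, there only to get an order of magnitude.

```python
import math

# Physical constants (SI units)
HBAR = 1.054571817e-34  # reduced Planck constant, J*s
C = 2.99792458e8        # speed of light, m/s

def bekenstein_bits(radius_m: float, mass_kg: float) -> float:
    """Upper bound on the information (in bits) inside a sphere of the given
    radius whose total energy is the rest-mass energy of mass_kg."""
    energy_j = mass_kg * C**2
    return 2 * math.pi * radius_m * energy_j / (HBAR * C * math.log(2))

# Assumed, rough figures for a human brain (illustrative only).
BRAIN_MASS_KG = 1.5
BRAIN_RADIUS_M = 0.067

bits = bekenstein_bits(BRAIN_RADIUS_M, BRAIN_MASS_KG)
print(f"Bekenstein bound for a brain-sized sphere: {bits:.1e} bits")
```

With these assumptions the bound comes out at roughly 2.6 × 10^42 bits: absurdly large, but finite.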
As for whether my speed analogy works, I still think it does. Sure, if you pick a frame of reference in which you are stationary, then you continue to have experiences at the normal rate. But that wasn't the frame of reference I was using. I was working in the frame of reference of someone back on Earth, which is an equally valid frame of reference. In those coordinates, every physical process in your brain is slowed down (electrical impulses travel more slowly from one side of your brain to the other, chemical reactions slow down, etc.) and you are having experiences at a slower rate.
If the human brain operates according to the known laws of physics, then in principle your brain could be simulated with pen and paper (at least given unlimited time, ink, and paper), and the simulation would behave identically to the real thing (it would talk and think like you and have all your opinions).
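A pen-and-paper simulation can only ever carry out discrete arithmetic steps, so the question is whether discrete steps can approximate continuous dynamics to arbitrary accuracy. The toy sketch below is not a brain model: it applies the explicit Euler method to the simple equation dx/dt = -x, chosen only because its exact solution x(t) = e^(-t) is known, and shows the error shrinking as the step size does.

```python
import math

def euler_decay(x0: float, t_end: float, steps: int) -> float:
    """Approximate x(t_end) for dx/dt = -x using `steps` explicit Euler steps,
    i.e. only the kind of repeated arithmetic that could be done by hand."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x = x + dt * (-x)  # one Euler step: x_{n+1} = x_n + dt * f(x_n)
    return x

exact = math.exp(-1.0)  # exact solution x(1) for x0 = 1
for steps in (10, 100, 1000, 10000):
    approx = euler_decay(1.0, 1.0, steps)
    print(f"{steps:>6} steps: error = {abs(approx - exact):.2e}")
```

Each tenfold increase in the number of steps cuts the error by roughly a factor of ten, so any desired accuracy can be reached with enough (tedious) arithmetic.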
Suppose this was all that existed of you, and your real brain had never existed. Would that mean that you never existed as a conscious being, despite all your thoughts and utterances still being a part of the world? That seems like a much more counterintuitive conclusion to me than biting the bullet on pen-and-paper simulations having the potential for consciousness.
I don't get why the "moment of experience taking a thousand years" thing is supposed to be so weird. If we slowed down all the processes in your brain, then moments of experience would take longer in physical time. That's not an argument against your consciousness being real. And this isn't hypothetical: we can literally do it by sending you on a spaceship travelling close to the speed of light, and that's exactly what would happen!
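The size of this slowdown follows from the standard special-relativity time-dilation formula, Δt_Earth = γ Δt_ship with γ = 1/√(1 - v²/c²). As a rough sketch, the Python below computes γ for a given speed, and how close to c a ship would need to travel for one second on board to last a thousand years in Earth's frame. The one-second "moment of experience" is an illustrative assumption, not a claim about how long experiences actually take.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds

def lorentz_factor(beta: float) -> float:
    """Time-dilation factor gamma for a speed v = beta * c, with 0 <= beta < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def one_minus_beta(gamma: float) -> float:
    """1 - v/c for a given gamma, written as x / (1 + sqrt(1 - x)) with
    x = 1/gamma^2, so it stays accurate when v is extremely close to c."""
    x = 1.0 / (gamma * gamma)
    return x / (1.0 + math.sqrt(1.0 - x))

print(f"gamma at v = 0.99c: {lorentz_factor(0.99):.2f}")

# Illustrative: 1 second of shipboard time stretched to 1000 years of Earth time.
gamma_needed = 1000 * SECONDS_PER_YEAR / 1.0
print(f"1 - v/c needed: {one_minus_beta(gamma_needed):.1e}")
```

The rearranged formula in `one_minus_beta` matters: computing `1 - math.sqrt(1 - 1/gamma**2)` directly would round to zero in floating point at such extreme speeds.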
That makes a lot of sense, thanks.
I'm sorry you've said you regret your engagement, since I've found your comments helpful (the link to AISLE's OpenSSL zero-days has shifted my view on this a fair bit).
I guess this whole discussion does just feel like a classic example of "All debates are bravery debates".
Thanks for the detailed reply, I understand your point clearly now I think!
But $20,000 for *all* of the OpenBSD bugs (not just the published ones) doesn't sound like that much to spend on inference compute to me. If AISLE could have spent the same and made an equally impressive announcement, unearthing enough bugs at once that government ministers around the world started issuing statements about it, then shouldn't they have been able to find investors to fund that? It would have been incredible publicity for them.
The crux for me seems to be whether they have made equally impressive announcements, as you suggest they might have. Maybe they're just worse at marketing. I don't know enough to evaluate that claim properly, but it does seem to be the relevant question here: have Anthropic been able to use Mythos to go significantly beyond what the best harnesses could already achieve with existing models for the same inference spend? I thought the answer was a clear yes, and I didn't find the original linked AISLE write-up very convincing at all. Your comment has made me more uncertain, but has still not convinced me, and I'd be really interested to read something more in-depth on that question. (Maybe we would also disagree about what the word "significantly" means here, since I guess you are acknowledging it probably represents some improvement.)
(Also, I'd push back a bit on your characterization of AI progress. I agree the scaffolding is extremely important, but in my experience the "paradigm shifts" in capability over the two and a half years I've been working with these systems have come from the models themselves.)
(An extra comment: the fact that cybersecurity capabilities might not imply an imminent superintelligence takeoff seems an entirely independent point, and one I don't necessarily disagree with.)
On the take by AISLE, maybe I'm missing something here, but if their headline claim was correct (that the harness is more important than the model), shouldn't they have been able to find the vulnerabilities that Anthropic hasn't published? Or find hundreds more similarly impactful ones?
Rediscovering the ones Anthropic had already published seems much less impressive, because there are lots of ways to cheat, and from their write-up it sounded to me like they were essentially admitting that they had cheated.
Of course Anthropic could be lying about the existence or significance of the vulnerabilities they haven't published. But they have committed in advance to what those vulnerabilities are (I think they have already made some kind of cryptographic commitment to their unpublished write-ups..?), which seems impressive to me.
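For readers unfamiliar with the idea, a cryptographic commitment usually works roughly like this: publish a hash of the document now, reveal the document later, and let anyone check that it hashes to the published value. The sketch below is a generic salted SHA-256 commitment built from Python's standard library. I don't know the exact scheme Anthropic used, so treat this purely as an illustration of the general technique; `commit` and `verify` are hypothetical names.

```python
import hashlib
import secrets

def commit(document: bytes) -> tuple[str, bytes]:
    """Return (public commitment, secret nonce). Only the commitment is published.
    The random nonce stops outsiders from confirming guesses about the contents."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + document).hexdigest()
    return digest, nonce

def verify(commitment: str, document: bytes, nonce: bytes) -> bool:
    """Later, the document and nonce are revealed; anyone can recompute the hash."""
    return hashlib.sha256(nonce + document).hexdigest() == commitment

report = b"hypothetical vulnerability write-up"
c, n = commit(report)
print(verify(c, report, n))                   # True
print(verify(c, b"a different write-up", n))  # False
```

Because SHA-256 is collision-resistant, the committer cannot later swap in a different write-up that matches the published hash.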
Either they have used the new model to find significant vulnerabilities in every major OS and browser that are too dangerous to be released, or they haven't. If they have, it seems genuinely scary and impressive (not just marketing hype), because I'm not aware of people working on fancy harnesses having had similar results (or have they?). And if they haven't, then it's a very weird marketing ploy, because they're going to get found out very quickly!
I think this misunderstands what people mean when they compare arguments about the importance of AI safety to a Pascal's wager.
Pascal's wager refers to situations where a tiny probability of enormous value seemingly leads to ridiculous conclusions if you try to do naive expected value calculations with it. When people say that strong longtermism is a Pascal's wager, the "small probability" they are talking about is not the probability of extinction, which, as you point out, is significant. The small probability is the probability that the future will contain "septillions of future sapients". And it gets even smaller if the probability of extinction soon is high! So a large probability of extinction this century makes the Pascal's wager comparison more relevant as a critique of strong longtermism, not less. It is multiplying this small probability by the value of those septillions of potential "sapients" that gives you the astronomical value that says existential risk reduction should almost automatically dominate our concerns.
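To make the structure of that argument concrete: the astronomical value is a product of the form P(vast future) × (number of future lives), and P(vast future) can be no larger than P(surviving this century). The numbers in the sketch below are purely hypothetical placeholders, not estimates; they only show how a higher near-term extinction risk shrinks exactly the small factor that the Pascal's wager critique is aimed at.

```python
def expected_future_lives(p_extinction_this_century: float,
                          p_vast_future_given_survival: float,
                          lives_in_vast_future: float) -> float:
    """Expected number of future lives from the 'vast future' scenario alone.
    P(vast future) = P(survive this century) * P(vast future | survival)."""
    p_survive = 1.0 - p_extinction_this_century
    return p_survive * p_vast_future_given_survival * lives_in_vast_future

# Hypothetical placeholder numbers, for illustration only.
SEPTILLION = 1e24
low_risk = expected_future_lives(0.01, 1e-6, SEPTILLION)
high_risk = expected_future_lives(0.5, 1e-6, SEPTILLION)
print(f"low near-term risk: {low_risk:.2e}, high near-term risk: {high_risk:.2e}")
```

Both products are still enormous, which is where the Pascal-like structure comes from, but the higher extinction probability makes the tiny factor tinier rather than larger.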
I think you're completely right to point out that people should care a lot about things which might carry a 10% chance of causing human extinction, even setting aside their stance on longtermism. But some people believe that reducing existential risk has astronomically more value than its impact on the next few generations alone, and that therefore tiny changes in the probability of existential catastrophe almost automatically trump any other concern, however small those changes are. When people talk about Pascal's wager in the context of strong longtermism or AI safety, I think it is this claim that they are challenging, not the claim that we should care about extinction at all. And that criticism is just as valid, actually more valid, if the probability of extinction from AI is high (though of course I agree that anyone who uses the Pascal's wager argument to dismiss all work on AI risk is making a serious mistake).
I agree with your title, but I don't think negative utilitarianism is the answer. I like Toby Ord's essay on this, "Why I'm Not a Negative Utilitarian": https://www.amirrorclear.net/academic/ideas/negative-utilitarianism/
On your argument about tradeoffs, people make choices all the time where they accept some very small risk of some very severe suffering in order to increase their happiness by a modest amount. For example: cycling along a busy road to visit their friend. If you say that no amount of happiness can make up for the trauma of being involved in a serious accident, then it seems like you are forced to say that this choice is wrong. That seems like a strange conclusion to me.
It's really cool that you've done this and released the code!
Am I understanding right that the GiveWell baseline you're trying to beat used GPT, while your approach uses Claude? How can you be sure that the improvements aren't due to the model choice rather than the architecture?
Sorry for the very delayed reply to this. I meant to reply at the time and then it slipped my mind!
Yes, you've summarised my position perfectly. I like those diagrams!
I guess my deeper point was that I wasn't sure there was any meaningful way to say something like "X is twice as painful as Y" without defining it via choices among gambles or durations. You say for humans it seems real, but does it? I can definitely introspect and discover that X is more painful than Y, but I'm not sure I can introspect and discover that it is N times as painful. Where does that number come from?
Although, as I was thinking more about how to justify this, I started thinking about other sensory experiences, like sound. Is it meaningful to say that "X feels twice as loud as Y", in a sense that doesn't have to line up with the intensity of the physical sound wave? Then I remembered my physics lessons from way back, and realised the answer might be yes. I was definitely taught that the reason we measure sound volume on a log scale (decibels) is that it lines up better with our sensory perception (on that model, you have to square the intensity of the sound wave, measured relative to the threshold of hearing, in order to double the perceived loudness). But if this is true, then there is some sense in which we can introspect and say "X sounds twice as loud as Y", even though the underlying sound wave might not be twice as intense. And if that is the case, then maybe this should also be true for pain.
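The log-scale arithmetic can be checked with the standard sound intensity level formula, L = 10·log10(I/I₀), where I₀ = 10⁻¹² W/m² is the conventional reference intensity (roughly the threshold of hearing). This sketch only illustrates the model in which perceived loudness tracks the decibel number: doubling the decibel level corresponds to squaring the intensity ratio I/I₀. The two example intensities are arbitrary.

```python
import math

I0 = 1e-12  # reference intensity in W/m^2 (conventional threshold of hearing)

def decibels(intensity: float) -> float:
    """Sound intensity level in dB, relative to the reference intensity I0."""
    return 10 * math.log10(intensity / I0)

quiet = 1e-8                     # intensity ratio I/I0 = 10^4, about 40 dB
louder = I0 * (quiet / I0) ** 2  # squared ratio I/I0 = 10^8, about 80 dB

print(f"{decibels(quiet):.1f} dB -> {decibels(louder):.1f} dB")
```

For comparison, the rule of thumb often quoted in psychoacoustics is that perceived loudness roughly doubles for every 10 dB increase, which is a different mapping from "double the decibels", though it still has loudness diverging from physical intensity.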
I'm still very uncertain about this though. If I listened to different sounds and tried to place them on a numerical scale, I'm not really sure what it is that I'd actually be doing.
Thank you for your reply and clarification!
If the claim is that the gap between "Disabling" and "Excruciating" should be larger than the gap between "Annoying" and "Hurtful", then that makes sense to me, and seems interesting.
But it sounds like this wasn't a numerical scale to begin with? So this again just feels like a claim about how we should go about assigning numbers to those categories (if we need numbers), rather than a claim that pain unpleasantness is "superlinear" in some objective sense?
Defining what a numerical score for pain means seems like a hard problem. From my perspective, it should be defined so that the being concerned would be indifferent between a day at level 2*x and 2 days at level x. I think this is the notion you are referring to as "unpleasantness". The question for any other pain metric is then just: "how well does it measure this?". I'm still not sure it makes sense to ask "How does pain intensity scale with unpleasantness?", since then we would first have to define a numerical scale for pain intensity in some different way, and I'm still not sure how we would even begin to do that.
I suppose there is another interesting complication here, which is that you could also try to define your pain scale in terms of preferences among gambles. For example, the pain scale should be defined so that a rational being is indifferent between a 100% chance of x and a 50% chance of 2*x (and otherwise no pain). Then you're confronted with the question of whether this gives the same answer as defining it in terms of preferences among durations. My feeling is that it should be the same (something about personal identity not being a "further fact", and applying the standard utilitarian aggregation approach to person-moments rather than persons..?), but it would be interesting to explore points of view where those two potential scale definitions come apart. That doesn't feel quite the same as "intensity" vs "unpleasantness" though; it's more like two different definitions of "unpleasantness".
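The two candidate definitions of an unpleasantness scale can be made concrete with a toy model. The sketch below assumes a hypothetical being whose per-day disvalue for each pain state is known (states and numbers are made up), that disvalue simply adds across days (which yields the duration-based scale), and that it ranks gambles by the expected value of some function g of the disvalue, with g(0) = 0 for "no pain" (which yields the gamble-based scale). The two scales coincide when g is linear and come apart otherwise; g is an illustrative assumption, not a claim about real preferences.

```python
from typing import Callable

# Hypothetical per-day disvalue of a few pain states (made-up numbers).
PER_DAY_DISVALUE = {"mild": 1.0, "moderate": 3.0, "severe": 10.0}

def duration_scale(state: str, ref: str = "mild") -> float:
    """Score such that 1 day at a state scored 2s is as bad as 2 days at a
    state scored s, assuming disvalue simply adds up across days."""
    return PER_DAY_DISVALUE[state] / PER_DAY_DISVALUE[ref]

def gamble_scale(state: str, ref: str = "mild",
                 g: Callable[[float], float] = lambda d: d) -> float:
    """Score such that a certain day at a state scored s is as bad as a 50%
    chance of a day at a state scored 2s (otherwise no pain), assuming the
    being ranks gambles by expected g(disvalue), with g(0) = 0."""
    return g(PER_DAY_DISVALUE[state]) / g(PER_DAY_DISVALUE[ref])

for state in PER_DAY_DISVALUE:
    print(state,
          duration_scale(state),
          gamble_scale(state),                           # linear g: matches duration scale
          round(gamble_scale(state, g=lambda d: d**1.5), 2))  # convex g: scales diverge
```

Under this toy model, the question of whether the two definitions agree becomes the question of whether the being's attitude to risk over pain is linear in the same quantity that aggregates across time.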
I'm confused about what "superlinearity" is even supposed to mean here.
In the intro you distinguish "unpleasantness" and "intensity", and say that one grows superlinearly with the other, but how are these two things even defined to begin with? And what is the difference between them? Defining one scale for measuring pain is hard enough, but before we can evaluate this "superlinear" claim we first need to define two!
In the examples with humans, I can see what the claim is. There are at least two ways you could try to define a pain scale: (i) self-report on a scale of 1-10, and (ii) something that more consistently tracks actual preferences with respect to gambles or experiences of different durations. In this example, the claim is that (ii) grows superlinearly with (i).
But this just seems like a claim about the limitations of the 1-10 self-report scale, which is only relevant for humans (I think I'm probably agreeing with the summary of Bob Fischer's take here).
In the case of non-humans, it's not that I disagree; I don't even understand what claim is being made.
If I understand right, the claim you're making here is that if I give £10 to a GiveWell charity, I cause Dustin Moskovitz to give £10 less to that GiveWell charity, and to do something else with it instead. What else does he do with it?
Donate it to a different global health charity: OK, that doesn't seem like too big a deal; my counterfactual impact is still to move money to a highly effective global health charity.
Spend it on himself: this seems unlikely..?
Donate it to a different cause area, e.g. AI safety: so while I think I have supported global health, my counterfactual impact is actually to move more money to AI safety.
The last two possibilities seem surprising and important if true, and I'd be interested to hear more justification for this! Is there some evidence that this is really what happens?
Why do you expect it to be worse environmentally to order online?
If the alternative is driving, it seems much less efficient to have 10 people independently drive to the shop and back than to have one van deliver all their food in a single round trip.
If the alternative is public transport, I guess it's less clear, but ordering online probably allows bigger shops in that case, which I'd guess would be more efficient again?
The only way I can see it clearly making things worse is if the alternative is walking to the shops. But even in that case, I'd guess that the environmental costs of the products themselves would be much more important than the environmental costs of their transport (just because this is a claim that seems to be made a lot, and I think it must factor in the transport costs of getting the products from the shop to your home as well!)
Maybe, although an election being tied is about the only way that particular example can be fuzzy, and there is a well-defined process for what happens in that situation (like flipping a coin). There is ultimately only one winner, and it is possible for a single vote to make the difference.
Whether an experience is painful or not is extremely unclear, but if your metric is just something like "number of animals killed for meat each year", then again that is something well-defined and precise, and it must in principle be possible to change it with an individual purchase.
Ironically I might also be guilty of using some technical terminology incorrectly here!
I had in mind the discussion of valuing actions with imperceptible effects in the "Five Mistakes in Moral Mathematics" chapter of Reasons+Persons (relevant to all the examples mentioned in the IVT section of this post), where, if I remember right, Parfit makes an explicit comparison with the "paradox of the heap" (I think this is where I first came across the term).
It feels the same in that in both cases we have a function from natural numbers (number of grains of sand in our potential heap, or number of people voting/buying meat) to some other set (boolean "heap" vs "not heap", or winner of election, or number of animals harmed). And the point is that, mathematically, this function must at some point change with the addition of a single +1 to the input, or it can never change at all. Moreover, the sum of the expected values of lots of potential additions must equal the expected value of all of them being applied together, so that if the collective has a large effect, the individual effects can't be smaller, on average, than the collective effect divided by the number of constituents.
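The expected-value point can be illustrated with the standard threshold model of purchases. Suppose, hypothetically, that a supplier only adjusts production in whole batches of B units, and that existing demand is equally likely to sit anywhere within a batch. Then one extra purchase changes production by B units with probability 1/B, so its expected effect is exactly one unit: the collective effect divided by the number of purchases. The batch sizes and the uniform-demand assumption are illustrative simplifications, and `production` is a hypothetical rule.

```python
from fractions import Fraction

def production(demand: int, batch: int) -> int:
    """Hypothetical supplier: produces the smallest whole number of batches covering demand."""
    return (demand + batch - 1) // batch * batch

def expected_marginal_effect(batch: int, periods: int = 10) -> Fraction:
    """Average change in production caused by one extra purchase, when existing
    demand is equally likely to be any value in 0 .. batch * periods - 1."""
    n = batch * periods
    total = sum(production(d + 1, batch) - production(d, batch) for d in range(n))
    return Fraction(total, n)

for batch in (1, 25, 1000):
    print(batch, expected_marginal_effect(batch))
```

Every batch size gives an expected effect of exactly 1: most single purchases change nothing, but the rare one that crosses a batch boundary changes production by a whole batch.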
I suppose the point is that this paradox is non-trivial and possibly unsolved when the output is fuzzy (like whether some grains of sand are a heap or not), but the corresponding claim is trivially true when the output is precise or quantitative (like who wins an election or how many animals are harmed)?
I like this post!
The pedantic mathematician in me, though, didn't like the concept in the "Intermediate Value Theorem" section being described by that name. I'm not sure that is actually the theorem being relied on here.
A couple of reasons:
The IVT only applies to continuous functions from the real numbers to the real numbers, whereas the example you're using it on is a mapping from a discrete set to a discrete set (or, at a stretch, from rationals to rationals). The IVT doesn't apply there (e.g. the function x ↦ x^2 on the rationals between x=0 and x=2 goes from 0 to 4 without ever passing through 2).
I don't think the IVT is "trivially easy". It's a theorem that the formal definition of continuity aligns with our intuition (which is not trivial), and really it is only true because the real numbers have been specifically constructed to make it true (it is not true over the rational numbers), and the construction of the real numbers is very weird and non-trivial!
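The rational-number counterexample can be illustrated (though not proved) with exact rational arithmetic. Bisecting [0, 2] with Python's `fractions.Fraction` keeps producing rationals whose squares straddle 2 ever more tightly, but never one whose square equals 2, because √2 is irrational. The number of bisection steps is arbitrary, and `bisect_towards_sqrt2` is a hypothetical helper.

```python
from fractions import Fraction

def bisect_towards_sqrt2(steps: int) -> tuple[Fraction, Fraction]:
    """Shrink a rational interval [lo, hi] with lo^2 < 2 < hi^2 by bisection."""
    lo, hi = Fraction(0), Fraction(2)
    for _ in range(steps):
        mid = (lo + hi) / 2
        square = mid * mid
        # Never triggers: no rational number squares to exactly 2.
        assert square != 2
        if square < 2:
            lo = mid
        else:
            hi = mid
    return lo, hi

lo, hi = bisect_towards_sqrt2(50)
print(float(lo), float(hi), float(hi - lo))
```

Over the reals, the limit of this process is a point where x^2 = 2; over the rationals there is no such point, which is exactly the gap the IVT relies on the real numbers to fill.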
The concept you're actually relying on in that section is a lot simpler, I think. It's essentially just the "paradox of the heap". As you say, if the value of N things is V, and the value of N+M things is W, then the changes from each successive addition to N must sum to (W-V). You don't need calculus for that. It's not the Intermediate Value Theorem; it really is just "trivially easy", I think. I'm not sure if that theorem has a name.
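The identity in question is usually called a telescoping sum: f(N+M) - f(N) = Σ_{i=0}^{M-1} [f(N+i+1) - f(N+i)], which holds for any function on the integers, continuous or not. A corollary is that if f(N+M) ≠ f(N), at least one single addition must change the value. A minimal Python check, using a hypothetical step function as the "value of k things":

```python
from typing import Callable

def single_step_changes(f: Callable[[int], int], n: int, m: int) -> list[int]:
    """Changes in f from adding one unit at a time, going from n up to n + m."""
    return [f(n + i + 1) - f(n + i) for i in range(m)]

def value(k: int) -> int:
    """Hypothetical value of k things: jumps by 100 at every multiple of 1000."""
    return 100 * (k // 1000)

N, M = 12_345, 5_000
steps = single_step_changes(value, N, M)
assert sum(steps) == value(N + M) - value(N)  # the telescoping identity
print(sum(1 for s in steps if s != 0), "single additions changed the value")
```

Only 5 of the 5,000 single additions change anything, but together they account for the whole difference of 500.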
There is a big difference between veganism and most(?) other boycott campaigns. Every time you purchase an animal product, you are causing significant direct harm (in expectation, if you accept the vegan argument). This is because if demand for an animal product increases by one unit, we should expect some fraction of a unit more of that product to be produced to meet that demand, on average (the particular fraction depending on price elasticity, since you also raise prices a bit, which puts other consumers off).
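The "some fraction" can be made more precise with a standard partial-equilibrium approximation: if demand shifts out by one unit, equilibrium quantity rises by roughly ε_s / (ε_s - ε_d) units, where ε_s > 0 is the price elasticity of supply and ε_d < 0 is the price elasticity of demand, both evaluated near the current equilibrium. The elasticity values in this sketch are hypothetical, chosen only to show how the formula behaves.

```python
def extra_units_produced(supply_elasticity: float, demand_elasticity: float) -> float:
    """Approximate change in equilibrium quantity per unit of extra demand
    (local linear approximation; supply_elasticity > 0, demand_elasticity < 0)."""
    if supply_elasticity <= 0 or demand_elasticity >= 0:
        raise ValueError("expected supply_elasticity > 0 and demand_elasticity < 0")
    return supply_elasticity / (supply_elasticity - demand_elasticity)

# Hypothetical elasticities, for illustration only.
print(extra_units_produced(1.0, -1.0))  # 0.5: half a unit of extra production
print(extra_units_produced(3.0, -0.5))  # elastic supply, inelastic demand: closer to 1
```

The remainder of the unit is accounted for by the price rise discouraging other consumers, which is the effect described above.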
A lot of other boycott campaigns aren't like this. Take, for example, the boycott of products which have been tested on animals. Here you don't do direct harm with each purchase in the same way (or at least if you do, it is probably orders of magnitude less). Instead, the motivation is that if enough people start acting like this, it will lead to policy change.
In the first case, it doesn't matter if no one else in the world agrees with you: participating in the boycott can still do significant good. In the second case, a large number of people are required for it to have a meaningful impact. It makes sense that impact-minded EAs are more inclined to support a boycott of the first kind.
I think a lot of your examples probably fall under the second kind (though not all), and I think that's a big part of the answer to your question. Also, for at least some of the ones of the first kind, I think most EAs probably just disagree with the fundamental argument. For example, the environmental impact of using LLMs isn't actually that bad: https://andymasley.substack.com/p/a-cheat-sheet-for-conversations-about.
I may be misunderstanding something in the argument, but isn't this potential solution already discussed in detail in Reasons+Persons, where Parfit introduces the repugnant conclusion? He calls it the "lexical value" solution, I think? And I don't think this write-up addresses any of the strong arguments that Parfit makes against it there.
In Parfit's original argument, he doesn't just rely on an appeal to the intuition that B is better than A. Instead, we move from A to B in two steps. First we go to A+ (more people with lives just worth living, but the original population unaffected). Parfit claims that A+ is not worse than A. Then we move to world B, where we significantly increase the welfare of the already existing low-welfare people, with only a small drop in the welfare of the original population. Parfit claims that B is better than A+.
If we reject that A+ is not worse than A, then we have to say it is sometimes wrong to create new people, even if their lives are worth living. This seems very strange. If we reject that B is better than A+, then we have to believe that a small drop in welfare for people whose welfare is already very high can outweigh a large increase in welfare for people whose welfare is very low. This is the bullet that I think the lexical value solution bites?
But this is a tough bullet to bite! It is the opposite of egalitarian. It says we should prioritise changes in the welfare of already existing high-welfare people (whose welfare is above the lexical level) over changes in the welfare of already existing low-welfare people (whose welfare is below the lexical level).
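To make the A → A+ → B steps concrete, the sketch below encodes each world as groups of (population, welfare per person) with made-up numbers, and compares worlds by total welfare and by a simplified lexical rule: first compare the total welfare above a hypothetical lexical threshold, and use total welfare only to break ties. That rule is my own simplified reading of the lexical view as described in this comment, not Parfit's formulation; it is only there to show which comparison the lexical view reverses.

```python
World = list[tuple[int, float]]  # (number of people, welfare level per person)

def total_welfare(world: World) -> float:
    return sum(n * w for n, w in world)

def lexical_key(world: World, threshold: float) -> tuple[float, float]:
    """Hypothetical lexical ranking: welfare above the threshold counts first;
    total welfare only breaks ties. Larger keys are better worlds."""
    above = sum(n * max(0.0, w - threshold) for n, w in world)
    return (above, total_welfare(world))

THRESHOLD = 50.0  # made-up lexical level
A = [(1000, 100.0)]
A_PLUS = [(1000, 100.0), (1000, 10.0)]  # extra lives worth living, A unaffected
B = [(1000, 99.0), (1000, 40.0)]        # small drop for A, big gain for the rest

print(total_welfare(B) > total_welfare(A_PLUS))                    # True
print(lexical_key(B, THRESHOLD) > lexical_key(A_PLUS, THRESHOLD))  # False
```

The lexical rule accepts that A+ is not worse than A, but ranks A+ above B: the one-point drop for the high-welfare group outweighs the thirty-point gain for the low-welfare group, which is the anti-egalitarian bullet described above.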