This is very helpful, thanks!
If you're correct in the linked analysis, this sounds like a really important limitation in ACE's methodology, and I'm very glad you've shared this!
In case anyone else has the same confusion as me when reading your summary: I think there is nothing wrong with calculating a charity's cost-effectiveness by taking the weighted sum of the cost-effectiveness of all of their interventions (weighted by share of total funding that intervention receives). This should mathematically be the same as (Total Impact / Total Cost), and so should indeed go up if their spending on a particular intervention goes down (while achieving the same impact).
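To spell out the arithmetic (my notation, not ACE's): writing c_i for the spending on intervention i, I_i for its impact, and C = Σ c_i for total spending, the weighted sum is

$$\sum_i \frac{c_i}{C} \cdot \frac{I_i}{c_i} \;=\; \frac{\sum_i I_i}{C} \;=\; \frac{\text{Total Impact}}{\text{Total Cost}},$$

so if some c_j falls while I_j is held fixed, C falls and the whole expression goes up, exactly as described above.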
The (claimed) cause of the problem is just that ACE's cost-effectiveness estimate does not go up by anywhere near as much as it should when the cost of an intervention is reduced, leading the cost-effectiveness of the charity as a whole to actually change in the wrong direction when doing the above weighted sum!
If this is true it sounds pretty bad. Would be interested to read a response from them.
Of course, the other thing that could be going on here is that average cost-effectiveness is not the same as cost-effectiveness on the margin, which is presumably what ACE should care about. Though I don't see why an intervention representing a smaller share of a charity's expenditure should automatically mean that this is not where extra dollars would be allocated. The two things seem independent to me.
I would be very interested to read a summary of what Tyler Cowen means by all this!
I know it was left as an exercise for the reader, but if someone wants to do the work for me it would be appreciated :)
Makes sense, thank you for the reply and clarification!
This is a fascinating summary!
I have a bit of a nitpicky question on the use of the phrase "confidence intervals" throughout the report. Are these really supposed to be interpreted as confidence intervals? Rather than the Bayesian alternative, "credible intervals"?
My understanding was that the phrase "confidence interval" has a very particular and subtle definition, coming from frequentist statistics:
80% Confidence Interval: For any possible value of the unknown parameter, there is an 80% chance that your data-collection and estimation process would produce an interval which contained that value.
80% Credible interval: Given the data you actually have, there is an 80% chance that the unknown parameter is contained in the interval.
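To make the frequentist definition above concrete, here is a minimal simulation sketch (entirely my own toy example with made-up numbers, nothing to do with the report's actual estimation procedure) of what the "coverage" property means:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n, trials = 5.0, 2.0, 50, 10_000
z80 = 1.2816  # 90th percentile of the standard normal, giving a two-sided 80% interval

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sigma, n)
    half_width = z80 * sigma / np.sqrt(n)
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mean <= hi)

# The long-run coverage comes out near 0.80: the "80%" is a property of the
# interval-producing procedure over repeated datasets (confidence interval).
# An 80% credible interval instead conditions on the one dataset you actually
# have and assigns 80% posterior probability to the parameter lying inside it.
print(covered / trials)
```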
From my reading of the estimation procedure, it sounds a lot more like these CIs are supposed to be interpreted as the latter rather than the former? Or is that wrong?
Appreciate this is a bit of a pedantic question, that the same terms can have different definitions in different fields, and that discussions about the definitions of terms aren't the most interesting discussions to have anyway. But the term jumped out at me when reading and so I thought I would ask the question!
This is a really interesting post, and I appreciate how clearly it is laid out. Thank you for sharing it! But I'm not sure I agree with it, particularly the way that everything is pinned to the imminent arrival of AGI.
Firstly, the two assumptions you spell out in your introduction, that AGI is likely only a few years away, and that it will most likely come from scaled up and refined versions of modern LLMs, are both much more controversial than you suggest (I think)! (Although I'm not confident they are false either.)
But even if we accept those assumptions, the third big assumption here is that we can alter a superintelligent AGI's values in a predictable and straightforward way by just adding in some synthetic training data which expresses the views we like, when building some of its component LLMs. This seems like a strange idea to me!
If we removed some concept from the training data completely, or introduced a new concept that had never appeared otherwise, then I can imagine that having some impact on the AGI's behaviour. But if all kinds of content are included in significant quantities anyway, then I find it hard to get my head around the inclusion of additional carefully chosen synthetic data having this kind of effect. I guess it clashes with my understanding of what a superintelligent AGI means, to think that its behaviour could be altered with such simple manipulation.
I think an important aspect of this is that even if AGI does come from scaling up and refining LLMs, it is not going to be just an LLM in the straightforward sense of that term (i.e. something that communicates by generating each word with a single forward pass through a neural network). At the very least it must also have some sort of hidden internal monologue where it does chain-of-thought reasoning, stores memories, etc.
But I don't know much about AI alignment, so would be very interested to read and understand more about the reasoning behind this third assumption.
All that said, even ignoring AGI, LLMs are likely going to be used more and more in people's everyday lives over the next few years, so training them to express kinder views towards animals seems like a potentially worthwhile goal anyway. I don't think AGI needs to come into it!
I agree that we can imagine a similar scenario where your identity is changed to a much lesser degree. But I'm still not convinced that we can straightforwardly apply the Platinum rule to such a scenario.
If your subjective wellbeing is increased after taking the pill, then one of the preferences that must be changed is your preference not to take the pill. This means that when we try to apply the Platinum rule: "treat others as they would have us treat them", we are naturally led to ask: "as they would have us treat them when?" If their preference to have taken the pill after taking it is stronger than their preference not to take the pill before taking it, the Platinum rule becomes less straightforward.
I can imagine two ways of clarifying the rule here, to explain why forcing someone to take the pill would be wrong, which you already allude to in your post:
We should treat others as they would have us treat them at the time we are making the decision. But this would imply that if someone's preferences are about to naturally, predictably, change for the rest of their life, then we should disregard that when trying to decide what is best for them, and only consider what they want right now. This seems much more controversial than the original statement of the rule.
We should treat others as they would have us treat them, considering the preferences they would have over their lifetime if we did not act. But this would imply that if someone was about to eat the pill by accident, thinking it was just a sweet, and we knew it was against their current wishes, then we should not try to stop them or warn them. This would create a very odd action/inaction distinction. Again, this seems much more controversial than the original statement of the rule.
In the post you say the Platinum rule might be the most important thing for a moral theory to get right, and I think I agree with you on this. It is something that seems so natural and obvious that I want to take it as a kind of axiom. But neither of these two extensions to it feel this obvious any more. They both seem very controversial.
I think the rule only properly makes sense when applied to a person-moment, rather than to a whole person throughout their life. If this is true, then I think my original objection still applies. We aren't dealing with a situation where we can apply the Platinum rule in isolation. Instead, we have just another utilitarian trade-off between the welfare of one (set of) person(-moments) and another.
This was a really thought-provoking read, thank you!
I think I agree with Richard Chappell's comment that: "the more you manipulate my values, the less the future person is me".
In this particular case, if I take the pill, my preferences, dispositions, and attitudes are being completely transformed in an instant. These are a huge part of what makes me who I am, so I think that after taking this pill I would become a completely different person, in a very literal sense. It would be a new person who had access to all of my memories, but it would not be me.
From this point of view, there is no essential difference between this thought experiment, and the common objection to total utilitarianism where you consider killing one person and replacing them with someone new, so that total well-being is increased.
This is still a troubling thought experiment of course, but I think it does weaken the strength of your appeal to the Platinum rule? We are no longer talking about treating a person differently to how they would want to be treated, in isolation. We just have another utilitarian thought experiment where we are considering harming person X in order to benefit a different person Y.
And I think my response to both thought experiments is the same. Killing a person who does not want to be killed, or changing the preferences of someone who does not want them changed, does a huge amount of harm (at least on a preference-satisfaction version of utilitarianism), so the assumption in these thought experiments that overall preference satisfaction is nevertheless increased is doing a lot of work, more work than it might appear at first.
I really like this thought experiment, thank you for sharing!
Personally, I agree with you, and I think the answer to your headline question is: yes! Your reasoning makes sense to me anyway. (At least if we don't combine the Self-Sampling Assumption with another assumption like the Self-Indication Assumption as well.)
I think that your example is essentially equivalent to the Doomsday argument, or the Adam+Eve paradox, see here: https://anthropic-principle.com/preprints/cau/paradoxes But I like that your thought experiment really isolates the key problem and puts precise numbers on it!
I haven't digested the full paper yet, but based on the summary pasted below, this is precisely the claim I was trying to argue for in the "Against Anthropic Shadow" post of mine that you have linked.
It looks like this claim has been fleshed out in a lot more detail here though, and I'm looking forward to reading it properly!
In the post you linked I also went on quite a long digression trying to figure out if it was possible to rescue Anthropic Shadow by appealing to the fact that there might be large numbers of other worlds containing life (this plausibly weakens the strength of evidence provided by A, which may then stop the cancellation in C). I decided it technically was possible, but only if you take a strange approach to anthropic reasoning, with a strange and difficult-to-define observer reference class.
Possibly focusing so much on this digression was a mistake though, since the summary above is really pointing to the important flaw in the original argument!
This is a fantastic answer, thank you!
I think (2) is the relevant one here. Maybe in the not too distant future there will be a massive shift in global public opinion, and the farming of animals (at least at industrial scale) will become a thing of the past. If you think most farmed animals lead lives so bad that they would be better off not being born, then the impact of this change would be huge. (And if you're a non-consequentialist vegan who doesn't like to view the issue in these terms, then it's harder to quantify the impact, but you probably care even more about doing everything possible to make this scenario happen.)
I think this is what is hoped for by the vegans who prioritise outreach. The idea would be that outreach either increases the probability of this scenario becoming reality, or it means that this scenario happens sooner than it otherwise would. I think this is a conceivable way that vegan outreach could have the kind of huge, hard-to-measure benefit you're talking about.
Of course there's a whole argument to be had here. I'm sure lots of people would find this scenario so implausible as to not be worth considering (or they would think it will only happen if and when we get good cheap lab-grown meat, or that we can't do anything to influence if and when it happens... etc).
I wasn't really trying to start that argument with this question, but just asking what someone who wants to give some weight to this argument in their donations should do.
Sure, but once you've assumed that already, you don't need to rely any more on an argument about shifts to P(X_1 > x) being cancelled out by shifts to P(X_n > x) for larger n (which, if I understand correctly, is the argument you're making about existential risk).
If P(X_N > x) is very small to begin with for some large N, then it will stay small, even if we adjust P(X_1 > x) by a lot (we can't make it bigger than 1!). So we can safely say, under your assumption, that adjusting the P(X_1 > x) factor by a large amount does influence P(X_N > x) as well; it's just that it can't make it not small.
The existential risk set-up is fundamentally different. We are assuming the future has astronomical value to begin with, before we intervene. That now means non-tiny changes to P(making it through the next year) must have astronomical effects on expected value too (unless there is some weird conspiracy among the probabilities of making it through later years which precisely cancels this out, but that seems very weird, and not something you can justify by pointing to global health as an analogy).
I don't see why the same argument holds for global health interventions...?
Why should X_N > x require X_1 > x...?
Thanks a lot for this answer! That sounds very plausible.
I think a lot depends here on whether:
i) We think there may well be a meaningful effect for vegan education initiatives but we can't measure it in a controlled experiment, or
ii) We think there is no meaningful effect for currently popular vegan education initiatives.
(By "meaningful", I basically mean an effect big enough that I might consider donating, which is admittedly a bit vague)
I think CC makes a good point. Whichever of these possibilities is true, it feels like there is still scope for someone interested in vegan outreach to do something useful with their donations. If (i), then we could fund research into alternative non-experimental ways of comparing existing vegan outreach interventions (EAs are often happy funding things on the basis of weaker evidence than RCTs). If (ii), then we could fund research to investigate alternative kinds of interventions that haven't been considered yet (or has everything been considered?) If unsure between (i) and (ii), we can do both!
Maybe there is already research on these questions that we could use as well. I've been doing some more digging and found this survey of vegans, linked to from Faunalytics: https://vomad.life/survey/#about-your-veganism This seems like a decent non-experimental way of finding out which factors might influence someone to go vegan.
On the basis of this survey, maybe some effective vegan outreach interventions would be:
Funding advertising campaigns for Veganuary
Funding the production and/or marketing of vegan documentaries
Funding the production and/or marketing of online videos with a vegan message
"Correct me if I am wrong, but I think you are suggesting something like the following. If there is a 99 % chance we are in future 100 (U_100 = 10^100), and a 1 % (= 1 - 0.99) chance we are in future 0 (U_0 = 0), i.e. if it is very likely we are in an astronomically valuable world[1], we can astronomically increase the expected value of the future by decreasing the chance of future 0. I do not agree. Even if the chance of future 0 is decreased by 100 %, I would say all its probability mass (1 pp) would be moved to nearby worlds whose value is not astronomical. For example, the expected value of the future would only increase by 0.09 (= 0.01*9) if all the probability mass was moved to future 1 (U_1 = 9)."
The claim you quoted here was a lot simpler than this.
I was just pointing out that if we take an action to increase near-term extinction risk to 100% (i.e. we deliberately go extinct), then we reduce the expected value of the future to zero. That's an undeniable way that a change to near-term extinction risk can have an astronomical effect on the expected value of the future, provided only that the future has astronomical expected value before we make the intervention.
"It is not that I expect us to get worse at mitigation."
But this is more or less a consequence of your claims, isn't it?
"The cost of moving physical mass increases with distance, and I guess the cost of moving probability mass increases (maybe exponentially) with value-distance (difference between the value of the worlds)."
I don't see any basis for this assumption. For example, it is contradicted by my example above, where we deliberately go extinct, and therefore move all of the probability weight from U_100 to U_0, despite their huge value difference.
Or I suppose maybe I do agree with your assumption (as I can't think of any counter-examples I would actually endorse in practice); I just disagree with how you're explaining its consequences. I would say it means the future does not have astronomical expected value, not that it does have astronomical value but that we can't influence it (since it seems clear we can if it does).
(If I remember our exchange on the Toby Ord post correctly, I think you made some claim along the lines of: there are no conceivable interventions which would allow us to increase extinction risk to ~100%. This seems like an unlikely claim to me, but it's also I think a different argument to the one you're making in this post anyway.)
Here's another way of explaining it. In this case the probability p_100 of U_100 is given by the huge product:
P(making it through next year) × P(making it through the year after, given we make it through year 1) × ... etc.
Changing near-term extinction risk is influencing the first factor in this product, so it would be weird if it didn't change p_100 as well. The same logic doesn't apply to the global health interventions that you're citing as an analogy, and makes existential risk special.
In fact I would say it is your claim (that the later factors get modified too in just such a special way as to cancel out the drop in the first factor) which involves near-term interventions having implausible effects on the future that we shouldn't a priori expect them to have.
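Spelled out (my notation, with the simplifying assumption that realising U_100 just requires making it through T successive years): writing s_t for the probability of making it through year t given survival up to that point,

$$p_{100} = \prod_{t=1}^{T} s_t, \qquad \frac{\partial p_{100}}{\partial s_1} = \prod_{t=2}^{T} s_t = \frac{p_{100}}{s_1},$$

so a given proportional change in s_1 produces the same proportional change in p_100, unless the later factors shift in just the right way to compensate.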
Thanks for the explanation, I have a clearer understanding of what you are arguing for now! Sorry I didn't appreciate this properly when reading the post.
So you're claiming that if we intervene to reduce the probability of extinction in 2025, then that increases the probability of extinction in 2026, 2027, etc., even after conditioning on not going extinct earlier? The increase is such that the chance of reaching the far future is unchanged?
My next question is: why should we expect something like that to be true???
It seems very unlikely to me that reducing near term extinction risk in 2025 then increases P(extinction in 2026 | not going extinct in 2025). If anything, my prior expectation is that the opposite would be true. If we get better at mitigating existential risks in 2025, why would we expect that to make us worse at mitigating them in 2026?
If I understand right, you're basing this on a claim that we should expect the impact of any intervention to decay exponentially as we go further and further into the future, and you're then looking at what has to happen in order to make this true. I can sympathise with the intuition here. But I don't agree with how it's being applied.
I think the correct way of applying this intuition is to say that it's these quantities which will only be changed negligibly in the far future by interventions we take today:
P(going extinct in far future year X | we reach far future year X) (1)
E(utility in far future year X | we reach year X) (2)
In a world where the future has astronomical value, we obviously can astronomically change the expected value of the future by adjusting near-term extinction risk. To take an extreme example: if we make near-term extinction risk 100%, then expected future value becomes zero, however far into the future X is.
I think asserting that (1) and (2) are unchanged is the correct way of capturing the idea that the effect of interventions tends to wash out over time. That then leads to the conclusion from my original comment.
I think your life-expectancy example is helpful. But I think the conclusion is the opposite of what you're claiming. If I play Russian Roulette and take an instantaneous risk of death, p, and my current life expectancy is L, then my life expectancy will decrease by pL. This is certainly non-negligible for non-negligible p, even though the time I take the risk over is minuscule in comparison to the duration of my life.
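To put toy numbers on this (mine, not from the original example): one trigger-pull with p = 1/6 and a remaining life expectancy of L = 60 years gives

$$p \cdot 0 + (1 - p) \cdot L = L - pL = 60 - 10 = 50 \text{ years},$$

i.e. a 10-year drop in life expectancy from a moment's risk.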
Of course I have changed your example here. You were talking about reducing the risk of death in a minuscule time period, rather than increasing it. It's true that that doesn't meaningfully change your life expectancy, but that's not because the duration of time is small in relation to your life, it's because the risk of death in such a minuscule time period is already minuscule!
If we translate this back to existential risk, it does become a good argument against the astronomical cost-effectiveness claim, but it's now a different argument. It's not that near-term extinction isn't important for someone who thinks the future has astronomical value. It's that: if you believe the future has astronomical value, then you are committed to believing that the extinction risk in most centuries is astronomically low, in which case interventions to reduce it stop looking so attractive. The only way to rescue the "astronomical cost-effectiveness claim" is to argue for something like the "time of perils" hypothesis. Essentially that we are doing the equivalent of playing Russian Roulette right now, but that we will stop doing so soon, if we survive.
I think I agree with the title, but not with the argument you've made here.
If you believe that the future currently has astronomical expected value, then a non-tiny reduction in nearterm extinction risk must have astronomical expected value too.
Call the expected value conditional on us making it through the next year, U.
If you believe the future has astronomical expected value, then U must be astronomically large.
Call the chance of extinction in the next year, p.
The expected value contained >1 year in the future is:
p * 0 + (1-p) * U
So if you reduce p by an amount dp, the change in expected value >1 year in the future is:
U * dp
So if U is astronomically big, and dp is not astronomically small, then the expected value of reducing nearterm extinction risk must be astronomically big as well.
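With purely illustrative numbers (mine, not from the post): if U = 10^40 in whatever value units you like, and an intervention achieves dp = 10^-6, then

$$U \cdot dp = 10^{40} \times 10^{-6} = 10^{34},$$

which is still astronomical; dp would itself have to be astronomically small for the product not to be.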
The impact of nearterm extinction risk on future expected value doesn't need to be "directly guessed", as I think you are suggesting? The relationship can be worked out precisely. It is given by the above formula (as long as you have an estimate of U to begin with).
This is similar to the conversation we had on the Toby Ord post, but I have only just properly read your post here (it's been on my to-read list for a while!).
I actually agree that reducing near-term extinction risk is probably not astronomically cost-effective. But I would say this is because U is not astronomically big in the first place, which seems different to the argument you are making here.
A question jumped out at me when reading these results. I should caveat this by emphasizing that I am very much not an expert in this kind of evaluation and this question may be naive.
Is there any seasonal effect on mortality in Malawi? If so, is it OK for the pre-intervention period to be 12 months while the post-intervention period is 18 months?