Pragmatic Bayesian diversification of giving
It is a well-known argument that diversification of charitable giving is hard to justify philosophically [1, 2, 3]. GiveWell, probably the gold standard in terms of impact maximisation, mostly suggests 4 top charities [4]. Arguably, this is either a fairly big number (bigger than 1) or a fairly small number (not at all diversified, if diversification matters). Obviously, there are practical reasons for this number, related to available opportunities, real-world constraints and operational considerations. However, it prompted me to think about the theoretical foundations of what I actually think I should do, which I would like to write up here.
In this post, I am firstly aiming to make the non-diversification argument somewhat rigorous without obscuring it too much with math. Secondly, I would like to explain, in the spirit of Bayesian model averaging, my current thinking on why I still like diversification somewhat more than these results suggest.
The latter is somewhat ad-hoc. One aim is to get pushback and further directions on this. I suspect my thinking will be related to some commonly held views and to philosophical arguments around worldview diversification (?). Perhaps there is a simple refutation I am not aware of.
Setup and the “fully concentrated” argument
Assume I have one unit of resources/wealth that I want to give/donate effectively. Let there be giving opportunities $1$ to $N$, and let $w_i \geq 0$ be my allocation to the $i$-th, where $\sum_i w_i = 1$.
The marginal expected utility/goodness/… of me allocating $w_i$ to $i$ will be $u_i w_i$. The total expected utility will be $E[u \cdot w] = E\left[\sum_i u_i w_i\right] = E[u] \cdot w$.
(Note that it is very plausible that this is linear, because my donations are relatively small and donation opportunities have a lot of capacity – so even if you don’t believe in linear utility you can just Taylor expand.)
To maximise the impact of my donation, I need to solve $w^* = \arg\max_w E[u] \cdot w$. Clearly, the solution is to allocate everything to the largest entry of the vector $E[u]$; call its index $k$. So $w^*_k = 1$ and all others are $0$.
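To make the corner-solution point concrete, here is a minimal numerical sketch (the expected-utility numbers are made up purely for illustration): maximising the linear objective $E[u] \cdot w$ over the simplex always lands on a corner, i.e. everything goes to the single opportunity with the largest expected utility.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical expected utilities E[u] for N = 4 giving opportunities,
# e.g. expected lives saved per unit of money (illustrative numbers only).
expected_u = np.array([3.0, 3.1, 2.9, 1.5])

# Maximise E[u] . w  subject to  sum(w) = 1, w >= 0.
# linprog minimises, so the objective is negated.
result = linprog(
    c=-expected_u,
    A_eq=np.ones((1, len(expected_u))),
    b_eq=[1.0],
    bounds=[(0.0, 1.0)] * len(expected_u),
)

print(result.x)  # a corner of the simplex: all weight on the largest entry of E[u]
```

Perturbing `expected_u` so that a different entry becomes the largest flips the entire allocation to that entry, which is the non-robustness discussed below.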
So, I should find the most worthwhile cause and allocate solely to it. This is somewhat counterintuitive in some cases – what if there are two almost indistinguishable causes and I’m very uncertain which one is better? Should I not just diversify across both, if practical?
That is, this method seems very non-robust in the sense that as soon as my belief about the ranking of opportunities updates slightly, I will fully switch from $k$ to another opportunity going forward. At the core of this is that “ranking” is an extremely non-linear transformation of very “noisy” and continuous variables like lives saved per dollar.
Why giving is different from portfolio optimisation
In portfolio optimisation, eg in finance, I would typically solve something like $\arg\max_w \big( E[u] \cdot w - \tfrac{\lambda}{2}\, w' E[uu'] w + \text{h.o.} \big)$,
where the second and higher order terms capture interactions and non-linearities related to risk aversion. This is not an irrational bias in investing, cf “volatility drag”.
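For contrast, here is a minimal sketch of how the risk-aversion term changes the answer, using two hypothetical opportunities with made-up numbers: with $\lambda = 0$ the optimum is again a corner, while with $\lambda > 0$ it becomes interior, i.e. diversified.

```python
import numpy as np

# Two hypothetical opportunities: expected utilities and a second-moment
# matrix E[uu'] (illustrative numbers, not calibrated to anything real).
mu = np.array([0.10, 0.09])
M = np.array([[0.05, 0.00],
              [0.00, 0.02]])

def best_allocation(lam, grid=np.linspace(0.0, 1.0, 1001)):
    """Grid-search the mean-variance objective over allocations w = (w1, 1 - w1)."""
    best_w, best_val = None, -np.inf
    for w1 in grid:
        w = np.array([w1, 1.0 - w1])
        val = mu @ w - 0.5 * lam * w @ M @ w
        if val > best_val:
            best_w, best_val = w, val
    return best_w

print(best_allocation(lam=0.0))  # corner solution: everything in the higher-mean opportunity
print(best_allocation(lam=5.0))  # interior solution: a diversified mix
```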
However, risk aversion is not justifiable in charitable giving – it would only serve to make me feel better and reduce the impact [3].
Interactions are also mostly relatively implausible because giving opportunities are very different, eg geographically separated.
If I deviate from allocating all to the best opportunity, I have to have some other reason.
Pulling some parameters out of the optimisation problem
The trick
I don’t know the utility of different opportunities and their probability distributions for sure, so let’s say there is a (hyper-)parameter $\theta \in \Omega$ that captures different models of the world, including my model of the uncertainty around my allocations’ impact.
The aim is for this to reflect epistemic humility in some way and to get rid of the hard cut-off that ranking creates.
For example, this could capture some of the model uncertainty around the plausible distributions of lives saved per dollar for each charity, but it could also capture moral uncertainty about which opportunity has which utility (or equivalently “choiceworthiness”).
This means that I’ve now split the expectation into $E[u] = \int_\Omega E_\theta[u] \, dP(\theta)$.
(This is just the law of iterated expectations.)
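For a concrete discrete version with made-up numbers: if there are finitely many candidate models $\theta$ with probabilities $P(\theta)$, each with its own per-model expectation $E_\theta[u]$, then $E[u]$ is just their probability-weighted average.

```python
import numpy as np

# Three hypothetical models of the world with prior probabilities P(theta),
# and per-model expected utilities E_theta[u] for N = 3 opportunities
# (all numbers made up for illustration).
p_theta = np.array([0.5, 0.3, 0.2])
E_theta_u = np.array([
    [3.0, 2.9, 1.0],  # model 1
    [2.8, 3.2, 1.1],  # model 2
    [1.0, 1.2, 4.0],  # model 3
])

# Law of iterated expectations: E[u] = sum over theta of P(theta) * E_theta[u]
E_u = p_theta @ E_theta_u
print(E_u)  # overall expected utility per opportunity
```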
If I just solve the maximisation problem above, the result will be the same.
However, if I instead want to get a “model probability weighted best allocation”, I need to first solve the problem for each value of θ and then average.
So, I compute $w^* = \int_\Omega \arg\max_w \big( E_\theta[u] \cdot w \big) \, dP(\theta)$.
This has the solution $w^*_i = p_i$, where $p_i$ is the probability, under $P(\theta)$, that opportunity $i$ is the best one. In other words, if I were randomly placed in a world according to $P(\theta)$, optimised, and reported back the result, this would be the average.
Conveniently, allocations combine linearly (an average of valid allocations is still a valid allocation), so I can just take the probability-weighted average of the per-model optimal allocations instead of the single optimum across all models of the world.
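Here is a minimal sketch of this averaging step, reusing the same kind of made-up discrete models: solve the (trivial) per-model problem first, then average the resulting corner allocations with weights $P(\theta)$. Each opportunity ends up with exactly its probability of being the best one, whereas optimising against the averaged model would go all-in on a single opportunity.

```python
import numpy as np

# Hypothetical discrete models of the world: prior probabilities P(theta)
# and per-model expected utilities E_theta[u] (made-up numbers).
p_theta = np.array([0.5, 0.3, 0.2])
E_theta_u = np.array([
    [3.0, 2.9, 1.0],  # model 1: opportunity 0 is best
    [2.8, 3.2, 1.1],  # model 2: opportunity 1 is best
    [1.0, 1.2, 4.0],  # model 3: opportunity 2 is best
])

# Per-model optimum: a corner allocation putting everything on that model's best opportunity.
n_models, n_opps = E_theta_u.shape
per_model_w = np.zeros((n_models, n_opps))
per_model_w[np.arange(n_models), E_theta_u.argmax(axis=1)] = 1.0

# Average the per-model optima with weights P(theta).
w_star = p_theta @ per_model_w
print(w_star)  # [0.5 0.3 0.2]: each opportunity gets P(it is the best)

# Contrast: optimising the averaged model instead goes all-in on one opportunity.
print((p_theta @ E_theta_u).argmax())  # index of the single opportunity chosen
```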
What does this mean?
In this setting, I should allocate to each cause in proportion to the probability that it is the most worthwhile cause in expectation, i.e. the total probability of the models of the world in which it comes out on top.
This has some nice properties. Allocating proportionally to this kind of probability means that I still very strongly favour the best options in the typical sense. Opportunities that are not the best in any possible model of the world will get zero allocation. At the same time, charities that could plausibly be the best will get some allocation.
For example, if I pull some of the uncertainty around which of GiveWell’s top charities are slightly more or less effective out of the optimisation problem, I will allocate close to equally across them. If GiveWell introduced another recommended top charity that I think is only 5% likely to be the best among plausible world views, it seems reasonable to allocate 5% to it.
Similarly, if there are two very contradictory and extreme world views that I give mildly different credences to, with this approach my allocation might be split close to evenly between them instead of going “all in” on one of them.
Is this justified?
I don’t know, but I think it is somewhat distinct from risk aversion-type arguments, and also different from a minimax-type argument. I also think this is similar to what many people do in practice.
Admittedly, it is fairly arbitrary which parameters I “pull outside” of the optimisation problem and hard to do this rigorously. However, it seems intuitively reasonable to average what I do over some possible models of the world and not maximise my impact in the average model of the world.
Another concern is that it could be some kind of “risk aversion introduced through the backdoor”.
In the thought experiment in [3] (originally from [5]), for $x < 10$ this method would allocate 50:50 between options A and B, with a total expected “goodness” of $15 + x/2$, which is worse than $15 + x$ but better than $15$. For $x > 10$ it flips to always preferring B, which also seems reasonable.
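To spell out the arithmetic behind these numbers (on my reading of the example as quoted here, where full allocation to the better option is worth $15 + x$ in expectation and full allocation to the other is worth $15$): the 50:50 split gives $\tfrac{1}{2}(15 + x) + \tfrac{1}{2} \cdot 15 = 15 + x/2$, which sits strictly between $15$ and $15 + x$ for any $x > 0$.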
References
[1] https://forum.effectivealtruism.org/posts/wHaj9zpoeraQqBG9H/statistical-foundations-for-worldview-diversification
[2] https://slate.com/culture/1997/01/giving-your-all.html
[3] https://www.lesswrong.com/posts/h7GSWD2eLXygdtuhZ/against-diversification
[4] https://www.givewell.org/charities/top-charities
[5] https://globalprioritiesinstitute.org/on-the-desire-to-make-a-difference-hilary-greaves-william-macaskill-andreas-mogensen-and-teruji-thomas-global-priorities-institute-university-of-oxford/