For a model like this to be effective, we need to choose a good prior belief.
Outside the model, your inchoate ‘prior’ has to include credence in all the models you could be convinced of by evidence.
A model that fixes the distribution of effectiveness by assumption is unable to accommodate evidence that the distribution is otherwise.
For example, evidence that the Earth is old and will be habitable for hundreds of millions of years is evidence that many kinds of impacts on the world may have big long-run effects. Likewise for astronomical evidence of the resources of the Solar System and other stars.
When you take that into account it increases the expected impact of almost every action, e.g. increasing GDP by $1 has astronomical waste implications. It also has implications for the relative effects of different interventions.
To represent this you need to have a model with uncertainty over the shape of the distribution, e.g. a mixture model of multiple candidate distributions whose weights are updated with evidence.
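A sketch of what such a model looks like (notation mine, not from the thread): with candidate distributions $M_1, \ldots, M_k$ carrying weights $w_i$, the prior density over effectiveness $x$ and the weight update on evidence $D$ are

$$p(x) = \sum_i w_i \, p(x \mid M_i), \qquad w_i \leftarrow \frac{w_i \, p(D \mid M_i)}{\sum_j w_j \, p(D \mid M_j)},$$

so evidence favouring a heavy-tailed shape moves weight onto that shape, rather than being forced through a single fixed distribution.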
different priors for different interventions... It’s possible to have a larger effect when helping a group that people with power care less about
Likewise, neglectedness of different areas is subject to empirical inquiry, and much of the evidence we collect in prioritization and evaluation bears on it.
This framework treats a certain subset of evidence about neglectedness very differently from other evidence about neglectedness, or about other charity features. All other kinds of evidence, including other evidence about neglectedness (market inefficiencies, biases, taboos, new discoveries, whatever), are discounted severely for top performers, becoming negligible after a certain point, while these differences go undiscounted.
Someone else might make a framework with different priors for direct intervention spending, vs lobbying, vs scientific research, which would similarly favor evidence about intervention type over other evidence for top performers.
Couple of important points you’re making here.

On your first point, instead of using a single prior distribution I could do a weighted combination of multiple distributions. There are two ways to do this: either have a prior be a combination distribution, or compute multiple posteriors with different distributions and take their weighted average. Not sure which one correctly handles this uncertainty. I haven’t done the math, but I’d expect that either way, a formulation with distribution probabilities of 90% log-normal/10% Pareto will give much more credence to high cost-effectiveness estimates than a pure log-normal. I don’t believe it would change the results much to assign small probability to distributions with thinner tails than log-normal (e.g. normal or exponential).
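Here’s a rough numerical check of the first point (a sketch with made-up parameters, not the model’s actual numbers): a 90/10 log-normal/Pareto mixture prior over an intervention’s true effectiveness, one noisy cost-effectiveness estimate, and the resulting posterior. The key feature is that Bayes also reweights the two components by how well each explains the estimate, so a fixed-weight average of the two posteriors is not equivalent.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Grid over true cost-effectiveness x (log-spaced). All parameters below
# are illustrative stand-ins, not the model's actual numbers.
x = np.geomspace(1e-2, 1e5, 50_000)

prior_ln = stats.lognorm.pdf(x, s=1.0)   # log-normal candidate prior
prior_pa = stats.pareto.pdf(x, b=1.5)    # Pareto candidate prior
w_ln, w_pa = 0.9, 0.1                    # prior weights on each candidate

# One noisy cost-effectiveness estimate with log-normal measurement error.
estimate, noise = 1000.0, 1.0
lik = stats.lognorm.pdf(estimate, s=noise, scale=x)

# Marginal likelihood of the estimate under each candidate.
z_ln = trapezoid(prior_ln * lik, x)
z_pa = trapezoid(prior_pa * lik, x)

# Under the mixture prior, Bayes reweights the candidates by fit; the
# posterior is this reweighted average of the two component posteriors,
# not the fixed 90/10 average.
w_ln_post = w_ln * z_ln / (w_ln * z_ln + w_pa * z_pa)
posterior = (w_ln_post * prior_ln * lik / z_ln
             + (1 - w_ln_post) * prior_pa * lik / z_pa)

print(f"posterior weight on log-normal: {w_ln_post:.2f}")  # well below 0.9
```

So the “weighted average of posteriors” reading is only correct if the averaging weights are themselves updated by each candidate’s marginal likelihood.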
On your second point, yeah I’m including some extra information in the prior, which is kinda wishy-washy. I realize this is suboptimal, but it’s better than anything else I’ve come up with, and probably better than not using a quantitative model at all. Do you know a better way to handle this?
On your first point, instead of using a single prior distribution I could do a weighted combination of multiple distributions. There are two ways to do this: either have a prior be a combination distribution, or compute multiple posteriors with different distributions and take their weighted average. Not sure which one correctly handles this uncertainty.
Not sure what you mean by a ‘combination distribution’, but I think something like Carl’s suggestion is correct: have a hierarchical model in which the type of distribution over effectiveness is itself a random variable, acting as a ‘hyperparameter’ of the distribution over effectiveness. You could also add a level to the hierarchy by having a distribution over the probabilities of each type of distribution. That being said, it might be convenient to fix these probabilities, since it’s difficult to put all the evidence you have access to into the model. Probabilistic programming languages are a convenient way to handle such hierarchical models; if you’re interested, I recommend checking out this tutorial for an introduction focusing on applications in psychology.
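For what it’s worth, here’s a minimal sketch of that hierarchy in PyMC (my choice of library; the data and parameters are made up): a Dirichlet prior over the category probabilities supplies the extra level in the hierarchy, and a log-normal/Pareto mixture is the distribution over effectiveness.

```python
import numpy as np
import pymc as pm

# Made-up effectiveness measurements for previously assessed interventions.
data = np.array([0.4, 1.1, 2.5, 3.0, 8.0, 40.0, 250.0])

with pm.Model() as model:
    # Extra hierarchy level: uncertainty over the category probabilities,
    # centred near a 90/10 log-normal/Pareto split.
    w = pm.Dirichlet("w", a=np.array([9.0, 1.0]))

    # Candidate shapes for the effectiveness distribution; their parameters
    # are fixed here for brevity but could be given priors as well.
    components = [
        pm.LogNormal.dist(mu=0.0, sigma=2.0),
        pm.Pareto.dist(alpha=1.5, m=0.1),
    ]

    # Each measured intervention's effectiveness is drawn from the mixture;
    # the posterior over w says which shape the data favour.
    pm.Mixture("eff", w=w, comp_dists=components, observed=data)

    idata = pm.sample()
```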
Not sure what you mean by a ‘combination distribution’
I mean that your prior probability density is given by $P(X) = w_{Pareto} P_{Pareto}(X) + w_{lognorm} P_{lognorm}(X)$ for weights $w$. (You can read LaTeX, right?)
Sure. I think a better thing to do (which I think is what Carl is suggesting) is to have a prior distribution over both $x$ (the effectiveness of a randomly chosen intervention) and interventionDistribution (a categorical variable ranging over the different shapes you think the space of interventions might have). So $P(x, \text{Pareto}) = P(\text{Pareto})\, P(x \mid \text{Pareto}) = w_{Pareto} P_{Pareto}(x)$ and $P(x, \text{logNormal}) = P(\text{logNormal})\, P(x \mid \text{logNormal}) = w_{logNormal} P_{logNormal}(x)$. Then, for the first intervention you see, your prior density over effectiveness is indeed $P(x) = w_{Pareto} P_{Pareto}(x) + w_{logNormal} P_{logNormal}(x)$, but after measuring a bunch of interventions, you can update your beliefs about the empirical distribution of effectiveness.
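A minimal sketch of that sequential picture (fixed, made-up shape parameters; a fuller model would put priors on them too):

```python
import numpy as np
from scipy import stats

# Candidate shapes with fixed, made-up parameters.
def p_logNormal(x): return stats.lognorm.pdf(x, s=2.0)
def p_Pareto(x):    return stats.pareto.pdf(x, b=1.5, scale=0.1)

w = np.array([0.9, 0.1])  # P('logNormal'), P('Pareto')

# Prior density over effectiveness for the first intervention you see:
# P(x) = w_logNormal * P_logNormal(x) + w_Pareto * P_Pareto(x).
first = 5.0
print(w[0] * p_logNormal(first) + w[1] * p_Pareto(first))

# After measuring a bunch of interventions, score each shape by the
# log-likelihood it assigns the measurements and update the weights.
measured = np.array([0.6, 1.4, 3.0, 11.0, 90.0])
logscores = np.array([np.log(p_logNormal(measured)).sum(),
                      np.log(p_Pareto(measured)).sum()])
w_post = w * np.exp(logscores - logscores.max())
w_post /= w_post.sum()
print(w_post)  # updated beliefs about the shape of the distribution
```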