I think it’s fair to contend that EA is “biased” towards high-legibility, quantitative outcomes, but I don’t think this bias is very harmful on balance. Importantly, EA is fairly open to attempted quantification of normally-hard-to-quantify concepts, but that openness requires putting in some mental legwork and making plausible quantifications/models (even loose ones) of how valuable an idea is.
If you could describe, in a step-by-step (e.g., probability × impact) manner, a variety of plausible pathways or arguments by which this approach could have really high expected value, I would be interested to see such a model. For example, if you can say “I believe there is a P% chance that spending $R would lead to X outcome(s), which has an F% chance of producing U QALYs (or units of some other metric) relative to the counterfactual, creating an expected value of Z QALYs/benefits per dollar spent,” where Z seems like a plausible number (based on the plausibility of the rest of the model), then it probably isn’t something that EA will just dismiss. This is heavily related to Open Philanthropy’s post about reasoning transparency, which (among other benefits) makes it easier for someone else to dissect another person’s claims.
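To illustrate what such a model looks like once the placeholders are filled in, here is a minimal BOTEC sketch in Python; every number in it is an invented placeholder, not an actual estimate for any intervention:

```python
# Minimal BOTEC sketch of the template above.
# Every number is an invented placeholder, not a real estimate.

p_outcome = 0.05            # P: chance that spending $R leads to outcome X
spend = 10_000_000          # R: dollars spent
p_benefit = 0.50            # F: chance that outcome X produces the benefit
qalys_if_benefit = 100_000  # U: QALYs gained relative to the counterfactual

expected_qalys = p_outcome * p_benefit * qalys_if_benefit
z = expected_qalys / spend  # Z: expected QALYs per dollar spent

print(f"Expected QALYs: {expected_qalys:,.0f}")  # 2,500
print(f"Expected QALYs per dollar: {z:.6f}")     # 0.000250
```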
I do think it may be difficult to show really high yet plausible expected value for this intervention, but I’d still be open to seeing such an analysis, and personally I think producing one should probably be an initial, unprompted step before complaining that EA is unwilling to consider hard-to-quantify ideas.
I don’t think it’s possible to do an analysis that makes sense at all, given that outcomes are so high-variance and depend so much on the skill, strategy, and luck of the people working on it. That doesn’t mean no one should work on it. Open Philanthropy and the FTX Future Fund are uniquely positioned to get effective at this kind of work and drive the kind of results no one else can.
And I think they know this and have been trying; OpenPhil has done work in land use reform and criminal justice reform, for example. I’m not complaining about what people choose to do or not do, but I think my original statement about EA being biased against difficult-to-measure things is correct and makes sense given an evidence-based ideology.
I’m a bit confused about where you stand on this: on the one hand, you seem to be saying that it’s not possible to derive a decent estimate of the likelihood of success, but on the other hand you are still suggesting that it is worth funding.
I don’t dispute that it can be hard to do “accurate” analysis—e.g., to even be within an order of magnitude of accuracy on certain probability or effect-size estimates—but the key behind various back-of-the-envelope calculations (BOTECs) is getting a rough sense of “does this seem to be at least 10:1 expected return on investment given XYZ explicit, dissectible assumptions?”
If the answer is yes, then that’s an important signal saying “this is worth deeper-than-BOTEC-level analysis.” Certain cause areas like AI safety/governance, biosecurity, and a few others have passed this bar by wide margins even when evidence/arguments were relatively scarce and it was (and still is!) really hard to come up with reliable specific estimates.
Explicating your reasoning in such ways is really important for making your analysis more legible/dissectible for others and also for yourself: it is quite easy to think “X is going to happen/work” before laying out the key steps/arguments, but sometimes by explicating your reasoning you (or others) can identify flawed assumptions, outright contradictions, or at least key hinge points in models.
Systemic change can be hard to predict, but arguably everything can be given some kind of probability estimate in theory, even if it’s a situation of pure uncertainty that leads to an estimate of 1/n for n possible outcomes (such as ~50% for a coin flip). These don’t have to be good estimates, but they need to be explicit so that 1) other people can evaluate them, and 2) you can use them in your model (since you can’t multiply a number by “?%”).
One thing that might be useful in this situation is to establish some kind of “outside view” or reference class: how often do these kinds of social movements/reforms work, and how beneficial do they tend to be? Once you have a generic reference class, you can add in arguments to refine the generic class to better fit this specific case (i.e., a systemic change being pushed by people in EA).
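As a toy illustration of that outside-view step (the reference class counts and the inside-view adjustment below are invented for the example):

```python
# Toy outside-view estimate: start from a base rate over a reference
# class of similar reform movements, then adjust for case-specific
# arguments. All inputs are invented placeholders.

successes_in_class = 4   # similar movements that "worked"
movements_in_class = 40  # size of the reference class
base_rate = successes_in_class / movements_in_class  # 10% generic prior

inside_view_multiplier = 1.5  # adjustment for EA-specific factors

p_success = base_rate * inside_view_multiplier
print(f"Base rate: {base_rate:.0%}; adjusted estimate: {p_success:.0%}")
# Base rate: 10%; adjusted estimate: 15%
```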
On a separate, more specific, but less important note: I especially take issue with the idea of “luck” being factored into the model. I suspect you don’t actually mean “luck” in the more superstitious sense (i.e., a cosmic quality that someone has or doesn’t have), but it’s exactly this kind of question/uncertainty (e.g., the likelihood that the environment will be favorable or that people will be in the right place at the right time) that needs to be made more explicit.
I’m a bit confused about where you stand on this: on the one hand, you seem to be saying that it’s not possible to derive a decent estimate of the likelihood of success, but on the other hand you are still suggesting that it is worth funding.
I think any estimate would have a confidence interval so wide that it would be useless. (I said “variance” before; maybe that’s a less well-known term.)
how often do these kinds of social movements/reforms work
I think I’ve cited a pretty good example with the conservative legal movement. My belief is that with a good strategy and the right movement, it will work IF there are people obsessed with getting it done over their lifetimes. This is obviously a difficult belief to prove true or false.
I especially take issue with the idea of “luck” being factored into the model… it’s exactly this kind of question/uncertainty (e.g., the likelihood that the environment will be favorable or that people will be in the right place at the right time) that needs to be made more explicit.
This is difficult for me to swallow because “luck” is a huge factor in how things get done in politics. Something happens in the news and suddenly your cause area becomes super easy or super hard to advance. I’m not sure how this can be made more explicit in a model. Here’s an example in criminal justice reform that I was recently reading about: ALEC is a big conservative think tank, and you would never think they would be for criminal justice reform. But outreach from pro-reform conservatives over time, PLUS media outrage about ALEC’s “Stand Your Ground” law, which people blamed for the killing of Trayvon Martin, made it possible.
Curious where the crux of our disagreement is:
Would you agree that some things that can’t be measured are still worth doing? And is your belief also that pushing the abundance agenda can’t possibly be more cost-effective than donations to AMF?
I think any estimate would have a confidence interval so wide that it would be useless. (I said “variance” before; maybe that’s a less well-known term.)
I am aware of what you mean by variance, but I don’t think this challenges my point: I dispute the idea that you can both say “we can’t make any useful estimate of the likelihood of success” and still claim “it’s worth funding (despite any opportunity costs and other potential drawbacks).”
As the rest of this comment gets into, even a really wide (initial/early-stage) confidence interval can be useful as long as the other variables involved are sufficiently large that you can credibly say “it seems very likely that the probability is at least X%, which is enough to make this very cost effective in expectation.”
(This line of reasoning is very pronounced in longtermism.)
Curious where the crux of our disagreement is: Would you agree that some things that can’t be measured are still worth doing? And is your belief also that pushing the abundance agenda can’t possibly be more cost-effective than donations to AMF?
I think one crux/sticking point for me is this: I believe you could make a highly simplistic but illustrative 3-variable plausibility model involving the following questions:
How much funding/resources should be devoted?
What is the probability of achieving X outcome if we devote that amount of resources?
How valuable is X outcome (e.g., in terms of QALYs)?
This is obviously oversimplified (the actual claims are distributions rather than point estimates), but it requires you to explicate/stake claims like “even under conservative assumptions X, Y, and Z, the expected value of this intervention is still really large.” Relatedly, it allows you to establish breakeven points. Consider the following:
Let’s suppose you claim achieving some policy agenda outcome would produce somewhere between $1T and $10T of value.
Suppose you argue that spending $100M on some kind of movement/systemic change campaign would increase the likelihood of achieving that outcome by somewhere between 0.1% and 10%.
Those confidence intervals are rather large (the probability estimate spans two orders of magnitude), but even with such wide confidence intervals you can claim that a conservative estimate of the expected value is “at least $1B,” which is “at least a 10x return on investment.” And that’s a claim that I and others can at least dissect.
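Spelled out in code, that conservative-corner calculation (using only the illustrative numbers from the example above) looks like this:

```python
# Conservative-corner expected value for the example above.

value_low, value_high = 1e12, 10e12      # $1T to $10T of value if achieved
delta_p_low, delta_p_high = 0.001, 0.10  # 0.1% to 10% added likelihood
spend = 100e6                            # $100M campaign

conservative_ev = value_low * delta_p_low  # low end of both ranges
roi = conservative_ev / spend

print(f"Conservative expected value: ${conservative_ev / 1e9:.0f}B")  # $1B
print(f"Return on investment: {roi:.0f}x")                            # 10x

optimistic_ev = value_high * delta_p_high  # upper corner, for contrast
print(f"Optimistic expected value: ${optimistic_ev / 1e12:.0f}T")     # $1T
```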
However, my concern/suspicion is that upon explicating these estimates, the “conservative estimate” of expected value will actually not look very large; in fact, I suspect that even my median estimate will probably be lower than that of global health and development charities.
Would you agree that some things that can’t be measured are still worth doing?
I would push back against the focus on the word “measured” here: “measured” is typically used to refer to estimates which are so objective, verifiable, and/or otherwise defensible that they get treated as a special category of knowledge, like “we’ve empirically measured and verified that the average return on investment is X.”
I wholly agree that some things which can’t be “measured” are still worth doing, and measurements aren’t infallible anyway. It’s not about measurements, it’s about estimates. Going back to the point I made at the beginning, the problem I see with your stance is that (based on my limited interaction here) you seem to be asserting both that no reliable estimates can be made and that, per your own estimate, the work is worthwhile. But I’m unclear on what that estimate is, and thus I can’t evaluate it.
Regarding “luck,” I will just redirect back to my claim about breakeven points and reference class estimations: does the reliance on “luck” (fortunate circumstances) set the overall likelihood of success at something like 1%? 0.1%?
What is the breakeven point? And does a quick review of the historical frequency of such “luck” produce an estimate which exceeds that conservative breakeven point?
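For instance, a quick breakeven check (the bar and the inputs below are placeholders) might look like:

```python
# Breakeven check: how likely does success need to be for the spending
# to clear a chosen return-on-investment bar? Inputs are placeholders.

spend = 100e6             # campaign cost
value_if_success = 1e12   # value if the reform succeeds
required_roi = 10         # e.g., a 10:1 bar

breakeven_p = required_roi * spend / value_if_success
print(f"Breakeven probability of success: {breakeven_p:.2%}")  # 0.10%
```

If historical cases suggest such “lucky breaks” convert into success noticeably more often than that breakeven rate, the intervention can still look attractive in expectation.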