During the animal welfare vs global health debate week, I was very reluctant to make a post or argument in favor of global health, the cause I work in and that animates me. Here are some reflections on why, that may or may not apply to other people:
Moral weights are tiresome to debate. If you (like me) do not have a good grasp of philosophy, it’s an uphill struggle to grasp what RP’s moral weights project means exactly, and where I would or would not buy into its assumptions.
I don’t choose my donations/actions based on impartial cause prioritization. I think impartially within GHD (e.g. I don’t prioritize interventions in India just because I’m from there, I treat health vs income moral weights much more analytically than species moral weights) but not for cross-cause comparison. I am okay with this. But it doesn’t make for a persuasive case to other people.
It doesn’t feel good to post something that you know will provoke a large volume of (friendly!) disagreement. I think of myself as a pretty disagreeable person, but I am still very averse to posting things that go against what almost everyone around me is saying, at least when I don’t feel 100% confident in my thesis. I have found previous arguments about global health vs animal welfare to be especially exhausting and they did not lead to any convergence, so I don’t see the upside that justifies the downside.
I don’t fundamentally disagree with the narrow thesis that marginal money can do more good in animal welfare. I just feel disillusioned with the larger implication that global health is overfunded and not really worth the money we spend on it.
I’m deliberately focusing on emotional/psychological inhibitions as opposed to analytical doubts I have about animal welfare. I do have some analytical doubts, but I think of them as secondary to the personal relationship I have with GHD.
Often people post cost-effectiveness analyses of potential interventions, which invariably conclude that the intervention could rival GiveWell’s top charities. (I’m guilty of this too!) But this happens so frequently, and I am basically never convinced that the intervention is actually competitive with GWTC. The reason is that they are comparing ex-ante cost-effectiveness (where you make a bunch of assumptions about costs, program delivery mechanisms, etc.) with GiveWell’s calculated ex-post cost-effectiveness (where the intervention is already delivered, so there are far fewer assumptions).
Usually, people acknowledge that ex-ante cost-effectiveness is less reliable than ex-post cost-effectiveness. But I haven’t seen any acknowledgement that this systematically overestimates cost-effectiveness, because people who are motivated to try and pursue an intervention are going to be optimistic about unknown factors. Also, many costs are “unknown unknowns” that you might only discover after implementing the project, so leaving them out underestimates costs. (Also, the planning fallacy in general.) And I haven’t seen any discussion of how large the gap between these estimates could be. I think it could be orders of magnitude, just because costs are in the denominator of a benefit-cost ratio, so uncertainty in costs can have huge effects on cost-effectiveness.
One straightforward way to estimate this gap is to redo a GiveWell CEA, but assuming that you were setting up a charity to deliver that intervention for the first time. If GiveWell’s ex-post estimate is X and your ex-ante estimate is K*X for the same intervention, then we would conclude that ex-ante cost-effectiveness is K times too optimistic, and deflate ex-ante estimates by a factor of K.
I might try to do this myself, but I don’t have any experience with CEAs, and would welcome someone else doing it.
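To make the proposed deflation concrete, here is a minimal sketch with entirely made-up placeholder numbers (none of these figures come from GiveWell or any actual forum CEA):

```python
# Minimal sketch of the proposed deflation, with made-up placeholder numbers
# (these are NOT GiveWell figures or real forum estimates).

givewell_ex_post = 10.0  # hypothetical ex-post cost-effectiveness (multiples of cash transfers)
redone_ex_ante = 30.0    # hypothetical "from scratch" ex-ante CEA of the same intervention

K = redone_ex_ante / givewell_ex_post  # implied optimism factor of ex-ante estimates

new_forum_estimate = 15.0          # a new intervention's ex-ante estimate from a forum post
deflated = new_forum_estimate / K  # rough guess at what an ex-post estimate might look like

print(f"Implied optimism factor K = {K:.1f}")
print(f"Deflated forum estimate: {deflated:.1f}x cash transfers")
```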
I think a similar view is found in ‘Why we can’t take expected value estimates literally even when they’re unbiased’, i.e. we should have a pretty low prior that any particular intervention is above (e.g.) 10x cash transfers, but the strength and robustness of top charities’ CEAs are sufficient to clear them over the bar. And most CEAs of specific interventions written up on the forum aren’t compelling enough to bring the estimate all that much higher from the low prior.
I agree it’d be informative to see what ‘naive’ versions of top charity CEAs would be like. As a quick and dirty version, I looked at GiveWell’s stuff on AMF: their 2023 central figure is $5,500 per life saved, with 226 rows in the spreadsheet. If we look at their 2012 CEA (downloadable here), they have 45 rows, with their optimistic value being $1,819 per life saved. Leaving aside inter-temporal confounders, a naive 3x cut is reasonable if the random forum CEA is equivalent to 2012 optimistic GiveWell. Though it depends on the quality of the random CEA, I’d guess a 2x-10x cut is a reasonable prior? Plus a stronger cut for really high estimates, e.g. a 500x cost-effectiveness estimate is more likely to be due to an over-generous methodology.
That’s an interesting separate point; I certainly agree that our prior should have low mass around 10x cash and above, and that has its own large effect. But I don’t feel like I would make this point contingent on the quality of the CEA; I think even the highest-quality ex-ante CEA can’t avoid these issues. Some CEAs are probably high-quality because there are real decisions attached to them (e.g. Charity Entrepreneurship’s ex-ante CEAs of their prospective charities), and I don’t think I would be convinced by those either.
Neat exercise with 2012 GiveWell. Does 2023 have a country breakdown? Because the main intertemporal confounder I would want to guard against is the change in country mix. I would compare the 2012 estimate to the 2023 estimate for the country in which AMF had the most activity in 2012, which I don’t know off the top of my head. But 3x seems reasonable to me.
I sympathise with this view, but I think I see it in more continuous terms than ex ante vs. ex post, and maybe akin to quality. This is because even ex post, I think there would still be substantial guesswork and assumptions, and the bottom line still relies on interpretation. But the difference for ex post is how empirically informed that analysis can be, and how specific, i.e. an ex post analysis can ground estimates on data for that specific org, with that program, in that community. Ex ante analyses can also differ in quality in how empirically informed and specific they are. But a great ex ante CEA could be more empirically informed and specific than a sub-par ex post CEA. All this is ~semantics though; I think we basically agree.
Not sure about the geography question. From this it looks like the 2012 estimate was based on distribution in Malawi. In 2022 they distributed in DRC, Ghana, Guinea, Malawi, Papua New Guinea, Togo, Uganda, and Zambia, and my guess is that the GiveWell figure is an average of those programs? Read into that what you will.
Ooh, another angle would be to compare Charity Entrepreneurship’s ex ante CEAs with the eventual charities’ own ex post CEAs. But there’d be a strong selection effect, given it depends on eventual charity success/stability, plus the interventions change a lot from research to implementation.
Yes, I agree quality matters a lot, but I think people are universally aware of that—I just wanted to draw attention to the ex-ante/ex-post distinction, which I hadn’t seen raised before.
The CE approach is a good idea, because actually I think the interventions changing a lot from research to implementation is a key part of why ex-ante estimates are unreliable. I don’t know if both estimates are available but it would be great if they are!
One example I know of off the top of my head is LEEP—their CEA for their Malawi campaign found a median of $14/DALY. CE’s original report on lead paint regulation suggested $156/DALY (as a central estimate, I think). That direction and magnitude is pretty surprising to me. I expect it would be explicable based on the details of the different approaches/considerations, but I’d need to look into the details. Maybe a motivating story is that LEEP’s Malawi campaign was surprisingly fast and effective compared to the original report’s hopes?
Another is Family Empowerment Media. An ex post Rethink Priorities report mentions FEM used a GiveWell model to estimate a cost-effectiveness of 26.9x cash transfers, and Founders Pledge estimated 22x. The original CE report links to a CEA that estimates $984/DALY averted, which is less cost-effective than GiveWell top charities, though I don’t know the exact comparison to cash transfers, and there are other benefits to family planning than just DALYs.
I suspect a strong selection effect is in play—i.e. I know of these examples and their CEAs are prominent because they were successful—and the ideas survived the gauntlet of further research, selection, founding, piloting, and scaling.
LEEP is a pretty unusual situation in general I think, and I’m not sure it’s super generalisable. If you get an easy-ish win with lead things, the cost-effectiveness can be insane (see the Bangladesh cumin situation).
Yeah, that makes sense, and the early research could have been heavily discounted by pessimism about a charity achieving big wins.
This is one of the reasons I don’t love post-hoc cost-effectiveness assessments of successful individual campaigns and policy changes which don’t take into account the probability that their (now successful) campaign might have failed (something I have seen a number of times on the lead front). For every win there might be 5, or 10, or 20 failures (which is fine). If you just zero in on the successes, then the cost-effectiveness numbers look unrealistically rosy.
If the initial assessment for LEEP in Malawi assessed, say, a 20% chance of success, then I think this should be factored into their final calculation; they can then update it if they realise their success rate is higher. Otherwise we end up not costing in the failed campaigns, while the successful ones appear ludicrously cost-effective.
Yeah, though to be fair, the CEA for Malawi was done because it was LEEP’s literal first campaign. I’d imagine LEEP has CEAs for all their country work which include adjustments for likelihood of success, though I don’t know whether they intend to publish them any time soon.
This is an important reflection, and a question I’ve found myself asking when seeing various programs claim they are hyper-effective. Incredibly well-performing interventions are rare, but we might expect a higher number of them to be showcased on this forum, given there is already a selection bias from the membership/readership here.
However, I do feel the community naturally creates an incentive to inflate (consciously or not) the CEA of interventions. After all, if you aren’t working on something which can compete with AMF, then why take money away from that? The fix to this is that you live in the ambiguity of your intervention and argue that, under certain assumptions, your program could be better.
As you effectively note, the problem is that “could” (a priori) judgments are riddled with reasoning risks and errors, which is why I feel the community could do more to support and also challenge reasoning methods (cognitive and computational). For example, lots of posts mention key uncertainties people have about their interventions, but they often don’t state the second-order probabilities of them (not even GiveWell does this consistently), along with how much that uncertainty fundamentally underpins the intervention. A relatively simple fix, which could become a community norm.
I agree about the incentives/motivated reasoning problem. I suspect that uncertainty intervals would be uninformatively huge, so I don’t know if they really are useful in practice. Remember that cost effectiveness is the ratio of two uncertain quantities (benefits and costs), and the ratio of two random variables follows a ratio distribution which generally has huge tails.
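As a toy illustration of how the ratio of two uncertain quantities stretches the tails (the lognormal assumptions and all numbers here are mine, not taken from any actual CEA):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Toy model: benefits and costs each uncertain and independent (lognormal by assumption).
benefits = rng.lognormal(mean=np.log(100.0), sigma=0.5, size=n)  # e.g. total benefit in $
costs = rng.lognormal(mean=np.log(10.0), sigma=0.7, size=n)      # e.g. total cost in $

ratio = benefits / costs  # cost-effectiveness as a benefit-cost ratio

# The ratio's spread is wider than either input's, and low-cost draws in the
# denominator produce a long right tail.
for q in (0.05, 0.5, 0.95, 0.99):
    print(f"{q:>5.0%} quantile of benefit/cost: {np.quantile(ratio, q):8.1f}")
print(f" mean of benefit/cost: {ratio.mean():8.1f}")
```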
FWIW I think it’s a bad solution, but why not quantify the uncertainty in the ex ante CEA? See this GiveWell Change Our Minds submission as an example—I don’t think the uncertainty intervals are uninformatively large, although there is a rather strong assumption that the GiveWell models capture the right structure of the problem. Once the uncertainty is quantified, we could run something like the Bayesian adjustment I demonstrate in this PDF to (in theory!) eliminate the positive bias for more uncertain estimates. And then compare the posterior distribution to an analogous distribution for AMF/other relevant benchmark.
Conceptually, the difference between the ex ante and ex post CEA isn’t categorical. It is a matter of degree—the degree of uncertainty about the model and its parameters. This difference could be captured with an adequate explicit treatment of uncertainty in the CEA.
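For concreteness, here is a minimal sketch of the kind of Bayesian adjustment being described. This is a generic normal shrinkage in log space, not the specific model from the linked PDF, and every parameter choice below is an assumption:

```python
import numpy as np

# Work with log cost-effectiveness (in multiples of cash transfers), treating both the
# prior and the noisy ex-ante estimate as normal in log space (an assumption).
prior_mean, prior_sd = np.log(1.0), np.log(4.0)  # prior centred on ~1x cash, fairly wide
est_mean, est_sd = np.log(20.0), np.log(3.0)     # a noisy ex-ante CEA claiming ~20x cash

# Conjugate normal-normal update: the posterior mean is a precision-weighted average,
# so noisier estimates get shrunk harder toward the prior.
prior_prec, est_prec = prior_sd**-2, est_sd**-2
post_mean = (prior_prec * prior_mean + est_prec * est_mean) / (prior_prec + est_prec)
post_sd = (prior_prec + est_prec) ** -0.5

print(f"Naive estimate: {np.exp(est_mean):.1f}x cash")
print(f"Posterior mean: {np.exp(post_mean):.1f}x cash (shrunk toward the prior)")
lo, hi = np.exp(post_mean - 1.645 * post_sd), np.exp(post_mean + 1.645 * post_sd)
print(f"Posterior 90% interval: {lo:.1f}x to {hi:.1f}x")
```

The point of the sketch is just the direction of the effect: the wider the ex-ante uncertainty, the more the headline number gets pulled back toward the prior, which is the positive-bias correction being gestured at.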
Interesting, I don’t know why the tails aren’t larger, and I find Squiggle kinda hard to parse. Do you quantify cost uncertainty in addition to benefit uncertainty? Because that would, I think, make the bounds huge.
How do we find new interventions that can beat the best? The main approach has been to seek leverage: interventions where you pay a dollar to move a larger amount of money/resources to a cause (e.g. advocacy interventions, effective giving campaigns). But people don’t often recognize market-based leverage: interventions where you pay a dollar to enable a market transaction whose value is much larger than that dollar. Two examples:
High-school educated workers in urban South Africa struggle to find their first job, because school performance is a very weak signal of ability, and businesses are reluctant to take chances because firing workers is costly. Harambee assesses job-seekers’ skills and gives them certification that they can show to employers as a credible signal. Getting the certification increases their chances of finding a job. Thus, Harambee increased their income without giving them any money, just by fixing an information problem in the labor market.
Rug manufacturers in Egypt would like to export their rugs for big profit. An unnamed German retailer would like to buy these rugs, since they are cheaper than alternatives. However, the global market is huge, and these two parties have no way of getting in touch with each other, or knowing that they are mutually interested in a sale. Aid to Artisans helps the local intermediary for the rug manufacturers get in touch with the German retailer, helps them travel to meet with the retailer to provide samples of their products, and arranges an initial large order. The rug manufacturers gain a reputation, and over the next four years, they get orders totalling $150,000. Thus, ATA increased the incomes of all the artisans without giving them any money, just by connecting them with buyers who wanted to give them money.
These interventions get their leverage from fixing market failures. Both of these examples are failures of information, which are both ubiquitous and cheap to fix. I suspect that it shouldn’t be too hard to find one where spending $1 generates more than $10 in income, which is roughly the bar for a GiveWell top charity.
I suspect that it shouldn’t be too hard to find one where spending $1 generates more than $10 in income, which is roughly the bar for a GiveWell top charity.
This seems wrong to me, in that both of your examples involve constituencies that are quite a bit better off than GiveDirectly recipients, for whom that bar would hold; i.e. the actual multiplier would need to be a lot higher, or apply to constituencies as poor as GD recipients.
Yeah, hence the caveat of “roughly”. I actually don’t think they’re much better off (the former group are unemployed and thus have basically no income!), but I feel pretty sanguine about generating $50 or $100 in income per $1 spent if your intervention operates at scale, just because the unit costs of solving an information friction seem trivially small. (Also, business operators are better off, but the potential to multiply business income is way higher.)
The easiest way to get this would be through agricultural livelihood interventions. Farmers are the extreme poor, and they have tons of frictions to market transactions, so you are targeting the right population and also getting market-based leverage.
I am not a GHD expert, but I would expect someone who has a high school diploma in the richest country in Africa to be a lot better off than the typical GD recipient, who seems to be from the poorest strata of the poorest countries.
And so, yeah, I agree one would probably need a 50-100x expected multiplier to make this work. I am not saying this is not possible, I just thought the bar stated here was significantly too optimistic.
I picked South Africa because Harambee works there, but the same issue—employers don’t know who is good to hire so job seekers struggle to find jobs—is true across Africa and for much poorer populations than high school educated workers.
But the point would have been better demonstrated with livelihood interventions for farmers.
Thanks, and sorry if I was too nitpicky then.
People often justify a fast takeoff of AI by pointing to how fast AI could improve beyond some point. But The Great Data Integration Schlep is an excellent LW post about the absolute sludge of trying to do data work inside corporate bureaucracy. The key point is that even when companies would seemingly benefit from having much more insight into their work, a whole slew of incentive problems and managerial foibles prevent this from being realized. The post’s author applies this to argue for skepticism about AI takeoff:
If you’re imagining an “AI R&D researcher” inventing lots of new technologies, for instance, that means integrating it into corporate R&D, which primarily means big manufacturing firms with heavy investment into science/engineering innovation (semiconductors, pharmaceuticals, medical devices and scientific instruments, petrochemicals, automotive, aerospace, etc). You’d need to get enough access to private R&D data to train the AI, and build enough credibility through pilot programs to gradually convince companies to give the AI free rein, and you’d need to start virtually from scratch with each new client. This takes time, trial-and-error, gradual demonstration of capabilities, and lots and lots of high-paid labor, and it is barely being done yet at all.
This is also the story of computers, and the story of electricity. A transformative new technology was created, but it took decades for its potential to be realized because of all the existing infrastructure that had to be upended to maximize its impact.
In general, even if AI is technologically unprecedented, the social infrastructure through which AI will be deployed is much more precedented, and we should consider those barriers as actually slowing down AI impacts.
I often return to this bit of 80000 Hours’ anonymous career advice, about how when you’re great at your job, no one’s advice is that useful.
I think in general if you want to accomplish something, or you want to be good at something, or you want something to happen — you want to throw yourself into it, and get really deep into it. Focus on it. Obsess over it. If you don’t, you probably won’t get the result you want. And if you do — you’re going to find that advice from other people often isn’t that useful. By “advice” here I mean talking to people who aren’t paying much attention to what you’re doing and don’t know you very well and are just reacting to your description of the situation and your questions. And I’m assuming this isn’t a case where you’re asking the world’s expert on X to answer a narrowly defined question that they’re clearly the best person to answer because it’s 100% about X.
I think EAs tend to ask for a lot of advice, as defined above. And they should, I think asking for advice is good. But I think it’s good to understand what that advice is and isn’t. And usually when you’re doing your job really well, no one’s advice is going to be that useful. They’re never going to know what you should do. They’re never going to know as much as you do, and if they do, you’re not doing it right.
I like it a lot. It reminds me of Agnes Callard’s observation about a young writer asking Margaret Atwood for advice and getting only the trite advice to “write every day”:
The young person is not approaching Atwood for instructions on how to operate Microsoft Word, nor is she making the unreasonable demand that Atwood become her writing coach. She wants the kind of value she would get from the second, but she wants it given to her in the manner of the first. But there is no there there. Hence the advice-giver is reduced to repeating reasonable-sounding things she has heard others say—thoughts that are watered down so far that there’s really no thought left, just water.
I have a thought on this. It relates to the level of effort from the advice giver, and their willingness to understand the recipient’s context. Often advice is given with only a few seconds of effort, or with the giver applying a sort of cookie-cutter template to their understanding of the recipient. That is where useless advice comes from. When the giver dedicates some minutes toward understanding and exploring the receiver’s context, toward actually paying attention, then the advice can be of much better quality.
This is specifically fresh in my mind because a few days ago I helped John Doe review his resume. John told me that I was not the first person to help; several other people had looked at his resume and told him that it was pretty good. But I did more than merely glance at his resume; I read through it with a critical eye. I had a page full of notes for him. Some of the notes were preference/stylistic things, but plenty of the notes were ‘errors’ that other people hadn’t bothered to notice: the text used two different shades of dark blue, and there was inconsistent formatting in the dates. John was amazed that multiple people had reviewed his resume and nobody had noticed or bothered to tell him that he was using two different colors (it was not intentional on his part).
In contrast, I’ve heard (and heard of) plenty of career advice within EA that simply isn’t apt: recommending a recipient with no interest in an area to pursue that area, ignoring a recipient’s visa/legal status, ignoring a recipient’s financial constraints, etc. I was once told to treat people to coffee and to use my parents’ professional networks. Both of those things are good advice in general, but I don’t live within a hundred miles of a place where I could treat networking contacts to coffee, and my retired working-class parents don’t have professional networks. It reminds me a little bit of trying to try: how much effort do people actually put into the act of giving helpful advice?
It seems like there’s a scale between rubber-ducking and mentorship that a person can operate at, depending on the skill differences. Furthermore, some advice-givers are better at rubber-ducking and others are better at mentorship, adding another dimension.
Yudkowsky’s old sequence post on Cached Thoughts is pretty brief and goes rather deep into the watering-down phenomenon you described.
One essay that I thought about a lot growing up is The Quality of Life. It argues that the metrics of wellbeing that we optimize for are basically meaningless, the dreams of bureaucrats rather than of people.
Why isn’t the most important financial threshold in the inner lives of many, rich or poor, the subjective notion of fuck-you money, the first thing to study? Why isn’t there a major UN study tracking what people consider fuck-you money? Why aren’t Nobel-winning behavioral economists designing clever experiments to tease out how we think about this quantity? It is, after all, our main subjective measure of how not-free we perceive ourselves to be.
Nobody, other than bureaucrats who fund research and economists, asks the question “how much income is needed to be happy?” We already know that talking about happiness without talking about what trade-offs we are making to pursue it is meaningless. The rest of us real people ask the question “how much wealth is required to be free of scripts that dictate what trade-offs you are allowed to make?”
It does not really matter if you generalize beyond income to various in-kind quality-of-life elements like a clean environment or access to healthcare. If you are not measuring prevailing levels of freedom you are measuring nothing relevant. Until people start answering $0 to the fuck-you-money question across the planet, you can be sure that they do not perceive themselves to be free enough to properly pursue quality of life.
This essay definitely had an impact on me, highlighting the agency benefits of cash transfers and pushing me towards a large recurring GiveDirectly donation. Re-reading it today, it feels terribly confusing and difficult to grok. But it feels difficult to grok in a way that suggests it’s not a confused or incoherent or obviously wrong take. It’s just pointing out the water we swim in, which is a valuable thing for more EAs to keep in mind.
I find it encouraging that EAs have quickly pivoted to viewing AI companies as adversaries, after a long period of uneasily viewing them as necessary allies (cf. Why Not Slow AI Progress?). Previously, I worried that social/professional entanglements and image concerns would lead EAs to align with AI companies even after receiving clear signals that AI companies are not interested in safety. I’m glad to have been wrong about that.
Caveat: we’ve only seen this kind of scrutiny applied to OpenAI and it remains to be seen whether Anthropic and DeepMind will get the same scrutiny.
I don’t think it’s accurate to say that “EAs have quickly pivoted to viewing AI companies as adversaries, after a long period of uneasily viewing them as necessary allies.”
My understanding is that no matter how you define “EAs,” many people have always been supportive of working with/at AI companies, and many others sceptical of that approach.
I think Kelsey Piper’s article marks a huge turning point. In 2022, there were lots of people saying in an abstract sense “we shouldn’t work with AI companies”, but I can’t imagine that article being written in 2022. And the call for attorneys for ex-OpenAI employees is another step so adversarial I can’t imagine it being taken in 2022. Both of these have been pretty positively received, so I think they reflect a real shift in attitudes.
To be concrete, I imagine if Kelsey wrote an article in 2022 about the non-disparagement clause (assuming it existed then), a lot of people’s response would be “this clause is bad, but we shouldn’t alienate the most safety-conscious AI company or else we might increase risk”. I don’t see anyone saying that today. The obvious reason is that people have quickly updated on evidence that OpenAI is not actually safety-conscious. My fear was that they would not update this way, hence my positive reaction.