This was my fault, sorry. I was travelling and ill, so I was slow giving feedback on the draft. I belatedly sent Saulius some comments without realising it had just been published, so he took it down in order to incorporate some of my suggestions.
Derek
>TLYCS only endorses 22 charities, all of which work in the developing world on causes that are plausibly cost-effective on the level of some GiveWell interventions (even though evidence is fairly weak on some of them...)
It’s plausible that some of these are as cost-effective as the GW top charities, but perhaps not that they are as cost-effective on average, or in expectation.
>This selection only looks narrow if your point of comparison is another EA-aligned evaluator like GiveWell, ACE, or Founder’s Pledge.
You mean only looks broad?
Anyway, I would agree TLYCS’s selection is narrow relative to some others; just not the EA evaluators that seem like the most natural comparators.
>I more intended to make the general point that I’d have liked to have a few more concrete facts that I could use to help me weigh Rethink’s judgment.
That’s fair. Initially I was going to write a summary of our evidence and reasoning for all 42 parameters, or at least the 5-10 that the results were most sensitive to. In the end we decided against it for various reasons, e.g.:
- Some were based fairly heavily on information that had to remain confidential, so a lot would have to be redacted.
- Often the 6 team members had different rationales and drew on different information/experiences, so it would be hard in some cases to give a coherent summary.
- Sometimes team members noted their rationales in the elicitation document, but with so many parameters, there wasn’t always time to do this properly. Any summary would therefore also be incomplete.
- The report was already too long and was taking too much time, so this seemed like an easy way of limiting both length and delays.
But maybe it was the wrong call.
>Knowing that Donational started out with all or almost all TLYCS charities reduces my concern a lot. The impression I had was that they’d been working with a very broad range of charities and were radically cutting back on their selection.
I would consider TLYCS’s range very broad, but you may disagree. Anyway, you can see Donational’s current list at https://donational.org/charities
Hi Aaron,
Many thanks for the comments, and sorry for the slow reply—I’ve been travelling. Currently very jet-lagged and a bit ill so let me know if the following isn’t clear/sensible. Also, as in all my responses, these views are my own and not necessarily shared by the whole Rethink team.
>Perhaps this indicates that your models could be filled in over time with “default scores” that apply to all projects within a certain area? For example, any program aiming to reduce poverty could get the same “indirect harm” scores as this project’s anti-poverty side.
I agree. If RG continues, we may well standardise indirect effect scores to some extent, perhaps publishing fairly in-depth analyses of the various issues in separate posts. We discussed this early on in the process but it didn’t make sense to do it for an initial ‘experimental’ evaluation.
>Something also feels off about noting the potential harms of effects which are generally very good … The more that poverty is reduced, and the faster the economy grows, the more animals are likely to be eaten. This is an “indirect bad outcome” that I’m actually happy to see in some sense, because the existence of the bad outcome indicates that I succeeded in my primary goal.
I think I see where you’re coming from, but this seems to constitute a general case against considering indirect effects at all. I suppose there are some that don’t scale linearly with the intended effects, but I’m not sure this example scales linearly either (e.g. meat consumption may plateau above a certain income), and it strikes me as a pretty arbitrary criterion to use. (I’m not sure whether you were actually suggesting we use that.)
Maybe this comes back to the point I made to Jonas, i.e. the effects of the charities, both direct and indirect, seem to be captured by the ‘counterfactual dollars moved to top charities’ metric, so we should perhaps only consider the indirect effects of CAP itself. But it sounds like if we were just evaluating, say, AMF, you’d be opposed to considering its effects on animal consumption, which doesn’t seem right to me. But maybe I’m misunderstanding you.
>It’s as though someone were to warn me about donating to an X-risk-reduction organization by pointing out that more humans living their lives implies that more humans will get cancer.
This example feels different. Assuming your main goal of reducing x-risk is that humans live longer, and the main bad consequence of cancer is shortening of human lives, your primary metric—something like (wellbeing-adjusted) life-years gained—will already account for the additional cancer.
More broadly: it seems like what counts as ‘indirect’ depends on what you’ve decided a priori to be ‘direct’ or intended. So we haven’t included, say, the non-creation of net-positive factory-farmed animal lives as an indirect harm of charities that reduce meat consumption, because we think on average intensively farmed animals’ lives would be net negative; but I would consider something like harm to the livelihoods of farmers an indirect effect, as that is not the goal, and it may not be considered at all by potential donors if it weren’t highlighted. Likewise, if the goal of a charity is to mitigate global warming by reducing meat consumption, the effect on animal welfare would be an indirect effect; but because the main aim of the ACE-recommended animal charities seems to be reducing animal suffering, we have considered any consequences for climate change (and antibiotic resistance) to be indirect.
>If a socialist tells me they’re concerned about poor people losing autonomy as a result of charitable giving, my response would be something like: “...okay. That’s going to be par for the course in this entire category of projects… By the project’s very nature, it should be clear that it’s not something you’ll want to support if you generally oppose charity.”
That was in the moral uncertainty section, not indirect effects, and the point was to essentially highlight that for people with this worldview (and a range of others) this project—or at least some of the recipient charities—may look bad. So it seems broadly consistent with your suggested approach, if I’ve understood you correctly.
>I wish the section on Donational’s basic strategy had been a lot longer.
Okay, I can see how that would have been useful. To briefly respond to some of your specific questions:
• I’m not aware of any ‘competitors’ as such. AFAIK there is no comparable organization operating in workplaces.
• ‘Market size’, or something close to it, was factored into the ‘growth rate’ and ‘max scale’ parameters of the model. We didn’t provide justifications for every parameter for time and space reasons but I can dig out my notes on specific ones if you want.
• The selection of charities is going to be a trade-off between mass appeal and effectiveness. Currently Donational recommends a very broad range (based on The Life You Can Save’s, I think), but we felt there were too many that were unproven or seemed likely to be orders of magnitude worse than the top charities, which undermined the impact focus. After some discussion, Ian agreed to limit them to ACE and GW top charities, plus a US criminal justice one. (If it were my program, I’d probably exclude GiveDirectly, some of the ACE charities, and the criminal justice one, and would perhaps include some longtermist options—but ultimately it’s not my choice.)
I’ll leave your points about strategy for Tee or Luisa to answer.
Agree. I wasted several years doing dead-end minimum wage jobs to pay for my degrees, chose a less prestigious university to do them at due to its lower fees, and still had to take a leave of absence halfway through my master’s, partly for financial reasons. Even a regular loan with a reasonable rate of interest would have been fine—the fees in the UK aren’t that high for domestic students (~£5-16k for a whole master’s course), but at the time the UK government wasn’t giving postgrad loans, and even now they are capped at about £10k, so you have to find most of your living costs from elsewhere. A loan of £3-10k would have added a couple of counterfactual years of EA-focused work to my career.
There really doesn’t seem to be much awareness within EA of the basic financial challenges that some people face. When I mentioned my situation, the response was often a confused ‘Why don’t you just borrow the money from your friends/family/bank?’, as if that were an option for everyone. Relatedly, when I mentioned at an EA meetup that I was on the dole because I couldn’t find work, someone asked how much I got (~£65 per week), and said “Oh, is that all? It’s not really worth bothering to claim then, is it?” Someone else was baffled why I took a 24-hour bus trip for $20 rather than pay $80 for a two-hour flight. It’s no wonder EAs are seen as elitist and out of touch.
I would add that one advantage of explicitly considering the indirect effects of charities is that it makes them more salient. I’d imagine most donors/funders would only really think of the direct benefits when deciding whether a project like CAP (or the charities themselves) is worthwhile, so it helps to highlight the other issues at stake. This consideration may outweigh the methodological concerns with ‘double-counting’ etc.
Thanks Jonas. Sorry for the slow reply—I took some time off and am now travelling.
I was also uneasy about counting indirect effects of the beneficiary charities rather than CAP itself, though I’m not sure it’s for exactly the same reasons as you. The measure of benefit in the CEA was dollars counterfactually moved to the recommended charities (or equivalent), which seems to implicitly cover all consequences of those dollars. Considering indirect effects separately, with the implication that these consequences are in addition to the dollars moved, risks double-counting.
I’m not sure if that’s what you’re getting at, or if you also think we should be explicitly modelling the indirect effects of all the beneficiary charities in the CEA. I don’t think the latter is really feasible, for several reasons. For a start, we would need some universal metric for capturing both the direct and indirect consequences. The best one would probably be subjective wellbeing, but figuring out the SWB associated with a wide range of circumstances, across a large number of species, is not within the scope of an evaluation like this. Another issue is that indirect effects would likely overwhelm the direct ones, and it isn’t clear that this is appropriate given their far more speculative nature. There would probably have to be some kind of weighting system that discounts the more uncertain predictions (could be a fully Bayesian approach, or a simpler option like this), but implementing such a system itself would be very hard to do well, and perhaps less useful than highlighting them and scoring them much more subjectively as we’ve done here.
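To make that concrete, here is a minimal sketch of one such weighting system (precision-based shrinkage toward a zero-effect prior). The numbers and the prior are entirely made up, and this is not how our scores were actually produced:

```python
# Illustrative sketch only: discount each effect estimate toward zero in
# proportion to its uncertainty, via a zero-centred Normal prior.
# All numbers are hypothetical; none come from the Rethink Grants model.

def shrunk_effect(estimate: float, sd: float, prior_sd: float = 1.0) -> float:
    """Posterior mean given a Normal(0, prior_sd^2) prior and a
    Normal(estimate, sd^2) likelihood: noisier estimates are
    discounted more heavily."""
    weight = prior_sd**2 / (prior_sd**2 + sd**2)
    return weight * estimate

# A well-evidenced direct effect keeps most of its nominal value...
print(shrunk_effect(1.0, sd=0.2))  # ~0.96
# ...while an equally large but speculative indirect effect is mostly discounted.
print(shrunk_effect(1.0, sd=3.0))  # 0.1
```

The hard part, of course, is choosing the prior and the uncertainty estimates, which is why doing this well would be difficult.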
Yeah, I did wonder if we were talking past each other a bit, and I’d be interested to clear that up – but no worries if you don’t have time.
Hi Oliver.
Thanks for your comments.
I think there are some reasonable points here.
• I certainly agree that the model is overkill for this particular evaluation. As we note at the beginning, this was a ‘proof of concept’ experiment in more detailed evaluation of a kind that is common in other fields, such as health economics, but is not often (if ever) seen in EA. In my view – and I can’t speak for the whole team here – this kind of cost-effectiveness analysis is most suitable for cases where (a) it is not possible to run a cheap pilot study with short feedback loops, (b) there is more actual data to populate the parameters, and (c) there is more at stake, e.g. it is a choice between one-off funding of several hundred thousand dollars or nothing at all.
• I would also be interested to see an explicit case against Donational.
However, I’d like to push back on some of your criticisms as well, many of which are addressed in the text (often the Executive Summary).
• A description of what Donational has done so far, and the plans for CAP, is in the Introducing Donational section. This could also constitute a basic argument for Donational, but maybe you mean something else by that. I don’t know what you want to know about its operations beyond what is in this section and the Team Strength section. If you tell us what exactly you think is missing, maybe we can add it somewhere.
• We don’t give “an explanation of a set of cruxes and observations that would change the evaluators mind” as such, but we do say what the CEE is most sensitive to (and that the pilot should focus on those), which amounts to much the same thing, e.g. if the number and size of donations were much higher or lower then our conclusions would be different. I’ve added a sentence to the relevant part of the Exec Summary: “The base case donation-cost ratio of around 2:1 is below the 3x return that we consider the approximate minimum for the project to be worthwhile, and far from the 10x or higher reported by comparable organizations. The results are sensitive to the number and size of pledges (recurring donations), and CAP’s ability to retain both ambassadors and pledgers. Because of the high uncertainty, very rough value of information calculations suggest that the benefits of running a pilot study to further understand the impact of CAP would outweigh the costs by a large margin.” EDIT: We also address potential ‘defeaters’ in the Team Strength section, and note that we would be reluctant to support a project with a high probability of one or more major indirect harms, or that looked very bad according to one or more plausible worldviews. This strongly implies at least some of the observations that would change our mind.
• We mention an early BOTEC of expected donations (which I assume is similar to the Fermi estimate that you’re suggesting) in at least three places. This includes the Model Verification section where I note that “Parameter values, and final results, were compared to our preliminary estimates, and any major disparities investigated.” Maybe I should have been clearer that this was the BOTEC, and perhaps we should have published the BOTEC alongside the main model.
• We make direct comparisons with OFTW, TLYCS, and GWWC throughout the CEA and to a lesser extent in other sections, and explain why we don’t fully model their costs and impacts alongside CAP.
• “after writing down their formal models and truly understanding their consequences, most decision makers are well-advised to throw away the formal models and go with what their updated gut-sense is.”
That’s kind of what we did. We converted the CE ratio, and scores on other criteria, to crude Low/Medium/High categories, and made a somewhat subjective final decision that was informed by, but not mechanistically determined by, those scores and other information. A more purely intuition-driven approach would likely have either enthusiastically embraced the full CAP or rejected it entirely, whereas a formal model led us to what we think is a more reasonable middle ground (though we may have arrived in a similar place with a simpler model).
• Even for this evaluation, there was some value in the more ‘advanced’ methods. E.g. the VOI calculation, rough though it was, was important for deciding how much to recommend be spent on a pilot (a stylised sketch of this kind of calculation is below); and our final CEE (<2x) was a fair bit lower than the BOTEC (about 3-4x), largely because of the more pessimistic inputs we elicited from people with a more detached perspective, and more precise modelling of costs.
It seems like a large part of the problem is that most people don’t have time to read such a long post in detail. In future we should perhaps do a more detailed Exec Summary, and I’ll consider expanding this one further if there is enough demand.
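For anyone unfamiliar with VOI, here is a stylised sketch of the kind of calculation involved. Every number is a placeholder, not a figure from our model:

```python
# Stylised expected value of perfect information (EVPI) calculation.
# All inputs are placeholders, not the Rethink Grants figures.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Uncertain donation-cost ratio (lognormal, median 2x).
ratio = rng.lognormal(mean=np.log(2.0), sigma=0.6, size=n)
cost = 100_000  # hypothetical programme cost, USD

# Net benefit of funding relative to not funding (baseline = 0).
nb_fund = (ratio - 1.0) * cost

# Best decision under current information: fund iff expected net benefit > 0.
ev_current = max(nb_fund.mean(), 0.0)

# With perfect information, we would make the right call in every scenario.
ev_perfect = np.maximum(nb_fund, 0.0).mean()

evpi = ev_perfect - ev_current
print(f"EVPI ~ ${evpi:,.0f}")  # an upper bound on what more information is worth
```

A pilot yields only partial information, so its value is below the EVPI; but if even a rough EVPI dwarfs the pilot’s cost, running one looks worthwhile.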
Thanks again for engaging with this!
Thanks - should be fixed now.
“Health and happiness: some open research topics”
This has been 90% complete for >6 months but finishing it has never seemed the top priority. The draft summary is below, and I can share the drafts with interested people, e.g. those looking for a thesis topic.
Summary
While studying health economics and working on the 2019 Global Happiness and Wellbeing Policy Report, I accumulated a list of research gaps within these fields. Most are related to the use of subjective wellbeing (SWB) as the measure of utility in the evaluation of health interventions and the quantification of the burden of disease, but many are relevant to cause prioritisation more generally.
This series of posts outlines some of these topics, and discusses ways they could be tackled. Some of them could potentially be addressed by non-profits, but the majority are probably a better fit for academia. In particular, many would be suitable for undergraduate or master’s theses in health economics, public health, psychology and maybe straight economics – and some could easily fill up an entire PhD, or even constitute a new research programme.
The topics are divided into three broad themes, each of which receives its own post.
Part 1: Theory
The first part focuses on three fundamental issues that must be addressed before the quality-adjusted life-year (QALY) and the disability-adjusted life-year (DALY) can be derived from SWB measures, which would effectively create a wellbeing-adjusted life-year (WELBY).
Topic 1: Reweighting the QALY and DALY using SWB
Topic 2: Anchoring SWB measures to the QALY/DALY scale
Topic 3: Valuing states ‘worse than dead’
Part 2: Application
Assuming the technical and theoretical hurdles can be overcome, this section considers four potential applications of a WELBY-style metric.
Topic 4: Re-estimating the global burden of disease based on SWB
Topic 5: Re-estimating disease control priorities based on SWB
Topic 6: Estimating SWB-based cost-effectiveness thresholds
Topic 7: Comparing human and animal wellbeing
Parts 1 and 2 include a brief assessment of each topic in terms of importance, tractability and neglectedness. I’m pretty sceptical of the ITN framework, especially as applied to solutions rather than problems, and I haven’t tried to give numerical scores to each criterion, but I found it useful for highlighting caveats. Overall, I’m fairly confident that these topics are neglected, but I’m not making any great claims about their tractability, importance or overall priority relative to other areas of global health/development, let alone compared to issues in other cause areas. It would take much more time than I have at the moment to make that kind of judgement.
Part 3: Challenges
The final section highlights some additional questions that require answering before the case for a wellbeing approach can be considered proven. These are not discussed in as much detail and no ITN assessment is provided (the Roman numerals reinforce their distinction from the main topics addressed in Parts 1 and 2).
(i) Don’t QALYs and DALYs have to be derived from preferences?
(ii) In any case, shouldn’t we focus on improving preference-based methods?
(iii) Should the priority be reforming the QALY rather than the DALY?
(iv) Are answers to SWB questions really interpersonally comparable?
(v) Which SWB self-report measure is best?
(vi) Whose wellbeing is actually measured by self-reported SWB scales?
(vii) Whose wellbeing should be measured?
(viii) How feasible is it to obtain the required data?
(ix) Are more objective measures of SWB viable yet?
Part 3 also concludes the series by considering the general pros and cons of working on outcome metrics.
EA does not support psychedelic scale-up at all. (Good Ventures does fund some academic research into psychedelics on the order of a million USD annually, though this funding isn’t under the aegis of EA.)
I wonder if this is partly because of fears it would make EA look weird(er). Whether or not it’s been an issue up to this point, perhaps some EAs would be keener to support these efforts in future if they were not closely associated with the EA ‘brand’.
Interesting, thanks.
>I don’t really see ESM as being in opposition to QALYs. It seems like it’s a method that you would use as an input in QALY weight determinations.
It can be, yes, but QALYs trade off length and quality of life, whereas ESM would only tell you about QoL/wellbeing. There would still need to be some other process (or some major assumptions) to anchor the results on the QALY (or DALY) scale.
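To illustrate, here is a toy version of one possible anchoring assumption; the ‘neutral point’ value is entirely invented:

```python
# Toy anchoring sketch, not an established method: linearly rescale a
# 0-10 SWB score to the QALY scale (dead = 0, best state = 1), given an
# assumed neutral point on the SWB scale that is equivalent to dead.

def swb_to_qaly_weight(swb: float, neutral: float = 2.0, best: float = 10.0) -> float:
    """Scores below `neutral` come out negative, i.e. 'worse than dead'."""
    return (swb - neutral) / (best - neutral)

print(swb_to_qaly_weight(6.0))  # 0.5
print(swb_to_qaly_weight(1.0))  # -0.125: a state 'worse than dead'
```

All the substance is hidden in the choice of the neutral point, which is exactly the kind of assumption I mean.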
To add to the chorus: I included a couple of paragraphs on psychedelics (broadly construed) in Chapter 3 of the Global Happiness and Wellbeing Policy Report. http://www.happinesscouncil.org/
I’m not sure I should have said that the drugs are unpatentable. While strictly true, the delivery mechanisms and other aspects of treatment can be and have been patented, with the potential for raising costs and restricting availability.
As an aside: I just came across this disturbing critique of Compass Pathways. Sounds like they might not be as public-spirited as I’d initially assumed. But I haven’t looked into it in detail. https://qz.com/1454785/a-millionaire-couple-is-threatening-to-create-a-magic-mushroom-monopoly/
I agree that mental health is an area worth looking into, and you make some good suggestions.
A few comments based on a pretty quick, first-pass reading of the post:
I’m not sure why you use 2013 GBD estimates. They are updated every year, and the 2017 figures are on the IHME website and published in The Lancet – though I’d expect the general pattern to be similar. Note that the GBD is likely to underestimate the burden of mental illness for various methodological reasons: see Vigo et al (2016).
A ketamine nasal spray and a psilocybin formulation have also been given ‘breakthrough therapy’ status by the FDA, and as you mention there is ongoing research in the UK on psilocybin, so it seems likely these will become prescribable for mental disorders within the next few years in some countries. I’m unsure how much additional benefit would come from further funding/campaigning on this issue, though rescheduling the substances to Schedule 2 or lower would certainly make research easier, and making them available in developing countries could take a lot of work. (Ketamine is already Schedule 2 in many countries, and there are clinics in the US [including San Francisco], UK [including Oxford] and Canada that already offer intravenous ketamine therapy for depression.)
It’s worth noting that psilocybin and other psychedelics can genuinely cure depression (and anxiety, OCD, addiction...), perhaps in a similar way to CBT, by breaking harmful thought patterns. My inexpert understanding is that ketamine is probably more like standard anti-depressants in that it ‘numbs the pain’ rather than solving the underlying causes (the most recent evidence suggests it works via opioid system activation, making the pain analogy even more apt), though it does lead to permanent remission in some cases, and works much faster with (usually) fewer side-effects. MDMA is somewhere in between: it doesn’t directly cure PTSD etc. through some biochemical process, but it enables more effective psychological therapy, which can lead to a permanent cure.
You don’t seem to mention electroconvulsive therapy (ECT). My vague understanding is that it can help with schizophrenia and chronic pain as well as depression and is probably underused due to its associations with One Flew Over the Cuckoo’s Nest, etc.
“As the global development charities recommended by GiveWell are widely accepted as the most effective organisations to donate to at present, they are the natural point of comparison.” I suspect only a minority of EAs think GiveWell orgs are the most effective, given the interest in the far future, animals, etc. But I agree they are the most natural comparators.
Having skimmed the StrongMinds Phase 2 report, I’m skeptical of the claimed effect size: e.g. the control group was not randomly selected from the same population as the intervention group; I think it was drawn from people who didn’t want group therapy, who may differ in all sorts of ways. They did take some steps to adjust for social desirability bias, but I suspect that’s still an issue. More generally, things tend to look worse the more you examine the evidence, especially when it comes from an interested party. So I agree it’s a promising candidate, but I would probably assume a smaller effect size when making cost-effectiveness estimates.
GiveWell’s latest model says AMF averts a death for more like $4,000, or $4,500 after accounting for leverage and funging.
GW also says: “Note that our cost-effectiveness analyses are simplified models that do not take into account a number of factors. For example, our model does not include the short-term impact of non-fatal cases of malaria prevented on health or productivity, prevention of other mosquito-borne diseases, or reductions in health care costs due to LLINs reducing the number of cases of malaria. It also does not include possible offsetting impacts or other harms. We do include possible developmental impacts on children who sleep under an LLIN.” So the true effect of AMF on happiness could be substantially greater than your estimate.
I’m increasingly skeptical of analyses that don’t try to account for long-term indirect effects/‘cluelessness’. It’s plausible that most value created/lost by most interventions is not captured by standard assessments, so the magnitude and even sign of their impact is often unclear. I’m not sure how best to deal with this though; Peter Hurford has made a few suggestions and I think there was a recent series of posts in this forum on the topic.
I wouldn’t assume the mean LS among AMF beneficiaries is the same as the national average – though this assumption may make your analysis more conservative, since I’d assume recipients of bed nets tend to have worse lives than the average.
I’m more skeptical now that mapping from affect to LS is a good approach, because they track different things, e.g. I’d expect someone with low affect because of poverty to report a lower LS than someone with the same level of affect due to mental illness, since (I think) LS responses tend to be more heavily influenced by objective circumstances.
Relatedly: I wouldn’t rely too heavily on LS as it doesn’t seem a great proxy for actual hedonic states, which are probably more important (even if you’re not a strict utilitarian). It might be the best available data so I don’t object to its being used in preliminary analyses, but it adds uncertainty to your claims. That said, I’d expect a shift to measuring affect, or some combination of affect and satisfaction, to favour mental health treatments, since mental illness has even greater impacts on mood than on LS.
My GP and some friends recommended Sleepio, a CBT-based online programme for insomnia. It’s not cheap, but if you participate in their research you get it for free, and anecdotally it seems most people who request that option are accepted eventually (I had to wait a couple of months, I think). I’m not sure how it compares to other CBT programmes; the only evidence they cite for their specific programme is a pretty small RCT (N=164, divided into 3 treatment groups) that they conducted themselves.
When it comes to drug therapy, I’m a little surprised there isn’t more attention given to mirtazapine (Remeron in the US), which is an anti-depressant that’s also sedating. The effect size for depression compares favourably to most alternatives (e.g. Cipriani et al., 2018), and there is good evidence it improves sleep in a large proportion of users (e.g. Wichniak et al., 2017). In the UK at least, it’s not supposed to be prescribed for insomnia alone, just comorbid insomnia and depression, and is considered a ‘second-line’ antidepressant after SSRIs, but I think it’s used off-label for insomnia alone in some countries.
Aside from weight gain and withdrawal effects, the main concern is that it’s mildly anti-cholinergic. Other drugs with a much stronger anti-cholinergic effect have recently been found to increase the risk of dementia in over-60s (e.g. Richardson et al., 2018), so there are theoretical grounds for suspecting it could cause non-clinical deficits in brain functioning of younger people. But chronic sleep deprivation and depression are also really bad for long- as well as short-term cognitive functioning, as are other drug therapies (e.g. diphenhydramine [Nytol/Benadryl] and other anti-histamines are much more strongly anti-cholinergic, and benzodiazepines/Z-drugs are bad for you in all kinds of ways). So if CBT etc doesn’t work, it might be worth considering.
I’m working with Paul Dolan on a report related to this topic, and the need to value states worse than dead (SWD) is our only major point of disagreement. He gives some kind of justification for his views on p26 of this report, but I find it extremely unconvincing.
But to be fair, nobody has proposed a particularly good method for dealing with SWD. Most attempts have used versions of the time trade-off, with limited success – for a slightly outdated review, see Tilling et al., 2012.
In this Facebook post I’ve listed a few options for achieving the related but more modest objective of determining the point in happiness scales that is equivalent to death.
Any deterministic analysis (using point estimates, rather than probability distributions, as inputs and outputs) is unlikely to be accurate because of interactions between parameters. This also applies to deterministic sensitivity analyses: by only changing a limited subset of the parameters at a time (usually just one) they tend to underestimate the uncertainty in the model. See Claxton (2008) for an explanation, especially section 3.
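A minimal illustration of the point, with invented distributions rather than anything from GiveWell’s model:

```python
# Toy demonstration that plugging point estimates (means) into a nonlinear
# model differs from taking the mean of a full probabilistic analysis.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

coverage = rng.beta(5, 5, size=n)                 # mean 0.5
effect = rng.lognormal(np.log(0.5), 0.5, size=n)  # median 0.5
cost = 4.0  # fixed cost per unit delivered, arbitrary units

# Cost per outcome = cost / (coverage * effect): the parameters interact
# multiplicatively, so the output is convex in each of them.
deterministic = cost / (coverage.mean() * effect.mean())
probabilistic = (cost / (coverage * effect)).mean()

print(f"point-estimate result: {deterministic:.1f}")  # ~14
print(f"probabilistic mean:    {probabilistic:.1f}")  # ~20, much higher
```

Varying one parameter at a time around its point estimate would miss much of this, because it ignores how the uncertainties combine.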
This is one reason I don’t take GiveWell’s estimates too seriously (though their choice of outcome measure is probably a more serious problem).