Level 3 headings should be supported. Unless it’s changed recently, it currently jumps from Level 2 to Level 4, which makes it hard to logically format complex documents.
Derek
I like your general approach to this evaluation, especially:
the use of formal Bayesian updating from a prior derived in part from evidence for related programmes
transparent manual discounting of the effect size based on particular concerns about the direct study
acknowledgement of most of the important limitations of your analysis and of the RCT on which it was based
careful consideration of factors beyond the cost-effectiveness estimate.
I’d like to see more of this kind of medium-depth evaluation in EA.
I don’t have time at the moment for a close look at the CEA, but aside from limitations acknowledged in your text, 3 aspects stand out as potential concerns:
1. The “conservative” and “optimistic” results are quite extreme. This seems to be in part because “conservative” and “optimistic” values for several parameters are multiplied together (e.g. DALYs gained, yearly retention rate of benefits, % completing the course, discount rates...). As you’ll know, it is highly improbable that even, say, three independent parameters would simultaneously obtain at, say, the 10th percentile: 0.1*0.1*0.1 = 0.001 (illustrated in the first sketch below these points). Did you consider making a probabilistic model in Guesstimate, Causal, Excel (with macros for Monte Carlo simulation), R, etc. in order to generate confidence intervals around the final results? (I appreciate there are major advantages to using Sheets, but it should be fairly straightforward to reproduce at least the “Main CEA” and “Subjective CEA inputs” tabs in, for example, Guesstimate. This would also enable a rudimentary sensitivity analysis.)
2. The inputs for “Yearly retention rate of benefits” (row 10) seem pretty high (0.30, 0.50, and 0.73 for conservative, best guess, and optimistic, respectively) and the results seem fairly sensitive to this parameter (see the second sketch below these points). IIRC the study this was based on only had an 8-week follow-up, which would be about half your “conservative” figure (8/52 = 0.15). Even their “extended” follow-up (without a control group) was only for another 2 months. It is certainly plausible that the benefits endure for several months, but I would say that estimates of about 0.1, 0.3, and 0.7 are more reasonable. With those inputs, the cost per DALY increases to about $47,000, $4,500, or $196. That central figure is roughly on a par with CBT for depression in high-income countries, i.e. pretty good but not comparable with developing-country interventions. (And I wouldn’t take the “optimistic” figure seriously for the reasons given in (1) above.)
3. I haven’t seen the “growth model” on which the cost estimates are based, but my guess is that it doesn’t account for the opportunity cost of facilitators’ (or participants’) time. IIRC each course is led by two “skilled” volunteers who may otherwise do another pro-social activity.
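To illustrate (1): a minimal R sketch with entirely invented distributions (none of these numbers come from the actual CEA). It shows both that three independent parameters jointly land at or below their 10th percentiles only ~0.1% of the time, and that multiplying per-parameter 10th percentiles badly understates the true 10th percentile of the combined result:

```r
# Toy Monte Carlo: three independent, invented parameter distributions.
set.seed(1)
n <- 1e5
dalys      <- rlnorm(n, meanlog = log(0.05), sdlog = 0.5)  # DALYs gained per person
retention  <- rbeta(n, 5, 5)                               # yearly retention of benefits
completion <- rbeta(n, 8, 2)                               # % completing the course
effect     <- dalys * retention * completion

# How often do all three land at or below their 10th percentiles?
mean(dalys <= quantile(dalys, 0.1) & retention <= quantile(retention, 0.1) &
     completion <= quantile(completion, 0.1))               # ~0.001, as above

# "Conservative" result from multiplying each parameter's 10th percentile
# vs the actual 10th percentile of the combined result:
prod_p10 <- quantile(dalys, 0.1) * quantile(retention, 0.1) * quantile(completion, 0.1)
p10_prod <- quantile(effect, 0.1)
c(product_of_p10s = unname(prod_p10), p10_of_product = unname(p10_prod))
```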
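And a toy illustration of (2), assuming (purely for intuition; certainly not the structure of the real spreadsheet, which I haven’t checked) that a first-year benefit of 1 is retained at rate r each subsequent year, so the undiscounted total is Σ rᵗ = 1/(1−r):

```r
# Total (undiscounted) "benefit-years" under geometric retention at rate r.
total_benefit <- function(r) 1 / (1 - r)
total_benefit(c(0.30, 0.50, 0.73))  # original inputs:  1.43, 2.00, 3.70
total_benefit(c(0.10, 0.30, 0.70))  # suggested inputs: 1.11, 1.43, 3.33
```

The real model presumably caps the number of years and applies discounting, so treat this only as intuition for the direction and rough scale of the sensitivity.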
The following is a tidy, oversimplified version of what happened.
I learned about Bentham and Mill in A-level history class (aged 17) and I think I read a Peter Singer book. I was very left-wing at the time but I remember being really frustrated that all the other altruistically-minded kids in my class supported standard leftist policies for ideological reasons even when they harmed disadvantaged people. This influenced me to study philosophy at undergrad level, where I defended utilitarianism.
Unfortunately EA hadn’t been invented at the time so I spent the first year after graduation working in warehouses and call centers, followed by about nine years of direct development work in low-income countries. I got frustrated by the inefficiency of most development orgs and decided to switch fields into either law (‘earning to give’ before I’d heard of the concept) or public health (to do direct work with more quantifiable impacts).
Around the same time I was searching online for information about charity evaluation and came across GiveWell, then the Singer TED Talk and the wider EA community. This may have influenced me to choose public health, though there were other factors (e.g. the 2008 financial crash made it even harder than usual to pursue a lucrative law career). I spent 18 months in Australia doing whatever work I could find – mostly farm labouring – to pay for my master’s course.
During the course I became more involved in EA, and got interested in health economics, especially methods for cost-effectiveness analysis. But I couldn’t get a job or PhD in health economics with a general public health background, so to save up for a second master’s I spent two more years doing mostly sub-minimum wage temp jobs, or saving dole money when I couldn’t find work (though I also got a bit of contract work with GiveWell towards the end of this period). Halfway through that course I ran out of money and had some health issues, so I took a leave of absence, during which time I worked on the 2019 Global Happiness Policy Report (Chapter 3), then got the Rethink job.
My reasons for continuing to work in EA are some mixture of those given by my colleagues.
A recent post on this forum (the fourth most popular ever, at the time of writing) argued that “randomista” development projects like AMF are probably less cost-effective than projects to promote economic growth. Do you have any thoughts on this?
Most cost-effectiveness analyses by EA orgs (and other charities) use a ratio of costs to effects, or effects to costs, as the main—or only—outcome metric, e.g. dollars per life saved, or lives affected per dollar. This is a good start, but it can be misleading as it is not usually the most decision-relevant factor.
If the purpose is to inform a decision of whether to carry out a project, it is generally better to present:
(a) The probability that the intervention is cost-effective at a range of thresholds (e.g. there is a 30% chance that it will avert a death for less than my willingness-to-pay of $2,000, 50% at $4,000, 70% at $10,000...). In health economics, this is shown using a cost-effectiveness acceptability curve (CEAC).
(b) The probability that the option with the highest expected net benefit (a term that is roughly equivalent to ‘net present value’) is in fact the most cost-effective, which can be shown with a cost-effectiveness acceptability frontier (CEAF). It’s a bit hard to get one’s head around, but the intervention most likely to be cost-effective sometimes has lower expected net benefit than an alternative, because the distribution of benefits is skewed.
(c) A value of information analysis to assess how much value would be generated by a study to reduce uncertainty. As we found in our evaluation of Donational, sometimes interventions that have a poor cost-effectiveness ratio and a low probability of being cost-effective nevertheless warrant further research; and the same can be true of interventions that look very strong on those metrics.
See Briggs et al. (2012) for a general overview of uncertainty analysis in health economics, Barton et al. (2008) for CEACs, CEAFs and expected value of perfect information, and Wilson (2014) for a practical guide to VOI analyses (including the value of imperfect information gathered from studies).
Of course, these require probabilistic analyses that tend to be more time-consuming and perhaps less transparent than deterministic ones, so simpler models that give a basic cost-effectiveness ratio may sometimes be warranted. But it should always be borne in mind that they will often mislead users as to the best course of action.
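For concreteness, a CEAC like the one described in (a) takes only a few lines of R once you have Monte Carlo samples of incremental costs and effects. A sketch with simulated inputs (the distributions are invented, not taken from any real evaluation):

```r
# CEAC sketch: probability the intervention is cost-effective at each
# willingness-to-pay threshold, from simulated incremental costs/effects.
set.seed(42)
n        <- 1e4
d_cost   <- rnorm(n, mean = 5000, sd = 2000)  # incremental cost ($)
d_effect <- rnorm(n, mean = 1.0,  sd = 0.6)   # incremental deaths averted

thresholds <- seq(0, 20000, by = 500)         # WTP per death averted ($)
prob_ce <- sapply(thresholds, function(wtp) mean(wtp * d_effect - d_cost > 0))

plot(thresholds, prob_ce, type = "l", ylim = c(0, 1),
     xlab = "Willingness to pay per death averted ($)",
     ylab = "Probability cost-effective")
```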
[EDIT: I no longer endorse all of this comment. After looking more closely at the papers, I’m more confident that the spillover effects of the latest version of the program are neutral to positive (at least on humans – growth in meat consumption is an important caveat).]
Thanks for posting this.
Though not reported here, I was pleased to see that non-market effects were also recorded in the study, and that these were neutral or positive for both recipients (‘treated households’) and non-recipients.
For treated households, we find positive and significant effects for four of the six indices: psychological well-being, food security, education and security [i.e. crime rates]. Estimated effects are close to zero and not significant for the health index and female empowerment index. When looking at total effects including spillovers for the treated, we find a similar pattern for all but the security index. For untreated households, we find no significant effects of local cash transfers except for the education index, which is higher by 0.1 SD (p < 0.10). Importantly, we do not find evidence of adverse spillover effects for untreated households on any of the indices, with point estimates positive for all but the security index, which is indistinguishable from zero (-0.02 SD, SE 0.07).
I’m particularly interested in the “psychological wellbeing” index, which Appendix C1 says comprises a “weighted, standardized average of depression (10 question CES-D scale), happiness, life satisfaction, and perceived stress (PSS-4)”. I would like to know: (a) what measures were used for “happiness” and “life satisfaction”; (b) how the components of the index were weighted; and most of all (c) a breakdown of scores for each measure. I can’t find this information in the paper.
I’m asking because there is a fair amount of research suggesting that one person’s income increase causes wellbeing declines among other members of the community (i.e. people feel worse when their neighbour gets richer), at least for some accounts of wellbeing. For instance, Haushofer, Reisinger, & Shapiro (2019) found that neighbours of GiveDirectly cash recipients experienced a decline in psychological wellbeing (seemingly measured by a similar index to the one used in the most recent study) about half as great as the psychological wellbeing benefit to the recipient. Depending on how many neighbours are affected by each transfer, this would seem to indicate that GiveDirectly may have a net negative effect on aggregate wellbeing. However, this effect was driven entirely by life satisfaction, an ‘evaluative’ or ‘cognitive’ measure; there were no negative spillovers on measures of ‘hedonic’ wellbeing, namely “happiness”, “stress”, and “depression”. As the authors note:
This result is intuitive: the wealth of one’s neighbors may plausibly affect one’s overall assessment of life, but have little effect on how many positive emotional experiences one encounters in everyday life. This result complements existing distinctions between these different facets of well-being, e.g. the finding that hedonic well-being has a “satiation point” in income, whereas evaluative well-being may not (Kahneman and Deaton, 2010).
Without seeing the disaggregated scores from the new study, it seems possible that there were non-trivial and statistically significant harms (or benefits) according to some components of the index. This matters to those with a preferred moral theory or conception of wellbeing, e.g. a classical utilitarian probably cares more about hedonic states than life evaluations, and a prioritarian more about severe states like depression than positive ones like happiness.
This is a recognised issue in health technology assessment. The most common solution is to first plot the incremental costs and effects on a cost-effectiveness plane to get a sense of the distributions:
Then to represent uncertainty in terms of the probability that an intervention is cost-effective at different cost-effectiveness thresholds (e.g. £20k and £30k per QALY). On the CEP above this is the proportion of samples below the respective lines, but it’s generally better represented by cost-effectiveness acceptability curves (CEACs), as below:
Often, especially with multiple interventions, a cost-effectiveness acceptability frontier (CEAF) is added, representing the probability that the optimal decision (i.e. the one with highest expected net benefit) is the most cost-effective.
I can dig out proper references and examples if it would be useful, including Excel spreadsheets with macros you can adapt to generate them from your own data (such as samples exported from Guesstimate). There are also R packages that can do this, e.g. hesim and bcea.
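In the meantime, here is a rough base-R sketch of the plane and the threshold calculations, using invented samples of the kind you might export from Guesstimate:

```r
# Cost-effectiveness plane from simulated incremental costs and QALYs.
set.seed(7)
n      <- 5e3
d_qaly <- rnorm(n, mean = 0.3,  sd = 0.25)
d_cost <- rnorm(n, mean = 4000, sd = 3000)

plot(d_qaly, d_cost, pch = ".",
     xlab = "Incremental QALYs", ylab = "Incremental cost (£)")
abline(h = 0, v = 0)                 # quadrant boundaries
abline(a = 0, b = 20000, lty = 2)    # £20k/QALY threshold line
abline(a = 0, b = 30000, lty = 3)    # £30k/QALY threshold line

# Proportion of samples below each threshold line (= points on a CEAC):
mean(20000 * d_qaly - d_cost > 0)
mean(30000 * d_qaly - d_cost > 0)
```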
Can you explain your ’20 minute rule’?
Hi Oliver.
Thanks for your comments.
I think there are some reasonable points here.
• I certainly agree that the model is overkill for this particular evaluation. As we note at the beginning, this was a ‘proof of concept’ experiment in more detailed evaluation of a kind that is common in other fields, such as health economics, but is not often (if ever) seen in EA. In my view – and I can’t speak for the whole team here – this kind of cost-effectiveness analysis is most suitable for cases where (a) it is not possible to run a cheap pilot study with short feedback loops, (b) there is more actual data to populate the parameters, and (c) there is more at stake, e.g. it is a choice between one-off funding of several hundred thousand dollars or nothing at all.
• I would also be interested to see an explicit case against Donational.
However, I’d like to push back on some of your criticisms as well, many of which are addressed in the text (often the Executive Summary).
• A description of what Donational has done so far, and the plans for CAP, is in the Introducing Donational section. This could also constitute a basic argument for Donational, but maybe you mean something else by that. I don’t know what you want to know about its operations beyond what is in this section and the Team Strength section. If you tell us what exactly you think is missing, maybe we can add it somewhere.
• We don’t give “an explanation of a set of cruxes and observations that would change the evaluators mind” as such, but we say what the CEE is most sensitive to (and that the pilot should focus on those), which sounds like kind of the same thing, e.g. if the number and size of donations were much higher or lower then our conclusions would be different. I’ve added a sentence to the relevant part of the Exec Summary: “The base case donation-cost ratio of around 2:1 is below the 3x return that we consider the approximate minimum for the project to be worthwhile, and far from the 10x or higher reported by comparable organizations. The results are sensitive to the number and size of pledges (recurring donations), and CAP’s ability to retain both ambassadors and pledgers. Because of the high uncertainty, very rough value of information calculations suggest that the benefits of running a pilot study to further understand the impact of CAP would outweigh the costs by a large margin.” EDIT: We also address potential ‘defeaters’ in the Team Strength section, and note that we would be reluctant to support a project with a high probability of one or more major indirect harm, or that looked very bad according to one or more plausible worldviews. This strongly implies at least some of the observations that would change our mind.
• We mention an early BOTEC of expected donations (which I assume is similar to the Fermi estimate that you’re suggesting) in at least three places. This includes the Model Verification section where I note that “Parameter values, and final results, were compared to our preliminary estimates, and any major disparities investigated.” Maybe I should have been clearer that this was the BOTEC, and perhaps we should have published the BOTEC alongside the main model.
• We make direct comparisons with OFTW, TLYCS, and GWWC throughout the CEA and to a lesser extent in other sections, and explain why we don’t fully model their costs and impacts alongside CAP.
• “after writing down their formal models and truly understanding their consequences, most decision makers are well-advised to throw away the formal models and go with what their updated gut-sense is.”
That’s kind of what we did. We converted the CE ratio, and scores on other criteria, to crude Low/Medium/High categories, and made a somewhat subjective final decision that was informed by, but not mechanistically determined by, those scores and other information. A more purely intuition-driven approach would likely have either enthusiastically embraced the full CAP or rejected it entirely, whereas a formal model led us to what we think is a more reasonable middle ground (though we may have arrived in a similar place with a simpler model).
• Even for this evaluation, there was some value in the more ‘advanced’ methods. E.g. the VOI calculation, rough though it was, was important for deciding how much to recommend be spent on a pilot; and our final CEE (<2x) was a fair bit lower than the BOTEC (about 3-4x), largely because of the more pessimistic inputs we elicited from people with a more detached perspective, and more precise modelling of costs.

It seems like a large part of the problem is that most people don’t have time to read such a long post in detail. In future we should perhaps do a more detailed Exec Summary, and I’ll consider expanding this one further if there is enough demand.
Thanks again for engaging with this!
I’m not a philosopher, but to the extent I have opinions on such things they are about the same as Moss’s, i.e. classical hedonistic utilitarianism with quite a lot of moral uncertainty. I have somewhat suffering-focused intuitions but (a) I’ve never seen a remotely convincing argument for a suffering-focused ethic, and (b) I think my intuitions – and, I suspect, those of many people who identify as suffering-focused – can be explained by other factors. In particular, I think there are problems with the scales people use to measure valence/wellbeing/value of lives, both in reality and in thought experiments, e.g. it seems common for philosophers to assume a symmetrical scale like −10 to +10, whereas it seems pretty obvious to me that the worst lives – or even, say, the 5th percentile of lives – are many times more bad than the best lives are good. So if the best few percent of lives are 10/10 and 0 is equivalent to being dead, the bottom few percent of any large population are probably somewhere between −100 and −100,000. (It is not widely appreciated just how awful things are for so many people.) If true, classical utilitarianism may have policy implications similar to prioritarianism and related theories, e.g. more resources for the worst off (assuming tractability). But I haven’t seen much literature on these scale issues so I’m not confident this is correct. If you know of any relevant research, preferably peer-reviewed, I’d be very interested.
Do you think adopting subjective wellbeing as your primary focus would materially affect your recommendations?
In particular:
(a) Would using SWB as the primary outcome measure in your cost-effectiveness analysis change the rank ordering of your current top charities in terms of estimated cost-effectiveness?
(b) If it did, would that affect the ranking of your recommendations?
(c) Would it likely cause any of your current top charities to no longer be recommended?
(d) Would it likely cause the introduction of other charities (such as ones focused on mental health) into your top charity list?
This is very good, but I think busy (or unmotivated) EAs without much exercise experience would benefit from even more specific recommendations, especially for resistance exercises (i.e. strength training).
I found the Start Bodyweight program useful when beginning resistance training at home with no equipment other than a pull-up bar. An EA recommended the book Overcoming Gravity for more detailed information on bodyweight exercises.
I now prefer to use the gym. At a glance, the following (which I just found with a quick Google search) seem like sensible gym-based* options for beginners, but maybe you have better ideas.
https://stronglifts.com/5x5/ [I’d add some core exercises to this, like situps and planks]
https://www.shape.com/fitness/workouts/strength-training-beginners
When I’m too busy to do the full range of strength and cardio (or when I’m travelling), I sometimes do moderate/high-intensity interval classes at home using YouTube videos. The Body Coach is pretty good—he has videos with a range of difficulty (beginner to advanced), duration (10 min+), and muscle focus (legs, upper body, abs, full-body, etc). There are also videos meeting specific needs, e.g. low-impact routines so you don’t disturb your neighbours or hurt your knees, and ones designed for small spaces. This kind of thing is perhaps the most efficient form of exercise: you can do it anywhere, it doesn’t require any equipment, it’s free, it covers both cardio and strength, and it doesn’t take much time.
When travelling, I also take a resistance band. If you choose the weight carefully, a single band (which folds up to the size of a cigarette packet) can arguably substitute for any dumbbell that you’d use in the gym, and some of the machines as well. (The main thing you’re lacking is the ability to do deadlifts, but there are ways around that too.)
I’ve heard some EAs recommend GymPass, especially if you travel a lot and don’t like to exercise alone.
Feel free to correct me on any of this – I don’t have any relevant expertise.
*They could obviously be done at home if you buy the equipment. The last one just needs dumbbells or resistance bands, which are pretty cheap.
There is much to be admired in this report, and I don’t find it intuitively implausible that mental health interventions are several times more cost-effective than cash transfers in terms of wellbeing (which I also agree is probably what matters most). That said, I have several concerns/questions about certain aspects of the methodology, most of which have already been raised by others. Here are just a few of them, in roughly ascending order of importance:
Outcomes should be time-discounted, for at least two reasons. First, to account for uncertainty as to whether they will obtain, e.g. there could be no counterfactual benefit in 10 years because of social upheaval, catastrophic events (e.g. an AI apocalypse, natural disaster), or the availability of more effective treatments for depression/ill-being/poverty. Second, to account for generally improving circumstances and opportunities for reinvestment: these countries are generally getting richer, people can invest cash transfers, etc. This will be even more important when assessing deworming and other interventions with benefits far in the future. (There is probably no need to discount costs as it seems they are incurred around the time the intervention is delivered in both cases.)
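A minimal sketch of the mechanics in R, with an arbitrary 4% rate (purely illustrative):

```r
# Present value of a yearly benefit stream with exponential discounting.
discounted_total <- function(benefits, rate = 0.04) {
  years <- seq_along(benefits) - 1      # first benefit accrues now (year 0)
  sum(benefits / (1 + rate)^years)
}
discounted_total(rep(1, 10))            # 10 years of benefit 1/yr -> ~8.44, not 10
```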
I’ve only skimmed the reports, but it isn’t clear to me what exactly is included in the costs for StrongMinds, e.g. sometimes capital costs (buildings etc), or overheads like management salaries and rent, are incorrectly left out of cost-effectiveness analyses. If you haven’t already, you might also want to consider any costs to the beneficiaries, e.g. if therapy recipients had to travel, pay for materials, miss work, etc. As you note, most of the difference in the cost-effectiveness is determined by the programmes’ costs rather than their consequences, so it’s important to get this right (which you may well have done).
You note that both interventions are assessed only in terms of their effect on depression. A couple years ago I summarised the findings of the four available evaluations of GiveDirectly in an unpublished draft post (see Appendix 2.1, copied below, and the “GiveWell” subsection of section 2.2, the relevant part of which is copied below). The studies recorded data on many other indicators of wellbeing, which were sometimes combined into indices of “psychological wellbeing” with up to 10 components (as well as many non-wellbeing outcomes like consumption and education). Apologies if you explain this somewhere, but why did you only use the data on depression? Was it to facilitate an ‘apples to apples’ comparison, or something like that? If so, I wonder if that was loading the dice a bit: at first blush, it seems unfair to compare two interventions in terms of outcome A when one is aimed solely at improving outcome A and the other is aimed at improving outcomes A, B, C, D, E, F, G and H (at least when B–H are relevant, i.e. indicators of subjective wellbeing).
I share others’ concerns about the omission of spillovers. In the draft post I linked above (partly copied below), I recorded my impression that the evidence so far, while somewhat lacking, suggests only null or positive spillovers to other households (at least for the current version of the programme, which ‘treats’ all eligible households in the village). As part of a separate project I did last year (which I’m not allowed to share), I also concluded that non-recipients within the household benefited considerably: “Only about 1.6 members of each household (average size ~4.3) were surveyed to get the wellbeing results, of which only 1 actually received the money. There was no statistically significant wellbeing difference between the recipients and surveyed non-recipient household members, and there is evidence of many benefits to non-recipients other than psychological wellbeing (e.g. education, domestic violence, child labour). Nevertheless, we expect the effects to be a little lower among non-recipients…” Omitting the inter-household spillovers is perhaps reasonable for the primary analysis, but it seems harder to justify ignoring benefits to others within the household.
Whatever may be justified for the base case, I don’t understand why you haven’t done a proper sensitivity analysis. Stochastic uncertainty is captured well by the Monte Carlo simulations, but it is standard practice in many fields (including health economics) to carry out scenario analyses that investigate the effects of contestable structural and methodological assumptions. It should be quite straightforward to adapt the model so as to include/exclude (or vary the values of) spillovers, non-depression data, certain kinds of costs, discount rates, etc. You can present the results of these analyses yourself, but users can also put their own set of assumptions in a well-constructed model to see how that changes things. (Many other analyses are also potentially helpful, especially when the difference in cost-effectiveness between the alternatives is relatively small, e.g. deterministic one-way and two-way analyses that show how the cost-effectiveness ratio changes with high/low values for each parameter; threshold analyses that show what value a parameter must attain for the ‘worse’ programme to become the more cost-effective; value of information, showing how much it would be worth spending on further studies to reduce uncertainty; and perhaps most usefully in this case, a cost-effectiveness acceptability curve indicating the probability that StrongMinds is cost-effective at a given threshold, such as the 3-8x GiveDirectly that GiveWell is currently using as its bar for new charities. Some examples are here.)
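As an illustration of how little extra work some of these take once you have samples: expected value of perfect information at a given threshold is essentially one line (all figures below are invented, not outputs of the actual models):

```r
# Expected value of perfect information, from net benefit samples
# (wtp * effect - cost) for two options.
set.seed(3)
n    <- 1e4
nb_a <- rnorm(n, mean = 100, sd = 80)   # option A: higher mean, more uncertain
nb_b <- rnorm(n, mean = 90,  sd = 20)   # option B

evpi <- mean(pmax(nb_a, nb_b)) - max(mean(nb_a), mean(nb_b))
evpi  # upper bound on what it's worth paying to resolve the uncertainty
```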
Topic 2.2: (Re-)prioritising causes and interventions
[…]
GiveWell
[…]
Spillover effects
Secondly, there are also potential issues with ‘spillover effects’ of increased consumption, i.e. the impact on people other than the beneficiaries. This is particularly relevant to GiveDirectly, which provides unconditional cash transfers; but consumption is also, according to GiveWell’s model, the key outcome of deworming (Deworm the World, Sightsavers, the END Fund) and vitamin A supplementation (Helen Keller International). Evidence from multiple contexts suggests that, to some extent, the psychological benefits of wealth are relative: increasing one person’s income improves their SWB, but this is at least partly offset by decreases in the SWB of others in the community, particularly on measures of life satisfaction (e.g. Clark, 2017). If increasing overall wellbeing is the ultimate aim, it seems important to factor these ‘side-effects’ into the cost-effectiveness analysis.
As usual, GiveWell provides a sensible discussion of the relevant evidence. However, it is somewhat out of date and does not fully report the findings most relevant to SWB, so I’ve provided a summary of wellbeing outcomes from the four most relevant papers in Appendix 2.1. In brief:
All four studies found positive treatment effects, i.e. improvement to the psychological wellbeing of cash recipients, though in two cases this finding was sensitive to particular methodological choices.
Two studies of GiveDirectly found negative psychological spillovers.
Two found only null or positive spillovers.
As GiveWell notes, it is hard to aggregate the evidence on spillovers (psychological and otherwise) because of:
Major differences in study methodology (e.g. components of the psychological wellbeing index, type of control, inclusion/exclusion criteria, follow-up period).
Major differences in the programs being studied (e.g. size of transfers, proportion of households in a village receiving transfers).
Absence of key information (e.g. how many non-recipient households are affected by spillover effects for each treated household, how the magnitude of spillovers changes with distance and over time, how they differ among eligible and ineligible households).
Like GiveWell, I suspect the adverse happiness spillovers from GiveDirectly’s current program are fairly small. In order of importance, these are the three main reasons:
The negative findings were based on within-village analyses, i.e. comparing treated and untreated households in the same village. These may not be relevant to the current GiveDirectly program, which gives money to all eligible households in treated villages (and sometimes all households in the village). The two studies that investigated potential spillovers in untreated villages in the same area as the treated ones found no statistically significant effect.
Egger et al. (2019) (the “general equilibrium” study), which found only null or positive spillovers, was by far the largest, seems to have had the fewest methodological limitations, and investigated a version of the program most similar to current practice.
At least one of the ‘negative’ studies, Haushofer & Shapiro (2018), had significant methodological issues, e.g. differential attrition rates and lack of baseline data on across-village controls (though results were fairly robust to authors’ efforts to address these).
In addition, any psychological harm seems to be primarily to life satisfaction rather than hedonic states. As noted in Haushofer, Reisinger, & Shapiro (2019): “This result is intuitive: the wealth of one’s neighbors may plausibly affect one’s overall assessment of life, but have little effect on how many positive emotional experiences one encounters in everyday life. This result complements existing distinctions between these different facets of well-being, e.g. the finding that hedonic well-being has a “satiation point” in income, whereas evaluative well-being may not (Kahneman and Deaton, 2010).” This is reassuring for those of us who tend to think feelings ultimately matter more than cognitive evaluations.
Nevertheless, I’m not extremely confident in the net wellbeing impact of GiveDirectly.
Non-trivial comparison effects are found in many other contexts, so it is perhaps reasonable to expect them here too. (I haven’t properly looked at that evidence so I’m not sure how strong my prior should be.)
As with any metric, there are various potential biases in wellbeing measures that could lead to under- or over-estimation of effects. When assessing the actual effect on wellbeing/welfare/utility (rather than on the specific measures of wellbeing used in the study), we should consider the evidence in the context of other findings that I haven’t discussed here.
Even a negative spillover with a very small effect size, which seems plausible in this case, could offset much or all of the positive impact. For instance, if recipient households gain 1 happiness point from the transfer, but every transfer causes 10 other households to lose 0.1 points for the same duration, the net effect is neutral.
I have only summarised the relevant papers; I haven’t tried to critique them in detail. GiveWell has also not analysed the latest versions of some of the key studies, which differ considerably from the working papers, so they might uncover some issues that I haven’t spotted.
A few more notes on interpreting the wellbeing effects of GiveDirectly:
As with other health and poverty interventions, I suspect the overall, long-run impact will be more sensitive to unmeasured and unmodeled indirect effects (e.g. consumption of factory-farmed meat, population size, CO2 emissions) than to methods for estimating welfare (e.g. SWB instruments vs consumption). But I’m leaving these broader issues with short-termist methodology aside for now.
The mechanisms of any adverse wellbeing effects have not been established in this case, and may not be pure psychological ‘comparison effects’ (jealousy, reduced status, etc). For instance, they could be mediated through consumption (e.g. poorer households selling goods to richer ones) or through some other, perhaps culture-specific, process.
Like any metric, SWB measures are imperfect. So even when SWB data are available, an assessment of the SWB effects of an intervention may be improved by taking into account information on other outcomes, plus ‘common sense’ reasoning.
In addition, I would note that the other income-boosting charities reviewed by GiveWell could potentially cause negative psychological spillovers. According to GiveWell’s model, the primary benefit of deworming and vitamin A supplementation is increased earnings later in life, yet no adjustment is made for any adverse effects this could have on other members of the community. As far as I can tell, the issue has not been discussed at all. Perhaps this is because these more ‘natural’ boosts to consumption are considered less likely to impinge on neighbours’ wellbeing than windfalls such as large cash transfers. But I’d like to see this justified using the available evidence.
I make some brief suggestions for improving assessment of psychological spillover effects in the “potential solutions” subsection below.
Four studies investigated psychological impacts of GiveDirectly transfers. Two of these found wellbeing gains for cash recipients (“treatment effects”) and only null or positive psychological spillovers:
Haushofer & Shapiro (2016) (9-month follow-up)
0.26 standard deviation (SD; p<0.01), positive, within-village treatment effect (i.e. comparing treated and untreated households in the same village) on an index of psychological wellbeing with 10 components (Table IV, p. 2011).
Statistically significant benefits for (in decreasing order of magnitude) Depression, Stress, Life Satisfaction, and Happiness at the 1% level, and Worries at the 10% level. Null effects (at the 10% level) on Cortisol, Trust, Locus of Control, Optimism, and Self-esteem (though point estimates were mostly positive).
Null, precise, within-village spillover effect on the index of psychological wellbeing; point estimate positive (0.1 SD; Table III, p. 2004).
Egger et al. (2019) (the “general equilibrium” study)
0.09 SD (p<0.01) within-village treatment effect (i.e. assuming all spillovers are contained within a village) on a 4-item index of psychological wellbeing.
Driven entirely by Life Satisfaction; no effect on Depression, Happiness, or Stress. (See this table, which the authors kindly sent to me on request.)
0.12 SD (p<0.1) “total” treatment effect (both within-village and across-village) on psychological wellbeing.
Driven by Happiness (0.15 SD; p<0.05); no others significant at the 10% level. (See this table.)
Null, fairly precise “total” spillover effect (combining within- and across-village effects) on the index of psychological wellbeing (and on every individual component); point estimate small and positive (0.08 SD). (See this table.)
Note: GiveWell reports a positive, statistically significant within-village spillover effect on psychological wellbeing of about 0.1 SD, based on an earlier draft of the paper. I can’t find this in the published paper; perhaps it was cut because of the authors’ stated preference for the “total” specification.
However, two studies are more concerning:
Haushofer & Shapiro (2018) (3-year follow-up; working paper)
Within-village 0.16 SD (p<0.01) treatment effect on an 8-component index of psychological wellbeing (Table 3, p. 16).
Driven primarily by improvements to Depression and Locus of Control (p<0.05), followed by Happiness and Life Satisfaction (p<0.1). No statistically significant (at the 10% level) change in Stress, Trust, Optimism, and Self-esteem. (Table B.7, p. 55)
Null across-village treatment effect on psychological wellbeing (Table 5, p. 22).
Approx. −0.2 SD (p<0.01) adverse psychological wellbeing spillover on untreated households in treated villages (Table 7, p. 26).
Driven by Stress (p<0.01), Depression (p<0.05), Happiness (p<0.1), and Optimism (p<0.1). No statistically significant (at the 10% level) change in Life Satisfaction, Trust, Locus of control, or Self-esteem. (Table B.15, p. 63)
Haushofer, Reisinger, & Shapiro (2019)
A 1 SD increase in own wealth causes a 0.13 SD (p<0.01) increase in the psychological well-being index (p.13; Table 3, p. 27).
At the average change in own wealth of eligible (thatched-roof) households of USD 354, this translates into a treatment effect of 0.09 SD.
At the average transfer of $709 among treated households, this translates into a treatment effect of 0.18 SD.
Driven by Happiness and Stress (p<0.01) then Life Satisfaction and Depression (p<0.05). No statistically significant (at the 10% level) effect on Salivary Cortisol. (Table 5, p. 29)
A 1 SD increase in village mean wealth (i.e. neighbours in one’s own village having a larger average transfer size) causes a decrease of 0.06 SD in psychological well-being over a 15 month period, only significant at the 10% level (p. 14; Table 3, p. 27).
At the average cross-village change in neighbours’ wealth of $327, this translates into an effect of −0.2 SD.
Driven entirely by Life Satisfaction (0.14 SD; p<0.01; p. 15; Table 5, p. 29)
At a change in neighbours’ wealth of $327, this translates into a Life Satisfaction effect of −0.4 SD (which is much larger than the own-wealth benefit, but less precisely estimated).
Subgroup analysis 1: No statistically significant within-village difference between treated and untreated households in psychological wellbeing effects of a change in neighbours’ wealth. (This suggests that what matters is how much more your neighbours received, not whether you received any transfer.)
Subgroup analysis 2: No statistically significant within-village difference in the psychological wellbeing effect of a change in neighbours’ wealth between households below versus above the median wealth of their village at baseline. (This suggests poorer households did not suffer more adverse psychological spillovers than wealthier ones.)
Methodological variations: Broadly similar results using alternative measures of the change in village mean wealth. (See p. 17 and Tables A.9–A.14 for details.)
No effect of village-level inequality on psychological wellbeing (holding constant one’s own wealth) over any time period and using three alternative measures of inequality.
Note: GiveWell’s review of an earlier version of the paper reports a “statistically significant negative effect on an index of psychological well-being that is larger than the short-term positive effect that the study finds for receiving a transfer, but the negative effect becomes smaller and non-statistically significant when including data from the full 15 months of follow-up… The authors interpret these results as implying that cash transfers have a negative effect on well-being that fades over time.” I’m not sure why the authors removed those analyses from the final version.
Any deterministic analysis (using point estimates, rather than probability distributions, as inputs and outputs) is unlikely to be accurate because of interactions between parameters. This also applies to deterministic sensitivity analyses: by only changing a limited subset of the parameters at a time (usually just one) they tend to underestimate the uncertainty in the model. See Claxton (2008) for an explanation, especially section 3.
This is one reason I don’t take GiveWell’s estimates too seriously (though their choice of outcome measure is probably a more serious problem).
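A minimal demonstration of the problem in R, with invented lognormal inputs: plugging mean inputs into a nonlinear model does not give the mean output.

```r
# E[cost]/E[effect] is not E[cost/effect].
set.seed(11)
cost   <- rlnorm(1e5, meanlog = log(100), sdlog = 0.5)
effect <- rlnorm(1e5, meanlog = log(2),   sdlog = 0.5)
mean(cost) / mean(effect)   # deterministic "mean inputs" answer (~50)
mean(cost / effect)         # expected ratio (~64): meaningfully different
```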
This is very useful – thanks for writing it up.
This heterogeneity across intervention types means that we should be cautious about broad claims about the efficacy of mindfulness for depression and anxiety.
True, but that applies equally to claims of null or small effect sizes, e.g. some forms of mindfulness could be very effective even if ‘on average’ it’s not. Did any of the meta-analyses contain useful subgroup analyses?
(For what it’s worth, a few years ago I used the Headspace app ~5x/week for 3 months and found it to be actively detrimental to my mood. Anecdotally, this seem fairly common: https://www.theguardian.com/lifeandstyle/2016/jan/23/is-mindfulness-making-us-ill)
Thanks for the reply. I don’t have much more time to think about this at the moment, but some quick thoughts:
On time discounting: It might have been reasonable to omit discounting in this case for the reasons you suggest, but (a) it limits comparability across analyses if you or others do it elsewhere; (b) for various reasons, it would be good to have some estimate of the absolute, not just relative, costs and effects of these interventions; and (c) it’s pretty easy to implement in most software, e.g. Excel and R (maybe less so in Guesstimate), so there isn’t usually much reason not to do it.
On costs: (a) You only seem to measure depression, so if costs affect some other aspect of SWB then your analysis will not account for it. (b) It is also a good idea, where feasible, to account for non-monetary costs, such as lost time spent with family, and informal caregiver time. In this case, these are probably best covered by SWB outcomes, rather than being monetised, but since they involve spillovers on people other than the patient, they were not captured in this case. (c) Your detailed CEA of StrongMinds does not make it entirely clear what you mean by “all costs”; it just says “Our estimates of the average cost for treating a person in each programme are taken directly from StrongMinds’ accounting of its costs from 2019,” with no details about those accounts. For example, if they bought an expensive building in which to deliver training in 2018, that cost should normally be amortised over future years (roughly speaking, shared among future beneficiaries for the life of the building). So simply looking at 2019 expenditure does not necessarily capture “all costs”. I suggest reading Chapter 7 of Drummond et al to begin with, for a discussion of practical and conceptual issues in costing of health interventions.
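To illustrate the amortisation point with invented figures, the standard approach (see the Drummond chapter above) spreads the outlay over the asset’s life via an equivalent annual cost:

```r
# Equivalent annual cost of a capital purchase (all figures invented).
equiv_annual_cost <- function(capital, years, rate = 0.04) {
  annuity_factor <- (1 - (1 + rate)^-years) / rate
  capital / annuity_factor
}
equiv_annual_cost(100000, years = 20)  # ~7,358/yr, rather than 100k in year one
```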
On the focus on depression data: My “loading the dice” comment wasn’t about SDB/demand effects. Suppose, for example, that you want to compare intervention A, which treats both depression and severe physical pain; and intervention B, which only treats depression. You find that B reduces depression by more per dollar than A, so you conclude it is more cost-effective than A, and recommend it to donors. But it’s not really a fair comparison: you don’t know whether the overall benefit per dollar is greater in B than A, because you are ignoring the pain-relieving effects, which are likely greater in A. I haven’t looked at the GD data recently, but I can imagine something like that going on here, e.g. the cash has all sorts of benefits that aren’t captured by the depression measure, whereas the psychotherapy could have few such benefits.
On spillovers: I’m glad you are updating the analysis. To be frank, I think you probably shouldn’t have published this analysis in its current state, primarily due to the omission of spillovers. It’s just too misleading.
On sensitivity analysis: Also pleased you are going to add some of these. You’re right that some take longer than others, and it’s hard/impossible to do some of them in Guesstimate. But I think you can export the samples from Guesstimate to Excel, which should allow you to do some of the key ones without too much work, e.g. EVPI and CEAC/CEAF just need a simple macro and graph; see my Donational model for examples. (For extra usability and flexibility, you can do it in R and make a Shiny web app, but that takes a lot more work.)
This paper, the Drummond book above, and this book are good starting points if you want to learn how to do cost-effectiveness analysis (including sensitivity analysis).
A couple nitpicks:
Your title is misleading: this isn’t/these aren’t “meta-analyses comparing the cost-effectiveness of cash transfers and psychotherapy”. AFAICT, you are doing a cost-effectiveness analysis informed by meta-analyses of the effects of the two interventions. You aren’t doing a meta-analysis of cost-effectiveness studies.
The y axes of your graphs, and some of your tables, say things like “Effects of Depression Improvement”. As far as I can tell, these are showing the effects of the interventions on depression/SWB/MHa in terms of SD. They aren’t, for example, showing the effects of depression (i.e. the consequences of depression for something else), as implied by this wording.
Hi Michael. Thanks for the feedback.
A few general points to begin with:
I think it’s generally fine to use terminology any way you like as long as you’re clear about what you mean.
In this piece I was summarising debates in health economics, and my framing reflects that literature.
The main objective of these posts is to highlight particular issues that may deserve further attention from researchers, and sometimes that has to come at the expense of conceptual rigour (or at least I couldn’t think of a way to avoid that tradeoff). Like you, my natural inclination is to put everything in mutually exclusive and collectively exhaustive categories, but that doesn’t always result in the most action-relevant information being front and centre.
To address your specific points:
I try to make it very clear what I mean by “welfarism” and its alternatives:
The QALY originally emerged from welfare economics, grounded in expected utility theory (EUT), which defined welfare in terms of the satisfaction of individual preferences. QALYs were intended to reflect, at least approximately, the preferences of a rational individual decision-maker (as described by the von Neumann-Morgenstern [vNM] axioms) concerning their own health, and could therefore properly be called utilities.
Others have argued that QALYs should not represent utility in this sense. These “non-welfarists” or “extra-welfarists” typically believe things like equity, capability, or health itself are of intrinsic value (Brouwer et al., 2008; Coast, Smith, & Lorgelly, 2008; Birch & Donaldson, 2003; Buchanan & Wordsworth, 2015). If such considerations are included in the QALY, the (welfarist) utility of patients may not change proportionally with the size of QALY gains.
Most criticism of HALYs has come from three broad camps: welfare economics (which aims to maximise the satisfaction of individual preferences), extra-welfarism (which has other objectives), and wellbeing (often but not always from a classical utilitarian perspective).
In a nutshell, welfarists complain that QALYs, and CEAs based on them, do not reflect the preferences of rational, self-interested utility-maximizers.
Extra-welfarists, on the other hand, generally think the QALY (and CEA more broadly) is currently too welfarist. Though extra-welfarism is ill-defined and encompasses a broad range of views, the uniting belief is that there is inherent value in things other than the satisfaction of individuals’ preferences (Brouwer et al., 2008).
For the welfarist, there are broader efficiency-related issues with using cost-per-HALY CEAs for resource allocation […] Therefore, counting everyone’s health the same does not maximise utility in the welfarist sense, even within the health sector.
So it should be clear that welfarism, as the term is used in modern (health) economics, offers a very specific theory of value (satisfaction of rational, self-regarding preferences that adhere to the axioms of expected utility theory) that is much more narrow than most desire theories. That said, I agree welfarism, extra-welfarism, and wellbeing-oriented ideas are not entirely distinct categories, and note overlaps between them:
Hedonism: … This is associated with the classical utilitarianism of Jeremy Bentham and John Stuart Mill, classical economics (mid-18th to late 19th century)…
Desire theories: Wellbeing consists in the satisfaction of preferences or desires. This is linked with neoclassical (welfare) economics, which began defining utility/welfare in terms of preferences around 1900 (largely because they were easier to measure than hedonic states), preference utilitarianism, …
Objective list theories: Wellbeing consists in the attainment of goods that do not consist in merely pleasurable experience nor in desire-satisfaction (though those can be on the list). … These have influenced some conceptions of psychological wellbeing,[46] and many extra-welfarist ideas. The capabilities approach also falls under this heading…
I mention distributional issues in the context of extra-welfarism:
These “non-welfarists” or “extra-welfarists” typically believe things like equity, capability, or health itself are of intrinsic value (Brouwer et al., 2008; Coast, Smith, & Lorgelly, 2008; Birch & Donaldson, 2003; Buchanan & Wordsworth, 2015). If such considerations are included in the QALY, the (welfarist) utility of patients may not change proportionally with the size of QALY gains.
Descriptively, it seems the extra-welfarists are winning. Although QALYs, and CEA as a whole, do not generally include overt consideration of distributional factors, they do depart from traditional welfare economics in a number of ways …
This “QALY egalitarianism” is often challenged by welfarists on the grounds that WTP varies among individuals, but many extra-welfarists reject it for other reasons. For example, some have argued that more value should be attached to health gained by the young—those who have not yet had their “fair innings”—than by the elderly (Williams, 1997); by those in a worse initial state of health, or for larger individual health gains[43] (e.g., Nord, 2005); by those who were not responsible for their illness (e.g., Dworkin, 1981a, 1981b); by those at the end of life, as currently implemented by NICE; or by people of low socioeconomic status.[44]
They are addressed further in Part 2, where I discuss how HALYs should be aggregated.
I do think I could perhaps have been clearer about the distinction between HALYs and economic evaluation (the latter is typically HALY-maximising, but doesn’t have to be), and analogously between the unit of value (e.g. wellbeing, health) and moral theory (utilitarianism, egalitarianism, etc). I may edit the post later if I have time.
What you call problem 2 I’d reframe as expectations =/= reality.
“Preferences =/= value” was intended as shorthand for something like “the preferences on which current HALY weights are based do not accurately reflect the value of the states to people experiencing them”. Or as I put it elsewhere: “They are based on ill-informed judgements of the general public”. It wasn’t a philosophical comment on desire theories. Still, I can see how it might be misleading (plus it doesn’t strictly apply to DALYs, which arguably aren’t preference-based), so I may change it to your suggestion...though “expectations” doesn’t really fit DALYs either, so I’d welcome alternative ideas.
I agree problem 3 (suffering/happiness) is about inadequate scaling and doesn’t presuppose hedonism, but I don’t think I imply otherwise. I decided to include it as a separate problem, even though it’s applicable to more than one type of scale/theory, because it’s an issue that is very neglected—in health economics and elsewhere. As noted above, the aim of this series is to draw attention to issues that I think more people should be working on, not make a conceptually/philosophically rigorous analysis.
That’s also why I didn’t have distributional issues as a separate “problem”. I note at the start of the list that “The criticisms assume the objective is to maximize aggregate SWB” (while also noting that they “should also hold some force from a welfarist, extra-welfarist, or simply ‘common sense’ perspective”) and from that standpoint the current default (in most HALY-based analyses/guidelines) of HALY maximisation is not a “problem,” so long as they better reflect SWB. That said, as noted above, I do mention distributional issues earlier in the post and in Part 2, in case someone does want to work on those.
Problem 4 is not that HALYs don’t include spillovers; it’s that “They are difficult to interpret, capturing some but not all spillover effects.” (When I say “Neglect of spillover effects,” I mean that the issue of spillovers is problematically neglected in the literature, not that HALYs don’t measure them at all.) This should be clear from the text:
there is some evidence that people valuing health states take into account other factors, especially impact on relatives … On the other hand, it seems reasonable to assume health state values do not fully reflect the consequences for the rest of society—something that would be impossible for most respondents to predict, even if they were wholly altruistic.
I agree this is likely to be an issue with other metrics too (Part 6 is all about this, and it’s mentioned in Part 2), and I suspect it will mostly have to be dealt with at the aggregation stage, but it’s not the case that the content of the metrics is irrelevant. For example, the questionnaires (and therefore the descriptive system) could include items like “To what extent do you feel you’re a burden on others?” (a very common concern expressed in qualitative studies); and/or the valuation exercise could ask people to take into account the impact of their (e.g.) health condition on others (or alternatively to consider only their own health/wellbeing). If this makes a difference to the values produced, it would make HALYs/WELBYs easier to interpret, which would also inform broader evaluation methodology, like whether to administer health/wellbeing measures to relatives separately and add them to the total.
Problem 5 is not merely a restatement of Problem 1, though of course they’re closely connected. Problem 1 focuses on why HALYs aren’t that good at prioritising within healthcare (i.e. achieving technical efficiency, from a fixed budget). Problem 5 is that they are useless at cross-sector prioritisation (i.e. allocative efficiency). The cause is similar (health focus), and I think I combined them in an early draft; but as with states worse than dead, I wanted to have 5 as a separate issue in order to draw particular attention to it. The difference becomes especially relevant when comparing, for example, the sHALY (which assigns weight to health states based on SWB, thereby addressing Problem 1 but not 5) and the WELBY (which potentially addresses both, but probably at the expense of validity within specific domains such as healthcare, in which case it may be useful for high-level cross-sector prioritisation, e.g., setting budgets for different government departments [Problem 5], but not for priority-setting within, say, the NHS [Problem 1]). Following similar feedback from others, I did change 5 to “They are consequently of limited use in prioritising across sectors or cause areas” in my main list in order to highlight the relationship.
(Really, all of these problems are due to (a) the descriptive system, (b) the valuation method, and possibly (c) the aggregation method, so any further breakdown risks overlap and confusion—but those categories don’t really tell you why you should care about them, or what elements you should focus on, so it didn’t seem like a helpful typology for the “Problems” section.)
Still, I am not entirely happy with this way of dividing things up or framing things (e.g., some problems focus more on “causes” and some on “effects”) and would welcome suggestions of alternatives that are both conceptually rigorous/consistent and draw attention to the practical implications.
MAICER = maximum acceptable incremental cost-effectiveness ratio. This is often called the willingness to pay for a unit of outcome, though the concepts are a little different. It is typically represented by lambda.
The CE plane is also useful as it indicates which quadrant the samples are in, i.e. NE = more effective but more costly (the most common), SE = more effective and cheaper (dominant), NW = less effective and more costly (dominated), and SW = less effective and cheaper. When there are samples in more than one quadrant, which is very common, confidence/credible intervals around the ICER are basically meaningless, as are negative ICERs more broadly. Distributions in Guesstimate, Causal, etc can therefore be misleading.
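This is one reason to summarise uncertainty via incremental net benefit instead (INB = λ·ΔE − ΔC), which is well-behaved in every quadrant. A quick sketch with simulated samples:

```r
# Incremental net monetary benefit: INB = lambda * dE - dC.
# Its distribution is meaningful even when samples span several quadrants.
set.seed(9)
d_effect <- rnorm(1e3, mean = 0.1,  sd = 0.3)    # some samples are negative
d_cost   <- rnorm(1e3, mean = 2000, sd = 1500)
lambda   <- 30000                                 # MAICER, e.g. per QALY

inb <- lambda * d_effect - d_cost
quantile(inb, c(0.025, 0.5, 0.975))               # a usable credible interval
quantile(d_cost / d_effect, c(0.025, 0.5, 0.975)) # ICER "interval": not interpretable
```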
The standard textbook for health economic evaluation is Drummond et al, 2015, and it’s probably the best introduction to these methods.
For more details on the practicalities of modelling, especially in Excel, see Briggs, Claxton, & Sculpher, 2006.
For Bayesian (and grudgingly frequentist) approaches in R, see stuff by Gianluca Baio at UCL, e.g. this book, and his R package BCEA.
Cost-effectiveness planes are introduced in Black (1990). CEACs, CEAFs, and value of information are explained in more detail in Barton, Briggs, & Fenwick (2008); the latter is a very useful paper.
For more on VOI, see Wilson et al., 2014 and Strong, Oakley, Brennan, & Breeze, 2015.
For a very clear step-by-step explanation of calculating and interpreting ICERs and net benefit, see Paulden 2020. In the same issue of PharmacoEconomics there was a nice debate between those who favour dropping ICERs entirely and those who think they should be presented alongside net benefit. (I think I’m in the latter camp, though if I had to pick one I’d go for NB as you can’t really quantify uncertainty properly around ICERs.)
For an application of some of those methods in EA, you can look at the evaluation we did of Donational. I’m not sure it was the right tool for the job (a BOTEC + heuristics might have been as good or better, given how speculative much of it was), and I had to adapt the methods a fair bit (e.g. to “donation-cost ratio” rather than “cost-effectiveness ratio”), but you can get the general idea. The images aren’t showing for me, though; not sure if it’s an issue on my end or the links are broken.
Here is a more standard model in Excel I did for an assignment.
Hope that helps. LMK if you want more.
There is a lot of potential in fish welfare/stunning. In addition to what others have mentioned, IIRC from some reading a few years ago:
The greatest bottleneck in humane slaughter is research, e.g. determining parameters/designing machines for stunning each major species, as they differ so much. There just aren’t many experts in this field, and the leading researchers are mostly very busy (and pretty old), but perhaps financial incentives would persuade some people with the right sort of background to go into this area.
As well as electrical and percussive stunning, anaesthetising with clove oil/eugenol seems a promising and under-researched method of reducing the pain of slaughter. Because it may just involve adding a liquid/powder to a tank containing the fish, it may also require less tailoring to each species than other methods (though it can affect the flavour if “too much” is used). I have some notes on this if anyone is interested.
Crustastun could be mass-produced and supplied cheaply/freely to places that would otherwise boil crustaceans alive. I seem to recall a French lawyer had invented another machine that was even better (or cheaper) but was too busy to promote it; maybe EAs could buy the patent or something?
I don’t have much time to spend on this, but here are a few thoughts based on a quick skim of the paper.
The study was done by some of the world’s leading experts in wellbeing and the study design seems okay-ish (‘waitlist randomisation’). The main concern with internal validity, which the authors acknowledge, is that changes in the biomarkers, while mostly heading in the right direction, were far from statistically significant. This could indicate that the effects reported on other measures were due to some factor other than actual SWB improvement, e.g. social desirability bias. But biomarkers are not a great metric, and measures were taken to address these concerns, so I find it plausible that the effects in the study population were (nearly) as large as reported.
However:
- The participants were self-selected, largely from people who were already involved with Action for Happiness (“The charity aims to help people take action to create more happiness, with a focus on pro-social behaviour to bring happiness to others around them”), and largely in the UK. They also had to register online. It’s unclear how useful it would be for other populations.
- It’s quite an intensive program, involving weekly 2–2.5 hour group meetings with two volunteer facilitators. (“Each of these sessions builds on a thematic question, for example, what matters in life, how to find meaning at work, or how to build happier communities.”) This may limit its scalability and accessibility to certain groups.
- Follow-up was only for 2 months, the duration of the course itself. (This limitation seems to be due to the study design: the control group was people taking the course 8 weeks later.)
- The effect sizes for depression and anxiety were smaller than for CBT, so it may still not be the best option for mental health treatment (though the CBT studies were done in populations with a diagnosed mental disorder, so direct comparison is hard; and subgroup analyses showed that people with lower baseline wellbeing benefited most from the program).
- For clarity, the average effect size for life satisfaction was about 1 point on a 10-point scale. This is good compared to most wellbeing interventions, but that might say more about how ineffective most other interventions are than about how good this one is.
So at the risk of sounding too negative: it’s hardly surprising that people who are motivated enough to sign up for and attend a course designed to make them happier do in fact feel a bit happier while taking the course. It seems important to find out how long these effects endure, and whether the course is suitable for a broader range of people.