This is a great summary of what I was and wasn’t saying :)
Thanks for the link—looking forward to reading. Might return to this after reading
You’re very welcome! I really enjoyed reading and commenting on the post :)
One thing I can’t quite get my head round—if we divide E(C) by E(L) then don’t we lose all the information about the uncertainty in each estimate? Are we able to say that the value of averting a death is somewhere between X and Y times that to doubling consumption (within 90% confidence)?
Good question, I’ve also wondered this and I’m not sure. In principle, I feel like something like the standard error of the mean (the standard deviation of the sample divided by the square root of the sample size) should be useful here. But applying it naively doesn’t seem to give plausible results because Guesstimate uses 5000 samples, so we end up with very small standard errors. I don’t have a super strong stats background though—maybe someone who does can help you more here.
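For what it’s worth, one crude way to keep the uncertainty is to form the ratio sample-by-sample and read off a percentile interval, rather than dividing the two means. A minimal sketch—the distributions and parameters below are made up purely for illustration, not taken from the actual model:

```python
import random
import statistics

random.seed(0)
N = 5000  # same number of samples Guesstimate draws

# Hypothetical stand-ins for the two quantities being compared;
# the lognormal parameters here are invented for illustration only.
c_samples = [random.lognormvariate(0, 0.5) for _ in range(N)]
l_samples = [random.lognormvariate(1, 0.8) for _ in range(N)]

# Instead of dividing the two means (which discards the uncertainty),
# take the ratio sample-by-sample and read off a 90% interval.
ratios = sorted(c / l for c, l in zip(c_samples, l_samples))
lo = ratios[int(0.05 * N)]   # 5th percentile
hi = ratios[int(0.95 * N)]   # 95th percentile
point = statistics.mean(c_samples) / statistics.mean(l_samples)
```

This gives a statement of the form “the ratio is between `lo` and `hi` with 90% probability”, which is exactly the kind of claim the ratio of expectations on its own can’t support.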
I wish this preference was more explicit in Founders Pledge’s writing. It seems like a substantial value judgment, almost an aesthetic preference, and one that is unintuitive to me!
We don’t say much about this because none of our conclusions depends on it but we’ll be sure to be more explicit about this if it’s decision-relevant. In the particular passage you’re interested in here, we were trying to get a sense of the broader SWB benefits of psychedelic use. We didn’t find strong evidence for positive effects on experiential or evaluative measures of SWB. As you rightly note, just using PANAS leaves open the possibility that life satisfaction could have increased (the former is an experiential measure and the latter is an evaluative one). But there wasn’t evidence for improvements in evaluative SWB either, so the fact that we place more weight on experiential than evaluative measures didn’t play a role here.
The only time that we’ve used SWB measures to evaluate a funding opportunity, we looked at both happiness (an experiential measure) and life satisfaction (an evaluative measure).
I wonder which of hedonistic and preference utilitarianism you’re more sympathetic to, or which of hedonism and preference/desire theories of well-being you’re more sympathetic to. The former tend to go with experiential SWB and the latter with evaluative or eudaimonic SWB (see Michael Plant’s recent paper). I don’t think it’s a perfect mapping but my inclination towards hedonism is closely related to my earlier claim that
experiential measures, such as affective balance (e.g. as measured by Positive and Negative Affect Schedule (PANAS)), capture more of what we care about and less of what we don’t care about, compared to evaluative measures, such as life satisfaction
This might explain our disagreement.
e.g. favoring affective balance over life satisfaction implies that having children is a bad decision in terms of one’s subjective well-being. (If I recall correctly, on average having kids tends to make affective balance go down but life satisfaction go up; many people seem very happy to have had children.)
This is an interesting example, thanks for bringing it up. I don’t have a strong view on whether having children increases or decreases hedonistic well-being (though it seems likely to increase well-being in desire/preference terms). So I’m not too sure what to make of it but here are a few thoughts:
1. This could well be a case in which life satisfaction captures something important that affect and happiness miss—I don’t have a strong view on that.
2. The early years of parenting intuitively seem really hard and sleep-depriving but also fulfilling and satisfying in a broad sense. So it seems very plausible that they decrease affect/happiness but increase life satisfaction. I’d also expect children to be a source of positive happiness later in life, though, so maybe having children increases affect/happiness overall anyway.
3. If having children decreases affect/happiness, I don’t find it very surprising that lots of people want to have children and are satisfied by having children anyway. There are clearly strong evolutionary pressures to prefer having children, but much less reason to think that having children would make people happier (arguably the reverse: having children leaves parents with fewer resources for themselves!).
Hi Milan, thanks very much for your comments (here and on drafts of the report)!
On 1, we don’t intend to claim that psychedelics don’t improve subjective well-being (SWB), just that the only study (we found) that measured SWB pre- and post-intervention found no effect. This is a (non-conclusive) reason to treat the findings that participants self-report improved well-being with some suspicion.
As I mentioned to you in our correspondence, we think that experiential measures, such as affective balance (e.g. as measured by Positive and Negative Affect Schedule (PANAS)), capture more of what we care about and less of what we don’t care about, compared to evaluative measures, such as life satisfaction. But I take your point that PANAS doesn’t encompass all of SWB.
On 2, behaviour change still hasn’t been studied enough for there to be more than “weak evidence”, but yeah, I agree that reports from third parties are stronger evidence than self-reported changes.
Also interesting here – individuals may rescale their assessments of subjective well-being over time. I speculate that the particulars of the psychedelic experience may drive rescaling like this in an intense way.
Yeah, I don’t think we understand this very well yet but it’s an interesting thought :)
I’ve hopefully clarified this in my response to your first comment :)
Thanks for your questions, Siebe!
Based on the report itself, my impression is that high-quality academic research into microdosing and into flow-through effects* of psychedelic use is much more funding-constrained. Have you considered those?
Yes, but only relatively briefly. You’re right that these kinds of research are more neglected than studies of mental health treatments but we think that the benefits are much smaller in expectation. That’s not to say that there couldn’t be large benefits from microdosing or flow-through effects, just that these are much more speculative.
Note that we think it’s more likely than not (59%) that psilocybin will turn out to be less effective than existing treatments for depression (pg. 35). Even the mental health benefits are fairly uncertain and these other benefits you mention are even less likely to materialise. The kinds of research you suggest could be valuable but I think it makes sense to focus on the mental health treatments first.
On microdosing specifically, we mention our specific concerns (pg. 21):
Another psychedelics intervention that is often suggested as potentially promising is microdosing: taking psychedelics in very low doses. Here, however, the evidence is even sparser. We currently see no reason to think this will have benefits comparable to those of higher-dose psychedelic-assisted mental health treatments, as there is reason to believe that with classic psychedelics, the latter benefits are mediated by ‘mystical-type’ experiences, which microdosing doesn’t occasion. Furthermore, we don’t know much yet about the risks of prolonged microdosing, and from a legal perspective, making microdosing available for healthy people seems much further away than psychedelic-assisted mental health treatments.
I think the last point, about microdosing being further away than mental health treatments, applies to many flow-through effects. If, indeed, psychedelics could bring about wide-ranging benefits, then the best first step is probably to get them approved as mental health treatments anyway and so advancing this seems valuable. If approved, it will also be easier to carry out other kinds of research.
2. Did you consider more organisations than Usona and MAPS? It seems a little bit unlikely that these are the only two organisations lobbying for drug approval?
This is related to your other comment, so I’ll answer both together.
I was confused about the usage of the term drug development as it sounds to me like it’s about the discovery/creation of new drugs, which clearly does not seem to be the high-value aspect here.
Drug development can but need not involve the creation of new drugs. It’s the process that has to happen in order for banned or new substances to be approved for medical use. It involves high-quality studies to prove efficacy and safety. Drug development is very expensive—it costs at least tens of millions of dollars (usually more) to go through the FDA approval process. So actually, there just aren’t many organisations able to do this. Usona and MAPS aren’t just lobbying for approval, they’re conducting clinical research in order to approve psilocybin and MDMA for medical use.
Another org also doing drug development of psilocybin (but for treatment-resistant depression, rather than major depression) is Compass Pathways. Compass is for-profit though, so we didn’t consider it as a funding opportunity here.
I don’t think Greaves’ example suffers the same problem actually—if we truly don’t know anything about what the possible colours are (just that each book has one colour), then there’s no reason to prefer {red, yellow, blue, other} over {red, yellow, blue, green, other}.
In the case of truly having no information, I think it makes sense to use Jeffreys prior in the box factory case because that’s invariant to reparametrisation, so it doesn’t matter whether the problem is framed in terms of length, area, volume, or some other parameterisation. I’m not sure what that actually looks like in this case though
Yeah, these aren’t great examples because there’s a choice of partition which is better than the others—thanks for pointing this out. The problem is more salient if instead, you suppose that you have no information about how many different coloured marbles there are and ask what the probability of picking a blue marble is. There are different ways of partitioning the possibilities but no obviously privileged partition. This is how Hilary Greaves frames it here.
Another good example is van Fraassen’s cube factory, e.g. described here.
Thanks for the clarification—I see your concern more clearly now. You’re right, my model does assume that all balls were coloured using the same procedure, in some sense—I’m assuming they’re independently and identically distributed.
Your case is another reasonable way to apply the maximum entropy principle, and I think it points to another problem with the principle, but I’d frame it slightly differently. I don’t think the maximum entropy principle is directly problematic in the case you describe. If we assume that all balls are coloured by completely different procedures (i.e. so that the colour of one ball doesn’t tell us anything about the colours of the other balls), then seeing 99 red balls doesn’t tell us anything about the final ball. In that case, I think it’s reasonable (even required!) to have a 50% credence that it’s red, and unreasonable to have a 99% credence, if your prior was 50%. If you find that result counterintuitive, then I think that’s a challenge to the independence assumption (that the balls are coloured in such a way that learning the colour of some tells you nothing about the colour of the others) rather than a challenge to the maximum entropy principle. (I appreciate you want to assume nothing about the colouring processes rather than making that independence assumption explicitly, but in setting up your model this way, I think you’re assuming it implicitly.)
Perhaps another way to see this: if you don’t follow the maximum entropy principle and instead have a prior of 30% that the final ball is red, then after drawing 99 red balls in your scenario, you should maintain 30% credence (if you don’t, then you’ve assumed something about the colouring process that makes the balls not independent). If you find that counterintuitive, the issue lies with the assumption that the balls are coloured independently rather than with the maximum entropy principle, because we haven’t used the principle of maximum entropy in that case.
I think this actually points to a different problem with the maximum entropy principle in practice: we rarely come from a position of complete ignorance (or complete ignorance besides a given mean, variance etc.), so it’s actually rarely applicable. Following the principle sometimes gives counterintuitive/unreasonable results because we actually know a lot more than we realise, and we lose much of that information when we apply the maximum entropy principle.
The maximum entropy principle does give implausible results if applied carelessly but the above reasoning seems very strange to me. The normal way to model this kind of scenario with the maximum entropy prior would be via Laplace’s Rule of Succession, as in Max’s comment below. We start with a prior for the probability that a randomly drawn ball is red and can then update on 99 red balls. This gives a 100⁄101 chance that the final ball is red (about 99%!). Or am I missing your point here?
Somewhat more formally, we’re looking at a Bernoulli trial—for each ball, there’s a probability p that it’s red. We start with the maximum entropy prior for p, which is the uniform distribution on the interval [0,1] (= beta(1,1)). We update on 99 red balls, which gives a posterior for p of beta(100,1), which has mean 100⁄101 (this is a standard result, see e.g. conjugate priors - the beta distribution is a conjugate prior for a Bernoulli likelihood).
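If it helps, the update can be sanity-checked in a couple of lines, using the conjugate-prior result just mentioned (a beta(a,b) prior plus k successes in n trials gives a beta(a+k, b+n−k) posterior):

```python
# Beta-Bernoulli update: start from the maximum entropy prior beta(1,1)
# (uniform on [0,1]) and condition on 99 red draws out of 99.
a, b = 1, 1          # beta(1,1) prior
reds, draws = 99, 99

a_post = a + reds              # posterior is beta(100, 1)
b_post = b + (draws - reds)
p_next_red = a_post / (a_post + b_post)   # posterior mean = 100/101
print(p_next_red)  # ≈ 0.990
```

The posterior mean (a+k)/(a+b+n) with a = b = 1 is exactly Laplace’s Rule of Succession.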
The more common objection to the maximum entropy principle comes when we try to reparametrise. A nice but simple example is van Fraassen’s cube factory (edit: new link): a factory manufactures cubes up to 2x2x2 feet; what’s the probability that a randomly selected cube has side length less than 1 foot? If we apply the maximum entropy principle (MEP), we say 1⁄2, because each cube has length between 0 and 2 and MEP implies that each length is equally likely. But we could equivalently have asked: what’s the probability that a randomly selected cube has face area less than 1 square foot? Face area ranges from 0 to 4, so MEP implies a probability of 1⁄4. All and only those cubes with side length less than 1 have face area less than 1, so these are precisely the same events, but MEP gave us different answers for their probabilities! We could do the same in terms of volume and get a different answer again. This inconsistency is the kind of implausible result most commonly pointed to.
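To make the inconsistency concrete: “side length < 1” and “face area < 1” pick out exactly the same cubes, but the uniform (maximum entropy) prior assigns them different probabilities depending on which parameterisation we choose:

```python
# The same event -- "side length < 1 foot" -- under three different
# maximum-entropy (uniform) parameterisations of the cube factory.
def p_uniform(threshold, upper):
    """P(X < threshold) when X is uniform on (0, upper)."""
    return threshold / upper

p_length = p_uniform(1, 2)   # uniform over side length in (0, 2]
p_area   = p_uniform(1, 4)   # uniform over face area in (0, 4]
p_volume = p_uniform(1, 8)   # uniform over volume in (0, 8]

# length < 1  <=>  area < 1  <=>  volume < 1, and yet:
print(p_length, p_area, p_volume)  # 0.5 0.25 0.125
```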
An important difference between overall budgets and job boards is that budgets tell you how all the resources are spent whereas job boards just tell you how (some of) the resources are spent on the margin. EA could spend a lot of money on some area and/or employ lots of people to work in that area without actively hiring new people. We’d miss that by just looking at the job board.
I think this is a nice suggestion for getting a rough idea of EA priorities but because of this + Habryka’s observation that the 80k job board is not representative of new jobs in and around EA, I’d caution against putting much weight on this.
The LaTeX isn’t displaying well (for me at least!), which makes this really hard to read. You just need to press ‘ctrl’/‘cmd’ and ‘4’ for inline LaTeX and ‘ctrl’/‘cmd’ and ‘M’ for block :)
I found the answers to this question on stats.stackexchange useful for thinking about and getting a rough overview of “uninformative” priors, though it’s mainly a bit too technical to be able to easily apply in practice. It’s aimed at formal Bayesian inference rather than more general forecasting.
In information theory, entropy is a measure of (lack of) information—high entropy distributions have low information. That’s why the principle of maximum entropy, as Max suggested, can be useful.
Another meta answer is to use the Jeffreys prior. This has the property that it is invariant under a change of coordinates. This isn’t the case for maximum entropy priors in general and is a source of inconsistency (see e.g. the partition problem for the principle of indifference, which is just a special case of the principle of maximum entropy). Jeffreys priors are often unwieldy, but one important exception is for a parameter on the interval [0,1] (e.g. a probability), for which the Jeffreys prior is the beta(1⁄2,1⁄2) distribution. See the red line in the graph at the top of the beta distribution Wikipedia page: the density is spread to the edges, close to 0 and 1.
This relates to Max’s comment about Laplace’s Rule of Succession: taking N_v = 2, M_v = 1 corresponds to the uniform distribution on [0,1] (which is just beta(1,1)). This is the maximum entropy distribution on [0,1]. But as Max mentioned, we can vary N_v and M_v. Using the Jeffreys prior would be like setting N_v = 1 and M_v = 1⁄2, which doesn’t have as nice an interpretation (1⁄2 a success?) but has nice theoretical features. It’s especially useful if you want to put the density around 0 and 1 but still have mean 1⁄2.
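To make the correspondence concrete: both priors give the same conjugate update, just with different pseudo-counts. A quick sketch, reusing the 99-red-balls example from elsewhere in the thread:

```python
def posterior_mean(a, b, successes, trials):
    """Posterior mean of p under a beta(a, b) prior, after observing data.

    Conjugacy: beta(a, b) prior + k successes in n trials
    gives a beta(a + k, b + n - k) posterior, with mean (a + k)/(a + b + n).
    """
    return (a + successes) / (a + b + trials)

k, n = 99, 99  # e.g. 99 red balls in 99 draws

laplace  = posterior_mean(1, 1, k, n)      # uniform prior: (k + 1)/(n + 2)
jeffreys = posterior_mean(0.5, 0.5, k, n)  # Jeffreys prior: (k + 1/2)/(n + 1)
print(laplace, jeffreys)
```

With lots of data the two barely differ (100⁄101 vs 99.5⁄100 here); the choice of prior matters most when observations are scarce.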
There’s a bit more discussion of Laplace’s Rule of Succession and the Jeffreys prior in an EA context in Toby Ord’s comment in response to Will MacAskill’s Are we living at the most influential time in history?
Finally, a bit of a cop-out, but I think worth mentioning: one of the answers to the stats.stackexchange question linked above suggests imprecise credences. Selecting a range of priors and seeing how much the resulting conclusions converge, you might find that prior choice doesn’t matter that much; when it does matter, I expect this could be useful for identifying your largest uncertainties.
Reflecting on this example and your x-risk questions, this highlights the fact that in the beta(0.1,0.1) case, we’re either very likely fine or really screwed, whereas in the beta(20,20) case, it’s similar to a fair coin toss. So it feels easier to me to get motivated to work on mitigating the second one. I don’t think that says much about which is higher priority to work on though because reducing the risk in the first case could be super valuable. The value of information narrowing uncertainty in the first case seems much higher though.
Nice post! Here’s an illustrative example in which the distribution of p matters for expected utility.
Say you and your friend are deciding whether to meet up, but there’s a risk that you have a nasty, transmissible disease. For each of you, there’s the same probability p that you have the disease. Assume that whether you have the disease is independent of whether your friend has it, conditional on p. You’re not sure whether p has a beta(0.1,0.1) distribution or a beta(20,20) distribution, but you know that the expected value of p is 0.5.
If you meet up, you get +1 utility. If you meet up and one of you has the disease, you’ll transmit it to the other person, and you get −3 utility. (If you both have the disease, then there’s no counterfactual transmission, so meeting up is just worth +1.) If you don’t meet up, you get 0 utility.
It makes a difference which distribution p has. Here’s an intuitive explanation. In the first case, it’s really unlikely that one of you has it but not the other. Most likely, either (i) you both have it, so meeting up will do no additional harm, or (ii) neither of you has it, so meeting up is harmless. In the second case, it’s relatively likely that one of you has the disease but not the other, so you’re more likely to end up with the bad outcome.
If you crunch the numbers, you can see that it’s worth meeting up in the first case, but not in the second. For this to be true, we have to assume conditional independence: that you and your friend having the disease are independent events, conditional on the probability p of an arbitrary person having the disease. It doesn’t work if we assume unconditional independence, but I think conditional independence makes more sense.
The calculation is a bit long-winded to write up here, but I’m happy to if anyone is interested in seeing/checking it. The gist is to write the probability of a state obtaining as the integral with respect to p of the probability of that state obtaining conditional on p, multiplied by the pdf f(p) of p. Separate the states via conditional independence (i.e. P(you both have it | p) = P(you have it | p) × P(your friend has it | p)), plug in values (e.g. P(you have it | p) = p), and integrate. Then calculate the expected utility of meeting up as normal, with the utilities above and the probabilities calculated in this way. If I haven’t messed up, you should find that the expected utility is positive in the beta(0.1,0.1) case (i.e. better to meet up) and negative in the beta(20,20) case (i.e. better not to meet up).
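For anyone who’d rather not do the integrals by hand: with conditional independence, they reduce to second moments of the beta distribution, which have closed forms. Here’s a sketch of that calculation, treating the utilities above as given:

```python
def expected_utility(a, b):
    """EU of meeting up when p ~ beta(a, b), via closed-form beta moments.

    Integrating out p with conditional independence gives the states:
      both have it:    E[p^2]       = a(a+1)/((a+b)(a+b+1)),  utility +1
      neither has it:  E[(1-p)^2]   = b(b+1)/((a+b)(a+b+1)),  utility +1
      exactly one:     2 E[p(1-p)]  = 2ab/((a+b)(a+b+1)),     utility -3
    """
    denom = (a + b) * (a + b + 1)
    p_both = a * (a + 1) / denom
    p_neither = b * (b + 1) / denom
    p_one = 2 * a * b / denom
    return 1 * p_both + 1 * p_neither - 3 * p_one

print(expected_utility(0.1, 0.1))  # positive: worth meeting up
print(expected_utility(20, 20))    # negative: better not to
```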
Thanks, this is a good criticism. I think I agree with the main thrust of your comment but in a bit of a roundabout way.
I agree that focusing on expected value is important and that ideally we should communicate how arguments and results affect expected values. I think it’s helpful to distinguish between (1) expected value estimates that our models output and (2) the overall expected value of an action/intervention, which is informed by our models and arguments etc. The guesstimate model is so speculative that it doesn’t actually do that much work in my overall expected value, so I don’t want to overemphasise it. Perhaps we under-emphasised it though.
The non-probabilistic model is also speculative of course, but I think it offers stronger evidence about the relative cost-effectiveness than the output of the Guesstimate model. It doesn’t offer a precise number in the same way that the Guesstimate model does, but the Guesstimate model only does that by making arbitrary distributional assumptions, so I don’t think it adds much information. I think that the non-probabilistic model offers evidence of greater cost-effectiveness of THL relative to AMF (given hedonism, anti-speciesism) because THL tends to come out better and sometimes comes out much, much better. I also think this isn’t super strong evidence, but you’re right that, in light of it, our summary is overly agnostic.
In case it’s helpful, here’s a possible explanation for why we communicated the findings in this way. We actually came into this project expecting THL to be much more cost-effective, given a wide range of assumptions about the parameters of our model (and assuming hedonism, anti-speciesism) and we were surprised to see that AMF could plausibly be more cost-effective. So for me, this project gave an update slightly in favour of AMF in terms of expected cost-effectiveness (though I was probably previously overconfident in THL). For many priors, this project should update the other way and for even more priors, this project should leave you expecting THL to be more cost-effective. I expect we were a bit torn in communicating how we updated and what the project showed and didn’t have the time to think this through and write this down explicitly, given other projects competing for our time and energy. It’s been helpful to clarify a few things through this discussion though :)
Thanks for raising this. It’s a fair question but I think I disagree that the numbers you quote should be in the top level summary.
I’m wary of overemphasising precise numbers. We’re really uncertain about many parts of this question and we arrived at these numbers by making many strong assumptions, so these numbers don’t represent our all-things-considered-view and it might be misleading to state them without a lot of context. In particular, the numbers you quote came from the Guesstimate model, which isn’t where the bulk of the work on this project was focused (though we could have acknowledged that more). To my mind, the upshot of this investigation is better described by this bullet in the summary than by the numbers you quote:
In this model, in most of the most plausible scenarios, THL appears better than AMF. The difference in cost-effectiveness is usually within 1 or 2 orders of magnitude. Under some sets of reasonable assumptions, AMF looks better than THL. Because we have so much uncertainty, one could reasonably believe that AMF is more cost-effective than THL or one could reasonably believe that THL is more cost-effective than AMF.
Agreed. I didn’t mean to imply that totalism is the only view sensitive to the mortality-fertility relationship—just that the results could be fairly different on totalism, that it’s especially important to see the results on that view, and that it makes sense to look at totalism before other population ethical views not yet considered. Exploring other population ethical views would be good too!
I think my concern here was that the post suggested that saving lives might not be very valuable on totalism due to a high fertility adjustment:
Roodman’s report (if I recall correctly) suggested that this likely happens to a lower degree in areas where infant mortality is high (i.e. parents adjust fertility less in high infant mortality settings) so saving lives in these settings is plausibly still very valuable according to totalism.