Research Fellow at Open Philanthropy. Former Quantitative Researcher on the Worldview Investigations Team at Rethink Priorities. Completed a PhD at the Paris School of Economics.
David Bernard
Section 2.2.2 of their report is titled “Choosing a fixed or random effects model”. They discuss the points you make and clearly say that they use a random effects model. In section 2.2.3 they discuss the standard measures of heterogeneity they use. Section 2.2.4 discusses the specific 4-level random effects model they use and how they did model selection.
I reviewed a small section of the report prior to publication but none of these sections, and it only took me 5 minutes now to check what they did. I’d like the EA Forum to have a higher bar (as Gregory’s parent comment exemplifies) before throwing around easily checkable suspicions about what (very basic) mistakes might have been made.
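For readers unfamiliar with the modelling choice being discussed: the report's 4-level random effects model is more elaborate than this, but the basic machinery of a random effects meta-analysis (inverse-variance weighting, the between-study variance tau^2, and the I^2 heterogeneity statistic mentioned in section 2.2.3) can be sketched with a standard two-level DerSimonian-Laird estimator. This is an illustration only, not the report's actual code:

```python
import numpy as np

def random_effects_meta(effects, variances):
    """Two-level DerSimonian-Laird random effects meta-analysis.
    effects: per-study effect sizes; variances: their sampling variances."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w_fe = 1.0 / variances                              # fixed-effect (inverse-variance) weights
    theta_fe = np.sum(w_fe * effects) / np.sum(w_fe)
    # Cochran's Q and the between-study variance tau^2
    q = np.sum(w_fe * (effects - theta_fe) ** 2)
    df = len(effects) - 1
    c = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
    tau2 = max(0.0, (q - df) / c)
    # I^2: share of total variability attributable to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # random effects weights incorporate tau^2, then pool again
    w_re = 1.0 / (variances + tau2)
    theta_re = np.sum(w_re * effects) / np.sum(w_re)
    return theta_re, tau2, i2
```

The 4-level version additionally models clustering of effect sizes within studies and samples, but the fixed-vs-random distinction is the same: random effects lets true effects differ across studies rather than assuming one common effect.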
Innovations for Poverty Action just released their Best Bets: Emerging Opportunities for Impact at Scale report. It covers what they think are the best evidence-backed opportunities in global health and development. The opportunities are:
Small-quantity lipid-based nutrient supplements to reduce stunting
Mobile phone reminders for routine childhood immunization
Social signaling for routine childhood immunization
Cognitive behavioral therapy to reduce crime
Teacher coaching to improve student learning
Psychosocial stimulation and responsive care to promote early childhood development
Soft-skills training to boost business profits and sales
Consulting services to support small and medium-sized businesses
Empowerment and Livelihoods for Adolescents to promote girls’ agency and health
Becoming One: Couples’ counseling to reduce intimate partner violence
Edutainment to change attitudes and behavior
Digital payments to improve financial health
Childcare for women’s economic empowerment and child development
Payment for ecosystem services to reduce deforestation and protect the environment
David Rhys Bernard’s Quick takes
Thanks Vasco, I’m glad you enjoyed it! I corrected the typo and your points about inverse-variance weighting and lognormal distributions are well-taken.
I agree that doing more work to specify what our priors should be in this sort of situation is valuable, although I’m unsure if it rises to the level of a crucial consideration. Our ability to predict long-run effects has been an important crux for me, hence the work I’ve been doing on it, but in general, it seems to be more of an important consideration for people who lean neartermist than those who lean longtermist.
Hi Michael, thanks for this.
On 1: Thorstad argues that if you want to hold both claims (1) Existential Risk Pessimism—per-century existential risk is very high, and (2) Astronomical Value Thesis—efforts to mitigate existential risk have astronomically high expected value, then TOP is the most plausible way to jointly hold both claims. He does look at two arguments for TOP—space settlement and an existential risk Kuznets curve—but says these aren’t strong enough to ground TOP and we instead need a version of TOP that appeals to AI. It’s fair to think of this piece as starting from that point, although the motivation for appealing to AI here was more due to this seeming to be the most compelling version of TOP to x-risk scholars.
On 2: I don’t think I’m an expert on TOP and was mostly aiming to summarise premises that seem to be common, hence the hedging. Broadly, I think you do only need the 4 claims that formed the main headings: (1) high levels of x-risk now, (2) significantly reduced levels of x-risk in the future, (3) a long and valuable / positive EV future, and (4) a moral framework that places a lot of weight on this future. I think the slimmed down version of the argument focuses solely on AI as it’s relevant for (1), (2) and (3), but as I say in the piece, I think there are potentially other ways to ground TOP without appealing to AI and would be very keen to see those articulated and explored more.
(2) is the part where my credences feel most fragile, especially the parts about AI being sufficiently capable to drastically reduce other x-risks and misaligned AI, and AI remaining aligned near indefinitely. It would be great to have a better sense of how difficult various x-risks are to solve and how powerful an AI system we might need to near eliminate them. No unknown unknowns seems like the least plausible premise of the group, but its very nature makes it hard to know how to cash this out.
Uncertainty over time and Bayesian updating
Yep, I agree you can generate the time of perils conclusion if AI risk is the only x-risk we face. I was attempting to empirically describe a view that seems to be popular in the x-risk space, that other x-risks besides AI are also cause for concern, but you’re right that we don’t necessarily need this full premise.
I was somewhat surprised by the lack of distinction between the cases where we go extinct and the universe is barren (value 0) and big negative futures filled with suffering. The difference between these cases seems large to me and will likely substantially affect the value of x-risk and s-risk mitigation. This is even more the case if you don’t subscribe to symmetric welfare ranges and think our capacity to suffer is vastly greater than our capacity to feel pleasure, which would make the worst possible futures way worse than the best possible futures are good. I suspect this is related to the popularity of the term ‘existential catastrophe’ which collapses any difference between these cases (as well as cases where we bumble along and produce some small positive value but fall far short of our best possible future).
Thanks for highlighting this Michael and spelling out the different possibilities. In particular, it seems like if aliens are present and would expand into the same space we would have expanded into had we not gone extinct, then for the totalist, to the extent that aliens have similar values to us, the value of x-risk mitigation is reduced. If we are replaceable by aliens, then it seems like not much is lost if we do go extinct, since the aliens would still produce the large valuable future that we would have otherwise produced.
I have to admit though, it is personally uncomfortable for my valuation of x-risk mitigation efforts and cause prioritisation to depend partially on something as abstract and unknowable as the existence of aliens.
Charting the precipice: The time of perils and prioritizing x-risk
Hi Geoffrey, thanks for these comments, they are really helpful as we move to submitting this to journals. Some miscellaneous responses:
I’d definitely be interested in seeing a project where the surrogate index approach is applied to even longer-run settings, especially in econ history as you suggest. You could see this article as testing whether the surrogate index approach works in the medium-run, so thinking about how well it works in the longer-run is a very natural extension. I spent some time during my PhD thinking about how to do this and about datasets you might do it with, but didn’t end up having capacity. So if you or anyone else is interested in doing this, please get in touch! That said, I don’t think it makes sense to combine these two projects (econ history and RCTs) into one paper, given the norms of economics articles and subdiscipline boundaries.
4a. The negative bias is purely an empirical result, but one that we expect to arise in many applications. We can’t say for sure whether it’s always negative bias or attenuation bias, but the hypothesis we suggest to explain it is compatible with attenuation bias of the treatment effects to 0 and treatment effects generally being positive. However, when we talk about attenuation in the paper, we’re typically talking about attenuation in the prediction of long-run outcomes, not attenuation in the treatment effects.
4b. The surrogate index is unbiased and consistent if the assumptions behind it are satisfied. This is the case for most econometric estimators. What we do in the paper is show that the key surrogacy assumption is empirically not perfectly satisfied in a variety of contexts. Since this assumption is not satisfied, the estimator is empirically biased and inconsistent in our applications. However, this is not what people typically mean when they say an estimator is theoretically biased and inconsistent. Personally, I think econometrics focuses too heavily on unbiasedness, and cares too much about asymptotic properties of estimators and too little about how well they perform in these empirical LaLonde-style tests; I’m sympathetic to the ML willingness to trade off bias and variance.
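For concreteness, here is a minimal toy sketch of the two-step surrogate-index idea (in the style of Athey et al.): fit E[Y | S] in a sample where the long-run outcome Y is observed, predict the index for experimental units from their short-run surrogates S, and take the difference in mean predicted index across arms. This single-surrogate OLS version is for illustration only and inherits exactly the surrogacy assumption discussed above:

```python
import numpy as np

def surrogate_index_effect(s_obs, y_obs, s_exp, treat_exp):
    """Toy surrogate-index estimate of a long-run treatment effect.
    s_obs, y_obs: surrogates and long-run outcome in an observational sample.
    s_exp, treat_exp: surrogates and treatment indicator (0/1) in the experiment."""
    s_obs, s_exp = np.asarray(s_obs, dtype=float), np.asarray(s_exp, dtype=float)
    # Step 1: learn E[Y | S] in the observational sample (simple OLS with intercept)
    X_obs = np.column_stack([np.ones(len(s_obs)), s_obs])
    beta, *_ = np.linalg.lstsq(X_obs, np.asarray(y_obs, dtype=float), rcond=None)
    # Step 2: build the predicted surrogate index for experimental units
    X_exp = np.column_stack([np.ones(len(s_exp)), s_exp])
    index = X_exp @ beta
    # Step 3: difference in mean index between treated and control arms
    treat_exp = np.asarray(treat_exp)
    return index[treat_exp == 1].mean() - index[treat_exp == 0].mean()
```

The bias discussed in 4a and 4b shows up when step 1's conditional expectation, learned in one context, does not transfer perfectly to the experimental context.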
4c. The normalisation depends on the standard deviation of the control group, not the standard error, so we should be fine to do that regardless of what the actual treatment effect is. We would be in trouble if there was no variation in the control group outcome, but this seems to occur very rarely (or never).
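Concretely, the normalisation in question is just the raw mean difference divided by the control group's standard deviation (not its standard error), something like:

```python
import numpy as np

def standardized_effect(y_treat, y_control):
    """Mean difference normalised by the control-group standard deviation.
    Uses the sample SD (ddof=1), not the standard error of the mean."""
    sd_control = np.std(y_control, ddof=1)
    return (np.mean(y_treat) - np.mean(y_control)) / sd_control
```

This only fails if the control-group outcome has zero variation, which, as noted above, essentially never happens in practice.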
Estimating long-term treatment effects without long-term outcome data (David Rhys Bernard, Jojo Lee and Victor Yaneng Wang)
The JPAL and IPA Dataverses have data from 200+ RCTs in development economics and the 3ie portal has 500+ studies with datasets available (and you can further filter by study type if you want to limit to RCTs). I can’t point you to particular studies that have missing or mismeasured covariates, but from personal experience, a lot of them have lots of missing data.
Can you explain more why the bootstrapping approach doesn’t give a causal effect (or something pretty close to one) here? The aggregate approach is clearly confounded since questions with more answers are likely easier. But once you condition on the question and directly control the number of forecasters via bootstrapping different sample sizes, it doesn’t seem like there are any potential unobserved confounders remaining (other than the time issue Nikos mentioned). I don’t see what a natural experiment or RCT would provide above the bootstrapping approach.
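To make the design concrete, here is a sketch of the bootstrapping approach as I understand it (the function name and the mean-aggregation choice are my own, purely illustrative): within a single resolved question, repeatedly subsample k forecasters, aggregate their forecasts, and score the aggregate, so that question difficulty is held fixed while crowd size varies:

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_by_crowd_size(forecasts, outcome, sizes, n_boot=1000):
    """Within one question, bootstrap subsamples of forecasters at each
    crowd size k and measure the Brier score of the aggregate forecast.
    Conditioning on the question removes question-difficulty confounding."""
    forecasts = np.asarray(forecasts, dtype=float)
    results = {}
    for k in sizes:
        scores = []
        for _ in range(n_boot):
            sample = rng.choice(forecasts, size=k, replace=True)
            agg = sample.mean()                  # simple mean aggregation
            scores.append((agg - outcome) ** 2)  # Brier score, binary outcome
        results[k] = float(np.mean(scores))
    return results
```

Since the subsample size is assigned by the bootstrap rather than by anything about the question, the comparison across k looks like a randomised manipulation to me, which is why I'm unsure what an RCT would add.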
Side note: a Cohen’s d of .31 is not small. My opinion is that the rules of thumb used to interpret effect sizes in psychology are messed up, because so much p-hacking in the past produced way overinflated effect sizes. Regardless, 0.3 is typically seen as a moderate effect size. A 0.3 standard deviation increase in IQ would be 4.5 points which would lead to economically meaningful differences in income.
Within 3 days of departing the UK to return to the US, take another COVID test. This is required by the US CDC according to this link, and both PCR and Rapid Antigen tests are acceptable. I am planning to walk into an NHS location near the EA conference venue (like this) and get a free test. You don’t have to be a UK citizen to get free tests from the NHS (link).
My understanding is that you should not be using the free NHS test for travel and should instead book a private test, which is possible across London and at airports on the day of your flight. See the travelling abroad section of this NHS page. More practically, I think you only get a text message confirming your result from the NHS tests and this is not sufficient documentation for the CDC requirements.
What information must be included on the test result? A test result must be in the form of written documentation (paper or electronic copy). The documentation must include:
1. Type of test (indicating it is a NAAT or antigen test)
2. Entity issuing the result (e.g. laboratory, healthcare entity, or telehealth service)
3. Specimen collection date. A negative test result must show the specimen was collected within the 3 days before the flight. A positive test result for documentation of recovery from COVID-19 must show the specimen was collected within the 3 months before the flight.
4. Information that identifies the person (full name plus at least one other identifier such as date of birth or passport number)
5. Test Result
Hi Edo!
Our funder was interested in How Asia Works, presumably because of the positive reviews it’s received from people like Bill Gates and Noah Smith, and asked us to check the land section in more detail. We had a comparative advantage here given my background in development economics.
I wouldn’t be particularly interested in more land redistribution research, given that there don’t seem to be any clear funding opportunities in this space. If someone could find decent opportunities then that would make it a bit more interesting. But given the ambiguous results on the relationship between farm size and yield, I imagine research on other unexplored development interventions would have higher value of information.
I would be interested to read a deep dive into tenure reform, but this is just my personal opinion. A bunch more work, both policy and academic, seems to have been done on tenure reform so there would probably be more literature and case studies to work with. We link a couple of systematic reviews (Gignoux et al. 2014 and Lawry et al. 2017) but didn’t look into them ourselves.
Intervention report: Agricultural land redistribution
If a whole book is too much, you could also try their article, “The Economic Lives of the Poor” (https://www.aeaweb.org/articles?id=10.1257/jep.21.1.141), but this is explicitly focused on people living below the extreme poverty line, who are an order of magnitude poorer than the global median.
Thanks for flagging this, I just made a submission!