David Bernard
GCR Capacity Building Program Officer at Coefficient Giving. Former GCR Cause Prio researcher at CG and researcher on multiple teams at Rethink Priorities. Did an Econ PhD at the Paris School of Economics.
You should turn your project into an organization
If your team’s work is worth doing, it’s worth doing as an org
When a few people are doing good work together, the question of whether to formally incorporate into an organization can feel like a distraction from doing the actual work. Why take time away from your exciting research project to create an org? There are some real up-front costs to incorporating – dealing with bureaucracy, legal overhead, governance obligations – but I think the benefits of doing so are usually greater and underappreciated.
Orgs are sticky
A project that loses its founder usually just ends. An org that loses its founder can usually recruit a replacement and persist, because orgs have a persistent identity, infrastructure, culture, and mutual commitments that projects lack. In other words, the org itself is a form of capacity, with a ‘spirit’ that survives the individuals involved. If the work matters, you don’t want it to depend on any one person choosing to stay, and forming an org reduces that dependency.
Orgs can hire
Orgs hire people; people join projects. The difference is larger than it sounds. There’s a large pool of people who will respond to a job posting at a real organization with a website, but a much smaller pool who would respond to a vaguer ask to join a project. When you hire someone, they quit their current job, accept a salary, and take on a defined role with real responsibility and accountability. When you add someone to a project, they help out at whatever level of commitment they find convenient, which is often not much, and even that can change at any point. The quality and reliability of the people you can attract and retain are substantially different, and orgs give you the option to grow in ways that projects don’t.
Orgs are legitimate
A formal organization is a more credible actor in basically every relevant dimension. Funders take you more seriously if you have good governance. Potential hires prefer to work at places with some structure and processes in place, and want to be able to tell people confidently where they work. For journalists and policymakers, a real organization is a credible signal that it’s worth their time to engage with you. You can also make credible long-term commitments – receive multi-year grants or investments, make long-term hires, establish lasting institutional relationships – in a way a project simply can’t. These things compound over time in ways that are easy to underestimate at the start.
Orgs force clarity
Incorporating forces you to answer questions that a project lets you defer indefinitely: who’s actually in, what you are aiming for, and what everyone’s role is. Projects have no forcing function to resolve these questions productively, so they can stay unresolved for years and eventually become the reason people drift away and the project falls apart. The forced clarity of forming an org is usually good, even when it’s uncomfortable in the moment.
The up-front costs of forming an org are real but modest, and the benefits compound. If your team’s work is worth doing, it’s probably worth doing as an org.
A couple of further questions that would help me interpret the results:
“with 20 multiply imputed datasets”: what does this mean? What are you imputing and how are you imputing it? What are the results if you don’t do any imputation?
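(For readers unfamiliar with the term: multiple imputation usually means filling in missing values M times with draws from a predictive model, running the analysis on each completed dataset, and pooling with Rubin’s rules. A minimal sketch of that standard workflow, with hypothetical column names – nothing here is taken from the study itself:)

```python
# Minimal sketch of multiple imputation with Rubin's rules pooling.
# The dataframe and the "outcome"/"treated" column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def pooled_effect(df: pd.DataFrame, m: int = 20):
    """Impute m completed datasets, estimate the effect in each, pool the results."""
    estimates, variances = [], []
    for seed in range(m):
        # sample_posterior=True draws imputations from a predictive distribution
        # rather than using point predictions; that is what makes it *multiple* imputation.
        imputer = IterativeImputer(sample_posterior=True, random_state=seed)
        completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
        fit = sm.OLS(completed["outcome"], sm.add_constant(completed[["treated"]])).fit()
        estimates.append(fit.params["treated"])
        variances.append(fit.bse["treated"] ** 2)
    q_bar = np.mean(estimates)            # pooled point estimate
    w = np.mean(variances)                # within-imputation variance
    b = np.var(estimates, ddof=1)         # between-imputation variance
    return q_bar, np.sqrt(w + (1 + 1 / m) * b)  # Rubin's rules total SE
```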
How can you say the effect strengthens or is maintained after 1 month if you don’t observe the control group outcomes after 1 month? Generally, control group outcomes continue to improve over time even without treatment (as you can see by doing control-group pre-post comparisons for every outcome), so it doesn’t seem like you can claim much about whether the effect grows or shrinks over time.
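(And a minimal sketch of the control-group pre-post check I mean, assuming a hypothetical long-format dataframe with one row per participant-wave:)

```python
import pandas as pd

def control_pre_post(df: pd.DataFrame, outcomes: list[str]) -> pd.Series:
    """Mean post-minus-pre change in the control group for each outcome.

    Assumes hypothetical columns: "arm" ("treatment"/"control"),
    "wave" ("pre"/"post"), and one numeric column per outcome.
    Substantial improvement here means you can't attribute growth in the
    treatment group over time to the treatment itself.
    """
    control = df[df["arm"] == "control"]
    wave_means = control.groupby("wave")[outcomes].mean()
    return wave_means.loc["post"] - wave_means.loc["pre"]
```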
I have a paper that can help answer this, which uses JPAL and IPA studies! However, you might think observational study overestimates come from selection bias during the publication process—our result doesn’t say anything about that.
https://www.jondequidt.com/pdfs/Lalonde30.pdf
“First, we find that there is little bias on average. Using our best-performing observational method (DDML), there is a statistically insignificant and modest negative mean bias of −0.025 standard deviations. This implies that observational studies do not systematically over- or underestimate the welfare impact of the programs they evaluate.”
Thanks for flagging this, Ozzie. I led the GCR Cause Prio team for the last year before it was wound down, so I can add some context.
The honest summary is that the team never really achieved product-market fit. Despite the name, we weren’t really doing “cause prioritization” as most people would conceive of it. GCR program teams have wide remits within their areas and more domain expertise and networks than we had, so the separate cause prio team model didn’t work as well as it does for GHW, where it’s more fruitful to dig into new literatures and build quantitative models. In practice, our work ended up being a mix of supporting a variety of projects for different program teams and trying to improve grant evaluation methods. GCR leadership felt that this set-up wasn’t on track to answer their most important strategy and research questions and that it wasn’t worth the opportunity cost of the people on the team. They are considering alternative paths forward, though haven’t decided on anything yet.
I don’t think there are any other comparably major structural changes at Coefficient to flag, other than that we’re trying to scale Good Ventures’ giving and work with other partners, as described in our name change announcement post. I’ll also note that the Worldview Investigation team wound down in H2, although that was because team members left for other high-impact roles (e.g. Joe) rather than through a top-down decision. This means there’s no longer much dedicated pure research capacity within GCR, though grantmaking here is fairly contiguous with research in practice.
Thanks for flagging this, I just made a submission!
Section 2.2.2 of their report is titled “Choosing a fixed or random effects model”. They discuss the points you make and clearly say that they use a random effects model. In section 2.2.3 they discuss the standard measures of heterogeneity they use. Section 2.2.4 discusses the specific 4-level random effects model they use and how they did model selection.
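(For readers who haven’t worked with these models, here is a minimal single-level sketch of the DerSimonian–Laird random-effects estimate and the standard heterogeneity statistics. The report’s actual model is the 4-level version described in section 2.2.4, so this is only illustrative:)

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Single-level random-effects meta-analysis (DerSimonian-Laird).

    effects: per-study effect sizes; variances: their sampling variances.
    Returns the pooled effect, its standard error, tau^2 and I^2.
    """
    y, v = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / v                                    # fixed-effect (inverse-variance) weights
    mu_fe = np.sum(w * y) / np.sum(w)              # fixed-effect pooled estimate
    q = np.sum(w * (y - mu_fe) ** 2)               # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_re = 1.0 / (v + tau2)                        # random-effects weights
    mu_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation due to heterogeneity
    return mu_re, se_re, tau2, i2
```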
I reviewed a small section of the report prior to publication but none of these sections, and it only took me 5 minutes now to check what they did. I’d like the EA Forum to have a higher bar (as Gregory’s parent comment exemplifies) before throwing around easily checkable suspicions about what (very basic) mistakes might have been made.
Innovations for Poverty Action just released their Best Bets: Emerging Opportunities for Impact at Scale report. It covers what they think are the best evidence-backed opportunities in global health and development. The opportunities are:
Small-quantity lipid-based nutrient supplements to reduce stunting
Mobile phone reminders for routine childhood immunization
Social signaling for routine childhood immunization
Cognitive behavioral therapy to reduce crime
Teacher coaching to improve student learning
Psychosocial stimulation and responsive care to promote early childhood development
Soft-skills training to boost business profits and sales
Consulting services to support small and medium-sized businesses
Empowerment and Livelihoods for Adolescents to promote girls’ agency and health
Becoming One: Couples’ counseling to reduce intimate partner violence
Edutainment to change attitudes and behavior
Digital payments to improve financial health
Childcare for women’s economic empowerment and child development
Payment for ecosystem services to reduce deforestation and protect the environment
David Rhys Bernard’s Quick takes
Thanks Vasco, I’m glad you enjoyed it! I corrected the typo and your points about inverse-variance weighting and lognormal distributions are well-taken.
I agree that doing more work to specify what our priors should be in this sort of situation is valuable, although I’m unsure whether it rises to the level of a crucial consideration. Our ability to predict long-run effects has been an important crux for me, hence the work I’ve been doing on it, but in general it seems to be a more important consideration for people who lean neartermist than for those who lean longtermist.
Hi Michael, thanks for this.
On 1: Thorstad argues that if you want to hold both (1) Existential Risk Pessimism – per-century existential risk is very high – and (2) the Astronomical Value Thesis – efforts to mitigate existential risk have astronomically high expected value – then the Time of Perils hypothesis (TOP) is the most plausible way to hold them jointly. He does look at two arguments for TOP – space settlement and an existential risk Kuznets curve – but says these aren’t strong enough, and that we instead need a version of TOP that appeals to AI. It’s fair to think of this piece as starting from that point, although the motivation for appealing to AI here was more that this seemed to be the most compelling version of TOP to x-risk scholars.
On 2: I don’t think I’m an expert on TOP and was mostly aiming to summarise premises that seem to be common, hence the hedging. Broadly, I think you only need the 4 claims that formed the main headings: (1) high levels of x-risk now, (2) significantly reduced levels of x-risk in the future, (3) a long and valuable / positive-EV future, and (4) a moral framework that places a lot of weight on this future. The slimmed-down version of the argument focuses solely on AI as it’s relevant for (1), (2) and (3), but as I say in the piece, I think there are potentially other ways to ground TOP without appealing to AI and would be very keen to see those articulated and explored more.
(2) is the part where my credences feel most fragile, especially the parts about AI being sufficiently capable to drastically reduce both other x-risks and the risk from misaligned AI, and about AI remaining aligned near-indefinitely. It would be great to have a better sense of how difficult various x-risks are to solve and how powerful an AI system we might need to near-eliminate them. No unknown unknowns seems like the least plausible premise of the group, but its very nature makes it hard to know how to cash this out.
Uncertainty over time and Bayesian updating
Yep, I agree you can generate the time of perils conclusion if AI risk is the only x-risk we face. I was attempting to empirically describe a view that seems to be popular in the x-risk space, that other x-risks besides AI are also cause for concern, but you’re right that we don’t necessarily need this full premise.
I was somewhat surprised by the lack of distinction between the cases where we go extinct and the universe is barren (value 0) and big negative futures filled with suffering. The difference between these cases seems large to me and seems likely to substantially affect the value of x-risk and s-risk mitigation. This is even more the case if you don’t subscribe to symmetric welfare ranges and think our capacity to suffer is vastly greater than our capacity to feel pleasure, which would make the worst possible futures far worse than the best possible futures are good. I suspect this is related to the popularity of the term ‘existential catastrophe’, which collapses any difference between these cases (as well as cases where we bumble along and produce some small positive value, but far from our best possible future).
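To put toy numbers on this (illustrative values only, not drawn from the post): normalise the best possible future to +1 and barren extinction to 0, and suppose an asymmetric welfare range makes the worst suffering-filled future worth −10. Then, per percentage point of probability shifted:

```latex
% Shifting 1pp of probability from the worst future to barren extinction:
\Delta EV_{\text{s-risk}} = 0.01 \times \bigl(0 - (-10)\bigr) = 0.10
% Shifting 1pp of probability from barren extinction to the best future:
\Delta EV_{\text{x-risk}} = 0.01 \times (1 - 0) = 0.01
```

Under these made-up numbers, avoiding the suffering-filled future is worth ten times as much per percentage point as converting extinction into the best future – exactly the distinction that ‘existential catastrophe’ collapses.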
Thanks for highlighting this Michael and spelling out the different possibilities. In particular, it seems like if aliens are present and would expand into the same space we would have expanded into had we not gone extinct, then for the totalist, to the extent that aliens have similar values to us, the value of x-risk mitigation is reduced. If we are replaceable by aliens, then it seems like not much is lost if we do go extinct, since the aliens would still produce the large valuable future that we would have otherwise produced.
I have to admit though, it is personally uncomfortable for my valuation of x-risk mitigation efforts and cause prioritisation to depend partially on something as abstract and unknowable as the existence of aliens.
Charting the precipice: The time of perils and prioritizing x-risk
Hi Geoffrey, thanks for these comments, they are really helpful as we move to submitting this to journals. Some miscellaneous responses:
I’d definitely be interested in seeing a project where the surrogate index approach is applied to even longer-run settings, especially in econ history as you suggest. You could see this article as testing whether the surrogate index approach works in the medium run, so thinking about how well it works in the longer run is a very natural extension. I spent some time during my PhD thinking about how to do this and about datasets you might do it with, but didn’t end up having capacity. So if you or anyone else is interested in doing this, please get in touch! That said, I don’t think it makes sense to combine these two projects (econ history and RCTs) into one paper, given the norms of economics articles and subdiscipline boundaries.
4a. The negative bias is purely an empirical result, but one that we expect to arise in many applications. We can’t say for sure whether it’s always negative or attenuation bias, but the hypothesis we suggest to explain it is compatible with attenuation of the treatment effects towards 0 and treatment effects generally being positive. However, when we talk about attenuation in the paper, we’re typically talking about attenuation in the prediction of long-run outcomes, not attenuation in the treatment effects.
4b. The surrogate index is unbiased and consistent if the assumptions behind it are satisfied; this is the case for most econometric estimators. What we do in the paper is show that the key surrogacy assumption is empirically not perfectly satisfied in a variety of contexts. Since this assumption is not satisfied, the estimator is empirically biased and inconsistent in our applications. However, this is not what people typically mean when they say an estimator is theoretically biased and inconsistent. Personally, I think econometrics focuses too heavily on unbiasedness and on asymptotic properties of estimators, and too little on how well they perform in these empirical LaLonde-style tests; I’m sympathetic to the ML willingness to trade off bias and variance. (A minimal sketch of the estimator follows these points.)
4c. The normalisation depends on the standard deviation of the control group, not the standard error, so we should be fine to do that regardless of what the actual treatment effect is. We would be in trouble if there was no variation in the control group outcome, but this seems to occur very rarely (or never).
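(For readers who haven’t seen the estimator, here is a minimal sketch of the surrogate index approach, with hypothetical variable names; it also shows the control-group-SD normalisation from 4c. This illustrates the general method, not our exact implementation:)

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def surrogate_index_effect(s_obs, y_obs, s_exp, treated):
    """Estimate a long-run treatment effect without long-run experimental data.

    s_obs, y_obs: surrogate outcomes and the long-run outcome in an
    observational sample. s_exp, treated: surrogates and a 0/1 treatment
    indicator in the experiment, where the long-run outcome is unobserved.
    """
    treated = np.asarray(treated)
    # Step 1: learn E[Y | S] in the observational sample.
    model = LinearRegression().fit(s_obs, y_obs)
    # Step 2: form the surrogate index (predicted long-run outcome) for the experiment.
    y_hat = model.predict(s_exp)
    # Step 3: difference in means of the index. Unbiased only if surrogacy holds,
    # i.e. treatment affects Y solely through S and the two samples are comparable.
    ate = y_hat[treated == 1].mean() - y_hat[treated == 0].mean()
    # Standardise by the control group's standard deviation (not its standard error).
    return ate / y_hat[treated == 0].std(ddof=1)
```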
Estimating long-term treatment effects without long-term outcome data (David Rhys Bernard, Jojo Lee and Victor Yaneng Wang)
The JPAL and IPA Dataverses have data from 200+ RCTs from development economics, and the 3ie portal has 500+ studies with datasets available (and you can further filter by study type if you want to limit to RCTs). I can’t point you to particular studies that have missing or mismeasured covariates, but from personal experience, a lot of them have lots of missing data.
Can you explain more why the bootstrapping approach doesn’t give a causal effect (or something pretty close to one) here? The aggregate approach is clearly confounded since questions with more answers are likely easier. But once you condition on the question and directly control the number of forecasters via bootstrapping different sample sizes, it doesn’t seem like there are any potential unobserved confounders remaining (other than the time issue Nikos mentioned). I don’t see what a natural experiment or RCT would provide above the bootstrapping approach.
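(To make this concrete, a minimal sketch of the bootstrapping approach I have in mind, with hypothetical data structures: condition on the question, resample k forecasters within it, and trace aggregate accuracy against k:)

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_vs_crowd_size(questions, sizes, n_boot=1000):
    """Bootstrap the effect of crowd size on aggregate accuracy.

    questions: list of (forecasts, outcome) pairs, where forecasts is an
    array of individual probability forecasts for one question and outcome
    is 0 or 1. Resampling *within* each question removes question-difficulty
    confounding, since every crowd size is evaluated on the same questions.
    """
    results = {}
    for k in sizes:
        briers = []
        for forecasts, outcome in questions:
            for _ in range(n_boot):
                sample = rng.choice(forecasts, size=k, replace=True)
                briers.append((sample.mean() - outcome) ** 2)  # Brier score of the mean forecast
        results[k] = np.mean(briers)  # lower is more accurate
    return results
```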
Posting in a personal capacity.
I’m excited about EA doing more to shout about its wins and defend itself against bad-faith detractors and would love to see more discussion of what it could concretely look like for EA to take more control of its own narrative in this way.
I think the core point here is right: whether we like it or not, there is a narrative about EA that gets constructed, and if we choose not to do our own narrative shaping, then EA’s critics control more of the narrative. Choosing not to engage doesn’t make the narrative ‘authentic’, and the status quo isn’t ‘neutral’. EA has had a lot of cool wins that almost nobody outside the community knows about, so I want to see more people pointing out those wins and how EA principles informed them.
I think @Andy Masley has been a great example of what this can look like in practice. Andy has written a lot about a topic not usually associated with EA, data center water usage, but has written and engaged publicly in a way that’s clearly informed by EA ideas, and he’s unashamed about his connection to EA. I’d personally love to see more of this. There’s an EA thought leadership gap right now, with not many people who write and speak publicly from an EA perspective. I’d love to see a new generation of people who write, speak, and engage publicly on a variety of topics from a perspective informed by EA principles.
I also don’t think that doing more of this is the same thing as trying to make EA cool or trying to expand it. It’s just trying to make sure that people have a clearer understanding of what EA principles are and what they have led to in the world so far, the good and the bad. I’d love an end state where people who are into EA ideas are clearer about how they see EA and their relationship to it, and for people not to feel embarrassed to say they’re into EA ideas or part of the EA community.
I’m keen to hear more concrete ideas for what doing more good proactive narrative shaping could look like.