David Rhys Bernard
Research Fellow at Open Philanthropy. Former Quantitative Researcher on the Worldview Investigations Team at Rethink Priorities. Completed a PhD at the Paris School of Economics.
The first 5 paragraphs are repeated twice. Could someone fix this?
Thanks for the paper suggestions! Most of my own research is on internal validity in the LaLonde style so I definitely think it is important too. I’ll add a section on replicability to the syllabus.
1. Thinking vs. reading.
Another benefit of thinking before reading is that it can help you develop your research skills. Noticing some phenomenon and then developing a model to explain it is a super valuable exercise. If it turns out you reproduce something that someone else has already done and published, then great: you’ve gotten experience solving some problem and you’ve shown that you can think through it at least as well as some expert in the field. If it turns out that you have produced something novel, then it’s time to see how it compares to existing results in the literature and get feedback on how useful it is.
That said, I think this is more true for theoretical work than applied work, e.g. the value of doing this in philosophy > in theoretical economics > in applied economics. A fair amount of EA-relevant research is summarising and synthesising what the academic literature on some topic finds, and it seems pretty difficult to do that by just thinking to yourself!
3. Is there something interesting here?
I mostly try to work out how excited I am by this idea and whether I could see myself still being excited in 6 months, since for me having internal motivation to work on a project is pretty important. I also try to chat about this idea with various other people and see how excited they are by it.
4. Survival vs. exploratory mindset.
I also haven’t heard these terms before, but from your description (which frames a survival mindset pretty negatively), an exploratory mindset comes fairly naturally to me and therefore I haven’t ever actively cultivated it. Lots of research projects fail so extreme risk aversion in particular seems like it would be bad for researchers.
5. Optimal hours of work per day.
I typically aim for 6-7 hours of deep work a day and a couple of dedicated hours for miscellaneous tasks and meetings. Since starting part-time at RP I’ve been doing 6 days a week (2 RP, 4 PhD), but before that I did 5. I find RP deep work less taxing than PhD work. 6 days a week is at the upper limit of manageable for me at the moment, so I plan to experiment with different schedules in the new year.
6. Learning a new field.
I’m a big fan of textbooks and schedule time to read a couple of textbook chapters each week. LessWrong’s “best textbooks on every subject” thread is pretty good for finding them. I usually make Anki flashcards to help me remember the key facts, but I’ve recently started experimenting with Roam Research to take notes, which I’m also enjoying, so my “learning flow” is in flux at the moment.
8. Emotional motivators.
My main trick for dealing with this is to always plan my day the night before. I let System 2 Dave work out what is important and needs to be done and put blocks in the calendar for these things. When System 1 Dave is working the next day, his motivation doesn’t end up mattering so much because he can easily defer to what System 2 Dave said he should do. I don’t read too much into lack of System 1 motivation: it happens, and I haven’t noticed that it is particularly correlated with how important the work is. It’s more correlated with things like how scary it is to start some new task, and irrelevant things like how much sunlight I’ve been getting.
9. Typing speed.
I struggle to imagine typing speed being a binding constraint on research productivity since I’ve never found typing speed to be a problem for getting into flow, but when I just checked, my wpm was 85, so maybe I’d feel differently if it were slower. When I’m coding, the vast majority of my time is spent thinking about how to solve the problem I’m facing, not typing the code that solves the problem. When I’m writing first drafts, I think typing speed is a bit more helpful for the reasons you mention, but again, more time goes into planning the structure of what I want to say and polishing than into the first pass at writing, where speed might help.
11. Tiredness, focus, etc.
My favourite thing to do is to stop working! Not all days can be good days and I became a lot happier and more productive when I stopped beating myself up for having bad days and allowed myself to take the rest of the afternoon off.
12. Meta.
The questions I didn’t answer were because I didn’t have much to say about them so I’d be happy to see answers to them!
I’m happy to see an increase in the number of temporary visiting researcher positions at various EA orgs. I found my time visiting GPI during their Early Career Conference Programme very valuable (hint: applications for 2021 are now open, apply!) and would encourage other orgs to run similar sorts of programmes to this and FHI’s (summer) research scholars programme. I’m very excited to see how our internship program develops as I really enjoy mentoring.
I think I was competitive for the RP job because of my T-shaped skills: broad knowledge of lots of EA-related things, combined with specialised knowledge in a specific useful area, economics in my case. Michael Aird probably has the most to say about developing broad knowledge, given how much EA content he has consumed in the last couple of years, but in general, reading things on the Forum and actively discussing them with other people (perhaps in a reading group) seems to be the way to develop in this area. Developing specialised skills obviously depends a lot on the skill, but graduate education and relevant internships are the most obvious routes here.
Thanks for this Luisa, I found it very interesting and appreciated the level of detail in the different cases. One thought, and some related questions, came up when reading the toy calculations at the end of each case:
For a fixed number of survivors, there is a trade-off between groups of different sizes. The larger the groups, the more likely each group is to survive, but the fewer groups need to be wiped out in order for humanity to go extinct.
What might this trade-off look like and is there some optimal group size to minimise the risk of extinction?
What are the game theoretic considerations of individuals forming groups of varying sizes and how do these vary depending on the extent to which people care about their own individual survival and human extinction?
What group sizes might we expect in practice and is there anything we could do to influence group sizes in the event of a catastrophe?
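To make the first question concrete, here is a toy model of the trade-off. Everything in it, including the survival curve p(n) = n/(n+c), the totals, and the independence assumption, is my own illustrative choice, not anything from the post:

```python
# Toy model: N survivors are split into groups of size n, giving k = N // n
# groups. Each group independently survives with an assumed probability
# p(n) = n / (n + c), so larger groups are more robust but there are fewer
# of them. Humanity goes extinct only if every group fails.

def extinction_prob(n, total=10_000, c=50):
    k = max(total // n, 1)        # number of groups of size n
    p_survive = n / (n + c)       # assumed per-group survival probability
    return (1 - p_survive) ** k   # probability that all k groups are wiped out

for n in (5, 50, 500, 5000):
    print(f"group size {n:>4}: extinction probability {extinction_prob(n):.3g}")
```

With this particular survival curve, many small groups come out ahead of a few large ones, but a curve with stronger returns to group size can reverse the ranking, which is exactly why the optimal group size question is non-trivial.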
Given the low likelihood of extinction you suggest, I think these are relatively low priority questions but could be potentially interesting for someone to look at in more detail.
Thanks for the post, but I don’t think you can conclude from your analysis that your criteria weren’t helpful and the result is not necessarily that surprising.
If you look at professional NBA basketball players, there’s not much of a correlation between how tall a basketball player is and how much they get paid or some other measure of how good they are. Does this mean NBA teams are making a mistake by choosing tall basketball players? Of course not!
The mistake your analysis is making is called ‘selecting on the dependent variable’ or ‘collider bias’. You are looking at the correlation between two variables (interview score and engagement) in a specific subpopulation, the subpopulation that scored highly in interview score. However, that specific subpopulation correlation may not be representative of the correlation between interview score and engagement in the broader relevant population i.e., all students who applied to the fellowship. This is related to David Moss’s comment on range restrictions.
The correlation in the population is the thing you care about, not the correlation in your subpopulation. You want to know whether the scores are helpful for selecting people into or out of the fellowship. For this, you need to know about engagement of people not in the fellowship as well as people in the fellowship.
This sort of thing comes up all the time, like in the basketball case. Another common example with a clear analogy to your case is grad school admissions. For admitted students, GRE scores are (usually) not predictive of success. Does that mean schools shouldn’t select students based on GRE? Only if the relationship between success and GRE scores for admitted students is representative of the relationship for unadmitted students, which is unlikely to be the case.
The simplest thing you could do to improve this would be to measure engagement for all the people who applied (or whom you interviewed, if you only have scores for them) and then re-estimate the correlation on the full sample, rather than the selected subsample. This will provide a better answer to your question of whether scores are predictive of engagement. The things included in your engagement measure seem pretty easy to observe, so this should be easy to do. However, a lot of them are explicitly linked to participation in the fellowship, which biases the measure towards fellows somewhat, so if you could construct an alternative engagement measure which doesn’t include these, that would likely be better.
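As a quick illustration of the range-restriction point, here is a simulation with made-up numbers (nothing here is estimated from the fellowship data): score and engagement are clearly correlated in the full applicant pool, but the correlation shrinks once we keep only high scorers.

```python
import numpy as np

# Assumed data-generating process: a latent "ability" drives both interview
# score and later engagement, so the two are positively correlated in the
# full applicant pool. Selecting the top scorers then attenuates the
# within-admitted correlation.

rng = np.random.default_rng(0)
n = 100_000
ability = rng.normal(size=n)
score = ability + rng.normal(scale=1.0, size=n)       # interview score
engagement = ability + rng.normal(scale=1.0, size=n)  # later engagement

full_corr = np.corrcoef(score, engagement)[0, 1]

admitted = score > np.quantile(score, 0.8)            # keep the top 20%
admitted_corr = np.corrcoef(score[admitted], engagement[admitted])[0, 1]

print(f"correlation in full applicant pool: {full_corr:.2f}")
print(f"correlation among admitted only:   {admitted_corr:.2f}")
```

The within-admitted correlation is roughly half the population correlation here, even though nothing about the selection process is "wrong", which is the basketball/GRE point in miniature.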
This paper was a chapter in the book Randomized Control Trials in the Field of Development: A Critical Perspective, a collection of articles on RCTs. Assuming the author of this chapter, Timothy Ogden, doesn’t identify as a randomista, the only other author who maybe does is Jonathan Morduch, so it’s a pretty one-sided book (which isn’t necessarily a problem, just something to be aware of).
There was a launch event for the book, moderated by William Easterly, with talks from Sir Angus Deaton, Agnès Labrousse, Jonathan Morduch and Lant Pritchett, which you might find interesting if you enjoyed this post.
The tag seems focused on how much weight should be assigned to different moral patients. But some people and posts use the phrase moral weight to refer to relative importance of different outcomes, e.g. how much should we care about consumption vs saving a life? Examples include:
Should we include both under this wiki-tag and broaden the definition? Or should we make a new tag and disambiguate between the two?
Thanks Jeremy!
That was just a typo. Previously we were unsure whether they would be an ally or an opponent and then Pure Earth told us they considered them to be an ally. I wasn’t careful enough when editing that section so I’ve deleted “or an opponent” now.
Thanks Mark, both for your time and feedback while we were writing the report and your comments now.
On 1, I agree that charter cities sit somewhere between neartermist and longtermist so thinking about them as mid/mediumtermist makes sense. I imagine Rethink Priorities’ future work in this space will be a mixture of traditionally neartermist and mediumtermist topics. However, most of the current arguments for charter cities, especially Mason (2019), have an explicitly neartermist flavour, given the direct comparisons to GiveWell charities and a focus on the direct benefits. I’m keen to see robust medium/longtermist arguments for charter cities being made more explicitly.
On 2 & 3, there’s some tension between the claims that (1) Chinese growth is a result of SEZs, (2) the charter cities movement is trying to replicate the success of China, and (3) SEZs are not the right comparison for charter cities.
To simplify the argument somewhat, we are taking the position that the more useful currently existing empirical analogue for charter cities is all SEZs, whereas your position is that it is Shenzhen. I totally accept your points about the important differences between SEZs and charter cities, but I am still concerned that focusing solely on the Shenzhen SEZ is cherry-picking and gives an unrepresentative picture of how we might expect charter cities to perform. I think the ideal empirical analogue would be the subset of SEZs that were large, had relatively high autonomy and hosted multiple industries; however, we couldn’t find any analysis of the performance of this subset.
On 4, I think the report is clear about why we are currently skeptical of the tractability of charter cities despite recent history (although I recognise that you have inside knowledge that might cause us to update more positively). I’d also highlight that regardless of what you think of the absolute tractability of charter cities, it seems intuitive that the relative tractability is lower than alternatives such as special reform zones, which aim at delivering the same benefits as charter cities without having to set up and build a brand new city. That said, I’m happy you and CCI are still working on this and I would love for you to prove us wrong!
I’m not convinced that our CEA is particularly useful for more generalised interventions. All we really do is assume that the intervention causes some growth increase (a distribution rather than a point estimate) and then model expected income with the intervention, with the intervention 10 years later and with no intervention. The amount the intervention increases growth is the key parameter and is very uncertain so further research on this will have the highest VoI, but this will be different for each intervention. We treat how the intervention increases growth as a black box so I think looking inside the box and trying to understand the mechanisms better would shed some light on how robust the assumed growth increase is and how we might expect it to generalise to other contexts.
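For concreteness, here is a minimal sketch of that modelling structure. The growth rates, the boost distribution and the horizon are all placeholder assumptions of mine, not the report’s numbers:

```python
import numpy as np

# Sketch of the CEA structure described above: sample an uncertain growth
# boost, then compare mean income per capita (normalised to 1 today) under
# (a) intervention now, (b) intervention delayed 10 years, (c) no
# intervention. All parameter values are illustrative placeholders.

rng = np.random.default_rng(0)
years = 50
base_growth = 0.02                            # assumed baseline growth rate
boost = rng.normal(0.005, 0.003, 10_000)      # assumed uncertain growth boost
boost = np.clip(boost, 0, None)               # rule out negative boosts

def mean_income(extra_growth_by_year):
    # income path under each boost draw, averaged over draws
    growth = base_growth + extra_growth_by_year           # shape (sims, years)
    return np.cumprod(1 + growth, axis=1).mean(axis=0)

t = np.arange(years)
now     = mean_income(boost[:, None] * np.ones(years))    # boost from year 0
delayed = mean_income(boost[:, None] * (t >= 10))         # boost from year 10
never   = mean_income(np.zeros((1, years)))               # no intervention

print(f"income at year {years}: now={now[-1]:.2f}, "
      f"delayed={delayed[-1]:.2f}, never={never[-1]:.2f}")
```

The gap between the three paths is driven almost entirely by the assumed boost distribution, which is the "key parameter is very uncertain" point: everything interesting is inside that black box.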
Furthermore, we only model the direct benefits of the growth intervention. In general, I’d expect the indirect effects to be larger and our modelling approach doesn’t say anything about these so I expect looking into these indirect benefits, perhaps via an alternative model, to have higher VoI than further modelling of the direct benefits.
For charter cities in particular, we could probably further tighten the bounds on the direct benefits by getting more rigorous information on city population growth rates and the correlation between population growth and income growth.
I know you were explicit about these being your views and not Founders Pledge’s, but is there anyone better placed to think through those implications than Founders Pledge? And similarly, it seems like Founders Pledge would be one of the most natural organisations to advocate against limits on patient philanthropy, given the work on the long-term investment fund.
For example, David and Jason’s report on charter cities was completed in 100 hours, a reasonable fraction of which was extra legwork for external writeup/following up with affected parties, after the original report was delivered to Open Phil. My impression is that the bulk of the work was done on a fairly short calendar time cycle too, in ways that may be hard for external parties to replicate. But naively the report would still be useful to Open Phil and cost-effective to fund if it took 200 hours to complete and 3x the calendar time.
Just to clarify, the 100 hours was actually just for the original report and doesn’t include any of the extra legwork for the public version, because I forgot to update that time-taken estimate in the public version. The extra work for the public version was an additional 10-15 hours from the two of us, but there was also work from others reviewing the report. This extra work took place over 5 weeks of calendar time.
Hi Edo!
Our funder was interested in How Asia Works, presumably because of the positive reviews it’s received from people like Bill Gates and Noah Smith, and asked us to check the land section in more detail. We had a comparative advantage here given my background in development economics.
I wouldn’t be particularly interested in more land redistribution research, given that there don’t seem to be any clear funding opportunities in this space. If someone could find decent opportunities then that would make it a bit more interesting. But given the ambiguous results on the relationship between farm size and yield, I imagine research on other unexplored development interventions would have higher value of information.
I would be interested to read a deep dive into tenure reform, but this is just my personal opinion. A bunch more work, both policy and academic, seems to have been done on tenure reform so there would probably be more literature and case studies to work with. We link a couple of systematic reviews (Gignoux et al. 2014 and Lawry et al. 2017) but didn’t look into them ourselves.
Within 3 days of departing the UK to return to the US, take another COVID test. This is required by the US CDC according to this link, and both PCR and Rapid Antigen tests are acceptable. I am planning to walk into an NHS location near the EA conference venue (like this) and get a free test. You don’t have to be a UK citizen to get free tests from the NHS (link).
My understanding is that you should not be using the free NHS test for travel and should instead book a private test, which is possible across London and at airports on the day of your flight. See the travelling abroad section of this NHS page. More practically, I think you only get a text message confirming your result from the NHS tests and this is not sufficient documentation for the CDC requirements.
What information must be included on the test result? A test result must be in the form of written documentation (paper or electronic copy). The documentation must include:
1. Type of test (indicating it is a NAAT or antigen test)
2. Entity issuing the result (e.g. laboratory, healthcare entity, or telehealth service)
3. Specimen collection date. A negative test result must show the specimen was collected within the 3 days before the flight. A positive test result for documentation of recovery from COVID-19 must show the specimen was collected within the 3 months before the flight.
4. Information that identifies the person (full name plus at least one other identifier such as date of birth or passport number)
5. Test result
Side note: a Cohen’s d of .31 is not small. My opinion is that the rules of thumb used to interpret effect sizes in psychology are messed up, because so much p-hacking in the past produced way overinflated effect sizes. Regardless, 0.3 is typically seen as a moderate effect size. A 0.3 standard deviation increase in IQ would be 4.5 points (0.3 × 15), which would lead to economically meaningful differences in income.
Can you explain more why the bootstrapping approach doesn’t give a causal effect (or something pretty close to one) here? The aggregate approach is clearly confounded since questions with more answers are likely easier. But once you condition on the question and directly control the number of forecasters via bootstrapping different sample sizes, it doesn’t seem like there are any potential unobserved confounders remaining (other than the time issue Nikos mentioned). I don’t see what a natural experiment or RCT would provide above the bootstrapping approach.
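To make the point concrete, here is a sketch of that bootstrap design on synthetic data (every parameter here is an assumption of mine): because forecasters are resampled within each question, question difficulty is held fixed and cannot confound the relationship between crowd size and accuracy.

```python
import numpy as np

# Synthetic forecasting data: binary outcomes, noisy but informative
# forecasters. For each crowd size k, we bootstrap k forecasters per
# question (with replacement), average their forecasts, and compute the
# mean Brier score of the aggregate.

rng = np.random.default_rng(0)
n_questions, n_forecasters = 200, 60
truth = rng.integers(0, 2, n_questions)                     # 0/1 outcomes
signal = truth[:, None] * 0.65 + (1 - truth[:, None]) * 0.35
forecasts = np.clip(
    signal + rng.normal(0, 0.25, (n_questions, n_forecasters)), 0.01, 0.99
)

def mean_brier(k, n_boot=200):
    scores = []
    for _ in range(n_boot):
        # resample k forecasters independently within each question
        idx = rng.integers(0, n_forecasters, (n_questions, k))
        agg = np.take_along_axis(forecasts, idx, axis=1).mean(axis=1)
        scores.append(np.mean((agg - truth) ** 2))
    return np.mean(scores)

for k in (1, 5, 20):
    print(f"crowd size {k:>2}: mean Brier score {mean_brier(k):.3f}")
```

In this setup the accuracy gain from larger crowds is identified purely by the within-question resampling, which is why it is hard to see what an unobserved confounder would even be here, beyond the timing issue.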
The JPAL and IPA Dataverses have data from 200+ RCTs from development economics, and the 3ie portal has 500+ studies with datasets available (and you can further filter by study type if you want to limit to RCTs). I can’t point you to particular studies that have missing or mismeasured covariates, but from personal experience, a lot of them have substantial missing data.
Hi Geoffrey, thanks for these comments, they are really helpful as we move to submitting this to journals. Some miscellaneous responses:
I’d definitely be interested in seeing a project where the surrogate index approach is applied to even longer-run settings, especially in econ history as you suggest. You could see this article as testing whether the surrogate index approach works in the medium-run, so thinking about how well it works in the longer-run is a very natural extension. I spent some time thinking about how to do this during my PhD and datasets you might do it with, but didn’t end up having capacity. So if you or anyone else is interested in doing this, please get in touch! That said, I don’t think it makes sense to combine these two projects (econ history and RCTs) into one paper, given the norms of economics articles and subdiscipline boundaries.
4a. The negative bias is purely an empirical result, but one that we expect to arise in many applications. We can’t say for sure whether it’s always negative or attenuation bias, but the hypothesis we suggest to explain it is compatible with attenuation bias of the treatment effects to 0 and treatment effects generally being positive. However, when we talk about attenuation in the paper, we’re typically talking about attenuation in the prediction of long-run outcomes, not attenuation in the treatment effects.
4b. The surrogate index is unbiased and consistent if the assumptions behind it are satisfied; this is the case for most econometric estimators. What we do in the paper is show that the key surrogacy assumption is empirically not perfectly satisfied in a variety of contexts. Since this assumption is not satisfied, the estimator is empirically biased and inconsistent in our applications. However, this is not what people typically mean when they say an estimator is theoretically biased and inconsistent. Personally, I think econometrics focuses too heavily on unbiasedness (I’m sympathetic to the ML willingness to trade off bias and variance) and cares too much about asymptotic properties of estimators and too little about how well they perform in these empirical LaLonde-style tests.
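As a toy illustration of what a surrogacy violation does to the estimator (my own simulated setup, not the paper’s data or method details): treatment affects the long-run outcome both through the surrogate and through a direct channel the surrogate doesn’t capture, so the surrogate-index estimate misses the direct channel.

```python
import numpy as np

# Assumed structure: treatment T moves the surrogate S, and Y depends on S
# plus a direct effect of T that bypasses S (violating surrogacy). The
# surrogate index predicts Y from S using the control group, then measures
# the treatment effect on predicted Y.

rng = np.random.default_rng(0)
n = 200_000
T = rng.integers(0, 2, n)
S = 0.5 * T + rng.normal(size=n)        # surrogate responds to treatment
Y = S + 0.3 * T + rng.normal(size=n)    # direct channel violates surrogacy

# Fit Y ~ S on the control group and form the surrogate index
b = np.cov(S[T == 0], Y[T == 0])[0, 1] / np.var(S[T == 0])
Y_hat = b * S

true_effect = Y[T == 1].mean() - Y[T == 0].mean()
surrogate_effect = Y_hat[T == 1].mean() - Y_hat[T == 0].mean()
print(f"true effect {true_effect:.2f}, surrogate-index estimate {surrogate_effect:.2f}")
```

Here the surrogate-index estimate recovers only the mediated part of the effect (about 0.5 of the true 0.8), which is the empirical bias-when-the-assumption-fails point, not a claim that the estimator is theoretically biased.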
4c. The normalisation depends on the standard deviation of the control group, not the standard error, so we should be fine to do that regardless of what the actual treatment effect is. We would be in trouble if there was no variation in the control group outcome, but this seems to occur very rarely (or never).
Hey Kaj, I just thought I’d let you know that you’re not alone in Scandinavia! A few of us are starting an EA group in Uppsala, Sweden and Trondheim, Norway launched a couple of weeks ago. I know it’s late notice, but we’re having a Google Hangout this evening, 9pm your time so if you could join, that’d be great!