I don’t work in physical goods (I’m a data scientist), but I am definitely interested in leveling up my skillset in this way. I’m probably only available for 3 to 4 hours a week to start, though that will likely change soon.
Thanks for making this post! This is an interesting observation.
Thank you for doing this work! I really admire the rigor of this process. I’m really curious to hear how this work is received by (1) other evaluation orgs and (2) mental health experts. Have you received any such feedback so far? Has it been easy to explain? Have you had to defend any particular aspect of it in conversations with outsiders?
I do have one piece of feedback. You have included a data visualization here that, if you’ll forgive me for saying so, is trying to tell a story without seeming to care about the listener. There is simply too much going on in the viz for it to be useful.
I think a visualization can be extremely useful here in communicating various aspects of your process and its results, but cramming all of this information into a single pane makes the chart essentially unreadable; there are too many axes that the viewer needs to understand simultaneously.
I’m not sure exactly what you wanted to highlight in the visualization, but if you want to demonstrate the simple correlation between mechanical and intuitive estimates, a simple scatterplot will do, without the extra colors and shapes. On the other hand, if that extra information is substantive, it should really be in separate panes for the sake of comprehensibility. Here’s a quick example with your data (direct link to a larger version here):
I don’t think this is the best possible version of this chart (I’d guess it’s too wide, and opinions differ on whether all axes should start at 0), but it’s an example of how you might tell multiple stories in a slightly more readable way. The linear trend is visible in each plot, it’s easier to make out the screening sizes, and I’ve outlined the axes delineating the four quadrants of each pane to highlight that the programmes included in Round 2 were mostly those scoring highly on both measures.
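For what it’s worth, here’s a minimal sketch of how I produced that kind of faceted version (the file and column names are placeholders, not your actual data):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder file and column names -- substitute your actual data.
df = pd.read_csv("programme_scores.csv")
# columns: mechanical, intuitive, screening_size, round2

# One pane per category, one retained size encoding, nothing else.
g = sns.relplot(
    data=df,
    x="mechanical", y="intuitive",
    size="screening_size",
    col="round2",
    kind="scatter", height=4,
)

# Light quadrant guides (here at the medians, but any meaningful
# threshold works) to show where Round 2 selections cluster.
for ax in g.axes.flat:
    ax.axhline(df["intuitive"].median(), color="grey", lw=0.5)
    ax.axvline(df["mechanical"].median(), color="grey", lw=0.5)
plt.show()
```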
Feel free to take this with as much salt as necessary. I’m working from my own experience, which is that communicating data has tended to take just as much work on the communication as it does on the data.
while some users reported finding the Prize valuable or motivating, that number wasn’t quite as high as I had been hoping for
It seems like the key question here is whether users who won prizes found them motivating. Most users will not write prize-winning posts, but if the users who did were at least partially motivated by the prospect of winning one, then the world with the prize is almost certainly better than the counterfactual. More generally, if users who wrote original posts were likelier to endorse the prize than users in general, that is some indication that the prize is somewhat effective. Did you have enough data to determine whether either of these situations obtains?
I don’t have anything to say except that I loved this, and I’m really happy somebody is starting to present a warmer and fuzzier side of EA.
In general, I’m skeptical about software solutionism, but I wonder if there’s a need/appetite for group decision-making tools. While it’s unclear exactly what works for helping groups make decisions, it does seem like a structured format could provide value to lots of organizations. Moreover, tools like this could provide valuable information about what works (and doesn’t).
The usual caveats apply here: cross-country comparisons are often BS, correlation is not causation, I’m presenting smoothed densities instead of (jagged) histograms, etc, etc...
I’ve combined data on electoral system design and covid response to start thinking about the possible relationships between electoral system and crisis response. Here’s some initial stuff: the gap, in days, between first confirmed cases and first school and workplace closures. Note that n ≈ 80 for these two datasets, pending some cleaning and, hopefully, a fuller merge between the different datasets.
To me, the potentially interesting thing here is the apparently lower variability of PR government responses. But I think there’s a 75% chance that this is an illusion… there are many more PR governments than others in the dataset, and this may just be an instance of variability decreasing with sample size.
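If it helps make this concrete, here’s roughly the shape of the analysis (placeholder file and column names), including a crude check on the sample-size worry: subsample the PR group down to the size of the smallest group and see whether its spread still looks lower.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder file and column names -- not the actual datasets.
df = pd.read_csv("electoral_covid_merged.csv")
df["gap_days"] = (pd.to_datetime(df["first_closure"])
                  - pd.to_datetime(df["first_case"])).dt.days

# Smoothed densities of the response gap by electoral system.
sns.kdeplot(data=df, x="gap_days", hue="electoral_system", common_norm=False)
plt.show()

# Sample-size check: does PR still look less variable when cut
# down to the size of the smallest group?
pr = df.loc[df["electoral_system"] == "PR", "gap_days"]
n_small = df["electoral_system"].value_counts().min()
spreads = [pr.sample(n_small).std() for _ in range(1000)]
print(pd.Series(spreads).describe())
```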
If there’s an appetite here for more like this, I’ll try to flesh out the analysis with more instructive material, with the predictable criticisms either dismissed or validated.
Or, of course, restrict our sample to a smaller geographic region of the US with higher prevalence.
It seems like there’s a significant need right now to identify what the plausible relationship is between mask-wearing and covid19 symptoms. The virus is now widespread enough that a very quick Mechanical Turk survey could provide useful information.
Collect the following:
• Age group (5 categories)
• Wear a mask in public 1 month ago? (y/n)
• If yes to above, type of mask? (bandana/N95+/surgical/cloth/other)
• Sick with covid19 symptoms in past month? (y/n)
• Know anyone in everyday life who tested positive for covid19 in past month? (y/n)
• Postal code (for pop. density info)
Based on figures from this Gallup piece, a back-of-the-envelope calculation suggests we could get usable results from surveying 20,000 Americans—but we could work with a much smaller sample if we surveyed in a country where the virus is more prevalent.
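To make the back-of-the-envelope reproducible, here’s a power-calculation sketch. The symptom rates below are invented purely for illustration; smaller true differences (or the lopsided split between mask-wearers and non-wearers) push the required sample toward the ~20,000 figure.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Invented illustrative rates: symptom prevalence of 5% among
# mask-wearers vs 7% among non-wearers over the past month.
p_mask, p_no_mask = 0.05, 0.07

effect = abs(proportion_effectsize(p_mask, p_no_mask))
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8
)
print(f"~{n_per_group:,.0f} respondents per group at 80% power")
```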
I’d love to see some more information about the distribution (e.g. percentiles, change since previous years, breakdown by organization size/type or by role). Is it possible to provide that while maintaining anonymity?
This is a great post and I, like @rohinmshah, feel that simply the introduction of this general class of discussion is of value to the community.
With respect to expert surveys, I am somewhat surprised that there isn’t someone in the EA community already pursuing this avenue in earnest. I think that it’s firmly within the wheelhouse of the community’s larger knowledge-building project to conduct something like the IGM experts panel across a variety of fields. I think, first, that this sort of thing is direly needed in the world at large and could have considerable direct positive effects, but secondly that it could have a number of virtues for the EA community:
Improve efficiency of additional research: Expert consensus is a good starting place for a literature review; knowing it in advance will save a nontrivial fraction of time at the start and help researchers contextualize the papers they find over the course of the review.
Let EAs know where we stand relative to the expert consensus: when we explore topics like growth as a cause area, we need to be able to (1) have a quick reference to the expert consensus at vital pivots in a conversation (e.g. do structural adjustments work?) and (2) identify precisely where EA views depart from the consensus.
Provide a basis for argument to policymakers and philanthropists: Appeals to authority are powerful persuasive mechanisms outside the EA community. Being able to fall back on expert consensus in any range of issues can be a powerful obstacle or motivator, depending on the issue. Here’s an example: governments around the world continue to locally relitigate conversations about the degree to which electronic voting is safe, desirable, secure or feasible. Security researchers have a pretty solid consensus on these questions—that consensus should be available to these governments and those of us who seek to influence them.
Demonstrate to those outside the community that EAs are directly linked to the mainstream research community: This is a legitimacy issue. Regardless of whether the EA community ends up being broader or narrower, we are often insisting to some degree on a new way of doing things, so we need to be able to demonstrate to newcomers and outsiders that we are not simply starting from scratch.
Establish continued relationships with experts across a variety of fields: Repeated deployment of these expert surveys affords opportunities for contact with experts who can be integrated into projects, sought for advice, or deployed (in the best case scenario) as voices on behalf of sensible policies or interventions.
Identify funding opportunities for further research or for novel epistemic avenues like the adversarial collaborations mentioned in the initial post: Expert surveys will reveal areas where there is no consensus. Although consensus can be and sometimes is wrong, areas where there is considerable disagreement seem like obvious avenues for further exploration. Where issues have a direct bearing on human wellbeing, uncovering a relative lack of conclusive research seems like a cause area in and of itself.
Finally, the question-finding and -constructing process is itself an important activity that requires expert input: identifying the key questions to ask experts is important research in its own right, and it can produce constructive engagements with experts and others.
I agree that EAs should continue investigating and possibly advocating different voting methods, and I strongly agree that electoral reform writ large should be part of the “EA portfolio.” I don’t think EAs (qua EAs, as opposed to as individuals concerned as a matter of principle with having their electoral preferences correctly represented) should advocate for different voting methods in isolation, even though essentially all options are conceptually superior to FPTP/plurality voting.
This is because a democratic system is not the same as a utility-maximizing one. The various criteria used to evaluate voting systems in social choice theory are, generally speaking, formal representations of widely shared intuitions about how individuals’ preferences should be aggregated or, more loosely, about how democratic governments should function.

Obviously, the only preferences a voting system aggregates are those over the topic being voted on. But voters have preferences over lots of other areas as well, and the choice of voting system relates to only two of them: (a) their preferences over the choice in question and (b) their meta-preferences over how preferences are aggregated (e.g. how democratic their society is). As others in this thread have pointed out, individuals’ electoral preferences cannot convincingly be said to represent their preferences over all of the other areas their choice will influence.

So an individual gains utility from a voting system if and only if the utility gained by its superior representation of their preferences exceeds the utility lost in other areas by switching. I don’t think this is a high bar to clear, but I do think that, beyond the contrast between broadly democratic and non-democratic systems, we have next to no good information about the relationship between electoral systems and non-electoral outcomes.

In the simplest terms possible: we know that some voting systems are better than others at meeting our intuitive conception of democratic government. But we care about people’s welfare beyond just having their electoral preferences represented, and we don’t know what the relationship between these things is. It is entirely possible that voting systems that violate the Condorcet criterion dominate systems that meet it with respect to social welfare. We simply don’t know. It’s also not clear to what degree different voting systems induce a closer relationship between individuals’ electoral preferences and their preferences over non-electoral topics, e.g. by incentivizing or disincentivizing voter education.

To reiterate, I strongly support the increased interest in approval voting and RCV that we’re seeing, and I voted for it here in NYC. I want to see my own electoral preferences represented more accurately, and I don’t think there is a big risk that (at least here) my other preferences will suffer. But as consequentialists I think we are on very uncertain ground.
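To make the Condorcet point concrete, here’s a toy example with invented utilities: a candidate can win every pairwise majority contest while the other candidate produces more total welfare.

```python
import numpy as np

# Invented utilities. Rows are voters; columns are candidates A, B.
# Two voters mildly prefer A; one voter strongly prefers B.
utilities = np.array([
    [1.0, 0.9],   # voter 1
    [1.0, 0.9],   # voter 2
    [0.0, 10.0],  # voter 3
])

# Pairwise majority (Condorcet) winner: A beats B two votes to one.
print("Votes for A over B:", (utilities[:, 0] > utilities[:, 1]).sum())

# Utilitarian winner: B, by a wide margin in total utility.
print("Total utilities:", dict(zip("AB", utilities.sum(axis=0))))
```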
I’m doing a lit review on the effectiveness of lobbying and on some of the relevant theoretical background that I’m planning on posting when I’m done. I feel like this is potentially very relevant but I’m not sure if people will be interested.
Just want to follow up to acknowledge that I see that you’re already conducting a survey and that I’m proposing you add a set of questions about personal beliefs/stances/positions.
This is a really cool project! Just want to plug this as a really good opportunity to rigorously study how EA ideas spread: a quick 5-minute pre- and post-survey asking participants Likert-style questions about their positions on various EA-relevant topics and perhaps their style of argument/conversation would be potentially high-value here.
Since assignment will be randomized, there’s a real opportunity here to draw causal conclusions about how ideas spread, even if the external validity will be largely restricted to the EA population.
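If useful, the analysis could be as simple as this sketch (placeholder file and column names; the usual caveats about t-tests on Likert data apply):

```python
import pandas as pd
from scipy import stats

# Placeholder columns: treated (0/1), pre, post (1-5 Likert scores).
df = pd.read_csv("pre_post_survey.csv")

# With randomized assignment, the difference in mean change scores is an
# unbiased estimate of the average treatment effect on stated positions.
df["change"] = df["post"] - df["pre"]
treated = df.loc[df["treated"] == 1, "change"]
control = df.loc[df["treated"] == 0, "change"]

print("ATE estimate:", treated.mean() - control.mean())
print(stats.ttest_ind(treated, control, equal_var=False))
```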
Thanks for your response! I still have some confusion, though it’s somewhat tangential. In your CBA, you use an NPV figure of $3,572bn as the output gain from growth, apparently derived from India’s 1993 and 2002 growth episodes.
The CBA therefore computes the EV of the GDP increase as 0.5 × 0.1 × $3,572bn ≈ $178.6bn. You acknowledge elsewhere in your writeup that efforts to increase GDP entail some risk of harm (as does the randomista approach), so my confusion lies with the elision of these possible harms from the EV calculation.
Even if the probability that a think tank induces a growth episode—e.g. the probability that a think tank influences economic policy in country X according to its own recommendations—is 10%, then there is still obviously a probability distribution over the possible influence that successfully implemented think tank recommendations would have. This should include possible harms and their attendant likelihoods, right?
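To put my question in symbols (my notation, not yours): the current calculation is effectively

$$\mathbb{E}[\Delta] = 0.5 \times 0.1 \times v^{+},$$

where $v^{+} = \$3{,}572\text{bn}$ is a single positive scenario value. What I’d expect instead is something like

$$\mathbb{E}[\Delta] = 0.5 \times 0.1 \times \int v \, dF(v),$$

where $F$ is the distribution over output changes given an implemented recommendation, with some of its mass on $v < 0$ (the harms).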
I recognize that the $3,572bn figure comes directly from Pritchett as part of an assessment of the Indian experience, but it’s not obvious to me that this number encapsulates the range of possibilities for a successful (in the sense of being implemented) intervention. I may be missing something, but it seems to me that a (perhaps only slightly) more rigorous CBA would itself have to include an expected value of success that incorporates possible benefits and harms for both the growth and randomista approaches, along the lines of the row in your spreadsheet model reading “NPV (@ 5%) of output loss from growth deceleration relative to counter-factual growth.”
I understand that what you’re envisioning is a sort of high-confidence approach to growth advocacy: target only countries where improvements are mostly obvious, and then only with the most robustly accepted recommendations. I still think there is a risk of harm and that the CBA may not capture a meaningful qualitative difference between the growth and randomista approaches. In principle, at least, the use of localized, small-scale RCTs to test development programs before they are deployed avoids large-scale harm and (in my view) pushes the mass of the distribution of possible outcomes largely above 0. No such obstacle to large harms exists, or indeed is even possible, in the case of growth recommendations. Pro-growth recommendations by economists have not been uniformly productive in the past and (I think) are unlikely to be so in the future.
I still favor the approach you suggest but, given the state of the field of growth economics—and the failure of GDP/capita to capture many welfare-relevant variables that you cite at the end of the writeup—I’d be keen to see more highly quantified discussion of possible harms.
Thanks for writing this! I’m coming somewhat late to the party, but I wanted to add my support for what you’ve both written here. I back the concerted research effort you propose and think it likely to have the benefits you suggest.
I was digging through the Pritchett paper in hopes of doing my own analysis, and I do have a question: how did you calculate the median figure for Vietnam that you reference in section 4 ($6,914 GDP per capita)? I’ve been looking at the Pritchett paper and can’t quite figure it out. It seems close to the median absolute growth in $PPP presented in Pritchett’s Table 4, but I imagine that’s not right, since Table 4 lists only the top 20 growth episodes from the full set of about 300. When I look at those figures in Appendix A, though, it seems like the median growth episode calculated using PRM (without reference to dollar size) is somewhere around Ecuador’s negative growth in 1978, which doesn’t seem like it would line up even after the conversion to $PPP.
I see that you’ve written that Vietnam/89 is the median growth episode “to be affected by a think tank,” and a little research reveals that Vietnam began a concerted economic liberalization in 1986, so perhaps you have a secondary subset of growth episodes that you believe were affected by think tanks?
I can also sort of see a case for selecting the median from Table 4 of the top 20 but that seems strange since (a) the cutoff is arbitrary and (b) it doesn’t factor in the risk of harm from a think tank-influenced growth episode.
Thanks for your response. I think I should make clear (as I really didn’t do in my initial post) that I meant my comment more broadly: when EAs think about doing ballot initiatives, they should strongly consider doing public opinion polling. In a setting where an EA advocacy group is trying to select (a) which of X effective policies to advocate and (b) in which of Y locales to advocate it, it seems (to me, at least) that polling is cost-effective, since choosing among X × Y (potentially many) independent options is a nontrivial problem that requires a rigorous approach.
In your setting, however (making the binary choice of whether or not to advocate for policy P in location L), I understand why you chose the strategy you did. Your point about the relative cost-effectiveness of talking to local politicians versus conducting an (arguably) expensive poll is well taken. I don’t know much about how Swiss referenda work, but I take from your comment that voters largely follow the lead of their representatives.
I’m not sure how you’re thinking about future efforts along these lines, but if you’re planning on selecting from a longer list of policies and cantons, I think polling—done cheaply—could rival your legislative strategy in cost-effectiveness, at least as a guide for initial research investment.
Fantastic work! In your post introducing this initiative, you wrote that the base rate for passage of ballot initiatives was 11%. A conservative reading of the data here (taking the low value of $20m for development funding raised) seems to indicate a 100:1 return on investment conditional on passage. Factoring in the base rate, this implies roughly $10 in effective development aid for every $1 spent on advocacy, in expectation. If the development aid is effectively spent, the implication is that money spent on an initiative like this might be ten times as effective in expectation as money donated directly to a top-rated charity. This assumes, of course, that the base rate is accurate.
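Spelling out that arithmetic:

$$\mathbb{E}[\text{return per } \$1] = \Pr(\text{pass}) \times (\text{return} \mid \text{pass}) \approx 0.11 \times \$100 \approx \$11,$$

which I’ve rounded down to ~$10 to stay conservative.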
In that initial post, you had an exchange with Stefan Schubert about the relevance of your assumed base rate. You discussed the importance of polling at that point but it’s not clear to me where you left off.
This success really seems to highlight the importance of public opinion polling here. The value of information in this domain is very high, since you’re trying to identify the avenue that will provide the greatest leverage. Choosing the wrong avenue yields no value, and potentially even imposes minor reputational costs on your organization or on EA in general. Choosing the right avenue has huge upsides.
Public opinion polling seems crucial to this end. In this scenario, prior polling might have allowed you to identify a reasonable figure beforehand (avoiding the $87 million overreach). More importantly, though (if I understand the procedure correctly), it might have enabled you to avoid the counterproposal process and to pinpoint an optimal figure to ask for—perhaps one higher than the one you ultimately got.
I don’t want to diminish the achievement here, which I think is huge; I just want to point out that extremely useful information for this kind of effort can be obtained from the public at relatively low cost. In the future, that information can be used to reduce the uncertainty around efforts to fund ballot proposals and to increase their expected value by lowering the probability of failure.
I think that it’s unnecessary to go to such great (and risky) lengths to find out what the public believes with respect to issues relevant to EAs. A well-constructed survey conducted via Mechanical Turk, for example, would (in conjunction with a technique like multilevel regression and poststratification) yield very accurate estimates of public opinion at various arbitrary levels of geographic aggregation. I’d be supportive of this and would be interested in helping to design and/or fund such a survey.
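As a sketch of what I have in mind (placeholder file and column names throughout; a real MRP analysis would swap the plain logit below for a multilevel model that partially pools across sparse demographic-by-geography cells, e.g. via PyMC or brms):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder files: survey responses and census poststratification cells.
survey = pd.read_csv("survey.csv")        # supports_policy (0/1), age_group, state
census = pd.read_csv("census_cells.csv")  # age_group, state, n_people

# Stage 1: model opinion as a function of demographics and geography.
model = smf.logit("supports_policy ~ C(age_group) + C(state)", data=survey).fit()

# Stage 2: predict for every census cell, then weight each cell's
# prediction by its true share of the population.
census["p_hat"] = model.predict(census)
state_estimates = census.groupby("state").apply(
    lambda g: (g["p_hat"] * g["n_people"]).sum() / g["n_people"].sum()
)
print(state_estimates)
```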