Jonas loves his wife, being in nature, and exploring interesting worlds both fictional and real. He uses his bamboo bike daily to get around in Munich. He’s currently a freelance software engineer, and was working at the Against Malaria Foundation and Google before that. Jonas enjoys playing Ultimate and dancing.
Sjlver
Personally, I’m not using the forum as much as I could and as much as I used to, because it is a time-sink. I’m the kind of person who can easily get lost on the Internet; clicking a link here and opening another tab there, and… look where those two hours went. Because of this, I’m wary of spending too much time here.
I don’t know whether my declining forum use is due to changes in my behavior or changes to the forum. Probably it’s a combination. On the forum side, the home page feels a bit more cluttered than it used to be. The forum feels slightly more gamified (e.g., emoji reactions).
I don’t have concrete suggestions, other than thinking about what an ideal amount of time for users to spend on the forum would be: one that takes both the forum’s quality and its users’ productivity into account.
OP here :) Thanks for the interesting discussion that the two of you have had!
Lukas_Gloor, I think we agree on most points. Your example of estimating a low probability of medical emergency is great! And I reckon that you are communicating appropriately about it. You’re probably telling your doctor something like “we came because we couldn’t rule out complication X” and not “we came because X has a probability of 2%” ;-)
You also seem to be well aware of the uncertainty. Your situation does not feel like one where you went to the ER 50 times, were sent home 49 times, and have from this developed a good calibration. It looks more like a situation where you know about danger signs which could be caused by emergencies, and have some rules like “if we see A and B and not C, we need to go to the ER”.[1]
Your situation and my post both involve low probabilities in high-stakes situations. That said, the goal of my post is to remind people that this type of probability is often uncertain, and that they should communicate this with the appropriate humility.
[1] That’s how I would think about it, at least… it might well be that you’re more rational than I, and use probabilities more explicitly.
Richard Chappell writes something similar here, better than I could. Thanks Lizka for linking to that post!
Pascalian probabilities are instead (I propose) ones that lack robust epistemic support. They’re more or less made up, and could easily be “off” by many, many orders of magnitude. Per Holden Karnofsky’s argument in ‘Why we can’t take explicit expected value estimates literally’, Bayesian adjustments would plausibly mandate massively discounting these non-robust initial estimates (roughly in proportion to their claims to massive impact), leading to low adjusted expected value after all.
Maybe I should have titled this post differently, for example “Beware of non-robust probability estimates multiplied by large numbers”.
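To illustrate the discounting described in the quote, here is a minimal sketch. It assumes, for illustration only, a normally distributed prior and normally distributed estimation error; the numbers and the `adjusted_estimate` name are made up here and are not taken from the linked essay.

```python
def adjusted_estimate(prior_mean, prior_sd, estimate, estimate_sd):
    """Posterior mean for a normal prior combined with a noisy normal estimate.

    Each source is weighted by its precision (1 / variance), so a
    non-robust estimate (large estimate_sd) barely moves the prior.
    """
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    weighted_sum = prior_mean * prior_precision + estimate * estimate_precision
    return weighted_sum / (prior_precision + estimate_precision)


# Hypothetical numbers: the prior says "impact is around 0, give or take 1".
# The explicit estimates claim ever larger impact, with uncertainty that
# grows in proportion to the claim.
for claim in (10, 100, 1000):
    adjusted = adjusted_estimate(prior_mean=0, prior_sd=1,
                                 estimate=claim, estimate_sd=claim)
    print(f"claimed {claim:>5}: adjusted {adjusted:.4f}")
```

Under these assumptions, the adjusted value shrinks as the claim grows, because the claim's uncertainty grows with it. With a different prior or error model, the numbers would of course differ.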
I agree that our different reactions come partly from having different intuitions about the boundaries of a thought experiment. Which factors should one include vs exclude when evaluating answers?
For me, I assumed that the question can’t be just about expected values. This seemed too trivial. For simple questions like that, it would be clearer to ask the question directly (e.g., “Are you in favor of high-risk interventions with large expected rewards?”) than to use a thought experiment. So I concluded that the thought experiment probably goes a bit further.
If it goes further, there are many factors that might come into play:
How certain are we of the numbers?
Are there any negative effects if the intervention fails? These could be direct negative outcomes, but also indirect ones like difficulty raising funds in the future, reputation loss...
Are we allocating a small part of a budget, or our total money? Is this a repeated decision or a one-off?
I had no good answers, and no good guesses about the question’s intent. Maybe this is clearer for you, given that you mention “the way EA culture has handled thought experiments thus far” in a comment below. I, for one, decided to skip the question :/
This is a great point.
Clearly you are right. That said, the examples that you give are the kind of frequentist probabilities for which one can actually measure rates. This is quite different from the probability given in the survey, which presumably comes from an imperfect Bayesian model with imprecise inputs.
I also don’t want to belabor the point… but I’m pretty sure my probability of being struck by lightning today is far from 0.001%. Given where I live and today’s weather, it could be a few orders of magnitude lower. If I use your unadjusted probability (10 micromorts) and am willing to spend $25 to avert a micromort, I would conclude that I should invest $250 in lightning protection today… that seems like the kind of wrong conclusion that my post warns about.
I think humility is useful in cases like the present survey question, when a specific low probability, derived from an imperfect model, can change the entire conclusion. There are many computations where the outcome is fairly robust to small absolute estimation errors (e.g., intervention (1) in the question). On the other hand, for computations that depend on a low probability with high sensitivity, we should be extra careful about that probability.
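To make the sensitivity concrete, here is a minimal sketch of the lightning arithmetic. The 0.001% probability and the $25 per micromort come from the example above; the "1,000 times lower" scenario is an assumption chosen only for illustration, as are the function names.

```python
MICROMORTS_PER_UNIT = 1_000_000  # 1 micromort = a one-in-a-million chance of death


def micromorts(probability_percent):
    """Convert a probability given in percent into micromorts."""
    return probability_percent / 100 * MICROMORTS_PER_UNIT


def justified_spend(probability_percent, dollars_per_micromort=25):
    """Dollars one would spend to fully avert the given risk."""
    return micromorts(probability_percent) * dollars_per_micromort


unadjusted = justified_spend(0.001)        # probability taken at face value
adjusted = justified_spend(0.001 / 1000)   # assumed: three orders of magnitude lower

print(f"unadjusted: ${unadjusted:.2f}")
print(f"adjusted:   ${adjusted:.2f}")
```

The conclusion moves from about $250 to about $0.25, purely because of the uncertainty in one small input.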
Sorry for having been imprecise in my post—I wrote the question from memory after having already submitted the survey. I’ll change it to “avert”.
Probabilities might be off by one percentage point
There is some public information about this here: https://www.givewell.org/charities/amf#Registration
Details vary by country. It’s often a process where enumerators go door-to-door and interview the head of household to determine how many people live in a household. There can be some incentives to over-report the number of people, to receive more bednets. However, there is a limit on the number of nets per household (usually 3 or 4), and some of the data is independently verified by a second team of enumerators.
For what it’s worth, AMF has population data from distributing bednets to every household. As an organization that cares about being highly effective, AMF tries hard to get the number of nets right. The target is to have approximately one net per 1.8 people (a net covers two people usually, but then there are households with an odd number of people or with pregnant women).
AMF distributed nets in five Nigerian states in the last two years. You can see these distributions here: https://www.againstmalaria.com/Distributions.aspx?MapID=68
AMF reports the population for each state; to see them, click on the state name, then on “Pre-Distribution”. These numbers are:
I’ve compared the numbers with those from UNFPA, from here: https://data.humdata.org/dataset/cod-ps-nga
It looks like AMF’s numbers are quite a bit higher, except in Bauchi state. This makes me slightly less willing to believe that Nigeria’s population numbers are inflated. But of course, AMF could have been a victim of bad initial population estimates, or could have had left-over nets that were then given to routine distribution or used in other locations. I don’t have any information about that.
I’ve appreciated this response.
The biggest discrepancy seems to be around the number of nurses:
Lee writes that 1,709 nurses emigrated from Nigeria to the UK in a year, and that the UK takes ~85% of the total.
Nick cites a Guardian article claiming that 15,000 nurses emigrate per year, and says that less than 25% go to the UK.
Any insight on these large differences?
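To put the two sets of figures side by side, here is a quick back-of-the-envelope sketch. It only uses the numbers quoted above; the function names are mine, and it assumes the two sources are counting the same thing over the same period, which may not be the case.

```python
def implied_total(count_to_uk, uk_share):
    """Total emigrating nurses implied by the UK count and the UK's share."""
    return count_to_uk / uk_share


def max_to_uk(total, uk_share_upper_bound):
    """Upper bound on nurses going to the UK, given a total and a maximum share."""
    return total * uk_share_upper_bound


lee_total = implied_total(1709, 0.85)     # Lee: 1,709 to the UK, ~85% of the total
nick_uk_max = max_to_uk(15_000, 0.25)     # Nick: 15,000 total, under 25% to the UK

print(f"Total implied by Lee's figures:  {lee_total:,.0f}")
print(f"Total cited by Nick:             {15_000:,}")
print(f"Ratio of the two totals:         {15_000 / lee_total:.1f}x")
print(f"UK upper bound on Nick's figures: {nick_uk_max:,.0f}")
```

Lee's numbers imply a total of about 2,011 nurses per year, so the two totals differ by a factor of roughly seven.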
There is now also a translation into Toki Pona.
TLDR: Full-stack software engineer (previously at Google and AMF) looking for part-time opportunities.
Skills & background: Expertise in software engineering for backend and frontend development, using a wide range of tech stacks. At AMF, I also worked on many data science tasks: automatic importing and cleaning of data, analyzing geospatial data, database design and optimizations. I have a security mindset and have done PhD research on software testing and hardening. I enjoy working with team members and partner organizations, and have excellent communication skills in English, French, and German.
Location/remote: Munich, Germany. Open for (and experienced in) remote work.
Availability & type of work: Ideally 20h/week. I can offer a lot of flexibility.
Resume/CV/LinkedIn: https://blog.purpureus.net/cv/
Email/contact: Jonas Wagner ltlygwayh@gmail.com
Other notes: I’m particularly interested in work that has a clear and simple theory of change. While I am most experienced in global health and development, I am open to any cause area. I value meaningful work over high pay.
For European people on a budget, here’s a multivitamin at €0.07 per day: https://www.amazon.de/-/en/Multivitamins-Minerals-Multivitamin-Essential-Vitamins/dp/B08BX439HX They don’t deliver to the US, though. And you might want to add some omega-3 fatty acids (DHA/EPA) for more complete supplementation.
What you write is almost right, but not 100%… we are getting at the heart of the problem here. Thanks for making me re-think this and state it more clearly!
Edited to add: I’ve now also read the discussion that you’ve linked to in your comment. It is now clear to me that the team has thought through issues like this… so I wouldn’t be angry if you prefer to use your time more wisely than for responding to my ramblings :)
Assume as an example that, without my vote, there is the following situation:
candidate A received 933 points from other voters
candidate B: 977 points
candidate C: 1000 points
candidate D: 1001 points
In this case:
If I put most of my votes to A, it gets in the top three along with C and D
If I put most of my votes to B, it gets in the top three along with C and D
If I split my 100 points just right, A and B both get in the top three
I understand that this is a constructed example with low probability of happening. It is meant to illustrate the case where, as a voter, I would like to support two candidates, but my support for one will hurt the other, and vice versa.
As a voter, I’d be particularly vexed if I had allocated 60 votes to A and 40 to B. In that case, I would have caused B to eliminate A, despite having more strongly supported A. This could not happen in approval voting, non-weighted instant run-off voting, or any Condorcet voting method.
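The constructed example can be checked with a short tally. This is a simplified sketch that just adds my points to the existing totals and takes the three highest; it is not the election's actual counting procedure. The `top_three` name and the alphabetical tie-break are assumptions made for illustration.

```python
# Points received from other voters, as in the example above.
BASE_POINTS = {"A": 933, "B": 977, "C": 1000, "D": 1001}


def top_three(my_points):
    """Return the three candidates with the highest totals after adding my points."""
    totals = {name: points + my_points.get(name, 0)
              for name, points in BASE_POINTS.items()}
    # Highest total first; ties broken alphabetically (an assumption).
    ranked = sorted(totals, key=lambda name: (-totals[name], name))
    return set(ranked[:3])


allocations = {
    "all 100 to A": {"A": 100},
    "all 100 to B": {"B": 100},
    "70 to A, 30 to B": {"A": 70, "B": 30},
    "60 to A, 40 to B": {"A": 60, "B": 40},
}

for label, allocation in allocations.items():
    print(f"{label:>18}: {sorted(top_three(allocation))}")
```

With a 70/30 split, A and B both make it and C drops out. With 60/40, A ends at 993 points and misses the top three, even though I gave A more points than B.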
As I wrote earlier, no voting system is perfect. For each system, one can construct silly counter-examples for which the system behaves counter-intuitively. For the subproblems “top-3 election” and “funding allocation”, there are known solutions, for which the counter-intuitive situations are somewhat well understood. In your case, you have combined the two sub-problems into one harder combined problem. This makes it more difficult to reason about corner-cases, and creates a few more undesired incentives for strategic voting.
I don’t think this is a critical flaw, so there is no urgent need to change things. If you did choose to change the approach, you might end up with two separate voting steps that are simpler and require fewer explanations than the current system.
Thanks for setting up this donation election!
Choosing voting methods is difficult, and no voting method is without flaw. Nevertheless, I am somewhat unhappy with the method proposed here, because it is very difficult for users to support multiple candidates. The problem arises because the method tries to do two things: (1) determine which candidates are in the top three, and (2) determine their relative popularity.
The problem: as a voter who likes two candidates A and B, I cannot support A without harming B, and vice versa. My rational behavior is to allocate all points to either A or B, to maximize the chance that one of them ends up in the top three. If I split my points between two candidates, I face the risk that neither makes it into the top three.
Other voting methods behave better with respect to this problem. For example, if we used approval voting to determine the top three, I could vote for both A and B without one vote harming the other. Similarly, in classical instant runoff voting without weights, I can put A and B at the top of my list, without having to worry about negative consequences for either of them.
I think that this problem is best solved with a two-step voting process. In a first step, determine the top three candidates. In a second step, determine relative allocation of money. The second step would probably use different information than the first. This could be done with the current weights, if the first step considered only the order of candidates on the ballots.
This is very well written. Thanks! It’s the kind of article that sparks (my) curiosity.
I looked for some information on Helvetas’ website. Helvetas is a Swiss charity that has been running safe water interventions for about 50 years; they are funded by private donors, but also receive development aid money from the Swiss government.
Helvetas provides some ideas why water interventions might help, besides diarrhea:
Disproportionately helps women and girls: Women and girls in poor communities often spend several hours a day fetching water ⇒ big opportunity cost, probably unhealthy for their heads and backs.
Unsafe water is used in critical situations, such as during child delivery. (edited to add: There are hints that this might be significant. For example, WHO and many organizations work to promote breastfeeding, and this is shown to reduce child mortality. Presumably many of the averted deaths would otherwise have been due to unsafe water.)
There are also positive effects of water management in general. These don’t apply to chlorination or to filters at existing wells… but I found it helpful to consider more holistic approaches to water:
Improving water resource management is also key to equitable development, climate change adaptation, disaster risk reduction, sustainable agriculture and the prevention of conflicts.
Unfortunately, Helvetas’ websites and reports are somewhat light on research. They provide numbers for the number of people reached by their programs, but to my knowledge there isn’t any cost-effectiveness analysis. +1 that we need more research!
Thanks! I completely understand… putting these systems in place can be time-consuming, and the regulations differ for each country.
I hope you’ll find great US/Canada candidates!
PS, but only tangentially related: I’ve recently documented the situation of someone working in Germany for an international organization, at https://blog.purpureus.net/posts/how-to-work-in-germany-for-a-foreign-organization/
This sounds interesting, thanks for posting!
I noted that the application is open to candidates in the US or Canada. Is that a strict requirement, or could you make exceptions?
Here are some reasons why I think that units of ~100 households are ideal. The post itself has more examples.
- It’s best for detailed planning. There is a type of humanitarian/development work that tries to reach every household in a region. Think vitamin A supplementation, vaccination programs, bednet distributions, cash transfers, … For these, one typically needs logistics per settlement, such as a contact person/agent/community health worker, some means of transportation, a specific amount of bednets/simcards/..., etc.
Of course, the higher levels of the location hierarchy (health areas, counties, districts, …) are also needed. But these are often not sufficient for planning. Also note that some programs use other units of planning altogether (e.g., schools or health centers), but the settlement is common.
- It’s great for monitoring. The interventions mentioned above typically want to reach 100% settlement coverage. It makes sense to monitor things at that level, i.e., ensure that each settlement is reached.
- It’s great for research. Many organizations use household sampling surveys. These are typically clustered, which means that researchers select a given number of “enumeration units”, and then sample a fixed number of households in each unit. Ideally, these enumeration units have roughly even size, clear and well-understood boundaries, and known population counts. The type of locations that I’m aiming for would make good enumeration units.
- This type of place name is used and known. For example, people in the region will know where “Kalamu” is. There will likely be a natural contact person, such as a village chief. There will be a road that leads there and a way to obtain transportation. One can ask questions like “is there cellphone coverage in Kalamu” and get a good answer. In the majority of cases, a place name is a well-understood, unambiguous and meaningful concept.
- The final reason is about data availability: settlement names are usually the most detailed names available, and their names are reasonably stable and accepted. The data exists, we only need to collect and aggregate and publish it. In contrast, streets or buildings often don’t have names, so we can’t easily have more fine-grained data than place names. Plus, there are some solutions like Plus Codes for situations where address-like data are preferred.
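For readers unfamiliar with clustered household surveys, the two-stage procedure mentioned in the research point above can be sketched as follows. This is a minimal illustration, assuming simple random selection at both stages; real surveys often select units with probability proportional to size. The settlement data and the `cluster_sample` name are hypothetical.

```python
import random


def cluster_sample(units, n_units, households_per_unit, seed=0):
    """Two-stage cluster sample.

    Stage 1: pick n_units enumeration units at random.
    Stage 2: pick a fixed number of households within each chosen unit.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    chosen_units = rng.sample(sorted(units), n_units)
    sample = {}
    for unit in chosen_units:
        households = units[unit]
        # Small units may have fewer households than requested.
        k = min(households_per_unit, len(households))
        sample[unit] = rng.sample(households, k)
    return sample


# Hypothetical settlements of roughly 100 households each.
settlements = {
    name: [f"{name}-household-{i}" for i in range(size)]
    for name, size in {"Kalamu": 100, "Village B": 95,
                       "Village C": 110, "Village D": 102}.items()
}

survey = cluster_sample(settlements, n_units=2, households_per_unit=10)
for unit, households in survey.items():
    print(unit, len(households))
```

The sketch shows why units of roughly even size with known household lists are convenient: each selected unit contributes the same number of interviews, and the selection needs nothing more than a list of units.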
EA charities can also combine education and global health, like https://healthlearn.org/blog/updated-impact-model
HealthLearn builds a mobile app for health workers (nurses, midwives, doctors, community health workers) in Nigeria and Uganda. Health workers use it to learn clinical best practices. This leads to better outcomes for patients.
I’m personally very excited by this. Health workers in developing countries often have few training resources available. There are several clinical practices that can improve patient outcomes while being easy to implement (such as initiating breastfeeding immediately after birth). These are not as widely used as we would like.
HealthLearn uses technology as a way to faithfully scale the intervention to thousands of health workers. At this point, AI does not play a significant role in the learning process yet. Courses are manually designed. This was important to get started quickly, but also to get approval from government health agencies and professional organizations such as nursing councils.
The impact model that I’ve linked to above estimates that the approach has been cost-effective so far, and could become better with scale.
(disclaimer: I’m one of the software engineers building the app)