I believe the paper you’re referring to is “Water Treatment and Child Mortality: A Meta-Analysis and Cost-Effectiveness Analysis” by Kremer, Luby, Maertens, Tan, & Więcek (2023).
The abstract of this version of the paper (which I found online) says:
We estimated a mean cross-study reduction in the odds of all-cause under-5 mortality of about 30% (Peto odds ratio, OR, 0.72; 95% CI 0.55 to 0.92; Bayes OR 0.70; 95% CrI 0.49 to 0.93). The results were qualitatively similar under alternative modeling and data inclusion choices. Taking into account heterogeneity across studies, the expected reduction in a new implementation is 25%.
That’s a point estimate of a 25-30% reduction in mortality (across 3 methods of estimating that number), with a confidence/credible interval that has a lower bound of a 7-8% reduction in mortality. So, it’s a fairly noisy estimate, due to some combination of the noisiness of individual studies and the heterogeneity across different studies.
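For concreteness, here’s that conversion as a minimal Python sketch. It uses the rare-outcome approximation (odds ratio ≈ relative risk, so the implied reduction is roughly 1 − OR), which is only roughly right here since under-5 mortality isn’t negligible in these settings:

```python
# Minimal sketch: converting the paper's odds ratios (OR) into
# approximate percent reductions in mortality, assuming the
# rare-outcome approximation OR ~= relative risk, so the implied
# reduction is roughly 1 - OR. Inputs are from the quoted abstract.

def or_to_reduction_pct(odds_ratio: float) -> float:
    """Approximate percent reduction in mortality implied by an
    odds ratio, assuming a rare outcome (OR ~= relative risk)."""
    return (1 - odds_ratio) * 100

estimates = {
    "Peto OR point estimate": 0.72,    # ~28% reduction
    "Peto OR upper CI bound": 0.92,    # ~8% reduction (the interval's low end)
    "Bayes OR point estimate": 0.70,   # ~30% reduction
    "Bayes OR upper CrI bound": 0.93,  # ~7% reduction
}
for label, or_value in estimates.items():
    print(f"{label}: ~{or_to_reduction_pct(or_value):.0f}% reduction")
```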
That interval for the reduction in mortality just barely overlaps with your number that “Sub-saharan Africa diarrhoea causes 5-10% of child mortality.” (The overlap might be larger if that rate were higher than 5-10% in the years & locations where the studies were conducted.)
So if the mortality reduction is near the bottom of the range that Kremer & colleagues estimate, it could be that the clean water interventions prevent most children’s deaths from diarrhoea and few other deaths. Or, if the true mortality reduction is something like 15%, they might prevent a decent chunk of other deaths too, but not nearly as many as your Part 4 chart & list suggest.
There is also a general possibility of a meta-analysis giving inflated results, due to factors like publication bias affecting which studies get included, or other methodological issues in the original studies; that could mean the true effect is smaller than the lower bound of their interval. I don’t know how likely that is in this case.
Here’s a more detailed look at their meta-analysis results:
These numbers seem to be all over the place. On nearly every question, the odds given by the 7 forecasters span at least 2 orders of magnitude, and often substantially more. And the majority of forecasters (4/7) gave multiple answers that seem implausible (details below), in ways that suggest their numbers aren’t coming from a coherent picture of the situation.
I have collected the numbers in a spreadsheet and highlighted (in red) the ones that seem implausible to me.
Odds span at least 2 orders of magnitude:
Another commenter noted that the answers to “What is the probability that Russia will use a nuclear weapon in Ukraine in the next MONTH?” range from .001 to .27. In odds, that is from 1:999 to 1:2.7, an odds ratio of 369. And this was one of the more tightly clustered questions; the odds ratios between the largest and smallest answers on the other questions were 144, 42857, 66666, 332168, 65901, 1010101, and (with n=6) 12.
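For anyone who wants to reproduce these numbers, here’s the odds arithmetic as a small Python sketch, using the forecasts quoted above:

```python
# Sketch of the odds arithmetic used in these comparisons: convert
# each probability to odds p/(1-p), then take the ratio of the two
# odds to measure the spread between two forecasts.

def odds(p: float) -> float:
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def odds_ratio(p_a: float, p_b: float) -> float:
    """Ratio of the odds implied by two probabilities."""
    return odds(p_a) / odds(p_b)

print(odds(0.001))               # ~0.001, i.e. 1:999
print(odds(0.27))                # ~0.37,  i.e. 1:2.7
print(odds_ratio(0.27, 0.001))   # ~369
print(odds_ratio(0.0151, 1e-5))  # ~1533, the outlier check discussed below
```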
Other than the final (tactical nuke) question, these cover enough orders of magnitude for my reaction to be “something is going on here; let’s take a closer look” rather than “there are some different perspectives which we can combine by aggregating” or “looks like this is roughly the range of well-informed opinion.”
Individual extreme outlier answers:
Two forecasters gave an estimate on one of the component questions that was more than 2 orders of magnitude away from the next closest estimate (odds ratio over 100).
On the question “Conditional on Russia using a nuclear weapon in Ukraine, what is the probability that nuclear conflict will scale beyond Ukraine in the next YEAR after the initial nuclear weapon use?”, one forecaster gave the answer 10^-5. The next smallest answer was 0.0151, an odds ratio of 1533. On the MONTH version of this question, the ratio was 130. So the 10^-5 answer differs wildly from each of the other answers, and also (IMO) seems implausibly low.
On the question “Conditional on the nuclear conflict expanding to NATO, what is the chance that London would get hit, one MONTH after the first non-Ukraine nuclear bomb is used?”, the largest answer was .9985 and the 2nd largest was 0.5, an odds ratio of 666. The ratio was the same for the YEAR version of this question. This multiple-orders-of-magnitude outlier from all the other forecasts also seems implausibly high to me.
Implausible month-to-year ratios:
We can compare the answers to “Conditional on Russia using a nuclear weapon in Ukraine, what is the probability that nuclear conflict will scale beyond Ukraine in the next MONTH after the initial nuclear weapon use?” with the YEAR version of this question to see how likely each forecaster thought the escalation would happen within a month, conditional on it happening within a year. Since the month is a subset of the year, p(escalation within a MONTH | escalation within a YEAR) is just each forecaster’s MONTH probability divided by their YEAR probability. From smallest to largest, these implied probabilities are .067, .086, .5, .6, .75, .75, 1. Probabilities below 10% seem implausible here, both considering the question (nuclear escalation will very likely take more than a month if it happens?) and considering the other estimates, but 2 forecasters are in that range. (A probability of 1 would be implausibly high if forecasters were estimating it directly, but given that this is calculated from 2 probabilities and many answers only had 1 sigfig, I guess it’s not a major issue.)
Similarly, the implied estimates for p(London hit within a MONTH of a non-Ukraine nuke | London hit within a YEAR of a non-Ukraine nuke) are, from smallest to largest, .17, .2, .5, .89, 1, 1, 1. Again, low probabilities (.2 or smaller) seem implausible.
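Here’s that calculation as a sketch; the inputs below are illustrative placeholders (chosen to reproduce the smallest value in the list above), not any particular forecaster’s actual answers:

```python
# Sketch of the month-to-year check: "within a MONTH" implies
# "within a YEAR", so p(month | year) = p(month) / p(year).
# Inputs are illustrative placeholders, not real forecaster answers.

def month_given_year(p_month: float, p_year: float) -> float:
    """Implied p(event within a month | event within a year)."""
    assert p_month <= p_year, "the month is a subset of the year"
    return p_month / p_year

# E.g., 1% within a month and 15% within a year implies only a ~6.7%
# chance that escalation, if it happens this year, happens in the
# first month -- the kind of value flagged as implausible above.
print(month_given_year(0.01, 0.15))  # ~0.067
```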
Conjunction vs. direct elicitation:
One sanity check in the original post is comparing the implied probability of a London nuke (based on p(London within a month | escalation), p(escalation within a month | Ukraine nuke), and p(Ukraine nuke within a month)) with the directly elicited p(London nuke in October). The implied probability covers a longer time period (since the month-long window resets with each event), but the directly elicited probability covers all paths to London being nuked (not just the path via escalation from Russia nuking Ukraine). So it’s not obvious which should be larger, but I think they should be close (and Nuño thought the conjunction should be larger).
Looking at each forecaster, the ratio of p(London nuke in October) to the conjunction, from smallest to largest, is .57, .62, 1.04, 8, 20, 25, 48. Five of seven forecasters gave estimates which imply that the direct estimate (shorter timeframe, more pathways) is larger. Four of them gave estimates which imply a ratio of 8 or higher, which seems implausible.
And all four of those forecasters gave at least one of the other implausible forecasts mentioned above (an outlier individual estimate and/or an implausible month:year ratio). The three forecasters who have plausible ratios here (.57, .62, 1.04) did not give any of the implausible answers according to my other two sanity checks.
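Here’s that conjunction check as a sketch, again with illustrative placeholder numbers rather than any forecaster’s actual answers:

```python
# Sketch of the conjunction check: chain the three monthly
# (conditional) probabilities, then compare against the directly
# elicited p(London nuke in October). All inputs below are
# illustrative placeholders, not real forecaster answers.

def conjunction(p_ukraine_nuke: float,
                p_escalation_given_nuke: float,
                p_london_given_escalation: float) -> float:
    """Implied p(London hit) via the Ukraine-nuke -> escalation ->
    London pathway, with a one-month window at each step."""
    return p_ukraine_nuke * p_escalation_given_nuke * p_london_given_escalation

implied = conjunction(0.05, 0.10, 0.20)  # = 0.001
direct = 0.008                           # directly elicited p(London in October)
print(direct / implied)                  # = 8.0, a ratio flagged as implausible above
```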
Bottom line:
3 of the 7 forecasters passed all three of these sanity checks. The other 4 each failed at least 2 of them.
Aggregation which treats all this as noise and tries to find the central tendency helps keep the final estimate in a plausible range (and generally within the range of the 3 forecasters who passed the sanity checks), but it still seems possible to do significantly better.
IMO the epistemic status here is not seven good generalist forecasters who have thought carefully enough about these questions to give well-considered estimates, aggregated with some math that helps combine their different perspectives. Instead, the math is mainly just helping to filter out the not-carefully-considered answers.