FWIW, it's unclear to me how persuasive COVID-19 is as a motivating case for epistemic modesty. I can also recall plenty of egregious misses from public health/epi land, and I expect re-reading the podcast transcript would remind me of some of my own.
On the other hand, the bar would be fairly high: I am pretty sure both EA land and rationalist land had edge over the general population re. COVID. Yet the main battle would be over whether they had "edge generally" over "consensus/august authorities".
Adjudicating this seems murky, with many "pick and choose" factors ("reasonable justification for me, desperate revisionist cope for thee", etc.) if you have a favoured team you want to win. To skim a few:
There are large "blobs" on both sides, so you can pick favourable/unfavourable outliers for any given "point" you want to score. E.g. I recall in the early days some rationalists having homebrew "grocery sterilization procedures", and EA buildings applying copper tape, so PH land wasn't alone in getting transmission wrong at first. My guess is in aggregate EA/rationalist land corrected before the preponderance of public health (so 1-0 to them), but you might have to check the (internet) tape on what stuff got pushed and abandoned when.
Ditto picking and choosing what points to score, how finely to individuate them, or how to aggregate (does the copper tape fad count as a "point to PH" because they generally didn't get on board, and does it cancel out all the stuff on handwashing – so 1-1? Or does it count for much less, as considerably less consequential – 2-1? And surely they were much less wrong about droplets etc., so >3-1?). "Dumbest COVID policies canvassed across the G7 vs. most perceptive Metaculus comments" is a blowout, but so too is "Worst 'in house' flailing vs. Singapore".
Plenty of epistemic dark matter to excuse misses. Maybe the copper tape thing was just an astute application of precaution under severe uncertainty (or even a mechanistically superior/innovative "hedge" across transmission routes vs. PH-land's "Wash your hands!!"). Or maybe "non-zero COVID" was the right call ex ante, and the success stories which locked down until vaccine deployment were more "lucky" (re. vaccines arriving soon enough, and the early variants of COVID being NPI-suppressible enough before then) than "good" (fairly pricing the likelihood of this "out" and correctly assessing it was worth "playing to").
It is surprisingly hard for me to remember exactly what I "had in mind" for some of my remarks at the time, despite the advantage of being able to read exactly what I said, and silent variations (e.g. the subjective credence behind a "maybe" or a mention, or the rationale) would affect the "scoring" a lot. I expect I would fare much worse in figuring this out for others, still less for the "prevailing view of [group] at [date]".
Plenty of Bulveristic stories to dismiss hits as "you didn't deserve to be right": "EAs and rationalists generally panicked and got tilted off the face of the planet during COVID, so always advocated for maximal risk aversion – they were sometimes right, but seldom truth-tracking" / "Authorities tended to be staid, suppressive of any inconvenient truths, and ineffectual, and got lucky when no action was both the best option and the default outcome."
Sometimes what was "proven right" ex post remains controversial.
For better or worse, I still agree with my piece on epistemic modesty, although perhaps I find myself in an increasingly small minority amongst my peers.
To clarify, are you saying that, in retrospect, the process through which people in EA did research on epidemiology, public health, and related topics looks any better to you now than it looked to you back in April 2020, when you did this interview?
I think I understand your point that it would probably be nearly impossible to score the conclusions in a way that people in EA would agree is convincing or fair – there's tons of ambiguity and uncertainty, hence tons of wiggle room. (I hope I'm understanding that right.)
But in the April 2020 interview, you said that many of these conclusions were akin to calling a coin flip. Crudely, many interventions that experts were still debating could be seen as having roughly a 50-50 chance of being good or bad (or maybe anywhere from 70-30 to 30-70 – it doesn't really matter), so any conclusion that an intervention is good or bad has a roughly 50-50 chance of being right. You said a stopped clock is right twice a day, and it may turn out that Donald Trump got some things right about the pandemic, but if so, it will have been through dumb luck rather than good science.
So, I'm curious: leaving aside the complicated and messy question of scoring the conclusions, do you now think the EA community's approach to the science – particularly, the extent to which they wanted to do it themselves, as non-experts, rather than just trying to find the expert consensus on any given topic, or even seeing if any expert would talk to them about it (e.g. in 2020, you suggested some names of experts to have on the 80,000 Hours Podcast) – was any less bad than you saw it in 2020?
I'd say my views now are roughly the same as they were then. Perhaps a bit milder, although I am not sure how much of this is "The podcast was recorded at a time I was especially (perhaps unduly) annoyed at particular EA antics, which coloured my remarks despite my best efforts (such as they were, and alas remain) at moderation" (the compliments in the preamble to my rant were sincere; I saw myself as hectoring a minority), vs. "Time and lapses of memory have been a salve for my apoplexy – but if I could manage a full recounting, I would reprise my erstwhile rage".
But at least re. epistemic modesty vs. "EA/rationalist exceptionalism", what is ultimately decisive is overall performance: ~"Actually, we don't need to be all that modest, because when we strike out from 'expert consensus' or hallowed authorities, we tend to be proven right". Litigating this is harder still than re. COVID specifically (even if "EA land" spanked "credentialed expertise land" re. COVID, its batting average across fields could still be worse, or vice versa).
Yet if I were arguing against my own position, what happened during COVID facially looks like fertile ground to make my case. Perhaps it would collapse on fuller examination, but it certainly doesn't seem like compelling evidence in favour of my preferred approach on its face.
Thanks, that's very helpful.
I'm curious why you say that about the accuracy/performance of the conclusions of the EA community with regard to COVID. Are you saying it's just overly complicated and messy to evaluate these conclusions now, even to your own satisfaction? Or do you personally have a sense of how good or bad the conclusions were overall, and just don't think you could convince people in EA of your sense of things?
The comparison that comes to mind for me is how amateur investors (including those who don't know the first thing about investing, how companies are valued, GAAP accounting, and so on) always seem to think they're doing a great job. Part of this is that they typically don't even benchmark their performance against market indexes like the S&P 500. Or, if they do, they do it in a really biased, non-rigorous way: e.g. "oh, my portfolio of 3 stocks went up a lot recently, let me compare it to the S&P 500 year-to-date now". So they're not even measuring their performance properly in the first place, yet they seem to believe this is a great idea and they're doing a great job anyway.
Studies of even professional investors find it's rare for an investor to beat the market over a 5-year period, and even rarer for an investor who beats the market in one 5-year period to beat it again in the next. There actually seems to be surprisingly weak correlation between beating the market in one period and beating it in the next. Using your coin flip analogy, if every stock trade is a bet on a roughly 50-50 proposition, i.e. "this stock will beat the market" or "this stock won't beat the market", then you need a large sample of trades to rule out the influence of chance. It's so easy for amateurs to cherry-pick trades, prematurely declare victory (e.g. say they beat the market the moment a stock goes up a lot, rather than waiting until the end of the quarter or the year), become overconfident on too small a number of trades (e.g. having just bought Apple stock), or not benchmark their performance against the market at all.
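To make the sample-size point concrete, here is a minimal sketch of how an exact binomial tail probability separates a lucky streak from genuine edge. The 7-of-10 and 70-of-100 hit rates below are purely hypothetical illustrations, not figures from this discussion:

```python
from math import comb

def p_value_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Chance of getting at least k hits out of n pure 50-50 guesses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A hypothetical amateur who "beat the market" on 7 of 10 picks:
print(p_value_at_least(7, 10))    # ~0.17: easily explained by luck alone
# The same 70% hit rate sustained over 100 picks:
print(p_value_at_least(70, 100))  # well below 0.001: luck is no longer a plausible explanation
```

The point is only that a handful of apparent hits is statistically indistinguishable from guessing; a hit rate becomes evidence of edge once it is sustained over many independent calls.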
Seeing these irrationalities so often and so viscerally, and seeing how hard it is to talk people out of them even when you can show them the research and expert opinion, or explain these concepts, I'm extremely skeptical of people who have just an intuitive, gut feeling that they've outperformed experts at making calls or predictions over a statistically significant sample of calls, in the absence of any kind of objective accounting of their performance. The feeling of winning is just too tempting, feels too good, for people to take a moment of sober second thought and double-check that feeling against an objective measure (in the case of stocks, a market index), ask whether they can rule out luck (e.g. having just bought Apple and nothing else), and ask whether they can rule out bias in their assessment of performance (e.g. checking the S&P 500 only when their favourite stock has just gone up a lot).
If the process was as bad as you say (as in, people who have done a few weeks of reading on the relevant science and medicine making elementary mistakes), then I'm very skeptical of people recalling and subjectively assessing their own track record, or of any sense of confidence they have about it, given how much psychological bias is involved. It seems like, if we don't need people who understand science and medicine to do science and medicine properly, then a lot of our education system and scientific and medical institutions are a waste. Given that it's just common sense that understanding a subject better should lead you to make better calls on that subject – overall, over the long term, statistically – we should not violate common sense on the basis of a few amateurs guessing a few coin flips better than experts, and we should especially not violate it when we can't even confirm whether that actually happened.