Thanks Greg—I really enjoyed this post.
I don’t think that this is what you’re saying, but I think if someone drew the lesson from your post that, when reality is underpowered, there’s no point in doing research into the question, that would be a mistake.
When I look at tiny-n sample sizes for important questions (e.g.: “How have new ideas made major changes to the focus of academic economics?” or “Why have social movements collapsed in the past?”), I generally don’t feel at all like I’m trying to get a p<0.05; it feels more like hypothesis generation. So when I find out that Kahneman and Tversky spent 5 years honing the article “Prospect Theory” into a form that could be published in an economics journal, I think “wow, ok, maybe that’s the sort of time investment that we should be thinking of”. Or when I see social movements collapse because of in-fighting (e.g. the pre-Copenhagen UK climate movement), or romantic disputes between leaders (e.g. Objectivism), then—insofar as we just want to take all the easy wins to mitigate catastrophic risks to the EA community—I know that this is a risk to think about and focus on for EA.
For these sorts of areas, the right approach seems to be granular qualitative research—trying to really understand in depth what happened in some other circumstance, and then thinking through what lessons that entails for the circumstance you’re interested in. I think that, as a matter of fact, EA does this quite a lot when relevant (e.g. Grace on Szilard, or existing EA discussion of previous social movements). So I think this gives us extra reason to push against the idea that “EA-style analysis” = “quant-y RCT-esque analysis” rather than “whatever research methods are most appropriate to the field at hand”. But even on qualitative research I think the “EA mindset” can be quite distinctive—certainly I think, for example, that a Bayesian-heavy approach to historical questions, often addressing counterfactual questions, and looking at those issues that are most interesting from an EA perspective (e.g. how modern-day values would be different if Christianity had never taken off), would be really quite different from almost all existing historical research.
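To illustrate what a more explicitly Bayesian treatment of a single historical case could look like, here is a minimal sketch in Python. The numbers are entirely made up for illustration and are not claims about any actual case; the point is only the explicit bookkeeping of prior, likelihoods, and posterior.

```python
# Minimal sketch of explicit Bayesian updating on a qualitative historical case.
# All numbers below are made-up placeholders, purely for illustration.

# Hypothesis H: "factor X (e.g. in-fighting) substantially raises the chance
# that a movement collapses."
prior = 0.30  # credence in H before looking at this particular case

# Evidence E: a case study in which a movement with factor X did collapse.
p_e_given_h = 0.60      # how likely we'd see this if H were true
p_e_given_not_h = 0.40  # how likely we'd see it anyway (chance, selection effects, ...)

# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
posterior = p_e_given_h * prior / p_e

print(f"Prior:     P(H)   = {prior:.2f}")
print(f"Posterior: P(H|E) = {posterior:.2f}")  # ~0.39: one case moves the needle only modestly
```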
Thanks, Will!
I definitely agree we can look at qualitative data for hypothesis generation (after all, n=1 is still an existence proof). But I’d generally recommend breadth-first rather than depth-first if we’re trying to adduce considerations.
For many/most sorts of policy decisions, although we may find a case of X (some factor) --> Y (some desirable outcome), we can probably also find cases of ¬X --> Y and X --> ¬Y. E.g., contrasting with what happened with prospect theory, there are also cases where someone happened on an important breakthrough with much less time/effort, or where people over-committed to an intellectual dead-end (naturally, partisans of X or ¬X tend to be good at cultivating sets of case studies which facially support the claim that it leads to Y).
I generally see getting a steer on the correlation of X and Y (i.e. the relative abundance of (¬/)X --> (¬/)Y across a broad reference class) as more valuable than determining whether, in a given case (even one which seems nearby to the problem we’re interested in), X really was playing a causal role in driving Y. Problems of selection are formidable, but I take the problems of external validity to be even worse (and worse enough to make the former have a better ratio of insight to resources).
Thus I’d be much more interested in (e.g.) a wide survey of cases suggesting that movements prone to in-fighting tend to be less successful than in an in-depth look at how in-fighting caused the destruction of a nearby analogue to the EA community. Ditto the ‘macro’ in macrohistory being at least partly about trying to adduce takeaways across history, as well as trying to divine its big contours.
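To make the “wide survey” option concrete, here is a minimal sketch in Python (with made-up counts rather than real data about any movements) of how one could tabulate a reference class by in-fighting and outcome and summarize the association:

```python
# Minimal sketch of summarizing a "wide survey" of cases as a 2x2 table.
# The counts below are made-up placeholders, not real data about social movements.

cases = {
    ("in-fighting", "collapsed"): 14,
    ("in-fighting", "persisted"): 6,
    ("no in-fighting", "collapsed"): 8,
    ("no in-fighting", "persisted"): 22,
}

a = cases[("in-fighting", "collapsed")]
b = cases[("in-fighting", "persisted")]
c = cases[("no in-fighting", "collapsed")]
d = cases[("no in-fighting", "persisted")]

# Compare P(collapse | in-fighting) with P(collapse | no in-fighting),
# and compute an odds ratio as a crude summary of the association between X and Y.
p_collapse_given_x = a / (a + b)
p_collapse_given_not_x = c / (c + d)
odds_ratio = (a * d) / (b * c)

print(f"P(collapse | in-fighting)    = {p_collapse_given_x:.2f}")      # 0.70
print(f"P(collapse | no in-fighting) = {p_collapse_given_not_x:.2f}")  # 0.27
print(f"Odds ratio                   = {odds_ratio:.2f}")              # ~6.4
```

None of this addresses the selection problems noted above; the point is only to show the kind of cross-case summary a wide survey provides that a single in-depth case cannot.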
And although I think work like this is worthwhile to attempt, I think in some instances we may come to learn that reality is so underpowered that there’s essentially no point doing research (e.g. maybe large bits of history are just ultra-chaotic, so all we can ever see is noise).
I agree with your points, but from my perspective they somewhat miss the mark.
Specifically, your discussion seems to assume that we have a fixed, exogenously given set of propositions or factors X, Y, …, and that our sole task is to establish relations of correlation and causation between them. In this context, I agree that “wide surveys” etc. are preferable.
However, in fact, doing research also requires the following tasks:
Identify which factors X, Y, … to consider in the first place.
Refine the meaning of the considered factors X, Y, … by clarifying their conceptual and hypothesized empirical relationships to other factors.
Prioritize which of the myriad possible correlational or causal relationships between the factors X, Y, … to test.
I think that depth can help with these three tasks in ways in which breadth can’t.
For instance, in Will’s example, my guess is that the main value of considering the history of Objectivism does not come from moving my estimate for the strength of the hypothesis “X = romantic involvement between movement leaders → Y = movement collapses”. Rather, the source of value is including “romantic involvement between movement leaders” in the set of factors I’m considering in the first place. Only then am I able to investigate its relation to outcomes of interest, whether by a “wide survey of cases” or otherwise. Moreover, I might only have learned about the potential relevance of “romantic involvement between movement leaders” by looking in some depth at the history of Objectivism. (I know very little about Objectivism, and so don’t know if this is true in this instance; it’s certainly possible that the issue of romantic involvement between Objectivist leaders is so well known that it would be mentioned in any 5-sentence summary one would encounter during a breadth-first process. But it also seems possible that it’s not, and I’m sure I could come up with examples where the interesting factor was buried deeply.)
My model here squares well with your observation that a “common feature among superforecasters is they read a lot”, and in fact makes a more specific prediction: I expect that we’d find that superforecasters read a fair amount (say, >10% of their total reading) of deep, small-n case studies—for example, historical accounts of a single war or economic policy, or biographies.
[My guess is that my comment is largely just restating Will’s points from his above comment in other words.]
(FWIW, I think some generators of my overall model here are:
Frequently experiencing disagreements with others, especially around AI timelines and takeoff scenarios, as noticing a thought like “Uh… I just think your overall model of the world lacks depth and detail” rather than “Wait, I’ve read about 50 similar cases, and only 10 of them are consistent with your claim”.
Semantic holism, or at least some of the arguments usually given in its favor.
Some intuitive and fuzzy sense that, in the terminology of this Julia Galef post, being a “Hayekian” has worked better for me than being a “Planner”, including for making epistemic progress.
Some intuitive and fuzzy sense of what I’ve gotten out of “deep” versus “broad” reading. E.g. my sense is that reading Robert Caro’s monumental, >1,300-page biography of New York City planner Robert Moses has had a significant impact on my model of how individuals can attain political power, albeit by adding a bunch of detail and drawing my attention to factors I previously wouldn’t have considered rather than by providing evidence for any particular hypothesis.)