There is Little Evidence on Question Decomposition
If we say “$X$ will happen if and only if $A$ and $B$ and $C$ … all happen, so we estimate $P(A)$ and $P(B \mid A)$ and $P(C \mid A, B)$ &c, and then multiply them together to estimate $P(X)$”, do we usually get a probability that is close to $P(X)$? Does this improve on forecasts where one tries to estimate $P(X)$ directly?
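As a minimal sketch of the procedure in question (the function name and the numbers are hypothetical): if $X$ happens exactly when all components happen, the chain rule gives $P(X) = P(A)\,P(B \mid A)\,P(C \mid A, B)\cdots$, so the decomposed forecast is just the product of the conditional estimates.

```python
from math import prod

def decomposed_probability(conditionals):
    """Multiply P(A), P(B | A), P(C | A, B), ... into one estimate.

    Assumes X happens if and only if every component happens, in which
    case the product equals P(X) by the chain rule of probability.
    """
    for p in conditionals:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(conditionals)

# Hypothetical estimates for P(A), P(B | A), P(C | A, B)
print(round(decomposed_probability([0.9, 0.8, 0.7]), 3))  # 0.504
```

The question of the post is whether the number this produces is, in practice, closer to the truth than a direct estimate of $P(X)$.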
This type of question decomposition (which one could call multiplicative decomposition) appears to be a relatively common forecasting method; see Allyn-Feuer & Sanders 2023, Silver 2016, Kaufman 2011, Carlsmith 2022 and Hanson 2011. However, there have also been conceptual arguments against the technique (see Yudkowsky 2017, AronT 2023 and Gwern 2019), all of which argue that it reliably underestimates the probability of events.
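One strand of the underestimation argument, as I understand it, is that small downward errors on each step compound multiplicatively. The sketch below is illustrative only: the 10 steps, the "true" conditional probabilities of 0.95 and the assumption that a forecaster shades each of them down to 0.9 are all hypothetical, and it does not cover other strands of the argument (such as ignoring alternative paths to the outcome).

```python
from math import prod

def compounded(estimates):
    """Product of per-step conditional probability estimates."""
    return prod(estimates)

# Hypothetical: 10 steps whose true conditional probabilities are 0.95 each,
# but the forecaster (by assumption) avoids near-certain numbers and writes 0.9.
true_p = compounded([0.95] * 10)
estimated_p = compounded([0.90] * 10)

print(f"true:      {true_p:.3f}")       # true:      0.599
print(f"estimated: {estimated_p:.3f}")  # estimated: 0.349
```

A per-step error of only 0.05 turns into a final estimate that is roughly 40% too low, and the gap grows with the number of steps.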
What is the empirical evidence for decomposition being a technique that improves forecasts?
Lawrence et al. 2006 summarize the state of research on the question:
Decomposition methods are designed to improve accuracy by splitting the judgmental task into a series of smaller and cognitively less demanding tasks, and then combining the resulting judgements. Armstrong (2001) distinguishes between decomposition, where the breakdown of the task is multiplicative (e.g. sales forecast=market size forecast×market share forecast), and segmentation, where it is additive (e.g. sales forecast=Northern region forecast+Western region forecast+Central region forecast), but we will use the term for both approaches here. Surprisingly, there has been relatively little research over the last 25 years into the value of decomposition and the conditions under which it is likely to improve accuracy. In only a few cases has the accuracy of forecasts resulting from decomposition been tested against those of control groups making forecasts holistically. One exception is Edmundson (1990) who found that for a time series extrapolation task, obtaining separate estimates of the trend, seasonal and random components and then combining these to obtain forecasts led to greater accuracy than could be obtained from holistic forecasts. Similarly, Webby, O’Connor and Edmundson (2005) showed that, when a time series was disturbed in some periods by several simultaneous special events, accuracy was greater when forecasters were required to make separate estimates for the effect of each event, rather than estimating the combined effects holistically. Armstrong and Collopy (1993) also constructed more accurate forecasts by structuring the selection and weighting of statistical forecasts around the judge’s knowledge of separate factors that influence the trends in time series (causal forces).
Many other proposals for decomposition methods have been based on an act of faith that breaking down judgmental tasks is bound to improve accuracy or upon the fact that decomposition yields an audit trail and hence a defensible rationale for the forecasts (Abramson & Finizza, 1991; Bunn & Wright, 1991; Flores, Olson, & Wolfe, 1992; Saaty & Vargas, 1991; Salo & Bunn, 1995; Wolfe & Flores, 1990). Yet, as Goodwin and Wright (1993) point out, decomposition is not guaranteed to improve accuracy and may actually reduce it when the decomposed judgements are psychologically more complex or less familiar than holistic judgements, or where the increased number of judgements required by the decomposition induces fatigue.
(Emphasis mine).
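To make Armstrong's distinction in the quote concrete, here is a minimal sketch with made-up sales figures; the function names are mine, not Armstrong's.

```python
from math import prod

def multiplicative_forecast(components):
    """Armstrong's 'decomposition': e.g. market size x market share."""
    return prod(components)

def additive_forecast(segments):
    """Armstrong's 'segmentation': e.g. the sum of regional forecasts."""
    return sum(segments)

# Hypothetical numbers (units sold)
market_size = 100_000
market_share = 0.12
print(round(multiplicative_forecast([market_size, market_share])))  # 12000

regions = {"Northern": 4_000, "Western": 5_500, "Central": 2_500}
print(additive_forecast(regions.values()))  # 12000
```

Only the multiplicative form corresponds to the probability decompositions discussed in this post; the time-series studies cited in the quote combine components additively.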
The types of decomposition described here seem quite different from the ones used in the sources above: decomposed time series are dissimilar to multiplied probabilities for binary predictions. Taken together with the conceptual counter-arguments, the evidence for multiplicative decomposition appears weak.
It seems that a team of a few (let’s say 4) dedicated forecasters could run a small experiment to determine whether multiplicative decomposition is a good method for binary forecasts: for each question, they would randomly choose whether to spend 20 minutes making an explicitly decomposed forecast or a control forecast (although the exact method for the control condition still needs to be worked out). Working in parallel, making 70 forecasts should take less than 6 hours. Before running such an experiment, though, it’d be useful to search for more recent literature on the question.
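The time estimate works out as 70 × 20 minutes ≈ 23.3 forecaster-hours, or just under 6 hours split across 4 people. Below is a minimal sketch of how the random assignment and scoring could be done. Using the Brier score as the accuracy metric, a 50/50 split between conditions, and all function names are my assumptions, not part of the proposal; the sketch also does not address whether 70 forecasts give enough statistical power to detect a difference.

```python
import random
from statistics import mean

def assign_conditions(question_ids, seed=0):
    """Randomly assign half the questions to 'decomposed', half to 'control'."""
    ids = list(question_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {q: ("decomposed" if i < half else "control") for i, q in enumerate(ids)}

def brier_by_condition(records):
    """records: (condition, forecast probability, outcome 0/1) tuples.

    Returns the mean Brier score per condition (lower is better).
    """
    errors = {}
    for condition, p, outcome in records:
        errors.setdefault(condition, []).append((p - outcome) ** 2)
    return {c: mean(v) for c, v in errors.items()}

def wall_clock_hours(n_forecasts, minutes_each, n_forecasters):
    """Hours needed if forecasters split the questions and work in parallel."""
    return n_forecasts * minutes_each / 60 / n_forecasters

conditions = assign_conditions(range(70), seed=42)
print(sum(c == "decomposed" for c in conditions.values()))  # 35
print(round(wall_clock_hours(70, 20, 4), 2))  # 5.83
```

Once the questions resolve, feeding each forecast's condition, probability and outcome into `brier_by_condition` would give one mean score per arm to compare.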
Maybe a slightly better title for this post would be “There is little evidence on question decomposition”? The evidence against question decomposition seems as weak as the evidence for it (based on your source).
Good point, will change the title.
Although I’d consider the counter-arguments against multiplicative decomposition to be decent evidence against it.
Seems like a question where the answer has to be “it depends”.
There are some questions which have a decomposition that helps with estimating them (e.g. Fermi questions like estimating the mass of the Earth), and there are some decompositions that don’t help (for one thing, decompositions always stop somewhere, with components that aren’t further decomposed).
Research could help add texture to “it depends”, sketching out some generalizations about which sorts of decompositions are helpful, but it wouldn’t show that decomposition is just generally good or just generally bad or useless.
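As a minimal sketch of the Fermi example mentioned in the comment: the Earth's mass can be decomposed as density × volume of a sphere. The inputs (a radius of about 6,371 km and an assumed mean density of about 5,500 kg/m³) are rounded, and the function name is hypothetical.

```python
from math import pi

def earth_mass_kg(radius_m, mean_density_kg_m3):
    """Fermi decomposition: mass = mean density x volume of a sphere."""
    volume_m3 = 4 / 3 * pi * radius_m ** 3
    return mean_density_kg_m3 * volume_m3

# Rounded inputs: radius ~6,371 km, assumed mean density ~5,500 kg/m^3
estimate = earth_mass_kg(6.371e6, 5_500)
print(f"{estimate:.2e} kg")  # 5.96e+24 kg
```

This lands close to the accepted value of about 5.97 × 10²⁴ kg. It also illustrates where the decomposition stops: radius and density are leaf components that are estimated directly, so the result is only as good as those guesses.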
Executive summary: The evidence that multiplicative decomposition improves binary probability forecasts is weak.
Key points:
- Conceptual arguments question whether decomposition improves accuracy.
- Empirical research finds limited evidence for decomposition benefits.
- Time series decomposition seems quite different from binary probability decomposition.
- A small experiment could help determine if decomposition helps binary forecasts.
- Recent literature should be reviewed for more evidence on the technique.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.