Yes, this is the main difference compared to forecasters being randomly assigned to a question.
I don’t think you can learn much from observational data like this about the causal effect of the number of forecasters on performance. Do you have any natural experiments that you could exploit? (ie. some ‘random’ factor affecting the number of forecasters, that’s not correlated with forecaster skill.) Or can you run a randomized experiment?
It sounds like you’re doing subsampling. Bootstrapping is random sampling with replacement.
If, for example, we kept increasing the size of the sample we draw, then eventually the variance would be guaranteed to go to zero (when the sample size equals the total number of forecasters and there is only one possible sample we can draw).
With bootstrapping, there are n^n possible (ordered) draws when the bootstrap sample size is equal to the actual sample size n. (And you could choose a bootstrap sample size different from n.)
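To make the distinction concrete, here is a minimal numpy sketch (the forecast pool and all numbers are made up, purely illustrative): it draws subsamples without replacement and bootstrap samples with replacement, and shows that the subsample variance collapses to zero once the subsample size reaches the pool size, while the bootstrap variance does not.

```python
import numpy as np

rng = np.random.default_rng(0)
forecasts = rng.uniform(0, 1, size=200)  # stand-in pool of 200 forecasts
n = len(forecasts)

def subsample_mean(pool, k, rng):
    # Subsampling: draw k forecasts WITHOUT replacement.
    return rng.choice(pool, size=k, replace=False).mean()

def bootstrap_mean(pool, k, rng):
    # Bootstrapping: draw k forecasts WITH replacement.
    return rng.choice(pool, size=k, replace=True).mean()

for k in (50, 100, n):
    sub = [subsample_mean(forecasts, k, rng) for _ in range(2000)]
    boot = [bootstrap_mean(forecasts, k, rng) for _ in range(2000)]
    print(k, round(np.var(sub), 6), round(np.var(boot), 6))

# At k = n there is only one possible subsample, so its variance is exactly 0,
# while the bootstrap variance stays positive because draws can repeat.
```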
Imagine two cities. In one, it is safe for women to walk around at night and in the second it is not. I think the former city is better even if women don’t want to walk around at night, because I think that option is valuable to people even if they do not take it. Preference-satisfaction approaches miss this.
Don’t people also have preferences for having more options?
I’m surprised the Nigerian business plan competition was not included. (Chris Blattman writeup from 2015 here: “Is this the most effective development program in history?”.)
I say “They were arguably right, ex ante, to advocate for and participate in a project to deter the Nazi use of nuclear weapons.” Actions in 1939-42 or around 1957-1959 are defensible.
Given this, is it accurate to call Einstein’s letter a ‘tragedy’? The tragic part was continuing the nuclear program after the German program was shut down.
I suppose sprints start out as jogs.
2 August 1939: Einstein-Szilárd letter to Roosevelt advocates for setting up a Manhattan Project. [...]
June 1942: Hitler decides against an atomic program for practical reasons.
Is it accurate to say that the US and Germans were in a nuclear weapons race until 1942? So perhaps the takeaway is “if you’re in a race, make sure to keep checking that the race is still on”.
How much would I personally have to reduce X-risk to make this the optimal decision? Well, that’s simple. We just calculate:
25 billion * X = 20,000 lives saved
X = 20,000 / 25 billion
X = 0.0000008
That’s 0.00008% in x-risk reduction for a single individual.
I’m not sure I follow this exercise. Here’s how I’m thinking about it:
Option A: spend your career on malaria.
Cost: one career
Payoff: save 20k lives with probability 1.
Option B: spend your career on x-risk.
Cost: one career
Payoff: save 25B lives with probability p (=P(prevent extinction)), save 0 lives with probability 1-p.
Expected payoff: 25B*p.
Since the costs are the same, we can ignore them. Then you’re indifferent between A and B if p=8x10^-7, and B is better if p>8x10^-7.
But I’m not sure how this maps to a reduction in P(extinction).
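To make the indifference point concrete, here's the arithmetic above as a quick check (numbers taken from the quoted exercise):

```python
lives_saved_malaria = 20_000         # Option A: certain payoff
lives_at_stake = 25_000_000_000      # Option B: payoff if extinction is prevented

p_breakeven = lives_saved_malaria / lives_at_stake
print(p_breakeven)           # 8e-07
print(f"{p_breakeven:.6%}")  # 0.000080%, i.e. the 0.00008% figure above
```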
How much would I personally have to reduce X-risk to make this the optimal decision?
Shouldn’t this exercise start with the current P(extinction), and then calculate how much you need to reduce that probability? I think your approach is comparing two outcomes: save 25B lives with probability p, or save 20,000 lives with probability 1. Then the first option has higher expected value if p>20000/25B. But this isn’t answering your question of personally reducing x-risk.
Also, I think you should calculate marginal expected value, ie., the value of additional resources conditional on the resources already allocated, to account for diminishing marginal returns.
Adding to the causal evidence, there’s a 2019 paper that uses wind direction as an instrumental variable for PM2.5. They find that IV > OLS, implying that observational studies are biased downwards:
Comparing the OLS estimates to the IV estimates in Tables 2 and 3 provides strong evidence that observational studies of the relationship between air pollution and health outcomes suffer from significant bias: virtually all our OLS estimates are smaller than the corresponding IV estimates. If the only source of bias were classical measurement error, which causes attenuation, we would not expect to see significantly negative OLS estimates. Thus, other biases, such as changes in economic activity that are correlated with both hospitalization patterns and pollution, appear to be a concern even when working with high-frequency data.
They also compare their results to the epidemiology literature:
To facilitate comparison to two studies from the epidemiological literature with settings similar to ours, we have also estimated the effect of PM 2.5 on one-day mortality and hospitalizations [...] Using data from 27 large US cities from 1997 to 2002, Franklin, Zeka, and Schwartz (2007) reports that a 10 μg/m3 increase in daily PM 2.5 exposure increases all-cause mortality for those aged 75 and above by 1.66 percent. Our one-day IV estimate for 75+ year-olds [...] is an increase of 2.97 percent [...]
On the hospitalization side, Dominici et al. (2006) uses Medicare claims data from US urban counties from 1999 to 2002 and finds an increase in elderly hospitalization rates associated with a 10 μg/m3 increase in daily PM 2.5 exposure ranging from 0.44 percent (for ischemic heart disease hospitalizations) to 1.28 percent (for heart failure hospitalizations). We estimate that a 10 μg/m3 increase in daily PM 2.5 increases one-day all-cause hospitalizations by 2.22 percent [...], which is 70 percent larger than the heart failure estimate and over five times larger than the ischemic heart disease estimate. Overall, these comparisons suggest that observational studies may systematically underestimate the health effects of acute pollution exposure.
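For readers unfamiliar with the method, here is a stylized two-stage least squares sketch on synthetic data (all variable names and coefficients are invented; this is not the paper's specification). It shows how a confounder correlated with both pollution and hospitalizations biases OLS downward, while an instrument that only shifts pollution, a stand-in for wind direction, recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Synthetic data: the true effect of pollution on hospitalizations is 2.0.
wind = rng.normal(size=n)            # instrument: shifts pollution, unrelated to the confounder
confounder = rng.normal(size=n)      # e.g. economic activity
pollution = 0.8 * wind - 0.5 * confounder + rng.normal(size=n)
hospitalizations = 2.0 * pollution + 1.5 * confounder + rng.normal(size=n)

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Naive OLS is biased downward: the confounder pushes pollution and
# hospitalizations in opposite directions.
print("OLS:", ols_slope(pollution, hospitalizations))

# 2SLS by hand: first stage predicts pollution from the instrument,
# second stage regresses hospitalizations on the fitted values.
pollution_hat = ols_slope(wind, pollution) * wind
print("IV :", ols_slope(pollution_hat, hospitalizations))
```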
Related, John von Neumann on x-risk:
Finally and, I believe, most importantly, prohibition of technology (invention and development, which are hardly separable from underlying scientific inquiry), is contrary to the whole ethos of the industrial age. It is irreconcilable with a major mode of intellectuality as our age understands it. It is hard to imagine such a restraint successfully imposed in our civilization. Only if those disasters that we fear had already occurred, only if humanity were already completely disillusioned about technological civilization, could such a step be taken. But not even the disasters of recent wars have produced that degree of disillusionment, as is proved by the phenomenal resiliency with which the industrial way of life recovered even—or particularly—in the worst-hit areas. The technological system retains enormous vitality, probably more than ever before, and the counsel of restraint is unlikely to be heeded.
What safeguard remains? Apparently only day-to-day — or perhaps year-to-year — opportunistic measures, a long sequence of small, correct decisions. [...] Under present conditions it is unreasonable to expect a novel cure-all. For progress there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment.
I didn’t suggest otherwise.
It sounds like you’re arguing that we should estimate ‘good done/additional resources’ directly (via Fermi estimates), instead of indirectly using the ITN framework. But shouldn’t these give the same answer?
And even when you can multiply the three quantities together, I feel like speaking in terms of importance, neglectedness and tractability might make you feel that there is no total ordering of interventions (“some have higher importance, some have higher tractability, whether you prefer one or the other is a matter of personal taste”)
I don’t follow this. If you multiply I*T*N and get ‘good done/additional resources’, how is that not an ordering?
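As a toy illustration of that point (made-up numbers and cause names): multiplying the three factors gives each cause a single 'good done per additional resource' score, and sorting on that score is a total ordering.

```python
# Toy ITN scores; the numbers are invented and only illustrate the ordering point.
causes = {
    "cause_a": {"importance": 100, "tractability": 0.01, "neglectedness": 0.5},
    "cause_b": {"importance": 10,  "tractability": 0.10, "neglectedness": 0.9},
    "cause_c": {"importance": 50,  "tractability": 0.05, "neglectedness": 0.2},
}

scores = {name: f["importance"] * f["tractability"] * f["neglectedness"]
          for name, f in causes.items()}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, score)  # a single number per cause => a total ordering
```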
There seems to be an “intentions don’t matter, results do” lesson that’s relevant here. Intending to solve AI alignment is secondary, and doesn’t mean that you’re making progress on the problem.
And we don’t want people saying “I’m working on AI” just for the social status, if that’s not their comparative advantage and they’re not actually being productive.
Hm, then I find necessitarianism quite strange. In practice, how do we identify people who exist regardless of our choices?
The longtermist claim is that because humans could in theory live for hundreds of millions or billions of years, and we have the potential to get the risk of extinction very nearly to 0, the biggest effects of our actions are almost all in how they affect the far future. Therefore, if we can find a way to predictably improve the far future this is likely to be, certainly from a utilitarian perspective, the best thing we can do.
I don’t find this framing very useful. The importance-tractability-crowdedness framework gives us a sophisticated method for evaluating causes (allocate resources according to marginal utility per dollar), which is flexible enough to account for diminishing returns as funding increases.
But the longtermist framework collapses this down to a binary: is this the best intervention or not?
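To spell out what the marginal-utility-per-dollar rule looks like in practice, here's a minimal sketch under assumed (made-up) diminishing-returns curves. The point is that the optimal allocation is usually interior: several causes get funded, rather than one 'best' cause getting everything.

```python
# Assumed returns curves: utility_i(x) = a_i * log(1 + x / s_i),
# so marginal utility per dollar is a_i / (s_i + x). All numbers are invented.
causes = {"malaria": (5.0, 10.0), "x_risk": (50.0, 200.0), "policy": (8.0, 30.0)}
allocation = {name: 0.0 for name in causes}
budget, step = 300.0, 1.0

def marginal_utility(name):
    a, s = causes[name]
    return a / (s + allocation[name])

spent = 0.0
while spent < budget:
    best = max(causes, key=marginal_utility)  # fund the currently best marginal use
    allocation[best] += step
    spent += step

print(allocation)  # typically an interior solution across several causes
```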
Because of this heavy tailed distribution of interventions
Is it actually heavy-tailed? It looks like an ordered bar chart, not a histogram, so it’s hard to tell what the tails are like.
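One way to check, if the underlying per-intervention estimates are available (the data below are simulated, not the chart's data): look at how much of the total value the top few interventions capture, or plot a histogram of the logged values, rather than an ordered bar chart.

```python
import numpy as np

rng = np.random.default_rng(2)
values = rng.lognormal(mean=0.0, sigma=1.5, size=500)  # stand-in cost-effectiveness estimates

values_sorted = np.sort(values)[::-1]
total = values_sorted.sum()
for k in (5, 25, 50):
    print(f"top {k} of 500 interventions: {values_sorted[:k].sum() / total:.0%} of total value")

# A concentration measure like this (or a histogram of log(values)) says more
# about the tail than an ordered bar chart does.
```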
What is the ‘policy relevance’ of answering the title question? Ie. if the answer is “yes, forecaster count strongly increases accuracy”, how would you go about increasing the number of forecasters?