The obvious response here is that I don’t think longtermist questions are more amenable to explicit quantitative modeling than global poverty questions are, but I’m even more suspicious of other methodologies here.
Yeah, I’m just way, way more suspicious of quantitative modeling relative to other methodologies for most longtermist questions.
I think we might just be arguing about different things here?
Makes sense, I’m happy to ignore those sorts of methods for the purposes of this discussion.
Medicine is less amenable to empirical testing than physics, but that doesn’t mean that clinical intuition is a better source of truth for the outcomes of drugs than RCTs.
You can’t run an RCT on arms races between countries, whether or not AGI leads to extinction, whether totalitarian dictatorships are stable, whether civilizational collapse would be a permanent trajectory change vs. a temporary blip, etc.
What’s the actual evidence for this?
It just seems super obvious in almost every situation that comes up? I also don’t really know how you expect to get evidence; it seems like you can’t just “run an RCT” here, when a typical quantitative model for a longtermist question takes ~a year to develop (and that’s in situations that are selected for being amenable to quantitative modeling).
For example, here’s a subset of the impact-related factors I weighed when I was deciding where to work:
Lack of non-xrisk-related demands on my time
Freedom to work on what I want
Ability to speak publicly
Career flexibility
Salary
I think incorporating just these factors into a quantitative model is a hell of an ask (and there are others I haven’t listed here—I haven’t even included the factors for the academia vs industry question). A selection of challenges:
I need to make an impact calculation for the research I would do by default.
I need to make that impact calculation comparable with donations (so somehow putting them in the same units).
I need to predict the counterfactual research I would do at each of the possible organizations if I didn’t have the freedom to work on what I wanted, and quantify its impact, again in similar units.
I need to model the relative importance of technical research that tries to solve the problem vs. communication.
To model the benefits of communication, I need to model field-building benefits, legitimizing benefits, and the benefit of convincing key future decision-makers.
I need to quantify the probability of various kinds of “risks” (the org I work at shuts down, we realize AI risk isn’t actually a problem, a different AI lab reveals that they’re going to get to AGI in 2 years, unknown unknowns) in order to quantify the importance of career flexibility.
I think just getting a framework that incorporates all of these things is already a Herculean effort that really isn’t worth it, and even if you did make such a framework, I would be shocked if you could set the majority of the inputs based on actually good reference classes rather than just “what my gut says”. (And that’s all assuming I don’t notice a bunch more effects I failed to mention initially that my intuitions were taking into account but that I hadn’t explicitly verbalized.)
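To make this concrete, here’s a minimal sketch (in Python, with every name, structure, and number invented purely for illustration) of what even a bare skeleton of such a framework might look like. Notice that nearly every input is a gut estimate dressed up as a parameter:

```python
# Hypothetical skeleton of a quantitative career-choice model.
# Every input below is a made-up gut estimate, not a measured quantity --
# which is exactly the problem.

from dataclasses import dataclass

@dataclass
class JobOption:
    name: str
    # Impact of the research I'd do there, in "donation-equivalent dollars
    # per year" (challenge: defining this unit at all).
    research_impact_usd: float
    # Fraction of my time free of non-xrisk-related demands.
    xrisk_time_fraction: float
    # Multiplier for freedom to pick my own research direction.
    freedom_multiplier: float
    # Value of public communication: field-building, legitimizing, and
    # convincing key future decision-makers, collapsed into one number.
    communication_value_usd: float
    # Salary available for donations, net of living costs.
    donatable_salary_usd: float
    # Probability per year this path is wiped out (org shuts down, AI risk
    # turns out not to be a problem, another lab reveals 2-year timelines,
    # unknown unknowns) -- career flexibility hedges against this.
    p_path_fails: float
    flexibility_value_usd: float

def expected_annual_impact(job: JobOption) -> float:
    """Toy aggregation: direct research + communication + donations,
    discounted by the chance the path fails, plus the option value of
    flexibility if it does."""
    direct = (job.research_impact_usd
              * job.xrisk_time_fraction
              * job.freedom_multiplier)
    gross = direct + job.communication_value_usd + job.donatable_salary_usd
    return (1 - job.p_path_fails) * gross + job.p_path_fails * job.flexibility_value_usd

# Compare two made-up options. The ranking flips depending on gut-estimated
# inputs like freedom_multiplier, which is the point.
lab = JobOption("AI lab", 500_000, 0.8, 1.0, 100_000, 50_000, 0.10, 200_000)
academia = JobOption("academia", 300_000, 0.5, 1.5, 300_000, 10_000, 0.05, 400_000)
for job in (lab, academia):
    print(job.name, expected_annual_impact(job))
```

And even this toy version silently collapses each of the challenges above into a single invented number, with no reference class behind any of them.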
It seems blatantly obvious that the correct choice here is not to try to get to the point of “quantitative model that captures the large majority of the relevant considerations with inputs that have some basis in reference classes / other forms of legible evidence”, and I’d be happy to take a 100:1 bet that you wouldn’t be able to produce a model that meets that standard (as I evaluate it) in 1000 person-hours.
I have similar reactions for most other cost effectiveness analyses in longtermism. (For quantitative modeling in general, it depends on the question, but I expect I would still often have this reaction.)
E.g., it’s weird to use the median staff member’s views as a proxy for truth
If you mean that the weighting on saving vs. improving lives comes from the median staff member, note that GiveWell has been funding research that aims to ground these weights in more legible evidence, precisely because that evidence didn’t exist before. In some sense this is my point: if you want legible evidence, you have to put in large amounts of time and money to generate it. This problem is worse in the longtermist space, and it’s rarely worth it there.
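To illustrate the kind of weighting at issue, here’s a rough sketch of how a saving-vs.-improving-lives weight forces two interventions into common units (all numbers invented; these are not GiveWell’s actual moral weights or cost-effectiveness figures):

```python
# Illustrative only: how a "saving vs. improving lives" weight forces two
# interventions into common units. All numbers are invented, not GiveWell's
# actual moral weights or cost-effectiveness estimates.

VALUE_PER_LIFE_SAVED = 100.0      # value units -- the contested weight
VALUE_PER_INCOME_DOUBLING = 2.0   # value units -- also contested

def cost_per_value_unit(cost_usd: float, lives_saved: float,
                        income_doublings: float) -> float:
    """Dollars per unit of value, given the chosen moral weights."""
    total_value = (lives_saved * VALUE_PER_LIFE_SAVED
                   + income_doublings * VALUE_PER_INCOME_DOUBLING)
    return cost_usd / total_value

# A bednet-style program vs. a cash-transfer-style program: which looks
# better depends heavily on the two weights above, which is why someone
# (the median staff member, or funded research) has to supply them.
print(cost_per_value_unit(1_000_000, lives_saved=200, income_doublings=500))
print(cost_per_value_unit(1_000_000, lives_saved=0, income_doublings=12_000))
```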