I’d guess the funding mechanism has to be somewhat different given the incentives at play with AI x-risk. Specifically, the Omega critiques do not seem bottlenecked by funding but by time and anonymity in ways that can’t be solved with money.
What's the version/route to value of this that you are excited about? I feel quite skeptical that anything like this could work (see my answer on this post), but I would be eager for people to change my mind.
I agree that the lack of feedback loops and the complicated nature of all of these things are a core challenge in making this useful.
I think you really can’t do better than trying to evaluate people’s track records and the quality of their higher level reasoning, which is essentially the meaning of grantmakers’ statements like “just trust us”.
I do have this sense that we can do better than illegible “just trust us”. For example, in the GHD regime, it seems to me like the quality of the reasoning associated with people championing different interventions could be quite decorrelated with the actual amount of impact—it seems like you need to have numbers on how many people are being impacted by how much.
In my experience even fairly rough BOTECs do shine some light on what we should be doing, and it feels tractable to improve the quality of these.
What's the version/route to value of this that you are excited about?
The version that I am excited about tries to quantify how impactful longtermist interventions are. I think this could help with grantmaking, but also with people's career choices. Things that would be cool to estimate the value of include:
A marginal average research hour on each of the alignment agendas.
A marginal person doing EA or AIS community building.
A person figuring out AI policy details.
An op-ed about AI safety in a well-respected news site.
An extra safety researcher getting hired at a leading lab.
An extra person estimating AI timelines, or doing Epoch-style research.
… and a ton more things.
How to do this estimation? There are two main ways I feel excited about people trying:
1. Building a quantitative model, say in Guesstimate, that quantitatively encodes one's worldview.
2. Building an “outside view” model that aggregates a bunch of people's opinions, perhaps some mix of superforecasters and domain experts. One can make this more efficient by having a ton of public prediction markets and then picking a small subset at random for the experts to spend a long time thinking through what they think the result should be (a rough sketch of this auditing step follows).
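As a very rough sketch of what that auditing step in (2) could look like (the question IDs and numbers here are hypothetical, not real markets):

```python
import random

# Hypothetical prediction-market prices for a set of impact-related questions.
market_probs = {"q1": 0.12, "q2": 0.40, "q3": 0.07, "q4": 0.65}

def expert_review(question_id: str) -> float:
    """Stand-in for an expensive expert deep dive that returns a probability."""
    raise NotImplementedError  # in reality: convene the panel, wait weeks

def audit_markets(market_probs: dict, n_audits: int = 2, seed: int = 0) -> dict:
    """Randomly pick a few markets for expert review and report the gap between
    expert and market estimates, as a spot check on how far to trust the rest."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(market_probs), n_audits)
    return {q: expert_review(q) - market_probs[q] for q in chosen}
```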
Here are two silly examples of (1).
Suppose that I'm trying to estimate the impact of a project attempting to pass regulation governing frontier AI training runs, say, similar to the ideas described here. One can factor this into a) the probability of the legislation passing and b) how good the legislation would be if passed. To estimate (a), I look at the base rate of legislation passing, factors that might make this more or less likely to pass given the current climate, etc. All in all, I estimate something like P(pass) = 4%. To estimate (b), I might crowdsource vibes-based opinions of how much doom might change, or I might list out various worldviews and estimate how good this might be conditional on each of them. This one has more uncertainty, but I'm going to put the expected reduction in P(AI doom) at around 1%.[1] This naively puts the impact at 0.04 × 0.01 = 0.0004, i.e. 400 microdooms.
Now, suppose that I'm trying to estimate the impact of one year of research produced by the mean SERI MATS scholar. Remember that research impact is heavy-tailed, so the mean is probably >> the median. The alignment field is roughly 10 years old, and say it has had an average of 100 people working on it, so 1000 person-years. This seems low, so I'm going to arbitrarily bump this number up to 3000 person-years. Now, suppose the progress so far amounts to 1% of the alignment problem[2] being solved, and suppose that alignment being solved is 50% of what is necessary for an existential win. Let's say the average MATS scholar is equal to the average person in the alignment field so far. This means that the value of a year of research produced by the average MATS scholar is 0.5 × 0.01 × (1/3000) ≈ 1.7 microdooms.
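Purely to make the arithmetic explicit, here is a minimal point-estimate version of the two BOTECs above (every number is just the guess from the text):

```python
MICRODOOM = 1e-6  # one microdoom = a one-in-a-million change in P(AI doom)

# BOTEC 1: regulation project = P(pass) * expected reduction in P(doom) if passed.
p_pass = 0.04
doom_reduction_if_passed = 0.01
regulation_value = p_pass * doom_reduction_if_passed
print(regulation_value / MICRODOOM)   # 400 microdooms

# BOTEC 2: one year of research from the mean MATS scholar.
alignment_share_of_win = 0.5    # solving alignment is ~half of an existential win
progress_so_far = 0.01          # fraction of the alignment problem solved to date
person_years_so_far = 3000      # bumped up from a naive 10 years * 100 people
scholar_value = alignment_share_of_win * progress_so_far / person_years_so_far
print(scholar_value / MICRODOOM)  # ~1.7 microdooms
```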
These are both terrible estimates that get so many things wrong! Probably most of the value of the MATS person's research now is upskilling to make future research better, or perhaps it's in fieldbuilding that excites more people to start working on alignment later, or perhaps something else. Maybe the project pushing legislation makes it way more difficult for other efforts to do something in this space later, or maybe it makes a mistake in the regulation and creates a licensing body that is immediately subject to regulatory capture, so the value isn't 1% of an existential win; it could be −5%. My claims are:
a) These estimates are better than nothing, and they at least inform one's intuition of what is going on.
b) These estimates can be improved radically by working on them. I had a lot of this cached (having thought about stuff like this a fair bit), but this was a very quick comment, and with some dedicated effort, a bunch of cruxes could be uncovered. I feel like one could spend 10 hours on each of these and really narrow down the uncertainty quite a bit.
[1] This is a wild estimate! I'm typing this on the fly, but in reality I'd build a Guesstimate model with distributions over these parameters, because multiplying point estimates instead of distributions induces error.
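For concreteness, here's a minimal Monte Carlo stand-in for that kind of Guesstimate model, using the regulation example; the distribution shapes and parameters are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative distributions in place of the point estimates above.
p_pass = rng.beta(2, 48, n)                 # mean ~4% chance of passing
doom_reduction = rng.normal(0.01, 0.02, n)  # centred on 1%, can go negative
value_in_microdooms = p_pass * doom_reduction / 1e-6

print(value_in_microdooms.mean())                       # roughly the 400 from before
print(np.percentile(value_in_microdooms, [5, 50, 95]))  # but with a wide spread
```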
[2] I'm being horrendously glossy, but by “the alignment problem” I mean something like “the technical solution that tells us how to build a superintelligent AI pointed at humanity's reflectively endorsed values”.
Thanks for the detailed response! Your examples were helpful to illustrate your general thinking, and I did update slightly towards thinking some version of this could work, but I am still getting stuck on a few points:
Re. the GHD comparison: firstly, to clarify, I meant “quality of reasoning” primarily in terms of the stated theory of change rather than some much harder-to-assess general quality. I would expect the quality of reasoning around a ToC to correlate quite strongly with expected impact. Of course this might not always cash out in actual impact, but that doesn't necessarily feel relevant for funding longtermist projects, given the inability to get feedback on actual impact. I think most longtermist work focuses on wicked problems, which means that even the progress of existing projects is not necessarily a good proxy for overall success.
For your two suggested methodologies: (2) seems like it would be very useful to donors, but it would be very costly in expert time and not obviously worth it to me for the marginal gains compared to a grantmaker's decision (although I'd be keen to try a small test run and see).
For method (1), I think that quantification is most useful for clarifying your own intuitions and allowing for some comparison within your own models. So I am certainly pro grantmakers doing their own quick evaluations, but I am not sure how useful this would be for a charity evaluator. I think you still have such irreducibly huge uncertainty bars on some of the key statements you need to get there (especially when you consider counterfactuals) that a final quantification of impact for a longtermist charity is just quite misleading for less well-informed donors.
For example, I'm not sure exactly what a statement like “alignment being solved is 50% of what is necessary for an existential win” means, but I think it does illustrate how messy this is. Does it mean it reduces AI x-risk by half this century? Increases the chance of existential security by 50% (any effect on this seems to change an evaluation by orders of magnitude)? I am guessing it means alignment is 50% of the total work needed to reduce AI risk to ~0, but it seems awfully unclear how to quantify this: there must be some complex distribution of overall risk reduction depending on the amount of other progress made, rather than a binary, which feels very hard to quantify. Thus I agree with claim (a), but am skeptical of our ability to make progress on (b) in a reasonable amount of time.
One thing that I would be excited about is more explicit statements by longtermist charities themselves detailing their own BOTECs along the lines of what you are talking about, justifying from their perspective why their project is worth funding. This would let you clearly understand their worldview, the assumptions they are making, and what a “win” would look like for them, which allows you to make your own evaluation. I think it would be great to make reasoning more explicit and allow for more comparison, probably within the AI safety community, but it feels unlikely to be useful for donors who are not extremely well informed.
I’m currently a guest fund manager at LTFF and, personally, would love to fund a project like this if I thought that the execution would be good.
If you are interested in working on this and need funding, I’d be excited about you reaching out, or for you just to apply to the LTFF.