Thanks for the detailed response! Your examples were helpful for illustrating your general thinking, and I did update slightly towards thinking some version of this could work, but I am still getting stuck on a few points:
Re. the GHD comparison: first, to clarify, I meant “quality of reasoning” primarily in terms of the stated theory of change, rather than as a much more general (and much harder to assess) statement. I would expect the quality of reasoning around a ToC to correlate quite strongly with expected impact. Of course this might not always cash out in actual impact, but that doesn’t necessarily feel relevant for funding longtermist projects, given the inability to get feedback on actual impact. I think most longtermist work focuses on wicked problems, which means that even the progress of existing projects is not necessarily a good proxy for overall success.
Of your two suggested methodologies, (2) seems like it would be very useful to donors but very costly in expert time, and it's not obvious to me that the marginal gains over a grantmaker's decision are worth it (although I’d be keen to try a small test run and see).
For method (1), I think quantification is most useful for clarifying your own intuitions and allowing for some comparison within your own models. So I am certainly pro grantmakers doing their own quick evaluations, but I am not sure how useful this would be for a charity evaluator. You still have such irreducibly huge uncertainty bars on some of the key statements you need to get there (especially once you consider counterfactuals) that a final quantification of impact for a longtermist charity is just quite misleading for less well-informed donors.
For example, I’m not sure what exactly a statement like “alignment being solved is 50% of what is necessary for an existential win” means, but I think it does illustrate how messy this is. Does it mean it reduces AI x-risk by half this century? That it increases the chance of existential security by 50% (any effect on this seems to change an evaluation by orders of magnitude)? I am guessing it means 50% of the total work needed to reduce AI risk to ~0, but it seems awfully unclear how to quantify this: there must be some complex distribution of overall risk reduction depending on how much other progress is made, rather than a binary, and that feels very hard to quantify. So I agree with claim (a), but am skeptical of our ability to make progress on (b) in a reasonable span of time.
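To make the uncertainty-bars point concrete, here is a toy sketch (all numbers invented for illustration, not drawn from any real evaluation) of how a BOTEC that multiplies a few wide subjective ranges ends up with a bottom line spanning orders of magnitude:

```python
# Toy BOTEC sketch: each factor is a (low, high) subjective range.
# The labels and numbers are hypothetical, just to show how the
# spread compounds when you multiply several uncertain factors.
factors = {
    "P(project succeeds at its stated goal)":         (0.05, 0.5),
    "fraction of the alignment problem it addresses": (0.01, 0.3),
    "counterfactual share (vs. other funders/work)":  (0.1,  0.8),
    "x-risk reduction if alignment is fully solved":  (0.05, 0.5),
}

low, high = 1.0, 1.0
for name, (lo, hi) in factors.items():
    low *= lo    # most pessimistic corner of each range
    high *= hi   # most optimistic corner of each range

print(f"combined estimate: {low:.1e} to {high:.1e} "
      f"(a {high / low:,.0f}x spread)")
```

Even with only four factors, the headline number varies by a factor of tens of thousands, which is why I worry that presenting a single point estimate to less-informed donors obscures more than it reveals.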
One thing I would be excited about is more explicit statements by longtermist charities themselves, detailing their own BOTECs along the lines of what you are talking about and justifying, from their perspective, why their project is worth funding. That lets you clearly understand their worldview, the assumptions they are making, and what a “win” would look like for them, which in turn lets you make your own evaluation. I think it would be great to make reasoning more explicit, and it would probably allow for more comparison within the AI safety community, but it feels unlikely to be useful for donors who aren't extremely well-informed.