The simple answer is no, there is no AI safety GiveWell.
For two reasons:
1. It is not possible to come up with a result like “donating $10M to this org would reduce extinction risk from AI by 2035 by 0.01–0.1%” at anywhere close to the same level of rigor as GiveWell does. (But it sounds like you already knew that.)
2. It would be possible to come up with extremely rough cost-effectiveness estimates. To make them, you’d have to make up numbers for highly uncertain inputs, readers would widely disagree about what those inputs should be, and changing the inputs would radically change the results. Someone could still make an estimate like that if they wanted to; the only effort along these lines that I’m aware of is Nuno Sempere’s cost-effectiveness models from 2021.
In spite of the problems with cost-effectiveness estimates in AI safety, I still think they’re underrated and that people should put more work into quantifying their beliefs.
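To make the “changing the inputs radically changes the results” point concrete, here is a minimal Monte Carlo sketch of the kind of back-of-the-envelope estimate described above. Every input is made up for illustration (the log-uniform range on risk reduction is a placeholder, not a real estimate); the point is only that plausible disagreement about the inputs moves the bottom line by orders of magnitude.

```python
import math
import random

random.seed(0)

DONATION = 10_000_000  # the hypothetical $10M grant from the example above


def sample_risk_reduction():
    """One draw of the absolute extinction-risk reduction bought by the grant.

    The log-uniform range below (0.001%..1% absolute) is entirely made up --
    it stands in for the 'highly uncertain inputs' readers would disagree on.
    """
    lo, hi = 1e-5, 1e-2
    return math.exp(random.uniform(math.log(lo), math.log(hi)))


draws = sorted(sample_risk_reduction() for _ in range(100_000))

# Dollars per percentage point of extinction risk reduced, at three quantiles
# of the input distribution:
for label, q in [("optimistic input (95th pct)", 0.95),
                 ("median input", 0.50),
                 ("pessimistic input (5th pct)", 0.05)]:
    d = draws[int(q * len(draws))]
    print(f"{label}: ${DONATION / (d * 100):,.0f} per percentage point")
```

Running this, the cost per percentage point of risk reduction spans roughly two to three orders of magnitude between the optimistic and pessimistic input draws, which is why any single headline number from such a model should be treated as illustrative rather than decision-ready.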
BTW I am open to commissions to do this sort of work; DM me if you’re interested. For example I recently made a back-of-the-envelope model comparing the cost-effectiveness of AnimalHarmBench vs. conventional animal advocacy for improving superintelligent AI’s values with respect to animal welfare. That model should give a sense of what kind of result to expect. (Note for example the extremely made-up inputs.)
Thanks for the estimate, very helpful! The cost for AnimalHarmBench in particular was approximately 10x lower than assumed: $30k is my quick central estimate of the overall cost, mostly the time cost of the main contributors. This excludes the previous work described in the article, as well as downstream costs. In-house implementation at AI companies could add a lot to the cost, but I don’t think that should enter the cost-effectiveness calculation for comparison with other forms of advocacy.
My impression was that Nuno Sempere did a lot of this, e.g. here, way back in 2021?
You might also enjoy https://forum.effectivealtruism.org/s/AbrRsXM2PrCrPShuZ and https://github.com/NunoSempere/SoGive-CSER-evaluation-public
My mistake, yes he did do that. I’ll edit my answer.