Thanks again Phil for taking the time to read this through and for the in-depth feedback.
I hope to take some time to write a follow-up post that works in your suggestions and corrections as well as external updates (e.g. updating the parameters for lower total AI risk funding and shorter Metaculus timelines).
I don’t know if the “only one big actor” simplification holds closely enough in the AI safety case for the “optimization” approach to be a better guide, but it may well be.
This is a fair point.
The initial motivator for the project was AI s-risk funding, for which there's pretty much one large funder (and not much work is done on AI s-risk reduction by people and organizations outside the effective altruism community), though this post's results are entirely about AI existential risk, which is less well modeled as a single actor.
My intuition is that the "one big actor" simplification does work sufficiently well for the AI risk community, given the shared goal (avoiding an AI existential catastrophe) and my guess that a lot of the AI risk work done by the community doesn't change the behaviour of AI labs much (e.g. labs could choose to put more effort into capabilities over safety because of work done by the AI risk community, but I'm pretty sure this isn't happening).
For example, the value of spending after vs. before the “fire alarm” seems to depend erroneously on the choice of units of money. (This is the second bit of red-highlighted text in the linked Google doc.) So I’d encourage someone interested in quantifying the optimal spending schedule on AI safety to start with this model, but then comb over the details very carefully.
To comment on this particular error (though this is not to say that the other errors Phil points to are not also problematic; I've yet to properly go through them): for what it's worth, the main results of the post suppose zero post-fire-alarm spending[1], and (fortunately) since in our results we use units of millions of dollars and take the initial capital to be on the order of $1000m, I don't think we face this problem of a smaller η having the opposite of the desired effect for
In a future version I expect I'll just give the post-fire-alarm returns to spending the same returns exponent η as before the fire alarm, but with some multiplier $k$: that is, returns of $x^{\eta}$ to spending $x$ before the fire alarm and $kx^{\eta}$ afterwards.
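A small sketch of the two unit issues discussed above, assuming returns to spending $x$ take the power-law form $kx^{\eta}$. The function names, η values, $k$, and spend amounts are hypothetical and purely illustrative, not the post's parameters. It checks (a) that lowering η only lowers returns when $x > 1$, which holds with spending measured in $m on the order of 1000, and (b) that with a shared exponent and a multiplier $k$, the ratio of post- to pre-fire-alarm returns stays the same when the money units are rescaled, whereas with separate exponents it does not.

```python
# Minimal sketch, assuming returns to spending x take the power-law form
# k * x**eta. All names and numbers here are hypothetical and illustrative.


def returns(x: float, eta: float, k: float = 1.0) -> float:
    """Hypothetical returns to spending x: k * x**eta."""
    return k * x ** eta


def smaller_eta_lowers_returns(x: float, eta_hi: float = 0.5, eta_lo: float = 0.3) -> bool:
    """True if lowering eta from eta_hi to eta_lo lowers returns at spend x.

    This holds for x > 1 but flips for 0 < x < 1, which is the unit problem.
    """
    return returns(x, eta_lo) < returns(x, eta_hi)


def post_to_pre_ratio(x: float, eta_pre: float, eta_post: float,
                      k: float, unit_scale: float) -> float:
    """Ratio of post- to pre-fire-alarm returns at spend x, after re-expressing
    x in units scaled by unit_scale (e.g. 1e6 to go from $m to $)."""
    x_scaled = x * unit_scale
    return returns(x_scaled, eta_post, k) / returns(x_scaled, eta_pre)


if __name__ == "__main__":
    # Spend of 1000 in $m units (x > 1): a smaller eta lowers returns, as intended.
    print(smaller_eta_lowers_returns(1000.0))  # True
    # A spend of 0.5 (e.g. $0.5bn measured in $bn) has x < 1, so a smaller
    # eta raises returns instead.
    print(smaller_eta_lowers_returns(0.5))  # False
    # Shared exponent plus multiplier k: the ratio is k whatever the units.
    print(post_to_pre_ratio(1000.0, 0.5, 0.5, 2.0, 1.0))
    print(post_to_pre_ratio(1000.0, 0.5, 0.5, 2.0, 1e6))
    # Separate exponents: the ratio depends on the units chosen.
    print(post_to_pre_ratio(1000.0, 0.5, 0.3, 1.0, 1.0))
    print(post_to_pre_ratio(1000.0, 0.5, 0.3, 1.0, 1e6))
```

Rescaling money by a factor $c$ multiplies $x^{\eta}$ by $c^{\eta}$. With a shared exponent this factor cancels in the post/pre ratio, leaving $k$, while separate exponents leave a leftover factor of $c^{\eta_{\text{post}} - \eta_{\text{pre}}}$.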
[1] Though if one thinks there will be many good opportunities to spend after a fire alarm, our main no-fire-alarm results would likely be an overestimate.