Maybe add ways working on it can backfire, either explicitly in the model, or by telling people to form their expectations with the potential for backfire in mind, and allow for the possibility that you do more harm than good in the final estimate.
How would you model these effects? I have two ideas:
- add a section on how much you speed up AGI (but I'm not sure how I could break this down further)
- add a section on how likely you would be to take resources away from other actions that could be used to save the world (either through better AI safety, or something else)
Is one of them what you had in mind? Do you have other ideas?
Ya, those were some of the kinds of things I had in mind, along with the possibility of contributing to or reducing s-risks, and adjustable weights for s-risks vs extinction.
Because of the funding situation, taking resources away from other actions to reduce extinction risks would probably mostly come in the form of people's time, e.g. the time of the people supervising you, reading your work, or otherwise engaging with you. If an AI safety org hires you or you get a grant to work on something, then presumably they think you're worth the time, though! And one more person going through the hiring or grant process is not that costly for those managing it.
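To make the shape of this concrete, here's a minimal Python sketch of how such a model could combine these pieces. All variable names and numbers are illustrative assumptions, not estimates from this thread; the only point is that the backfire terms (speeding up AGI, diverting others' time) and an adjustable s-risk weight can push the final estimate negative.

```python
# Minimal sketch (illustrative numbers only): an expected-value model for
# working on AI safety that allows the net effect to be negative.

# Upside: chance you reduce extinction risk, and by how much (in probability points)
p_reduce_extinction = 0.7
extinction_risk_reduction = 0.0001

# Backfire term 1: your work speeds up AGI
p_speed_up_agi = 0.2
extinction_risk_increase = 0.00005

# Backfire term 2: you divert supervisors'/grantmakers' time from better uses
p_divert_resources = 0.3
diverted_risk_reduction_lost = 0.00002

# s-risk term, with an adjustable weight for how bad s-risks are relative to extinction
p_affect_s_risk = 0.1
s_risk_change = -0.00001   # negative = you reduce s-risk
s_risk_weight = 3.0        # 1.0 = as bad as extinction; adjust to taste

net_extinction_effect = (
    p_reduce_extinction * extinction_risk_reduction
    - p_speed_up_agi * extinction_risk_increase
    - p_divert_resources * diverted_risk_reduction_lost
)
net_s_risk_effect = -p_affect_s_risk * s_risk_change * s_risk_weight

net_effect = net_extinction_effect + net_s_risk_effect
print(f"Net effect (extinction-risk-equivalent reduction): {net_effect:+.6%}")
# A negative value means the model judges you'd do more harm than good.
```

In a real version each point estimate would presumably be a distribution (e.g. in Guesstimate or Squiggle) rather than a single number, but the structure of the terms would be the same.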