Yeah, I agree that using this to further fund AI alignment wouldn’t help much. I’m less sure about “hitting the metric”: the thing is, we don’t have any good alignment metric right now. But if we somehow managed to build one, convincing AI labs to hit it seems to me like the most feasible way to make the AI race safer. But yeah, building it would be really hard. Do you maybe have some other ideas for how to make the AI race safer? Maybe it is possible to somehow turn them into a continuous value that the labs could coordinate to increase?
Re: strategic thinking. It may be true that most people won’t care much about their real leverage (they won’t consider the counterfactual where they donate less), but that definitely isn’t rational. So while it may more or less work, I wouldn’t want this system to give the impression that it tricks people into donating. And, more importantly, my main hope for this system is to facilitate cooperation between the most powerful agents (powerful states, future supercorporations, TAI systems) rather than individual people. I assume such powerful actors will consider what happens if they do not donate, and selfishly do what’s optimal for them.
Awesome!
I’d love to see the idea tested in a real-world situation. I’d be happy to help with building this system if you want :)