Yeah, I agree that using this to further fund AI alignment wouldn’t help much. I’m less sure about “hitting the metric”: the thing is, we don’t have any good alignment metric right now. But if we somehow managed to build one, convincing AI labs to hit such a metric seems to me like the most feasible way to make the AI race safer. But yeah, building it would be really hard.
Do you maybe have any other ideas for how to make the AI race safer? Maybe it’s possible to somehow turn them into a continuous value that the labs could coordinate to increase?
Re: strategic thinking: it may be true that most people won’t care much about their real leverage (they won’t consider the counterfactual where they donate less), but that definitely isn’t rational. So while it may more or less work, I wouldn’t like this system to give the impression that it tricks people into donating.
And, more importantly, my main hope for this system is to facilitate cooperation between the most powerful agents (powerful states, future supercorporations, TAI systems), rather than individual people. I assume such powerful actors will consider what happens if they don’t donate, and selfishly do what’s optimal for them.
Doesn’t the leverage go in both directions? Donating causes earlier people to pay more, but it also adds leverage for later people, such that you don’t know whether later people would’ve donated unless you also did.
Though maybe that depends on the specifics of the system, like whether the leverage grows or shrinks with more donations. I think this bears on your worry that it incentivizes donating later because that makes you pay less, but if actors are proper EV-maximizers, won’t they scale up their donations so that the expected payment/leverage stays the same?
Seems like there are lots of strategies at play here, including donating several times. Making it work both for real-life humans with real-life problems and for TAI seems ambitious, though; they require very different incentives, and I imagine the designs end up significantly different.
Interesting stuff!
You’re right, the leverage definitely goes both ways. The thing is, this later leverage will tend to be smaller than the one you get immediately. At least, that’s how the system behaves in my naive simulations. The exception is when you expect some very big contributors to join later on; then the later leverage is bigger. So yeah, it’s a complicated situation, and I didn’t want to go into it in the post because it would get too bloated.
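To make the two-way leverage concrete, here’s a minimal toy model in Python. The matching rule is purely an assumption for illustration, not the actual mechanism from the post: each new donation makes every earlier donor pay a bit extra, at a rate that shrinks as the pool grows. “Immediate leverage” is the extra payment your donation triggers in earlier donors; “later leverage” is the extra you end up paying because of donors who arrive after you.

```python
# Toy model of "leverage goes both ways" in a sequential donation scheme.
# Hypothetical rule (not the post's mechanism): when donor j pledges d_j,
# every earlier donor pays an extra base * d_j / (1 + pool_before_j),
# i.e. the matching rate shrinks as the pool grows.

def simulate(donations, base=1.0):
    n = len(donations)
    immediate = [0.0] * n  # extra payment donor j triggers in earlier donors
    later = [0.0] * n      # extra payment donor i makes due to later donors
    pool = 0.0
    for j, d in enumerate(donations):
        rate = base * d / (1.0 + pool)  # extra paid by each earlier donor
        for i in range(j):
            immediate[j] += rate
            later[i] += rate
        pool += d
    return immediate, later

for label, donations in [("similar donors", [10, 10, 10, 10, 10]),
                         ("big late donor", [10, 10, 10, 10, 100])]:
    immediate, later = simulate(donations)
    print(label)
    for i in range(len(donations)):
        print(f"  donor {i}: immediate leverage {immediate[i]:.2f}, "
              f"later leverage {later[i]:.2f}")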
And yeah, humans and TAI may have different strategies, which complicates things further. This is why I’m not yet fully satisfied with this mechanism, and I will try to simplify it so that we don’t have to worry about all those strategies.