I'm not sure how to share a file over the EA Forum, but if you direct message me your email address or message me on Signal, I can send it to you (and anyone else reading is welcome to do either of those too).
As for self-alignment, the book is pretty geared towards human psychology, so there's no guarantee it would carry over to AI. But one strategy it discusses is recognizing that what you choose now shapes the decisions you'll make later, and that recognition makes it easier to re-prioritize the long term. For example, if you're trying to lose weight and are tempted by a cookie, it can be very hard to resist just by thinking about the end goal, but thinking "if I eat it now, I'll be more likely to give in to temptation next time too" can make resisting easier. Another strategy is to make the limiting decision while both the good and bad options are still in the future: it's easier not to buy cookies at the store than to resist them once they're in the pantry, and easier not to fetch them from the pantry than to resist them sitting next to you.
Obviously the cookie is a morally neutral choice, but a morally relevant version like the dam example might go: "If we're the kind of society that builds the dam despite long-term costs that outweigh the short-term gain, then we'll be more likely to make bad-in-the-long-run choices in the future as well, which leads to much worse outcomes overall." That prospect of compounded bad decisions might be bad enough to tip the scale toward making the long-run-good choice more often.
Thanks — I’ll DM you an address; I’d love to read the full book.
And I really like the cookie example: it perfectly illustrates how self-prediction turns a small temptation into a long-run coordination problem with our future selves. That mechanism scales up neatly to the dam scenario: when a society “eats the cookie” today, it teaches its future selves to discount tomorrow’s costs as well.
Those two Ainslie strategies — self-prediction and early pre-commitment — map nicely onto Time × Scope: they effectively raise the future’s weight (δ) without changing the math. I’m keen to plug his hyperbolic curve into the model and see how it reshapes optimal commitment devices for individuals and, eventually, AI systems.
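To make that δ point concrete, here's a minimal numerical sketch (the reward sizes, delays, and discount parameter k are made up for illustration, and the Time × Scope model itself isn't represented): under hyperbolic discounting the cookie wins whenever it's imminent, but valuing a bundle of twenty future cookie decisions at once flips the preference to the larger-later option, which is roughly what the self-prediction strategy accomplishes.

```python
# Toy sketch (not from Ainslie's book): made-up reward sizes, delays, and
# discount parameter k, just to show the preference reversal and how
# "bundling" future choices (the self-prediction strategy) can flip it back.

def hyperbolic(value, delay, k=1.0):
    """Hyperbolic discounting: present value = value / (1 + k * delay)."""
    return value / (1.0 + k * delay)

# One isolated choice: cookie now (10 utils) vs. health benefit in 3 periods (20 utils).
cookie_now   = hyperbolic(10, delay=0)   # 10.0
health_later = hyperbolic(20, delay=3)   #  5.0 -> the cookie wins when it's imminent

# Now treat today's choice as a prediction of the next 20 choices
# (one cookie decision per period), and value the whole bundle at once.
bundle_cookies = sum(hyperbolic(10, t) for t in range(20))      # ~36.0
bundle_health  = sum(hyperbolic(20, t + 3) for t in range(20))  # ~38.0 -> preference flips

print(cookie_now, health_later)
print(bundle_cookies, bundle_health)
```

For contrast, an exponential discounter with δ = 0.9 values the later reward at 20 × 0.9³ ≈ 14.6 > 10 even in the single isolated choice, so no reversal arises in the first place; the reversal (and the need for bundling or pre-commitment) is specifically a feature of the hyperbolic curve.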
Thanks again for offering the file and for the clear, memorable examples!