I do independent research on EA topics. I write about whatever seems important, tractable, and interesting (to me).
I have a website: https://mdickens.me/ Much of the content on my website gets cross-posted to the EA Forum, but I also write about some non-EA stuff there.
I used to work as a software developer at Affirm.
Copying from a comment I wrote yesterday:
Either ASI has more than zero values locked in, or it’s fully corrigible. If any values at all are locked in, then we need a pretty robust understanding of what the consequences will be, because we can never change them. I don’t think we know how to encode something like “don’t let people do power grabs, but be fully corrigible in every other way”. I don’t know how much that’s downstream of the facts that (1) we don’t know how to encode any values at all and (2) we don’t know how to encode corrigibility, but my intuition is that even if we solve #1 and #2, “don’t pick incorrigible values that will screw everything up down the road” is still a hard problem.
This is related to Max Harms’ work on CAST. Part of his argument is that pure corrigibility is a more robust target than any set of values, because a near miss fails gracefully, whereas if you try to encode any values at all, a near miss could be catastrophic. He’s talking more about the “AI kills everyone” flavor of catastrophe, which is valid, but what I’m talking about here is that a near miss could permanently lock us into a bad (or maybe just not-that-good) future. It’s a different argument, but the concern arises for a similar reason: if you’re specifying values, you have to get the specification right, beyond just ensuring that the AI does what you want.