The fact that humans can’t assign negative penalties if they’re dead is a good point.
> I think you need to say more about what the system is being trained for (and how we train it for that).
I’m definitely just drawing analogies from my (imperfect) understanding of how LLMs/art AIs work here. How do you expect AI labs will (try to) train more agenty and/or more superintelligent AIs?
> I think training AGI systems by giving them vast amounts of human-derived data is a terrible idea, and cuts out many of the most promising tactics for aligning AGI systems.
How would you do it instead?
Re ‘constraint’: that’s maybe the wrong word. I meant less that AIs would limit their impact and more that, if I were close to omnipotent, I wouldn’t maximize or optimize for just one thing, but probably for lots of things that I value. You could frame this as me maximizing my utility, but my point is that it wouldn’t look like paperclip maximizing. AIs might not be like humans, but humans are the most intelligent things we know of, so it doesn’t seem ridiculous to suppose that complex, intelligent entities tend to have complex, multiple goals.
Thanks, this is helpful.
I’ll check out the links you suggest!