Yeah, kinda hoping that 1) there exists a sweet spot for alignment where AIs are just nice enough from, e.g., good values picked up during pre-training, but can't be modified so much during post-training that they end up with worse values, and 2) given that this sweet spot exists, we actually hit it with AGI / ASI.
I think there's some evidence pointing to this happening with current models, but I'm not highly confident it means what I think it means. If it is the case, though, further technical alignment research might be bad and acceleration might be good.
I guess the crux of my snarky comment is that if your only choice for master of the universe is between two evil empires, you're kinda screwed either way.