Hello from 4 years into the future! Just a random note on the thing you said:
Argument that it is less likely: We can use the capabilities to do something like “Do what we mean,” allowing us to state our goals imprecisely & survive.
Anthropic is now doing exactly this with their Constitutional AI. They let the chatbot respond in some way, then ask it to “reformulate the text so that it is more ethical”, and finally train it to output something closer to the reformulated version than to the original response.
Thanks, this is really nice! For those of us (just) before a PhD: any thoughts on how different criteria trade off against each other when choosing which program to do? I'm assuming there's no single program that is willing to admit me and also has perfect topic alignment, a cool supervisor, a prestigious university, a location near an AI safety hub, etc. E.g. I remember 80k being quite vocal about how a PhD mostly makes sense only when it's done at a top 10 institution.