I’m afraid I’m not well read on the problem of inner alignment and why optimizing for an arbitrary goal is a realistic worry. Can you explain why this might happen / provide a good, simple resource that I can read?
The LW wiki entry is good. Also the Rob Miles video I link to above explains it well with visuals and examples. I think there are three core parts to the AI x-risk argument: the orthogonality thesis (the Copernican revolution applied to mind-space; why outer alignment is hard), Basic AI Drives (convergent instrumental goals leading to power-seeking), and mesa-optimizers (why inner alignment is hard).
Thanks. I watched Robert Miles’ video, which was very helpful, especially the part where he explains why an AI might act in accordance with its base objective in the training environment only to pursue its mesa-objective in the real world.
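To make the distinction concrete, here is a toy sketch of my own (not from the video or the Mesa-optimizers paper): a proxy objective that happens to coincide with the base objective on the training distribution can come apart after deployment, so a policy pursuing the proxy looks perfectly aligned during training and only reveals the mismatch in the real world.

```python
# Toy illustration (hypothetical gridworld, my own example): during "training"
# the proxy goal "reach the green tile" coincides with the base goal "reach
# the exit", because the exit is always green. At deployment the correlation
# breaks, and the proxy rewards the wrong state.

def base_objective(position, exit_position):
    """Reward for reaching the exit (what the designers actually want)."""
    return 1.0 if position == exit_position else 0.0

def mesa_objective(position, green_position):
    """Proxy the learned policy actually pursues: reach the green tile."""
    return 1.0 if position == green_position else 0.0

# Training distribution: the exit is always the green tile, so both
# objectives give identical rewards and the proxy looks aligned.
train_cases = [(3, 3, 3), (7, 7, 7)]   # (agent, exit, green)
# Deployment: the exit is no longer green; the agent sits on green instead.
deploy_cases = [(5, 9, 5)]

for pos, exit_pos, green_pos in train_cases + deploy_cases:
    print(base_objective(pos, exit_pos), mesa_objective(pos, green_pos))
# Training prints matching scores (1.0 1.0); deployment prints 0.0 1.0 --
# the policy scores highly on its mesa-objective while failing the base one.
```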
I’m quite uncertain at this point, but I have a vague feeling that Russell’s second principle (The machine is initially uncertain about what those preferences are) is very important here. It is a vague feeling though...