It’s plausible that we’ll have such capable but unaligned AI systems
I simply agree, no need to convince me there 👍
Ought’s approach:
Instead of giving a training signal only after the entire AI produces its output,
give a signal after each sub-module produces its output (see the sketch below).
Yes?
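To make the contrast concrete, here’s a minimal Python sketch of the two supervision regimes: one signal for the whole pipeline versus one signal per sub-module. The names (`run_pipeline`, `judge_output`, `judge_step`) and the toy pipeline are hypothetical illustrations, not Ought’s actual code.

```python
# A minimal sketch, assuming a pipeline of sub-modules that each take a
# string and return a string. All names here are hypothetical.

from typing import Callable, List

Module = Callable[[str], str]

def run_pipeline(modules: List[Module], task: str) -> str:
    """Feed each sub-module's output into the next one."""
    x = task
    for module in modules:
        x = module(x)
    return x

def outcome_signal(modules: List[Module], task: str,
                   judge_output: Callable[[str], float]) -> float:
    """Outcome-based supervision: one signal for the final output only."""
    return judge_output(run_pipeline(modules, task))

def per_module_signals(modules: List[Module], task: str,
                       judge_step: Callable[[str, str], float]) -> List[float]:
    """Process-based supervision (the approach described above): a separate
    signal after each sub-module, judging whether its step was sensible
    given its input, not just whether the final result looked good."""
    signals, x = [], task
    for module in modules:
        y = module(x)
        signals.append(judge_step(x, y))
        x = y
    return signals

# Hypothetical two-module pipeline: decompose the task, then answer it.
decompose = lambda q: "sub-questions for: " + q
answer = lambda subqs: "answer built from: " + subqs
print(per_module_signals([decompose, answer], "Is the bridge safe?",
                         lambda x, y: 1.0))  # dummy judge approving every step
```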
My worry: The sub-modules will themselves be misaligned.
Is your suggestion: limit the compute and neural memory of sub-models in order to lower the risk?
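If that is the suggestion, a toy version might look like the following: cap each sub-module’s parameter count (its “neural memory”) and its compute per call, and reject any sub-module over the caps. The class, field names, and thresholds are all made-up illustrations, not anyone’s actual proposal.

```python
# A hypothetical sketch of per-sub-module resource caps. Names and
# thresholds are invented for illustration only.

from dataclasses import dataclass

@dataclass
class SubModuleBudget:
    max_parameters: int      # cap on "neural memory" (model size)
    max_forward_flops: int   # cap on compute per invocation

def within_budget(n_params: int, flops_per_call: int,
                  budget: SubModuleBudget) -> bool:
    """Reject any sub-module that exceeds the agreed caps."""
    return (n_params <= budget.max_parameters
            and flops_per_call <= budget.max_forward_flops)

budget = SubModuleBudget(max_parameters=10_000_000, max_forward_flops=10**9)
assert within_budget(5_000_000, 10**8, budget)       # small sub-module: allowed
assert not within_budget(10**10, 10**12, budget)     # large sub-module: rejected
```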
And my second worry is that the “big AI” (the collection of sub-modules) will be so capable that you could ask it to perform a task and it will be exceedingly effective at it, in some misaligned-to-our-values (misaligned-to-what-we-actually-meant) way