If AGI systems had goals that were cleanly separated from the rest of their cognition, such that they could learn and self-improve without risking any value drift (as long as the values-file wasn’t modified), then there’s a straightforward argument that you could stabilise and preserve that system’s goals by just storing the values-file with enough redundancy and digital error correction.
So this would make section 6 mostly irrelevant. But I think most other sections remain relevant, insofar as people weren’t already convinced that being able to build stable AGI systems would enable world-wide lock-in.
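For concreteness, here is a minimal sketch of what the "redundancy and digital error correction" idea could look like in the clean-separation scenario, assuming the values really do live in a standalone file. It just keeps several hash-verified copies and overwrites any copy that fails verification with one that passes; the file layout and function names are invented for the illustration, and a real system would presumably use proper error-correcting codes (e.g. Reed-Solomon) rather than bare replication.

```python
import hashlib
from pathlib import Path

# Illustrative sketch only: "values.bin" / "values.sha256" are hypothetical
# names, and hash-verified replication stands in for real error correction.

def write_with_redundancy(values: bytes, dirs: list[Path]) -> None:
    """Store identical copies of the values-file, each with its SHA-256 digest."""
    digest = hashlib.sha256(values).hexdigest()
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
        (d / "values.bin").write_bytes(values)
        (d / "values.sha256").write_text(digest)

def read_and_repair(dirs: list[Path]) -> bytes:
    """Return the values-file from any copy whose digest checks out,
    and overwrite corrupted copies with a verified one."""
    good = None
    bad_dirs = []
    for d in dirs:
        data = (d / "values.bin").read_bytes()
        stored = (d / "values.sha256").read_text().strip()
        if hashlib.sha256(data).hexdigest() == stored:
            good = data
        else:
            bad_dirs.append(d)
    if good is None:
        raise RuntimeError("every copy failed verification")
    for d in bad_dirs:
        (d / "values.bin").write_bytes(good)
        (d / "values.sha256").write_text(hashlib.sha256(good).hexdigest())
    return good
```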
Therefore, it seems to me that most of your doc assumes we’re in this scenario [without clean separation between values and other parts]?
I was mostly imagining this scenario as I was writing, so where relevant the examples/terminology/arguments will be tailored to it, yeah.