I broadly agree with this. For the civilizations that want to keep thinking about their values or the philosophically tricky parts of their strategy, there will be an open question about how convergent/correct their thinking process is (although there’s lots you can do to make it more convergent/correct — eg. redo it under lots of different conditions, have arguments be reviewed by many different people/AIs, etc).
And it does seem like all reasonable civilizations should want to do some thinking like this. For those civilizations, this post is just saying that other sources of instability could be removed (if they so chose, and insofar as that was compatible with the intended thinking process).
Also, separately, my best guess is that competent civilizations (whatever that means) that were aiming for correctness would probably succeed (at least in areas where correctness is well defined). Maybe by solving metaphilosophy and doing that, maybe because they took lots of precautions like those mentioned above, maybe just because it’s hard to get permanently stuck at incorrect beliefs if lots of people are dedicated to getting things right, have all the time and resources in the world, and are really open-minded. (If they’re not open-minded but feel strongly attached to keeping their current views, then I become more pessimistic.)
But even if a civilization was willing to take this extreme step, I’m not sure how you’d design a filter that could reliably detect and block all “reasoning” that might exploit some flaw in your reasoning process.
By being unreasonably conservative. Most AIs could be tasked with narrowly doing their jobs, a few with pushing forward technology/engineering, and none with doing anything that looks suspiciously like ethics/philosophy. (This seems like a bad idea.)