Consider a civilization that has “locked in” the value of hedonistic utilitarianism. Subsequently some AI in this civilization discovers what appears to be a convincing argument for a new, more optimal design of hedonium, which purports to be 2x more efficient at generating hedons per unit of resources consumed. Except that this argument actually exploits a flaw in the reasoning processes of the AI (which is widespread in this civilization) such that the new design is actually optimized for something different from what was intended when the “lock in” happened. The closest this post comes to addressing this scenario seems to be “An extreme version of this would be to prevent all reasoning that could plausibly lead to value-drift, halting progress in philosophy.” But even if a civilization was willing to take this extreme step, I’m not sure how you’d design a filter that could reliably detect and block all “reasoning” that might exploit some flaw in your reasoning process.
Maybe in order to prevent this, the civilization tried to lock in “maximize the quantity of this specific design of hedonium” as their goal instead of hedonistic utilitarianism in the abstract. But 1) maybe the original design of hedonium is already flawed or highly suboptimal, and 2) what if (as an example) some AI discovers an argument that it should engage in acausal trade in order to maximize the quantity of hedonium in the multiverse, except that this argument is actually wrong?
This is related to the problem of metaphilosophy, and my hope that we can one day understand “correct reasoning” well enough to design AIs that we can be confident are free from flaws like these, but I don’t know how to argue that this is actually feasible.
I broadly agree with this. For the civilizations that want to keep thinking about their values or the philosophically tricky parts of their strategy, there will be an open question about how convergent/correct their thinking process is (although there’s a lot you can do to make it more convergent/correct, e.g., redoing it under lots of different conditions, having arguments reviewed by many different people/AIs, etc.).
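As a toy illustration of the “redo it under lots of different conditions” idea (the deliberation function, the number of runs, and the 95% agreement threshold below are all hypothetical placeholders of mine, not anything from the post):

```python
import random
from collections import Counter

def deliberate(question: str, seed: int) -> str:
    """Hypothetical stand-in for one run of a deliberation process under a
    particular set of conditions (different starting assumptions, different
    reviewers, different orderings of the arguments, ...)."""
    rng = random.Random(seed)
    # Placeholder output; a real run would produce an argued-for conclusion.
    return rng.choice(["conclusion_A", "conclusion_A", "conclusion_B"])

def convergent_conclusion(question: str, n_runs: int = 100, threshold: float = 0.95):
    """Redo the deliberation under many conditions and accept a conclusion
    only if nearly all runs agree; otherwise flag it for further scrutiny."""
    results = Counter(deliberate(question, seed) for seed in range(n_runs))
    answer, count = results.most_common(1)[0]
    return answer if count / n_runs >= threshold else None

print(convergent_conclusion("Is the new, more efficient hedonium design an improvement?"))
```

This obviously doesn’t solve the underlying problem (a flaw shared by all the runs would survive the check), but it gestures at the kind of redundancy I have in mind.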
And it does seem like all reasonable civilizations should want to do some thinking like this. For those civilizations, this post is just saying that other sources of instability could be removed (if they so chose, and insofar as that was compatible with the intended thinking process).
Also, separately, my best guess is that competent civilizations (whatever that means) that were aiming for correctness would probably succeed (at least in areas where correctness is well defined). Maybe by solving metaphilosophy and applying it, maybe because they took lots of precautions like those mentioned above, maybe just because it’s hard to get permanently stuck at incorrect beliefs if lots of people are dedicated to getting things right, have all the time and resources in the world, and are really open-minded. (If they’re not open-minded but feel strongly attached to keeping their current views, then I become more pessimistic.)
But even if a civilization was willing to take this extreme step, I’m not sure how you’d design a filter that could reliably detect and block all “reasoning” that might exploit some flaw in your reasoning process.
By being unreasonably conservative. Most AIs could be tasked with narrowly doing their job, a few with pushing forward technology/engineering, none with doing anything that looks suspiciously like ethics/philosophy. (This seems like a bad idea.)
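To gesture at what that might look like mechanically, here is a minimal sketch of such a gate (the category names and the keyword “classifier” are purely illustrative assumptions on my part, not a real proposal):

```python
# Toy sketch of an "unreasonably conservative" gate on what tasks an AI may take on:
# anything that looks like it might be ethics/philosophy is refused outright.
# Note that a real gate would itself have to be an AI making judgment calls,
# which is exactly where the problem of detecting flawed reasoning reappears.

ALLOWED = {"narrow_operations", "engineering"}

def classify(task: str) -> str:
    """Hypothetical classifier from a task description to a coarse category."""
    lowered = task.lower()
    if any(w in lowered for w in ("moral", "ought", "value", "hedonium", "philosophy")):
        return "philosophy"
    if any(w in lowered for w in ("design", "build", "optimize", "engineer")):
        return "engineering"
    return "narrow_operations"

def permitted(task: str) -> bool:
    """Allow a task only if it falls in an explicitly whitelisted category."""
    return classify(task) in ALLOWED

for task in ("optimize coolant flow in reactor 7",
             "evaluate a new, more efficient hedonium design"):
    print(task, "->", "allowed" if permitted(task) else "refused")
```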
Just to be clear: we mostly don’t argue for the desirability or likelihood of lock-in, just its technological feasibility. Am I correctly interpreting your comment as cautionary, questioning the desirability of lock-in given the apparent difficulty of doing so while maintaining sufficient flexibility to handle unforeseen philosophical arguments?
To take a step back, I’m not sure it makes sense to talk about the “technological feasibility” of lock-in as opposed to, say, its expected cost. If the only feasible method of lock-in causes you to lose 99% of the potential value of the universe, that seems like a more important piece of information than the fact that it’s technologically feasible.
(On second thought, maybe I’m being unfair in this criticism, because feasibility of lock-in is already pretty clear to me, at least if one is willing to assume extreme costs, so I’m more interested in the question of “but can it be done at more acceptable costs”, but perhaps this isn’t true of others.)
That aside, I guess I’m trying to understand what you’re envisioning when you say “An extreme version of this would be to prevent all reasoning that could plausibly lead to value-drift, halting progress in philosophy.” What kind of mechanism do you have in mind for doing this? Also, you distinguish between stopping philosophical progress and stopping technological progress, but since technological progress often requires solving philosophical questions (e.g., related to how to safely use the new technology), do you really see much of a distinction between the two?