When I said that there isn’t any adversarial action, I really should have said that you are safe and your learning process is under your control. By default I’m imagining a reflection process under which (a) all of your basic needs are met (e.g. you don’t have to worry about starving), (b) you get to veto any particular experience happening to you, (c) you can build tools (or have other people build tools) that help with your reflection, including by building situations where you can have particular experiences, or by creating simulations of yourself that have experiences and can report back, (d) nothing is trying to manipulate or otherwise attack you (unless you specifically asked for the manipulation / attack), whether it is intelligently designed or natural, (e) you don’t have any time pressure on finishing the reflection.
To be clear, this is pretty stringent: the current state of affairs, where you regularly go around talking to people who try to persuade you of stuff, doesn’t meet the criteria.
So to restate, your claim is that in the absence of such adversaries, moral reasoning processes will in fact all converge to the same place.
Given conditions of safety and control over the reflection.
It’s also not that I think every such process converges to exactly the same place. Rather I’d say that (a) I feel pretty intuitively happy about anything that you get to via such a process, so it seems fine to end up at any one of them, and (b) there is enough convergence that it makes sense to view that as a target which we can approximate or move towards.
Even if we’re exposed to wildly different experiences/observations/futures, the only thing that determines whether there’s convergence or divergence is whether those experiences contain intelligent adversaries or not.
Part of the reflection process would be to seek out different experiences / observations, so I’m not sure they would be “wildly different”.
What precisely about our moral reasoning processes makes them unlikely to be attacked by “natural” conditions but attackable by intelligently designed ones? [...] Could natural conditions ever play the equivalent role of intelligent adversaries?
If they’re attacked by natural conditions, that violates my requirements too. (I don’t think I ever said the adversarial action had to be “intelligently designed” instead of “natural”?)
In this process, fundamentally, everything that happens to you is meant to be your own choice. It’s still possible that you make a mistake: e.g. you send a simulation of yourself to listen to a persuasive argument and then report back, the simulation is persuaded that <bad thing> is great, and it comes back and persuades you of it as well. (Obviously you’ve already considered that possibility and taken precautions, but it happens anyway; your precautions weren’t sufficient.) But it at least feels unlikely: you shouldn’t expect to make a mistake (and if you did expect one, you should just not do the thing instead).