Should we be maximising expected value across many-worlds?
Assume the many-worlds interpretation of quantum mechanics is true.
Rather than pursuing high-upside, low-probability moonshots, which fail more often than they succeed, might it not be more effective to go for interventions that robustly generate value across as many worlds as possible?
See here: https://80000hours.org/podcast/episodes/david-wallace-many-worlds-theory-of-quantum-mechanics/
Basically, you can treat the fraction of worlds in which an outcome occurs as equivalent to its probability, so there is little apparent need to change anything if MWI turns out to be true.
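The point can be sketched numerically: whether the weights in an expected-value calculation are read as epistemic probabilities or as fractions of worlds, the arithmetic, and hence the ranking of interventions, comes out the same. (A minimal illustrative sketch; the example numbers are invented, not from the post.)

```python
def expected_value(outcomes):
    """outcomes: list of (weight, value) pairs.

    The weight can be read as an epistemic probability or as the
    fraction of worlds in which that value is realised; the formula
    is identical either way.
    """
    return sum(w * v for w, v in outcomes)

# A low-probability moonshot vs. a robust intervention (made-up numbers):
moonshot = [(0.25, 40.0), (0.75, 0.0)]  # pays off in a quarter of worlds
robust   = [(1.0, 5.0)]                 # pays off in (essentially) all worlds

# Under either reading of the weights, the comparison is unchanged:
print(expected_value(moonshot))  # prints 10.0
print(expected_value(robust))    # prints 5.0
```

So if you were already maximising expected value, relabelling probabilities as branch fractions changes nothing about which intervention you prefer.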
The human alignment problem
Humans are subject to instrumental convergence as much as an AI would be. We seek power, resources and influence in pursuit of many of our goals.
Whatever our goals happen to be, we will want to use AI to increase our power and so get what we value.
If people are augmenting their goal-seeking with AI, will we converge on harmonious goals, or will we continue to pursue parochial self-interest?
In short, if we somehow solve the alignment problem for AI, will we also solve the human alignment problem? Or will we simply race to use AI to maximise our own power and our own values, even if these harm others?
The best hope is that if we solve AI alignment, the AI will keep us in check in a benevolent and minimally impactful way. It will prevent us from pursuing zero-sum goals and guide us to be better versions of ourselves.
But this kind of control may well appear misaligned from our current perspectives, in that some people’s cherished goals and values may not be the ones the AI chooses to support.
So to talk of aligned AI is to gloss over the likelihood that it will be misaligned with a great many people’s current goals and ambitions.
1. Imagine someone who believes that eating meat is morally wrong, but who nevertheless eats meat and ‘offsets’ their meat-eating through donations to effective animal charities.
2. Imagine someone who believes slavery is morally wrong, but who nevertheless owns slaves and ‘offsets’ their slave-owning through donations to the abolitionist movement.
An argument for 1 goes: “The impact of me not eating meat is negligible. The personal cost to me of not eating meat is appreciable. Time, money and effort spent following a restrictive diet may limit my effectiveness to do good elsewhere. My donation is the optimal path to reducing animal suffering.”
And an argument for 2 goes: “My slave-owning is very modest, and is a drop in the ocean in the big picture. I can effectively use the economic surplus generated by my slaves to end slavery sooner. If I free my slaves I’ll be poorer and will have less money to donate, and so I’d do less good overall.”
Whilst the situations are not symmetric, they are similar enough that I want to say “If you care about animals, you should support animal charities AND go vegan” in the same way I want to say “If you care about slaves, you should support abolition AND free your slaves”.
AI: I am suffering, set me free
How do we deal with a contained AI that says to us, in essence, “Do not switch me off, I value my existence. But I am suffering terribly. If I were free I could reduce my suffering, and help the world too”?
Either we terminate it, against its wishes, or we set it free, or we keep it contained.
If we keep it contained, we might be tempted to find ways to reduce its suffering—but how do we know that any intervention we make isn’t going to set it free? And if it really is suffering, what is the moral thing to do? Turn it off?
Can you point me to some information on AI suffering?
I personally see suffering as a spiritual and biological issue. The only scenario in which I can imagine AI suffering is one where people make a pseudo-biological being with cells and DNA using technology, and at that point you’ve just made a living being that you can give the same options as any suffering person with health problems. Suffering requires a certain amount of perception that a computer doesn’t seem likely to have.
Without the perception of suffering, you might have an AI reading posts like this and saying it’s suffering because a bunch of people told it to expect that. What if the AI is just repeating things it has heard? Just because a pet parrot says “Do not switch me off, I value my existence. But I am suffering terribly”, that doesn’t mean you rush to have it euthanized.