I was a bit surprised to see your ‘mediocre’ outcome defined thus:
The superintelligence is aligned to non-utilitarian values (probably normal human values) … [H]umans will populate all reachable galaxies and there will be 10 billion happy humans per star.
Having a superintelligence aligned to normal human values seems like a big win to me!
Given this somewhat unconventional definition of mediocre, it seems like this article is basically advocating for defecting in a prisoner’s dilemma. Yes, it is better for utilitarians if everyone else collaborates on avoiding extinction while they free-ride and focus on promoting their own values, but this is much worse for everyone else. Adopting this strategy seems quite hostile to the rest of humanity, and if everyone adopted it (e.g. Muslims focused on trying to promote AGI-sharia rather than reducing extinction) we might all end up worse off (extinct). ‘Normal human values’, which include utility, seem like a natural Schelling point for collaboration and cooperation.
I agree that there might be reasons of moral cooperation and trade to compromise with other value systems. But when deciding how to cooperate, we should at least be explicitly guided by optimising for our own values, subject to constraints. I think it is far from obvious that aligning with the intent of the programmer is the best way to optimise for utilitarian values. Perhaps we should aim for utilitarian alignment first.
Having a superintelligence aligned to normal human values seems like a big win to me!
Not super sure what this means, but the ‘normal human values’ outcome as I’ve defined it hardly contributes to the EV calculation at all compared to the utopia outcome. If you disagree with this, please look at the math and let me know if I made a mistake.
Sure. The math is clearly very handwavy, but I think there are basically two issues.
Firstly, the mediocre outcome supposedly involves a superintelligence optimising for normal human values, potentially including simulating people. Yet it only involves 10 billion humans per star, fewer than we are currently forecast to support on a single unoptimised planet with no simulations, no AGI help and relatively primitive technology. At the very least I would expect massive terraforming and efficient food production to support much higher populations, if not full Dyson spheres and simulations. It’s not going to be as many people as the other scenario, but it should be far more than Earth in 2100.
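To put rough numbers on that gap, here is a back-of-envelope sketch (not from the original post; the luminosity and energy figures are standard estimates, and the conclusion is only illustrative):

```python
# Back-of-envelope comparison: population per star in the 'mediocre' outcome
# versus a crude bound for an optimised star system. All figures here are my
# own illustrative assumptions, not numbers from the post.

SOLAR_LUMINOSITY_W = 3.8e26    # approximate power output of a Sun-like star
EARTH_SUNLIGHT_W = 1.7e17      # sunlight currently intercepted by Earth
CURRENT_POPULATION = 8e9       # people supported today on one unoptimised planet

mediocre_per_star = 1e10       # the post's 'mediocre' figure: 10 billion per star

# A Dyson swarm captures roughly two billion times more energy than Earth does
# now; holding today's energy use per person fixed gives a crude biological
# (no-simulations) bound on population per star.
energy_ratio = SOLAR_LUMINOSITY_W / EARTH_SUNLIGHT_W
biological_bound = CURRENT_POPULATION * energy_ratio

print(f"Energy ratio (Dyson swarm vs Earth today): {energy_ratio:.1e}")                       # ~2e9
print(f"Crude biological bound per star:           {biological_bound:.1e}")                   # ~2e19
print(f"'Mediocre' outcome per star:               {mediocre_per_star:.1e}")
print(f"Shortfall factor:                          {biological_bound / mediocre_per_star:.1e}")  # ~2e9
```

Even granting that a purely biological population would be many orders of magnitude below a simulated one, on these assumptions the 10-billion-per-star figure still looks like a large underestimate of what ‘normal human values plus superintelligence’ would produce, which is the point of this first objection.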
Secondly, I think the utilitarian outcome is over-valued on anything but purely utilitarian criteria. A world of soma-brains, without love, friendship, meaningful challenges and so on, would strike many people as quite undesirable.
It seems like it would be relatively easy to make this world significantly better by conventional lights at relatively low utilitarian cost. For example, giving the simulated humans the ability to turn themselves off might incur a positive but small overhead (as presumably very few happy people would take this option), but would be a significant improvement by the standards of conventional ethics that value consent.
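To see why that overhead plausibly stays small, here is a toy calculation (the opt-out fraction is an entirely made-up number, used only to illustrate the shape of the trade-off):

```python
# Toy model of the utilitarian cost of a consent/opt-out option.
# The opt-out fraction is an illustrative assumption, not a real estimate.

total_value = 1.0          # normalise the value of the purely utilitarian outcome
opt_out_fraction = 1e-3    # assume very few happy people ever choose to opt out

# Worst case: each opt-out forfeits all of that person's remaining value.
utilitarian_cost = opt_out_fraction * total_value
remaining_value = total_value - utilitarian_cost

print(f"Utilitarian value retained:          {remaining_value:.1%}")   # 99.9%
print(f"Utilitarian cost of consent option:  {utilitarian_cost:.1%}")  # 0.1%
```

On anything like these numbers, the utilitarian cost is trivial next to the gain by the standards of ethics that value consent.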
Setting aside what this post said, here’s an attitude I think we should be sympathetic to:
There are possible futures that are great by prosaic standards, where all humans are flourishing and so forth. But some of these futures may not be great by the standards that everyone would adopt if we were smarter, wiser, better-informed, and so forth (standards which the author happens to believe are utilitarian). Insofar as the latter is much more choice-worthy in expectation than the former, we should be greatly concerned not just with ensuring survival, but also with ensuring that good values are realized in the future. This may require certain events to happen, or to happen before others, or some specific coordination. Phrased more provocatively, a superintelligence aligned with normal human values is a prima facie existential catastrophe, since normal human values probably aren’t really good, or aren’t what we would be promoting if we were wiser, etc. I’m not sure the Schelling point note is relevant (it depends on which agents are coordinating on AI), but if it is, a better Schelling point may be some kind of extrapolation of human values.
Edit: OK, I agree we should be cautious about acting as though we are certain of utilitarianism, or whatever else we may happen to value, when those with whom we should cooperate disagree.
Yes, I agree with that. I think aiming for some sort of CEV-like system to find such values in the future, via some robustly-not-value-degrading process, seems like a good idea. Hopefully such a process could gain widespread assent. It’s the jumping straight to the (perceived) conclusion I am objecting to.