My proposed counter-argument, loosely based on the structure of yours.
Summary of claims
A reasonable fraction of computational resources will be spent based on the result of careful reflection.
I expect to be reasonably aligned with the result of careful reflection from other humans.
I expect to be much less aligned with the result of AIs-that-seize-control reflecting, due to less similarity and the potential for AIs to pursue relatively specific objectives from training (things like reward seeking).
Many arguments that human resource usage won’t be that good seem to apply equally well to AIs and thus aren’t differential.
Full argument
From my perspective on reflection (which is probably somewhat utilitarian, though this is somewhat unclear), the vast majority of value in the future will come from agents who are explicitly trying to optimize for doing “good” things and are being at least somewhat thoughtful about it, rather than from those who incidentally achieve utilitarian objectives. (By “good”, I just mean what seems to them to be good.)
At present, the moral views of humanity are a hot mess. However, it seems likely to me that a reasonable fraction of the total computational resources of our lightcone (perhaps 50%) will in expectation be spent based on the result of a process in which an agent or some agents think carefully about what would be best in a pretty deliberate and relatively wise way. This could involve eventually deferring to other smarter/wiser agents or massive amounts of self-enhancement. Let’s call this a “reasonably-good-reflection” process.
Why think a reasonable fraction of resources will be spent like this?
If you self-enhance and get smarter, this sort of reflection on your values seems very natural. The same goes for deferring to other smarter entities. Further, entities in control might live for an extremely long time, so as long as they don’t lock something in prematurely and eventually get around to being thoughtful, it should be fine.
People who don’t reflect like this probably won’t care much about having vast amounts of resources and thus the resources will go to those who reflect.
The argument for “you should be at least somewhat thoughtful about how you spend vast amounts of resources” is pretty compelling at an absolute level and will be more compelling as people get smarter.
Currently a variety of moderately powerful groups are pretty sympathetic to this sort of view and the power of these groups will be higher in the singularity.
I expect that I am pretty aligned (on reasonably-good-reflection) with the result of random humans doing reasonably-good-reflection, as I am also a human, and many of the underlying arguments/intuitions that seem important to me seem likely to also seem important to many other humans (given various common human intuitions) upon those humans becoming wiser. Further, I really just care about the preferences of (post-)humans who end up caring most about using vast, vast amounts of computational resources (assuming I end up caring about these things on reflection), because the humans who care about other things won’t use most of the resources. Additionally, I care “most” about the on-reflection preferences I have which are relatively less contingent and more common among at least humans, for a variety of reasons. (One way to put this is that I care less about worlds in which my preferences on reflection seem highly contingent.)
So, I’ve claimed that reasonably-good-reflection resource usage will be non-trivial (perhaps 50%) and that I’m pretty aligned with humans on reasonably-good-reflection. Supposing these, why think that most of the value is coming from something like reasonably-good-reflection preferences rather than from other things, e.g., not-very-thoughtful indexical-preference (selfish) consumption? Broadly three reasons:
I expect huge returns to heavy optimization of resource usage (similar to spending altruistic resources today IMO, and in the future we’ll be smarter, which will make this effect stronger). (A toy sketch of this expected-value point follows the aside below.)
I don’t think that (even heavily optimized) not-very-thoughtful indexical preferences directly result in things I care that much about relative to things optimized for what I care about on reflection (e.g., they probably don’t result in vast, vast, vast amounts of experience which is optimized heavily for goodness/$).
Consider how billionaires currently spend money, which doesn’t seem to have much direct value, certainly not relative to their altruistic expenditures.
I find it hard to imagine that indexical selfish consumption results in things like simulating 10^50 happy minds. See also my other comment. It seems more likely IMO that people with selfish preferences mostly just buy positional goods that involve little to no experience. (Separately, I expect this means that people without selfish preferences get more of the compute, but this is counted in my earlier argument, so we shouldn’t double count it.)
I expect that indirect value “in the minds of the laborers producing the goods for consumption” is also small relative to things optimized for what I care about on reflection. (Today it seems pretty small, or maybe net-negative due to factory farming, relative to optimized altruism, and I expect the share to go down going forward.)
(Aside: I was talking about not-very-thoughtful indexical preferences. It seems likely to me that doing a reasonably good job reflecting on selfish preferences gets you back to something like de facto utilitarianism (at least as far as how you spend the vast majority of computational resources), because personal identity and indexical preferences don’t make much sense and the thing you end up thinking is more like “I guess I just care about experiences in general”.)
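To make the expected-value point from the first reason above concrete, here is a minimal toy sketch. The simple fraction-times-efficiency model and all the numbers are illustrative assumptions of mine, not estimates from the argument itself:

```python
# Toy expected-value decomposition for the claim above.
# Assumption (illustrative only): total value ~ sum over spending categories of
#   (fraction of resources) * (value produced per unit of resources).
# Even if reflective spending is only half of all resources, a large per-unit
# multiplier makes it dominate total value.

spending = {
    "reasonably-good-reflection": {"fraction": 0.5, "value_per_unit": 1000.0},
    "not-very-thoughtful indexical consumption": {"fraction": 0.5, "value_per_unit": 1.0},
}

total_value = sum(s["fraction"] * s["value_per_unit"] for s in spending.values())
for name, s in spending.items():
    share = s["fraction"] * s["value_per_unit"] / total_value
    print(f"{name}: {share:.1%} of total value")
# -> ~99.9% of value comes from the reflective half under these assumptions.
```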
What about AIs? I think there are broadly two main reasons to expect that what AIs do on reasonably-good-reflection will be worse from my perspective than what humans do:
As discussed above, I am more similar to other humans, and when I inspect the object level of how other humans think or act, I feel reasonably optimistic about the results of reasonably-good-reflection for humans. (It seems to me like the main thing holding me back from agreement with other humans is mostly biases/communication/lack of smarts/wisdom, given many shared intuitions.) However, AIs might be more different and thus result in less value. Further, the values of humans after reasonably-good-reflection seem close to saturating in goodness from my perspective (perhaps capturing 1⁄3 or 1⁄2 of the value of purely my values), so it seems hard for AI to do better.
To better understand this argument, imagine that instead of humanity vs. AIs, the question was between identical clones of myself and AIs. It’s pretty clear I share the same values as the clones, so the clones do pretty much strictly better than AIs (up to self-defeating moral views).
I’m uncertain about the degree of similarity between myself and other humans. But mostly the underlying similarity uncertainties also apply to AIs. So, e.g., maybe I currently think that on reasonably-good-reflection humans spend resources 1⁄3 as well as I would and AIs spend resources 1⁄9 as well. If I updated to think that other humans after reasonably-good-reflection only spend resources 1⁄10 as well as I do, I might also update to thinking AIs spend resources 1⁄100 as well. (A toy sketch of this update pattern is below, after the second reason.)
In many of the stories I imagine for AIs seizing control, very powerful AIs end up directly pursuing close correlates of what was reinforced in training (sometimes called reward-seeking, though I’m trying to point at a more general notion). Such AIs are reasonably likely to pursue relatively obviously valueless-from-my-perspective things on reflection. Overall, they might act more like an ultra-powerful corporation that just optimizes for power/money rather than like our children (see also here). More generally, AIs might in some sense be subjected to wildly higher levels of optimization pressure than humans while being able to better internalize these values (lack of genetic bottleneck), which can plausibly result in “worse” values from my perspective.
Note that we’re conditioning on safety/alignment technology failing to retain human control, so we should imagine correspondingly less human control over AI values.
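To make the update pattern from the similarity point above concrete (the 1⁄3 → 1⁄9 and 1⁄10 → 1⁄100 example), here is a minimal sketch. The “squaring” rule is just one illustrative way to model the idea that the same similarity uncertainty applies twice over for AIs, not a considered estimate:

```python
# Toy model of the "shared similarity uncertainty" point above.
# Illustrative assumption: AIs are roughly as far (in values, after
# reasonably-good-reflection) from other humans as other humans are from me,
# so my estimate for AIs scales like the square of my estimate for humans.

def ai_resource_quality(human_resource_quality: float) -> float:
    """How well AIs spend resources relative to me, given the human estimate."""
    return human_resource_quality ** 2

for human_quality in (1 / 3, 1 / 10):
    print(f"humans: {human_quality:.3f} as well as me -> "
          f"AIs: roughly {ai_resource_quality(human_quality):.3f} as well as me")
# 1/3 -> ~1/9 and 1/10 -> 1/100: updating down on human similarity also
# updates me down (more sharply) on AI similarity.
```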
I think the fraction of computational resources of our lightcone used based on the result of a reasonably-good-reflection process seems similar between human control and AI control (perhaps 50%). It’s possible to mess this up of course, either by messing up the reflection or by locking in bad values too early. But, when I look at the balance of arguments, humans messing this up seems pretty similar to AIs messing this up to me. So, the main question is what the result of such a process would be. One way to put this is that I don’t expect humans to differ substantially from AIs in terms of how “thoughtful” they are.
I interpret one of your arguments as being “Humans won’t be very thoughtful about how they spend vast, vast amounts of computational resources. After all, they aren’t thoughtful right now.” To the extent I buy this argument, I think it applies roughly equally well to AIs. So naively, it just divides both sides by the same factor rather than making AI look more favorable. (At least, if you accept that almost all of the value comes from being at least a bit thoughtful, which you also contest. See my arguments for that.)
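As a toy illustration of the “divides both sides” point (the model and all numbers are purely illustrative assumptions):

```python
# Toy illustration of the "divides both sides" point above.
# Illustrative assumption: value under a given faction's control ~
#   (fraction of resources spent via reasonably-good-reflection)
#   * (how well that reflective spending matches my values).
# If a "not very thoughtful" discount hits humans and AIs equally, it rescales
# both totals but leaves the human-vs-AI comparison unchanged.

human_alignment, ai_alignment = 1 / 3, 1 / 9   # illustrative figures from earlier

for thoughtful_fraction in (0.5, 0.05):        # with and without a big discount
    human_value = thoughtful_fraction * human_alignment
    ai_value = thoughtful_fraction * ai_alignment
    print(f"thoughtful fraction {thoughtful_fraction:.2f}: "
          f"human control {human_value:.4f}, AI control {ai_value:.4f}, "
          f"ratio {human_value / ai_value:.0f}x")
# The ratio is 3x in both cases: the discount divides both sides equally.
```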