Agreed, that’s why I wrote “0.1% to 0.01% reduction in p(doom) per year”. I wasn’t talking about the absolute level of doom here. I edited my comment to say “0.1% to 0.01% reduction in p(doom) per year of delay”, which is hopefully clearer.
Ah, sorry. I indeed interpreted you as saying that we would reduce p(doom) to 0.01-0.1% per year, rather than saying that each year of delay reduces p(doom) by that amount. I think that view is more reasonable, but I’d still likely put the go-ahead-number higher.
That’s why I said “Similarly, I would potentially be happier to turn over the universe to aliens instead of AIs.”
Apologies again for misinterpreting. I didn’t know how much weight to put on the word “potentially” in your comment. Although note that I said, “Even when an EA insists their concern isn’t about the human species per se I typically end up disagreeing on some other fundamental point here that seems like roughly the same thing I’m pointing at.” I don’t think the problem is literally that EAs are anthropocentric, but I think they often have anthropocentric intuitions that influence these estimates.
Maybe a more accurate summary is that people have a bias towards “evolved” or “biological” beings, which I think might explain why you’d be a little happier to hand over the universe to aliens, or dogs, but not AIs.
I would be reasonably happy (e.g. 50-90% of the value relative to human control) to turn the universe over to aliens. [...]
I think the exact same questions apply to AIs, I just have empirical beliefs that AIs which end up taking over are likely to do predictably worse things with the cosmic endowment (e.g. 10-30% of the value).
I guess I mostly think that’s a pretty bizarre view, with some obvious reasons for doubt, and I don’t know what would be driving it. The process through which aliens would get values like ours seems much less robust than the process through which AIs get our values. AIs are trained on our data, and humans will presumably care a lot about aligning them (at least at first).
From my perspective this is a bit like saying you’d prefer aliens to take over the universe rather than handing control over to our genetically engineered human descendants. I’d be very skeptical of that view too for some basic reasons.
Overall, upon learning your view here, I don’t think I’d necessarily diagnose you as having the intuitions I alluded to in my original comment, but I think there’s likely something underneath your views that I would strongly disagree with, if I understood your views further. I find it highly unlikely that AGIs will be even more “alien” from the perspective of our values than literal aliens (especially if we’re talking about aliens who themselves build their own AIs, genetically engineer themselves, and so on).
If you’re interested in diving into “how bad/good is it to cede the universe to AIs”, I strongly think it’s worth reading and responding to “When is unaligned AI morally valuable?”, which is the current state of the art on the topic (same thing I linked above). I now regret rehashing a bunch of these arguments which I think are mostly made better here. In particular, I think the case for “AIs created in the default way might have low moral value” is reasonably well argued for here:
Many people have a strong intuition that we should be happy for our AI descendants, whatever they choose to do. They grant the possibility of pathological preferences like paperclip-maximization, and agree that turning over the universe to a paperclip-maximizer would be a problem, but don’t believe it’s realistic for an AI to have such uninteresting preferences.
I disagree. I think this intuition comes from analogizing AI to the children we raise, but that it would be just as accurate to compare AI to the corporations we create. Optimists imagine our automated children spreading throughout the universe and doing their weird-AI-analog of art; but it’s just as realistic to imagine automated PepsiCo spreading throughout the universe and doing its weird-AI-analog of maximizing profit.
It might be the case that PepsiCo maximizing profit (or some inscrutable lost-purpose analog of profit) is intrinsically morally valuable. But it’s certainly not obvious.
Or it might be the case that we would never produce an AI like a corporation in order to do useful work. But looking at the world around us today that’s certainly not obvious.
Neither of those analogies is remotely accurate. Whether we should be happy about AI “flourishing” is a really complicated question about AI and about morality, and we can’t resolve it with a one-line political slogan or crude analogy.
I now regret rehashing a bunch of these arguments which I think are mostly made better here.
It’s fine if you don’t want to continue this discussion. I can sympathize if you find it tedious. That said, I don’t really see why you’d appeal to that post in this context (FWIW, I read the post at the time it came out, and just re-read it). I interpret Paul Christiano to mainly be making arguments in the direction of “unaligned AIs might be morally valuable, even if we’d prefer aligned AI” which is what I thought I was broadly arguing for, in contradistinction to your position. I thought you were saying something closer to the opposite of what Paul was arguing for (although you also made several separate points, and I don’t mean to oversimplify your position).
(But I agree with the quoted part of his post that we shouldn’t be happy with AIs doing “whatever they choose to do”. I don’t think I’m perfectly happy with unaligned AI. I’d prefer we try to align AIs, just as Paul Christiano says too.)
Huh, no, I almost entirely agree with this post, as I noted in my prior comment. I cited this much earlier: “More generally, I think I basically endorse the views here (which discusses the questions of when you should cede power etc.).”
I do think unaligned AI would be morally valuable (I said in an earlier comment that unaligned AIs which take over might capture 10-30% of the value. That’s a lot of value.)
I don’t think I’m perfectly happy with unaligned AI. I’d prefer we try to align AIs, just as Paul Christiano says too.
I think we’ve probably been talking past each other. I thought the whole argument here was “how much value do we lose if (presumably misaligned) AI takes over” and you were arguing for “not much, caring about this seems like overly fixating on humanity” and I was arguing “(presumably misaligned) AIs which take over probably result in substantially less value”. This now seems incorrect, and perhaps we only have minor quantitative disagreements?
I think it probably would have helped if you were more quantitative here. Exactly how much of the value?
I thought the whole argument here was “how much value do we lose if (presumably misaligned) AI takes over”
I think the key question here is: compared to what? My position is that we lose a lot of potential value both from delaying AI and from having unaligned AI, but it’s not a crazy high reduction in either case. In other words they’re pretty comparable in terms of lost value.
Ranking the options in rough order (taking up your offer to be quantitative):
Aligned AIs built tomorrow: 100% of the value from my perspective
Aligned AIs built in 100 years: 50% of the value
Unaligned AIs built tomorrow: 15% of the value
Unaligned AIs built in 100 years: 25% of the value
Note that I haven’t thought about these exact numbers much.
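As a toy illustration of how these estimates trade off against alignment odds (the probabilities below are hypothetical placeholders, not numbers from this thread), one could compare the timings with a simple expected-value calculation:

```python
# Illustrative expected-value comparison using the four value estimates above.
# The p_aligned figures passed in are hypothetical, not claims from the thread.
VALUES = {
    ("tomorrow", True): 1.00,    # aligned AIs built tomorrow
    ("tomorrow", False): 0.15,   # unaligned AIs built tomorrow
    ("100_years", True): 0.50,   # aligned AIs built in 100 years
    ("100_years", False): 0.25,  # unaligned AIs built in 100 years
}

def expected_value(timing: str, p_aligned: float) -> float:
    """Expected fraction of value, given the chance the AIs are aligned."""
    return (p_aligned * VALUES[(timing, True)]
            + (1 - p_aligned) * VALUES[(timing, False)])

# Whether delay helps depends on how much it improves alignment odds:
print(expected_value("tomorrow", 0.5))   # even odds of alignment now
print(expected_value("100_years", 0.9))  # better odds after a long delay
```

Under these placeholder numbers, building tomorrow at even odds slightly beats a 100-year delay even with much better alignment odds, which is one way of making the "pretty comparable in terms of lost value" claim concrete.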
What drives this huge drop? Naive utility would be very close to 100%. (Do you mean “aligned AIs built in 100 years if humanity still exists by that point”, which includes extinction risk before 2123?)
I attempted to explain the basic intuitions behind my judgement in this thread. Unfortunately it seems I did a poor job. For the full explanation you’ll have to wait until I write a post, if I ever get around to doing that.
The simple, short, and imprecise explanation is: I don’t really value humanity as a species as much as I value the people who currently exist, (something like) our current communities and relationships, our present values, and the existence of sentient and sapient life living positive experiences. Much of this will go away after 100 years.
TBC, it’s plausible that in the future I’ll think that “marginally influencing AIs to have more sensible values” is more leveraged than “avoiding AI takeover and hoping that humans (and our chosen successors) do something sensible”. I’m partially deferring to others on the view that AI takeover is the best angle of attack; perhaps I should examine this further.
(Of course, it could be that from a longtermist perspective other stuff is even better than avoiding AI takeover or altering AI values. E.g. maybe one of conflict avoidance, better decision theory, or better human institutions post-singularity is even better.)
I certainly wish the question of how much worse/better AI takeover is relative to human control were investigated more effectively. It seems notable to me how important this question is from a longtermist perspective and how little investigation it has received.
(I’ve spent maybe one person-day thinking about it, and I think probably less than 3 FTE-years have been put into this by people who I’d be interested in deferring to.)
The process through which aliens would get values like ours seems much less robust than the process through which AIs get our values. AIs are trained on our data, and humans will presumably care a lot about aligning them (at least at first).
Note that I’m conditioning on AIs successfully taking over which is strong evidence against human success at creating desirable (edit: from the perspective of the creators) AIs.
if I understood your views further. I find it highly unlikely that AGIs will be even more “alien” from the perspective of our values than literal aliens
For an intuition pump, consider future AIs which are trained for the equivalent of 100 million years of next-token-prediction[1] on low-quality web text and generated data and then aggressively selected with outcomes-based feedback. This outcomes-based feedback results in selecting AIs for carefully tricking their human overseers in a variety of cases and generally ruthlessly pursuing reward.
This scenario is somewhat worse than what I expect in the median world. But in practice I expect that it’s at least systematically possible to change the training setup to achieve predictably better AI motivations and values. Beyond trying to influence AI motivations with crude tools, it seems even better to have humans retain control, use AIs to do a huge amount of R&D (or philosophy work), and then decide what should actually happen with access to more options.
Another way to put this is that I feel notably better about the decision-making of current power structures in the Western world and in AI labs than I feel about going with the AI motivations that are likely to result from training.
More generally, if you are the sole person in control, it seems strictly better from your perspective to carefully reflect on who/what you want to defer to rather than doing this somewhat arbitrarily (this still leaves open the question of how bad arbitrarily deferring is).
From my perspective this is a bit like saying you’d prefer aliens to take over the universe rather than handing control over to our genetically engineered human descendants. I’d be very skeptical of that view too for some basic reasons.
I’m pretty happy with slow and steady genetic engineering as a handover process, but I would prefer something even slower and more deliberate than this. E.g., existing humans thinking carefully, for as long as seems to yield returns, about what beings we should defer to, and then deferring to those slightly smarter beings, which think for a long time and defer to other beings, etc.
I guess I mostly think that’s a pretty bizarre view, with some obvious reasons for doubt, and I don’t know what would be driving it.
Part of my view on aliens or dogs is driven by the principle of “aliens/dogs are in a somewhat similar position to us, so we should be fine with swapping” (roughly speaking) and “the parts of my values which seem most dependent on random empirical contingencies about evolved life I put less weight on”. These intuitions transfer somewhat less well to the AI case.
Current AIs are trained on perhaps 10-100 trillion tokens, and if we treat 1 token as the equivalent of 1 second, then (100*10^12)/(60*60*24*365) ≈ 3 million years.
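The footnote’s arithmetic, sketched out (the one-token-per-second equivalence is of course a rough modeling assumption, not a measured fact):

```python
# Back-of-the-envelope conversion from training tokens to "subjective years",
# under the rough assumption that 1 token is equivalent to 1 second.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000

def tokens_to_years(tokens: float) -> float:
    """Years of experience at an assumed rate of 1 token per second."""
    return tokens / SECONDS_PER_YEAR

# 10-100 trillion tokens works out to roughly 0.3-3 million years.
print(tokens_to_years(10e12) / 1e6)   # ~0.3 (million years)
print(tokens_to_years(100e12) / 1e6)  # ~3.2 (million years)
```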
Note that I’m conditioning on AIs successfully taking over which is strong evidence against human success at creating desirable AIs.
I don’t think it’s strong evidence, for what it’s worth. I’m also not sure what “AI takeover” means, and I think existing definitions are very ambiguous (would we say Europe took over the world during the age of imperialism? Are smart people currently in control of the world? Have politicians, as a class, taken over the world?). Depending on the definition, I tend to think that AI takeover is either ~inevitable and not inherently bad, or bad but not particularly likely.
This outcomes-based feedback results in selecting AIs for carefully tricking their human overseers in a variety of cases and generally ruthlessly pursuing reward.
Would aliens not also be incentivized to trick us or others? What about other humans? In my opinion, basically all the arguments about AI deception from gradient descent apply in some form to other methods of selecting minds, including evolution by natural selection, cultural learning, and in-lifetime learning. Humans frequently lie to or mislead each other about our motives. For example, if you ask a human what they’d do if they became world dictator, I suspect you’d often get a different answer than the one they’d actually choose if given that power. I think this is essentially the same epistemic position we might occupy with AI.
Also, for a bunch of reasons that I don’t currently feel like elaborating on, I expect humans to anticipate, test for, and circumvent the most egregious forms of AI deception in practice. The most important point here is that I’m not convinced that incentives for deception are much worse for AIs than for other actors in different training regimes (including humans, uplifted dogs, and aliens).
I don’t think it’s strong evidence, for what it’s worth. I’m also not sure what “AI takeover” means, and I think existing definitions are very ambiguous (would we say Europe took over the world during the age of imperialism? Are smart people currently in control of the world? Have politicians, as a class, taken over the world?). Depending on the definition, I tend to think that AI takeover is either ~inevitable and not inherently bad, or bad but not particularly likely.
By “AI takeover”, I mean autonomous AI coup/revolution. E.g., violating the law and/or subverting the normal mechanisms of power transfer. (Somewhat unclear exactly what should count tbc, but there are some central examples.) By this definition, it basically always involves subverting the intentions of the creators of the AI, though may not involve violent conflict.
I don’t think this is super likely, perhaps 25% chance.
Also, for a bunch of reasons that I don’t currently feel like elaborating on, I expect humans to anticipate, test for, and circumvent the most egregious forms of AI deception in practice. The most important point here is that I’m not convinced that incentives for deception are much worse for AIs than for other actors in different training regimes (including humans, uplifted dogs, and aliens).
I don’t strongly disagree with either of these claims, but this isn’t exactly where my crux lies.
The key thing is “generally ruthlessly pursuing reward”.
The key thing is “generally ruthlessly pursuing reward”.
It depends heavily on what you mean by this, but I’m kinda skeptical of the strong version of ruthless reward seekers, for similar reasons given in this post. I think AIs by default might be ruthless in some other senses—since we’ll be applying a lot of selection pressure to them to get good behavior—but I’m not really sure how much weight to put on the fact that AIs will be “ruthless” when evaluating how good they are at being our successors. It’s not clear how that affects my evaluation of how much I’d be OK handing the universe over to them, and my guess is the answer is “not much” (absent more details).
Humans seem pretty ruthless in certain respects too, e.g. about survival, or increasing their social status. I’d expect aliens, and potentially uplifted dogs, to be ruthless too along some axes, depending on how we uplifted them.
I’m checking out of this conversation though.
Alright, that’s fine.