Evolution doesn’t really select against what we value; it just selects for agents that want to acquire resources and are patient. This may cut away some of our selfish values, but mostly leaves our preferences about distant generations unchanged.
Evolution favors replication. But patience and resource acquisition aren’t obviously correlated with any sort of value; if anything, better resource-acquirers are destructive and competitive. The claim isn’t that evolution is intrinsically “against” any particular value; it’s that it’s extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic. Furthermore, competitive dynamics lead to systematic failures. See the citation.
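A toy sketch of the replication point (all growth rates and “value” labels below are invented for illustration): selection acts on replication rate alone, so whatever values happen to ride along with the fastest replicator are the values that end up dominating.

```python
# Purely illustrative: selection acts on replication rate alone, so the
# "values" that ride along with the fastest replicator end up dominating,
# whatever they happen to be. All numbers and labels are invented.

types = {
    # name: (replication rate per step, arbitrary "values" label)
    "patient_resource_acquirer": (1.10, "cares about distant generations"),
    "reckless_expander":         (1.15, "cares only about spreading"),
    "reflective_altruist":       (1.05, "cares about everyone's welfare"),
}

population = {name: 1.0 for name in types}

for _ in range(200):
    for name, (rate, _values) in types.items():
        population[name] *= rate

total = sum(population.values())
for name, (rate, values) in types.items():
    share = population[name] / total
    print(f"{name}: {share:.2%} of the population ({values})")
# The fastest replicator ends up with essentially the whole population,
# regardless of which values it carries.
```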
Shulman’s post assumes that once somewhere is settled, it’s permanently inhabited by the same tribe. But I don’t buy that. Agents can still spread through violence or through mimicry (remember the quote on fifth-generation warfare).
It seems like you are paraphrasing a standard argument for working on AI alignment rather than arguing against it.
All I am saying is that the argument applies to this issue as well.
Over time it seems likely that society will improve our ability to make and enforce deals, to arrive at consensus about the likely consequences of conflict, to understand each other’s situations, or to understand what we would believe if we viewed others’ private information.
The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict. Better technology yields better monitoring, but also better hiding: which is easier, monitoring ICBMs in the 1970s or monitoring cyberweapons today?
One of the most critical pieces of information in these cases is intentions, which are easy to keep secret and will probably remain so for a long time.
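A minimal sketch of why unverifiable intentions keep the dilemma alive even under complete information about consequences; the payoff numbers below are invented purely for illustration.

```python
# Invented payoffs for a one-shot security-dilemma game. Both sides know
# the payoff matrix exactly (complete information about the consequences
# of conflict), but neither can verify the other's intentions, so "arm"
# stays the safe choice and (arm, arm) is the only equilibrium.

# payoffs[(my_move, their_move)] = my payoff
payoffs = {
    ("disarm", "disarm"): 3,   # mutual restraint: best joint outcome
    ("disarm", "arm"):    0,   # caught unprepared: worst outcome
    ("arm",    "disarm"): 4,   # unilateral advantage
    ("arm",    "arm"):    1,   # costly arms race
}

def best_response(their_move):
    return max(("arm", "disarm"), key=lambda my_move: payoffs[(my_move, their_move)])

for their_move in ("disarm", "arm"):
    print(f"If the other side plays {their_move!r}, the best response is {best_response(their_move)!r}")
# "arm" is the best response either way, so mutual arming results even though
# both sides would prefer mutual disarmament.
```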
By “don’t require superintelligence to be implemented,” do you mean systems of machine ethics that will work even while machines are broadly human level?
Yes, or even implementable in current systems.
I think the mandate of AI alignment easily covers the failure modes you have in mind here.
The failure modes here arise in a different context, one where the existing research is often less relevant or not relevant at all. Whatever you put under the umbrella of alignment, there is a difference between looking at a particular system with the assumption that it will rebuild the universe in accordance with its value function, and looking at how systems interact in varying numbers. If you drop the assumption that the agent will be all-powerful and far beyond human intelligence, then a lot of AI safety work isn’t very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics. Figuring out how to optimize large systems of agents is absolutely not a simple matter of figuring out how to build one good agent and then replicating it as much as possible.
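To make that last point concrete, here is a toy commons model (the numbers are invented): a harvesting policy that is perfectly sustainable for one agent collapses the shared resource once many identical copies of that same “good” agent are running.

```python
# Toy commons model with invented numbers: the per-agent policy is fixed,
# and the only thing that changes is how many identical copies of it run.

def final_stock(stock, n_agents, take_per_agent=8.0, regrowth=1.1, steps=50):
    """Each agent takes a fixed amount per step; the remaining stock regrows."""
    for _ in range(steps):
        stock = max(0.0, stock - n_agents * take_per_agent)
        stock *= regrowth
    return stock

for n_agents in (1, 5, 20):
    print(f"{n_agents} agent(s): final stock = {final_stock(100.0, n_agents):.1f}")
# One agent leaves a healthy (growing) stock; five or twenty identical copies
# of the same agent drive it to zero. Nothing about the individual agent
# changed, only the multi-agent dynamics.
```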
If you drop the assumption that the agent will be all-powerful and far beyond human intelligence, then a lot of AI safety work isn’t very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics
I don’t think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-Scotsman here, and I could imagine responding to your examples with “well that research was silly anyway.”)
Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which “multi-agent dynamics” do you think change the technical situation?
the claim isn’t that evolution is intrinsically “against” any particular value; it’s that it’s extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic
If evolution isn’t optimizing for anything, then you are left with the agents’ optimization, which is precisely what we wanted. I thought you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where “anyone who wants to destroy the world has the option,” as is the security dilemma, and so forth.)
Yes, or even implementable in current systems.
We are probably on the same page here. We should figure out how to build AI systems so that they do what we want, and we should start implementing those ideas ASAP (and they should be the kind of ideas for which that makes sense). When trying to figure out whether a system will “do what we want” we should imagine it operating in a world filled with massive numbers of interacting AI systems all built by people with different interests (much like the world is today, but more).
The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict.
You’re right.
Unsurprisingly, I have a similar view about the security dilemma (e.g. think about automated arms inspections and treaty enforcement; I don’t think the effects of technological progress are at all symmetrical in general). But if someone has a proposed intervention to improve international relations, I’m all for evaluating it on its merits. So maybe we are in agreement here.
I don’t think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-Scotsman here, and I could imagine responding to your examples with “well that research was silly anyway.”)
The parenthetical is probably true, e.g. for most of MIRI’s traditional agenda. If agents don’t quickly gain decisive strategic advantages, then you don’t have to get AI design right the first time; you can make many agents and weed out the bad ones. So the basic design desiderata are probably important, but it’s just not very useful to do research on them now. I’m not familiar enough with your line of work to comment on it, but just think about the degree to which a problem would no longer be a problem if you could build, test, and interact with many prototype human-level and smarter-than-human agents.
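A sketch of that build-and-weed-out loop, under the stated assumption that prototypes can be tested without any of them gaining a decisive advantage; every interface and number here is hypothetical.

```python
# Hypothetical sketch of "make many agents and weed out the bad ones".
# Nothing here is a real API; it only illustrates the filtering loop that
# becomes available if no single prototype can seize a decisive advantage.
import random

random.seed(0)

def sample_design():
    """Stand-in for sampling a candidate agent design: two made-up traits."""
    return {
        "capability": random.uniform(0.0, 1.0),
        "honesty":    random.uniform(0.0, 1.0),  # proxy for passing behavioral audits
    }

def passes_sandbox_tests(design):
    """Stand-in for evaluating a prototype in a contained test environment."""
    return design["honesty"] > 0.9 and design["capability"] > 0.5

candidates = [sample_design() for _ in range(1000)]
survivors = [d for d in candidates if passes_sandbox_tests(d)]
print(f"{len(survivors)} of {len(candidates)} candidate designs pass the tests")
# The point is the workflow, not the numbers: iterate, test, discard, rather
# than having to get the first design exactly right.
```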
Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which “multi-agent dynamics” do you think change the technical situation?
Aside from the ability to prototype as described above, there are the same dynamics which plague human society when multiple factions with good intentions end up fighting due to security concerns or tragedies of the commons, or when multiple agents with different priors interpret every new piece of evidence they see differently and so go down intractably separate paths of disagreement. FAI can solve all the problems of class, politics, economics, etc. by telling everyone what to do, for better or for worse. But multi-agent systems will only be stable with strong institutions, unless they have some other kind of cooperative architecture (such as universal agreement in value functions, in which case you now have the problem of controlling everybody’s AIs, but without the benefit of having an FAI to rule the world). Building these institutions and cooperative structures may have to be done right the first time, since they are effectively singletons, and they may be less corrigible or require different kinds of mechanisms to ensure corrigibility. And the dynamics of multi-agent systems mean you cannot accurately predict the long-term future merely based on value alignment, which you would (at least naively) be able to do with a single FAI.
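The “different priors, same evidence” dynamic can be made concrete with a small sketch (all numbers invented): two Bayesian agents see the same data but model its source incompatibly, so each observation pushes them further apart.

```python
# Two Bayesian agents, identical priors, identical evidence stream, but
# incompatible models of what the evidence means. Invented numbers; the
# point is only that shared data need not produce shared conclusions.

def update(prior, observation, p1_given_h, p1_given_not_h):
    """One step of Bayes' rule for a binary hypothesis H and a 0/1 observation."""
    p_h, p_not_h = (p1_given_h, p1_given_not_h) if observation == 1 else (1 - p1_given_h, 1 - p1_given_not_h)
    return prior * p_h / (prior * p_h + (1 - prior) * p_not_h)

evidence_stream = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1] * 3   # the same public data for both agents

belief_a = belief_b = 0.5                               # same prior on H
model_a = dict(p1_given_h=0.6, p1_given_not_h=0.4)      # A reads a "1" as evidence for H
model_b = dict(p1_given_h=0.4, p1_given_not_h=0.6)      # B reads the same "1" as evidence against H

for observation in evidence_stream:
    belief_a = update(belief_a, observation, **model_a)
    belief_b = update(belief_b, observation, **model_b)

print(f"Agent A's credence in H: {belief_a:.3f}")       # ends near 1
print(f"Agent B's credence in H: {belief_b:.3f}")       # ends near 0
# Same prior, same evidence, different interpretations: the agents diverge
# instead of converging.
```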
If evolution isn’t optimizing for anything, then you are left with the agents’ optimization, which is precisely what we wanted.
Well, it leads to agents which are optimal replicators in their given environments. That’s not (necessarily) what we want.
I thought you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where “anyone who wants to destroy the world has the option,” as is the security dilemma, and so forth.)
That too!
Thanks for the comments.