How do you intend to define "person" in terms of the inputs to an AI system (let's assume a camera image)?
Can we just define them as we normally do, e.g. biologically, with a functioning brain? Is the concern that AIs won't be able to tell which inputs represent real things and which don't? Or that they just won't be able to apply the definitions correctly generally enough?
How do you compute the "probability" of an event?
The AI would do this. Are AIs that aren't good at estimating probabilities of events smart enough to worry about? I suppose they could be good at estimating probabilities in specific domains but not generally, or have some very specific failure cases that could be catastrophic.
What is "inaction"?
The AI waits for the next request, turns off, or takes some other inconsequential default action.
(There's also the problem that all actions probably change who does and doesn't exist, so this law would require the AI system to always take inaction, making it useless.)
Maybe my wording didn't capture this well, but my intention was a presentist/necessitarian person-affecting approach (not that I agree with the ethical position). I'll try again:
"A particular person will have been born with action A and with inaction, and will die at least x earlier with probability > p with A than they would have with inaction."
Can we just define them as we normally do, e.g. biologically, with a functioning brain?
How do you define "biological" and "brain"? Again, your input is a camera image, so you have to build this up starting from sentences of the form "the pixel in the top left corner is this shade of grey".
(Or you can choose some other input, as long as we actually have existing technology that can create that input.)
The AI would do this. Are AIs that aren't good at estimating probabilities of events smart enough to worry about?
Powerful AIs will certainly behave in ways that make it look like they are estimating probabilities.
Let's take AIs trained by deep reinforcement learning as an example. If you want to encode something like "Any particular person dies at least x earlier with probability > p than they would have by inaction" explicitly and literally in code, you will need functions like getAllPeople() and getProbability(event). AIs do not usually come equipped with such functions, so you either have to say how to use the AI system to implement those functions, or you have to implement them yourself. I am claiming that the second option is hard, and any solution you have for the first option will probably also work for something like telling the AI system to "do what the user wants".
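To make this concrete, here is a minimal sketch of what such a literal encoding would look like. All names and numbers are invented for illustration: get_all_people() and prob_dies_earlier() are toy stand-ins over made-up data, and implementing them for real is exactly the hard part being discussed.

```python
# Hypothetical sketch: a literal encoding of the constraint
# "no particular person dies at least x earlier with probability > p
# than they would have by inaction". Everything here is a toy stand-in.

X_YEARS = 1.0        # "dies at least x earlier"
P_THRESHOLD = 1e-6   # "with probability > p"

def get_all_people():
    # A real system would need to identify every person from raw
    # inputs (e.g. camera images); here it is just a fixed list.
    return ["alice", "bob", "carol"]

def prob_dies_earlier(person, action, years):
    # A real system would need a world model comparing action vs.
    # inaction; here the estimates are made-up constants.
    toy_estimates = {
        ("alice", "drive"): 2e-7,
        ("bob", "drive"): 5e-7,
        ("alice", "lottery"): 1e-2,
    }
    return toy_estimates.get((person, action), 0.0)

def action_is_permitted(action):
    # The literal implementation: loop over all people and check the
    # per-person probability constraint for each one.
    return all(
        prob_dies_earlier(person, action, X_YEARS) <= P_THRESHOLD
        for person in get_all_people()
    )

print(action_is_permitted("drive"))    # True with these toy numbers
print(action_is_permitted("lottery"))  # False with these toy numbers
```

Even in this toy form, the structure makes the dependence visible: the whole constraint bottoms out in the two functions that AIs do not come equipped with.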
The AI waits for the next request, turns off, or takes some other inconsequential default action.
If you're a self-driving car, it's very unclear what an inconsequential default action is. (Though I agree that in general there's often some default action that is fine.)
Maybe my wording didn't capture this well, but my intention was a presentist/necessitarian person-affecting approach (not that I agree with the ethical position).
I mean, the existence part was not the main point; my point was that if butterfly effects are real, then the AI system must always do nothing (even if it can't predict what the butterfly effects would be). If you want to avoid debates about population ethics, you could imagine butterfly effects that affect current people: e.g. you slightly change who talks to whom, which changes whether a person gets hit by a car later in the day or not.
I'm not arguing that these sorts of butterfly effects are real (I'm not sure), but it seems bad for the behavior of our AI system to depend so strongly on whether butterfly effects are real.
Maybe this cuts to the chase: should we expect AIs to be able to know or do anything in particular well "enough"? I.e., is there one thing in particular we can say AIs will be good at and only get wrong extremely rarely? Is solving this as hard as technical AI alignment in general?
How do you define "biological" and "brain"? Again, your input is a camera image, so you have to build this up starting from sentences of the form "the pixel in the top left corner is this shade of grey".
These are things it would be trained to learn. It would learn to read and could read biology textbooks, papers, or things online, and it would also see pictures of people, brains, etc.
AIs do not usually come equipped with such functions, so you either have to say how to use the AI system to implement those functions, or you have to implement them yourself.
This could be an explicit output we train the AI to predict (possibly as part of responses in language).
I mean, the existence part was not the main point; my point was that if butterfly effects are real, then the AI system must always do nothing (even if it can't predict what the butterfly effects would be). If you want to avoid debates about population ethics, you could imagine butterfly effects that affect current people: e.g. you slightly change who talks to whom, which changes whether a person gets hit by a car later in the day or not.
I "named" a particular person in that sentence. The probability that what I do leads to an earlier death for John Doe is extremely small, and that's the probability that I'm constraining, for each person separately. This will also in practice prevent the AI from conducting murder lotteries up to a certain probability of being killed, but this probability might be too high, so you could add separate constraints, e.g. on the probability of causing an earlier death for a random person, or on the change in average life expectancy in the world.
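The murder-lottery gap can be illustrated numerically. In this hypothetical sketch (all thresholds and risk numbers are invented), a per-person cap passes while the aggregate risk is nearly certain, which is why a separate aggregate constraint is suggested:

```python
# Hypothetical sketch: a per-person probability cap can still permit a
# "murder lottery" where each individual's risk is tiny but someone
# dies almost surely. All numbers are made up for illustration.

P_PER_PERSON = 1e-4   # cap on any particular person's added risk
P_AGGREGATE = 1e-3    # cap on anyone at all dying earlier

def lottery_risks(n_people, p_each):
    # Each of n_people bears the same small added risk p_each.
    return [p_each] * n_people

def passes_per_person(risks):
    # The per-person constraint checks each individual separately.
    return all(p <= P_PER_PERSON for p in risks)

def prob_anyone_dies(risks):
    # Assuming independent risks: 1 minus the product of survival
    # probabilities.
    survive = 1.0
    for p in risks:
        survive *= (1.0 - p)
    return 1.0 - survive

risks = lottery_risks(100_000, 5e-5)
print(passes_per_person(risks))                # True: everyone under the cap
print(prob_anyone_dies(risks) <= P_AGGREGATE)  # False: someone almost surely dies
```

With 100,000 people each at risk 5e-5, every individual clears the per-person cap, yet the chance that someone dies is about 99%, so only the added aggregate constraint catches the lottery.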
These are things it would be trained to learn. It would learn to read and could read biology textbooks, papers, or things online, and it would also see pictures of people, brains, etc.
It really sounds like this sort of training is going to require it to be able to interpret English the way we interpret English (e.g. to read biology textbooks); if you're going to rely on that, I don't see why you don't want to rely on that ability when we are giving it instructions.
This could be an explicit output we train the AI to predict (possibly as part of responses in language).
That... is ambitious, if you want to do this for every term that exists in laws. But I agree that if you did this, you could try to "translate" laws into code in a literal fashion. I'm fairly confident that this would still be pretty far from what you wanted, because laws aren't meant to be literal, but I'm not going to try to argue that here.
(Also, it probably wouldn't be computationally efficient: that "don't kill a person" law, implemented literally in code, would require you to loop over all people and make a prediction for each one, which is extremely expensive.)
I "named" a particular person in that sentence.
Ah, I see. In that case I take back my objection about butterfly effects.