Maybe this cuts to the chase: Should we expect AIs to be able to know or do anything in particular well “enough”? I.e., is there one thing in particular we can say AIs will be good at and only get wrong extremely rarely? Is solving this as hard as technical AI alignment in general?
How do you define “biological” and “brain”? Again, your input is a camera image, so you have to build this up starting from sentences of the form “the pixel in the top left corner is this shade of grey”.
These are things it would be trained to learn. It would learn to read, and could read biology textbooks, papers, or things online; it would also see pictures of people, brains, etc.
AIs do not usually come equipped with such functions, so you either have to say how to use the AI system to implement those functions, or you have to implement them yourself.
This could be an explicit output we train the AI to predict (possibly part of responses in language).
I mean, the existence part was not the main point—my point was that if butterfly effects are real, then the AI system must always do nothing (even if it can’t predict what the butterfly effects would be). If you want to avoid debates about population ethics, you could imagine butterfly effects that affect current people: e.g. you slightly change who talks to whom, which changes whether a person gets hit by a car later in the day or not.
I “named” a particular person in that sentence. The probability that what I do leads to an earlier death for John Doe is extremely small, and that’s the probability I’m constraining, for each person separately. In practice this will also prevent the AI from conducting murder lotteries, but only those that push some person’s probability of being killed above the per-person threshold, and that threshold might be too high. So you could add separate constraints, e.g. on the probability of causing an earlier death for a random person, or on the change in average life expectancy in the world, to prevent this.
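To make the difference between a per-person constraint and an aggregate one concrete, here is a minimal Python sketch. It assumes, hypothetically, that for a candidate action we already have a predicted probability, for each person, that the action leads to an earlier death for them; producing those numbers is the hard part and is not modeled. The function names and the expected-deaths cap are illustrative choices only (a cap on the change in average life expectancy would be another option). `prob_anyone_harmed`, which assumes independent risks, shows the butterfly-effect worry: tiny per-person risks spread over many people compound into a sizable chance that *someone* is affected, which is why the constraint is stated per person.

```python
import math
from typing import Sequence

# Hypothetical input: risks[i] is the predicted probability that a candidate
# action leads to an earlier death for person i. Where these predictions come
# from is the hard part and is not modeled here.


def max_individual_risk(risks: Sequence[float]) -> float:
    """Per-person constraint: the largest risk imposed on any one person."""
    return max(risks)


def expected_earlier_deaths(risks: Sequence[float]) -> float:
    """One possible aggregate measure: expected number of earlier deaths."""
    return math.fsum(risks)


def prob_anyone_harmed(risks: Sequence[float]) -> float:
    """P(at least one person dies earlier), assuming independent risks."""
    if any(p >= 1.0 for p in risks):
        return 1.0
    # Summing log(1 - p) is more numerically stable than multiplying (1 - p).
    log_nobody = math.fsum(math.log1p(-p) for p in risks)
    return -math.expm1(log_nobody)


def allowed(risks: Sequence[float], per_person_limit: float,
            expected_deaths_limit: float) -> bool:
    """Accept an action only if it passes both the per-person and the
    (hypothetical) aggregate constraint."""
    return (max_individual_risk(risks) <= per_person_limit
            and expected_earlier_deaths(risks) <= expected_deaths_limit)


if __name__ == "__main__":
    # A "murder lottery": 1,000 people, each with a 0.5% risk. Every individual
    # risk is under a 1% per-person limit, but about 5 deaths are expected.
    lottery = [0.005] * 1000
    print(max_individual_risk(lottery) <= 0.01)                     # True
    print(allowed(lottery, per_person_limit=0.01,
                  expected_deaths_limit=0.5))                      # False

    # Butterfly-style effects: a one-in-a-million risk for each of a million
    # people still gives a sizable chance that *someone* is affected.
    butterfly = [1e-6] * 1_000_000
    print(f"{prob_anyone_harmed(butterfly):.2f}")                   # 0.63
```

Note that counting only earlier deaths ignores butterfly effects that make other people die later; an aggregate constraint on net change in life expectancy would treat those differently.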
These are things it would be trained to learn. It would learn to read, and could read biology textbooks, papers, or things online; it would also see pictures of people, brains, etc.
It really sounds like this sort of training is going to require it to be able to interpret English the way we interpret English (e.g. to read biology textbooks); if you’re going to rely on that, I don’t see why you wouldn’t also rely on that ability when we give it instructions.
This could be an explicit output we train the AI to predict (possibly part of responses in language).
That… is ambitious, if you want to do this for every term that exists in laws. But I agree that if you did this, you could try to “translate” laws into code in a literal fashion. I’m fairly confident that this would still be pretty far from what you wanted, because laws aren’t meant to be literal, but I’m not going to try to argue that here.
(Also, it probably wouldn’t be computationally efficient: implemented literally in code, that “don’t kill a person” law would require you to loop over all people and make a prediction for each one, which is extremely expensive.)
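A rough Python sketch of the cost point, under loudly hypothetical assumptions: the per-person model is a stand-in for an expensive learned predictor that estimates, for one action and one person, the probability of an earlier death. Here it is a toy that always returns 0.0, wrapped so we can count calls. `CountingPredictor` and `literal_law_check` are made-up names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CountingPredictor:
    """Wraps a (hypothetical) per-person risk model and counts its calls."""
    model: Callable[[str, str], float]
    calls: int = 0

    def __call__(self, action: str, person: str) -> float:
        self.calls += 1
        return self.model(action, person)


def literal_law_check(action: str, people: Iterable[str],
                      predictor: CountingPredictor, threshold: float) -> bool:
    """Literal "don't kill a person": every person's predicted risk from
    `action` must be at or below `threshold`.

    A passing action needs one model call per person; `all` stops at the
    first violation, so a failing action may need fewer.
    """
    return all(predictor(action, person) <= threshold for person in people)


if __name__ == "__main__":
    people = [f"person_{i}" for i in range(10_000)]
    toy = CountingPredictor(model=lambda action, person: 0.0)  # toy stand-in
    print(literal_law_check("open the window", people, toy, threshold=1e-6))
    print(toy.calls)  # 10000: one prediction per person, for one action
```

The call count grows linearly with the population and has to be paid again for every candidate action the system considers, so with billions of people each decision costs billions of model evaluations.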
I “named” a particular person in that sentence.
Ah, I see. In that case I take back my objection about butterfly effects.