For individual tasks, sure, you can implement verifiers (though I think it quickly becomes unwieldy); there's no in-principle reason we cannot do this. But you cannot create AGI with a restricted model: if we can define the space of outputs we want, it's by definition a narrow AI.
What’s GPT-4?
Because it can generate outputs that are sometimes correct on new tasks ("write me a program that computes X"), it's general, even if "compute X" is built from two common subcomponents the model saw many times in training.
GPT-4 is perfectly safe if you run it on local hardware with a local terminal. The "space of outputs" is "text to the terminal". As long as you don't leave a security vulnerability where that text stream can cause commands to execute on the host PC, that's it; that's all it can do.
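To make that containment concrete, here is a minimal sketch of what "text to the terminal" means in practice; `local_model.generate()` is a hypothetical wrapper around locally hosted weights, not any real API:

```python
import sys

def run_session(local_model):
    """Hypothetical local chat loop: the model's entire output space is stdout."""
    while True:
        prompt = input("> ")  # the human types a request
        if prompt.strip().lower() == "quit":
            break
        reply = local_model.generate(prompt)  # assumed local inference call
        # The only thing the reply ever touches is the terminal. Never feed it to
        # os.system, subprocess, eval, or a socket, or the containment argument breaks.
        sys.stdout.write(reply + "\n")
```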
Consider that "a robot tethered to a mount" could do general tasks the same way. Same idea: it's a general system, but its command stream can't reach anything but the tethered robot, because that's where the wires go.
You also verified the commands empirically. It's not that you know any given robotic action or text output is good; it's that you benchmarked the model and it has a certain pFail on training-like inputs.
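A rough sketch of that benchmarking step (the `model` and `verify` callables are stand-ins, not any particular library):

```python
def estimate_pfail(model, tasks, verify):
    """Empirical failure rate on a held-out set of training-like tasks."""
    failures = sum(1 for task in tasks if not verify(task, model.generate(task)))
    return failures / len(tasks)

# e.g. pfail = estimate_pfail(model, held_out_tasks, verify)
# A pFail of 0.02 measured on 10,000 such tasks says nothing about inputs
# far from the training distribution; that is what the detector below is for.
```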
I agree this is not as much generality as humans have. It's not a narrow AI, though the "in-distribution detector" (a measure of how similar the current task and input are to the training set) essentially narrows your AI system from a general one to a narrow one, depending on your tolerances.
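One way to picture that detector, sketched here as a nearest-neighbour distance in embedding space (the `embed` function, the stored training embeddings, and the tolerance are all assumptions; plenty of other OOD scores would do):

```python
import numpy as np

def in_distribution(x_embedding, train_embeddings, tolerance):
    """True if the input is within `tolerance` of its nearest training example."""
    dists = np.linalg.norm(train_embeddings - x_embedding, axis=1)
    return dists.min() <= tolerance

def gated_generate(model, embed, train_embeddings, task, tolerance):
    if not in_distribution(embed(task), train_embeddings, tolerance):
        return None  # out of distribution: refuse / shut down instead of guessing
    return model.generate(task)  # in distribution: the benchmarked pFail applies
```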
For tasks where you can't shut the system down when the input state leaves the distribution (say a robotic surgeon, where you need it to keep trying as best it can), you would use electromechanical interlocks, same as the interlocks that prevented exposure to radiation 50 years ago. You tether the surgical robotic equipment, restrict its network links, etc., so that the number of people it can kill is at most 1 (the patient).