People will continue to prefer controllable to uncontrollable AI and continue to make at least a commonsense level of investment in controllability; that is, they invest as much as is naively warranted by recent experience and short-term expectations. That is less than a sophisticated assessment of uncertainty about misalignment would warrant, though the two may converge as “recent experience” comes to involve more and more capable AIs. I think this minimal level of investment in control is very likely (99%+).
Next, the proposed sudden/surprising phase transition that breaks controllability properties never materialises, so commonsense investment turns out to be enough for an OK outcome. I think this is about 65%.
Next, AI-enhanced human politics also manages to generate an OK outcome. About 70%.
That’s 45%, but the bar is perhaps higher than you have in mind (I’m also counting non-misalignment paths to bad outcomes). There are also worlds where the problem is harder but more is invested and it still ends up being enough. Not sure how much weight goes there; somewhat less.
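(For concreteness, the 45% is just the three estimates above multiplied together, treating them as roughly independent, which is itself an assumption:)

$$0.99 \times 0.65 \times 0.70 \approx 0.45$$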
Don’t take these probabilities too seriously.
Thanks for answering! The only reason AI is currently controllable is that it is weaker than us. All the GPT-4 jailbreaks show how high the uncontrollability potential is, so I don’t think a phase transition is necessary, as we are still far from AI being controllable in the first place.
It cannot both be controllable because it’s weak and also uncontrollable.
That said, I expect more advanced techniques will be needed for more advanced AI; I just think control techniques probably keep up without sudden changes in control requirements.
Also, LLMs are more controllable than weaker, older designs (compare GPT-4 vs Tay).
Yes. This is no comfort for me in terms of p(doom|AGI). There will be sudden changes in control requirements, judging by the big leaps of capability between GPT generations.
More controllable is one thing, but it doesn’t really matter much for reducing x-risk when the numbers being talked about are “29%”.
That’s what I meant by “phase transition”.