Some thoughts:
The “language” section is the strongest IMO. But it feels like “self-driving” and “pre-driven” cars probably exist on some kind of continuum. How well do the system’s classification algorithms generalize? To what degree does the system solve the “distributional shift” problem and tell a human operator to take control in circumstances that the car isn’t prepared for? (You call these circumstances “unforeseen”, but what about a car that attempts to foresee likely situations it doesn’t know what to do in and ask a human for input in advance?) What experiment would let me determine whether a particular car is self-driving or pre-driven? What falsifiable predictions, if any, are you making about the future of self-driving cars?
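To make the continuum concrete, here’s a minimal sketch of the hand-off logic I have in mind (all names, numbers, and the novelty measure are hypothetical, not taken from any real driving system): the car estimates how far the current observation is from its training data and defers to a human when that estimate crosses a threshold.

```python
import numpy as np

# Hypothetical sketch only: names, numbers, and the novelty measure
# are illustrative, not taken from any real driving system.

class DeferringController:
    def __init__(self, training_features: np.ndarray, threshold: float):
        self.training_features = training_features  # (n_samples, n_dims)
        self.threshold = threshold

    def novelty(self, observation: np.ndarray) -> float:
        # Distance to the nearest training example: a crude proxy for
        # "how unforeseen is this situation?"
        distances = np.linalg.norm(self.training_features - observation, axis=1)
        return float(distances.min())

    def act(self, observation: np.ndarray) -> str:
        if self.novelty(observation) > self.threshold:
            return "DEFER_TO_HUMAN"   # admits ignorance, like a "pre-driven" car
        return "DRIVE_AUTONOMOUSLY"   # inside the envelope it was prepared for

rng = np.random.default_rng(0)
controller = DeferringController(rng.normal(size=(1000, 8)), threshold=3.0)
print(controller.act(rng.normal(size=8)))  # familiar input: drives itself
print(controller.act(np.full(8, 10.0)))    # far out of distribution: defers
```

Lowering the threshold makes the same car defer more often (more “pre-driven”); raising it makes the car press on alone (more “self-driving”). So the question becomes where on that dial a given car sits, and how trustworthy its novelty estimate is.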
I was confused by this sentence: “The second pattern is superior by wide margin when it comes to present-day software”.
I think leaky abstractions are a big problem in discussions of AI risk. You’re doubtless familiar with the process by which you translate a vague idea in your head into computer code. I think too many AI safety discussions are happening at the “vague idea” level, and more discussions should be happening at the code level or the “English that’s precise enough to translate into code” level, which seems like what you’re grasping at here. I think if you spent more time working on your ontology and the clarity of your thought, the language section could be really strong.
(Any post which argues the thesis “AI safety is easily solvable” is both a post that argues for de-prioritizing AI safety and a post that is, in a sense, attempting to solve AI safety. I think posts like these are valuable; “AI safety has this specific easy solution” isn’t as within the Overton window of the community devoted to working on AI safety as I would like it to be. Even if the best solution ends up being complex, I think in-depth discussion of why easy solutions won’t work has been neglected.)
Re: the anchoring section, I’m pretty sure it’s well documented by psychologists that humans are overconfident in their probabilistic judgements. Even if humans tend to anchor on a 50% probability and adjust from there, that doesn’t seem to be enough to counter our overconfidence bias. Regarding the “Discounting the future” section of your post, see the “Multiple-Stage Fallacy”. If a superintelligent FAI gets created, it can likely make humanity’s extinction probability almost arbitrarily low through sufficient paranoia. Regarding AI accidents going “really really wrong”, see the instrumental convergence thesis. And AI safety work could be helpful even if countermeasures aren’t implemented universally, through the creation of a friendly singleton.
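As a toy illustration of the Multiple-Stage Fallacy (the numbers here are entirely made up): split a claim into enough stages, assign each a “reasonable”-sounding probability, and the product lands near zero almost regardless of the claim’s actual merit.

```python
# Made-up numbers: eight stages at 70% each already push the joint
# probability below 6%, which is how the Multiple-Stage Fallacy
# smuggles in low estimates.
stage_probabilities = [0.7] * 8
joint = 1.0
for p in stage_probabilities:
    joint *= p
print(f"joint probability: {joint:.3f}")  # ~0.058
```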
Thank you for your response and helpful feedback.
I’m not making any predictions about future cars in the language section. “Self-driving cars” and “pre-driven cars” are the exact same things. I think I’m grasping at a point closer to Clarke’s third law, which also doesn’t give any obvious falsifiable predictions. My only prediction is that thinking about “self-driving cars” leads to more wrong predictions than thinking about “pre-driven cars”.
I changed the sentence you mention to “If you want to understand present-day algorithms, the “pre-driven car” model of thinking works a lot better than the “self-driving car” model of thinking. The present and past are the only tools we have to think about the future, so I expect the “pre-driven car” model to make more accurate predictions.” I hope this is clearer.
Your remark on “English that’s precise enough to translate into code” is close, but not exactly what I meant. I think that it is a hopeless endeavour to aim for such precise language in these discussions at this point in time, because I estimate that it would take a ludicrous amount of additional intellectual labour to reach that level of rigour. It’s too high of a target. I think the correct target is summarised in the first sentence: “All sentences are wrong, but some are useful.”
I think that I literally disagree with every sentence in your last paragraph on multiple levels. I’ve read both pages you linked a couple months ago and I didn’t find them at all convincing. I’m sorry to give such a useless response to this part of your message. Mounting a proper answer would take more time and effort than I have to spare in the foreseeable future. I might post some scraps of arguments on my blog soonish, but those posts won’t be well-written and I don’t expect anyone to really read those.
The rewritten sentence is clearer, thanks!
On precise language being too high a target: well, it’s already possible to write code that exhibits some of the failure modes AI pessimists are worried about. If discussions about AI safety switched from trading sentences to trading toy AI programs operating on gridworlds and such, I suspect the clarity of discourse would improve.
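As a minimal, entirely made-up example of the kind of toy program I mean: a one-row gridworld where the reward function only mentions the goal, so the reward-maximizing agent tramples a “vase” cell on the way — the classic negative-side-effect failure.

```python
# Minimal gridworld sketch of a negative-side-effect failure mode.
# The reward only counts reaching G, so the optimal policy happily
# walks through V. Entirely illustrative.

ROW = "A.V.G"  # A = agent start, V = vase, G = goal, . = empty

def reward_maximizing_path(row: str) -> list:
    start, goal = row.index("A"), row.index("G")
    step = 1 if goal > start else -1
    # Shortest path to the goal; nothing in the reward mentions the vase.
    return list(range(start, goal + step, step))

path = reward_maximizing_path(ROW)
print("Reached goal:", path[-1] == ROW.index("G"))  # True
print("Broke the vase:", ROW.index("V") in path)    # True: the failure mode
```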
Cool, let me know when those posts are up!