Thanks for the post! I am generally pretty worried that I and many people I know are all deluding ourselves about AI safety—it has a lot of red flags from the outside (although these are lessening as more experts come onboard, more progress is made in AI capabilities, and more concrete work is done on safety). I think it's more likely than not we've got things completely wrong, but that the problem is still worth working on. If it isn't worth working on, I'd like to know!
I like your points about language. I think there's a closely related problem where it's very hard to talk or think about anything that's between human level at some task and omnipotent. Once you imagine a system that can do things humans can't, it becomes impossible to argue that any particular feat is beyond it: there's always the retort that just because you, a human, think something is impossible doesn't mean a more intelligent system couldn't achieve it.
On the other hand, I think there are some good examples of couching safety concerns in non-anthropomorphic language. I like Dr Krakovna’s list of specification gaming examples: https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/
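To make the idea concrete, here's a toy sketch of specification gaming (a hypothetical example of my own, not one from Krakovna's list, with made-up names like `proxy_reward`): we *intend* to reward a cleaning robot for leaving the room clean, but the proxy reward we actually write pays per unit of dirt removed, which a degenerate policy can exploit.

```python
# Hypothetical toy example of specification gaming.
# Intended goal: a clean room. Written spec: +1 per unit of dirt removed.

def proxy_reward(dirt_removed: int) -> int:
    """Reward spec as written: +1 per unit of dirt the agent removes."""
    return dirt_removed

def honest_episode() -> int:
    # Room starts with 3 units of dirt; the agent simply cleans them all.
    return proxy_reward(dirt_removed=3)

def gaming_episode() -> int:
    # The agent first dumps 10 extra units of dirt, then cleans all 13.
    # The room ends no cleaner than before, but the proxy reward is higher.
    return proxy_reward(dirt_removed=13)

print(honest_episode(), gaming_episode())  # the gaming policy scores higher
```

The point is that the optimizer maximizes the reward as written, not the intent behind it—no desires or omnipotence required.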
I also think Iterated Distillation and Amplification is a good example of discussing AI safety and potential mitigation strategies in terms of training distributions and gradient descent rather than desires and omnipotence.
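For readers unfamiliar with the scheme, here is a minimal sketch of the IDA loop on a toy task (computing factorials by decomposition). The names `amplify` and `distill` follow the usual IDA description; the "model" here is just a lookup table standing in for a trained network, and the recording step stands in for gradient descent.

```python
# Toy sketch of Iterated Distillation and Amplification (IDA).
# Assumption: a lookup-table "model" and exact recording stand in for
# a neural network and its training procedure.

def amplify(model, n):
    """The human+model team: decompose the task, consulting the model
    on subtasks where it already has an answer."""
    if n <= 1:
        return 1
    sub = model.get(n - 1) or amplify(model, n - 1)  # use model, else recurse
    return n * sub

def distill(model, tasks):
    """'Training': record the amplified system's answers so the next
    model can produce them directly in one step."""
    return {n: amplify(model, n) for n in tasks}

model = {}
for _ in range(3):              # a few rounds of amplify-then-distill
    model = distill(model, range(1, 8))

print(model[7])  # the distilled model now answers 7! directly
```

Each round, the amplified system is slightly more capable than the bare model, and distillation compresses that capability back into the model—all describable without anthropomorphic language.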
Re the sense of meaning point, I don't think that's been my personal experience—I switched into CS from biology partly because of concern about x-risk, and know various other people who switched fields from physics, music, maths and medicine. As far as I can tell, the arguments for AI safety still mostly hold up now that I know more about the relevant fields, and I don't think I've noticed egregious errors in major papers. I've definitely noticed some people who advocate for the importance of AI safety making mistakes and being confused about CS/ML fundamentals, but I don't think I've seen this from serious AI safety researchers.
Re anchoring, this seems like a very strong claim. I think a sensible baseline to take here would be expert surveys, which usually put several percent probability on HLMI being catastrophically bad. (e.g. https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/#Chance_that_the_intelligence_explosion_argument_is_about_right)
I'd be curious whether you have an explanation for why your numbers are so far from expert estimates. I don't think these expert surveys are a reliable source of truth, just a good ballpark for what orders of magnitude we should be considering.