AI safety is largely about ensuring that humanity can reap the benefits of AI in the long term. To effectively address the risks of AI, it’s useful to keep in mind what we haven’t yet figured out.
I am exploring the implications of our current situation and the best ways to contribute to the positive development of AI, and I am eager to hear your perspective on the gaps we have not yet addressed. Here is my quick take on what we don’t seem to have figured out yet:
We have not figured out how to solve the alignment problem. We hope it is solvable, but we don’t know whether it is solvable at all.
We don’t know the exact timelines (I define ‘timelines’ here as the moment when an AI system becomes capable of recursively self-improving). Estimates range from “it has already happened” to 100 years or more.
We don’t know what takeoff will look like once we develop AGI.
We don’t know how likely it is that AI will become uncontrollable, and if it does become uncontrollable, how likely it is to cause human extinction.
We haven’t figured out the most effective ways to govern and regulate AI development and deployment, especially at an international level.
We don’t know how likely it is that rogue actors will use sophisticated open-source AI to cause large-scale harm to the world.
I think it is fair to say “we have not figured X out” when there is no consensus on X. People in the community give very different probability estimates for each of these, spanning the whole range.
Do you disagree with any of these points? What other points might we want to add to the list?
I hope to read your take!
I agree that none of these seem figured out (there is no broad consensus, and I personally am not hugely confident either way).
Some notes
We have not figured out how to solve the alignment problem
It seems useful to distinguish the problem of alignment from the problem of ensuring that a given AI is safe and useful. It also seems worth distinguishing the safety issues posed by wildly superhuman AI from those posed by the first AIs that are transformatively useful.
It seems plausible to me that you can adequately control and utilize transformatively useful (but not wildly superhuman) AIs even if these AIs are hugely misaligned (e.g., deceptively aligned). See here for a bit more discussion. By transformatively useful, I mean AIs capable of radically accelerating R&D on key topics like AI safety (e.g., a 30x speedup). It’s not clear that using these AIs to speed up cognitive work will suffice to solve the problems that come after, but it at least seems relevant.
We don’t know the exact timelines (I define ‘timelines’ here as the moment when an AI system becomes capable of recursively self-improving). Estimates range from “it has already happened” to 100 years or more.
I think publicly known AI is already capable of recursively self-improving by contributing to ordinary ML research, so the only open question is a quantitative one: how quickly. For that reason, I would use a different operationalization of timelines. See here for more discussion.
(As for “already happened”: it seems very unlikely to me that there are non-public AI systems much more capable than current publicly known ones, but much more capable systems might be trained over the next year.)
Ryan, thank you for your thoughts! I hadn’t considered the distinctions you brought up, so I am going to look at the articles you linked in your reply. If I have more to add on this point, I’ll post it here. There is a lot of work ahead to figure out these important things. I hope we have enough time.
Would you consider spending 2 minutes adding your ideas? - Creating a comprehensive overview of AI x-risk reduction strategies
------
Motivation: To identify the highest-impact strategies for reducing existential risk from AI, it’s important to know what options are available in the first place.
I’ve just started creating an overview and would love for you to take a moment to contribute and build on it with the rest of us!
Here is the work page: https://workflowy.com/s/making-sense-of-ai-x/NR0a6o7H79CQpLYw
Some thoughts on how we collaborate:
Please don’t delete others’ bullet points; instead, use the comment feature to suggest changes or improvements.
If you’re interested in discussing this further, feel free to add your name and contact details here. I may organize a follow-up discussion.