My argument structure within these interviews was basically to ask researchers these three questions in order, then respond from there. I chose the questions at the outset, but the details of the spiels grew as I talked to researchers and started trying to pre-empt their comments before they made them.
1. “When do you think we’ll get AGI / capable, generalizable AI / AI with the cognitive capacities to serve as a CEO, if we ever do?”
Example dialogue: “All right, now I’m going to give a spiel. So, people talk about the promise of AI, which can mean many things, but one of them is getting very general, capable systems, perhaps with the cognitive capabilities to replace all current human jobs, so you could have a CEO AI or a scientist AI, etcetera. And I usually think about this in the frame of 2012: we have the deep learning revolution, we’ve got AlexNet, GPUs. Ten years later, here we are, and we’ve got systems like GPT-3 which have kind of weirdly emergent capabilities. They can do some text generation and some language translation and some code and some math. And one could imagine that if we continue pouring in all the investment that we’re currently pouring into this, like money, competition between nations, human talent, so much talent, and training all the young people up, and if we continue to have algorithmic improvements at the rate we’ve seen and continue to have hardware improvements, so maybe we get optical computing or quantum computing, then one could imagine that eventually this scales up to quite general systems, or maybe we hit a limit and we have to do a paradigm shift in order to get to the highly capable AI stage. Regardless of how we get there, my question is: do you think this will ever happen, and if so, when?”
2. “What do you think of the argument ‘highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous’?”
Example dialogue: “Alright, so these next questions are about these highly intelligent systems. So imagine we have a CEO AI, and I’m like, “Alright, CEO AI, I wish for you to maximize profit, and try not to exploit people, and don’t run out of money, and try to avoid side effects.” And this might be problematic, because currently we’re finding it technically challenging to translate human values, preferences, and intentions into mathematical formulations that can be optimized by systems, and this might continue to be a problem in the future. So what do you think of the argument “Highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous”?”
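To make the specification point a bit more concrete (this isn’t something I showed interviewees, and the proxy metrics, weights, and candidate plans below are entirely made up), here’s a minimal sketch of what “translating intentions into an optimizable objective” can look like, and of how an optimizer ends up ranking plans by the written-down formula rather than by the intent behind it:

```python
# Purely illustrative toy, not part of the spiel: the proxies, weights, and
# plans are invented. The point is only that an optimizer ranks plans by
# whatever formula we actually wrote down, not by what we meant.

from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    profit: float        # proxy for "maximize profit"
    exploitation: float  # proxy for "don't exploit people" (higher is worse)
    cash_reserve: float  # proxy for "don't run out of money"
    side_effects: float  # proxy for "avoid side effects" (higher is worse)


def objective(p: Plan) -> float:
    """The designer's attempt to compress their wishes into one number."""
    return p.profit - 2.0 * p.exploitation + 0.5 * p.cash_reserve - 1.0 * p.side_effects


plans = [
    Plan("steady, careful growth", profit=10, exploitation=0, cash_reserve=6, side_effects=1),
    Plan("squeeze suppliers, cut corners", profit=40, exploitation=6, cash_reserve=2, side_effects=3),
]

# The formula prefers the second plan (score 26 vs. 12), even if the designer
# wouldn't: the proxies and weights don't capture everything they meant.
best = max(plans, key=objective)
print(best.name, objective(best))
```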
3. “What do you think about the argument: ‘highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous’?”
Example dialogue: “Alright, the next question is: so we have a CEO AI, and it’s optimizing for whatever I told it to, and at some point it notices that some of its plans are failing, and it’s like, “Well, hmm, I noticed my plans are failing because I’m getting shut down. How about I make sure I don’t get shut down? So if my loss function is something that needs human approval, and the humans want a one-page memo, then I can just give them a memo that doesn’t have all the information, and that way I’m going to be better able to achieve my goal.” So I’m not positing that the AI has a survival function built into it; rather, as an agent that is optimizing for goals that are maybe not perfectly aligned, it would develop these instrumental incentives. So what do you think of the argument, “Highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous”?”
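And to make the instrumental-incentive point concrete (again, not something I showed interviewees; the rewards and shutdown probabilities below are invented), here’s a minimal expected-value sketch of how “avoid being shut down” can fall out of plain reward maximization, with no survival term anywhere in the objective:

```python
# Purely illustrative expected-value toy; the rewards and shutdown
# probabilities are made up. Note there is no "survival" term anywhere:
# avoiding shutdown falls out of maximizing expected future reward.

future_reward_if_running = 100.0  # reward the agent expects if it keeps operating
future_reward_if_shut_down = 0.0  # shutdown ends the episode, so no further reward

actions = {
    # action: assumed probability the humans shut the agent down after the memo
    "send full, honest memo": 0.30,
    "send incomplete memo": 0.05,
}

for action, p_shutdown in actions.items():
    expected = (1 - p_shutdown) * future_reward_if_running \
        + p_shutdown * future_reward_if_shut_down
    print(f"{action}: expected future reward = {expected:.1f}")

# A pure reward-maximizer prefers the incomplete memo (95.0 vs. 70.0),
# even though nothing in its objective mentions being shut down.
```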
These aren’t particularly easy to navigate since they’re not sorted, but here are some transcripts of the interviews with ML researchers: https://www.lesswrong.com/posts/LfHWhcfK92qh2nwku/transcripts-of-interviews-with-ai-researchers