Thank you Quintin, this was very helpful for me as a non-ML person to understand the other side of Eliezer’s arguments. As your post is quite dense and it took me a while to work through it, I summarised it for myself. I occasionally had to check the context of the original interview (transcript here) to fully parse the arguments made. I thought the summary might also be helpful to share with others (and let me know if I got anything wrong!):
Eliezer thinks current ML approaches won’t scale to AGI, though the influx of money might lead to an approach being found. Quintin is more optimistic that current ML approaches can scale to AGI. As current alignment techniques are focused on current ML approaches, they won’t help if something different gets us to AGI. Current ML capability improvements usually integrate well with previously used alignment approaches, which suggests they will keep doing so.
Eliezer is concerned that AI will show more ‘truly general’ intelligence. Humans are not equally general at different tasks, as evolution made them specialize in what was important in the ancestral environment, so AI might outclass humans in other tasks. Quintin points out that the learning process humans have been given by evolution is pretty general (albeit biased towards what was useful in the ancestral environment), just as the learning process current ML paradigms use is pretty general. ML systems actually differ not in the paradigms they use but in the data they are trained on. Therefore he doesn’t expect such a pattern. He also points out that scale is what makes humans smarter, just as scale is a big driver of how good ML systems are. Humans are not any more constrained by their architecture than ML systems; both can modify themselves to an extent.
Eliezer considers a superintelligence to be something that can beat all humans at all tasks. Quintin finds this too high a bar, as you can have transformative systems which still have deficits.
Eliezer points out that mindspace is large and humans occupy a tiny corner; as such, we should expect many different potential AI designs, which poses a danger. Quintin thinks we should expect AI systems to occupy only a small corner of mindspace, similar to humans. An intuition pump for this is that most real-life data in high-dimensional spaces actually only occupies a small part of those spaces. Again, so far in practice ML systems are using pretty similar processes to humans. They will also be trained on data similar to the data humans are “trained on”, as ML systems are mostly trained on human-written text, which makes them more similar to humans as well.
Eliezer thinks it’s hard to align AIs not only to human values, but even to much simpler goals like duplicating a strawberry. Quintin again thinks this isn’t actually all that hard in principle, but requires starting out with an AI with more general goals which would then be modified to aim for strawberry duplication. He points out that human value formation follows more general and multiple goals than something as single-minded as strawberry duplication, so we should allow ML systems to follow such a process of value formation. This will also be a lot easier, as ML systems can follow actual examples of such value formation processes in the data, and there is a lot more data on humans following complex goals than single-minded ones.
Eliezer thinks that we won’t be able to align AIs by merely using gradient descent. This is because the primary example of using gradient descent to align a system is evolution, and we know that evolution failed to align humans to pursue inclusive genetic fitness in the modern environment. In the ancestral environment, e.g. desiring sex was sufficient, but now humans have figured out contraception. People do not desire to maximise their inclusive genetic fitness for its own sake. Quintin thinks this is because ancestral humans didn’t have a concept of inclusive genetic fitness, therefore evolution couldn’t optimise its rewards for improving inclusive genetic fitness directly. Modern AI systems, however, will have an understanding of human values as they are directly exposed to them during training.
Eliezer makes the same point about humans desiring ice cream. Quintin counters again that there was no ice cream in the ancestral environment, therefore evolution couldn’t punish humans for desiring ice cream. Modern ML researchers, however, can punish ML systems for doing things they aren’t supposed to do, i.e. things which are misaligned with human values.
Eliezer thinks aligning AI with gradient descent will be even harder than it was for evolution to align humans with natural selection, as gradient descent is blunter and less simple. Quintin isn’t convinced by this and also points out that evolution was optimising over the learning process via the human genome, which is a lot messier due to its indirectness, while ML researchers are training the whole ML system directly. Therefore a comparison doesn’t make much sense.
Eliezer is worried that ML systems trained to predict e.g. human preferences will look for opportunities to make predictions easier. Quintin thinks ML systems aren’t optimising to do well at long-term prediction by making things easier to predict: predicting things is something that ML systems do, not what they want to do. He compares this to humans, who also don’t explicitly prioritise e.g. seeing very well in the long term.
Eliezer considers it important to employ a ‘security mindset’, a term from computer security, for AI alignment. Ordinary paranoia is insufficient for keeping a system secure; some deeper skills are required. Quintin thinks ML is unlike computer security, as most fields are unlike computer security, and we don’t use a security mindset for most fields, including childrearing, which to him seems like an important analogue to training ML systems. This is because ML systems during the training process don’t have adversaries to the same extent as computer systems. They might have adversarial users during deployment, but ML systems themselves aren’t keen to be jailbroken. He also uses the opportunity to point out that Eliezer often compares AI to other fields like rocket science, but ML often works in a pretty different way to other fields, e.g. swapping individual components of ML systems often doesn’t change their functionality, while changing rocket components would make rockets fail.
Eliezer is concerned that AI optimists haven’t encountered real difficulties yet and that’s why they’re optimistic, the same way that attendees of the original AI conference in the 50s thought problems could be solved in two months which then took 70 years to solve. Quintin counters that there were plenty of ML problems which were easier than expected, and most notably easier than Eliezer and AI field veterans who have been working on AI since the early days predicted. Neither Eliezer nor the AI veterans expected neural networks to work as well as they do today. He mentions that Eliezer also stated in a different venue that he didn’t expect generative adversarial networks to work right away, yet they did. He expects the hardness of ML research to predict the hardness of ML alignment research, and thinks that since Eliezer seems to be poorly calibrated on the former, he will also be poorly calibrated on the latter.
Eliezer expects that for AI alignment to go well he will have to be wrong about aspects of AI alignment, but he expects that wherever he is mistaken, this will make AI alignment even harder than he already thinks it is, as it would be really surprising if a new engineering project turned out to be easier than you think it is. Quintin strongly disagrees with this framing, because if Eliezer is wrong about how hard alignment is, he should expect alignment to be easier than he previously thought.
Eliezer points to how fast AI progress was in the game of Go as a reason for concern that superintelligent AI will suddenly kill humans without killing a somewhat smaller number of humans in advance. Quintin thinks that Go is disanalogous to a more general AI system, as progress in more general systems is usually slower and smoother. Go also had a single objective function the AI could use to score itself, which will not be true for many other tasks; those will require human input, slowing down improvements.
Eliezer is even more concerned about AI systems which can self-improve and get smarter during inference (deployment), getting us to a fast takeoff. Quintin counters that we basically already have that. ChatGPT could train on user input, but it’s not programmed to as it wouldn’t be practical. ML training processes could also be changed so they could be reasonably said to self-improve during inference, as inference is also a part of training.
Eliezer thinks that people who are capable of breaking AI systems show more AI expertise than people who are merely creating functional AI systems, which is how it works in computer security. This is related to the security mindset claim above. Maybe they’d be able to find ways to improve AI alignment. Quintin thinks the people who break things in computer security are only experts there because in computer security there are clear signs of whether the system is broken or not, which isn’t true for AI alignment. He discusses an example where Eliezer thinks an ML system is easily breakable as the ML system will try to maximise the reward function, but Quintin thinks that simply maximising the reward function isn’t how realistic ML systems work. He discusses another example where he thinks ML systems are not easily broken.
Overall my take:
Eliezer is concerned about AI that doesn’t look like modern ML systems. Quintin argues modern ML systems don’t show the properties that Eliezer is concerned about more advanced AI showing. Quintin thinks that more advanced ML systems can already be real AGI. What I am confused about is why Eliezer is then so worried about the current state of AI if the thing he is worried about is so much more advanced/general in mindspace, or more specifically, why he considers current ML systems to be evidence that we are getting closer to the kind of AI he is worried about.