This post reads a little like someone pushing at an open door to me. So you write that FTX should ask themselves whether humanity should create AGI. The feeling I get from that is that you think FTX assume that AGI will be good. But the reason they've announced the contest is that they think the development of AGI carries a serious risk of global catastrophe.
Two of the propositions focus on when AGI will arrive. This makes it seem like AGI is a natural event, like an asteroid strike or earthquake. But AGI is something we will create, if we create it.
There are immense (economic and other) incentives to build AGI, so while humanity can in principle choose not to build AGI, FTX (or any other single actor) is not in a position to make that choice on its own. I expect FTX is open to considering interventions aimed at making that collective choice happen (not least as there's been some discussion recently on whether to try to slow down AI progress). But whether those would work at all is not obvious.
How would we know we had successful AGI if/when we created it? It would be nothing like human intelligence, which is shaped not only by information processing, but by embodiment and the emotions central to human existence. … So AGI cannot be like human intelligence.
As far as I'm aware, writers on AGI risk have been clear from the beginning that there's no reason to expect an AGI to take the same form as a human mind (unless it's the result of whole-brain emulation). E.g. Bostrom roughly defines superintelligence as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest" (Superintelligence ch. 2). There's no reason to think a highly capable but alien-to-us intelligence poses less of a threat than one that's similar to us.
AGI might be helpful for thinking about complex human problems, but it is doubtful that it would be better than task-specific AI. Task-specific AI has already proven successful at useful, difficult jobs (such as cancer screening for tissue samples and hypothesizing protein folding structures). Part of what has enabled such successful applications is the task specificity. That allows for clear success/fail criteria for training and for ongoing evaluation.
There are advantages to generality too, like reducing the need for task-specific data. There's at least one example of a general intelligence being extremely successful, and that is our own, as evidenced by the last few billion years of evolutionary history. An example of fairly successful general-ish AI is GPT-3, which was just trained on next-word prediction but ended up being capable of everything from translation and spell-checking to creative writing and chess-playing.
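To make "just trained on next-word prediction" concrete, here is a toy sketch in Python. It is only my own illustration: a bigram word counter over a made-up two-sentence corpus, nothing like GPT-3's actual transformer or training data. The point is simply that the training objective is nothing more than a probability distribution over the next word, and generation falls out of that same objective.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model is trained on vast amounts of text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training" here is just counting which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_distribution(prev):
    """The whole objective: a probability distribution over the next word."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

print(next_word_distribution("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}

# Generation is nothing extra: repeatedly ask for a likely next word.
word, output = "the", ["the"]
for _ in range(5):
    dist = next_word_distribution(word)
    word = max(dist, key=dist.get)
    output.append(word)
print(" ".join(output))  # "the cat sat on the cat"
```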
Let me then be more specific. Take Bostrom's definition. What are all the cognitive tasks in all the domains of human interest? I think this is super vague and ill-defined. We can train up AI on specific tasks we want to accomplish (and have ways of training AI to do so, because success or failure can be made clear). But there is no "all cognitive tasks in the domains of human interest" training because we have no such list, and for some crucial tasks (e.g. ethics) we cannot even define success clearly.
GPT-3 is impressive for writing and other AI is impressive for image production, but such systems also produce amusingly wrong outputs at times. What is more impressive is useful AI for tasks like the protein folding success.
Our success has been evolutionarily framed (survival, spread of the species) and tested against a punishing world. But AGI will not be embodied. So what counts as success or failure for non-task-specific AI? Going back to Bostrom's definition, we have no such set of cognitive tasks defined or delineated.
So what are the incentives for such a creation? You say they are immense; I want to press on whether they really are that substantial.
Thanks for responding! I think I now understand better what you're getting at, though I'm still a bit unsure about how much work each of these beliefs is doing:
1. We shouldn't build AGI.
2. We can't build AGI (because there's no coherent reward function we can give it, since many of the tasks it'd have to do have fuzzy success criteria).
3. We won't build AGI (because the incentives mean narrow AI will be far more useful).
Could you clarify whether you agree with these and how important you think each point is? Or is it something else entirely that's key?
I think we could try to build AGI, but I am skeptical it could be anything useful or helpful (a broad alignment problem) because of vague or inapt success criteria, and because of the lack of embodiment of AGI (so it won't get beat up on by the world generally or have emotional/affective learning). Because of these problems, I think we shouldn't try (1).
Further, I am trying this line of argument out to see if it will encourage (3) (not building AGI), because these concerns cast doubt on the value of AGI to us (and thus the incentives to build it).
This takes on additional potency if we embrace the shift to thinking about "should" and not just "can" in scientific and technological development generally. So that brings us to the question I think we should be asking, which is how to encourage a properly responsible approach to AI, rather than shifting credences on the Future Fund's propositions. Does that make sense?
I think we could try to build AGI, but I am skeptical it could be anything useful or helpful (a broad alignment problem) because of vague or inapt success criteria, and because of the lack of embodiment of AGI (so it won't get beat up on by the world generally or have emotional/affective learning). Because of these problems, I think we shouldn't try (1).
Hmm, I guess I don't think lack of emotional/affective states is a problem for making useful AGIs. Obviously those are parts of how humans learn, but it seems like a machine can learn with any reward function: it just needs some way of mapping a world state to value.
Re success criteria, you could for example train an AI to improve a company's profit in a simulated environment. That task requires a broad set of capacities, including high-level ones like planning/strategising. If you do this for many things humans care about, you'll get a more general system, as with DeepMind's GATO. But of course I'm speculating.
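To make that a bit more concrete, here is a toy sketch. It is entirely my own illustration: the two "tasks", the tabular learner and every name in it are made-up stand-ins, not anything DeepMind or anyone else has published. It shows (a) that a learner only needs some mapping from world state to a scalar reward, no emotions required, and (b) that interleaving training across several such reward functions gives one policy that handles all of them, which is the sense of "more general" I have in mind.

```python
import random

random.seed(0)  # for reproducibility of this toy run

# (a) A "reward function" is just a mapping from world state to a scalar value.
# Two hypothetical tasks, each defined by nothing more than such a mapping.
tasks = {
    "profit": lambda state: float(state),        # reward: push the simulated profit up
    "keep_near_zero": lambda state: -abs(state), # reward: keep some quantity close to zero
}

def toy_env_step(obs, action):
    """A deliberately dumb simulated environment: action 1 nudges the state up, action 0 down."""
    return obs + (1 if action == 1 else -1)

# (b) One shared learner across all tasks: a table of estimated immediate reward
# for taking `action` in state `obs` on `task` (a bandit-style update, not full RL).
values = {}

def act(task, obs, epsilon=0.1):
    """Pick the action with the highest estimated value, exploring occasionally."""
    if random.random() < epsilon:
        return random.choice([0, 1])
    return max([0, 1], key=lambda a: values.get((task, obs, a), 0.0))

for episode in range(500):
    task_name, reward_fn = random.choice(list(tasks.items()))
    obs = 0
    for _ in range(10):
        action = act(task_name, obs)
        next_obs = toy_env_step(obs, action)
        reward = reward_fn(next_obs)
        key = (task_name, obs, action)
        values[key] = values.get(key, 0.0) + 0.1 * (reward - values.get(key, 0.0))
        obs = next_obs

# The same learner now behaves differently depending on which reward function is active:
print(act("profit", 0, epsilon=0.0))          # typically 1: learned to push the state up
print(act("keep_near_zero", 1, epsilon=0.0))  # typically 0: learned to step back toward zero
```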
Further, I am trying this line of argument out to see if it will encourage (3) (not building AGI), because these concerns cast doubt on the value of AGI to us (and thus the incentives to build it).
I suppose if you don't think there's any value for us in AGI, and if you don't think there are sufficient incentives for us to build it, there's no need to encourage not building it? Or is your concern more that we're wasting energy and resources trying to build it, or even thinking about it?
This takes on additional potency if we embrace the shift to thinking about "should" and not just "can" in scientific and technological development generally. So that brings us to the question I think we should be asking, which is how to encourage a properly responsible approach to AI, rather than shifting credences on the Future Fund's propositions.
The first proposition ("Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI") seems directly linked to whether or not we should build AGI. If AGI carries a serious risk of catastrophe, we obviously shouldn't build it. So to me it looks like the Future Fund is already thinking about the "should" question?
There are two ways to plausibly embody AGI:
1. As a supervisor of a dumb robot body: the AGI remotely controls the robot body, processing a portion or all of the robot's sensor data.
2. As a host: the AGI's hardware is resident in the robot body itself.
I can plausibly see such sensors for physical pain but not for emotional pain. Emotional pain is the far more potent teacher of what is valuable and what is not, what is important and what is not. Intelligence needs direction of this sort for learning.
So, can you build embodied AGI with emotional responses built in, responses that last like emotions and so can teach the way emotions do? Building empathy (for both happiness and suffering) and the pain of disapproval into AGI would be crucial.