I think the “real” orthogonality thesis is what you call the motte. I don’t think the orthogonality thesis by itself proves “alignment is hard”; rather you need additional arguments (things like Goodhart’s law, instrumental convergence, arguments about inner misalignment, etc.).
I don’t want to say that nobody has ever made the argument “orthogonality, therefore alignment is hard”—people say all kinds of things, especially non-experts—but it’s a wrong argument and I think you’re overstating how popular it is among experts.
Armstrong initially states that he’s arguing for the thesis that ‘high-intelligence agents can exist having more or less any final goals’ (i.e. theoretical possibility), but then adds that he will ‘be looking at proving the … still weaker thesis [that] the fact of being of high intelligence provides extremely little constraint on what final goals an agent could have’. I think Armstrong meant this as ‘there are very few impossible pairings of high intelligence and motivation’, but it much more naturally reads to me as ‘high intelligence is almost equally likely to be paired with any set of motivations as with any other’.
I think the last part of this excerpt (“almost equally”) is unfair. I mean, maybe some readers are interpreting it that way, but if so, I claim that those readers don’t know what the word “constraint” means. Right?
I posted one poll asking ‘what the orthogonality thesis implies about [a relationship between] intelligence and terminal goals’, in which 14 of 16 respondents selected the option ‘there is no relationship or only an extremely weak relationship between intelligence and goals’, but someone pointed out that respondents might have interpreted ‘no relationship’ as ‘no strict logical implication from one to the other’. The other options hopefully gave context, but in a differently worded version of the poll 10 of 13 people picked options describing theoretical possibility.
I think the key reason that knowledgeable optimistic people are optimistic is the fact that humans will be trying to make aligned AGIs. But neither of the polls mentions that. The statement “There is no statistical relationship between intelligence and goals” is very different from “An AGI created by human programmers will have a uniformly-randomly-selected goal”; I subscribe to (something like) the former (in the sense of selecting from “the space of all possible intelligent algorithms” or something), but I put much lower probability on (something like) the latter, despite being pessimistic about AGI doom. Human programmers are not uniformly-randomly sampling the space of all possible intelligent algorithms (I sure hope!).
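To make that distinction concrete, here is a toy sketch with entirely made-up goal labels and weights (they are placeholders, not claims about real systems): it contrasts drawing a goal uniformly at random from the space of intelligent algorithms with drawing from a distribution skewed by what programmers are trying to build.

```python
import random
from collections import Counter

# Hypothetical goal labels; placeholders, not claims about real AI systems.
goal_labels = ["intended goal", "proxy goal", "weird goal", "paperclips"]

def uniform_sample():
    # "Uniformly-randomly-selected goal": every label equally likely.
    return random.choice(goal_labels)

def programmer_built_sample():
    # Programmers preferentially build systems aimed at goals they want;
    # the weights are assumptions chosen only to make the skew visible.
    return random.choices(goal_labels, weights=[0.7, 0.2, 0.07, 0.03])[0]

print(Counter(uniform_sample() for _ in range(10_000)))
print(Counter(programmer_built_sample() for _ in range(10_000)))
```

The first distribution is what the “uniformly-randomly-selected goal” claim would look like; the second is the kind of skew the “humans will be trying to make aligned AGIs” point is gesturing at.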
Hi Steven,
To clarify, I make no claims about what experts think. I would be moderately surprised if more than a small minority of them pay any attention to the orthogonality thesis, presumably having their own nuanced views on how AI development might pan out. My concern is with the non-experts who make up the supermajority of the EA community: people who frequently decide whether to donate their money to AI research vs other causes, who are prioritising deeper dives, who in some cases decide whether to make grants, who are deciding whether to become experts, or who are building communities that informally or formally give careers advice to others, and who generally contribute to the picture of ‘what effective altruism is about’, both steering the culture and informing the broader public’s perception.
I think the last part of this excerpt (“almost equally”) is unfair. I mean, maybe some readers are interpreting it that way, but if so, I claim that those readers don’t know what the word “constraint” means. Right?
See above: it doesn’t really matter that they don’t know what the word means if their reading of it still affects their decisions. I’m not accusing anyone of wrongdoing, just offering an explanation for why many EAs seem to have come to believe the stronger versions of the thesis, and trying to discourage them from holding them without decent evidence.
I subscribe to (something like) [no statistical relationship] (in the sense of selecting from “the space of all possible intelligent algorithms” or something)
I don’t know how to understand ‘the space of all possible intelligent algorithms’ as a statistical relationship without imagining it populated with actual instances. If I said ‘there’s a statistical relationship between age and death’, I think most people would understand my claim as being true, despite the huge space of all possible old people.
My perspective is “orthogonality thesis is one little ingredient of an argument that AGI safety is an important cause area”. One possible different perspective is “orthogonality thesis is the reason why AGI safety is an important cause area”. Your belief is that a lot of non-experts hold the latter perspective, right? If so, I’m skeptical.
I think I’m reasonably familiar with popular expositions of the case for AGI safety, and with what people inside and outside the field say about why or why not to work on AGI safety. And I haven’t come across “orthogonality thesis is the reason why AGI safety is an important cause area” as a common opinion, or even a rare opinion, as far as I can recall.
For example, Brian Christian, Stuart Russell, and Nick Bostrom all talk about Goodhart’s law and/or instrumental convergence in addition to (or instead of) orthogonality, Sam Harris talks about arms races and fragility-of-value, Ajeya Cotra talks about inner misalignment, Rob Miles talks about all of the above, Toby Ord uses the “second species argument”, etc. People way outside the field don’t talk about “orthogonality thesis” because they’ve never heard of it.
So if lots of people are saying “orthogonality thesis is the reason why AGI safety is an important cause area”, I don’t know where they would have gotten that idea, and I remain skeptical that this is actually the case.
I don’t know how to understand ‘the space of all possible intelligent algorithms’ as a statistical relationship without imagining it populated with actual instances.
My main claim here is that asking random EA people about the properties of “intelligence” (in the abstract) is different from asking them about the properties of “intelligent algorithms that will actually be created by future AI programmers”. I suspect that most people would feel that these are two different things, and correspondingly give different answers to questions depending on which one you ask about. (This could be tested, of course.)
A separate question is how random EA people conceptualize “intelligence” (in the abstract). I suspect “lots of different ways”, and those ways might be more or less coherent. For example, one coherent possibility is to consider the set of all 2^8000000 possible 1-megabyte source code algorithms, then select the subset that is “intelligent” (operationalized somehow), and then start talking about the properties of algorithms in that set.
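If it helps to make that framing concrete, here is a toy sketch of it, shrunk to a 16-bit “program” space (the 2^8000000 one-megabyte programs obviously can’t be enumerated) and with placeholder stand-ins for “is intelligent” and for a program’s “goal”; both predicates are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product

# Miniature stand-in for "all possible source code algorithms": 16-bit strings
# instead of 1-megabyte (8,000,000-bit) programs, which cannot be enumerated.
PROGRAM_LENGTH_BITS = 16

def is_intelligent(program):
    # Placeholder operationalization of "intelligent"; how to operationalize
    # this for real is exactly what the comment above leaves open.
    return sum(program) >= PROGRAM_LENGTH_BITS // 2

def goal_of(program):
    # Placeholder mapping from a program to one of four coarse "goal" labels.
    return 2 * program[0] + program[1]

all_programs = list(product([0, 1], repeat=PROGRAM_LENGTH_BITS))
intelligent_subset = [p for p in all_programs if is_intelligent(p)]

# A "statistical relationship between intelligence and goals" in this framing:
# compare the distribution of goals in the intelligent subset vs the whole space.
print(Counter(goal_of(p) for p in all_programs))
print(Counter(goal_of(p) for p in intelligent_subset))
```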
(Disclaimer: I talked to Sasha before he put up this post.) But as a ‘random EA person’, I did find reading this clarifying.
It’s not that I believed that “orthogonality thesis is the reason why AGI safety is an important cause area”, but that I had never thought about the distinction between “no known law relating intelligence and motivations” and “near-0 statistical correlation between intelligence and motivations”.
If I’d otherwise been prompted to think about it, I’d probably have arrived at the former, but I think the latter was rattling around inside my system 1 because the term “orthogonality” brings to mind orthogonal vectors.
See the example from Yudkowsky above. As I understand it, he is the main person who has encouraged rationalists to focus on AI. In trying to explain why AI is important to a smart person (Bryan Caplan), he appeals to the orthogonality argument, which has zero bearing on whether AI alignment will be hard or worth working on.
The Orthogonality Thesis is useful to counter the common naive intuition that sufficiently intelligent AI will be benevolent by default (which a lot of smart people tend to hold prior to examining the arguments in any detail). But as Steven notes above, it’s only one component of the argument for taking AGI x-risk seriously, and Yudkowsky lists several others in that example. He leads with orthogonality to prime the pump, i.e. to emphasise that common human intuitions aren’t useful here.
Hi Greg, I don’t think anyone would ever have held that it is logically impossible for AGI not to be aligned. That is clearly a crazy view. All that orthogonality argument proves is that it is logically possible for AGI not to be aligned, which is almost trivial.
Right, but I think “by default” is important here. Many more people seem to think alignment will happen by default (or at least something along the lines of us being able to muddle through, reasoning with the AI and convincing it to be good, or easily shutting it down if it’s not, or something), rather than the opposite.
All the argument shows is that it is logically possible for AGI not to be aligned. Since Bryan Caplan is a sane human being, it’s improbable that he would ever have denied that claim. So, it’s unclear why Yudkowsky would have presented it to him as an important argument about AGI alignment.
So the last thing Caplan says there is:
“1′. AIs have a non-trivial chance of being dangerously un-nice.
I do find this plausible, though only because many governments will create un-nice AIs on purpose.”
Which to me sounds like he doesn’t really get it. Like he’s ignoring “by default does things we regard as harmful” (which he kind of agrees with above; he agrees with “2. Instrumental convergence”). You’re right that the Orthogonality Thesis doesn’t carry the argument on its own, but in conjunction with Instrumental Convergence (and, to be more complete, mesa-optimisation), I think it does.
It’s a shame that Caplan doesn’t reply to Yudkowsky’s follow-up:
Bryan, would you say that you’re not worried about 1′ because:
1’a: You don’t think a paperclip maximizer is un-nice enough to be dangerous, even if it’s smarter than us.
1’b: You don’t think a paperclip maximizer of around human intelligence is un-nice enough to be dangerous, and you don’t foresee paperclip maximizers becoming much smarter than humans.
1’c: You don’t think that AGIs as un-nice as a paperclip maximizer are probable, unless those durned governments create AGIs that un-nice on purpose.
It’s tricky to see what happened in that debate because I have Twitter and that blog blocked on weekdays!
I just posted a reply to a similar comment about orthogonality + IC here.
‘By default’ seems like another murky term. The orthogonality thesis asserts (something like) that alignment isn’t something you should bet on at arbitrarily long odds, but maybe it’s nonetheless very likely to work out because, per Drexler, we just don’t code AI as an unbounded optimiser, which you might still call ‘by default’.
At the moment I have no idea what to think, tbh. But I lean towards focusing on GCRs that definitely need direct action in the short term, such as climate change, over ones that might be more destructive but where the relevant direct action is likely to be taken much further off.
So by ‘by default’ I mean without any concerted effort to address existential risk from AI, or just following “business as usual” with AI development. Yes, Drexler’s CAIS would be an example of this. But I’d argue that “just don’t code AI as an unbounded optimiser” is very likely to fail due to mesa-optimisers and convergent instrumental goals emerging in sufficiently powerful systems.
Interesting you mention climate change, as I actually went from focusing on that pre-EA to now thinking that AGI is a much more severe, and more immediate, threat! (Although I also remain interested in other more “mundane” GCRs.)
As a singular data point, I’ll submit that until reading this article, I was under the impression that the Orthogonality Thesis is the main reason why researchers are concerned.
Agreed! Some evidence of that in my comment.
I don’t know how to understand ‘the space of all possible intelligent algorithms’ as a statistical relationship without imagining it populated with actual instances
Not my field, but my understanding is that using the uniform prior is pretty normal/common in theoretical CS.
What do you mean by ‘uniform prior’ here?
Even if you think a uniform prior has zero information, which is a disputed position in philosophy, we have lots of information to update with here, e.g. that programmers will want AI systems to have certain motivations, that they won’t want to be killed, etc.
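As a minimal sketch of that updating step (with made-up goal categories and likelihoods, purely for illustration): starting from a uniform prior over which goal a deployed system ends up with, evidence of the kind just mentioned shifts the posterior well away from uniform.

```python
# Bayes' rule on a toy hypothesis space: posterior is proportional to prior * likelihood.
goals = ["intended goal", "proxy of intended goal", "unrelated goal"]
prior = {g: 1 / len(goals) for g in goals}  # uniform prior over goals

# Assumed P(observed development and selection process | deployed system has this goal).
likelihood = {
    "intended goal": 0.6,
    "proxy of intended goal": 0.3,
    "unrelated goal": 0.1,
}

unnormalized = {g: prior[g] * likelihood[g] for g in goals}
total = sum(unnormalized.values())
posterior = {g: p / total for g, p in unnormalized.items()}

print(posterior)  # no longer uniform once the evidence is taken into account
```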