One thing I’m confused about is whether Legg’s definition (or your rephrasing) allows for situations where it’s in principle possible that being smarter is ex ante worse for an agent (obviously ex post it’s possible to follow the correct decision procedure and be unlucky).
There definitely are such cases, e.g. Omega penalises all smart agents. Or environments where there are several crucial considerations which you’re able to identify at different levels of intelligence, so that as intelligence increases, your success goes up and down rather than increasing monotonically.
But in general I agree with your complaint about Legg’s definition being phrased in behavioural terms, and that it’d be better to have a good definition of intelligence in terms of the cognitive processes involved (e.g. planning, abstraction, etc.). I do think that starting off in behaviourist terms was a good move, back when people were much more allergic to talking about AGI/superintelligence. But now that we’re past that point, I think we can do better. (I don’t think I’ve written about this yet in much detail, but it’s quite high on my list of priorities.)
There definitely are such cases … Or environments where there are several crucial considerations which you’re able to identify at different levels of intelligence, so that as intelligence increases, your success goes up and down rather than increasing monotonically.
Sorry, I’m confused about this claim as stated. Assume that all environments have 3 levels of abstraction, which yield this ultimate {action → expected utility} mapping:
A → 10 expected utils
B → −10 expected utils
C → 20 expected utils
It seems to me that by the definition:
Intelligence measures an agent’s ability to achieve goals in a wide range of environments
Then, by that definition, the strategy that outputs C is smarter than the strategy that outputs A, which in turn is smarter than the strategy that outputs B. So B < A < C.
This is true even if, cognitively, the algorithm that outputs B is more sophisticated (e.g. it tracks more crucial considerations, or is literally the same learning algorithm but with more compute) than the one that outputs A. Am I confused here?
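To spell the toy example out in code (a sketch using just the numbers above; the strategy names are made up):

```python
# Toy version of the example above: under the behavioural definition, a
# strategy's "intelligence" is just the expected utility of the action it
# ends up outputting, regardless of how sophisticated its reasoning was.

EXPECTED_UTILS = {"A": 10, "B": -10, "C": 20}

# Hypothetical strategies, named by the action each one outputs.
strategies = {
    "finds_first_consideration": "A",   # shallow reasoning, decent action
    "finds_two_considerations": "B",    # more sophisticated reasoning, worse action
    "finds_all_considerations": "C",    # deepest reasoning, best action
}

# Behavioural ranking: sort strategies by the expected utility of their output.
ranking = sorted(strategies, key=lambda name: EXPECTED_UTILS[strategies[name]])
print(ranking)
# -> ['finds_two_considerations', 'finds_first_consideration', 'finds_all_considerations']
#    i.e. B < A < C, even though the B-strategy reasons "harder" than the A-strategy.
```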
Ah, I see. I thought you meant “situations” as in “individual environments”, but it seems like you meant “situations” as in “possible ways that all environments could be”.
In that case, I think you’re right, but I don’t consider it a problem. Why might it be the case that adding more compute, or more memory, or something like that, would be net negative across all environments? It seems like either we’d have to define the set of environments in a very gerrymandered way, or else there’s something about the change we made that lands us in a valley of bad thinking. In the former case, we should use a wider set of environments; in the latter case, it seems easier to bite the bullet and say “Yeah, turns out that adding more of this usually-valuable trait makes agents less intelligent.”
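To illustrate what I mean by a gerrymandered set of environments (purely hypothetical numbers): whether "more compute" comes out net negative depends entirely on which environments we average over and how heavily each is weighted.

```python
# Hypothetical scores for a baseline agent vs. the same agent with more compute,
# in an ordinary environment and in an Omega-style adversarial one.
per_env_scores = {
    # environment:            (baseline, more_compute)
    "ordinary_planning":      (5.0, 8.0),     # extra compute helps
    "omega_penalises_smarts": (5.0, -20.0),   # extra compute is punished
}

def aggregate(weights):
    """Weighted average score over environments, for both agents."""
    baseline = sum(w * per_env_scores[env][0] for env, w in weights.items())
    more_compute = sum(w * per_env_scores[env][1] for env, w in weights.items())
    return baseline, more_compute

# A broad, mostly-realistic weighting: more compute comes out ahead.
print(aggregate({"ordinary_planning": 0.95, "omega_penalises_smarts": 0.05}))  # (5.0, 6.6)

# A gerrymandered weighting dominated by the adversarial case: it comes out behind.
print(aggregate({"ordinary_planning": 0.2, "omega_penalises_smarts": 0.8}))    # (5.0, -14.4)
```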
Hmm, I’m probably not phrasing this well, but the point I’m trying to get across is that Legg’s definition makes intelligence monotonically good in an in-principle way. I actually agree with you that, empirically, “smarts (as usually defined) → good outcomes” seems like the most natural hypothesis, but I’d have preferred a definition of intelligence that left this open as an empirical question, rather than one that assumes it by definition.
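(If I’m remembering the formalisation right, Legg and Hutter’s measure is roughly

\[
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu},
\]

where \(E\) is the set of environments, \(K(\mu)\) is the complexity of environment \(\mu\), and \(V^{\pi}_{\mu}\) is the expected reward policy \(\pi\) achieves in \(\mu\). Since this is just a weighted sum of performances, “more intelligent” and “higher expected performance over \(E\)” coincide by construction; the only place “smarter but ex ante worse” could show up is in the choice of \(E\) and its weighting.)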
I realize that “empirical hypothesis” is weird because of No Free Lunch, so by “a range of environments” I guess I mean something like “environments that plausibly reflect actual questions that might naturally arise in the real world” (not very well stated).
For example, another thing that I’m sort of interested in is multiagent situations where credibly proving you’re dumber makes you a more trustworthy agent. There, it feels weird for me to claim that the credibly dumber agent is actually, on some deeper level, smarter than the naively smarter agent (whereas an agent smart enough to credibly lie about its dumbness is smarter again on both definitions).
(I don’t think the Omega-hates-smartness example, or for that matter a world where anti-induction is the correct decision procedure, is very interesting, relatively speaking, because such cases feel contrived enough to occupy only a small slice of realistic possibilities.)
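As a toy version of that multiagent case (all payoffs hypothetical): suppose a principal will only delegate a lucrative task to an agent it can verify won’t exploit it.

```python
# Toy trust game with hypothetical payoffs: the principal delegates only to
# agents whose harmlessness it can verify, e.g. because their action set is
# provably too limited to exploit the principal.

DELEGATION_SURPLUS = 10.0  # what the agent earns if the task is delegated to it

def principal_trusts(agent: str) -> bool:
    # The principal can only verify harmlessness for the provably limited agent.
    return agent == "provably_limited"

for agent in ("naively_smart", "provably_limited"):
    payoff = DELEGATION_SURPLUS if principal_trusts(agent) else 0.0
    print(agent, payoff)
# -> naively_smart 0.0
#    provably_limited 10.0
# The behaviorally "dumber" agent does better here, though an agent smart enough
# to credibly fake being limited would of course beat both.
```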
Ah, I like the multiagent example. So to summarise: I agree that we have some intuitive notion of what cognitive processes we think of as intelligent, and it would be useful to have a definition of intelligence phrased in terms of those. I also agree that Legg’s behavioural definition might diverge from our implicit cognitive definition in non-trivial ways.
I guess the reason why I’ve been pushing back on your point is that I think that possible divergences between the two aren’t the main thing going on here. Even if it turned out that the behavioural definition and the cognitive definition ranked all possible agents the same, I think the latter would be much more insightful and much more valuable for helping us think about AGI.
But this is probably not an important disagreement.
I see the issue now. To restate it in my own words: both of us agree that cognitive definitions are plausibly more useful than behavioral definitions (and you are probably more confident in this claim than I am). But for me the cruxes lie where the cognitive and behavioral definitions diverge in non-trivial ways in how they rank agents, and in those cases the divergences are important and interesting; whereas you’d consider the cognitive definitions more insightful for thinking about AGI even if it were later shown that the divergences are only trivial.
Upon reflection, I’m not sure if we disagree. I’ll need to think harder about whether, if there are no non-trivial divergences, I’d still consider using the cognitive definitions (which will presumably suffer a bit of an elegance tax) a generally better way of thinking about AGI than using the behavioral definition.
I also agree that as stated this is probably not an important disagreement.