Hmm, I’m probably not phrasing this well, but the point I’m trying to get across is that the Legg definition builds in, as a matter of definition, that intelligence is always monotonically good. I actually agree with you that empirically smarts (as usually defined) -> good outcomes seems like the most natural hypothesis, but I’d have preferred a definition of intelligence that leaves this open as an empirical question, over one that assumes it by definition.

I realize that “empirical hypothesis” is weird because of No Free Lunch, so by “a range of environments” I mean something like “environments that plausibly reflect actual questions that might naturally arise in the real world” (not very well stated, I know).

For example, another thing I’m sort of interested in is multiagent situations where credibly proving you’re dumber makes you a more trustworthy agent. There it feels weird for me to claim that the credibly dumber agent is actually, on some deeper level, smarter than the naively smarter agent (whereas an agent smart enough to credibly lie about its dumbness is smarter again on both definitions). (I don’t think the “Omega hates smartness” example, or for that matter a world where anti-induction is the correct decision procedure, is very interesting, relatively speaking, because such worlds feel contrived enough to be a relatively small slice of realistic possibilities.)
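(For reference, and assuming we’re both talking about the Legg–Hutter formalisation from their 2007 “Universal Intelligence” paper, the definition is roughly:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
```

where $E$ is the set of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected total reward that agent $\pi$ achieves in $\mu$. Since every weight $2^{-K(\mu)}$ is strictly positive, improving performance in any environment can only raise $\Upsilon$, which is the sense in which intelligence is “monotonically good” by construction here.)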
Ah, I like the multiagent example. So to summarise: I agree that we have some intuitive notion of what cognitive processes we think of as intelligent, and it would be useful to have a definition of intelligence phrased in terms of those. I also agree that Legg’s behavioural definition might diverge from our implicit cognitive definition in non-trivial ways.
I guess the reason why I’ve been pushing back on your point is that I think that possible divergences between the two aren’t the main thing going on here. Even if it turned out that the behavioural definition and the cognitive definition ranked all possible agents the same, I think the latter would be much more insightful and much more valuable for helping us think about AGI.
But this is probably not an important disagreement.
I see the issue now. To restate it in my own words: both of us agree that cognitive definitions are plausibly more useful than behavioral definitions (and you’re probably more confident in this claim than I am). For me, though, the cruxes lie in cases where the cognitive and behavioral definitions rank agents in non-trivially different ways; in those cases the divergences are important and interesting. You, on the other hand, would consider the cognitive definitions more insightful for thinking about AGI even if it were later shown that the divergences are only trivial.

Upon reflection, I’m not sure we disagree. I’ll need to think harder about whether I’d consider using cognitive definitions (which presumably suffer a bit of an elegance tax) to still be a generally superior way of thinking about AGI than using the behavioral definition, if there are no non-trivial divergences.

I also agree that, as stated, this is probably not an important disagreement.