I might be interested in this, but I’d be really helped by a TL;DR or similar providing some context on what’s being discussed.
My comically oversimplified summary of the above conversation, which is not endorsed by Richard or Eliezer (and skips over a large number of topics and claims, doesn’t try to stay close to the text, etc.):
R: I’m skeptical of your claim that capable-enough-to-save-the-world AI systems will (as a strong default) be means-end reasoners that approximate expected utility (EU) maximizers. (And therefore of your claim that, like EU maximizers, they’ll (as a strong default) think about the long-term consequences of their actions and try to consistently “steer” the future in some direction—properties that would be worrisome if they held, because they imply convergent instrumental goals like killing humans.)
In particular, I worry that you may be putting too much confidence in abstractions like expected utility, in the same way that you were too confident in recursive self-improvement (RSI) and missed that AI (e.g., GPT-3) could get pretty capable without it. The real world is messy, and abstractions like this often fail in surprising ways; so we should be correspondingly less confident that powerful future AGI systems will conform to the particular abstraction you’re pointing at (“expected utility”).
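(To make the “approximate EU maximizer” picture concrete, here’s a toy sketch of my own, not anything Richard or Eliezer wrote: an agent that scores each available action by its expected utility and takes the argmax. The action names, probabilities, and utilities below are made up purely for illustration; the point is just that the argmax step is what produces consistent “steering” toward high-utility futures, whatever the utility function happens to be.)

```python
# Toy sketch (not from the conversation): an "expected utility maximizer" scores
# each action by summing P(outcome | action) * U(outcome) over possible outcomes,
# then takes the argmax. The "steering" worry is about the argmax step: whatever
# U is, the agent consistently pushes the future toward high-U states.

def expected_utility(action, outcome_probs, utility):
    """Expected utility of an action: sum of P(outcome | action) * U(outcome)."""
    return sum(p * utility[outcome] for outcome, p in outcome_probs[action].items())

def choose_action(actions, outcome_probs, utility):
    """Pick the action whose expected utility is highest."""
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))

# Hypothetical action names, probabilities, and utilities, purely for illustration.
actions = ["wait", "acquire_resources"]
outcome_probs = {
    "wait":              {"goal_achieved": 0.2, "goal_failed": 0.8},
    "acquire_resources": {"goal_achieved": 0.7, "goal_failed": 0.3},
}
utility = {"goal_achieved": 1.0, "goal_failed": 0.0}

print(choose_action(actions, outcome_probs, utility))  # -> acquire_resources
```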
E: RSI still strikes me as just as good an abstraction as ever. It’s true that I was surprised by how fast ML could advance without RSI, but RSI is properly a claim about what happens when AI gets sufficiently capable, not a claim ‘there are no other ways to rapidly increase in capability’.
I see my error as ‘giving too much attention to interesting complex ways things can go poorly, and neglecting the simple, banal ways things can go wrong earlier’. If I’m messing up, it’s plausible that I haven’t fully fixed that bias and am messing up in a similar way to that. But that doesn’t make me think EU is a worse abstraction for its domain of applicability, or make me more optimistic about AI alignment.
R: If EU is a deep fundamental theory, then it should make some novel, verifiable predictions that other theories don’t make.
E: EU makes plenty of mundane predictions about, e.g., how humans reason (via weighing futures according to probabilities, etc.), and how humans will tend to behave tomorrow (usually picking up $50 bills when they see them on the ground, etc.).
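(As a toy illustration of the kind of mundane prediction Eliezer means here; the numbers are mine and purely made up:)

```python
# Toy numbers (mine, not from the conversation): the mundane prediction that
# people usually pick up a $50 bill falls straight out of weighing outcomes
# by probability.
p_genuine = 0.95         # assumed probability the bill is real
value_if_genuine = 50.0  # utility of the money, measured in dollars for simplicity
effort_cost = 0.01       # small cost of stopping and bending down

eu_pick_up = p_genuine * value_if_genuine - effort_cost  # about 47.49
eu_walk_past = 0.0

print("pick it up" if eu_pick_up > eu_walk_past else "walk past")  # -> pick it up
```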
R: Those seem too obvious—we already expected those things, so given things like hindsight bias, it’s hard to know how much of an advantage those successful predictions should give EU over rival models, if any. I expect something more surprising and impressive, if EU really is a useful enough framework to let us make confident predictions about capable-enough-to-save-the-world AI systems.
E: Those sorts of prediction successes about everyday human behavior strike me as easily good enough, given that I’m not claiming the level of confidence of, e.g., a law of physics. I think you’re being unreasonably skeptical here because, like a lot of EAs, you’re overly skeptical about useful predictive abstractions, and overly credulous about modest-epistemology norms.
(topic change)
E: In general, I don’t expect governments to prepare, coordinate, or exhibit any competence around AGI.
R: I feel more optimistic because before AGI, I think we might see (e.g.) a decade of non-dangerous AI radically transforming and enriching the world.
E: I don’t expect that to happen at all, because (a) I don’t expect the technology to go that way, and (b) I expect bureaucratic/regulatory obstacles to mostly prevent AI progress from hugely changing the world, until AGI saves or destroys the world.
Ok, that helps a little! But it’s still not quite a TL;DR. :)
tl;dr: Eliezer and Richard disagree about how hard alignment is, so they try to resolve that disagreement by talking about various things that might underlie it.