Thanks for posting! I’m sympathetic to the broad intuition that any one person being at the sweet spot where they make a decisive impact seems unlikely, but I’m not sold on most of the specific arguments given here.
> Recall that there are decent reasons to think goal alignment is impossible—in other words, it’s not a priori obvious that there’s any way to declare a goal and have some other agent pursue that goal exactly as you mean it.
I don’t see why this is the relevant standard. “Just” avoiding egregiously unintended behavior seems sufficient for avoiding the worst accidents (and is clearly possible, since humans do it often).
Also, I don’t think I’ve heard these decent reasons—what are they?
> Recall that engineering ideas very, very rarely work on the first try, and that if we only have one chance at anything, failure is very likely.
It’s also unclear that we only have one chance at this. Optimistically (but not that optimistically?), incremental progress and failsafes can allow for effectively multiple chances. (The main argument against seems to involve assumptions of very discontinuous or abrupt AI progress, but I haven’t seen very strong arguments for expecting that.)
> Recall that getting “humanity” to agree on a good spec for ethical behavior is extremely difficult: some places are against gene drives to reduce mosquito populations, for example, despite this saving many lives in expectation.
Agree, but also unclear why this is the relevant standard. A smaller set of actors agreeing on a more limited goal might be enough to help.
> Recall that there is a gigantic economic incentive to keep pushing AI capabilities up, and referenda to reduce animal suffering in exchange for more expensive meat tend to fail.
Yup, though we should make sure not to double-count this, since this point was also included earlier (which isn’t to say you’re necessarily double-counting).
> Recall that we have to implement any solution in a way that appeals to the cultural sensibilities of all major and technically savvy governments on the planet, plus major tech companies, plus, under certain circumstances, idiosyncratic ultra-talented individual hackers.
This also seems like an unnecessarily high standard, since regulations have been passed and enforced before without unanimous support from affected companies.
Also, getting acceptance from all major governments does seem very hard, but not quite as hard as the above quotes make it sound. After all, many major governments (developed Western ones) have relatively similar cultural sensibilities, and ambitious efforts to prevent unilateral actions have previously gotten very broad acceptance (e.g. many actors could have made and launched nukes, done large-scale human germline editing, or maybe done large-scale climate engineering, but to my knowledge none of those have happened).
> The we-only-get-one-shot idea applies on this stage too.
Yup, though this is also potential double-counting.
Yeah, I share the view that the “Recalls” are the weakest part—I was mostly trying to get my fuzzy, accumulated-over-many-years vague sense of “whoa no we’re being way too confident about this” into a more postable form. Seeing your criticisms, I think the main issue is a bit of a motte-and-bailey sort of thing, where I’m kind of responding to a Yudkowskian model but smuggling in a more moderate perspective’s odds (i.e. Yudkowsky thinks we need to get it right on the first try, but Grace and MacAskill may be agnostic there).
I may think more about this! I do think there’s something there sort of between the parts you’re quoting, by which I mean: yes, we could get agreement to a narrower standard than solving ethics, but even just making ethical progress at all, or coming up with standards that go anywhere good/predictable politically, seems hard. Like, the political dimension and the technical/problem-specification dimension both seem super hard, in a way where we’d have to trust ourselves to be extremely competent across both, and our actual testable track record on either front is mostly a wash (i.e. we can’t get a US congressperson elected yet, or get affordable lab-grown meat on grocery store shelves, so doing harder versions of both at once seems... I dunno, I might hedge my portfolio far beyond that!).