Geoffrey Irving
Safety Researcher and Scalable Alignment Team lead at DeepMind. AGI will probably be wonderful; let’s make that even more probable.
In the other direction, I started to think about this stuff in detail at the same time I started working with various other people and definitely learned a ton from them, so there wasn’t a long period where I had developed views but hadn’t spent months talking to Paul.
We should also mention Stuart Russell here, since he’s certainly very aware of Bostrom and MIRI but has different views on the details and is very grounded in ML.
I think mostly I arrived with a different set of tools and intuitions, in particular a better sense for numerical algorithms (Paul has that too, of course) and thus intuition about how things should work with finite errors and how to build toy models that capture the finite error setting.
I do think a lot of the intuitions built by Bostrom and Yudkowsky are easy to fix into a form that works in the finite error model (though not all of them), so I don’t agree with some of the recent negativity about those classical arguments. That is, some fixing is required to make me like those arguments, but the fixing doesn’t feel particularly hard.
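To gesture at what I mean by the finite error setting, here is a minimal toy sketch (purely illustrative, and not a model of any particular argument): an idealized chain of reasoning steps goes through with certainty, but once each step only succeeds with probability 1 − eps the whole chain decays exponentially unless you add some redundancy.

```python
from math import comb

# Toy "finite error" model of a long chain of reasoning steps.  Each step is
# correct with probability 1 - eps; the idealized (eps = 0) argument always
# goes through, but with finite errors the chance the whole chain survives
# decays exponentially unless we add redundancy.

def p_chain_correct(n_steps: int, eps: float) -> float:
    """Probability all n_steps succeed when each fails independently with prob eps."""
    return (1.0 - eps) ** n_steps

def p_chain_correct_with_voting(n_steps: int, eps: float, k: int = 5) -> float:
    """Same chain, but each step is repeated k times (k odd) and majority-voted."""
    # Probability a majority of k independent copies of a step are correct.
    p_step = sum(comb(k, i) * (1 - eps) ** i * eps ** (k - i)
                 for i in range(k // 2 + 1, k + 1))
    return p_step ** n_steps

for n in (10, 100, 1000):
    print(n, p_chain_correct(n, eps=0.01), p_chain_correct_with_voting(n, eps=0.01))
```

The numbers are arbitrary; the point is just that arguments which are clean in the zero-error limit can need explicit error-handling machinery to survive in the finite-error version, and building that machinery is usually possible.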
Well, part of my job is making new people who qualify, so yes to some extent. This is true both in my current role and in past work at OpenAI (e.g., https://distill.pub/2019/safety-needs-social-scientists).
I started working on AI safety prior to reading Superintelligence, and despite knowing about MIRI et al., since I didn’t like their approach. So I don’t think I agree with your initial premise that the field is as much of a monoculture as you suggest.
Yes, the mocking is what bothers me. In some sense the wording of the list means that people on both sides of the question could come away feeling justified without a desire for further communication: AGI safety folk since the arguments seem quite bad, and AGI safety skeptics since they will agree that some of these heuristics can be steel-manned into a good form.
As a meta-comment, I think it’s quite unhelpful that some of these “good heuristics” are written as intentional strawmen where the author doesn’t believe the assumptions hold. E.g., the author doesn’t believe that there are no insiders talking about X-risk. If you’re going to write a post about good heuristics, maybe try to make the good heuristic arguments actually good? This kind of post mostly just alienates me from wanting to engage in these discussions, which is a problem given that I’m one of the more senior AGI safety researchers.
“Quite possible” means I am making a qualitative point about game theory but haven’t done the estimates.
Though if one did want to do estimates, that ratio isn’t enough, as spread is superlinear as a function of the size of a group arrested and put in a single room.
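To sketch why it’s superlinear (with made-up numbers and independence assumptions I wouldn’t defend quantitatively): if each of n people put in a room is infectious with probability p, and each infectious-susceptible pair transmits with probability q, then expected transmissions grow like n(n − 1), so doubling the size of the arrested group more than doubles the expected spread.

```python
# Toy model: n people in one room, each infectious independently with prob p,
# each (infectious, susceptible) ordered pair transmits with prob q.  Expected
# transmissions scale like n*(n-1), i.e. superlinearly in group size.

def expected_transmissions(n: int, p: float, q: float) -> float:
    # Expected number of (infectious, susceptible) ordered pairs is
    # n*(n-1)*p*(1-p); multiply by the per-pair transmission probability q.
    return q * p * (1 - p) * n * (n - 1)

for n in (2, 5, 10, 20, 50):
    print(n, expected_transmissions(n, p=0.02, q=0.1))  # p, q are placeholders
```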
Thanks, that’s all reasonable. Though to clarify, the game theory point isn’t about deterring police but about whether to let potential arrests and coronavirus consequences deter the protests themselves.
It’s worth distinguishing between the protests causing spread and arresting protesters causing spread. It’s quite possible more spread will be caused by the latter, and calling this spread “caused by the protests” is game theoretically similar to “Why are you hitting yourself?” My guess is that you’re not intending to lump those into the same bucket, but it’s worth separating them out explicitly given the title.
One note: DeepMind is outside the set of typical EA orgs, but is very relevant from a longtermist perspective. It fares quite a bit better on this measure in terms of leadership: e.g., everyone above me in the hierarchy is non-white.
Fixed, thanks!
This isn’t a complete answer, but I think it is useful to have a list of prosaic alignment failures to make the basic issue more concrete. Examples include fairness (bad data leading to inferences that reflect bad values), recommendation systems going awry, etc. I think Catherine Olsson has a long list of these, but I don’t know where it is. We should generically expect some sort of amplification of these failures as AI strength increases; it’s conceivable the amplification is in the good direction, but at a minimum we shouldn’t be confident of that.
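As a cartoon of the amplification point (a toy I just made up, not one of the examples on that list): if a system optimizes a proxy that is only loosely tied to what we want, giving it more optimization power tends to widen the gap between the proxy and the true objective.

```python
# Cartoon of proxy misalignment amplifying with optimization power: the proxy
# (say, clicks) increases monotonically, but the true value (say, user
# wellbeing) peaks and then falls.  A stronger optimizer searches more options,
# pushes the proxy higher, and ends up with lower true value.

def proxy(x: float) -> float:
    return x  # what the system is actually optimizing

def true_value(x: float) -> float:
    return x - 0.05 * x ** 2  # what we actually care about; peaks at x = 10

for search_budget in (5, 10, 20, 50, 100):
    candidates = [i * 0.5 for i in range(search_budget)]
    best = max(candidates, key=proxy)  # stronger system searches a wider space
    print(search_budget, round(proxy(best), 2), round(true_value(best), 2))
```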
If someone is skeptical about AIs getting smart enough that this matters, you can point to the various examples of existing superhuman systems (game playing programs, dog distinguishers that beat experts, medical imaging systems that beat teams of experts, etc.). Narrow superintelligence should already be enough to worry, depending on how such systems are deployed.
This is a great document! I agree with the conclusions, though there are a couple of factors not mentioned that seem important:
On the positive side, Google has already deployed post-quantum schemes as a test, and I believe the test was successful (https://security.googleblog.com/2016/07/experimenting-with-post-quantum.html). This was explicitly just a test and not intended as a standardization proposal, but it’s good to see that it’s practical to layer a post-quantum scheme on top of an existing scheme in a deployed system. I do think if we needed to do this quickly it would happen; the example of Google and Apple working together to get contact tracing working seems relevant.
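For concreteness, the layering amounts to deriving the session key from both the classical and the post-quantum shared secrets, so an attacker has to break both schemes to recover the key. A minimal sketch of that combining step, with placeholder byte strings standing in for the real key-exchange outputs (e.g., X25519 and a post-quantum KEM):

```python
import hashlib

# Sketch of a hybrid key-exchange combiner: derive the session key from both a
# classical shared secret and a post-quantum shared secret.  The two secrets
# below are placeholders; in a real deployment they would come from the actual
# classical and post-quantum key exchanges.

classical_secret = b"\x11" * 32   # placeholder for the classical ECDH output
pq_secret = b"\x22" * 32          # placeholder for the post-quantum KEM output

def combine(classical: bytes, post_quantum: bytes,
            context: bytes = b"hybrid-kex") -> bytes:
    # Hash the concatenation (plus a context label) into a single session key.
    return hashlib.sha256(context + classical + post_quantum).digest()

session_key = combine(classical_secret, pq_secret)
print(session_key.hex())
```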
On the negative side, there may be significant economic costs due to public key schemes deployed “at rest” which are impossible to change after the fact. This includes any encrypted communication that has been stored by an adversary across the time when we switch from pre-quantum to post-quantum, and also includes slow-to-build applications like PGP webs of trust, which are hard to swap out quickly. I don’t think this changes the overall conclusions, since I’d expect the going-forward cost to be larger, but it’s worth mentioning.