Safety researcher at DeepMind
As someone who’s worked both on ML for formal verification (with security motivations in mind) and, now, directly on AGI alignment, I think most EA-aligned folks who would be good at formal verification will be close enough to being good at direct AGI alignment that it will be higher impact to work directly on AGI alignment. This could change in the future if a lot more people were working on theoretically motivated prosaic AGI alignment, but I don’t think we’re there yet.
I think that isn’t the right counterfactual, since I got into EA circles despite having only minimal (and net negative) impressions of EA-related forums. So your claim is narrowly true, but if instead the counterfactual were that my first exposure to EA was the EA Forum, then yes, the prominence of this kind of post would have made me substantially less likely to engage.
But fundamentally if we’re running either of these counterfactuals I think we’re already leaving a bunch of value on the table, as expressed by EricHerboso’s post about false dilemmas.
I bounce off posts like this. Not sure if you’d consider me net positive or not. :)
Not a non-profit, but since you mention AI and X-risk it’s worth mentioning DeepMind, since program managers are core to how research is organized and led here: https://deepmind.com/careers/jobs/2390893.
5% probability by 2039 seems way too confident that it will take a long time: is this intended to be a calibrated estimate, or does the number have a different meaning?
Yep, that’s the right interpretation.
In terms of hardware, I don’t know how Chrome did it, but at least on fully capable hardware (mobile CPUs and above) you can often bitslice to make almost any circuit efficient if it has to be evaluated in parallel. So my prior is that quite general things don’t need new hardware if one is sufficiently motivated, and would want to see the detailed reasoning before believing you can’t do it with existing machines.
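To make the bitslicing point concrete, here’s a minimal sketch (a hypothetical illustration, not anything Chrome actually does): pack one bit from each of many independent inputs into a single machine word, and then each bitwise operation evaluates that gate of the circuit for all inputs at once.

```python
def pack(bits):
    """Pack a list of 0/1 bits (one per parallel lane) into one integer."""
    word = 0
    for lane, b in enumerate(bits):
        word |= (b & 1) << lane
    return word

def full_adder(a, b, carry_in):
    """A one-bit full adder expressed as a boolean circuit.

    Because the inputs are bitsliced words, each bitwise op below
    evaluates that gate for every lane simultaneously.
    """
    s = a ^ b ^ carry_in                         # sum bit per lane
    carry_out = (a & b) | (carry_in & (a ^ b))   # carry bit per lane
    return s, carry_out

# Evaluate the circuit for four independent inputs in one pass:
a = pack([0, 1, 1, 1])
b = pack([0, 0, 1, 1])
c = pack([0, 0, 0, 1])
s, cout = full_adder(a, b, c)   # lanes: (0,0), (1,0), (0,1), (1,1)
```

On a 64-bit CPU the same code evaluates 64 circuit instances per word, which is why quite general circuits can be made efficient on stock hardware when the workload is embarrassingly parallel.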
This is a great document! I agree with the conclusions, though there are a couple factors not mentioned which seem important:
On the positive side, Google has already deployed post-quantum schemes as a test, and I believe the test was successful (https://security.googleblog.com/2016/07/experimenting-with-post-quantum.html). This was explicitly just a test and not intended as a standardization proposal, but it’s good to see that it’s practical to layer a post-quantum scheme on top of an existing scheme in a deployed system. I do think if we needed to do this quickly it would happen; the example of Google and Apple working together to get contact tracing working seems relevant.
On the negative side, there may be significant economic costs due to public key schemes deployed “at rest” which are impossible to change after the fact. This includes any encrypted communication that an adversary has stored across the time when we switch from pre-quantum to post-quantum, and also slow-to-build-up applications like PGP webs of trust which are hard to swap out quickly. I don’t think this changes the overall conclusions, since I’d expect the going-forwards cost to be larger, but it’s worth mentioning.
In the other direction, I started to think about this stuff in detail at the same time I started working with various other people and definitely learned a ton from them, so there wasn’t a long period where I had developed views but hadn’t spent months talking to Paul.
We should also mention Stuart Russell here, since he’s certainly very aware of Bostrom and MIRI but differs from them on the details and is very grounded in ML.
I think mostly I arrived with a different set of tools and intuitions, in particular a better sense for numerical algorithms (Paul has that too, of course) and thus intuition about how things should work with finite errors and how to build toy models that capture the finite error setting.
I do think a lot of the intuitions built by Bostrom and Yudkowsky are easy to fix into a form that works in the finite error model (though not all of them), so I don’t agree with some of the recent negativity about these classical arguments. That is, some fixing is required to make me like those arguments, but the fixing doesn’t feel particularly hard.
Well, part of my job is making new people who qualify, so yes to some extent. This is true both in my current role and in past work at OpenAI (e.g., https://distill.pub/2019/safety-needs-social-scientists).
I started working on AI safety prior to reading Superintelligence and despite knowing about MIRI et al., since I didn’t like their approach. So I don’t think I agree with your initial premise that the field is as much a monoculture as you suggest.
Yes, the mocking is what bothers me. In some sense the wording of the list means that people on both sides of the question could come away feeling justified without a desire for further communication: AGI safety folk since the arguments seem quite bad, and AGI safety skeptics since they will agree that some of these heuristics can be steel-manned into a good form.
As a meta-comment, I think it’s quite unhelpful that some of these “good heuristics” are written as intentional strawmen where the author doesn’t believe the assumptions hold. E.g., the author doesn’t believe that there are no insiders talking about X-risk. If you’re going to write a post about good heuristics, maybe try to make the good heuristic arguments actually good? This kind of post mostly just alienates me from wanting to engage in these discussions, which is a problem given that I’m one of the more senior AGI safety researchers.
“Quite possible” means I am making a qualitative point about game theory but haven’t done the estimates.
Though if one did want to do estimates, that ratio isn’t enough, as spread is superlinear in the size of the group arrested and put in a single room.
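The superlinearity point can be made concrete with a toy model (an illustrative assumption, not an epidemiological estimate): if infection risk in a shared room scales with the number of pairwise contacts, that count grows quadratically in group size, so one room of 2n people carries roughly double the risk of two rooms of n.

```python
def pairwise_contacts(n):
    """Pairwise contacts among n people in one room (toy proxy for spread risk)."""
    return n * (n - 1) // 2

# One room of 100 arrestees vs. two rooms of 50:
one_room = pairwise_contacts(100)      # 4950 contacts
two_rooms = 2 * pairwise_contacts(50)  # 2450 contacts
```

This is why the number arrested alone isn’t enough: how arrestees are grouped changes the expected spread even holding the total fixed.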
Thanks, that’s all reasonable. Though to clarify, the game theory point isn’t about deterring police but about whether to let potential arrests and coronavirus consequences deter the protests themselves.
It’s worth distinguishing between the protests causing spread and arresting protesters causing spread. It’s quite possible more spread will be caused by the latter, and calling this spread “caused by the protests” is game theoretically similar to “Why are you hitting yourself?” My guess is that you’re not intending to lump those into the same bucket, but it’s worth separating them out explicitly given the title.
One note: DeepMind is outside the set of typical EA orgs, but is very relevant from a longtermist perspective. It fares quite a bit better on this measure in terms of leadership: e.g., everyone above me in the hierarchy is non-white.
This isn’t a complete answer, but I think it is useful to have a list of prosaic alignment failures to make the basic issue more concrete. Examples include fairness (bad data leading to inferences that reflect bad values), recommendation systems going awry, etc. I think Catherine Olsson has a long list of these, but I don’t know where it is. We should generically expect some sort of amplification as AI strength increases; it’s conceivable the amplification is in the good direction, but at a minimum we shouldn’t be confident of that.
If someone is skeptical about AIs getting smart enough that this matters, you can point to the various examples of existing superhuman systems (game playing programs, dog distinguishers that beat experts, medical imaging systems that beat teams of experts, etc.). Narrow superintelligence should already be enough to worry, depending on how such systems are deployed.