Geoffrey Irving
Safety Researcher and Scalable Alignment Team lead at DeepMind. AGI will probably be wonderful; let’s make that even more probable.
I bounce off posts like this. Not sure if you’d consider me net positive or not. :)
This paper has at least two significant flaws when used to estimate relative complexity for useful purposes. In the authors’ defense, such an estimate wasn’t the main motivation of the paper, but the Quanta article is all about estimation, and the paper doesn’t mention the flaws.
Flaw one: no reversed control
Say we have two parameterized model classes A_n and B_n, and ask what values of n are necessary for A_n to approximate B_1 and for B_n to approximate A_1. It is trivial to construct model classes for which n is large in both directions, simply because A is a much better algorithm for approximating A than B is, and vice versa. I’m not sure how much this cuts off the 1000 estimate, but it could easily be 10x. Brief Twitter thread about this: https://twitter.com/geoffreyirving/status/1433487270779174918
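Here is a toy version of the missing reversed control (my construction, not the paper’s). Take class A = sums of ReLU units and class B = sums of sinusoids, and check how many units each class needs to fit a single unit of the other. Both directions are expensive, so a one-directional count of “B-units per A-unit” says little about relative complexity:

```python
# Toy sketch: two model classes where approximation is expensive in
# BOTH directions. All names and numbers here are illustrative.
import numpy as np

x = np.linspace(-1.0, 1.0, 2000)

def relu_basis(n):
    # n ReLU features with grid-spaced biases (class A).
    biases = np.linspace(-1.0, 1.0, n)
    return np.maximum(0.0, x[:, None] - biases[None, :])

def sine_basis(n):
    # n sinusoid features: sin/cos pairs of increasing frequency (class B).
    cols = []
    for f in range(1, n // 2 + 1):
        cols.append(np.sin(np.pi * f * x))
        cols.append(np.cos(np.pi * f * x))
    return np.stack(cols, axis=1)

def fit_rmse(features, target):
    # Best least-squares fit of the target using the given features.
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return np.sqrt(np.mean((features @ coef - target) ** 2))

one_sine = np.sin(8 * np.pi * x)   # a single class-B unit
one_relu = np.maximum(0.0, x)      # a single class-A unit

for n in (4, 16, 64):
    print(f"n={n:3d}  ReLUs fitting a sine: rmse={fit_rmse(relu_basis(n), one_sine):.4f}"
          f"  sines fitting a ReLU: rmse={fit_rmse(sine_basis(n), one_relu):.4f}")
```

The ReLU class needs many units to track the sine’s oscillations, and the sinusoid class needs many units to capture the ReLU’s kink, even though each target is a single “neuron” of the other class.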
Flaw two: no scaling w.r.t. multiple neurons
I don’t see any reason to believe the 1000 factor would remain constant as you add more neurons, i.e., as we approximate many real neurons with many (more) artificial neurons. In particular, it’s easy to construct model classes where the factor decays to 1 as you add more real neurons. I don’t know how strong this effect is, but again there is no discussion or estimation of it in the paper.
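A toy version of the decay (my construction, not the paper’s): suppose each real neuron’s behavior decomposes into k units’ worth of computation shared across all neurons plus one private unit. Approximating a single real neuron then costs k + 1 artificial units, but approximating m real neurons costs only k + m, so the per-neuron factor (k + m)/m falls to 1 as m grows.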
I think that isn’t the right counterfactual, since I got into EA circles despite having only minimal (and net negative) impressions of EA-related forums. So your claim is narrowly true, but if instead the counterfactual were that my first exposure to EA had been the EA Forum, then yes, I think the prominence of this kind of post would have made me substantially less likely to engage.
But fundamentally if we’re running either of these counterfactuals I think we’re already leaving a bunch of value on the table, as expressed by EricHerboso’s post about false dilemmas.
It can’t be up to date, since they recently announced that Helen Toner joined the board, and she’s not listed.
The rest of this comment is interesting, but opening with “Ummm, what?” seems bad, especially since it takes careful reading to know what you are specifically objecting to.
Edit: Thanks for fixing!
One note: DeepMind is outside the set of typical EA orgs, but is very relevant from a longtermist perspective. It fares quite a bit better on this measure in terms of leadership: e.g., everyone above me in the hierarchy is non-white.
I lead some of DeepMind’s technical AGI safety work, and wanted to add two supporting notes:
I’m super happy we’re growing strategy and governance efforts!
We view strategy and governance questions as coupled to technical safety, and are working to build very close links between research in the two areas so that governance mechanisms and alignment mechanisms can be co-designed. (This also applies to technical safety and the Ethics Team, among other teams.)
As a meta-comment, I think it’s quite unhelpful that some of these “good heuristics” are written as intentional strawmen where the author doesn’t believe the assumptions hold. E.g., the author doesn’t believe that there are no insiders talking about X-risk. If you’re going to write a post about good heuristics, maybe try to make the good heuristic arguments actually good? This kind of post mostly just alienates me from wanting to engage in these discussions, which is a problem given that I’m one of the more senior AGI safety researchers.
As a high-level comment, it seems bad to structure the world so that the smartest people compete against each other in zero-sum games. It’s definitely the case that zero-sum games are the best way to ensure technical hardness, as the games will by construction be right at the threshold of playability. But if we do this we’re throwing most of the value away in comparison to working on positive-sum games.
(I’m happy to die on the hill that that threshold exists, if you want a vicious argument. :))
This is a great document! I agree with the conclusions, though there are a couple of factors not mentioned here that seem important:
On the positive side, Google has already deployed post-quantum schemes as a test, and I believe the test was successful (https://security.googleblog.com/2016/07/experimenting-with-post-quantum.html). This was explicitly just a test and not intended as a standardization proposal, but it’s good to see that it’s practical to layer a post-quantum scheme on top of an existing scheme in a deployed system. I do think if we needed to do this quickly it would happen; the example of Google and Apple working together to get contact tracing working seems relevant.
On the negative side, there may be significant economic costs due to public-key schemes deployed “at rest” which are impossible to change after the fact. This includes any encrypted communication stored by an adversary across the time when we switch from pre-quantum to post-quantum, and also slow-to-build-up applications like PGP webs of trust, which are hard to swap out quickly. I don’t think this changes the overall conclusions, since I’d expect the going-forward cost to be larger, but it’s worth mentioning.
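To illustrate the layering point from the first note, here’s a minimal sketch (my own, not Google’s actual CECPQ1 code) of a hybrid handshake: derive the session key from both a classical X25519 secret and a post-quantum KEM secret, so an attacker has to break both schemes. The X25519 and HKDF calls are from the real `cryptography` package; the post-quantum half is a labeled stand-in, since a real deployment would use an actual KEM such as NewHope (as in Google’s test):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Classical half: ordinary X25519 key agreement.
alice_priv = X25519PrivateKey.generate()
bob_priv = X25519PrivateKey.generate()
classical_secret = alice_priv.exchange(bob_priv.public_key())

# Post-quantum half: placeholder bytes standing in for a PQ KEM's
# shared secret (both sides would derive the same value via
# encapsulation/decapsulation in a real protocol). NOT a real KEM.
pq_secret = os.urandom(32)

# Session key depends on both halves, so breaking the session
# requires breaking both the classical and post-quantum schemes.
session_key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=None,
    info=b"hybrid pq+classical handshake",
).derive(classical_secret + pq_secret)
```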
Those aspects are getting weaker, but the ability of ML to model humans is getting stronger, and there are other “computer acting as salesperson” channels which don’t go through Privacy Sandbox. But probably I’m just misusing the term “ad tech” here, and “convince someone to buy something” tech might be a better term.
So certainly physics-based priors are a big component, and indeed in some sense are all of it. That is, I think physics-based priors should give you an immediate answer of “with high probability, you can’t influence the past”. Moreover, once you think through the problems in detail, the conclusion will be that you could influence the past if physics were different (including boundary conditions, even if the laws remain the same), but that boundary-condition priors should still tell us you can’t influence the past. I’m happy to elaborate.
First, I think saying CDT is wrong, full stop, is much less useful than saying that CDT has a limited domain of applicability (using Sean Carroll’s terminology from The Big Picture). Analogously, one shouldn’t say that Newtonian physics is wrong, but that it has a limited domain of applicability, and one should be careful to apply it only within that domain. Of course, you can choose to stick to the “wrong” terminology; the claim is only that this is less useful.
So what’s the domain of applicability of CDT? Roughly, I think it’s the set of cases where the agent can’t be predicted by other agents in the world. I like to call this the “free will” case, but that’s my own definition, so if you don’t like it we can call it the non-prediction case. The deterministic twin case violates this, as there is a dimension of decision making where non-prediction fails: each twin can perfectly predict the other’s actions conditional on their own actions. So deterministic twins are outside the domain of applicability of CDT.
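To make the twin case concrete, here’s a toy payoff check (my numbers, not from the thread). Holding the twin’s move fixed, defecting dominates, which is CDT’s recommendation inside its domain; but once the twin’s move perfectly tracks yours, cooperating comes out ahead:

```python
# PAYOFF[(my_move, twin_move)] is my payoff; values are illustrative.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 4, ("D", "D"): 1,
}

# Inside CDT's domain (non-prediction): the twin's move is causally
# independent of mine, and defecting dominates either way.
for twin_move in ("C", "D"):
    assert PAYOFF[("D", twin_move)] > PAYOFF[("C", twin_move)]

# Outside CDT's domain: deterministic twins, so the twin's move always
# equals mine, and cooperating wins.
outcomes = {me: PAYOFF[(me, me)] for me in ("C", "D")}
print(outcomes)  # {'C': 3, 'D': 1}
```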
A consequence of this view is that whether we are in or out of the domain of applicability of CDT is an empirical question: you can’t resolve it from pure theory. I further claim (without pinning down the definitions very well) that “generic, un-tuned” situations fall into the non-prediction case. This is again an empirical claim, and roughly says that “something needs to happen” to take us outside the non-prediction case. In the deterministic twin case, this “something” is the intentional construction of the twins. Some detailed claims:
Humanity’s past fits the non-prediction case. For example, it is not the case that “perhaps you and some of the guards are implementing vaguely similar decision procedures at some level” in World War 1, not least because most of decision theory was invented after World War 1. Again, this is a purely empirical claim: it could have been otherwise, and I’m claiming it wasn’t.
The multiverse fits the non-prediction case. Once we have a sufficient understanding of cosmology, I believe we will conclude that the multiverse most likely fits the non-prediction case, roughly because the causal linkages behind the multiverse (through quantum branching, inflation, or logical possibilities) are high temperature in some sense. This is again an empirical prediction about cosmology, though of course it’s much harder to check and I’m much less confident in it than in (1).
The world does not entirely fall into the non-prediction case. As an example, it is perilous when advertisers have too much information and computation asymmetry with users, since that asymmetry can break non-prediction (more here). A consequence of this is that it’s good that people are studying decision theories with larger domains of applicability.
AGI safety v1 can likely be made to fall into the non-prediction case. This is another highly contingent claim, and requires some action to ensure, namely somehow telling AGIs to stay within the non-prediction case in appropriate senses (and designing them so that this is possible to do). (I expect to get jumped on for this one, but before you conclude that I’m just ignorant it might be worth asking Paul whether I’m just ignorant.) And I do mean v1; it’s quite possible that v2 goes better if we have the option of not telling them this.
I do want to emphasize that as a consequence of (3), (4), uncertainty about (2), and a way tinier amount of uncertainty about (1), I’m happy people are exploring this space. But of course I’m also going to place a lower estimate on its importance as a consequence of the above.
Congratulations on the switch!
I enjoyed your ads blog post, by the way. Might be fun to discuss that sometime, both because (1) I’m funded by ads and (2) I’m curious how the picture will shift as ad tech gets stronger.
Yep, that’s very fair. What I was trying to say was that if, in response to the first suggestion, someone said “Why aren’t you deferring to others?”, you could use that as a joking backup; but agreed that it reads badly.
Is Holden still on the board?
As someone who’s worked both on ML for formal verification (with security motivations in mind) and, now, directly on AGI alignment, I think most EA-aligned folk who would be good at formal verification will be close enough to being good at direct AGI alignment that it will be higher impact to work directly on AGI alignment. This could change in the future if there are a lot more people working on theoretically motivated prosaic AGI alignment, but I don’t think we’re there yet.
We should also mention Stuart Russell here, since he’s certainly very aware of Bostrom and MIRI but differs on many of the details and is very grounded in ML.
As someone who works on AGI safety and cares a lot about it, my main conclusion from reading this is: it would be ideal for you to work on something other than AGI safety! There are plenty of other important things to work on, both within and outside EA, and a satisfactory resolution to “Is AI risk real?” doesn’t seem essential to usefully pursuing other options.
Nor do I think this is a block to comfortable behavior as an EA organizer or role model: it seems fine to say “I’ve thought about X a fair amount but haven’t reached a satisfactory conclusion”, and give people the option of looking into it themselves or not. If you like, you could even say “a senior AGI safety person has given me permission to not have a view and not feel embarrassed about it.”
Unfortunately, a significant part of the situation is that people with internal experience and a negative impression feel both constrained and conflicted (in the conflict-of-interest sense) about public statements. This applies to me: I left OpenAI in 2019 for DeepMind (hence the conflict).