Geoffrey Irving
Safety Researcher and Scalable Alignment Team lead at DeepMind. AGI will probably be wonderful; let’s make that even more probable.
Those aspects are getting weaker, but the ability of ML to model humans is getting stronger, and there are other “computer acting as salesperson” channels which don’t go through Privacy Sandbox. But probably I’m just misusing the term “ad tech” here, and “convince someone to buy something” tech might be a better term.
Saying that consequentialist theories are “often agent neutral” may only add confusion, as it’s not a part of the definition and indeed “consequentialism can be agent non-neutral” is part of what separates it from utilitarianism.
Is that any particular confidence interval? It seems implausible that it would be so tight.
Congratulations on the switch!
I enjoyed your ads blog post, by the way. Might be fun to discuss that sometime, both because (1) I’m funded by ads and (2) I’m curious how the picture will shift as ad tech gets stronger.
Nice to see you here, Ferenc! We’ve talked before when I was at OpenAI and you at Twitter, and I’m always happy to chat if you’re pondering safety things these days.
DeepMind is hiring for the Scalable Alignment and Alignment Teams
In outer alignment one can write down a correspondence between ML training schemes that learn from human feedback and complexity classes related to interactive proof schemes. If we model the human as a (choosable) polynomial time algorithm, then
1. Debate and amplification get to PSPACE, and more generally n-step debate gets to \(\Sigma_n^P\), the n-th level of the polynomial hierarchy.
2. Cross-examination gets to NEXP.
3. If one allows opaque pointers, there are schemes that go further: market making gets to R.

Moreover, we informally have constraints on which schemes are practical based on properties of their complexity class analogues. In particular, interactive proof schemes are only interesting if they relativize: we also have IP = PSPACE, and thus a single prover gets to PSPACE given an arbitrary polynomial time verifier, but IP^A ≠ PSPACE^A w.r.t. a typical oracle A. My sense is there are further obstacles that can be found: my intuition is that “market making = R” isn’t the right theorem once obstacles are taken into account, but I don’t have a formalized model of this intuition.
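As a reminder of the notation (my gloss, not part of the original claim): \(L \in \Sigma_n^P\) exactly when membership can be written with n alternating polynomially bounded quantifiers in front of a polynomial time predicate M,

\[ x \in L \iff \exists y_1\, \forall y_2\, \cdots\, Q_n y_n\;\; M(x, y_1, \dots, y_n), \]

so the quantifiers correspond to the n alternating debate moves and M plays the role of the polynomial time (human) judge.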
The reason this type of intuition is useful is that humans are unreliable, and schemes that reach higher complexity class analogues should (everything else equal) give more help to the humans in noticing problems with ML systems.
I think there’s quite a bit of useful work that can be done pushing this type of reasoning further, but (full disclosure) it isn’t of the “solve a fully formalized problem” sort. Two examples:
1. As mentioned above, I find “market making = R” unlikely to be the right result. But this doesn’t mean that market making isn’t an interesting scheme: there are connections between market making and Paul Christiano’s learning the prior scheme. As previously formalized, market making misses a practical limitation on the available human data (the -way assumption in that link), so there may be work to do to reformalize it into a more limited complexity class in a more useful way.
2. Two-player debate is only one of many possible schemes using self-play to train systems, and in particular one could try to shift to n-player schemes in order to reduce playing-for-variance strategies, where a behind player goes for risky lies in order to possibly win. But the “polynomial time judge” model can’t model this situation, as there is no variance when trying to convince a deterministic algorithm (see the toy sketch below). As a result, there is a need for a more precise formalization that can pick up the difference between self-play schemes that are more or less robust to human error, possibly related to CRMDPs.
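To make the variance point concrete, here is a deliberately crude toy model (my illustration only, not any of the schemes above): the payoff to a behind debater from a risky lie when the judge mistakenly accepts lies with probability eps.

```python
# Hypothetical toy model (illustration only): payoff of a "risky lie" versus
# honest play for a debater who is behind, when the judge mistakenly accepts
# the lie with probability eps. A deterministic polynomial-time judge has
# eps = 0, so the lie has zero expected value and zero variance, and the
# "gamble on variance" incentive cannot appear in that formal model.

def expected_win(eps: float, lie: bool) -> float:
    # Honest play from a clearly losing position never wins in this toy;
    # a lie wins exactly when the judge mistakenly accepts it.
    return eps if lie else 0.0

def win_variance(eps: float, lie: bool) -> float:
    # The outcome is Bernoulli(p), with win = 1 and loss = 0.
    p = expected_win(eps, lie)
    return p * (1.0 - p)

for eps in (0.0, 0.1, 0.3):
    print(f"eps={eps:.1f}  E[win | lie]={expected_win(eps, True):.2f}  "
          f"Var[win | lie]={win_variance(eps, True):.2f}")
```

With a noisy (human-like) judge the lie is a positive-mean, positive-variance gamble; with a deterministic judge both are zero, which is why a richer formalization is needed to see the effect at all.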
That is also very reasonable! I think the important part is to not feel too bad about the possibility of never having a view (there is a vast sea of things I don’t have a view on), not least because I think it actually increases the chance of getting to the right view if more effort is spent.
(I would offer to chat directly, as I’m very much part of the subset of safety close to more normal ML, but am sadly over capacity at the moment.)
Yep, that’s very fair. What I was trying to say was that if in response to the first suggestion someone said “Why aren’t you deferring to others?” you could use that as a joke backup, but agreed that it reads badly.
(I’m happy to die on the hill that that threshold exists, if you want a vicious argument. :))
I think the key here is that they’ve already spent quite a lot of time investigating the question. I would have a different reaction without that. And it seems like you agree my proposal is best both for the OP and the world, so perhaps the real sadness is about the empirical difficulty at getting people to consensus?
At a minimum I would claim that there should exist some level of effort past which you should not be sad about not arguing, and then the remaining question is where that threshold is.
As someone who works on AGI safety and cares a lot about it, my main conclusion from reading this is: it would be ideal for you to work on something other than AGI safety! There are plenty of other things to work on that are important, both within and without EA, and a satisfactory resolution to “Is AI risk real?” doesn’t seem essential to usefully pursue other options.
Nor do I think this is a block to comfortable behavior as an EA organizer or role model: it seems fine to say “I’ve thought about X a fair amount but haven’t reached a satisfactory conclusion”, and give people the option of looking into it themselves or not. If you like, you could even say “a senior AGI safety person has given me permission to not have a view and not feel embarrassed about it.”
This is a great article, and I will make one of those spreadsheets!
Though I can’t resist pointing out that, assuming you got 99.74% out of an Elo calculation, I believe the true probability of them beating you is way higher than 99.74%. :)
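For context on where a number like 99.74% comes from, here is the standard logistic Elo formula (a textbook formula, nothing specific to the post):

```python
import math

# Standard logistic Elo expected-score formula (not specific to the post):
# probability that the higher-rated player wins, given the rating gap `diff`.
def elo_win_prob(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# Rating gap implied by a 99.74% expected score:
p = 0.9974
diff = 400.0 * math.log10(p / (1.0 - p))
print(round(diff))          # ~1034 Elo points
print(elo_win_prob(diff))   # ~0.9974, sanity check
```

A 99.74% expected score corresponds to a rating gap of roughly a thousand points; my claim above is that the logistic model’s tails understate how lopsided such a matchup really is.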
Yes, Holevo as you say. By information I mean the standard definitions.
The issue is not the complexity, but the information content. As mentioned, n qubits can’t store more than n bits of classical information, so the best way to think of them is “n bits of information with some quantum properties”. Therefore, it’s implausible that they correspond to exponential utility.
This is somewhat covered by existing comments, but to add my wording:
It’s highly unlikely that utility is exponential in quantum state, for roughly the same reason that quantum information is not exponential in quantum state. That is, if you have n qubits, you can hold n bits of classical information, not 2^n. You can do more computation with n qubits, but only in special cases.
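For reference, the standard Holevo bound behind the “n qubits carry at most n classical bits” claim (a textbook statement, not new to this thread): for a classical message X encoded into an ensemble \(\{p_x, \rho_x\}\) of n-qubit states and any measurement outcome Y,

\[ I(X{:}Y) \;\le\; \chi \;=\; S\Big(\sum_x p_x \rho_x\Big) - \sum_x p_x S(\rho_x) \;\le\; \log_2 \dim \mathcal{H} \;=\; n . \]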
The rest of this comment is interesting, but opening with “Ummm, what?” seems bad, especially since it takes careful reading to know what you are specifically objecting to.
Edit: Thanks for fixing!
Unfortunately we may be unlikely to get a statement from a departed safety researcher beyond mine (https://forum.effectivealtruism.org/posts/fmDFytmxwX9qBgcaX/why-aren-t-you-freaking-out-about-openai-at-what-point-would?commentId=WrWycenCHFgs8cak4), at least currently.
It can’t be up to date, since they recently announced that Helen Toner joined the board, and she’s not listed.
Aha. Well, hopefully we can agree that those philosophers are adding confusion. :)