Autonomous Systems @ UK AI Safety Institute (AISI)
DPhil AI Safety @ Oxford (Hertford college, CS dept, AIMS CDT)
Former senior data scientist and software engineer + SERI MATS
I’m particularly interested in sustainable collaboration and the long-term future of value. I’d love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, and s-risks.
I enjoy encountering new perspectives and growing my understanding of the world and the people in it. I also love to read—let me know your suggestions! In no particular order, here are some I’ve enjoyed recently:
Ord—The Precipice
Pearl—The Book of Why
Bostrom—Superintelligence
McCall Smith—The No. 1 Ladies’ Detective Agency (and series)
Melville—Moby-Dick
Abelson & Sussman—Structure and Interpretation of Computer Programs
Stross—Accelerando
Simsion—The Rosie Project (and trilogy)
Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:
Hanabi (can’t recommend enough; try it out!)
Pandemic (ironic at time of writing...)
Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
Overcooked (my partner and I enjoy the foodie themes and the frantic real-time coordination it demands)
People who’ve got to know me only recently are sometimes surprised to learn that I’m a pretty handy trumpeter and hornist.
I like this decomposition!
I think ‘Situational Awareness’ can quite sensibly be divided further into ‘Observation’ and ‘Understanding’.
The classic control loop of ‘observe’, ‘understand’, ‘decide’, ‘act’[1] is consistent with this discussion: ‘observe’ + ‘understand’ here are combined as ‘situational awareness’, and you’re pulling out ‘goals’ and ‘planning capacity’ as separable aspects of ‘decide’.
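To make the mapping concrete, here’s a minimal sketch in Python of that loop with ‘decide’ split into separable ‘goal’ and ‘planning’ components. All the names here (Agent, step, the toy thermostat) are my own illustration, not anything from the post:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Illustrative sketch: one callable per factored component.
@dataclass
class Agent:
    observe: Callable[[], Any]                            # sense the environment
    understand: Callable[[Any], Any]                      # observation -> internal state
    goal: Callable[[Any], float]                          # state -> desirability
    plan: Callable[[Any, Callable[[Any], float]], Any]    # (state, goal) -> action
    act: Callable[[Any], None]                            # execute the action

    def step(self) -> None:
        obs = self.observe()                   # 'observe'   } together: situational
        state = self.understand(obs)           # 'understand'} awareness
        action = self.plan(state, self.goal)   # 'decide' = goals + planning capacity
        self.act(action)                       # 'act' = implementation capacity

# Toy thermostat instance (purely hypothetical): greedy one-step
# lookahead planning against a 'prefer 21 degrees' goal.
room = {"temp": 18.0}
thermostat = Agent(
    observe=lambda: room["temp"],
    understand=lambda obs: {"temp": obs},
    goal=lambda state: -abs(state["temp"] - 21.0),
    plan=lambda state, goal: (
        "heat" if goal({"temp": state["temp"] + 0.5}) > goal(state) else "idle"
    ),
    act=lambda action: room.update(
        temp=room["temp"] + (0.5 if action == "heat" else 0.0)
    ),
)

for _ in range(8):
    thermostat.step()
print(f"{room['temp']:.1f}")  # -> 21.0, then the planner idles
```

The point of factoring it this way is that each callable could, in principle, be swapped out independently; the question below is how far that actually works in practice.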
Are there some difficulties with this factoring?
Certain kinds of situational awareness are more or less fit for certain goals. Further, the importantly ‘really agenty’ behaviour of making plans to improve one’s situational awareness means that ‘situational awareness’ is quite coupled to ‘goals’ and to ‘implementation capacity’ in many advanced systems. That doesn’t mean those parts need to reside in the same subsystem, but it does mean we should expect arbitrary mix-and-match to work less well than co-adapted components, though it’s hard to say how much less (I think this is borne out by observations of bureaucracies and of some AI applications to date).
[1] Terminology varies a lot; this is RL-ish phrasing. Classic control analogues might be ‘feedback’, ‘process model’/‘inference’, ‘control algorithm’, ‘actuate’/‘affect’…