While not providing anything like a solution to the central issue here, I want to note that it looks likely to be the middle classes that get hollowed out first—human labour to do all kinds of physical tasks is likely to be valued for longer than various kinds of desk-based tasks, because scaling up and deploying robotics to replace them would take significant time, whereas scaling up the automation of desk-based tasks can be relatively quick.
Thanks for exploring this, I found it quite interesting.
I’m worried that casual readers might come away with the impression “these dynamics of compensation for safety work being a big deal obviously apply to AI risk”. But I think this is unclear, because we may not have the key property (that you call assumption (b)).
Intuitively I’d describe this property as “meaningful restraint”, i.e. people are holding back a lot from what they might achieve if they weren’t worried about safety. I don’t think this is happening in the world at the moment. It seems plausible that it will never happen—i.e. the world will be approximately full steam ahead until it gets death or glory. In this case there is no compensation effect, and safety work is purely good in the straightforward way.
To spell out the scenario in which safety work now could be bad because of risk compensation: perhaps in the future everyone is meaningfully restrained, but if more work on how to build things safely has been done ahead of time, they're less worried and so less restrained. I think this is a realistic possibility.

But I think that this world is made much safer if there is less variance between different actors' models of how much risk there is, so that the one who presses ahead isn't the actor who is an outlier in not expecting risk. Relatedly, I think we're much more likely to reach such a scenario if many people have got on a similar page about the levels of risk. But I think that a lot of "technical safety" work at the moment (and certainly not just "evals") is importantly valuable for helping people to build common pictures of the character of the risk, and of how high risk levels are under various degrees of safety measures. So a lot of what people think of as safety work actually looks good even in exactly the scenario where we might get >100% risk compensation.
All of this isn't to say "risk compensation shouldn't be a concern", but more like "I think we're going to have to model this at a finer granularity to get a sense of when it might or might not be a concern for the particular case of technical AI safety work".
A small point of confusion: taking U(C) = C (+ a constant) by appropriate parametrization of C is an interesting move. I’m not totally sure what to think of it; I can see that it helps here, but it makes it seem quite hard work to develop good intuitions about the shape of P. But the one clear intuition I have about the shape of P is that there should be some C>0 where P is 0, regardless of S, because there are clearly some useful applications of AI which pose no threat of existential catastrophe. But your baseline functional form for P excludes this possibility. I’m not sure how much this matters, because as you say the conclusions extend to a much broader class of possible functions (not all of which exclude this kind of shape), but the tension makes me want to check I’m not missing something?
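To make the shape I have in mind concrete (these functional forms are purely illustrative on my part, not the ones from the post), compare a form that is strictly positive whenever C > 0 with a thresholded variant:

$$P_{\text{baseline}}(C,S)=\min\!\left(1,\ \frac{C^{\alpha}}{S^{\beta}}\right), \qquad P_{\text{thresholded}}(C,S)=\min\!\left(1,\ \frac{\max(0,\,C-C_0)^{\alpha}}{S^{\beta}}\right).$$

The second gives P = 0 for all C ≤ C_0 regardless of S, which is the shape my one clear intuition points to.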
Maybe? It seems a bit extreme for that; I think 5⁄6 of the “disagree” votes came in over a period of an hour or two mid-evening UK time. But it could certainly just be coincidence, or a group of people happening to discuss it and all disagree, or something.
OK actually there’s been a funny voting pattern on my top-level comment here, where I mostly got a bunch of upvotes and agree-votes, and then a whole lot of downvotes and disagree-votes in one cluster, and then mostly upvotes and agree-votes since then. Given the context, I feel like I should be more open than usual to a “shenanigans” hypothesis, which feels like it would be modest supporting evidence for the original conclusion.
Anyone with genuine disagreement—sorry if I’m rounding you into that group unfairly, and I’m still interested in hearing about it.
(If anyone disagreeing wants to get into explaining why, I’m interested. Honestly it would be more comforting to be wrong about this.)
When I first read this article I assumed it was written in good faith (and found it quite helpful). However, at this point I think it’s correct to assume that “Mark Fuentes” (an admitted pseudonym which has only been used to write about Torres) is misrepresenting their identity, and in particular likely has some substantial history of involvement with the EA community, and perhaps history of beef with Torres, rather than having come to this topic as a disinterested party.
This view is based on:
Torres’s claims about patterns they’ve seen in criticism (part 3 of this; evidence I take as suggestive but by no means conclusive)
Mark refusing to consider any steps to verify their identity, and instead inviting people to disregard the content in the section called “my story”
Some impressions I can’t fully unpack about the tone and focus of Mark’s comments on this post (and their private message to me) seeming better explained by them not having been a disinterested party than by them having been one
A view that we’re not supposed to give fully anonymous accounts the benefit of the doubt:
… in order not to be open to abuse by people claiming whatever identity most supports their points;
… because they’re not putting their reputation on the line;
… because the costs are smaller if they are incorrectly smeared (it doesn’t attach to any real person’s reputation).
With that assumption, I feel kind of upset. I'm not a fan of Torres, but I think grossly misrepresenting authorship is unacceptable, and it's all the more important to call it out when it's coming from someone I might otherwise find myself on the same side of an argument as. And while I expect that much of the content of the post is still valid, it's harder to take at face value now that I suspect more strongly that the examples have been adversarially selected.
Hi Mark,
I wonder if you'd be willing to do something along the lines of privately verifying that your identity is roughly as described in your post? I think this could be pretty straightforward, and might help a bunch in making things clear and low-drama. (At present you're stating that the claims about your identity are a fabrication, but there's no way for external parties to verify this.)
I think from something like a game-theoretic perspective (i.e. to avoid creating incentives for certain types of escalation if someone is willing to engage in bad faith), absent some verification it will be reasonable for observers to assume that Torres is correct that the anonymous account “Mark Fuentes” is misrepresenting itself as a disinterested party. (Which would be relevant information for readers in interpreting the post, even if much of the content remained valid.)
Thanks for this exploration.
I do think that there are some real advantages to using the intentional stance for LLMs, and I think these will get stronger in the future when applied to agents built out of LLMs. But I don’t think you’ve contrasted this with the strongest version of the design stance. My feeling is that this is not taking humans-as-designers (which I agree is apt for software but not for ML), but taking the training-process-as-designer. I think this is more obvious if you think of an image classifier—it’s still ML, so it’s not “designed” in a traditional sense, but the intentional stance seems not so helpful compared with thinking of it as having been designed-by-the-training-process, to sort images into categories. This is analogous to understanding evolutionary adaptations of animals or plants as having been designed-by-evolution.
Taking this design stance on LLMs can lead you to “simulator theory”, which I think has been fairly helpful in giving some insights about what’s going on: https://www.lesswrong.com/tag/simulator-theory
I want to say thank you for holding the pole of these perspectives and keeping them in the dialogue. I think that they are important and it’s underappreciated in EA circles how plausible they are.
(I definitely don’t agree with everything you have here, but typically my view is somewhere between what you’ve expressed and what is commonly expressed in x-risk focused spaces. Often also I’m drawn to say “yeah, but …”—e.g. I agree that a treacherous turn is not so likely at global scale, but I don’t think it’s completely out of the question, and given that I think it’s worth serious attention safeguarding against.)
I might think of FHI as having borrowed prestige from Oxford. I think it benefited significantly from that prestige. But in the longer run it gets paid back (with interest!).
That metaphor doesn’t really work, because it’s not that FHI loses prestige when it pays it back—but I think the basic dynamic of it being a trade of prestige at different points in time is roughly accurate.
I’m worried I’m misunderstanding what you mean by “value density”. Could you perhaps spell this out with a stylized example, e.g. comparing two different interventions protecting against different sizes of catastrophe?
I think human extinction over 1 year is extremely unlikely. I estimated 5.93*10^-12 for nuclear wars, 2.20*10^-14 for asteroids and comets, 3.38*10^-14 for supervolcanoes, a prior of 6.36*10^-14 for wars, and a prior of 4.35*10^-15 for terrorist attacks.
Without having dug into them closely, these numbers don't seem crazy to me for the current state of the world. I think that the risk of human extinction over 1 year is almost all driven by some powerful new technology (with residues for the wilder astrophysical disasters, and the rise of some powerful ideology which somehow leads there). But this is an important class! In general, dragon kings operate via a mechanism that is different from the one behind the tamer parts of the distribution, and "new technology" could totally facilitate that.
Do you have a sense of the extent to which the dragon king theory applies in the context of deaths in catastrophes?
Unfortunately, for the relevant part of the curve (catastrophes large enough to wipe out large fractions of the population) we have no data, so we'll be relying on theory. My understanding (based significantly just on the "mechanisms" section of that Wikipedia page) is that dragon kings tend to arise in cases where there's a qualitatively different mechanism which causes the very large events but doesn't show up in the distribution of smaller events. In some cases we might not have such a mechanism, and in others we might. It certainly seems plausible to me when considering catastrophes (and this is enough to drive significant concern, because if we can't rule it out it's prudent to be concerned, and risk having wasted some resources if we turn out to be in a world where the total risk is extremely small), via the kind of mechanisms I allude to in the first half of this comment.
Sorry, I understood that you primarily weren’t trying to model effects on extinction risk. But I understood you to be suggesting that this methodology might be appropriate for what we were doing in that paper—which was primarily modelling effects on extinction risk.
Sorry, this isn’t speaking to my central question. I’ll try asking via an example:
Suppose we think that there’s a 1% risk of a particular catastrophe C in a given time period T which kills 90% of people
We can today make an intervention X, which costs $Y, and means that if C occurs it will only kill 89% of people
We pay the cost $Y in all worlds, including the 99% in which C never occurs
When calculating the cost to save a life for X, do you:
A) condition on C, so you save 1% of people at the cost of $Y; or
B) don’t condition on C, so you save an expected 0.01% of people at a cost of $Y?
I’d naively have expected you to do B) (from the natural language descriptions), but when I look at your calculations it seems like you’ve done A). Is that right?
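To make the difference concrete, here is a quick sketch with made-up numbers (the population and cost figures are placeholders on my part, not taken from your post):

```python
# Illustrative sketch of options A) and B); all numbers are placeholders.
population = 8e9        # people alive at the start of period T
p_catastrophe = 0.01    # 1% chance that catastrophe C occurs during T
cost = 1e9              # $Y, paid in all worlds whether or not C occurs

# Intervention X cuts deaths in C from 90% to 89% of the population.
lives_saved_given_C = 0.01 * population            # 80 million lives, conditional on C

# A) condition on C occurring
cost_per_life_A = cost / lives_saved_given_C       # $12.50 per life saved

# B) don't condition on C
expected_lives_saved = p_catastrophe * lives_saved_given_C   # 800,000 lives in expectation
cost_per_life_B = cost / expected_lives_saved      # $1,250 per life saved

print(cost_per_life_A, cost_per_life_B)
```

The two conventions differ by a factor of 1/p(C), here 100x, which is why I want to check which one the calculations are using.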
I think if you're primarily trying to model effects on extinction risk, then doing everything via "proportional increase in population", and nowhere directly analysing extinction risk, seems like a weirdly indirect way to do it, and leaves me with a bunch of questions about whether that's really the best way to do it.
Re.
Cotton-Barratt 2020 says “it’s usually best to invest significantly into strengthening all three defence layers”:
"This is because the same relative change of each probability will have the same effect on the extinction probability". I agree with this, but I wonder whether tail risk is the relevant metric. I think it is better to look at the expected value density of the cost-effectiveness of saving a life, accounting for indirect long-term effects as I did. I predict this expected value density to be higher for the first layers, which correspond to lower severity but are more likely to be called upon. So, to equalise the marginal cost-effectiveness of additional investments across all layers, it may well be better to invest more in prevention than in response, and more in response than in resilience.
That paper was explicitly considering strategies for reducing the risk of human extinction. I agree that relative to the balance you get from that, society should skew towards prioritizing response and especially prevention, since these are also important for many of society’s values that aren’t just about reducing extinction risk.
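(For reference, the structure behind the quoted sentence, in my own compressed restatement rather than a quote from the paper, is roughly multiplicative:

$$P(\text{extinction}) \approx P(\text{prevention fails}) \times P(\text{response fails} \mid \text{prevention fails}) \times P(\text{resilience fails} \mid \text{both fail}),$$

so a given relative reduction in any one factor reduces the product by the same relative amount.)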
I'm worried that modelling the tail risk here as a power law is doing a lot of work, since it's an assumption which makes the risk of very large events quite small (especially since you're taking a power law in the ratio: aside from the threshold of requiring a certain number of humans for a viable population, the structure of the assumption essentially implies that extinction is impossible).
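To spell out that parenthetical, with an illustrative parametrisation on my part that may not exactly match yours: if the power law is placed on a ratio such as

$$R=\frac{N_{\text{before}}}{N_{\text{after}}}, \qquad \Pr(R>r)\propto r^{-\alpha},$$

then extinction corresponds to the limit R → ∞, whose probability goes to zero under this functional form, so the near-impossibility of extinction is built into the modelling choice rather than coming from the data.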
But we know from (the fancifully named) dragon king theory that the very largest events are often substantially larger than would be predicted by power law extrapolation.
I’m confused by some of the set-up here. When considering catastrophes, your “cost to save a life” represents the cost to save that life conditional on the catastrophe being due to occur? (I’m not saying “conditional on occurring” because presumably you’re allowed interventions which try to avert the catastrophe.)
Understood this way, I find this assumption very questionable, since I feel like the effect of having more opportunities to save lives in catastrophes is roughly offset by the greater difficulty of preparing to take advantage of those opportunities pre-catastrophe.
Or is the point that you’re only talking about saving lives via resilience mechanisms in catastrophes, rather than trying to make the catastrophes not happen or be small? But in that case the conclusions about existential risk mitigation would seem unwarranted.
I can’t speak for Elizabeth, but I also find that that paragraph feels off, for reasons something like:
Conflation of “counterfactual money to high-impact charities” with “your impact”
Maybe even if it’s counterfactually moved, you don’t get to count all the impact from it as your impact, since to avoid double-counting and impact ponzi schemes it’s maybe important to take a “share-of-the-pie” approach to thinking about your impact (here’s my take on that general question), and presumably they get a lot of the credit for their giving
Plus, maybe you do things which are importantly valuable that aren’t about your pledge! It’s at least a plausible reading (though it’s ambiguous) that “double your impact” would be taken as “double your lifetime impact”
As well as sharing credit for their donations with them, you maybe need to share credit for having nudged them to make the pledge with other folks (including but not limited to GWWC)
As you say, their donations may not be counterfactual even in the short-term
Even if a good fraction of them come from outside the community, the remaining fraction still reduces expected impact correspondingly
Although on average I think it’s likely very good, I’m sure in some cases the EA push towards a few charities that have been verified as highly effective actually does harm by pulling people to give to those over some other charities which were in fact even more effective (but illegibly so)
Man, long-term counterfactuals are hard
Maybe GWWC/EA ends up growing a lot further, so that it reaches effective saturation among ~all relevant audiences
In that world, if someone was open to taking the GWWC pledge, they’d likely do it eventually, even if they are currently not at all connected to the community
Now, none of these points are blatant errors, or make me want to say "what were you thinking?!?". But I feel that, taken together, the picture is that in fact there's a lot of complexity to the question of how impact should be counted in that case, and the text doesn't help the reader to understand that there's a lot of complexity or how to navigate thinking about it, but instead cheerfully presents the most favourable possible interpretation. It just has a bit of a vibe of slightly-underhand sales tactics, or something?