Researcher at the Center on Long-Term Risk. All opinions my own.
Anthony DiGiovanni
Ah I missed the "2 states of the world which are exactly the same" part, sorry. Then yeah the EVs would be the same. I'm not sure how this is supposed to support your original comment's argument though.
Depends on the details of what the intervals are supposed to represent. E.g.:
Say you have a representor (imprecise probabilities) where EV_P(A) = EV_P(B) = [-1, 1].
On one hand:
If:
for p1 in P, EV_p1(A) = -1 while EV_p1(B) = 1, and
for p2 in P, EV_p2(A) = 1 while EV_p2(B) = -1,
then A and B are incomparable.
OTOH:
If for all p in P, EV_p(A) = EV_p(B), then A and B are comparable.
(Ofc there are lots of other cases.)
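A minimal sketch of those two cases in code (toy numbers of my own, just to make the structure explicit; the helper names are purely illustrative):

```python
# Toy representor: each candidate distribution p is summarized by the EVs it
# assigns to actions A and B (hypothetical numbers).

# Case 1: the candidates disagree about which action is better, so A and B are
# incomparable even though both EV intervals are [-1, 1].
case1 = [
    {"A": -1.0, "B": 1.0},   # p1: B looks better
    {"A": 1.0, "B": -1.0},   # p2: A looks better
]

# Case 2: every candidate assigns A and B the same EV, so A and B are
# comparable (indeed equally good), with the same interval [-1, 1].
case2 = [
    {"A": -1.0, "B": -1.0},
    {"A": 1.0, "B": 1.0},
]

def ev_interval(representor, act):
    evs = [p[act] for p in representor]
    return (min(evs), max(evs))

def weakly_better(representor, x, y):
    """x is at least as good as y iff every candidate p agrees."""
    return all(p[x] >= p[y] for p in representor)

def compare(representor):
    if weakly_better(representor, "A", "B") or weakly_better(representor, "B", "A"):
        return "comparable"
    return "incomparable"

for name, rep in [("case 1", case1), ("case 2", case2)]:
    print(name, ev_interval(rep, "A"), ev_interval(rep, "B"), compare(rep))
# case 1: both intervals are (-1.0, 1.0), yet A and B are incomparable.
# case 2: same intervals, but A and B are comparable.
```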
How to not do decision theory backwards
When do intuitions need to be reliable?
Hi Vasco. I think Figure 3 here, and the surrounding discussion of how imprecision works, might answer your objection.
The idea is:
Suppose two actions have precise EVs. You'll presumably grant that a tiny change in the (expected) location of electrons can flip the difference in EV from +epsilon to -epsilon.
If so, then a tiny change in the (expected) location of electrons can flip the lower bound of an imprecise difference in EV from +epsilon to -epsilon.
What makes two actions incomparable, under the imprecise EV model, is that the interval of EV differences crosses zero.
So, it's unsurprising that a tiny change in the (expected) location of electrons can flip the two actions from "comparable" to "incomparable".
Can you say which step in this argument you reject, and why?
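(To make the flip concrete, here's a toy numeric version of the steps above; the numbers are made up purely for illustration:)

```python
# Hypothetical numbers: how a tiny shift in one expectation flips a precise EV
# difference, and likewise the lower bound of an imprecise one.

epsilon = 1e-6
tiny_shift = 2e-6  # e.g. a minuscule change in the expected location of electrons

# Precise case: the EV difference flips from +epsilon to -epsilon.
precise_diff_before = +epsilon
precise_diff_after = precise_diff_before - tiny_shift
print(precise_diff_before, precise_diff_after)   # 1e-06, -1e-06

# Imprecise case: the representor gives an interval of EV differences. The same
# tiny shift moves its lower bound from +epsilon to -epsilon, so the interval
# now crosses zero.
diff_interval_before = (+epsilon, 0.5)
diff_interval_after = (diff_interval_before[0] - tiny_shift, diff_interval_before[1])

def comparable(diff_interval):
    lo, hi = diff_interval
    return lo >= 0 or hi <= 0   # every candidate agrees on the sign (or indifference)

print(comparable(diff_interval_before))  # True: A is at least as good as B under every p
print(comparable(diff_interval_after))   # False: the interval crosses zero -> incomparable
```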
especially the observation that successful prediction systems across most domains use cluster not sequence thinking.
I find this "observation" confusing / misleading, given that Holden defines cluster thinking as aggregating decisions from multiple perspectives. This is very different from aggregating the predictions of multiple models. The evidence of "success" he cites only applies to the latter (where "success" is with respect to Brier scores and such), not the former.
And this is practically relevant: If you aggregate multiple models but then maximize EV under the aggregated model, you don't get the "sandboxing" property Holden claims cluster thinking satisfies. The fanatical/Pascalian model will still dominate the EV calculation.
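To illustrate that last point with made-up numbers (the "vote" aggregation below is just one crude stand-in for decision-level aggregation, not Holden's exact procedure):

```python
# Two-action toy example (hypothetical numbers).
# Model 1 ("ordinary"): action X is mildly good, Y is mildly bad.
# Model 2 ("Pascalian"): action Y has an astronomical payoff.
models = [
    {"weight": 0.99, "ev": {"X": 1.0, "Y": -1.0}},
    {"weight": 0.01, "ev": {"X": 0.0, "Y": 1e12}},   # the fanatical model
]

# Sequence-style: aggregate the models into one credence, then maximize EV.
def aggregated_ev(action):
    return sum(m["weight"] * m["ev"][action] for m in models)

# Decision-level aggregation (one simple version): each model "votes" for its
# preferred action, weighted by its credence, so no single model's huge payoff
# can swamp the others.
def vote_winner():
    votes = {"X": 0.0, "Y": 0.0}
    for m in models:
        best = max(m["ev"], key=m["ev"].get)
        votes[best] += m["weight"]
    return max(votes, key=votes.get)

print({a: aggregated_ev(a) for a in ("X", "Y")})  # X: 0.99, Y: ~1e10 -> Y wins
print(vote_winner())                              # X wins with 0.99 of the vote
# The 1%-credence Pascalian model dominates the aggregated EV, but not the vote.
```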
(ETA: As an aside on sequence thinking / cluster thinking generally, I wish these discussions made it very clear whether we're taking ST/CT as (1) different normative standards for good epistemology / decision-making per se, vs. as (2) different procedures for satisfying a given epistemological / decision-theoretic standard. Cf. "criterion of rightness vs. decision procedure" in ethics. This would be helpful for clarifying what's meant by claims like "cluster thinking is how 'successful' prediction systems operate". I've been assuming (2), here, FWIW.)
I think if you're savvy you will probably find a way to make the astronomical thing go better – such as doing strategy/prioritization/deconfusion work, or working on robustly good intermediate desiderata, or building skills/money in case there's more clarity in the future
What do you think about the arguments for cluelessness from imprecision, e.g., here? (I explain more why I think we're clueless even about the things you list, here.)
Thanks for this! For what it's worth, some issues I've found with the "CRIBS" and "EA Epistemic Auditor" reviews for drafts of philosophical blog posts:
excessively allergic to "hedging", and to sections of posts meant to preempt very important misreadings
flagging some points as "hidden assumptions" even when they're explicitly addressed in the post, or seem clearly irrelevant to the argument
critiquing claim X as not empirically supported, when X is the claim "Y isn't empirically supported".
But they're somewhat useful for surfacing what kinds of misunderstandings readers might have.
(Sorry, due to lack of time I don't expect I'll reply further. But thank you for the discussion! A quick note:)
from the subjective feeling (in your mind) that their EVs feel very hard to compare
EV is subjective. I'd recommend this post for more on this.
I don't know exactly what you mean by "feels very hard to compare". I'd appreciate more direct responses to the arguments in this post, namely, about how the comparison seems arbitrary.
I see arbitrary choices as a reason for further research to decrease their uncertainty
First, it's already very big-if-true if all EA intervention candidates other than "do more research" are incomparable with inaction.
Second, "do more research" is itself an action whose sign seems intractably sensitive to things we're unaware of. I discuss this here.
However, by actual value, you mean a set of possible values
No, I mean just one value.
why would weighted sums of actual masses representing expected masses not be comparable?
Sorry, by "expected" I meant imprecise expectation, since you gave intervals in your initial comment. Imprecise expectations are incomparable for the reasons given in the post; I worry we're talking past each other.
What do you mean by actual mass?
The mass that the object in fact has. :) Sorry, not sure I understand the confusion.
I think expected masses are comparable because possible masses are comparable.
I don't think this follows. I'm interested in your responses to the arguments I give for the framework in this post.
Would your framework suggest the mass of the objects is incomparable
Yes, for the expected mass.
I believe my best guess should be that the mass of one is smaller, equal, or larger than that of the other
Why? (The actual mass must be either smaller, equal, or larger, but I don't see why that should imply that the expected mass is.)
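A toy version of why that implication fails (made-up numbers):

```python
# Hypothetical example: actual masses are always comparable, but the imprecise
# expected masses need not be.
masses_A = [1.0, 3.0]   # possible actual masses of object A (kg)
mass_B = 2.0            # B's mass is 2 kg for sure, for simplicity

# Every possible actual mass of A has a definite ordering relative to B.
for m in masses_A:
    assert m < mass_B or m == mass_B or m > mass_B   # trivially true

# Representor with two candidate distributions over A's mass.
p1 = {1.0: 0.9, 3.0: 0.1}   # E_p1[m_A] = 1.2 -> A lighter than B in expectation
p2 = {1.0: 0.1, 3.0: 0.9}   # E_p2[m_A] = 2.8 -> A heavier than B in expectation

def expectation(dist):
    return sum(m * prob for m, prob in dist.items())

print(expectation(p1), expectation(p2))   # 1.2 and 2.8, straddling 2.0
# The candidates disagree about whether A's expected mass exceeds B's, so under
# the imprecise model the expected masses are incomparable, even though the
# actual masses are guaranteed to be comparable.
```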
yep, thanks!
Quotes: Recent discussions of backfire risks in AI safety
Some thinkers in AI safety have recently pointed out various backfire effects that attempts to reduce AI x-risk can have. I think pretty much all of these effects were known before,[1] but it's helpful to have them front of mind. In particular, I'm skeptical that we can weigh these effects against the upsides precisely enough to say an AI x-risk intervention is positive or negative in expectation, without making an arbitrary call. (Even if our favorite intervention doesn't have these specific downsides, we should ask if we're pricing in the downsides (and upsides) we haven't yet discovered.)
(Emphasis mine, in all the quotes below.)
Holden's Oct 2025 80K interview:
Holden: I mean, take any project. Let's just take something that seems really nice, like alignment research. You're trying to detect if the AI is scheming against you and make it not scheme against you. Maybe that'll be good. But maybe the thing you're doing is something that is going to get people excited, and then they're going to try it instead of doing some other approach. And then it doesn't work, and the other approach would have worked. Well, now you've done tremendous harm. Maybe it will work fine, but it will give people a false sense of security, make them think the problem is solved more than it is, make them move on to other things, and then you'll have a tremendous negative impact that way.
Rob Wiblin: Maybe it'll be used by a human group to get more control, to more reliably be able to direct an AI to do something and then do a power grab.
Holden: [...] Maybe it would have been great if the AIs took over the world. Maybe we'll build AIs that are not exactly aligned with humans; they're actually just much better – they're kind of like our bright side, they're the side we wish we were. [...]
[... M]aybe alignment is just a really... What it means is that you're helping make sure that someone who's intellectually unsophisticated – that's us, that's humans – remains forever in control of the rest of the universe and imposes whatever dumb ideas we have on it forevermore, instead of having our future evolve according to things that are much more sophisticated and better reasoners following their own values.
[...]
Holden: I just think AI is too multidimensional, and there's too many considerations pointing in opposite directions. I'm worried about AIs taking over the world, but I'm also worried about the wrong humans taking over the world. And a lot of those things tend to offset each other, and making one better can make the other worse. [...]
[T]here's also a lot of micro ways in which you could do harm. Just literally working in safety and being annoying, you might do net harm. You might just talk to the wrong person at the wrong time, get on their nerves. I've heard lots of stories of this. Just like, this person does great safety work, but they really annoyed this one person, and that might be the reason we all go extinct.
[...]
Option value in the policy world is kind of a bad concept anyway. A lot of times when you're at a nonprofit or a company and you don't know what to do, you try and preserve option value. But giving the government the option to go one way or the other, that's not a neutral intervention – it's just like you don't know what they're going to do with that option. Giving them the option could have been bad. ... you don't know who's going to be in power when, and whether they're going to have anything like the goals that you had when you put in some power that they had. I know people have been excited at various points about giving government more power and then at other points giving government less power.
And all this stuff, I mean, this one axis you're talking about: centralisation of power versus decentralisation. Most things that touch policy at all in any way will move us along that spectrum in one direction or another, so therefore have a high chance of being negative [...]
And then most things that you can do in AI at all will have some impact on policy. Even just alignment research: policy will be shaped by what we're seeing from alignment research, how tractable it looks, what the interventions look like.
[... I]n AI, it's easier to annoy someone and polarise them against you, because whatever it is you're trying to do, there's some coalition that's trying to do the exact opposite. In certain parts of global health and farm animal welfare, there's certainly people who want to prioritise it less, but it doesn't have the same directional ambiguity.
Helen Toner's Nov 2025 80K interview:
Helen: And I think there's a natural tension here as well among some people who are very concerned about existential risk from AI, really bad outcomes, and AI safety: there's this sense that it's actually helpful if there's only a smaller number of players. Because, one, they can coordinate better – so maybe if racing leads to riskier outcomes, if you just have two top players, they can coordinate more directly than if you have three or four or 10 – and also a smaller number of players is going to be easier for an outside body to regulate, so if you just have a small number of companies, that's going to be easier to regulate.
[...] But the problem is then the "Then what?" question of, if you do manage to avoid some of those worst-case outcomes, and then you have this incredibly powerful technology in the hands of a very small number of people, I think just historically that's been really bad. It's really bad when you have small groups that are very powerful, and typically it doesn't result in good outcomes for the rest of the world and the rest of humanity.
[...]
Rob: I feel like we're in a very difficult spot, because so many of the obvious solutions that you might have, or approaches you might take to dealing with loss of control do make the concentration of power problem worse and vice versa. So what policies you favour and disfavour depends quite sensitively on the relative risk of these two things, the relative likelihood of things going negatively in one way versus the other way.
And at least on the loss of control thing, people disagree so much on the likelihood. People who are similarly informed, know about everything there is to know about this, go all the way from thinking it's a 1-in-1,000 chance to it's a 1-in-2 chance – a 0.1% likelihood to 50% chance that we have some sort of catastrophic loss of control. And discussing it leads sometimes to some convergence, but people just have not converged on a common sense of how likely this outcome is.
So the people who think it's 50% likely that we have some catastrophic loss-of-control event, it's understandable that they think, "Well, we just have to make the best of it. Unfortunately, we have to concentrate. It's the only way. And the concentration of power stuff is very sad and going to be a difficult issue to deal with, but we have to bear that cost." And people who think it's one in 1,000 are going to say, "This is a terrible move that you're making, because we're accepting much more risk, we're creating much more risk than we're actually eliminating."
Wei Dai, "Legible vs. Illegible AI Safety Problems":
Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to). But some problems are illegible (obscure or hard to understand, or in a common cognitive blind spot), meaning there is a high risk that leaders and policymakers will decide to deploy or allow deployment even if they are not solved. (Of course, this is a spectrum, but I am simplifying it to a binary for ease of exposition.)
From an x-risk perspective, working on highly legible safety problems has low or even negative expected value. Similar to working on AI capabilities, it brings forward the date by which AGI/ASI will be deployed, leaving less time to solve the illegible x-safety problems.
So then, the difference between (a) and (b) is purely empirical, and MNB does not allow me to compare (a) and (b), right? This is what I'd find a bit arbitrary, at first glance.
Gotcha, thanks! Yeah, I think it's fair to be somewhat suspicious of giving special status to "normative views". I'm still sympathetic to doing so for the reasons I mention in the post (here). But it would be great to dig into this more.
What would the justification standards in wild animal welfare say about uncertainty-laden decisions that involve neither AI nor animals: e.g. as a government, deciding which policies to enact, or as a US citizen, deciding who to vote for President?
Yeah, I think this is a feeling that the folks working on bracketing are trying to capture: that in quotidian decision-making contexts, we generally use the factors we aren't clueless about (@Anthony DiGiovanni – I think I recall a bracketing piece explicitly making a comparison to day-to-day decision making, but now can't find it... so correct me if I'm wrong!). So I'm interested to see how that progresses.
I think the vast majority of people making decisions about public policy or who to vote for either aren't ethically impartial, or they're "spotlighting", as you put it. I expect the kind of bracketing I'd endorse upon reflection to look pretty different from such decision-making.
That said, maybe you're thinking of this point I mentioned to you on a call: I think even if someone is purely self-interested (say), they plausibly should be clueless about their actions' impact on their expected lifetime welfare, because of strange post-AGI scenarios (or possible afterlives, simulation hypotheses, etc.).[1] See this paper. So it seems like the justification for basic prudential decision-making might have to rely on something like bracketing, as far as I can tell. Even if it's not the formal theory of bracketing given here. (I have a draft about this on the backburner, happy to share if interested.)
[1] I used to be skeptical of this claim, for the reasons argued in this comment. I like the "impartial goodness is freaking weird" intuition pump for cluelessness given in the comment. But I've come around to thinking "time-impartial goodness, even for a single moral patient who might live into the singularity, is freaking weird".
Would you say that what dictates my view on (a)vs(b) is my uncertainty between different epistemic principles
It seems pretty implausible to me that there are distinct normative principles that, combined with the principle of non-arbitrariness I mention in the "Problem 1" section, imply (b). Instead I suspect Vasco is reasoning about the implications of epistemic principles (applied to our evidence) in a way I'd find uncompelling even if I endorsed precise Bayesianism. So I think I'd answer "no" to your question. But I don't understand Vasco's view well enough to be confident.
Can you explain more why answering "no" makes metanormatively bracketing in consequentialist bracketing (a bit) arbitrary? My thinking is: Let E be epistemic principles that, among other things, require non-arbitrariness. (So, normative views that involve E might provide strong reasons for choice, all else equal.) If it's sufficiently implausible that E would imply Vasco's view, then E will still leave us clueless, because of insensitivity to mild sweetening.
Given that the intervals are both derived from a representor P, the interval of EV diffs is {EV_p(A) - EV_p(B) | p in P}. See also here.
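A small sketch of that computation (toy numbers): the key point is that the diff interval is taken per candidate p, not by subtracting the endpoints of the two EV intervals, which would generally overstate its width.

```python
# Toy representor (hypothetical numbers): each candidate p assigns EVs to A and B.
P = [
    {"A": -0.9, "B": -1.0},
    {"A": 0.0,  "B": -0.1},
    {"A": 1.0,  "B": 0.9},
]

# Interval of EV differences, taken per candidate p, as in the comment above:
diffs = [p["A"] - p["B"] for p in P]
print(min(diffs), max(diffs))   # (0.1, 0.1): A is better by 0.1 under every p

# Naively subtracting interval endpoints instead would give
# [min EV(A) - max EV(B), max EV(A) - min EV(B)] = [-1.8, 2.0], which crosses
# zero and so would wrongly suggest A and B are incomparable.
ev_A = [p["A"] for p in P]   # interval [-0.9, 1.0]
ev_B = [p["B"] for p in P]   # interval [-1.0, 0.9]
print(min(ev_A) - max(ev_B), max(ev_A) - min(ev_B))   # -1.8, 2.0
```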