Postdoc at the Digital Economy Lab, Stanford, and research affiliate at the Global Priorities Institute, Oxford. I’m slightly less ignorant about economic theory than about everything else.
trammell
Good Ventures rather than Effective Ventures, no?
The title of this post might give the impression that Rory Stewart was a founder of GiveDirectly. To clarify, GiveDirectly was founded by other people in 2008; Stewart became its president in 2022.
Interesting, thanks for pointing this out! And just to note, that result doesn’t rely on any sort of suspicious knowledge about whether you’re on the planet labeled “x” or “y”; one could also just say “given that you observe that you’re in period 2, …”.
I don’t think it’s right to describe what’s going on here as anthropic shadow though, for the following reason. Let me know what you think.
To make the math easier, let me do what perhaps I should have done from the beginning and have A be the event that the risk is 50% and B be the event that it’s 0%. So in the one-planet case, there are 3 possible worlds:
A1 (prior probability 25%) -- risk is 50%, lasts one period
A2 (prior probability 25%) -- risk is 50%, lasts two periods
B (prior probability 50%) -- risk is 0%, lasts two periods
At time 1, whereas SIA tells us to put credence of 1⁄2 on A, SSA tells us to put something higher--
(0.25 + 0.25/2) / (0.25 + 0.25/2 + 0.5/2) = 3⁄5
--because a higher fraction of expected observers are at period 1 given A than given B. This is the Doomsday Argument. When we reach period 2, both SSA and SIA then tell us to update our credence in A downward. Both principles tell us to update downward fully, for the same reason we would update downward on the probability of an event that didn't change the number of observers: e.g. if A is the event that you live in a place where the probability of rain per day is 50% and B is the event that it's 0%, you start out putting credence 50% [or 60%] on A, and you make it to day 2 without rain (and would live to see day 2 either way). But in the catastrophe case SSA has you update downward further, because the Doomsday Argument stops applying in period 2.
One way to put the general lesson is that, as time goes on and you learn how many observers there are, SSA has less room to shift probability mass (relative to SIA) toward the worlds where there are fewer observers.
In the case above, once you make it to period 2, that uncertainty is fully resolved: given A or B, you know you’re in a world with 2 observers. This is enough to motivate such a big update according to SSA that at the end the two principles agree on assigning probability 1⁄3 to A.
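For anyone who wants to check these numbers, here is a minimal Python sketch computing the SSA and SIA credences in A at each period, using only the three worlds and priors defined above (with one observer per surviving period):

```python
from fractions import Fraction as F

# One-planet example: (prior probability, number of periods survived) per world.
# A = the risk is 50% per period, B = the risk is 0%.
worlds = {
    "A1": (F(1, 4), 1),  # risk 50%, lasts one period
    "A2": (F(1, 4), 2),  # risk 50%, lasts two periods
    "B":  (F(1, 2), 2),  # risk 0%, lasts two periods
}

def credence_in_A(period, principle):
    # SSA weights a world by prior * (fraction of its observers at this period);
    # SIA weights it by prior * (number of its observers at this period), here 0 or 1.
    def weight(prior, n):
        if period > n:
            return F(0)
        return prior * (F(1, n) if principle == "SSA" else 1)
    num = sum(weight(p, n) for w, (p, n) in worlds.items() if w.startswith("A"))
    den = sum(weight(p, n) for p, n in worlds.values())
    return num / den

print(credence_in_A(1, "SIA"), credence_in_A(1, "SSA"))  # 1/2, 3/5
print(credence_in_A(2, "SIA"), credence_in_A(2, "SSA"))  # 1/3, 1/3
```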
In cases where uncertainty about the number of observers is only partially resolved in the move from period 1 to period 2--as in my 3-period example, or in your 2-planet example*--then the principles sustain some disagreement in period 2. This is because
SSA started out in period 1 assigning a higher credence to A than SIA;
both recommend updating on the evidence given by survival as you would update on anything else, like lack of rain;
SSA further updates downward because the Doomsday Argument partially loses force; and
the result is that SSA still assigns a higher credence to A than SIA.
*To verify the Doomsday-driven disagreement in period 1 in the two-planet case explicitly (with the simpler definitions of A and B), there are 5 possible worlds:
A1 (prior probability 12.5%) -- risk is 50% per planet, both last one period
A2 (prior probability 12.5%) -- risk is 50% per planet, only x lasts two periods
A3 (prior probability 12.5%) -- risk is 50% per planet, only y lasts two periods
A4 (prior probability 12.5%) -- risk is 50% per planet, both last two periods
B (prior probability 50%) -- risk is 0% per planet, both last two periods
In period 1, SIA gives credence in A of 1⁄2; SSA gives (0.125 + 0.125*2/3 + 0.125*2/3 + 0.125/2) / (0.125 + 0.125*2/3 + 0.125*2/3 + 0.125/2 + 0.5/2) = 17⁄29 ≈ 0.59.
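A quick sanity check of these period-1 numbers, as a Python sketch using only the five worlds above (each surviving planet-period counts as one observer):

```python
from fractions import Fraction as F

# Two-planet example: (prior, observers at period 1, total observers) per world.
worlds = {
    "A1": (F(1, 8), 2, 2),  # risk 50% per planet, both last one period
    "A2": (F(1, 8), 2, 3),  # only x lasts two periods
    "A3": (F(1, 8), 2, 3),  # only y lasts two periods
    "A4": (F(1, 8), 2, 4),  # both last two periods
    "B":  (F(1, 2), 2, 4),  # risk 0% per planet, both last two periods
}

# Credence in A for an observer at period 1:
# SIA weights each world by prior * (number of period-1 observers);
# SSA weights it by prior * (fraction of its observers at period 1).
sia = {w: p * k1 for w, (p, k1, n) in worlds.items()}
ssa = {w: p * F(k1, n) for w, (p, k1, n) in worlds.items()}

in_A = lambda w: w.startswith("A")
print(sum(v for w, v in sia.items() if in_A(w)) / sum(sia.values()))  # 1/2
print(sum(v for w, v in ssa.items() if in_A(w)) / sum(ssa.values()))  # 17/29 ≈ 0.586
```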
One could use the term “anthropic shadow” to refer to the following fact: As time goes on, in addition to inferring existential risks are unlikely as we would infer that rain is unlikely, SSA further recommends inferring that existential risks are unlikely by giving up the claim that we’re more likely to be in a world with fewer observers; but this second update is attenuated by the (possible) existence of other planets. I don’t have any objection to using the term that way and I do think it’s an interesting point. But I think the old arguments cited in defense of an “anthropic shadow” effect were pretty clearly arguing for the view that we should update less (or even not at all) toward thinking existential risk per unit time is low as time goes on than we would update about the probabilities per unit time of other non-observed events.
Eli Rose was the one who linked to it, to give credit where it’s due : )
I agree that those are different claims, but I expect the weaker claim is also not true, for whatever that’s worth. The claim in Toby Crisford’s Possible Solution 2, as I understand it, is the same as the claim I was making at the end of my long comment: that one could construct some anthropic principle according to which the anthropic shadow argument would be justified. But that principle would have to be different from SSA and SIA; I’m pretty sure it would have to be something which no one has argued for; and my guess is that on thinking about it further most people would consider any principle that fits the bill to have significantly weirder implications than either SSA or SIA.
Ok great! And sorry the numbers in my example got unwieldy, I just picked some probabilities at the beginning and ran with them, instead of bothering to reverse-engineer something cleaner…
I’m not sure I understand the second question. I would have thought both updates are in the same direction: the fact that we’ve survived on Earth a long time tells us that this is a planet hospitable to life, both in terms of its life-friendly atmosphere/etc and in terms of the rarity of supervolcanoes.
We can say, on anthropic grounds, that it would be confused to think other planets are hospitable on the basis of Earth’s long and growing track record. But as time goes on, we get more evidence that we really are on a life-friendly planet, and haven’t just had a long string of luck on a life-hostile planet.
The anthropic shadow argument was an argument along the lines, “no, we shouldn’t get ever more convinced we’re on a life-friendly planet over time (just on the evidence that we’re still around). It is actually plausible that we’ve just had a lucky streak that’s about to break—and this lack of update is in some way because no one is around to observe anything in the worlds that blow up”.
To answer the first question, no, the argument doesn’t rely on SIA. Let me know if the following is helpful.
Suppose your prior (perhaps after studying plate tectonics and so on, but not after considering the length of time that's passed without an extinction-inducing supervolcano) is that there's probability "P(A)"=0.5 that the risk of an extinction-inducing supervolcano at the end of each year is 1⁄2 and probability "P(B)"=0.5 that the risk is 1⁄10. Suppose that the world lasts at least 1 year and at most 3 years regardless.
Let “A1” be the possible world in which the risk was 1⁄2 per year and we blew up at the end of year 1, “A2” be that in which the risk was 1⁄2 per year and we blew up at the end of year 2, and “A3” be that in which the risk was 1⁄2 per year and we never blew up, so that we got to exist for 3 years. Define B1, B2, B3 likewise for the risk=1/10 worlds.
Suppose there’s one “observer per year” before the extinction event and zero after, and let “Cnk”, with k<=n, be observer #k in world Cn (C can be A or B). So there are 12 possible observers: A11, A21, A22, A31, A32, A33, and likewise for the Bs.
If you are observer Cnk, your evidence is that you are observer #k. The question is what Pr(A|k) is; what probability you should assign to the annual risk being 1⁄2 given your evidence.
Any Bayesian, whether following SIA or SSA (or anything else), agrees that
Pr(A|k) = Pr(k|A)Pr(A)/Pr(k),
where Pr(.) is the credence an observer should have for an event according to a given anthropic principle. The anthropic principles disagree about the values of these credences, but here the disagreements cancel out. Note that we do not necessarily have Pr(A)=P(A): in particular, if the prior P(.) assigns equal probability to two worlds, SIA will recommend assigning higher credence Pr(.) to the one with more observers, e.g. by giving an answer of Pr(coin landed heads) = 1⁄3 in the sleeping beauty problem, where on this notation P(coin landed heads) = 1⁄2.
On SSA, your place among the observers is in effect generated first by randomizing among the worlds according to your prior and then by randomizing among the observers in the chosen world. So Pr(A)=0.5, and
Pr(1|A) = 1⁄2 + 1/4*1/2 + 1/4*1/3 = 17⁄24
(since Pr(n=1|A)=1/2, in which case k=1 for sure; Pr(n=2|A)=1/4, in which case k=1 with probability 1⁄2; and Pr(n=3|A)=1/4, in which case k=1 with probability 1⁄3);
Pr(2|A) = 1/4*1/2 + 1/4*1/3 = 5⁄24; and
Pr(3|A) = 1/4*1/3 = 2⁄24.
For simplicity we can focus on the k=2 case, since that’s the case analogous to people like us, in the middle of an extended history. Going through the same calculation for the B worlds gives Pr(2|B) = 63⁄200, so Pr(2) = 0.5*5/24 + 0.5*63/200 = 157⁄600.
So Pr(A|2) = 125⁄314 ≈ 0.4.
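Here is a short Python sketch of the SSA arithmetic above, assuming nothing beyond the worlds, priors, and observer counts already defined in the example:

```python
from fractions import Fraction as F

P = {"A": F(1, 2), "B": F(1, 2)}       # prior on the two hypotheses
risk = {"A": F(1, 2), "B": F(1, 10)}   # annual extinction risk under each

def p_lasts(hyp, n):
    # Probability under the hypothesis that the world lasts exactly n years
    # (extinction possible at the end of years 1 and 2; 3 years is the cap).
    r = risk[hyp]
    return (1 - r) ** 2 if n == 3 else (1 - r) ** (n - 1) * r

def ssa_pr_k(hyp, k):
    # SSA: given the hypothesis, randomize over its worlds by probability,
    # then uniformly over that world's observers (one per year survived).
    return sum(p_lasts(hyp, n) * F(1, n) for n in (1, 2, 3) if k <= n)

pr2_A, pr2_B = ssa_pr_k("A", 2), ssa_pr_k("B", 2)   # 5/24, 63/200
pr2 = P["A"] * pr2_A + P["B"] * pr2_B               # 157/600
print(pr2_A, pr2_B, pr2)
print(pr2_A * P["A"] / pr2)                          # Pr(A|2) = 125/314 ≈ 0.398
```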
On SIA, your place among the observers is generated by randomizing among the observers, giving proportionally more weight to observers in worlds with proportionally higher prior probability, so that the probability of being observer Cnk is
1/12*P(Cn) / [sum over possible observers, labeled "Dmj", of (1/12*P(Dm))].
This works out to Pr(2|A) = 2⁄7 [6 possible observers given A, but the one in the n=1 world "counts for double" since that world is twice as likely as the n=2 or n=3 worlds a priori];
Pr(A) = 175⁄446 [less than 1⁄2 since there are fewer observers in expectation when the risk of early extinction is higher], and
Pr(2) = 140⁄446, so
Pr(A|2) = 5⁄14 ≈ 0.36.
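And the corresponding check for the SIA side, a sketch with the same setup and caveats as the SSA one above:

```python
from fractions import Fraction as F

P = {"A": F(1, 2), "B": F(1, 2)}
risk = {"A": F(1, 2), "B": F(1, 10)}

def p_lasts(hyp, n):
    r = risk[hyp]
    return (1 - r) ** 2 if n == 3 else (1 - r) ** (n - 1) * r

# SIA: the credence of being a particular observer is proportional to the prior
# probability of that observer's world, so observer #k under a hypothesis gets
# total weight equal to the prior probability of lasting at least k years.
def weight(hyp, k):
    return P[hyp] * sum(p_lasts(hyp, n) for n in (1, 2, 3) if k <= n)

total = sum(weight(h, k) for h in ("A", "B") for k in (1, 2, 3))
pr_A = sum(weight("A", k) for k in (1, 2, 3)) / total                   # 175/446
pr_2 = sum(weight(h, 2) for h in ("A", "B")) / total                    # 140/446 = 70/223
pr_2_given_A = weight("A", 2) / sum(weight("A", k) for k in (1, 2, 3))  # 2/7
print(pr_A, pr_2, pr_2_given_A)
print(pr_2_given_A * pr_A / pr_2)                                       # Pr(A|2) = 5/14 ≈ 0.357
```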
So in both cases you update on the fact that a supervolcano did not occur at the end of year 1, from assigning probability 0.5 to the event that the underlying risk is 1⁄2 to assigning some lower probability to this event.
But I said that the disagreements canceled out, and here it seems that they don't cancel out! This is because the anthropic principles disagree about Pr(A|2) for a reason other than the evidence provided by the lack of a supervolcano at the end of year 1: namely, the possible existence of year 3. How to update on the fact that you're in year 2 when you "could have been" in year 3 gets into doomsday argument issues, which the principles do disagree on. I included year 3 in the example because I worried it might seem fishy to make the example all about a 2-period setting where, in period 2, the question is just "what was the underlying probability we would make it here", with no bearing on what probability we should assign to making it to the next period. But since this is really the example that isolates the anthropic shadow consideration, observe that if we simplify things so that the world lasts at most 2 years (and there are 6 possible observers), SSA gives
Pr(2|A) = 1⁄4, Pr(A) = 1⁄2, Pr(2) = 7⁄20 → Pr(A|2) = 5⁄14;
and SIA gives
Pr(2|A) = 1⁄3, Pr(A) = 15⁄34, Pr(2) = 14⁄34 → Pr(A|2) = 5⁄14.
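The two-year simplification can be verified the same way; a sketch with the worlds capped at 2 years but otherwise as above:

```python
from fractions import Fraction as F

P = {"A": F(1, 2), "B": F(1, 2)}
risk = {"A": F(1, 2), "B": F(1, 10)}
# World lasts 1 year (extinction at the end of year 1) or 2 years (the cap).
p_lasts = {h: {1: risk[h], 2: 1 - risk[h]} for h in ("A", "B")}

# SSA: prior over hypotheses, then over worlds, then uniformly over observers.
ssa_2 = {h: p_lasts[h][2] * F(1, 2) for h in ("A", "B")}   # Pr(2|A) = 1/4, Pr(2|B) = 9/20
ssa_pr2 = P["A"] * ssa_2["A"] + P["B"] * ssa_2["B"]        # 7/20
print(ssa_2["A"], ssa_pr2, ssa_2["A"] * P["A"] / ssa_pr2)  # 1/4, 7/20, 5/14

# SIA: weight each observer by the prior probability of their world.
obs_weight = {h: P[h] * (p_lasts[h][1] + 2 * p_lasts[h][2]) for h in ("A", "B")}
total = sum(obs_weight.values())
sia_pr_A = obs_weight["A"] / total                                         # 15/34
sia_2_given_A = p_lasts["A"][2] / (p_lasts["A"][1] + 2 * p_lasts["A"][2])  # 1/3
sia_2 = sum(P[h] * p_lasts[h][2] for h in ("A", "B")) / total              # 14/34 = 7/17
print(sia_pr_A, sia_2_given_A, sia_2, sia_2_given_A * sia_pr_A / sia_2)    # ..., 5/14
```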
____________________________
An anthropic principle that would assign a different value to Pr(A|2)--for the extreme case of sustaining the “anthropic shadow”, a principle that would assign Pr(A|2)=Pr(A)=1/2--would be one in which your place among the observers is generated by
first randomizing among times k (say, assigning k=1 and k=2 equal probability);
then over worlds with an observer alive at k, maintaining your prior of Pr(A)=1/2;
[and then perhaps over observers at that time, but in this example there is only one].
This is more in the spirit of SSA than SIA, but it is not SSA, and I don’t think anyone endorses it. SSA randomizes over worlds and then over observers within each world, so that observing that you’re late in time is indeed evidence that “most worlds last late”.
I also found this remarkably clear and definitive—a real update for me, to the point of coming with some actual relief! I’m afraid I wasn’t aware of the existing posts by Toby Crisford and Jessica Taylor.
I suppose if there's a sociological fact here it's that EAs and people who are nerdy in similar sorts of ways, myself absolutely included, can be quick to assume a position is true because it sounds reasonable and because other seemingly thoughtful people who have thought about the question more have endorsed it. I don't think this single-handedly demonstrates we're too quick; not everyone can dig into everything, so at least to some extent it makes sense to specialize and defer, despite the fact that this is bound to happen now and then.
Of course argument-checking is also something one can specialize in, and one thing about the EA community which I think is uncommon and great is hiring people like Teru to dig into its cultural background assumptions like this...
To my mind, the first point applies to whatever resources are used throughout the future, whether it’s just the earth or some larger part of the universe.
I agree that the number/importance of welfare subjects in the future is a crucial consideration for how much to do longtermist as opposed to other work. But when comparing longtermist interventions—say, splitting a budget between lowering the risk of the world ending and proportionally increasing the fraction of resources devoted to creating happy artificial minds—it would seem to me that the “size of the future” typically multiplies the value of both interventions equally, and so doesn’t matter.
Ok—at Toby’s encouragement, here are my thoughts:
This is a very old point, but to my mind, at least from a utilitarian perspective, the main reason it’s worth working on promoting AI welfare is the risk of foregone upside. I.e. without actively studying what constitutes AI welfare and advocating for producing it, we seem likely to have a future that’s very comfortable for ourselves and our descendants—fully automated luxury space communism, if you like—but which contains a very small proportion of the value that could have been created by creating lots of happy artificial minds. So concern for creating AI welfare seems likely to be the most important way in which utilitarian and human-common-sense moral recommendations differ.
It seems to me that the amount of value we could create if we really optimized for total AI welfare is probably greater than the amount of disvalue we’ll create if we just use AI tools and allow for suffering machines by accident, since in the latter case the suffering would be a byproduct, not something anyone optimizes for.
But AI welfare work (especially if this includes moral advocacy) just for the sake of avoiding this downside also seems valuable enough to be worth a lot of effort on its own, even if suffering AI tools are a long way off. The animal analogy seems relevant: it’s hard to replace factory farming once people have started eating a lot of meat, but in India, where Hinduism has discouraged meat consumption for a long time, less meat is consumed and so factory farming is evidently less widespread.
So in combination, I expect AI welfare work of some kind or another is probably very important. I have almost no idea what the best interventions would be or how cost-effective they would be, so I have no opinion on exactly how much work should go into them. I expect no one really knows at this point. But at face value the topic seems important enough to warrant at least doing exploratory work until we have a better sense of what can be done and how cost-effective it could be, only stopping in the (I think unlikely) event that we can say with some confidence that the best AI welfare work to be done is worse than the best work that can be done in other areas.
The point that it’s better to save people with better lives than people with worse lives, all else equal, does make sense (at least from a utilitarian perspective). So you’re right that [$ / lives saved] is not a perfect approach. I do think it’s worth acknowledging this...!
But the right correction isn’t to use VSLs. The way I’d put it is: a person’s VSL—assuming it’s been ideally calculated for each individual, putting aside issues about how governments estimate it in practice—is how many dollars they value as much as slightly lowering their chance of death. So the fact that VSLs differ across people mixes together two things: a rich person might have a higher VSL than a poor person (1) because the rich person values their life more, or (2) because the rich person values a dollar less. The first thing is right to correct for (from a utilitarian perspective), but as other commenters have noted, the second isn’t.
My guess is that the second factor baked into the VSL is bigger in most real-world comparisons we might want to make, so that it's less of a mistake to just try to minimize [$ / lives saved] than to try to minimize [$ / (lives saved * VSL)].
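To make the decomposition concrete, here is a toy sketch with entirely made-up numbers, assuming log utility of consumption (so the marginal utility of a dollar is 1/consumption) and assuming the two people value their lives identically in utility terms, so that only factor (2) differs:

```python
# Toy illustration: VSL_i = v_i / u'(c_i), where v_i is the utility value to person i
# of a small reduction in their death risk and u'(c_i) is their marginal utility of a
# dollar. All numbers below are made up; log utility is assumed, so u'(c) = 1/c.
v_life_in_utils = 100.0                           # assumed identical: factor (1) held fixed
consumption = {"rich": 100_000, "poor": 2_000}    # hypothetical annual consumption in $

for person, c in consumption.items():
    marginal_utility_of_dollar = 1 / c
    vsl = v_life_in_utils / marginal_utility_of_dollar
    print(person, f"VSL = ${vsl:,.0f}")
# The rich person's VSL comes out 50x the poor person's even though they value their
# lives identically, purely because a dollar is worth less to them (factor (2)).
```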
Ah I see, sorry. Agreed
I don’t follow—are you saying that (i) AI safety efforts so far have obviously not actually accomplished much risk-reduction, (ii) that this is largely for risk compensation reasons, and (iii) that this is worth emphasizing in order to prevent us from carrying on the same mistakes?
If so, I agree that if (i)-(ii) are true then (iii) seems right, but I’m not sure about (i) and (ii). But if you’re just saying that it would be good to know whether (i)-(ii) are true because if they are then it would be good to do (iii), I agree.
Whoops, thanks! Issues importing from the Google doc… fixing now.
Good to hear, thanks!
I've just edited the intro to say: it's not obvious to me one way or the other whether it's a big deal in the AI risk case. I don't think I know enough about the AI risk case (or any other case) to have much of an opinion, and I certainly don't think anything here is specific enough to come to a conclusion in any case. My hope is just that something here makes it easier for people who do know about particular cases to get started thinking through the problem.
If I have to make a guess about the AI risk case, I’d emphasize my conjecture near the end, just before the “takeaways” section, namely that (as you suggest) there currently isn’t a ton of restraint, so (b) mostly fails, but that this has a good chance of changing in the future:
Today, while even the most advanced AI systems are neither very capable nor very dangerous, safety concerns are not constraining C much below the level that would be chosen if safety were no concern at all. If technological advances unlock the ability to develop systems which offer utopia if their deployment is successful, but which pose large risks, then the developer's choice of C at any given S is more likely to be far below that unconstrained level, and the risk compensation induced by increasing S is therefore more likely to be strong.

If lots/most of AI safety work (beyond evals) is currently acting more "like evals" than like pure "increases to S", great to hear; concern about risk compensation can just be an argument for making sure it stays that way!
Thanks for noting this. If in some case there is a positive level of capabilities for which P is 1, then we can just say that the level of capabilities denoted by C = 0 is the maximum level at which P is still 1. What will sort of change is that the constraint will be not C ≥ 0 but C ≥ (something negative), but that doesn’t really matter since here you’ll never want to set C<0 anyway. I’ve added a note to clarify this.
Maybe a thought here is that, since there is some stretch of capabilities along which P=1, we should think that P(.) is horizontal around C=0 (the point at which P can start falling from 1) for any given S, and that this might produce very different results from the example in which there would be a kink at C=0. But no—the key point is whether increases to S change the curve in a way that widens as C moves to the right, and so “act as price decreases to C”, not the slope of the curve around C=0. E.g. if (for , and 0 above), then in the k=0 case where the lab is trying to maximize , they set , and so P is again fixed (here, at 2⁄3) regardless of S.
Notes on risk compensation
Hey David, I’ve just finished a rewrite of the paper which I’m hoping to submit soon, which I hope does a decent job of both simplifying it and making clearer what the applications and limitations are: https://philiptrammell.com/static/Existential_Risk_and_Growth.pdf
Presumably the referees will constitute experts on the growth front at least (if it’s not desk rejected everywhere!), though the new version is general enough that it doesn’t really rely on any particular claims about growth theory.
Hold on, just to try wrapping up the first point—if by “flat” you meant “more concave”, why do you say “I don’t see how [uncertainty] could flatten out the utility function. This should be in “Justifying a more cautious portfolio”?”
Did you mean in the original comment to say that you don’t see how uncertainty could make the utility function more concave, and that it should therefore also be filed under “Justifying a riskier portfolio”?
Ok great!
And ok, I agree that the answer to the first question is probably “yes”, so maybe what I was calling an alternative anthropic principle in my original comment could be framed as SSA with this directly time-centric reference class. If so, instead of saying “that’s not SSA”, I should have said “that’s not SSA with a standard reference class (or a reference class anyone seems to have argued for)”. I agree that Bostrom et al. (2010) don’t seem to argue for such a reference class.
On my reading (and Teru’s, not coincidentally), the core insight Bostrom et al. have (and iterate on) is equivalent to the insight that if you haven’t observed something before, and you assign it a probability per unit of time equal to its past frequency, then you must be underestimating its probability per unit of time. The response isn’t that this is predicated on, or arguing for, any weird view on anthropics, but just that it has nothing to do with anthropics: it’s true, but for the same reason that you’ll underestimate the probability of rain per unit time based on past frequency if it’s never rained (though in the prose they convey their impression that the fact that you wouldn’t exist in the event of a catastrophe is what’s driving the insight). The right thing to do in both cases is to have a prior and update the probability downward as the dry spell lengthens. A nonstandard anthropic principle (or reference class) is just what would be necessary to motivate a fundamental difference from “no rain”.