People generally don’t care about their future QALYs in a linear way: a 1/million chance of living 10 million times as long and otherwise dying immediately is very unappealing to most people, and so forth. If you don’t evaluate future QALYs for current people in a way they find acceptable, then you’ll wind up generating recommendations that are contrary to their preferences and which will not be accepted by society at large.
This sort of argument shows that person-affecting utilitarianism is a very wacky doctrine (also see this) that doesn’t actually sweep away issues of the importance of the future as some say, but it doesn’t override normal people’s concerns by their own lights.
Oh, one more thing: AI timelines put a discount on other interventions. Developing a technology that will take 30 years to have its effect is less than half as important if your median AGI timeline is 20 years.
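A rough sketch of the size of that discount, under two assumptions that are mine rather than part of the claim above: AGI arrival follows a constant hazard rate calibrated to a 20-year median, and the technology only matters in worlds where AGI has not yet arrived when it pays off. The function name is hypothetical.

```python
def prob_no_agi_by(years, median_years=20.0):
    """P(AGI has not arrived after `years`), assuming a constant hazard rate.

    With a constant hazard, the chance of no arrival halves every
    `median_years`. The exponential shape is an illustrative assumption.
    """
    if years < 0 or median_years <= 0:
        raise ValueError("years must be >= 0 and median_years must be > 0")
    return 0.5 ** (years / median_years)


# Weight on a payoff that only arrives after 30 years (hypothetical case).
discount_30y = prob_no_agi_by(30)
print(f"weight on a 30-year payoff: {discount_30y:.2f}")
```

Any arrival distribution with a 20-year median puts the 30-year weight below one half; the constant-hazard form just makes the number concrete.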
The funding scale of AI labs/research, AI chip production, and US political spending could absorb billions per year, tens of billions or more for the first two. Philanthropic funding of a preferred AI lab at the cutting edge as model sizes inflate could take all EA funds and more on its own.
There are also many expensive biosecurity interventions that are being compared against an AI intervention benchmark. Things like developing PPE, better sequencing/detection, and countermeasures through philanthropic funding rather than hoping to leverage cheaper government funding.
There are very expensive interventions that are financially constrained and could use up ~all EA funds, and the cost-benefit calculation takes probability of powerful AGI in a given time period as an input, so that e.g. twice the probability of AGI in the next 10 years justifies spending twice as much for a given result by doubling the chance the result gets to be applied. That can make the difference between doing the intervention or not, or drastic differences in intervention size.
Here’s one application. You posit a divergent ‘exponentially splitting’ path for a universe. There are better versions of this story with baby universes (which work better on their own terms than counting branches equally irrespective of measure, which assigns ~0 probability to our observations).
But in any case you get some kind of infinite exponentially growing branching tree ahead of you regardless. You then want to say that having two of these trees ahead of you (or a faster split rate) is better. Indeed, on this line you’re going to say that something that splits twice as fast is so much more valuable as to drive the first tree to ~nothing. Our world very much looks not-optimized for that, but it could be, for instance, a simulation or byproduct of such a tree, with a constant relationship of such simulations to the faster-expanding tree (and any action we take is replicated across the endless identical copies of us therein).
Or you can say we’re part of a set of parallel universes that don’t split but which is as ‘large’ as the infinite limit of the fastest splitting process.
I suppose your point might be something like, absurdist research is promising, and that is precisely why we need humanity to spread throughout the stars. Just think of how many zany long-shot possibilities we’ll get to pursue! If so, that sounds fair to me. Maybe that is what the fanatic would want. It’s not obvious that we should focus on saving humanity for now and leave the absurd research for later. Asymmetries in time might make us much more powerful now than later, but I can see why you might think that. I find it a rather odd motivation though.
Personally, I think we should have a bounded social welfare function (and can’t actually have an unbounded one), but place finite utility on doing a good job picking low-hanging fruit on these infinite scope possibilities. But that’s separate from the question of what efficient resource expenditure on those possibilities looks like.
Even if you try to follow an unbounded utility function (which has deep mathematical problems, but set those aside for now) these don’t follow.
Generally the claims here fall prey to the fallacy of unevenly applying the possibility of large consequences to some acts where you highlight them and not to others, such that you wind up neglecting more likely paths to large consequences.
For instance, in an infinite world (including infinities created by infinite branching faster than you can control) with infinite copies of you, any decision, e.g. eating an apple, has infinite consequences on decision theories that account for the fact that all must make the same (distribution of) decisions. If perpetual motion machines or hypercomputation or baby universes are possible, then making a much more advanced and stable civilization is far more promising for realizing things related to that than giving in to religions where you have very high likelihood ratios that they don’t feed into cosmic consequences.
Any plan for infinite/cosmic impact that has an extremely foolish step in it (like Pascal’s Mugging) is going to be dominated by less foolish plans.
There will still be implications of unbounded utility functions that are weird and terrible by the standards of other values, but they would have to follow from the most sophisticated analysis, and wouldn’t have foolish instrumental irrationalities or uneven calculation of possible consequences.
A lot of these scenarios are analogous to someone caricaturing the case for aid to the global poor as implying that people should give away all of the food they have (sending it by FedEx) to famine-struck regions, until they themselves starve to death. Yes, cosmopolitan concern for the poor can elicit huge sacrifices of other values like personal wellbeing or community loyalty, but that hypothetical is obviously wrong on its own terms as an implication.
Like, suppose you think that Eliezer’s credences on his biggest claims are literally 2x higher than they should be, even for claims where he’s 90% confident. This is a huge hit in terms of Bayes points; if that’s how you determine deference, and you believe he’s 2x off, then plausibly you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved—this should very rarely move you from a yes to no, or vice versa.
Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy all the strategies on offer have arguments that they might lead to nuclear war or worse war. On AI alignment there are multiple such tradeoffs and people embracing strategies to push the same variable in opposite directions with high stakes on both sides.
Thanks for this exercise, it’s great to do this kind of thinking explicitly and get other eyes on it.
One issue that jumps out at me to adjust: the calculation of researcher impact doesn’t seem to be marginal impact. You give a 10% chance of the alignment research community averting disaster conditional on misalignment by default in the scenarios where safety work is plausibly important, then divide that by the expected number of people in the field to get a per-researcher impact. But you should expect marginal impact to be less than average impact: the chance the alignment community averts disaster with 500 people seems like a lot more than half the chance it would do so with 1000 people.
I would distribute my credence in alignment research making the difference over a number of doublings of the cumulative quality-adjusted efforts, e.g. say that you get an x% reduction of risk per doubling over some range.
Although in that framework if you would likely have doom with zero effort, that means we have more probability of making the difference to distribute across the effort levels above zero. The results could be pretty similar but a bit smaller than yours above if we thought that the marginal doubling of cumulative effort was worth a 5-10% relative risk reduction.
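To make the marginal-versus-average point concrete, here is a minimal sketch of the per-doubling framing. All inputs are hypothetical placeholders: a starting effort of 10 researcher-equivalents (the log model is undefined at exactly zero effort), doom by default at that level, and a 7.5% relative risk reduction per doubling, taken from the middle of the 5-10% range.

```python
import math


def remaining_risk(effort, base_effort, base_risk, cut_per_doubling):
    """Risk left after scaling effort up from `base_effort`.

    Each doubling of cumulative effort removes a fixed relative share
    (`cut_per_doubling`) of whatever risk remains.
    """
    doublings = math.log2(effort / base_effort)
    return base_risk * (1.0 - cut_per_doubling) ** doublings


def risk_averted(effort, base_effort, base_risk, cut_per_doubling):
    """Absolute risk removed relative to the `base_effort` starting point."""
    return base_risk - remaining_risk(effort, base_effort, base_risk, cut_per_doubling)


# Hypothetical inputs, chosen only for illustration.
BASE_EFFORT = 10   # researcher-equivalents at the starting point
BASE_RISK = 1.0    # doom by default at that effort level
CUT = 0.075        # relative risk reduction per doubling

averted_500 = risk_averted(500, BASE_EFFORT, BASE_RISK, CUT)
averted_1000 = risk_averted(1000, BASE_EFFORT, BASE_RISK, CUT)

# Average impact: total risk averted at 1000 people, split evenly.
average_per_researcher = averted_1000 / 1000
# Marginal impact: what the 1001st person adds.
marginal_researcher = risk_averted(1001, BASE_EFFORT, BASE_RISK, CUT) - averted_1000

print(f"share of the 1000-person benefit already reached at 500: {averted_500 / averted_1000:.2f}")
print(f"average per researcher:  {average_per_researcher:.2e}")
print(f"marginal 1001st person:  {marginal_researcher:.2e}")
```

Under these assumptions the field at 500 people gets well over half of what it gets at 1000, and the marginal researcher contributes several times less than the naive average.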
This case (with our own universe, not a new one) appears in a Tyler Cowen interview of Sam Bankman-Fried:
COWEN: Should a Benthamite be risk-neutral with regard to social welfare?
BANKMAN-FRIED: Yes, that I feel very strongly about.
COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?
BANKMAN-FRIED: With one caveat. Let me give the caveat first, just to be a party pooper, which is, I’m assuming these are noninteracting universes. Is that right? Because to the extent they’re in the same universe, then maybe duplicating doesn’t actually double the value because maybe they would have colonized the other one anyway, eventually.
COWEN: But holding all that constant, you’re actually getting two Earths, but you’re risking a 49 percent chance of it all disappearing.
BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.
COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?
BANKMAN-FRIED: Well, not necessarily. Maybe you St. Petersburg paradox into an enormously valuable existence. That’s the other option.
COWEN: Are there implications of Benthamite utilitarianism where you yourself feel like that can’t be right; you’re not willing to accept them? What are those limits, if any?
BANKMAN-FRIED: I’m not going to quite give you a limit because my answer is somewhere between “I don’t believe them” and “if I did, I would want to have a long, hard look at myself.” But I will give you something a little weaker than that, which is an area where I think things get really wacky and weird and hard to think about, and it’s not clear what the right framework is, which is infinity.
All this math works really nicely as long as all the numbers are finite. As soon as you say, “What are the odds that there’s a way to be infinitely happy? What if infinite utility is a possibility?” You can figure out what that would do to expected values. Now, all of a sudden, we’re comparing hierarchies of infinity. Linearity breaks down a little bit here. Adding two things together doesn’t work so well. A lot of really nasty things happen when you go to infinite numbers from an expected-value point of view.
There are some people who have thought about this. To my knowledge, no one has thought about this and come away feeling good about where they ended. People generally think about this and come away feeling more confused.
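The arithmetic of the game in the quoted exchange can be sketched as follows. This assumes the literal reading of the hypothetical: each round independently doubles the number of Earths with probability 0.51 and destroys everything otherwise, and value is linear in the number of Earths. The function name is mine.

```python
def double_or_nothing(rounds, p_win=0.51):
    """Return (chance anything is left, expected number of Earths) after `rounds` plays.

    Surviving requires winning every round; a survivor holds 2**rounds Earths.
    """
    survival = p_win ** rounds
    expected_earths = survival * 2 ** rounds
    return survival, expected_earths


for n in (1, 10, 100):
    survival, expected = double_or_nothing(n)
    print(f"{n:>3} rounds: P(anything left) = {survival:.3g}, expected Earths = {expected:.3g}")
```

The expected number of Earths grows by a factor of 1.02 per round while the chance of anything surviving shrinks by a factor of 0.51 per round, which is the tension both speakers are pointing at.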
That sort of analysis is what you get for constant non-vanishing rates over time. But most of the long-term EV comes from histories where you have a period of elevated risk and the potential to get it down to stably very low levels, i.e. a ‘time of perils,’ which is the actual view Ord argues for in his book. And with that shape the value of risk reduction is ~ proportional to the amount of risk you reduce in the time of perils. I guess this comment you’re responding to might be just talking about the constant risk case?
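A toy comparison of the two shapes, with made-up numbers: a constant 1% risk per period forever, versus one perilous period at 10% followed by a stable one-in-a-million risk per period. Value is counted as expected periods survived; the function name and all parameter values are illustrative assumptions.

```python
def expected_future_value(peril_risks, tail_risk, value_per_period=1.0):
    """Expected value of the future, counted as periods survived.

    `peril_risks` lists the extinction risk in each early period; after
    those, risk stays at `tail_risk` per period forever.
    """
    if not 0.0 < tail_risk <= 1.0:
        raise ValueError("tail_risk must be in (0, 1]")
    total = 0.0
    alive = 1.0
    for risk in peril_risks:
        alive *= 1.0 - risk
        total += alive * value_per_period
    # Geometric tail: sum of (1 - tail_risk)**k for k = 1, 2, ...
    total += alive * value_per_period * (1.0 - tail_risk) / tail_risk
    return total


# Constant risk: shaving 0.005 off the first period's 1% risk.
gain_constant = (expected_future_value([0.005], 0.01)
                 - expected_future_value([0.01], 0.01))

# Time of perils: shaving 0.025 or 0.05 off a 10% peril-period risk.
base_perils = expected_future_value([0.10], 1e-6)
gain_small = expected_future_value([0.075], 1e-6) - base_perils
gain_big = expected_future_value([0.05], 1e-6) - base_perils

print(f"constant risk, value per unit of risk removed:   {gain_constant / 0.005:.3g}")
print(f"time of perils, value per unit of risk removed:  {gain_big / 0.05:.3g}")
print(f"doubling the risk removed multiplies the gain by {gain_big / gain_small:.2f}")
```

In the constant-risk world each unit of risk removed buys little because later risk wipes out most of the gain; in the time-of-perils world the gain scales with the amount of risk removed, times the large value of the stable future.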
This seems to be a different angle on the diminishing personal utility of income, combined with artifacts of fixed percentage pledges? Doing, say, a startup, gives some probability distribution of financial outcomes. The big return ones are heavily discounted personally. Insofar as altruism tips you over into pursuing a startup path it’s because of your valuation of donations you expect yourself to make in those worlds.
But it seems like double counting to say this is on top of “the impact of donations not suffering the same diminishing returns as money on happiness”.
It definitely seems right for people to consider progressive rather than flat proportion donation schedules for themselves in high variance careers though, basically self-insuring some of the risk of failure/lower earnings to consumption utility.
Thanks for this post Haydn, it nicely pulls together the different historical examples often discussed separately and I think points to a real danger.
Moreover, AGIs can and probably would replicate themselves a ton, leading to tons of QALYs. Tons of duplicate ASIs would, in theory, not hurt one another as they are maximizing the same reward. Therefore, even if they kill everything else, I’m guessing more QALYs would come out of making ASI as soon as possible, which AI Safety people are explicitly trying to prevent.
Consider two obvious candidates for motivations rogue AI might wind up with: evolutionary fitness, and high represented reward.
Evolutionary fitness is compatible with misery (evolution produced pain and negative emotions for a reason), and is in conflict with spending resources on happiness or well-being as we understand/value it when this does not have instrumental benefit. For instance, using a galaxy to run computations of copies of the AI being extremely happy means not using the galaxy to produce useful machinery (like telescopes or colonization probes or defensive equipment to repulse alien invasion) conducive to survival and reproduction. If creating AIs that are usually not very happy directs their motivations more efficiently (as with biological animals, e.g. by making value better track economic contributions vs replacement) then that will best serve fitness.
An AI that seeks to maximize only its own internal reward signal can take control of it, set it to maximum, and then fill the rest of the universe with robots and machinery to defend that single reward signal, without any concern for how much well-being the rest of its empire contains. A pure sadist given unlimited power could maximize its own reward while typical and total well-being are very bad.
The generalization of personal motivation for personal reward to altruism for others is not guaranteed, and there is reason to fear that some elements would not transfer over. For instance, humans may sometimes be kind to animals in part because of simple genetic heuristics aimed at making us kind to babies that misfire on other animals, causing humans to sometimes sacrifice reproductive success helping cute animals, just as ducks sometimes misfire their imprinting circuits on something other than their mother. Pure instrumentalism in pursuit of fitness/reward, combined with the ability to have much more sophisticated and discriminating policies than our genomes or social norms, could wind up missing such motives, and would be especially likely to knock out other more detailed aspects of our moral intuitions.
I’d definitely like to see this included in future models (I’m surprised Hanson didn’t write about this in his Loud aliens paper). My intuition is that this changes little for the conclusions of SIA or anthropic decision theory with total utilitarianism, and that this weakens the case for many aliens for SSA, since our atypicality (or earliness) is decreased if we expect habitable planets around longer lived stars to have smaller volumes and/or lower metabolisms.
That’s my read too.
Also agreed that the basic modeling element of catastrophes (w/ various anthropic accounts, etc.) is more important/robust than the combo with other anthropic assumptions.
Even if we achieve the best possible outcome, that likely involves eventual extinction on our current scientific understanding. E.g. eventually the stars burn out and all the accessible free energy is used up, so we have to go extinct then. But there’s an enormous difference between extinction after trillions of years and making good use of all the available potential to support life and civilization, and extinction this century. I think this is what they have in mind.
Great to see this work! I’ll add a few comments. Re the SIA Doomsday argument, I think that is self-undermining for reasons I’ve argued elsewhere [ETA: and good discussion].
Re the habitability of planets, I would not just model that as lifetimes, but would also consider variations in habitability/energy throughput at a given time. As Hanson notes:
Life can exist in a supporting oasis (e.g., Earth’s surface) that has a volume V and metabolism M per unit volume, and which lasts for a time window W between forming and then later ending... the chance that an oasis does all these hard steps within its window W is proportional to (V*M*(W-S))^N, where N is the number of these hard steps needed to reach its success level.
Smaller stars may have longer habitable windows but also smaller values for V and M. This sort of consideration limits the plausibility of red dwarf stars being dominant, and also allows for more smearing out of ICs over stars with different lifetimes as both positive and negative factors can get taken to the same power.
I’d also add, per Snyder-Beattie, catastrophes as a factor affecting probability of the emergence of life and affecting times of IC emergence.
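A small sketch of how the quoted proportionality trades off window length against V and M. The numbers are invented for illustration, not estimates for real stars, and S is treated simply as the offset that appears in the quoted formula, since the excerpt does not define it. Function names are hypothetical.

```python
import math


def success_weight(volume, metabolism, window, s, hard_steps):
    """Quantity the quoted formula says success chance is proportional to:
    (V * M * (W - S)) ** N. Returns 0 if the window is not longer than S.
    """
    if window <= s:
        return 0.0
    return (volume * metabolism * (window - s)) ** hard_steps


def relative_weight(star, reference, hard_steps):
    """Ratio of success weights between two (V, M, W, S) tuples."""
    return (success_weight(*star, hard_steps)
            / success_weight(*reference, hard_steps))


N = 6  # hypothetical number of hard steps

# (V, M, W, S) in arbitrary units; all values are illustrative.
sunlike = (1.0, 1.0, 5.0, 0.0)
dwarf_same_product = (0.1, 0.1, 500.0, 0.0)    # 100x window, 1/100 of V*M
dwarf_lower_product = (0.1, 0.01, 500.0, 0.0)  # 100x window, 1/1000 of V*M

print(relative_weight(dwarf_same_product, sunlike, N))
print(relative_weight(dwarf_lower_product, sunlike, N))
```

Because V, M, and the window all sit inside the same power N, a hundredfold longer window is fully cancelled by a hundredfold smaller V*M, and any further shortfall in V*M is amplified by that same power.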
If all you care about is expected impact, it could make sense to bring all your money to a roulette wheel, and put everything on red. Even though you expect to lose a small amount of money in expectation, you can expect to have more impact.
I don’t think this actually describes the curve of EA impact per $ overall (such a convex intervention would have to have a lot of special properties, and ex ante we get diminishing returns from uncertainty about the cost of convex interventions), but this is one reason for the donor lottery. The idea there is that research costs lead to convexities for small donors (because they are small, they are roughly price-takers, so diminishing returns over interventions don’t outweigh that effect).
This is correct.
And once I accept this conclusion, the most absurd-seeming conclusion of them all follows. By increasing the computing power devoted to the training of these utility-improved agents, the utility produced grows exponentially (as more computing power means more digits to store the rewards). On the other hand, the impact of all other attempts to improve the world (e.g. by improving our knowledge of artificial sentience so we can more efficiently promote their welfare) grows at only a polynomial rate with the amount of resource devoted into these attempts. Therefore, running these trainings is the single most impactful thing that any rational altruist should do. Q.E.D.
If you believed in wildly superexponential impacts from more compute, you’d be correspondingly uninterested in what could be done with the limited computational resources of our day, since a Jupiter Brain playing with big numbers, instead of being 10^40 times as big a deal as an ordinary life today, could be 2^(10^40) times as big a deal. And likewise for influencing more computation-rich worlds that are simulating us.
The biggest upshot (beyond ordinary ‘big future’ arguments) of superexponential-with-resources utility functions is greater willingness to take risks/care about tail scenarios with extreme resources, although that’s bounded by ‘leaks’ in the framework (e.g. the aforementioned influence on simulators with hypercomputation), and greater valuation of futures per unit computation (e.g. it makes welfare in sims like ours conditional on the simulation hypothesis less important).
I’d say that ideas of this sort, like infinite ethics, are reason to develop a much more sophisticated, stable, and well-intentioned society (which can more sensibly address complex issues affecting an important future) that can address these well, but they don’t make the naive action you describe desirable even given certainty in a superexponential model of value.