I am a research analyst at the Center on Long-Term Risk.
I’ve worked on grabby aliens, the optimal spending schedule for AI risk funders, and evidential cooperation in large worlds.
Some links
Thanks again Phil for taking the time to read this through and for the in-depth feedback.
I hope to take some time to create a follow-up post, working in your suggestions and corrections as well as external updates (e.g. to the parameters, reflecting lower total AI risk funding and shorter Metaculus timelines).
I don’t know if the “only one big actor” simplification holds closely enough in the AI safety case for the “optimization” approach to be a better guide, but it may well be.
This is a fair point.
The initial motivation for the project was AI s-risk funding, for which there’s pretty much one large funder (and not much work is done on AI s-risk reduction by people and organizations outside the effective altruism community), though this post is entirely about AI existential risk, which is less well modeled as a single actor.
My intuition is that the “one big actor” simplification works sufficiently well for the AI risk community, given the shared goal (avoiding an AI existential catastrophe) and my guess that a lot of the AI risk work done by the community doesn’t change the behaviour of AI labs much (i.e. it could be that labs choose to put more effort into capabilities over safety because of work done by the AI risk community, but I’m pretty sure this isn’t happening).
For example, the value of spending after vs. before the “fire alarm” seems to depend erroneously on the choice of units of money. (This is the second bit of red-highlighted text in the linked Google doc.) So I’d encourage someone interested in quantifying the optimal spending schedule on AI safety to start with this model, but then comb over the details very carefully.
To comment on this particular error (though not to say that the other errors Phil points to aren’t also problematic; I’ve yet to properly go through them): for what it’s worth, the main results of the post suppose zero post-fire-alarm spending[1] and, fortunately, since in our results we use units of millions of dollars and take the initial capital to be on the order of 1000 $m, I don’t think we face this problem of the choice of units having the reverse of the desired effect.
In a future version I expect I’ll just take the post-fire-alarm returns to spending to use the same returns exponent as before the fire alarm but with some multiplier, i.e. the same functional form for returns to spending before and after the fire alarm, differing only by a constant factor.
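As a rough sketch of the parameterisation I have in mind (the exponent, multiplier and function names here are my own placeholders, not values from the post):

```python
# Illustrative sketch (placeholder numbers, not fitted parameters): the same
# diminishing-returns exponent eta before and after the fire alarm, with a
# single multiplier m applied to post-alarm returns.

def returns_pre_alarm(spend, eta=0.5):
    """Utility from pre-fire-alarm spending, with diminishing returns."""
    return spend ** eta

def returns_post_alarm(spend, eta=0.5, m=2.0):
    """Post-fire-alarm returns: same exponent, scaled by a multiplier m."""
    return m * spend ** eta

# The ratio of post- to pre-alarm returns is m at every spending level,
# so it does not depend on the choice of units of money.
print(returns_pre_alarm(100.0), returns_post_alarm(100.0))
```

Because the exponent is shared, the ratio of post- to pre-alarm returns is just the constant multiplier, which (I believe) avoids the unit-dependence problem described above.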
Though if one thinks there will be many good opportunities to spend after a fire alarm, our main no-fire-alarm results would likely be an overestimate.
Strong agreement that a global moratorium would be great.
I’m unsure whether a global moratorium is the best thing to aim for, rather than a slowing of the race-like behaviour. Maybe a relevant similar case is whether to aim directly for the abolition of factory farms or just for incremental improvements in welfare standards.
This post from last year, What an actually pessimistic containment strategy looks like, has some good discussion on the topic of slowing down AGI research.
Thanks for the transcript and sharing this. The coverage seems pretty good, and the airplane crash analogy seems pretty helpful for communicating - I expect to use it in the future!
I agree. This lines up with models of optimal spending I worked on, which allowed for a post-fire-alarm “crunch time” in which one can spend a significant fraction of remaining capital.
I think “different timelines don’t change the EV of different options very much” plus “personal fit considerations can change the EV of a PhD by a ton” does end up resulting in an argument for the PhD decision not depending much on timelines. I think that you’re mostly disagreeing with the first claim, but I’m not entirely sure.
Yep, that’s right that I’m disagreeing with the first claim. I think one could argue the main claim either by:
1. Regardless of your timelines, you (the person considering doing a PhD) shouldn’t take them too much into consideration
2. I (advising you on how to think about whether to do a PhD) think timelines are such that you shouldn’t take them too much into consideration
I think (1) is false, and think that (2) should be qualified by how one’s advice would change depending on timelines. (You do briefly discuss (2), e.g. the SOTA comment).
To put my cards on the table on the object level: I have relatively short timelines and think that fewer people should be doing PhDs on the margin. My highly speculative guess is that this post has the effect of marginally pushing more people towards doing PhDs (given the existing association of shorter timelines ⇒ shouldn’t do a PhD).
I think you raise some good considerations but want to push back a little.
I agree with your arguments that
- we shouldn’t use point estimates (of the median AGI date)
- we shouldn’t fully defer to (say) Metaculus estimates.
- personal fit is important
But I don’t think you’ve argued that “Whether you should do a PhD doesn’t depend much on timelines.”
Ideally, as a community, we could have a guess at the optimal number of people who should do PhDs (factoring in their personal fit etc.) vs other paths.
I don’t think this has been done, but since most estimates of AGI timelines have decreased in the past few years it seems very plausible to me that the optimal allocation now has fewer people doing PhDs. This could maybe be framed as raising the ‘personal fit bar’ to doing a PhD.
I think my worry boils down to thinking that “don’t factor in timelines too much” could be overly general and not get us closer to the optimal allocation.
Thanks for the post!
In this post, I’ll argue that when counterfactual reasoning is applied the way Effective Altruist decisions and funding occurs in practice, there is a preventable anti-cooperative bias that is being created, and that this is making us as a movement less impactful than we could be.
One case I’ve previously thought about is that some naive forms of patient philanthropy could be like this—trying to take credit for spending on the “best” interventions.
I’ve polished an old draft and posted it as a short-form with some discussion of this (in the “When patient philanthropy is counterfactual” section).
Epistemic status: I’ve done work suggesting that AI risk funders be spending at a higher rate, and I’m confident in this result. The other takes are less informed!
I discuss
Whether I think we should be spending less now
Useful definitions of patient philanthropy
Being specific about empirical beliefs that push for more patience
When patient philanthropy is counterfactual
Opportunities for donation trading between patient and non-patient donors
In principle I think the effective giving community could be in a situation where we should marginally be saving/investing more than we currently do (being ‘patient’).
However, I don’t think we’re in such a situation and in fact believe the opposite. My main crux is AI timelines; if I thought that AGI was less likely than not to arrive this century, then I would almost certainly believe that the community should marginally be spending less now.
I think patient philanthropy could be thought of as saying one of:
1. The community is spending at the optimal rate: let’s create a place to save/invest to ensure we don’t (mistakenly) overspend and to keep our funds secure.
2. The community is spending above the optimal rate: let’s push for more savings on the margin, and create a place to save/invest and give later
I don’t think we should call (1) patient philanthropy. Large funders (e.g. Open Philanthropy) already do some form of (1) by just not spending all their capital this year. Doing (1) is instrumentally useful for the community and is necessary in any case where the community is not spending all of its capital this year.
I like (2) a lot more. This definition is relative to the community’s current spending rate and could be intuitively ‘impatient’. Throughout, I’ll use ‘patient’ to refer to (2): thinking the community’s current spending rate is too high (and so we do better by saving more now and spending later).
As an aside, thinking that the most ‘influential’ time is ahead is not equivalent to being patient. Non-patient funders can also think this but believe their last dollar of spending this year goes further than in any other year.
A potential third definition could be something like “patience is spending 0 to ~2% per year” but I don’t think it is useful to discuss.
Of course, the large funders and the patient philanthropist may have different beliefs that lead them to disagree on the community’s optimal spending rate. If I believed one of the following, I’d likely decrease my guess of the community’s optimal spending rate (and become more patient):
Thinking that there are not good opportunities to spend lots on now (i.e. higher diminishing returns to spending)
Thinking that TAI / AGI is further away.
Thinking that the rate of non-AI global catastrophic risk (e.g. nuclear war, biorisk) is lower
Thinking that there’ll be great spending opportunities in the run-up to AGI
Thinking that capital will be useful post-AGI
Thinking that the existing large funders’ capital is less secure, or that the large funders’ future effective giving is less likely for other reasons
Since it seems likely that there are multiple points of disagreement leading to different spending rates, ‘patient philanthropy’ may be a useful term for the cluster of empirical beliefs that imply the community should be spending less. However, it seems better to be more specific about which particular beliefs are driving this the most.
For example “AI skeptical patient philanthropists” and “better-AI-opportunities-now patient philanthropists” may agree that the community’s current spending rate is too high, but disagree on the optimal (rate of) future spending.
Patient philanthropists can be considered as funders with a very high ‘bar’. That is, they will only spend down on opportunities better than their bar in utils per $ and, if none currently exist, they will wait.
Non-patient philanthropists operate similarly but with a lower bar. While the non-patient philanthropist has funds (and funds anything above their lower bar in utils per dollar, including the opportunities that the patient philanthropist would otherwise fund), the patient philanthropist spends nothing. The patient philanthropist reasons that the counterfactual value of funding something the non-patient philanthropist would fund is zero and so chooses to save.
In this setup, the patient philanthropist is looking to fund and take credit for the ‘best’ opportunities and—while the large funder is around—the patient philanthropist is just funging with them. Once the large funder runs out of funds, the patient philanthropist’s funding is counterfactual.[1]
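To illustrate the dynamic with a toy example (the bars, budgets and utils-per-$ values below are made up):

```python
# Toy sketch of the funging dynamic above. The non-patient funder has a low
# bar and funds everything above it until its capital runs out; the patient
# funder has a higher bar and holds back while the non-patient funder is
# active (its marginal value there would be ~zero). All numbers are made up.

opportunities = [10, 8, 6, 9, 4, 2]          # utils per $, in order of appearance
non_patient_bar, patient_bar = 3, 7          # minimum utils per $ each will fund
non_patient_capital, patient_capital = 3, 3  # units of funding available

for utils_per_dollar in opportunities:
    if non_patient_capital > 0 and utils_per_dollar >= non_patient_bar:
        non_patient_capital -= 1
        funder = "non-patient funder"
    elif patient_capital > 0 and utils_per_dollar >= patient_bar:
        # Only counterfactual once the non-patient funder is out of capital.
        patient_capital -= 1
        funder = "patient funder"
    else:
        funder = "unfunded"
    print(f"{utils_per_dollar} utils/$ -> {funder}")
```

In this toy run the patient funder’s only grant is the high-value opportunity that happens to appear after the non-patient funder has run out of capital.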
If the large funder and patient philanthropist have differences in values or empirical beliefs, it is unsurprising they have different guesses of the optimal spending rate and ‘bar’.
However, this should not happen with value- and belief-aligned funders and patient philanthropists; if the funder is acting ‘rationally’ and spending at the optimal rate, then (by definition) there are no type-(2) patient philanthropists that have the same beliefs.
There are some opportunities for trade between patient philanthropists and non-patient philanthropists, similar to how people can bet on AI timelines.
Let’s say Alice pledges to give some fixed amount per year from her income and thinks that the community should be spending more now, and that Bob thinks the community should be spending less and saves some fixed amount per year from his income in order to give it away later. There’s likely an agreement possible (dependent on many factors) where they both benefit. A simple setup could involve:
Bob, for some agreed number of years, giving away his yearly amount to Alice’s choice of giving opportunity
Alice, after those years, giving her yearly amount to Bob’s preferred method of investing/saving or giving
This example closely follows similar setups suggested for betting on AI timelines.
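For concreteness, a made-up numerical version of such a trade (the amounts and the horizon are hypothetical):

```python
# Hypothetical trade between Alice (wants more spending now) and Bob (wants
# more saving). The amounts and time horizon below are illustrative only.

alice_pledge_per_year = 10_000  # $ Alice would otherwise give away each year
bob_savings_per_year = 10_000   # $ Bob would otherwise save/invest each year
years = 5

# For the next `years`, Bob gives his yearly amount to Alice's choice of
# current giving opportunity, so more gets spent now (as Alice prefers).
extra_spent_now = bob_savings_per_year * years

# After `years`, Alice gives her yearly pledge to Bob's preferred method of
# investing/saving or later giving, so more gets saved (as Bob prefers).
extra_saved_later = alice_pledge_per_year * years

print(f"Extra spent now: ${extra_spent_now:,}; extra saved for later: ${extra_saved_later:,}")
```

Whether both sides actually benefit depends on the many factors mentioned above (e.g. investment returns, and trust that each side follows through).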
Unless an amazing opportunity above the patient philanthropist’s bar (in utils/$) appears just after the large funder runs out of funds, where ‘just after’ means within the time for which the large funder would have kept going with their existing spending strategy (funding everything above their own bar in utils/$) had they been using the patient philanthropist’s funds.
DM = digital mind
Archived version of the post (with no comments at the time of the archive). The post is also available on the Sentience Institute blog
I think you are mistaken on how Gift Aid / payroll giving works in the UK (your footnote 4), it only has an effect once you are a higher rate or additional rate taxpayer. I wrote some examples up here. As a basic rate taxpayer you don’t get any benefit—only the charity does.
Thanks for the link to your post! I’m a bit confused about where I’m mistaken. I wanted to claim that:
(ignoring payroll giving or claiming money back from HMRC, as you discuss in your post) taking a salary cut (while at the 40% marginal tax rate) is more efficient (at getting money to your employer) than receiving taxed income and then donating it (with Gift Aid) to your employer
Is this right?
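For concreteness, here’s the arithmetic behind my claim (a rough sketch: assuming a 40% marginal income-tax rate and the standard 25% Gift Aid gross-up on a net donation, and ignoring National Insurance, payroll giving and reclaiming higher-rate relief, as per the claim above):

```python
# Rough sketch of the comparison in the claim above. Assumes a 40% marginal
# income-tax rate; ignores National Insurance and higher-rate Gift Aid relief.

marginal_tax = 0.40
gift_aid_gross_up = 0.25  # charity reclaims basic-rate tax: +25% on a net donation

# Option A: take a 100 (gross) salary cut. The employer keeps the full 100;
# take-home pay only falls by 100 * (1 - 0.40) = 60.
employer_gets_a = 100.0
cost_to_you_a = 100.0 * (1 - marginal_tax)

# Option B: receive the 100 as taxed salary (take home 60) and donate the 60
# with Gift Aid; the employer (a charity) receives 60 * 1.25 = 75.
take_home = 100.0 * (1 - marginal_tax)
employer_gets_b = take_home * (1 + gift_aid_gross_up)
cost_to_you_b = take_home

print(f"Salary cut: employer gets £{employer_gets_a:.0f} at cost £{cost_to_you_a:.0f} to you")
print(f"Gift Aid:   employer gets £{employer_gets_b:.0f} at cost £{cost_to_you_b:.0f} to you")
```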
My impression is that people within EA already defer too much in their donation choices and so should be spending more time thinking about how and where to give, what is being missed by Givewell/OP etc. Or defer some (large) proportion of their giving to EA causes but still have a small amount for personal choices.
Fair point. I think that’s because I’m somewhat more excited about one person doing a 100-hour investigation than 10 people doing 10-hour investigations, and I would still push for people to enter small-to-medium-sized donor lotteries (which is arguably a form of deferral).
I think we should reason in terms of decisions and not use anthropic updates or probabilities at all. This is what is argued for in Armstrong’s Anthropic Decision Theory, which itself is a form of updateless decision theory.
In my mind, this resolves a lot of confusion around anthropic problems when they’re reframed as decision problems.
I’d pick, in this order,
Minimal reference class SSA
SIA
Non-minimal reference class SSA
I choose this ordering because both minimal reference class SSA and SIA can give the ‘best’ decisions (ex-ante optimal ones) in anthropic problems,[1] when paired with the right decision theory.
Minimal reference class SSA needs pairing with an evidential-like decision theory, or one that supposes you are making choices for all your copies. SIA needs pairing with a causal-like decision theory (or one that does not suppose your actions give evidence for, or directly control, the actions of your copies). Since I prefer the former set of decision theories, I prefer minimal reference class SSA to SIA.
Non-minimal reference class SSA, meanwhile, cannot be paired with any (standard) decision theory to get ex-ante optimal decisions in anthropic problems.
For more on this, I highly recommend Oesterheld & Conitzer’s Can de se choice be ex ante reasonable in games of imperfect recall?
For example, the sleeping beauty problem or the absent-minded driver problem
In this first comment, I stick with the explanations. In sub-comments, I’ll give my own takes
We need the following ingredients
A non-anthropic prior $P(W_i)$ over worlds $W_1, W_2, \ldots$ where $\sum_i P(W_i) = 1$ [1]
A set $O_i$ of all the observers in $W_i$, for each $i$.
A subset $S_i \subseteq O_i$ of the observers in each world $W_i$, for each $i$, that contain your exact current observer moment
Note it’s possible to have some $S_i$ empty: worlds in your non-anthropic prior where there are zero instances of your current observer moment
A reference class (you can choose!) giving, for each $i$, a reference set $R_i$ with $R_i \subseteq O_i$. I call $R_i$ the reference set for world $W_i$, which is a subset of the observers in that world.
Generally one picks a ‘rule’ to generate the $R_i$ systematically
For example “human observers” or “human observers not in simulations”.
All anthropic theories technically use a reference class (even the self-indication assumption)
Generally one chooses the reference class to contain all of the you-observer moments (i.e. $S_i \subseteq R_i$ for all $i$)
And finally, we need a choice of anthropic theory. On the observation $E$ (that you are your exact current observer moment), the anthropic theories differ in the likelihood $\Pr(E \mid W_i)$ they assign.
For the self-indication assumption (SIA), Bostrom gives the definition of something like
All other things equal, one should reason as if they are randomly selected from the set of all possible observer moments [in your reference class].
This can be formalised as $\Pr(E \mid W_i) = \frac{|S_i \cap R_i|}{\sum_j |R_j|}$.
Note that
The denominator $\sum_j |R_j|$, which is the total number of observers in all possible worlds that are in our reference class, is independent of $i$, so we can ignore it when doing the update since it cancels out when normalising.
The standard approach of SIA is to take the reference class to be all observers, that is $R_i = O_i$ for all $i$. In this case, the update is proportional to $|S_i|$.
This is true for any reference class with $S_i \subseteq R_i$ (i.e. our reference class does not exclude any instances of our exact current observer moment).
Heuristically we can describe SIA as updating towards worlds in direct proportion to the number of instances of “you” that there are.
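As a toy worked example (made-up worlds and numbers; the reference class is taken to include all of your copies):

```python
# Toy SIA update over two worlds with equal non-anthropic prior:
#   W1 contains 1 instance of your exact current observer moment,
#   W2 contains 10 instances.
# With S_i contained in R_i, the SIA likelihood is proportional to |S_i|,
# so we multiply the prior by |S_i| and renormalise.

prior = {"W1": 0.5, "W2": 0.5}
copies = {"W1": 1, "W2": 10}  # |S_i|

unnormalised = {w: prior[w] * copies[w] for w in prior}
total = sum(unnormalised.values())
posterior = {w: round(v / total, 3) for w, v in unnormalised.items()}
print(posterior)  # {'W1': 0.091, 'W2': 0.909}: SIA favours the observer-rich world
```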
Updating with SIA has the effect of
Updating towards worlds that are multiverses
Probably thinking there are many simulated copies of you
Probably thinking there are aliens (supposing that the process that leads to humans and aliens is the same, we update towards this process being easier).
(The standard definition) breaking when you have a world with infinite instances of your exact current observer moment.
For the self-sampling assumption (SSA), Bostrom again:
All other things equal, one should reason as if they are randomly selected from the set of all actually existent observer moments in their reference class
This can be formalised as $\Pr(E \mid W_i) = \frac{|S_i \cap R_i|}{|R_i|}$.
Note that
The denominator $|R_i|$, which is the total number of observers in the reference set in world $W_i$, is not independent of $i$ so does not cancel out when normalising our update
In the case that one takes a reference class to include all exact copies of one’s current observer moment (i.e. $S_i \subseteq R_i$), the expression simplifies to $\frac{|S_i|}{|R_i|}$.[3]
Heuristically we can describe SSA as updating towards worlds in direct proportion to how relatively common instances of “you” are in the reference class in the world.
This is a special case of SSA, where one takes the minimal reference class that contains the exact copies of your exact observer moment. That is, take $R_i = S_i$ for every $i$. This means that, for any evidence you receive, you rule out worlds that do not contain a copy of you that has this same evidence.
Note that the formula for this update becomes $\Pr(E \mid W_i) = 1$ for worlds where there is at least one copy of us, and 0 otherwise.[4]
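Continuing the toy example from the SIA section (made-up numbers; $|S_i|$ is the number of your copies and $|R_i|$ the size of the reference set in world $W_i$):

```python
# Toy SSA updates over two worlds with equal non-anthropic prior:
#   W1: 1 copy of you among 100 reference-class observers,
#   W2: 10 copies of you among 10,000 reference-class observers.
# Non-minimal SSA likelihood: |S_i| / |R_i|. Minimal reference class SSA sets
# R_i = S_i, so the likelihood is 1 whenever the world contains a copy of you.

prior = {"W1": 0.5, "W2": 0.5}
copies = {"W1": 1, "W2": 10}               # |S_i|
reference_set = {"W1": 100, "W2": 10_000}  # |R_i| (a non-minimal choice)

def normalise(weights):
    total = sum(weights.values())
    return {w: round(v / total, 3) for w, v in weights.items()}

ssa = normalise({w: prior[w] * copies[w] / reference_set[w] for w in prior})
minimal_ssa = normalise({w: prior[w] * (1 if copies[w] > 0 else 0) for w in prior})

print("non-minimal SSA:", ssa)                       # favours W1, where 'you' are relatively common
print("minimal reference class SSA:", minimal_ssa)   # no update from the prior here
```

Note how on the same worlds the non-minimal version updates in the opposite direction to SIA, while the minimal version only rules out worlds containing no copy of you (so here it leaves the prior unchanged).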
Updating with SSA has the effect of
Updating towards worlds that are small but just contain copies of you (in the reference class)
For non-minimal reference classes, believing in a ‘Doomsday’ (i.e. updating to worlds with fewer (future) observers in your reference class, since in those worlds your observations are more common in the reference set).
Having generally weaker updates than SIA—the likelihoods are always in [0,1].
Update towards worlds where you are more ‘special’,[5] for example updating towards “now” being an interesting time for simulations to be run.
Can be extended to worlds containing infinite observers
Both SIA and SSA are (in my mind) valid ‘interpretations’ of what we want ‘probability’ to mean, but not how I think we should do things.
Can be Dutch-booked if paired with the wrong decision theory[6]
One could also easily write this in a continuous case
Strictly, this is the strong self-sampling assumption in Bostrom’s original terminology (the difference being the use of observer moments, rather than observers)
One may choose not to do this, for example, by excluding simulated copies of oneself or Boltzmann brain copies of oneself.
The formal definition I gave for SSA is only for cases where the reference set is non-empty, so there’s not something weird going on where we’re deciding 0⁄0 to equal 0.
In SIA, being ‘special’ is being common and appearing often. In SSA, being ‘special’ is appearing often relative to other observers
Spoilers: using SIA with a decision theory that supposes you can ‘control’ all instances of you (e.g. evidential like theories, or functional-like theories) is Dutch-bookable. This is also the case for non-minimal reference class SSA with a decision theory that supposes you only control a single instance of you (e.g. causal decision theory).
I think there are benefits to thinking about where to give (fun, having engagement with the community, skill building, fuzzies)[1] but I think that most people shouldn’t think too much about it and—if they are deciding where to give—should do one of the following.
1 Give to the donor lottery
I primarily recommend giving through a donor lottery and then only thinking about where to give in the case you win. There are existing arguments for the donor lottery.
2 Deliberately funge with funders you trust
Alternatively I would recommend deliberately ‘funging’ with other funders (e.g. Open Philanthropy), such as through GiveWell’s Top Charities Fund.
However, if you have empirical or value disagreements with the large funder you funge with or believe they are mistaken, you may be able to do better by doing your own research.[2]
3 If you work at an ‘effective’[3] organisation, take a salary cut
Finally, if you work at an organisation whose mission you believe effective, or is funded by a large funder (see previous point on funging), consider taking a salary cut[4].
(a) Saving now to give later
I would say to just give to the donor lottery and, if you win, first spend some time thinking and decide whether you want to give later. If you conclude yes, give to something like the Patient Philanthropy Fund, set up some new mechanism for giving later, or (as you always can) enter/create a new donor lottery.
(b) Thinking too long about it—unless it’s rewarding for you
Where rewarding could be any of: fun, interesting, good for the community, gives fuzzies, builds skills or something else. There’s no obligation at all in working out your own cost effectiveness estimates of charities and choosing the best.
(c) Thinking too much about funging, counterfactuals or Shapley values
My guess is that if everyone does the ‘obvious’ strategy of “donate to the things that look most cost effective[5]” and you’re broadly on board with the values[6], empirical beliefs[7] and donation mindset[8] of the other donors in the community[9], it’s not worth considering how counterfactual your donation was or who you’ve funged with.
Thanks to Tom Barnes for comments.
Consider goal factoring the activity of “doing research about where to give this year”. It’s possible there are distinct tasks (e.g. “give to the donor lottery” and “do independent research on X”) that better achieve your goals.
For example, I write here how, given Metaculus AGI timelines and a speculative projection of Open Philanthropy’s spending strategy, small donors’ donations can go further when not funging with them.
A sufficient (but certainly not necessary) condition could be “receives funding from an EA-aligned funder, such as Open Philanthropy” (if you trust the judgement and share the values of the funder)
This is potentially UK specific (I don’t know about other countries) and for people on relatively high salaries (>£50k, the point at which the marginal tax rate is greater than Gift Aid one can claim back).
With the caveat of making sure opportunities don’t get overfunded
I’d guess there is a high degree of values overlap in your community: if you donate to a global health organisation and another donor—as a result of your donation—decides to donate elsewhere, it seems reasonably likely they will donate to another global health organisation.
I’d guess this overlap is relatively high for niche EA organisations. I’ve written about how to factor in funging as a result of (implicit) differences of AI timelines. Other such empirical beliefs could include: beliefs about the relative importance of different existential risks among longtermists or the value of some global health interventions (e.g. Strong Minds)
For particularly public charitable organisations and causes, I’d guess there is less mindset overlap. That is, the person you’ve funged with may not share the effectiveness mindset (and so their donation may go to a charity you would judge as less cost effective than where you would donate if accounting for funging).
The “community” is roughly the set of people who donate—or would donate—to the charities you are donating to.
Using goal factoring on tasks with ugh fields
Summary: Goal factor ugh tasks (listing the reasons for completing the task) and then generate multiple tasks that achieve each subgoal.
Example: email
I sometimes am slow to reply to email and develop an ugh-field around doing it. Goal factoring “reply to the email” into
complete sender’s request
be polite to the sender (i.e. don’t take ages to respond)
one can see that the first sub-goal may take some time (and maybe is the initial reason for not doing it straight away), but the second sub-goal is easy! One can send an email saying you’ll get back to them soon. [Of course, make sure you eventually fulfil the request, and potentially set a reminder to send a polite follow-up email if you’re delayed longer!]
Do you have any updates / plan to publish anything about the Monte Carlo simulation approach you write about in footnote 3?
Thanks for the post! I thought it was interesting and thought-provoking, and I really enjoy posts like this one that get serious about building models.
Thanks :-)
One thought I did have about the model is that (if I’m interpreting it right) it seems to assume a 100% probability of fast takeoff (from strong AGI to ASI/the world totally changing), which isn’t necessarily consistent with what most forecasters are predicting. For example, the Metaculus forecast for years between GWP growth >25% and AGI assigns a ~25% probability that it will be at least 15 years between AGI and massive resulting economic growth.
Good point! The model does assume that the funder’s spending strategy never changes. And if there was a slow takeoff the funder might try and spend quickly before their capital becomes useless etc etc.
I think I’m sufficiently sold on fast takeoff that this consideration didn’t properly cross my mind :-D
I would enjoy seeing an expanded model that accounted for this aspect of the forecast as well.
Here’s one very simple way of modelling it
write $p$ for the probability of a slow takeoff
call $S$ the interventions available during the slow take-off and write $c$ for the (average) cost effectiveness of the interventions $S$, as a fraction of the cost effectiveness of the original intervention.[1]
Conditioning on AGI at a given year, the funder:
spends some fraction of any saved capital on the original intervention
spends the remaining fraction of any saved capital on $S$
Hence the cost effectiveness of a small donor’s donation to the original intervention this year is some multiple (a function of $p$ and $c$) of the on-paper cost effectiveness of donating to it.
Taking
a distribution for $c$ truncated to (0,1)
the distribution used in the post and Metaculus AGI timelines
gives the following result: around a 5pp increase compared to the results not factoring this in.
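A minimal sketch of the kind of mixing I have in mind (the functional form, symbols and numbers here are my simplified stand-ins, not the exact implementation):

```python
# Simplified stand-in: with probability p_slow the takeoff is slow and saved
# capital goes to interventions that are a fraction c as cost effective as the
# original opportunity; otherwise (fast takeoff) saved capital is assumed to be
# worth nothing, as in the no-slow-takeoff setup. Placeholder values only.

p_slow = 0.25  # probability of a slow takeoff
c = 0.5        # relative cost effectiveness of slow-takeoff interventions

# Expected value of a unit of saved capital, relative to the on-paper cost
# effectiveness of the original opportunity:
expected_value_of_saved_capital = (1 - p_slow) * 0.0 + p_slow * c
print(expected_value_of_saved_capital)  # 0.125
```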
I think this extension better fits faster slow-takeoffs (i.e. on the order of 1-5 years). In my work on AI risk spending I considered a similar model feature, where after an AGI ‘fire alarm’ funders are able to switch to a new regime of faster spending.
I think almost certainly that $c < 1$ for GHD giving, both because:
(1) the higher spending rate requires a lower bar
(2) many of the best current interventions benefit people over many years (and so we have to truncate to only consider the benefit accrued before full-on AGI, something that I consider here).
Thanks for putting it together! I’ll give this a go in the next few weeks :-)
In the past I’ve enjoyed doing the YearCompass.
The consequence of this for the “spend now vs spend later” debate is crudely modeled in The optimal timing of spending on AGI safety work, if one expects automated science to directly & predictably precede AGI. (Our model does not model labor, and instead considers [the AI risk community’s] stocks of money, research and influence)
We suppose that after a ‘fire alarm’ funders can spend down their remaining capital, and that the returns to spending on safety research during this period can be higher than spending pre-fire alarm (although our implementation, as Phil Trammell points out, is subtly problematic, and I’ve not computed the results with a corrected approach).