[Question] Why isn't there a charity evaluator for longtermist projects?
I.e., why isn't there an org analogous to GiveWell and Animal Charity Evaluators that evaluates and recommends charities according to how much impact they can have on the long-term future, e.g. by reducing existential risk, as opposed to only making grants directly and saying "just trust us" like the EA Funds?
This lack is one (among several) reasons why I haven’t shifted any of my donations toward longtermist causes.
I’m currently a guest fund manager at LTFF and, personally, would love to fund a project like this if I thought that the execution would be good.
If you are interested in working on this and need funding, I'd be excited for you to reach out, or just to apply to the LTFF.

I'd guess the funding mechanism has to be somewhat different given the incentives at play with AI x-risk. Specifically, the Omega critiques do not seem bottlenecked by funding but by time and anonymity, in ways that can't be solved with money.
What's the version/route to value of this that you are excited about? I feel quite skeptical that anything like this could work (see my answer on this post), but would be eager for people to change my mind.
I agree that the lack of feedback loops and complicated nature of all of the things is a core challenge in making this useful.
I think you really can’t do better than trying to evaluate people’s track records and the quality of their higher level reasoning, which is essentially the meaning of grantmakers’ statements like “just trust us”.
I do have this sense that we can do better than an illegible "just trust us". For example, in the GHD space, it seems to me like the quality of the reasoning of people championing different interventions could be quite decorrelated from the actual amount of impact; it seems like you need numbers on how many people are being impacted, and by how much.
In my experience even fairly rough BOTECs do shine some light on what we should be doing, and it feels tractable to improve the quality of these.
What's the version/route to value of this that you are excited about?
The version that I am excited about tries to quantify how impactful longtermist interventions are. I think this could help with grantmaking, but also with people's career choices. Things that would be cool to estimate the value of include:
A marginal average research hour on each of the alignment agendas.
A marginal person doing EA or AIS community building
A person figuring out AI policy details.
An op-ed about AI safety in a well-respected news site
An extra safety researcher getting hired at a leading lab
An extra person estimating AI timelines, or doing epoch-style research.
… and a ton more things
How to do this estimation? There are two main ways I feel excited about people trying:
Building a quantitative model, say in Guesstimate, which quantifies one's worldview.
Building an “outside view” model aggregating a bunch of people’s opinions, perhaps some mix of superforecasters and domain experts. One can make this more efficient by just having a ton of public prediction markets, and then picking a small subset at random to have the experts spend a long time thinking through what they think the result should be.
Here are two silly examples of (1).
Suppose that I'm trying to estimate the impact of a project attempting to pass regulation on frontier AI training runs, say, similar to the ideas described here. One can factor this into (a) the probability of passing the legislation and (b) if passed, how good the legislation would be. To estimate (a), I look at the base rate of legislation passing, factors that might make this more or less likely to pass given the current climate, etc. All in all, I estimate something like P(pass) = 4%. To estimate (b), I might crowdsource vibes-based opinions of how much doom might change, or I might list out various worldviews and estimate how good this might be conditional on each of them. This one has more uncertainty, but I'm going to put the expected reduction in P(AI doom) at around 1%.[1] This naively puts the impact at .01 * .04 = .0004, i.e. 400 microdooms.
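Written out as a minimal point-estimate sketch (the numbers are just the illustrative guesses above, not real estimates):

```python
# Point-estimate BOTEC for the regulation example above.
# Both inputs are the illustrative guesses from this comment, not real estimates.
p_pass = 0.04               # (a) probability the legislation passes
dp_doom_if_passed = 0.01    # (b) expected reduction in P(AI doom) if it passes

expected_reduction = p_pass * dp_doom_if_passed        # in "dooms"
print(f"{expected_reduction:.4f} dooms = {expected_reduction * 1e6:.0f} microdooms")
# -> 0.0004 dooms = 400 microdooms
```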
Now, suppose that I'm trying to estimate the impact of one year of research produced by the mean SERI MATS scholar. Remember that research impact is heavy-tailed, so the mean is probably >> the median. The alignment field is roughly 10 years old, and say it has had an average of 100 people working in it, so 1000 person-years. This seems low, so I'm going to arbitrarily bump this number up to 3000 person-years. Now, suppose the progress so far amounts to 1% of the alignment problem[2] being solved, and suppose that alignment being solved is 50% of what is necessary for an existential win. Let's say the average MATS scholar is equal to the average person in the alignment field so far. This means that the value of a year of research produced by the average MATS scholar is .5 * .01 * (1/3000) ≈ 1.7 microdooms.
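And since the first footnote notes that multiplying point estimates (rather than distributions) induces error, here is a minimal Monte Carlo version of this second BOTEC, with entirely made-up placeholder distributions over the same parameters:

```python
# Monte Carlo version of the MATS BOTEC. The distributions below are
# placeholder guesses chosen only to illustrate the mechanics.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

person_years = rng.lognormal(mean=np.log(3000), sigma=0.5, size=n)
frac_alignment_solved = rng.lognormal(mean=np.log(0.01), sigma=0.7, size=n)
alignment_share_of_win = rng.beta(5, 5, size=n)   # centred around 0.5

microdooms = alignment_share_of_win * frac_alignment_solved / person_years * 1e6

print(f"point estimate: {0.5 * 0.01 / 3000 * 1e6:.2f} microdooms")
print(f"simulated mean: {microdooms.mean():.2f}, median: {np.median(microdooms):.2f}")
```

With these particular placeholder distributions the simulated mean lands noticeably above the point estimate, which is exactly the kind of gap the footnote is warning about.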
These are both terrible estimates that get so many things wrong! Probably most of the value of the MATS person's research now is upskilling to make future research better, or perhaps it's in field-building to excite more people to start working on alignment later, or perhaps something else. Maybe the legislation project makes it way more difficult for other efforts to do something in this space later, or maybe it makes a mistake in the regulation and creates a licensing body that is immediately subject to regulatory capture, so the value isn't 1% of an existential win; it could be −5%. My claims are:
a) These estimates are better than nothing, and they at least inform one's intuition of what is going on.
b) These estimates can be improved radically by working on them. I had a lot of this cached (having thought about stuff like this a fair bit), but this was a very quick comment, and with some dedicated effort, a bunch of cruxes could be uncovered. I feel like one could spend 10 hours on each of these and really narrow down the uncertainty quite a bit.
[1] This is a wild estimate! I'm typing this on the fly, but in reality I'd build a Guesstimate model with a distribution over these parameters, because multiplying point estimates instead of distributions induces error.
[2] I'm being horrendously glossy, but by "the alignment problem" I mean something like "the technical solution that tells us how to build a superintelligent AI pointed at humanity's reflectively endorsed values".
Thanks for the detailed response! Your examples were helpful to illustrate your general thinking, and I did update slightly towards thinking some version of this could work, but I am still getting stuck on a few points:
Re. the GHD comparison: firstly, to clarify, I meant "quality of reasoning" primarily in terms of the stated theory of change, rather than as a much more difficult-to-assess general statement. I would expect the quality of reasoning around a ToC to correlate quite strongly with expected impact. Of course this might not always cash out in actual impact, but this doesn't necessarily feel relevant for funding longtermist projects, given the inability to get feedback on actual impact. I think most longtermist work focuses on wicked problems, which means that even the progress of existing projects is not necessarily a good proxy for overall success.
For your two suggested methodologies: it seems like (2) would be very useful to donors but very costly in expert time, and not obviously worth the marginal gains compared to a grantmaker's decision (although I'd be keen to try a small test run and see).
For method (1), I think that quantification is most useful for clarifying your own intuitions and allowing for some comparison within your own models. So I am certainly pro grantmakers doing their own quick evaluations, but I am not sure how useful it would be as a charity evaluator. I think you still have such irreducibly huge uncertainty bars on some of the key statements you need to get there (especially when you consider counterfactuals), that a final quantification of impact for a longtermist charity is just quite misleading for less well-informed donors.
For example, I'm not sure what a statement like "alignment being solved is 50% of what is necessary for an existential win" means exactly, but I think it does illustrate how messy this is. Does it mean it reduces AI X-risk by half this century? Increases the chance of existential security by 50% (any effect on this seems to change an evaluation by orders of magnitude)? I am guessing it means it is 50% of the total work needed to reduce AI risk to ~0, but it seems awfully unclear how to quantify this: there must be some complex distribution of overall risk reduction depending on the amount of other progress made, rather than a binary, which feels very hard to quantify. Thus I agree with claim (a), but am skeptical of our ability to make progress on (b) in a reasonable space of time.
One thing that I would be excited about is more explicit statements by longtermist charities themselves detailing their own BOTECs along the lines of what you are talking about, justifying from their perspective why their project is worth funding. This allows you to clearly understand their worldview, the assumptions they are making, and what a “win” would look like for them, which allows you to make your own evaluation. I think it would be great to make reasoning more explicit and allow for more comparison probably within the AI safety community, but it feels unlikely to be useful for non extremely well-informed donors.
Feels like this question is running throughout the answers:
Should there be an evaluator for longtermist projects?
Agree vote: Yes
Disagree vote: No
Probably depends on the cost and counterfactual, right? I doubt many people will think having a longtermist evaluator is bad if it’s free.
I think it could be bad if it relies too much on a particular worldview for its conclusions, which causes people to unnecessarily anchor on it. Seems like it could also be bad from a certain perspective if you think that it could lead to preferred treatment for longtermist causes which are easier to evaluate (eg. climate change relative to AI safety).
And an evaluator may be useful even without explicitly ranking/recommending certain charities, by providing semi-objective information not only for donors, but also for collaborators and prospective employees, and as a feedback loop to the orgs.
What’s the kind of information you mean by semi-objective? Something comparable to this for instance? Nuclear Threat Initiative’s Global Biological Policy and Programs (founderspledge.com) (particularly the “why we recommend them” section)
I did a sort of version of this for many years. Eventually it became a huge amount of somewhat painful work and it was never exactly clear to me how many people it was helping; it got a lot of karma, but so did a lot of much lower effort posts, and I didn’t have a lot of other feedback mechanisms.
Wow Larks you really did a thorough and impressive roundup there. If anyone is interested you can check out his 2021 review here.
https://forum.effectivealtruism.org/posts/BNQMyWGCNWDdP2WyG/2021-ai-alignment-literature-review-and-charity-comparison
Thanks!
I really appreciated your assessments of the alignment space, and would be open to paying out a retroactive bounty and/or commissioning reports for 2022 and 2023! Happy to chat via DM or email (austin@manifund.org)
Were there donors who said that they benefitted from your work and/or made specific decisions based on it?
Some yes, sometimes years after the event, but generally without quantification of the impact. The highlight was probably Mustafa Suleyman mentioning it, though I only learned of this a long time later, and I’m not aware of any specific actions he took as a result.
Congrats!
Quick things:
1. The vast majority of the funding comes from very few funders. GiveWell and ACE, by contrast, are meant more for many smaller donors. Historically, funding was not the main bottleneck in longtermism ("we might as well fund everything"); that said, this is changing now, so it could be a good time to do something different.
2. My guess is that few strong people wanted to build a new evaluator in this space themselves. Many of the strong people who could evaluate longtermist orgs seem to prefer working at OP or, more likely, just working in the field directly. EA is kind of a "do-ocracy": often the answer to "why didn't this happen?" is "it would require some strong talent, and no one was really excited about this."
3. Similar to (1), I haven’t seen a large funding base be very interested in spending money on evaluation here.
4. Much of the longtermist field is fairly friendly with one another, and I think there's a resulting lack of candid evaluation.
5. Honestly, I think a lot of opinions around specific longtermist interventions seem very intuition-based and kind of arbitrary. Especially around AI, there seem to be a bunch of key considerations that many people disagree about—so it’s tricky to have a strong set of agreements to do evaluation around.
6. Some people seem to think that basically nothing in AI Safety is very promising yet, so we’re just trying a bunch of stuff and hoping that eventually strong research programs are evident.
I’d like to see more work here and really hope that things mature.
I tend to agree with this perspective, though I would also add that I think that not investing more in longtermist evaluation seven years ago was a mistake.
Why seven years ago specifically?
Because that seems like enough time to have something good now.
One could try to make the evaluation criteria worldview-agnostic – focusing on things like the quality of their research and workplace culture – and let individuals donate to the best orgs working on problems that are high priority to them.
I think having recommendations in each subfield would make sense. But how many subfields have a consensus standard for how to evaluate such things as “quality of . . . research”?
My somewhat “cynical” answer is that an important function of charity evaluation organisations is to challenge the status quo. If everyone is funding animal shelters and you think that’s ineffective, you might as well create an organisation to redirect funding to more effective interventions. But longtermism as a field is already “owned” by EA, so EAs are unlikely to feel a need to shake up the establishment there.
I am surprised no one has directly made the obvious point of there being no concrete feedback loops in longtermist work, which means that it would be very messy to try and compare. While some people have tried to get at the cost effectiveness of X-risk reduction, it is essentially impossible to be objective in evaluating how much a given charity has actually reduced X-risk. Perhaps there is something about creating clear proxies which allows for better comparison, but I am guessing that there would still be major disagreements over what could be best that are unresolvable.
Any evaluation would have to be somewhat subjective and would smuggle in a lot of assumptions from the evaluator's worldview. I think you really can't do better than trying to evaluate people's track records and the quality of their higher level reasoning, which is essentially the meaning of grantmakers' statements like "just trust us".
Perhaps there could be something like a site which aggregates the opinions of relevant experts on something like the above and explains their reasoning publicly, but I doubt this is what you mean, and I am not sure this is a project worth doing.
This seems much too strong. Sure, “successfully avert human extinction” doesn’t work as a feedback loop, but projects have earlier steps. And the areas in which I expect technical work on existential risk reduction to be most successful are ones where those loops are solid, and are well connected to reducing risk.
For example, I work in biosecurity at the NAO, with an overall goal of something like "identify future pandemics earlier". Some concrete questions that would give good feedback loops (a toy sketch of the first one follows after the list):
If some fraction of people have a given virus, how much do we expect to see in various kinds of sequencing data?
How well can we identify existing pathogens in sequencing data?
Can we identify novel pathogens? If we don't use pathogen-specific data, can we successfully re-identify known pathogens?
What are the best methods for preparing samples for sequencing to get a high concentration of human viruses relative to other things?
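To make that first question concrete, here is a deliberately toy back-of-envelope (not the NAO's actual methodology; every number is made up):

```python
# Toy sketch: expected abundance of a given virus in pooled metagenomic
# sequencing data, as a function of prevalence. All numbers are placeholders;
# this is not the NAO's actual model.
prevalence = 0.01            # fraction of people currently infected
viral_read_fraction = 1e-4   # made-up fraction of an infected person's
                             # sequenced material coming from this virus
total_reads = 1e9            # reads in a hypothetical sequencing run

relative_abundance = prevalence * viral_read_fraction
expected_viral_reads = relative_abundance * total_reads

print(f"relative abundance ~ {relative_abundance:.0e}")       # ~ 1e-06
print(f"expected viral reads ~ {expected_viral_reads:,.0f}")  # ~ 1,000
```

Even a toy like this shows why the feedback loop is solid: predictions of this form can be checked directly against what shows up in real sequencing data.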
Similarly, consider the kinds of questions Max discusses in his recent far-UVC post.
Thanks for the comment Jeff! I admit that I didn't consciously have biosecurity in mind, where I think you perhaps have an unusually clear paradigm compared to other longtermist work (eg. AI alignment/governance, space governance, etc.), and my statement was likely too strong besides.
However, I think there is a clear difference between what you describe and the types of feedback in eg. global health. In your case, you are acting with multiple layers of proxies for what you care about, which is very different to measuring the number of lives saved by AMF for example. I am not denying that this gives you some indication of the progress you are making, but it does become very difficult to precisely evaluate the impact of the work and make comparisons.
To establish a relationship between “How well can we identify existing pathogens in sequencing data?”, identifying future pandemics earlier, and reducing catastrophic/existential risk from pandemics, you have to make a significant number of assumptions/guesses which are far more difficult to get feedback on. To give a few examples:
- How likely is the next catastrophic pandemic to be from an existing pathogen?
- How likely is it that marginal improvements to the identification process are going to counterfactually identify a catastrophic threat?
- For the set of pathogens that could cause an existential/catastrophic threat, how much does early identification reduce the risk by?
- How much is this risk reduction in absolute terms? (Or a different angle, assuming you have an answer to the previous question: What are the chances of an existential/catastrophic pandemic this century?)
These are the types of question that you need to address to actually draw a line to anything that cashes out to a number, and my uninformed guess is that there is substantial disagreement about the answers. So while you may get clear feedback on a particular sub question, it is very difficult to get feedback on how much this is actually pushing on the thing you care about. So while perhaps you can compare projects within a narrow subfield (eg. improving identification of existing pathogens), it is easy to then lose track of the bigger picture which is what really matters.
To be clear, I am not at all saying that this doesn't make the work worth doing; it just makes me pessimistic about the utility of attempting to make precise quantifications.
I have made early steps towards this. So far funder interest has been a blocker, although perhaps that doesn’t say much about the value of the idea in general.
I suggest that charity evaluators generally work best, inter alia, when: (1) the charities to be evaluated are somewhat established such as to allow evaluation of a track record; (2) the charities are offering a “stable” enough intervention that a review should largely remain valid for at least a few years; (3) a field isn’t moving/evolving so quickly that prior reviews will age out quickly; and (4) there’s enough room for more funding in the top-evaluated orgs that would counterfactually go unfilled to make the process worthwhile. All of those conditions seem much less present in longtermism.
My sense is that at least the top 20% or so of longtermist orgs are very likely to be funded anyway. In contrast, global health or animal orgs can almost always use more money at only a slight reduction in marginal cost-effectiveness. Finally, it's probably relevant that those orgs are competing for resources from non-EA donors with mainstream NGOs in a way that longtermist EA orgs generally aren't.
We've been doing a fair amount of work in this direction at Founders Pledge in "epistemically mildly" longtermist areas, such as climate or nuclear risk: areas that are clearly much more uncertain than RCT-backed global health, but probably a fair amount less cluelessness-riddled than the most intractable longtermist interventions, where there is little agreement on the sign of interventions.
We describe some of the ideas here (and, initially, in our Changing Landscape report), my colleague Christian Ruhl has just published a report using some of those ideas to evaluate nuclear risk interventions, and I expect we'll do a fair bit more of this work.
It is my impression that there is a bunch of low-hanging fruit here, and that the methodology for prioritizing in high-uncertainty contexts could be a lot more developed than it is.
A lot of this is early-stage and pre-quantification (and the quantified stuff is not ready for publication yet), but it is something we are thinking about a lot (though, as per Jeff, we are quite small and, as per Nuno, it might take seven years for something like this to become good!).
(Sorry for a post mostly containing links to FP work, but they seem relevant to the discussion here).
I’ll +1 everything Johannes has already said, and add that several people (including myself) have been chewing over the “how to rate longtermist projects” question for quite some time. I’m unsure when we will post something publicly, but I hope it won’t be too far in the future.
If anyone is curious for details feel free to reach out!
GiveWell and ACE (aspirationally):
provide highly quantified results
have relatively small confidence intervals
have high transparency and generally justify their beliefs.
And even then, they’re not doing great on these on an absolute scale. Estimates for a given x-risk intervention are going to be many orders of magnitude harder to quantify than bednets, and interventions that are longtermist but not x-risk-related harder still.
I have no inside information, but I think a major reason OpenPhil doesn’t want small donations is that they don’t want to have to justify their investments, when legible justification is basically impossible.
Grantmakers could probably provide more information about their grants than they currently do; there are several posts on this forum explaining why they don’t.
I'd be pretty excited to see a new platform for retail donors giving to x-risk charities. For this, you'd want some x-risk opportunities that are highly scalable (can absorb ≥ $10m p.a. and will execute the project over years reliably without intervention or outside pressure), measurable (you can write out a legible, robust, well-quantified theory of change from marginal dollars to x-risk), and that have a pretty smooth returns curve (so people can have decent confidence that their donations have the returns they expect, whether they are retail donors or large donors). And then you could build out cost-effectiveness models given different assumptions about values (e.g. time preference, population ethics) and extinction risk models that people might have, and offer retail donors a few different ways of thinking about and modeling the impacts of their donations (e.g. lives saved, micro- or picodooms).
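As a minimal sketch of what such a donor-facing model might output, using a hypothetical cost per microdoom and two illustrative worldview assumptions (all placeholders):

```python
# Sketch of converting a donation into microdooms averted and, under different
# worldview assumptions, expected lives saved. All numbers are placeholders.
donation = 10_000                 # USD
cost_per_microdoom = 1_000_000    # USD to avert 1e-6 of extinction risk (made up)

microdooms_averted = donation / cost_per_microdoom

# Worldview-dependent conversion: how many lives are at stake in "one doom"?
worldviews = {
    "current generation only": 8e9,    # roughly people alive today
    "next ten generations":    8e10,   # crude placeholder
}

for name, lives_at_stake in worldviews.items():
    expected_lives = microdooms_averted * 1e-6 * lives_at_stake
    print(f"{name}: {microdooms_averted:.2f} microdooms ~ {expected_lives:,.0f} expected lives saved")
```

The point is just that the value assumptions can be swapped transparently, so donors can see how the bottom line depends on them.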
I’d guess there are some bio interventions that fit these criteria. For AI safety there could be a crowd-funded compute cluster for safety research. This could be pretty boring and difficult to model robustly unless there was good reporting on the wins that came out of research using the cluster and further iteration on the model based on the track record.
Founders Pledge evaluates charities in a range of fields including global catastrophic risk reduction. For example, they recommend NTI’s biosecurity work, and this is why GWWC marks NTI’s bio program as a top charity. Founders Pledge is not as mature as GiveWell, they don’t have the same research depth, and they’re covering a very broad range of fields with a limited staff, but this is some work in that direction.
From a brief glance, it does appear that Founders Pledge’s work is far more analogous to typical longtermist EA grantmaking than Givewell. Ie. it relies primarily on heuristics like organiser track record and higher-level reasoning about plans.
I think this is mostly correct, with the caveat that we don’t exclusively rely on qualitative factors and subjective judgement alone. The way I’d frame it is more as a spectrum between
[Heuristics] <------> [GiveWell-style cost-effectiveness modelling]
I think I'd place FP's longtermist evaluation methodology somewhere between those two poles, with flexibility based on what's feasible in each cause.
Hear hear to asking the obvious questions! Still so much low hanging fruit in longtermism / x-risk.
First instinctive, intuitive reaction: because it is not so easy, not so obvious, how to measure, evaluate, and quantify.
I actually posted about this a few days ago (https://forum.effectivealtruism.org/posts/xNyd8SuTzsScXc7KB/measuring-impact-ea-bias-towards-numbers), where I made a hypothesis (based on my own observations and face-to-face conversations) that there is a bias towards easily quantifiable projects.
Because the question is impossible to answer.
First, by definition, we have no actual evidence about outcomes in the long-term future—it is not as if we can run RCTs where we run Earth 1 for the next 1,000 years with one intervention and Earth 2 with a different intervention. Second, even where experts stand behind short-term treatments and swear that they can observe the outcomes happening right in front of them (everything from psychology to education to medicine), there are many cases where the experts are wrong—even many cases where we do harm while thinking we do good (see Prasad and Cifu’s book Medical Reversals).
Given the lack of evidentiary feedback as well as any solid basis for considering people to be “experts” in the first place, there is a high likelihood that anything we think benefits the long-term future might do nothing or actually make things worse.
The main way to justify long-termist work (especially on AGI) is to claim that there’s a risk of everyone dying (leading to astronomically huge costs), and then claim that there’s a non-zero positive probability of affecting that outcome. There will never be any evidentiary confirmation of either claim, but you can justify any grant to anyone for anything by adjusting the estimated probabilities as needed.
Is your last point meant to be AGI specific or not? I feel like it would be relatively easy to get non-zero evidence that there was a risk of everyone dying from a full nuclear exchange: you’d just need some really good modelling of the atmospheric effects that suggested a sufficiently bad nuclear winter, where the assumptions of the model themselves were ultimately traceable to good empirical evidence. Similarly for climate change being an X-risk. Sure, even good modelling can be wrong, but unless you reject climate modelling entirely, and are totally agnostic about what will happen to world temperature by 2100, I don’t see how there could be an in-principle barrier here. I’m not saying we in fact have evidence that there is a significant X-risk from nuclear war or climate change, just that we could; nothing about “the future is hard to predict” precludes it.
I generally agree, but I think that we are nowhere near being able to say, “The risk of future climate catastrophe was previously 29.5 percent, but thanks to my organization’s work, that risk has been reduced to 29.4 percent, thus justifying the money spent.” The whole idea of making grants on such a slender basis of unprovable speculation is radically different from the traditional EA approach of demanding multiple RCTs. Might be a great idea, but still a totally different thing. Shouldn’t even be mentioned in the same breath.
There are probably good proxies for climate effects though, i.e. reductions in more measurable stuff, so I think the situation is not that analogous to AI. And some global health and development stuff involves things where the outcome we actually care about is hard to measure: i.e. deworming and its possible positive effects on later earnings, and presumably well-being. We know deworming gets rid of worms, but the literature on the benefits of this is famously contentious.
Although we could potentially derive probabilities of various sorts of nuclear incidents causing extinction, the probabilities of those events occurring in the first place are in the end guesswork. By definition, there can be no "evidentiary confirmation" of the guesswork, because once the event occurs, there is no one around to confirm it happened. Thus, the probabilities of event occurrence could be well-informed guesswork, but would still be guesswork.