The Top AI Safety Bets for 2023: GiveWiki’s Latest Recommendations
Summary: The AI Safety GiveWiki (formerly Impact Markets) has completed its third round of retroactive impact evaluations – just in time to provide updated recommendations for the giving season! Here is a reminder of how the platform works. Want to donate? Open up the page of our top project(s), double-check that they are still fundraising, and ka-ching! Interested in regranting? Check out our post on the (now) $700,000 that is waiting to be allocated.
Top Projects
Our top projects stand out by virtue of their high support scores. There are a lot of ties between these top projects, so we’ve categorized them into tiers.
Note that we determine the top projects according to their support scores. Further down we’ll cover how our latest evaluation round worked out, but the support scores are two hops removed from those results: (1) projects receive support in the form of donations, weighted by donation size, earliness, and the score of the donor; (2) donors get their scores as a function of the size and earliness of their donations and the credits of the projects they supported; (3) projects receive their credits from our evaluators:
Project credits → donor scores → project support.
This mimics the price discovery process of a for-profit impact market. Hence it’s also likely that the scores are slightly different by the time you read this article because someone may have entered fresh donation data into the platform.
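To make the two-hop structure concrete, here is a minimal sketch of how such a propagation could be computed. The donation records, the weighting of size and earliness, and the omission of any further normalization are illustrative assumptions, not GiveWiki’s production algorithm:

```python
# Illustrative sketch of the two-hop propagation:
# project credits -> donor scores -> project support.
# The donation data and the weighting formula are made up;
# the real platform presumably also normalizes these values.

from collections import defaultdict

# Hypothetical donation records: (donor, project, amount, earliness in [0, 1]).
donations = [
    ("alice", "FAR AI", 1000, 0.9),  # early, sizeable donation
    ("bob",   "FAR AI",  200, 0.3),
    ("alice", "AI Safety Support", 500, 0.6),
]

# Step 3: credits assigned to projects by the evaluators.
project_credits = {"FAR AI": 1768, "AI Safety Support": 695}

def weight(amount, earliness):
    """Toy weighting: bigger and earlier donations count more."""
    return amount * (1 + earliness)

# Step 2: donor scores derive from the size and earliness of their donations
# and the credits of the projects they supported.
donor_scores = defaultdict(float)
for donor, project, amount, earliness in donations:
    donor_scores[donor] += weight(amount, earliness) * project_credits[project]

# Step 1: project support derives from donations, weighted by the donor's score.
project_support = defaultdict(float)
for donor, project, amount, earliness in donations:
    project_support[project] += weight(amount, earliness) * donor_scores[donor]

print(dict(donor_scores))
print(dict(project_support))
```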
We have tried to find and reach out to every notable AI safety project, but some may yet be missing from our list – perhaps because (1) they haven’t heard of us after all, (2) they’re not fundraising from the public, or (3) they prefer to keep a low profile. But at the time of writing, we have 106 projects on the platform that are publicly visible and fundraising.
Ties for Tier 1
These are the projects at the very top: FAR AI and the Simon Institute, each with a support score of 213 at the time of writing.
Ties for Tier 2
They all have a support score of 212. Such small differences in support are probably quite uninformative. New data or tweaks to our algorithm could easily change their rank.
Ties for Tier 3
Other projects with > 200 support
Note that, while we now market the platform to the AI safety community, really any project can use it, and some may even fare well! We may introduce other specialized GiveWikis in the future.
If you’re just here for the results then this is where you can stop reading.
Evaluation Process
Preliminaries
For this evaluation round, we recruited Charbel-Raphael Segerie, Dima Krasheninnikov, Gurkenglas, Imma Six, Konrad Seifert, Linda Linsefors, Magdalena Wache, Mikhail Samin, Plex, and Steven Kaas as evaluators. Matt Brooks, Frankie Parise, and I may also have pitched in. Some of them ended up not having time for the evaluation, but since some of our communication was under the Chatham House Rule, I’m listing them all anyway for added anonymity.
Our detailed instructions included provisions for how to score project outputs according to quality and impact; how to avoid anchoring on other evaluators; how to select artifacts to strike a compromise between comprehensiveness, redundancy, and time investment; how to evaluate projects using wiki credits; and some tips and arrangements.
Outputs are things like the papers or hackathons that organizations put out. Organizations can create one project per output on our platform, or one project for the whole organization. Conferences cannot be directly evaluated after the fact, so what our evaluators considered were artifacts, such as recordings or attendance statistics. This distinction matters less for papers.
The projects were selected from among those that had signed up to our website (though in some cases I had helped out with that), limited to those with smaller annual budgets (in the five or lower six digits, by rough estimates) and those that were accepting donations. The set of outputs was in most cases limited to those from 2023 to keep them relevant to the current work of the project, if any. We made a few exceptions where there were too few outputs from 2023 but older, representative outputs existed.
We hadn’t run an evaluation round at this scale before. Previously there were just three of us, and we could simply have a call to sync up. This time everything needed to be more parallelizable.
Hence we followed a two-pronged approach with (1) evaluations of individual outputs using scores, and (2) evaluations of the AI safety activities of whole projects using our wiki credits. If one kind of evaluation fell short, we had another to fall back on.
Lessons Learned
Fast-forward four fortnights, and it turned out that there were too many outputs and too few evaluators: only two outputs had been evaluated more than twice (and 10 had been evaluated more than once). By this metric, AI Safety Support and AI Safety Events did very well, leaving the other projects in the dust by a wide margin – but those numbers were carried by the scores of just one or two evaluators, so they’re most likely in large part due to the Optimizer’s Curse.
Hence we decided not to rely on this scoring for our evaluation and instead fall back on the credits. But the evaluations came with insightful comments that are still worth sharing.
Next time we’ll use credits only and, at most, list some outputs to help evaluators who are unfamiliar with a project’s work get an idea of its most important contributions.
Wiki Credits Ranking
These are the normalized average credits that our evaluators have assigned to the projects. As mentioned above, these determine how richly donors to these projects get rewarded in terms of their donor scores, which then determine the project support: Project credits → donor scores → project support.
| Rank | Project | Credits |
|------|---------|---------|
| 1 | FAR AI | 1768 |
| 2 | AI Safety Events | 1457 |
| 3 | Centre For Enabling EA Learning & Research | 842 |
| 4 | AI Safety Support | 695 |
| 5 | Center for the Study of Existential Risk | 607 |
| 6 | Rational Animations | 601 |
| 7 | Campaign for AI Safety | 579 |
| 8 | AI X-risk Research Podcast | 566 |
| 9 | Simon Institute for Longterm Governance | 490 |
| 10 | Pour Demain | 481 |
| 11 | Alignment Jam | 476 |
| 12 | The Inside View | 466 |
| 13 | EffiSciences | 415 |
| 14 | Center for Reducing Suffering | 397 |
| 15 | Modeling Cooperation | 233 |
| 16 | QACI | 158 |
| 17 | AI Objectives Institute | 151 |
| 18 | Virtual AI Safety Unconference | 142 |
| 19 | Alignment Plans | 131 |
| 20 | AI Safety Ideas | 105 |
| 21 | Global Catastrophic Risk Institute | 96 |
Qualitative Results
AI Safety Events
AI Safety Unconference at NeurIPS 2022: One of the evaluators attended it and found it high value for networking, but (empirically) only for networking within the AI safety community, not for recruiting new people to the space.
ML Safety Social at NeurIPS 2022: One evaluator estimated, based on this modeling effort, that the social was about 300 times as impactful as the reference output (“AI Takeover Does Not Mean What You Think It Means”). The estimate was even higher for the safety unconference at the same conference.
Hence AI Safety Events had generally very high ratings. It is not listed among our top recommendations because we don’t have enough donation data on it. If you have supported AI Safety Events in the past, please register your donations! You may well move a good chunk of the (now) $700,000 that donors seek to allocate!
The Inside View
AI Takeover Does Not Mean What You Think It Means: This was our calibration output – it allowed me to understand how each evaluator was using the scale and to scale their values up or down accordingly. The evaluators who commented on the video were generally happy with its production quality. Some were confused by the title (Paul’s models are probably well known among them) but found it sad that it had so few views. The main benefit over the blog post is probably reaching more people, which hasn’t succeeded to any great degree. Maybe we need an EA/AIS marketing agency? I’m also wondering whether it could’ve benefited from a call to action at the end.
AI X-risk Research Podcast
Superalignment with Jan Leike: This interview was popular among evaluators, perhaps because they had largely already watched it. Some were cautious about scoring it too highly simply because it hadn’t reached enough people yet. But in terms of content it was well regarded: “The episodes are high-quality in the sense that Daniel asks really good questions which make the podcast overall really informative. I think the particular one with Jan Leike is especially high-impact because Superalignment is such a big player, in some sense it’s the biggest alignment effort in the world.” (The episodes with Scott Aaronson and Vanessa Kosoy received lower impact scores but no comments.)
AI Safety Ideas
The website: “Seems potentially like a lot of value per connection.” The worries were that it might not be sufficiently widely known or used: “I think the idea is really cool, but I haven’t heard of anyone who worked on an idea which they found there.” And does it add much value at the current margin? “I couldn’t find a project on the site which was successful and couldn’t be attributed to the alignment jams. However, if there were some successful projects then it’s a decent impact. And I suspect there were at least some, otherwise Esben wouldn’t have worked on the site.” The evaluators didn’t have the time to disentangle whether people who participated in any Alignment Jams got some of their ideas from AI Safety Ideas or vice versa. All in all the impact scores were on par with the Jan Leike interview.
Orthogonal
Formalizing the QACI alignment formal-goal: Among Orthogonal’s outputs, this one scored highest on both quality and impact (with impact scores in between those of the three AXRP interviews above). Its quality scores were lower than its impact scores because the evaluator found it very hard to read (while noting that it’s also just really hard to create a formal framework for outer alignment). The evaluator thinks that the whole QACI idea is very unlikely to work but highly impactful if it does. The other evaluated outputs were less notable.
Center for Reducing Suffering
Documentary about Dystopian Futures | S-risks and Longtermism: One evaluator gave the documentary a lower quality score than the reference output (“AI Takeover Does Not Mean What You Think It Means”) but noted that it “represents longtermism decently and gives an OK definition for s-risk.” They were confused, though, about why it was published on a channel with seemingly largely unrelated content (since the context of the channel will color how people see s-risks), and concerned that talking about s-risks publicly can easily be net negative if done wrong.
Avoiding the Worst—Audiobook: The audiobook got the highest impact rating among the CRS outputs, even though an evaluator noted that they only counted what it adds over the book (it is another way to access the same content), which isn’t much in comparison. (The book itself was outside of our evaluation window, having been published in 2022.)
FAR AI
Pretraining Language Models with Human Preferences: One evaluator was excited about this paper in and of itself but worried that it might be a minor contribution on the margin compared to what labs like OpenAI, DeepMind, and Anthropic might’ve published anyway. They mention Constitutional AI as a similar research direction.
Training Language Models with Language Feedback at Scale: While this one scored slightly lower quantitatively, the qualitative review was the same.
Improving Code Generation by Training with Natural Language Feedback: One evaluator was concerned about the converse in this case – that is, that the paper might’ve contributed to capabilities and hence had a negative impact.
Centre For Enabling EA Learning & Research (EA Hotel)
In general: “CEEALAR doesn’t have particularly impressive direct outputs, but I think the indirect outputs which are hard to measure are really good.” Or “the existence of CEEALAR makes me somewhat more productive in my everyday work, because it is kind of stress-reducing to know that there is a backup option for a place to live in case I don’t find a job.”
AI Safety Support
AI Alignment Slack: Invaluable for information distribution. One evaluator mentioned the numerous times that they found out about opportunities through this Slack.
Lots of Links page: “The best collection of resources we currently have,” but with a big difference between the quality and the impact score: “It could be better organized and more up to date (even at a time when it was still maintained).”
Epilogue
Want to regrant some of the (now) $700,000 aggregate donation budget of our users? Please register your donations! The GiveWiki depends on your data.
You’re already a grantmaker or regrantor for a fund? Use the GiveWiki to accept and filter your applications. You will have more time to focus on the top applications, and the applicants won’t have to write yet another separate application.
We’re always happy to have a call or answer your questions in the comments or by email.
I feel very confused as to how I should update based on these results, how the ranking was made, and what the ranking means.
A few concrete questions
are credits tracking cost-effectiveness on the current margin or something else?
is a project with 2 credits twice as [cost-effective] as a project with 1 credit?
Is the qualitative summary a pretty good depiction of the main points that contributed to that project’s scores?
E.g.
First, it could make sense not to focus too much on the credits. The ranking has to bottom out somewhere, and that’s where the credits come in: they establish a track record for our donors. The ranking itself is better thought of as the level of endorsement of a project, weighted by the track record of the endorsing donors.
We’re still thinking about how we want to introduce funding goals and thus some approximation of short-term marginal utility. At the moment all projects discount donations at the same rate. Ideally we’d be able to use something like the S-Process to generate marginal utility curves that discount the score “payout” that donors can get. I’ve experimented with funding goals of around $100k per project, with 10x sharper discounts beyond that, but it didn’t make enough of a difference to justify the increased complexity and assumptions. Maybe we’ll revive that feature as a configurable funding goal at some point.

But there is also the fundamental problem that we don’t have access to complete lists of donations, so less popular, less well-maintained projects would seemingly have higher marginal utility just because their donation records are more incomplete. That would be an annoying incentive to introduce. Those problems, paired with the minor, unconvincing results of my experiments, have caused me not to prioritize this yet.
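For illustration, here is a sketch of what such a discount schedule could look like. The $100k goal and the 10x factor are the values from my experiment; the exponential functional form and the base rate are made-up assumptions:

```python
# Sketch of a piecewise discount on the score "payout" a donor gets:
# a base discount rate up to the funding goal, and a 10x sharper rate
# beyond it. Only the $100k goal and the 10x factor come from the
# experiment described above; the rest is illustrative.

import math

FUNDING_GOAL = 100_000   # per-project goal used in the experiment
BASE_RATE = 1e-6         # hypothetical base discount rate per dollar
SHARP_FACTOR = 10        # discounts get 10x sharper past the goal

def marginal_payout(total_raised_so_far: float) -> float:
    """Score payout per marginal dollar, as a fraction of the undiscounted payout."""
    if total_raised_so_far <= FUNDING_GOAL:
        return math.exp(-BASE_RATE * total_raised_so_far)
    over = total_raised_so_far - FUNDING_GOAL
    return math.exp(-BASE_RATE * FUNDING_GOAL) * math.exp(-SHARP_FACTOR * BASE_RATE * over)

for raised in (0, 50_000, 100_000, 150_000, 200_000):
    print(raised, round(marginal_payout(raised), 4))
```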
But when it comes to the credits, the instructions to the evaluators are probably a good guide:
“Imagine that you’re given a budget of 1,000 impact credits to allocate across projects (not artifacts). (a) Please allocate them to the projects in proportion to how impactful you think they were. (b) There’s no transaction cost and no change in marginal utility (the first credit is worth the same as the last to a project). (c) We’ll … average your scores, multiply the averages with the number of evaluators, normalize, and then allocate that product to minimize the impact of recusals.”
And further down: “Note that this is a purely retroactive evaluation. (a) You can ignore the tractability of producing an output since they’ve all been produced. (b) Likewise please ignore the cost at which the output was produced. (c) Do consider neglectedness, though, and consider how likely some equivalent output would’ve been produced anyway had it not been for the given project. (d) Consider the ex ante expected utility. A bullshit project mustn’t get a high score because it somehow got unpredictably lucky. (Fictional examples.)”
So, like everything in our evaluation, the credits are retroactive too; they are not about the current margin. One reason to ignore costs is that we don’t have the data, though we might request or estimate it next time around. But the other reason is that the donors to overly expensive projects have already been “punished” for their suboptimal investment through the opportunity cost that they’ve paid. Intuitively it seems to me like it would be double-counting to also reduce the credits that they receive.
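For concreteness, here is a minimal sketch of the aggregation described in the first quoted instruction – averaging over the evaluators who scored a project, scaling by the total number of evaluators so that recusals don’t shrink a project’s total, and normalizing. The example allocations are made up, and the details are my reading of the quote rather than a spec:

```python
# Sketch of the credit aggregation: average each project's credits over
# the evaluators who scored it, multiply by the total number of evaluators,
# then normalize to the combined 1,000-credit-per-evaluator budget.

evaluator_allocations = {
    # hypothetical allocations out of each evaluator's 1,000-credit budget;
    # a missing entry means the evaluator recused themselves or skipped the project
    "eval_1": {"Project A": 600, "Project B": 400},
    "eval_2": {"Project A": 700, "Project C": 300},
    "eval_3": {"Project B": 500, "Project C": 500},
}

projects = {p for alloc in evaluator_allocations.values() for p in alloc}
num_evaluators = len(evaluator_allocations)

raw = {}
for project in sorted(projects):
    scores = [alloc[project] for alloc in evaluator_allocations.values() if project in alloc]
    # average over the evaluators who scored it, scaled up by the total evaluator count
    raw[project] = (sum(scores) / len(scores)) * num_evaluators

# Normalize so the credits sum to the combined budget.
total_budget = 1_000 * num_evaluators
scale = total_budget / sum(raw.values())
credits = {project: round(value * scale) for project, value in raw.items()}

print(credits)
```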
So is it reasonable to interpret your process as saying FAR was similarly impactful to AI safety events over the last year?
AI Safety Events is one of the projects where we expanded the time window because they were on hiatus in early 2023; the events that got evaluated were from 2022. Otherwise yes. (But just to be clear, this is about the retroactive evaluation results mentioned at the bottom of the post.)