This payout report covers the Long-Term Future Fund's grantmaking from January 2022 (following our December 2021 payout report) through April 2023 (1 January 2022 to 30 April 2023).
Report authors: Asya Bergal (chair), Linchuan Zhang, Oliver Habryka, Caleb Parikh, Thomas Larsen, Matthew Graves
52 of our grantees, with grants totalling $1.41M, requested that we not include public reports for their grants. (You can read our policy on public reporting here.) We referred 2 grants ($0.501M) to other funders for evaluation. Our median response time over this period was 29 days.
The rest of our grants are listed below (either in long or short form), as well as in our public grants database.
If you’re interested in receiving funding from the Long-Term Future Fund, apply here.
(Note: The initial sections of this post were written by me, Asya Bergal.)
Other updates
We've had a substantial increase in applications since 2021: we averaged 35 applications per month in the latter half of 2021, 69 per month in 2022, and 90 per month so far in 2023.
Our funding bar went up at the end of 2022, in response to a decrease in the overall funding available to long-term-future-focused projects. If we assume our numerical ratings are consistent over time, applying our new bar retroactively would have meant not funding 28% of our earlier 2022 grants.
We’re looking for more funding. We’ve spent an average of ~$1M per month across March, April, and May 2023 to maintain our current bar, have $992,870.53 in reserves as of July 3, and are ideally looking to fundraise at least $10M for the coming year.
As described in this post, we’re trying to increase our independence from Open Philanthropy, which provided ~45% of our funding in 2022. As a transitional measure, over the next 6 months, Open Philanthropy will be matching funding given to the Long-Term Future Fund by small donors 2:1, for up to $3.5M total, making now a particularly good time to donate. Donate here. (The Long-Term Future Fund is part of EA Funds, which is a fiscally sponsored project of Effective Ventures Foundation (UK) (EV UK) and Effective Ventures Foundation USA Inc. (EV US). Donations to the Long-Term Future Fund are donations to EV US or EV UK.)
As a temporary measure in response to uncertainty about our future funding levels, we’ve put the bottom ~40% of grants above our current funding bar on hold. I think we’ll make several of those grants after this round of fundraising is over, but I generally expect our funding bar to vary more over time and to depend more on individual donations than it has historically.
I will be stepping down as chair of the fund by the end of October (and potentially earlier); I've written some reflections on my time on the fund here. We're looking for additional fund managers (including potential chair candidates); express interest here.
The fund’s current fund managers are me (Asya Bergal), Linchuan Zhang, Oliver Habryka, and Caleb Parikh as permanent fund managers, and Thomas Larsen, Daniel Eth, Matthew Gray, Lauro Langosco, and Clara Collier as guest managers.
Our legal team asked us to highlight the eligibility criteria for our grants, which you can find in the appendices.
Highlights
Our grants include:
$316,000 in June 2022 to support SERI MATS, an 8-week scholars program that pairs promising alignment researchers with mentors in the alignment field.
$200,000 in February 2022 to support Stephen Grugett, James Grugett, and Austin Chen for 4 months to build a forecasting platform (Manifold Markets) based on user-created play-money prediction markets.
Payout reports
Longer grant write-ups
Grants evaluated by Linchuan Zhang
Stephen Grugett, James Grugett, Austin Chen ($200,000): 4-month stipend for 3 FTE to build a public forecasting platform based on user-created play-money prediction markets
March 2022 Notes by Linch Zhang: This was my first substantive grant investigation. At the time, I felt shaky about it, but now I feel really good about it. The two main reasons I originally recommended this grant:
1. It was an investment into people who wanted to do EA work – getting 3 ~Google-quality engineers to do more EA/longtermist work (as opposed to counterfactuals like earning to give or worse) seems well worth it at $200k.
2. It was an investment into the team specifically. Having cohesive software teams seems like an important component of EA becoming formidable in the future, and such teams are somewhat (surprisingly to me) missing in EA, especially outside of AI safety and crypto trading. I heard really good things about Manifold from early users, and they appeared to be developing at a speed that blew other forecasting software projects (Metaculus, Foretold, Cultivate, Hypermind, etc.) out of the water.
At the time, it was not an investment into the prediction market itself, or into a theory of change for play-money prediction markets broadly, because the above two factors were sufficient on their own to be decisive.
At the time, it was also unclear whether they planned to go the for-profit route or the nonprofit route.
They've since decided to go for-profit.
Looking back, it's still too soon to be sure, but Manifold looks to be going quite well. They continue to develop features at phenomenal speed, lots of EAs and others in adjacent communities use the product, and the team is still shipping quickly and excited for the future.
From an “investment into team” perspective, I think Manifold now plausibly has the strongest software team in EA outside of AI safety and earning-to-give (not that I’d necessarily have enough visibility to know of all the potentially better teams, especially stealth ones).
I have a number of disjunctive theories of change (ToCs) for how Manifold (and forecasting in general) can make the future better over time, some of which are implicitly covered here.
That said, I am still uncertain whether this particular project is the best use of the cofounders' and team's time: a lot of the evidence I have observed is more an update on the team's overall skill and cohesiveness than on their comparative advantage for prediction markets specifically.
Addendum June 2023:
I've grown more confused about the total impact or value of this grant. On the one hand, I think Manifold is performing at or moderately above expectations in terms of having a cohesive team that executes quickly, and many people in the community appear to find their product useful or at least interesting. On the other hand, a) the zero-interest-rate environment and correspondingly high startup valuations that prevailed when I recommended this grant ended in early 2022, and b) recent events have substantially reduced EA funding, which means $200K is arguably much more costly now than a year ago.
Still, I think I'm broadly glad to have Manifold in our ecosystem. I think they're very helpful for people in our and adjacent communities in training epistemics, and I'm excited to see them branch out into experiments in regranting and retroactive funding; from a first-principles perspective, it'd be quite surprising if the current state of EA grantmaking were close to optimal.
Solomon Sia ($71,000): 6-month stipend for providing consultation and recommendations on changes to the US regulatory environment for prediction markets.
Solomon Sia wants to talk to a range of advisers, including industry experts, users, and contacts at the CFTC, to identify good ways to improve the regulation of prediction markets in the US while protecting users and reducing regulatory risk and friction.
This was an exploratory grant to see how the US regulatory environment for prediction markets might be improved, with a resulting written report provided to EA Funds.
I think this is a reasonable (possibly great) option to explore:
I think my position on prediction markets is somewhat more cynical than that of most EAs in the forecasting space, but still, I’m broadly in favor of them and think they can be a critical epistemic intervention, both for uncovering new information and for legibility/common knowledge reasons.
It seemed quite plausible to me that the uncertain regulatory environment for prediction markets in the US is impeding the growth of large real-money prediction markets on questions that matter.
Solomon seemed unusually competent and knowledgeable about the tech regulations space, a skillset very few EAs have.
Cultivating this skillset and having him think about EA issues seemed valuable.
A potential new caveat: in 2023, as AI risk worries heat up, it seems increasingly likely that we can draw this skillset from a diverse pool of experienced and newly interested/worried people.
The for-profit motivations for this work exist but are not very large: unless a company is trying very hard to achieve regulatory capture for its own benefit (which is bad and also practically very difficult), easing prediction market regulations has collective benefits and individual costs.
(weakly held) I thought trying to nail this during the Biden administration was good, because it seemed plausible that the current CFTC would be more predisposed to liking prediction markets than the average CFTC.
One interesting update is that EA connections were likely a mild plus in 2022, and a moderate liability in 2023.
NB: Solomon and his collaborator think a) that the EA connection is still a mild to moderate positive, and b) that it's now unclear whether the Biden administration is better or worse than a counterfactual Republican administration.
I've thought about this grant some more since, and even with the benefit of hindsight, I'm still a bit confused about how happy I should be about it ex-post.
One thing is that I’ve grown a bit more confused about the output and tractability of interventions in this domain.
The successes(?) Kalshi has had confuse me, and I haven't had enough time to integrate this into my worldview.
My current impression is that the CFTC is fairly open to informed opinions from others on this matter.
I continue to believe it’s a good grant ex-ante.
Grants evaluated by Oliver Habryka
Alexander Turner ($220,000): Year-long stipend for shard theory and RL mechanistic interpretability research
This grant has been approved but has not been paid out at the time of writing.
We’ve made grants to Alex to pursue AI Alignment research before:
2020: Understanding when and why proposed AI designs seek power over their environment ($30,000)
2021: Alexander Turner—Formalizing the side effect avoidance problem ($30,000)
2022: Alexander Turner − 12-month stipend supplement for CHAI research fellowship ($31,500)
We also made another grant of $115,411 in 2023 to a team led by Alex Turner for their post on steering vectors (the total includes payments to 5 team members and covers, without limitation, travel expenses, office space, and stipends).
This grant is an additional grant to Alex, this time covering his full-time stipend for a year to do more research in AI Alignment.
Only the first one has a public grant write-up, and the reasoning and motivation behind all of these grants are pretty similar, so I will try to explain the reasoning behind all of them here.
As is frequently the case with grants I evaluate in the space of AI Alignment, I disagree on an inside-view level pretty strongly with the direction of the research that Alex has been pursuing for most of his AI Alignment career. Historically I have been, on my inside-view, pretty unexcited about Alex’s work on formalizing power-seekingness, and also feel not that excited about his work on shard theory. Nevertheless, I think these are probably among the best grants the LTFF has made in recent years.
The basic reasoning here is that despite me not feeling that excited about the research directions Alex keeps choosing, within the direction he has chosen, Alex has done quite high-quality work, and also seems to often have interesting and useful contributions in online discussions and private conversations. I also find his work particularly interesting, since I think that within a broad approach I often expected to be fruitless, Alex has produced more interesting insight than I expected. This in itself has made me more interested in further supporting Alex, since someone producing work that shows that I was at least partially wrong about a research direction being not very promising is more important to incentivize than work whose effects I am pretty certain of.
I would like to go into more detail on how Alex's research has updated me, and why I think it has been high quality, but I sadly don't have the space or time to go into that much depth here. In short, the more recent steering vector work seems like the kind of "obvious thing to try that could maybe help" that I would really like to see the field saturated with, and the work on formalizing power-seeking theorems also seems worth having done, though I do pretty deeply regret its overly academic/formal presentation, which has fairly continuously caused people to overinterpret the strength of its results (something Alex also seems to have regretted, and a pattern I have frequently observed in academic work that was substantially motivated by trying to "legitimize the field").
Another aspect of this grant that I expect to have somewhat wide-ranging consequences is the stipend level we settled on. Some basic principles that led me to suggest this stipend level:
I have been using the anchor of "industry salary minus 30%" as a useful heuristic for setting stipend levels for LTFF grants. The goal of that heuristic was to find a relatively objective standard that would allow grantees to think about stipend expectations on their own without requiring a lot of back and forth, while hitting a middle ground in the incentive landscape: stipends not so low that lots of top talent would just go into industry instead of doing impactful work, but not so high that we create grifter problems, with people asking for LTFF grants because they expect to receive less supervision and can probably get away without much legible progress.
In general I think self-employed salaries should be ~20-40% higher, to account for additional costs like health insurance, payroll taxes, administrative overhead, and other things that an employer often takes care of. (A small worked example of this arithmetic follows this list.)
I have been rethinking stipend policies, as I am sure many people in the EA community have been since the collapse of FTX, and I haven't made up my mind on the right principles here. It does seem like a pretty enormous number of good projects no longer have the funding to operate at their previous stipend levels, and it's plausible to me that we should take the hit, lose out on a bunch of talent, and reduce stipend levels to a substantially lower level to be more capable of handling funding shocks. But I am really uncertain about this, and at least in the space of AI Alignment, I can imagine the recent rise to prominence of AI risk concerns alleviating funding shortfalls (or increasing competition by having more talent flow into the space, which could reduce wages, which would also be great).
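To make the arithmetic above concrete, here is a minimal sketch of the heuristic; the $150,000 industry anchor is a hypothetical figure chosen purely for illustration, not an actual LTFF reference salary:

```python
def suggested_stipend_range(industry_salary: float) -> tuple[float, float]:
    """Apply the "industry salary minus 30%" anchor, then the ~20-40%
    self-employment uplift for costs an employer would normally cover
    (health insurance, payroll taxes, administrative overhead)."""
    base = industry_salary * 0.70          # industry salary minus 30%
    return base * 1.20, base * 1.40        # self-employment adjustment

# Hypothetical example: a $150,000 industry salary anchor
low, high = suggested_stipend_range(150_000)
print(f"${low:,.0f} to ${high:,.0f}")  # $126,000 to $147,000
```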
See the Stipend Appendix below, “How we set grant and stipend amounts”, for more information on EA Funds’ determination of grant and stipend amounts.
Vanessa Kosoy ($100,000): Working on the learning-theoretic AI alignment research agenda
This is a grant to cover half of Vanessa’s stipend for two years (the other half being paid by MIRI). We also made another grant to Vanessa in Q4 2020 for a similar amount.
My model of the quality of Vanessa's work is primarily indirect, as I have engaged relatively little with the central learning-theoretic agenda she has worked on. The work is also quite technically dense, and I haven't found anyone else who could explain it to me in a relatively straightforward way (though I have heard that Daniel Filan's AXRP podcast episode with Vanessa is a better way to get started than previous material; it hadn't been published when I was evaluating this grant).
I did receive a decent number of positive references for Vanessa’s work, and I have seen her make contributions to other conversations online that struck me as indicative of a pretty deep understanding of the AI Alignment problem.
If I had to guess at the effects of this kind of work (though I should clarify that I am substantially deferring to other people here, in a way that makes me not particularly trust my specific predictions), I expect the primary effect to be that the kind of inquiry Vanessa is pursuing highlights important confusions and mistaken assumptions in how we expect machine intelligence to work, which, when resolved, will make researchers better at navigating the very large space of potential alignment approaches. I would broadly put this in the category of "Deconfusion Research".
Vanessa’s research resulted in various public blog posts, which can be found here.
Skyler Crossman ($22,000): Support for Astral Codex Ten Everywhere meetups
Especially since the collapse of FTX, I am quite interested in further diversifying the set of communities that are working on things I think are important to the future. AstralCodexTen and SlateStarCodex meetups seem among the best candidates for creating additional thriving communities with overlapping, but still substantially different norms.
I do currently feel quite confused about what a good relationship between adjacent communities like this and Effective Altruism-labeled funders like the Long Term Future Fund should be. Many of these meetups do not aim to do as much good as possible, or have much of an ambitious aim to affect the long-term future of humanity, and I think pressures in that direction would likely be more harmful than helpful: they would introduce various incentives for deception, and could prevent healthy local communities from forming by creating a misaligned relationship between the organizers (who would be paid by EA institutions to produce as much talent for longtermist priorities as possible) and the members (who are interested in learning cool things about rationality and the world and want to meet other people with similar interests).
Since this is a relatively small grant, I didn't really resolve this confusion, and mostly decided to just go ahead with it. I also talked a bunch with Skyler about this, and currently think we can figure out a good long-term arrangement for how best to distribute funding like this; I expect to think more about it in the coming weeks.
Grants evaluated by Asya Bergal
Any views expressed below are my personal views, and not the views of my employer, Open Philanthropy. (In particular, getting funding from the Long-Term Future Fund should not be read as an indication that the applicant has a greater chance of receiving funding from Open Philanthropy, and not receiving funding from the Long-Term Future Fund [or any risks and reservations noted in the public payout report] should not be read as an indication that the applicant has a smaller chance of receiving funding from Open Philanthropy.)
Alignment Research Center ($54,543): Support for a research & networking event for winners of the Eliciting Latent Knowledge contest
This was funding a research & networking event for the winners of the Eliciting Latent Knowledge contest run in early 2022; the plan for the event was mainly for it to be participant-led, with participants sharing what they were working on and connecting with others, along with professional alignment researchers visiting to share their own work with participants.
I think the case for this grant is pretty straightforward: the winners of this contest are (presumably) selected for being unusually likely to be able to contribute to problems in AI alignment, and retreats, especially those involving interactions with professionals in the space, have a strong track record of getting people more involved with this work.
Daniel Filan ($23,544): Funding to produce 12 more episodes of AXRP, the AI X-risk Research Podcast.
We recommended a grant of $23,544 to pay Daniel Filan for his time making 12 additional episodes of the AI X-risk Research Podcast (AXRP), as well as the costs of hosting, editing, and transcription.
The reasoning behind this grant was similar to the reasoning behind my last grant to AXRP:
I've listened to or read through several episodes of the podcast; I thought Daniel asked good questions and got researchers to talk about interesting parts of their work. I think having researchers talk about their work informally can provide value not provided by papers (and to a lesser extent, not provided by blog posts). In particular:
I’ve personally found that talks by researchers can help me understand their research better than reading their academic papers (e.g. Jared Kaplan’s talk about his scaling laws paper). This effect seems to have also held for at least one listener of Daniel’s podcast.
Informal conversations can expose motivations for the research and relative confidence level in conclusions better than published work.
Daniel also shared some survey data in his grant application about how people rated AXRP compared to other AI alignment resources, though I didn’t look at this closely when making the grant decision, as I already had a reasonably strong prior towards funding.
Grants evaluated by Caleb Parikh
Conjecture ($72,827): Funding for a 2-day workshop to connect alignment researchers from the US and UK with AI researchers and entrepreneurs from Japan.
Conjecture applied for funding to host a two-day AI safety workshop in Japan in collaboration with Araya (a Japanese AI company). They planned to invite around 40 people, with half of the attendees being Japanese AI researchers, and half being alignment researchers from the US and UK. The Japanese researchers were generally senior: leading labs, holding postdoc positions in academia, or holding senior technical positions at tech companies.
To my knowledge, there has been very little AI safety outreach conducted amongst strong academic communities in Asia (e.g. in Japan, Singapore, South Korea …). On the current margin, I am excited about more outreach being done in these countries within ultra-high talent groups. The theory of change for the grant seemed fairly straightforward: encourage talented researchers who are currently working in some area of AI to work on AI safety, and foster collaborations between them and the existing alignment community.
Conjecture shared the invite list with me ahead of the event, and I felt good about the set of alignment researchers invited from the UK and US. I looked into the Japanese researchers briefly, but found it harder to gauge the quality of the invites given my lack of familiarity with the Japanese AI scene. I also trust Conjecture to execute competently on the operations of events like this, as they have assisted other AI safety organisations (such as SERI MATS) with events in the past.
On the other hand, I have had some concerns about Conjecture, and I felt confused about whether this conference gave Conjecture more influence in ways I would be concerned about, given the questionable integrity and judgement of their CEO (see this and this section of a critique of their organisation, though note that I don't necessarily endorse the rest of the post). It was also unclear to me how counterfactual the grant was, and how this traded off against activities that I would be less excited to see Conjecture run. I think this is a general issue with funding projects at organisations with flexible funding: organisations are incentivised to present their most fundable projects (which they are also the most excited about), and then, in cases where the funding request is successful, move funding that they would have spent on this project to other, lower-impact projects. Overall, I modelled making this grant as being about a quarter as cost-effective as it would have been without these considerations (though I don't claim this discount factor is particularly reliable).
Overall, I thought this grant was pretty interesting, and I think that the ex-ante case for it was pretty solid. I haven’t reviewed the outcomes of this grant yet, but I look forward to reviewing and potentially making more grants in this area.
Update: Conjecture kindly directed me towards this retrospective and have informed me that some Japanese attendees of their conference are thinking of creating an alignment org.
SERI MATS program ($316,000): 8-week scholars program to pair promising alignment researchers with renowned mentors. (Originally evaluated by Asya Bergal)
SERI MATS is a program that helps established AI safety researchers find mentees. The program has grown substantially since we first provided funding, and now supports 15 mentors, but at the time, the mentors were Alex Gray, Beth Barnes, Evan Hubinger, John Wentworth, Leo Gao, Mark Xu, and Stuart Armstrong. Mentors took part in the program in Berkeley in a shared office space.
When SERI MATS was founded, there were very few opportunities for junior researchers to try out doing alignment research. Many opportunities were informal mentorship positions, sometimes set up through cold emails or after connecting at conferences. The program has generally received many more qualified applicants than they have places for, and the vast majority of fellows report a positive experience of the program. I also believe the program has substantially increased the number of alignment research mentorship positions available.
I think that SERI MATS is performing a vital role in building the talent pipeline for alignment research. I am a bit confused about why more organisations don’t offer larger internship programs so that the mentors can run their programs ‘in-house’. My best guess is that MATS is much better than most organisations running small internship programs for the first time, particularly in supporting their fellows holistically (often providing accommodation and putting significant effort into the MATS fellows community). One downside of the program relative to an internship at an organisation is that there are fewer natural routes to enter a managed position, though many fellows have gone on to receive LTFF grants for independent projects or continued their mentorship under the same mentor.
Robert Long ($10,840): Travel funding for participants in a workshop on the science of consciousness and current and near-term AI systems
Please note this grant has been approved but at the time of writing it has not been paid out.
We funded Robert Long to run a workshop on the science of consciousness and current and near-term AI systems. Robert and his FHI colleague, Patrick Butlin, began the project on consciousness in near-term AI systems during their time at FHI, where they both worked in the digital minds research group. Since January of this year, Rob has been continuing the project as a philosophy fellow at CAIS. There are surprisingly few people investigating the consciousness of near-term AI systems, which I find pretty worrying given the rapid pace of progress in ML. I think it's plausible we end up creating many copies of AI systems and using them in ways that we'd consider immoral given enough reflection, in part due to ignorance about their preferences. The workshop aimed to produce a report applying current theories of consciousness (like integrated information theory and global workspace theory) to current ML systems.
I think that Rob is an excellent fit for this kind of work; he is one of the few people working in this area and has written quite a lot about AI consciousness on his blog. He has a PhD in philosophy from NYU, where he was advised by David Chalmers, and has experience running workshops (e.g. in 2020, he ran a workshop on philosophy and large language models with Amanda Askell).
Jeffrey Ladish ($98,000): 6-month stipend & operational expenses to start a cybersecurity & alignment risk assessment org
Please note this grant has been approved but at the time of writing it has not been paid out.
Jeffrey Ladish applied for funding to set up an organisation to do AI risk communications, with a focus on cybersecurity and alignment risks. His organisation, Palisade Research Inc., plans to conduct risk assessments and communicate those risks to the public, labs and the government. The theory of change is that communicating catastrophic risks to the public and key decision makers could increase political support for slowing down AI and other measures that might reduce AI risk. I am particularly excited about Jeffrey’s organisation demonstrating offensive AI cyber capabilities and other demos that help to communicate current risks from advanced AI systems.
I am pretty excited about Jeffrey's organisation. He has worked on information security at various organisations (including Anthropic), he seems well-networked amongst people working in think tanks and AI labs, and I like his public writing on AI risk. I am generally sceptical of people doing policy-related work without first having worked in lower-stakes positions in similar areas, but I thought Jeffrey was orienting to the downsides very reasonably and doing the sensible things, like developing plans with more experienced policy professionals.
Grants evaluated by Matthew Gray
Leap Laboratories ($195,000): One year of seed funding for a new AI interpretability research organisation.
Jessica Rumbelow applied for seed funding to set up an interpretability research organisation, which hopes to develop a model-agnostic interpretability engine.
I think trends in the AI development space suggest a need for model-agnostic methods.
More broadly, I think this showcases one of the primary benefits of interpretability research: it’s grounded in a way that makes it easy to verify and replicate.
Daniel Kokotajlo ($10,000): Funding for a research retreat on a decision-theory/cause-prioritisation topic.
We funded a research retreat run by Daniel Kokotajlo on Evidential Cooperation in Large Worlds. I think research retreats like this are both quite productive and quite cheap; we only have to pay for travel and housing costs, and the attendees are filtered on intrinsic interest in the topic.
Grants evaluated by Thomas Larsen
Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklmanos ($167,480): Implementing and expanding on the research methods of the “Discovering Latent Knowledge” paper.
This is a team that formed in SERI MATS, applying for funding to continue their SERI MATS project on checking for dishonesty in advanced AI systems.
My cruxes for this type of grant are:
(1) If done successfully, would this project help with alignment?
(2) How likely is this team to be successful?
My thoughts on (1):
This is meant to build upon Burns et al.'s Discovering Latent Knowledge paper (DLK), which finds a direction in activation space that is supposed to represent the 'truth' of a logical proposition.
I think that Eliciting Latent Knowledge (ELK) is an important subproblem of alignment, and I think it can be directly applied to combat deceptive alignment. My independent impression is that this specific direction towards solving ELK is not very useful for a full alignment solution, but that it may lead to slightly better monitoring. (In particular, I think even in a good outcome, this will only lead to an average-case solution to ELK, meaning that if we explicitly train against this detector, it will fail.) I expect that AGI projects will be in a position where it's obvious that the systems they are building are capable and dangerous, and it will be apparent that instrumental incentives kick in for e.g. power-seeking and deception. I think this technique might help us detect this danger, but given that we can't train against it, it doesn't let us actually fix the underlying problem. Thus, the lab will be in the difficult position of continuing on regardless, or having to train against their detection system. I still think that incremental progress on detecting deception is good, because it can help push for a stop in capabilities growth before labs prematurely continue on to AGI.
My thoughts on (2):
They produced reasonable output during SERI MATS, including the beginning of a replication of the DLK paper. They weren’t that specific in their grant application, but they wrote a number of ideas for ways to extend the paper in the LW post. The two ideas that seem best to me are:
Connecting DLK to mechanistic interpretability. This seems hard, but maybe tinkering around in activation space can be helpful.
Creating a better confidence loss. In the original paper, only one statement was considered, so the loss came from the constraint that P(q) + P(not q) = 1. They propose evaluating two propositions, p and q, and getting more constraints from that (see the sketch after this list).
These ideas don’t seem amazing, but they seem like reasonable things to try. I expect that the majority of the benefit will come from staring at the model internals and the results of the techniques and then iterating. I hope that this process will churn out more and better ideas.
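For context, here is a minimal sketch of the loss from the original DLK paper as I understand it; this is my paraphrase of Burns et al., not this team's code, and their proposed extension would add analogous constraints across pairs of propositions:

```python
import torch

def dlk_probe_loss(p_q: torch.Tensor, p_not_q: torch.Tensor) -> torch.Tensor:
    """Unsupervised probe loss from Burns et al.'s DLK paper (my paraphrase).

    p_q:     the probe's probability that a statement q is true
    p_not_q: the probe's probability that the negation "not q" is true

    Consistency term: enforces the constraint P(q) + P(not q) = 1.
    Confidence term:  rules out the degenerate solution P(q) = P(not q) = 0.5.
    """
    consistency = (p_q - (1.0 - p_not_q)) ** 2
    confidence = torch.minimum(p_q, p_not_q) ** 2
    return (consistency + confidence).mean()
```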
One reservation I have is that none of the applicants have an established research track record, though they have published several papers.
This team did get strong references from Colin Burns and John Wentworth, which makes me a lot more excited about the project. All things considered, I’m excited about giving this team a chance to work on this project, and see how they are doing. I’m also generally enthusiastic about teams trying their hand at alignment research.
Joseph Bloom ($50,000): Funding AI alignment research into circuits in decision transformers.
Joseph applied for independent research funding to continue his research into decision transformer interpretability. I'm happy about Joseph's initial result, which found circuits in a decision transformer in a simple RL environment. I thought his write-up was solid, and it gave me some updates on what cognitive machinery I expect to be induced by RL. In particular, I was excited about the preference directions in embedding space that they constructed. This seems like a useful initial step for retargeting the search, though more understanding of the circuits that are doing the optimization seems critical for this approach.
I think interpretability on RL models is pretty neglected and very relevant for safety.
According to a reference, the applicant was also in the top 3 ARENA participants, and was very motivated and agentic.
The counterfactual is that Joseph tries to get funding elsewhere and, if that fails, gets a research engineer job at an AI safety org (e.g. Redwood, Conjecture, Ought, etc.). I encouraged Joseph to apply to the AI safety orgs, as I think working at an org is generally more productive than independent research. But these jobs are quite competitive, so it's likely that Joseph won't get hired by any of them, and in that case it seems great to pay him to do independent alignment research.
Overall, I think that Joseph is a promising researcher, and is working on a useful direction, so I feel excited about supporting this.
Since receiving this grant, Joseph has received some more funding (here), and was mentioned in the Anthropic May Update.
Other grants we made during this period
Applicant Name
Grant Summary
Awarded Amount
Decision Date
Thomas Woodside
Support to work on research projects relevant to AI alignment
$50,000
January 2022
Anonymous
Support to study and gain a background in technical AI
$3,000
January 2022
Charlie Steiner
Support for researching value learning
$50,000
January 2022
Logan Smith
Support to create language model (LM) tools to aid alignment research through feedback and content generation
$40,000
January 2022
Paul Colognese
Educational scholarship in AI safety
$13,000
January 2022
Anonymous
AI governance PhD
$4,129
January 2022
Ruth Grace Wong
Research paper about the history of philanthropy-driven national-scale movement-building strategy to inform how EA funders might go about building movements for good
$2,000
February 2022
Stephen Grugett, James Grugett, Austin Chen
Support to build a forecasting platform based on user-created play-money prediction markets
$200,000
February 2022
Marius Hobbhahn
Research on AI safety
$30,103
February 2022
JJ Hepburn
Health coaching to optimise the health and wellbeing, and thus capacity/productivity, of those working on AI safety
$80,000
February 2022
Vael Gates
Support for a study on AI researchers’ perceptions of safety
$9,900
February 2022
William Bradshaw
Support to work on biosecurity
$11,400
February 2022
Michael Parker
Catalogue the history of U.S. high-consequence pathogen regulations, evaluate their performance, and chart a way forward
$34,500
February 2022
Stuart Armstrong
Support for setting up a research company in AI alignment
$33,762
February 2022
Anonymous
AI safety field-building
$32,568
February 2022
Anonymous
Travel funds to attend a conference and network with the community at an EA hub
$1,600
February 2022
Timothy Underwood
Write a SF/F novel based on the EA community
$15,000
February 2022
Simon Grimm
Financial support for work on a biosecurity research project and workshop, and travel expenses
$15,000
February 2022
Anonymous
Scholarship/teaching buy-out to finish Master’s thesis and commence AI safety research
$10,800
February 2022
Oliver Zhang
Running an alignment theory mentorship program with Evan Hubinger
$3,600
February 2022
Anonymous
A large conference hosting communities working on improving the long-term future
$250,000
February 2022
Anonymous
Recording written materials that are useful for people working on AI governance
$5,100
March 2022
Gavin Leech
Researching and documenting longtermist lessons from COVID
$5,625
March 2022
Anonymous
Support to work on a safe exploration project with an AI research organization
$33,000
March 2022
Anonymous
Support to work on a technical AI safety research project in an academic lab
$45,000
March 2022
Jessica Cooper
Funding to trial a new London organisation aiming to 10x the number of AI safety researchers
$234,121
March 2022
Aaron Bergman
Research on EA and longtermism
$70,000
March 2022
Anonymous
Funding a visit to the Sculpting Evolution group for collaboration
$4,000
March 2022
Jan-Willem van Putten
EU Tech Policy Fellowship with ~10 trainees
$68,750
March 2022
Anonymous
3-month funding to do an internship to develop career capital in policy advocacy
$12,600
March 2022
Anonymous
Support for equipment for AI Safety and Metascience research
$1,905
March 2022
Darryl Wright
1-year research stipend (and travel and equipment expenses) for support for work on 2 AI safety projects: 1) Penalising neural networks for learning polysemantic neurons; and 2) Crowdsourcing from volunteers for alignment research.
$150,000
March 2022
Anonymous
Support for travel and equipment expenses for EA work on AI alignment
$5,000
March 2022
Tomáš Gavenčiak
Organise the third Human-Aligned AI Summer School, a 4-day summer school for 150 participants in Prague, summer 2022
$110,000
March 2022
Anonymous
Independent alignment research at the intersection of computational cognitive neuroscience and AGI safety
$55,000
March 2022
Kai Sandbrink
Starting funds for a DPhil project in AI that addresses safety concerns in ML algorithms and positions
$3,950
April 2022
Maximilian Kaufmann
Support to work on technical AI alignment research
$7,000
April 2022
Anonymous
PhD in Safe and Trusted AI with a focus on inductive biases towards the interpretability of neural networks
$63,259
April 2022
Chloe Lee
Support to study emerging policies in biosecurity for better understanding and global response coordination
$25,000
April 2022
Jack Ryan
Support for alignment theory agenda evaluation
$25,000
April 2022
Isabel Johnson
Support research, write, and publish a book: a survey on the unknown dangers of a contemporary nuclear strike
$5,000
April 2022
Nicholas Greig
Neural network interpretability research
$12,990
April 2022
Centre for the Governance of AI
GovAI salaries and overheads for new academic AI governance group.
$401,537
April 2022
Daniel Skeffington
Research and a report/paper on the the role of emergency powers in the governance of X-Risk
$26,000
April 2022
Noga Aharony
Support for PhD developing computational techniques for novel pathogen detection
$20,000
April 2022
Tim Farrelly
Equipment for AI Safety research
$3,900
April 2022
Sasha Cooper
6 months funding for supervised research on the probability of humanity becoming interstellar given non-existential catastrophe
$36,000
April 2022
Kevin Wang
Support to work on Aisafety.camp project, impact of human dogmatism on training
$2,000
April 2022
Philipp Bongartz
Enabling prosaic alignment research with a multi-modal model on natural language and chess
$25,000
May 2022
Ross Graham
Stipend and research fees for completing dissertation research on public ethical attitudes towards x-risk
$60,000
May 2022
Nikiforos Pittaras
Support and compute expenses for technical AI Safety research on penalising RL agent betrayal
$14,300
May 2022
Josiah Lopez-Wild
Funding a new computer for AI alignment work, specifically a summer PIBBSS fellowship and ML coding
$2,500
May 2022
Theo Knopfer
Support to explore biosecurity policy projects: BWC/ European early detection systems/Deep Vision risk mitigation
$27,800
May 2022
Jan Kirchner
Support for working on “Language Models as Tools for Alignment” in the context of the AI Safety Camp.
$10,000
May 2022
Lucius Bushnaq, Callum McDougall, Avery Griffin
Support to investigate the origins of modularity in neural networks
$125,000
May 2022
Anonymous
Admissions fee for MPA in International Development at a top university
$800
May 2022
Anonymous
Support for research on international standards for AI
$5,250
May 2022
Rory Gillis
Research project designed to map and offer preliminary assessment of AI ideal governance research
$2,000
May 2022
John Bridge
Research into the international viability of FHI’s Windfall Clause
$3,000
May 2022
CHERI / Naomi Nederlof
Stipends for students of 2022 CHERI’s summer residence
$134,532
May 2022
Wyatt Tessari
Support to connect, expand and enable the AGI safety community in Canada
$87,000
May 2022
Ondrej Bajgar
Support funding during 2 years of an AI safety PhD at Oxford
$11,579
May 2022
Neil Crawford
Support gatherings during 12 months period for discussion of AI safety
$10,000
May 2022
Anonymous
Support to do AI alignment research on Truthful/Honest AI
$120,000
May 2022
Logan Strohl
Support to further develop a branch of rationality focused on patient and direct observation
$80,000
May 2022
Anonymous
Support for courses and research on AI
$4,000
May 2022
Anonymous
Support to explore the concept of normative risk and its potential practical consequences
$20,000
May 2022
Philippe Rivet
Support for research into applied technical AI alignment work
$10,000
May 2022
Anonymous
Support to extend Udacity Deep Reinforcement Learning Nanodegree
$1,400
May 2022
Cindy Wu
ML security/safety summer research project: model backdooring through pre-processing
$5,000
May 2022
Marius Hobbhahn
Support for Marius Hobbhahn for piloting a program that approaches and nudges promising people to get into AI safety faster
$50,000
May 2022
Conor McGlynn
Up-skill for AI governance work before starting Science and Technology Policy PhD at Harvard
$17,220
June 2022
Anonymous
Support to hire a shared PA for researchers working at two organisations contributing to AI safety and governance
$78,000
June 2022
Nora Ammann
Support the PIBBSS fellowship with more fellows than originally anticipated and to realize a local residency
$180,200
June 2022
Julia Karbing
Paid internships for promising Oxford students to try out supervised AI Safety research projects
$60,000
June 2022
Anonymous
Support to take Sec401 course from SANS for cyber security professionals
$8,589
June 2022
Anonymous
Funding for 1-year executive and research assistance to support 2 researchers working in the longtermist space
$84,000
June 2022
Francis Rhys Ward
Funding to support PhD in AI Safety at Imperial College London, technical research and community building
$6,350
June 2022
Peter Barnett
Equipment for technical AI safety research
$4,099
June 2022
Jacques Thibodeau
3-month research stipend to continue working on AISC project to build a dataset for alignment and a tool to accelerate alignment
$22,000
June 2022
Chris Patrick
Stipend to produce a guide about AI safety researchers and their recent work, targeted to interested laypeople
$5,000
June 2022
Anonymous
Software engineering to revise and resubmit a multi-objective reinforcement learning paper
$26,000
June 2022
Anonymous
PhD/research stipend for work on key longtermist area
$30,000
June 2022
Jay Bailey
Support for Jay Bailey for work in ML for AI Safety
$79,120
June 2022
Thomas Kehrenberg
6-month research stipend for AI alignment research
$15,000
June 2022
Solomon Sia
Support to lobby the CFTC and legalise prediction markets
$138,000
June 2022
Bálint Pataki
Support AI Policy studies in the ML Safety Scholars program and at Oxford
$3,640
June 2022
Jade Zaslavsky
12-month research stipend to work on ML models for detecting genetic engineering in pathogens
$85,000
June 2022
Gergely Szucs
6-month research stipend to develop an overview of the current state of AI alignment research, and begin contributing
$70,000
June 2022
Anonymous
AI safety PhD funding
$7,875
June 2022
Conor Barnes
Website visualising x-risk as a tree of branching futures per Metaculus predictions
$3,500
June 2022
Jonas Hallgren
3-month research stipend to set up a distillation course helping new AI safety theory researchers to distil papers
$14,600
June 2022
Victor Warlop
SERI MATS aims at scaling the number of alignment theorists by pairing promising applicants with renowned mentors
$316,000
June 2022
Patrick Gruban
Weekend organised as a part of the co-founder matching process of a group to found a human data collection org
$2,300
June 2022
Victor Warlop Piers de Raveschoot
Retroactive grant for managing the MATS program, 1.0 and 2.0
$27,000
June 2022
Anonymous
A 1-year research stipend for up-skilling in technical and general AI alignment to prepare for an impactful job in the field
$110,000
June 2022
Anonymous
7-month research stipend to do independent AI Safety research on interpretability and upskill in ML engineering
$43,600
June 2022
Mario Peng Lee
Stanford Artificial Intelligence Professional Program Tuition
$4,785
July 2022
Conor Sullivan
Develop and market video game to explain the Stop Button Problem to the public & STEM people
$100,000
July 2022
Quinn Dougherty
Short meatspace workshop to hone, criticize, and evaluate hazardousness of a new research programme in alignment
$9,000
July 2022
Viktoria Malyasova
Stipend for up-skilling in infrabayesianism prior to start of SERI MATS program
$4,400
July 2022
Samuel Nellessen
6-month budget to self-study ML and the possible applications of a Neuro/CogScience perspective for AGI Safety
$4,524
July 2022
Charles Whittaker
Support for academic research projects relating to pandemic preparedness and biosecurity
$8,150
July 2022
Amrita A. Nair
3-month funding for upskilling in technical AI Safety to test personal fit and potentially move to a career in alignment
$1,000
July 2022
Jeffrey Ohl
Tuition to take one Harvard economics course in fall 2022 to be a more competitive econ graduate school applicant
$6,557
July 2022
Anonymous
Funding to take an online course on public policy to help the applicant transition from Machine Learning to AI-Governance
$2,732
July 2022
Samuel Brown
6-month research stipend to research AI alignment, specifically the interaction between goal-inference and choice-maximisation
$47,074
July 2022
Anonymous
Support for multiple ML projects to build up skills for AI safety PhD
$1,100
July 2022
Anonymous
25-month grant funding EA-relevant dissertation that contributes to improved research on rate-limiting steps and constraints in AI research.
$139,000
July 2022
Kyle Scott
A research and networking event for winners of the Eliciting Latent Knowledge contest to encourage collaboration on aligning future machine learning systems with human interests
$72,000
July 2022
Derek Shiller
Support for an academic project evaluating factors relevant to digital consciousness with the aim of better understanding how and how not to create conscious artificial intelligences.
Financial support for career exploration and related project in AI alignment
$26,077
August 2022
Anonymous
2-month research stipend to build skills, and broaden action space for EA related projects to undertake in gap year
$15,320
August 2022
Jonathan Ng
Funding support for MLSS scholar to up-skill in ML for alignment, documenting key learnings, and visit Berkeley in pursuit of a career in technical AI safety.
$16,000
August 2022
Hamza Tariq Chaudhry
Equipment expenses for summer research fellowship at CERI and organising the virtual Future of Humanity Summit
$2,500
August 2022
Anonymous
Research project on strategies to mitigate x-risk in Party Politics
$3,000
August 2022
Anonymous
Funding for administrative support to the CEO for a large team working on research of interest to the longtermist community
$50,847
August 2022
Simon Skade
Funding for 3 months’ independent study to gain a deeper understanding of the alignment problem, publishing key learnings and progress towards finding new insights.
$35,625
August 2022
Ardysatrio Haroen
Support participation in MLSS program working on AI alignment.
$745
August 2022
Antonio Franca
Equipment stipend for MLSS scholar to do research in AI technical research
$2,000
August 2022
Darren McKee
Support for a non-fiction book on threat of AGI for a general audience
$50,000
August 2022
Steve Petersen
Research stipend to work on the foundational issue of *agency* for AI safety
$20,815.20
August 2022
Ross Nordby
Support for AI safety research and concrete research projects
$62,500
August 2022
Leah Pierson
300-hour research stipend for a research assistant to help implement a survey of 2,250 American bioethicists to lead to more informed discussions about bioethics.
$4,500
August 2022
Luca De Leo
12-month research stipend to study and get into AI Safety Research and work on related EA projects
$14,000
August 2022
Anonymous
Two months of independent study in alignment to start my career as an alignment researcher
$8,333
August 2022
Robi Rahman
Support for part-time rationality community building
$4,000
August 2022
Lennart Justen
Funding to increase my impact as an early-career biosecurity researcher
$6,000
September 2022
Fabienne Sandkühler
Funding for research on the effect of creatine on cognition
$4,000
September 2022
Chris Leong
Funding for the AI Safety Nudge Competition
$5,200
September 2022
Brian Porter
Independent research and upskilling for one year, to transition from academic philosophy to AI alignment research
$60,000
September 2022
John Wentworth
1-year research stipend for research in applications of natural abstraction
$180,000
September 2022
Anonymous
6 month research stipend for SERI MATS scholar to continue working on Alignment and ML Interpretability
$48,000
September 2022
Nicky Pochinkov
6-month research stipend for SERI MATS scholar to continue working on theoretical AI alignment research, trying to better understand how ML models work to reduce X-risk from future AGI
$50,000
September 2022
David Hahnemann, Luan Ademi
6-month research stipend for 2 people working on modularity, a subproblem of Selection Theorems and budget for computation
$26,342
September 2022
Dan Valentine
12-month research stipend to transition career into technical alignment research
$25,000
September 2022
Anonymous
3-month funding to explore GCBR-focused biosecurity projects after having finished my virology PhD
$25,000
September 2022
Logan Smith
6-month research stipend for continued work on shard theory: studying how inner values are formed by outer reward schedules
$40,000
September 2022
Gunnar Zarncke
One year grant for a project to reverse-engineer human social instincts by implementing Steven Byrnes’ brain-like AGI
$16,600
September 2022
Zach Peck
Supporting participation at the Center for the Advancement of Rationality (CFAR) workshop
$1,800
September 2022
Anonymous
AI master’s thesis and research in longtermism
$30,000
September 2022
Anonymous
Upskilling in technical AI Safety Research to contribute to the field through an engineering or research role
$33,000
September 2022
Adam Rutkowski
Piloting an EA hardware lab for prototyping hardware relevant to longtermist priorities
$44,000
September 2022
Anonymous
Setting up experiments with LLM to examine Strategic Instrumental Behavior in real-life setting
$50,000
September 2022
Egor Zverev
PhD program support
$6,500
September 2022
Anonymous
1-year research stipend to work on alignment research full time
$80,000
September 2022
Shavindra Jayasekera
Research in machine learning and computational statistics
$38,101
October 2022
Hoagy Cunningham
6-month stipend for research into preventing steganography in interpretable representations using multiple agents
$20,000
October 2022
Joel Becker
5-month research stipend to support civilizational resilience projects arising from SHELTER Weekend
$27,248
October 2022
Jonas Hallgren
4 month research stipend to set up AI safety groups at 2 groups covering 3 universities in Sweden with eventual retreat
$10,000
October 2022
Anonymous
4 month research stipend in technical safety, ML, and AI chip supply chains before participating in an AI governance program
$11,500
October 2022
Anonymous
8-month research stipend to do research in AI safety
$35,000
October 2022
Anonymous
3-month research stipend in technical AI safety
$9,750
October 2022
David Udell
One-year full-time research stipend to work on alignment distillation and conceptual research with Team Shard after SERI MATS
$100,000
October 2022
John Burden
Funding 2 years of technical AI safety research to understand and mitigate risk from large foundation models
$209,501
October 2022
Anonymous
AI safety research
$1,500
October 2022
Garrett Baker
12-month research stipend to work on alignment research
$96,000
October 2022
Magdalena Wache
9-month part-time research stipend for AI safety, test fit for theoretical research
$62,040
October 2022
Anshuman Radhakrishnan
6-month stipend to continue upskilling in Machine Learning in order to contribute to Prosaic AI Alignment Research
$55,000
October 2022
Theo Knopfer
Travel Support to BWC RevCon & Side Events
$3,500
October 2022
Daniel Herrmann
Support for PhD on embedded agency, to free up my time from teaching
$64,000
October 2022
Jeremy Gillen
6-month research stipend to work on the research I started during SERI MATS, solving alignment problems in model based RL
$40,000
October 2022
Anonymous
3.5 months’ support for ML engineering skill-up
$8,720
October 2022
Edward Saperia
One year of funding to improve an established community hub for EA in London
$50,000
November 2022
Chu Chen
1-year research stipend for upskilling in technical AI alignment research
$96,000
November 2022
Anonymous
12-month stipend to research assumptions underlying most existing work on AI alignment and AI forecasting
$7,645
November 2022
Kajetan Janiak
Support forAI safety research.
$4,000
November 2022
Felix Hofstätter
6-month research stipend for an AI alignment research project on the manipulation of humans by AI
$25,383
November 2022
Maximilian Kaufmann
4 month research stipend to support an early-career alignment researcher, who is taking a year to pursue research and test fit
$20,000
November 2022
Will Aldred
6-month research stipend to: 1) Carry out independent research into risks from nuclear weapons, 2) Upskill in AI strategy
$40,250
November 2022
Benjamin Anderson
Support to conduct work in AI safety
$5,000
November 2022
Arun Jose
4-month funding for Arun Jose’s independent alignment research and study
$15,478
November 2022
Anonymous
Professional development grant for independent upskilling in AGI Safety
$3,600
November 2022
Matthias Georg Mayer
6-months research stipend for upskilling and researching “Framing computational systems such that we can find meaningful concepts.”
$24,000
November 2022
Johannes C. Mayer
6 months research stipend. Turn intuitions, like goals, wanting, abilities, into concepts applicable to computational systems
$24,000
November 2022
Anonymous
Funding for MSc Thesis on Language Models Safety
$28,160
November 2022
Paul Bricman
1-year stipend and compute for conducting a research project focused on AI safety via debate in the context of LLMs
$50,182
November 2022
Simon Möller
6-month research stipend to transition into technical AI Safety work by working through Jacob Hilton’s curriculum and a project
$65,000
November 2022
Anonymous
Fall semester stipend to work on AI Safety research, in particular adversarial robustness, monitoring, and trojaning
$7,500
November 2022
Alan Chan
4-month research stipend for a research visit with David Krueger on evaluating non-myopia in language models and RLHF systems
$12,321
November 2022
Tomislav Kurtovic
3-month research stipend to skill up in ML and Alignment with the goal of developing a streamlined course in Math/AI
$5,500
November 2022
Kadri Reis
Support to participate in Biological Weapons Convention in Geneva
$1,500
November 2022
Skyler Crossman
Twelve-month funding for global rationality organization development
$130,000
December 2022
Daniel O’Connell
Investigate AI alignment options
$54,250
December 2022
Remmelt Ellen
Cover participant stipends for AI Safety Camp Virtual 2023
$72,500
December 2022
Josiah Lopez-Wild
Scholarship for PhD student working on research related to AI Safety
$8,000
January 2023
Zhengbo Xiang (Alana)
Support for 18 months of independent alignment research and upskilling, focusing on developing a research agenda on corrigibility
$30,000
January 2023
Daniel Filan
Funding to make 12 more episodes of AXRP, the AI X-risk Research Podcast.
$23,544
January 2023
Sam Marks
3-week research stipend for three people to review AI alignment agendas
$26,000
January 2023
Robert Kirk
Funding to perform human evaluations of different machine learning methods for aligning language models
$10,000
January 2023
Jérémy Perret
Support for AI alignment outreach in France (video/audio/text/events) & field-building
$24,800
January 2023
Peter Ruschhaupt
3 months’ support for exploring career options in AI governance: upskilling, networking, and writing articles summarising present AI governance work and ideas.
$20,000
January 2023
Charlie Griffin
8-month research stipend for alignment work: assisting academics, skilling up, and personal research.
$35,000
January 2023
Alexander Lintz
6-month research stipend for independent work centred on distillation and coordination in the AI governance & strategy space
$69,940
January 2023
Anonymous
Living-cost stipend top-up while working on research relevant to the long-term future at a think tank
$15,000
January 2023
Francis Rhys Ward
Support for a PhD in AI safety: technical research and community-building work
$2,305
January 2023
Lucius Bushnaq
6-month research stipend for two people to find formalisms for modularity in neural networks
$72,560
January 2023
David Quarel
Support for a project with the Cambridge AI Safety group. The group will be working on projects related to AI alignment, in particular, setting up experimental demonstrations of deceptive alignment.
$5,613
January 2023
Tim Farkas
Funding to run a 2-3 day retreat for 20-30 people & bring together key EA thinkers/actors in the mind enhancement cause area
$2,540
February 2023
Wyatt Tessari
3-month stipend to connect, expand and enable the AGI gov/safety community in Canada
$17,000
February 2023
Anonymous
14-month research stipend and research costs for 3 research reports on best risk communication practices for longtermist orgs
$96,000
February 2023
Daniel Kokotajlo
Funding for a research retreat on a decision-theory/cause-prioritisation topic.
$10,000
February 2023
Alex Altair
Funding for a research stipend to develop a framework of optimisation.
$8,000
February 2023
Max Lamparth
Funding for technical AI safety research using interpretability methods on large language models.
$2,500
February 2023
Liam Carroll
6-week research stipend to publish a series of blogposts synthesising Singular Learning Theory for a computer science audience
$8,000
February 2023
Amrita A. Nair
3-month scholarship to support Amrita Nair’s upskilling in AI safety, working on Evan Hubinger’s Reward Side-Channels experiment proposal.
$5,000
February 2023
Gerold Csendes
Funding for a project transitioning from AI capabilities to AI safety research.
$8,200
February 2023
Anonymous
Career transition support, including (but not limited to) exploring helping to set up an x-risk research institute and working on a research project on AI ethics boards
$30,000
February 2023
Tamsin Leake
6 months research stipend to do independent AI alignment research focused on formal alignment and agent foundations
$30,000
February 2023
Chris Scammell, Andrea Miotti, Katrina Joslin
A 2-day workshop to connect alignment researchers from the US and UK with AI researchers and entrepreneurs from Japan
$72,827
February 2023
Joseph Bloom
6-month research stipend to conduct AI alignment research on circuits in decision transformers
$50,000
February 2023
Carson Jones
1 year research stipend (or less) to help alignment researchers improve their research ability via 1-on-1 conversations
$10,000
February 2023
Andrei Alexandru
Fine-tuning large language models for an interpretability challenge (compute costs)
$11,300
February 2023
Anonymous
Two-month research stipend and bridge funding to complete an AI governance report and produce a related article
$11,560
February 2023
Jacob Mendel
General support to spend 1 month working with Will Bradshaw’s team at the Nucleic Acid Observatory producing reports on the merits of alternative sample choices to wastewater for metagenomic sequencing.
$4,910
February 2023
Max Räuker
Funding for Max Räuker’s part-time research stipend for a trial, plus developer costs to maintain and improve the AI governance document sharing hub
$15,000
March 2023
Anonymous
A twelve-month research stipend to pursue independent writing on the sociology and philosophy of longtermist effective altruism
$75,346
March 2023
Anonymous
3-4 month stipend for AI safety upskilling and research
$7,000
March 2023
Fabian Schimpf
6-month research stipend for AI alignment research and independent research on the limits of predictability
$28,875
March 2023
Anonymous
Support for PhD student pursuing research areas that intersect economics and EA
$4,528
March 2023
Kane Nicholson
6-month research stipend for AI safety upskilling and research projects
$26,150
March 2023
David Lindner
Support for David Lindner and Jeremy Scheurer to participate in Redwood Research’s REMIX program on mechanistic interpretability using their new causal scrubbing methodology
$4,300
March 2023
Jessica Rumbelow
One year of seed funding for a new AI interpretability research organisation
$195,000
March 2023
Alexander Large
1 month’s general support for projects for small EA-aligned charities.
$3,618
March 2023
Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklamanos
6-month research stipend for Georgios Kaklamanos, Walter Laurito, Kaarel Hänni and Kay Kozaronek to continue their SERI-MATS project on expanding the “Discovering Latent Knowledge” paper
$167,480
March 2023
Matt MacDermott
3-month research stipend for a SERI MATS extension on agent foundations research
$24,000
March 2023
Max Kaufmann
9 months of funding for an early-career alignment researcher to work with Owain Evans and others
$45,000
March 2023
Anonymous
40 hours of research stipend for researchers to finish a paper on governing AI via compute
$1,200
March 2023
Robert Miles
Funding for additional fellows for the AISafety.info Distillation Fellowship, improving our single-point-of-access to AI safety
$54,962
March 2023
Alexander Turner
Funding for Alexander Turner and team’s research project: writing new motivations into a policy network by understanding and controlling its internal decision-influences
$115,411
March 2023
Anonymous
3-month stipend for upskilling in ML to transition from mathematics (at PhD level) to AI safety work. During the grant period, project goals include replicating an interpretability paper, with longer-term goals of publishing project write-ups.
$5,300
March 2023
Anonymous
3-4 month salary to help set up a new division at a US think tank doing AI governance research
$26,800
March 2023
Anonymous
2-month living expenses while waiting to join a US think tank
$12,000
March 2023
Andrey Tumas
4-month research stipend for conceptual/theoretical research towards perfect world-model interpretability.
$30,000
March 2023
Nora Ammann
Funding for PIBBSS research fellowship to host 6 additional fellows
$100,000
March 2023
David Staley
Support to maintain a copy of the alignment research dataset etc. in the Arctic World Archive for 5 years
$3,000
March 2023
Wesley Fenza
One-year funding of Astral Codex Ten meetup in Philadelphia
$5,000
March 2023
Matthew MacInnes
8 months’ support to test fit for social scientific research related to AI governance, preparing for an MPhil proposal.
$9,000
March 2023
Anonymous
3 months’ funding for upskilling in AI Safety and research on hardware-enabled mechanisms for AI Governance.
$48,000
March 2023
Anonymous
Support for PhD Track in Health and Security at a top US university
$9,800
March 2023
Nicholas Kees Dupuis
12-month research stipend to continue developing research agenda on new ways to make LLMs directly useful for alignment research without advancing capabilities
$120,000
March 2023
Anonymous
Scholarship for the Offsec Certified Professional (OSCP) certification: the industry-leading Penetration Testing with Kali Linux course and online lab, followed by the OSCP certification exam.
$2,000
March 2023
Jingyi Wang
Organising OPTIC, an in-person, intercollegiate forecasting tournament in Boston on April 22; funding is for prizes, venue, etc.
$2,100
March 2023
Rusheb Shah
6 months research stipend to upskill on technical AI safety through collaboration with researchers and self-study.
$50,000
March 2023
Alfred Harwood
6-month research stipend to research geometric rationality, ergodicity economics and their applications to decision theory and AI
$11,000
April 2023
Alexander Turner
Year-long research stipend for shard theory and RL mechanistic interpretability research
$220,000
April 2023
Said Achmiz
1 year support for developing and maintaining projects/resources used by the EA and rationality communities
$60,000
April 2023
Skyler Crossman
Support for Astral Codex Ten Everywhere meetups
$22,000
April 2023
Vanessa Kosoy
2-year research stipend for work on the learning-theoretic AI alignment research agenda
$100,000
April 2023
Robert Long
Support for participants in a workshop on the science of consciousness and current and near-term AI systems
$10,840
April 2023
Mateusz Bagiński
6-month research stipend to upskill in maths, ML, and AI alignment, as well as to work on non-profit projects beneficial for AI safety, in pursuit of a research career.
$14,136
April 2023
Quentin Feuillade-Montixi
Funding for Quentin Feuillade-Montixi’s 4 month SERI MATS extension in London, mentored by Janus and Nicholas Kees Dupuis to work on cyborgism
$32,000
April 2023
Anonymous
3 month research stipend for independent research into and articles on large language models, agent foundations, and AI alignment
$14,019
April 2023
Smitha Milli
Support to participate in the Symposium on AGI Safety at Oxford
$1,500
April 2023
Anonymous
6-month research stipend and course funding to upskill in AI safety before entering the Civil Service Fast Stream in September 2023 (Data & Tech)
$14,488
April 2023
Anonymous
Support for independent projects & upskilling for AI safety work
$18,000
April 2023
Sage Bergerson
5-month part time research stipend for collaborating on a research paper analysing the implications of compute access
$2,500
April 2023
Iván Godoy
6-month research stipend to work full-time on upskilling/AI alignment research, tentatively focused on agent foundations, and to start a MIRIx group in Buenos Aires.
$6,000
April 2023
Naoya Okamoto
Support for Mathematics of Machine Learning course offered by the University of Illinois at Urbana-Champaign.
$7,500
April 2023
Joshua Reiners
4-month research stipend to work on a project finding the most interpretable directions in gpt2-small’s early residual stream to better understand contemporary AI systems
$16,300
April 2023
Longer grant write-ups
Grants evaluated by Linchuan Zhang
Stephen Grugett, James Grugett, Austin Chen ($200,000): 4-month stipend for 3 FTE to build a forecasting platform made available to the public based on user-created play-money prediction markets
Notes by Linch Zhang, March 2022: This was my first substantive grant investigation. At the time, I felt shaky about it, but now I feel really good about it. The two main reasons I originally recommended this grant were:
1. It was an investment in the people who wanted to do EA work – getting 3 ~Google-quality engineers to do more EA/longtermist work (as opposed to counterfactuals like earning to give, or worse) seems well worth it at $200k.
2. It was an investment in the team specifically. Having cohesive software teams seems like an important component of EA becoming formidable in the future, and such teams are (somewhat surprisingly to me) missing in EA, especially outside of AI safety and crypto trading. I heard really good things about Manifold from early users, and they appeared to be developing at a speed that blew other forecasting software projects (Metaculus, Foretold, Cultivate, Hypermind, etc.) out of the water.
At the time, it was not an investment into the prediction market itself or a theory of change regarding play-money prediction markets broadly, because the above two factors were sufficient to be decisive.
At the time, it was also unclear whether they planned to go the for-profit route or the nonprofit route.
They’ve since decided to go the for-profit route.
Looking back, it’s still too soon to be sure, but it looks like Manifold is going quite well. They continue to develop features at phenomenal speed, lots of EAs and others in adjacent communities use the product, and the team is still producing fast and is excited for the future.
From an “investment into team” perspective, I think Manifold now plausibly has the strongest software team in EA outside of AI safety and earning-to-give (not that I’d necessarily have enough visibility to know of all the potentially better teams, especially stealth ones).
I have a number of disjunctive ToCs for how Manifold (and forecasting in general) can over time make the future better, some of which are implicitly covered here.
Though I am still uncertain whether this particular project is the best use of the cofounders’ and team’s time: a lot of the evidence I have is more an update on the team’s overall skill and cohesiveness than an update about their comparative advantage for prediction markets specifically.
Addendum June 2023:
I’ve grown more confused about the total impact or value of this grant. On the one hand, I think Manifold is performing at or moderately above expectations in terms of having a cohesive team that’s executing quickly, and many people in the community appear to find their product useful or at least interesting. On the other hand, a) the zero-interest-rate environment and correspondingly high startup valuations that prevailed when I recommended this grant ended in early 2022, and b) recent events have eliminated a substantial fraction of EA funding, which means $200K is arguably much more costly now than a year ago.
Still, I think I’m broadly glad to have Manifold in our ecosystem. I think they’re very helpful for people in our community and adjacent communities in training epistemics, and I’m excited to see them branch out into experiments in regranting and retroactive funding; from a first-principles perspective, it’d be quite surprising if the current state of EA grantmaking were sufficiently close to optimal.
Solomon Sia ($71,000): 6-month stipend for providing consultation and recommendations on changes to the US regulatory environment for prediction markets.
Solomon Sia wants to talk to a range of advisers, including industry experts, users, and contacts at the CFTC, to see if there are good ways to improve how prediction markets are regulated in the US while protecting users and reducing regulatory risk and friction.
This was an exploratory grant to see how the US regulatory environment for prediction markets might be improved, with a resulting written report provided to EA Funds.
I think this is a reasonable/great option to explore:
I think my position on prediction markets is somewhat more cynical than that of most EAs in the forecasting space, but still, I’m broadly in favor of them and think they can be a critical epistemic intervention, both for uncovering new information and for legibility/common knowledge reasons.
It seemed quite plausible to me that the uncertain regulatory environment for prediction markets in the US is impeding the growth of large real-money prediction markets on questions that matter.
Solomon seemed unusually competent and knowledgeable about the tech regulations space, a skillset very few EAs have.
Cultivating this skillset and having him think about EA issues seemed valuable.
A potential new caveat is that in 2023, as AI risk worries heat up, it seems increasingly likely that we might be able to draw from a diverse pool of experienced and newly interested/worried people.
The for-profit motivation for this work exists but is not very large: unless a company is trying very hard at regulatory capture for its own benefit (which is bad, and also practically very difficult), easing prediction market regulations has collective benefits and individual costs.
(Weakly held) I thought trying to nail this during the Biden administration was good, because it seemed plausible that the current CFTC would be more predisposed to liking prediction markets than the average CFTC.
One interesting update is that EA connections were likely a mild plus in 2022, and a moderate liability in 2023.
NB: Solomon and his collaborator think a) that the EA connection is still a mild to moderate positive, and b) that it’s now unclear whether the Biden administration is better or worse than a counterfactual Republican administration.
I’ve thought about this grant somewhat since, and I think even with the benefit of hindsight, I’m still a bit confused about how happy I should be about this grant ex post.
One thing is that I’ve grown a bit more confused about the output and tractability of interventions in this domain.
Kalshi’s successes(?) confused me, and I haven’t had enough time to integrate them into my worldview.
My current impression is that the CFTC is fairly open to informed opinions from others on this matter.
I continue to believe it was a good grant ex ante.
Grants evaluated by Oliver Habryka
Alexander Turner ($220,000): Year-long stipend for shard theory and RL mechanistic interpretability research
This grant has been approved but has not been paid out at the time of writing.
We’ve made grants to Alex to pursue AI Alignment research before:
2019: Building towards a “Limited Agent Foundations” thesis on mild optimization and corrigibility
2020: Understanding when and why proposed AI designs seek power over their environment ($30,000)
2021: Alexander Turner—Formalizing the side effect avoidance problem ($30,000)
2022: Alexander Turner − 12-month stipend supplement for CHAI research fellowship ($31,500)
We also made another grant in 2023 to a team led by Alex Turner for their post on steering vectors, for $115,411 (the total includes payments to 5 team members, covering, among other things, travel expenses, office space, and stipends).
This grant is an additional grant to Alex, this time covering his full-time stipend for a year to do more research in AI Alignment.
Only the first one has a public grant write-up, and the reasoning and motivation behind all of these grants is pretty similar, so I will try to explain the reasoning behind all of them here.
As is frequently the case with grants I evaluate in the space of AI Alignment, I disagree pretty strongly, on an inside-view level, with the direction of the research that Alex has been pursuing for most of his AI Alignment career. Historically I have been, on my inside view, pretty unexcited about Alex’s work on formalizing power-seekingness, and I also feel not that excited about his work on shard theory. Nevertheless, I think these are probably among the best grants the LTFF has made in recent years.
The basic reasoning here is that despite me not feeling that excited about the research directions Alex keeps choosing, within the direction he has chosen, Alex has done quite high-quality work, and also seems to often have interesting and useful contributions in online discussions and private conversations. I also find his work particularly interesting, since I think that within a broad approach I often expected to be fruitless, Alex has produced more interesting insight than I expected. This in itself has made me more interested in further supporting Alex, since someone producing work that shows that I was at least partially wrong about a research direction being not very promising is more important to incentivize than work whose effects I am pretty certain of.
I would like to go into more detail on my models of how Alex’s research has updated me, and why I think it has been high quality, but I sadly don’t have the space or time here to go into that much depth. In short, the more recent steering-vector work seems like the kind of “obvious thing to try that could maybe help” that I would really like to see saturated with effort in the field, and the work on formalizing power-seeking theorems is also the kind of work that seems worth having done, though I do pretty deeply regret its overly academic/formal presentation, which has somewhat continuously caused people to overinterpret the strength of its results (something Alex also seems to have regretted, and a pattern I have frequently observed in academic work that was substantially motivated by trying to “legitimize the field”).
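For readers who haven’t encountered it, here is a minimal, self-contained sketch of the general steering-vector idea as I understand it from the public write-ups; the toy model, layer choice, and scaling factor below are placeholders, not the team’s actual setup:

```python
import torch
import torch.nn as nn

# Toy stand-in for a real network; in practice this would be a language
# model or policy network, and the hook would target a transformer layer.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
layer = model[1]

def layer_acts(x):
    """Record the chosen layer's activations on input x."""
    cache = {}
    handle = layer.register_forward_hook(lambda m, i, o: cache.update(h=o.detach()))
    model(x)
    handle.remove()
    return cache["h"]

# The steering vector: the difference between activations on two
# contrasting inputs (e.g. prompts expressing opposite behaviours).
x_pos, x_neg = torch.randn(1, 16), torch.randn(1, 16)
steering_vector = layer_acts(x_pos) - layer_acts(x_neg)

# Add a scaled copy of the vector into the layer's output on later passes.
handle = layer.register_forward_hook(lambda m, i, o: o + 2.0 * steering_vector)
steered_output = model(torch.randn(1, 16))
handle.remove()
```

Part of the appeal, as I understand it, is exactly the “obvious thing to try” quality mentioned above: no retraining is needed, just a forward-pass intervention.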
Another aspect of this grant that I expect to have somewhat wide-ranging consequences is the stipend level we settled on. Some basic principles that led me to suggest this stipend level:
I have been using the anchor of “industry stipend minus 30%” as a useful heuristic for setting stipend levels for LTFF grants. The goal of that heuristic was to find a relatively objective standard that would allow grantees to think about stipend expectations on their own without requiring a lot of back and forth, while hitting a middle ground in the incentive landscape: between salaries so low that lots of top talent would just go into industry instead of doing impactful work, and salaries so high that we create grifter problems, with people asking for LTFF grants because they expect they will receive less supervision and can probably get away without a ton of legible progress.
In general I think self-employed salaries should be ~20-40% higher, to account for additional costs like health insurance, payroll taxes, administration overhead, and other things that an employer often takes care of. (A toy version of how these two adjustments combine is sketched after this list.)
I have been rethinking stipend policies, as I am sure many people in the EA community have been since the collapse of FTX, and I haven’t made up my mind on the right principles here. It does seem like a pretty enormous number of good projects no longer have the funding to operate at their previous stipend levels, and it’s plausible to me that we should take the hit, lose out on a bunch of talent, and reduce stipend levels to a substantially lower level again to be more capable of handling funding shocks. But I am really uncertain about this, and at least in the space of AI Alignment, I can imagine the recent rise to prominence of AI risk concerns alleviating funding shortfalls (or increasing competition by having more talent flow into the space, which could reduce wages, which would also be great).
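As a toy illustration of how the two heuristics above might combine (the exact way they interact is my reading, not a stated formula):

```python
def stipend_anchor(industry_salary: float,
                   discount: float = 0.30,
                   premium: float = 0.30) -> float:
    """Anchor at industry salary minus ~30%, then add ~20-40% (here 30%)
    for costs an employer would normally cover: health insurance, payroll
    taxes, administration overhead, and so on."""
    return industry_salary * (1 - discount) * (1 + premium)

# A role paying $200k in industry would anchor at roughly:
print(stipend_anchor(200_000))  # 182000.0
```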
See the Stipend Appendix below, “How we set grant and stipend amounts”, for more information on EA Funds’ determination of grant and stipend amounts.
Vanessa Kosoy ($100,000): Working on the learning-theoretic AI alignment research agenda
This is a grant to cover half of Vanessa’s stipend for two years (the other half being paid by MIRI). We also made another grant to Vanessa in Q4 2020 for a similar amount.
My model of the quality of Vanessa’s work is primarily indirect, since I have engaged relatively little with the central learning-theoretic agenda that Vanessa has worked on. The work is also quite technically dense, and I haven’t found anyone else who could explain it to me in a relatively straightforward way (I have heard that Daniel Filan’s AXRP episode with Vanessa is a better way to get started than previous material, though it hadn’t been published when I was evaluating this grant).
I did receive a decent number of positive references for Vanessa’s work, and I have seen her make contributions to other conversations online that struck me as indicative of a pretty deep understanding of the AI Alignment problem.
If I had to guess at the effects of this kind of work (though I should clarify that I am substantially deferring to other people here, in a way that makes me not particularly trust my specific predictions), I expect the primary effect would be that the kind of inquiry Vanessa is pursuing highlights important confusions and mistaken assumptions in how we expect machine intelligence to work, which, when resolved, will make researchers better at navigating the very large space of potential alignment approaches. I would broadly put this in the category of “Deconfusion Research”.
Vanessa’s research resulted in various public blog posts, which can be found here.
Skyler Crossman ($22,000): Support for Astral Codex Ten Everywhere meetups
Especially since the collapse of FTX, I am quite interested in further diversifying the set of communities that are working on things I think are important to the future. AstralCodexTen and SlateStarCodex meetups seem among the best candidates for creating additional thriving communities with overlapping, but still substantially different norms.
I do feel quite confused at the moment about what a good relationship between adjacent communities like this and Effective Altruism-labeled funders like the Long-Term Future Fund should be. Many of these meetups do not aim to do as much good as possible, nor do they have much of an ambitious aim to affect the long-term future of humanity, and I think pressure in that direction would likely be more harmful than helpful: it would introduce various incentives for deception, and could prevent healthy local communities from forming by creating a misaligned relationship between the organizers (who would be paid by EA institutions to produce as much talent as possible for longtermist priorities) and the members (who are interested in learning cool things about rationality and the world and want to meet other people with similar interests).
Since this is a relatively small grant, I didn’t really resolve this confusion, and mostly decided to just go ahead with it. I also talked a bunch with Skyler about this, and I currently think we can figure out a good ongoing relationship for how best to distribute funding like this; I expect to think more about it in the coming weeks.
Grants evaluated by Asya Bergal
Any views expressed below are my personal views, and not the views of my employer, Open Philanthropy. (In particular, getting funding from the Long-Term Future Fund should not be read as an indication that the applicant has a greater chance of receiving funding from Open Philanthropy, and not receiving funding from the Long-Term Future Fund [or any risks and reservations noted in the public payout report] should not be read as an indication that the applicant has a smaller chance of receiving funding from Open Philanthropy.)
Alignment Research Center ($54,543): Support for a research & networking event for winners of the Eliciting Latent Knowledge contest
This grant funded a research & networking event for the winners of the Eliciting Latent Knowledge contest run in early 2022; the plan was mainly for the event to be participant-led, with participants sharing what they were working on and connecting with others, along with professional alignment researchers visiting to share their own work with participants.
I think the case for this grant is pretty straightforward: the winners of this contest are (presumably) selected for being unusually likely to be able to contribute to problems in AI alignment, and retreats, especially those involving interactions with professionals in the space, have a strong track record of getting people more involved with this work.
Daniel Filan ($23,544): Funding to produce 12 more episodes of AXRP, the AI X-risk Research Podcast.
We recommended a grant of $23,544 to pay Daniel Filan for his time making 12 additional episodes of the AI X-risk Research Podcast (AXRP), as well as the costs of hosting, editing, and transcription.
The reasoning behind this grant was similar to the reasoning behind my last grant to AXRP:
I’ve listened or read through several episodes of the podcast; I thought Daniel asked good questions and got researchers to talk about interesting parts of their work. I think having researchers talk about their work informally can provide value not provided by papers (and to a lesser extent, not provided by blog posts). In particular:
I’ve personally found that talks by researchers can help me understand their research better than reading their academic papers (e.g. Jared Kaplan’s talk about his scaling laws paper). This effect seems to have also held for at least one listener of Daniel’s podcast.
Informal conversations can expose motivations for the research and relative confidence level in conclusions better than published work.
Daniel also shared some survey data in his grant application about how people rated AXRP compared to other AI alignment resources, though I didn’t look at this closely when making the grant decision, as I already had a reasonably strong prior towards funding.
Grants evaluated by Caleb Parikh
Conjecture ($72,827): Funding for a 2-day workshop to connect alignment researchers from the US and UK with AI researchers and entrepreneurs from Japan.
Conjecture applied for funding to host a two-day AI safety workshop in Japan in collaboration with Araya (a Japanese AI company). They planned to invite around 40 people, with half of the attendees being AI researchers and half being alignment researchers from the US and UK. The Japanese researchers were generally senior: leading labs, holding postdoc positions in academia, or holding senior technical positions at tech companies.
To my knowledge, there has been very little AI safety outreach conducted amongst strong academic communities in Asia (e.g. in Japan, Singapore, South Korea …). On the current margin, I am excited about more outreach being done in these countries within ultra-high talent groups. The theory of change for the grant seemed fairly straightforward: encourage talented researchers who are currently working in some area of AI to work on AI safety, and foster collaborations between them and the existing alignment community.
Conjecture shared the invite list with me ahead of the event, and I felt good about the set of alignment researchers invited from the UK and US. I looked into the Japanese researchers briefly, but I found it harder to gauge the quality of invites given my lack of familiarity with the Japanese AI scene. I also trust Conjecture to execute operationally competently on events of this type, as they have assisted other AI safety organisations (such as SERI MATS) in the past.
On the other hand, I have had some concerns about Conjecture, and I felt confused about whether this conference gave Conjecture more influence in ways that I would feel concerned about, given the questionable integrity and judgement of their CEO; see this and this section of a critique of their organisation (though note that I don’t necessarily endorse the rest of the post). It was also unclear to me how counterfactual the grant was, and how this traded off against activities that I would be less excited to see Conjecture run. I think this is a general issue with funding projects at organisations with flexible funding: organisations are incentivised to present their most fundable projects (which they are also the most excited about), and then, in cases where the funding request is successful, move funding that they would have spent on these projects to other, lower-impact projects. Overall, I modelled making this grant as being about a quarter as cost-effective as it might have been without these considerations (though I don’t claim this discount factor to be particularly reliable).
Overall, I thought this grant was pretty interesting, and I think that the ex-ante case for it was pretty solid. I haven’t reviewed the outcomes of this grant yet, but I look forward to reviewing and potentially making more grants in this area.
Update: Conjecture kindly directed me towards this retrospective and have informed me that some Japanese attendees of their conference are thinking of creating an alignment org.
SERI MATS program ($316,000): 8-week scholars program to pair promising alignment researchers with renowned mentors. (Originally evaluated by Asya Bergal)
SERI MATS is a program that helps established AI safety researchers find mentees. The program has grown substantially since we first provided funding, and now supports 15 mentors, but at the time, the mentors were Alex Gray, Beth Barnes, Evan Hubinger, John Wentworth, Leo Gao, Mark Xu, and Stuart Armstrong. Mentors took part in the program in Berkeley in a shared office space.
When SERI MATS was founded, there were very few opportunities for junior researchers to try out doing alignment research. Many opportunities were informal mentorship positions, sometimes set up through cold emails or after connecting at conferences. The program has generally received many more qualified applicants than it has places for, and the vast majority of fellows report a positive experience of the program. I also believe the program has substantially increased the number of alignment research mentorship positions available.
I think that SERI MATS is performing a vital role in building the talent pipeline for alignment research. I am a bit confused about why more organisations don’t offer larger internship programs so that the mentors can run their programs ‘in-house’. My best guess is that MATS is much better than most organisations running small internship programs for the first time, particularly in supporting their fellows holistically (often providing accommodation and putting significant effort into the MATS fellows community). One downside of the program relative to an internship at an organisation is that there are fewer natural routes to enter a managed position, though many fellows have gone on to receive LTFF grants for independent projects or continued their mentorship under the same mentor.
Robert Long ($10,840): Travel funding for participants in a workshop on the science of consciousness and current and near-term AI systems
Please note this grant has been approved but at the time of writing it has not been paid out.
We funded Robert Long to run a workshop on the science of consciousness and current and near-term AI systems. Robert and his colleague Patrick Butlin began the project on consciousness in near-term AI systems during their time at FHI, where they both worked in the digital minds research group. Since January of this year, Rob has been continuing the project as a philosophy fellow at CAIS. There are surprisingly few people investigating the consciousness of near-term AI systems, which I find pretty worrying given the rapid pace of progress in ML. I think it’s plausible we end up creating many copies of AI systems and using them in ways that we’d consider immoral given enough reflection, in part due to ignorance about their preferences. The workshop aimed to produce a report applying current theories of consciousness (like integrated information theory and global workspace theory) to current ML systems.
I think that Rob is an excellent fit for this kind of work; he is one of the few people working in this area and has written quite a lot about AI consciousness on his blog. He has a PhD in philosophy from NYU, where he was advised by David Chalmers, and has experience running workshops (e.g. in 2020, he ran a workshop on philosophy and large language models with Amanda Askell).
Jeffrey Ladish ($98,000): 6-month stipend & operational expenses to start a cybersecurity & alignment risk assessment org
Please note this grant has been approved but at the time of writing it has not been paid out.
Jeffrey Ladish applied for funding to set up an organisation to do AI risk communications, with a focus on cybersecurity and alignment risks. His organisation, Palisade Research Inc., plans to conduct risk assessments and communicate those risks to the public, labs, and the government. The theory of change is that communicating catastrophic risks to the public and key decision makers could increase political support for slowing down AI and for other measures that might reduce AI risk. I am particularly excited about Jeffrey’s organisation building demonstrations of offensive AI cyber capabilities and other demos that help communicate current risks from advanced AI systems.
I am pretty excited about Jeffrey’s organisation. He has worked on information security at various organisations (including Anthropic), he seems well-networked amongst people working in think tanks and AI labs, and I like his public writing on AI risk. I am generally sceptical of people doing policy-related work without having first worked in lower-stakes positions in similar areas, but I thought that Jeffrey was orienting to the downsides very reasonably and doing sensible things, like developing plans with more experienced policy professionals.
Grants evaluated by Matthew Gray
Leap Laboratories ($195,000): One year of seed funding for a new AI interpretability research organisation.
Jessica Rumbelow applied for seed funding to set up an interpretability research organisation, which hopes to develop a model-agnostic interpretability engine.
I’m excited about this grant primarily based on the strength of the research she did with Matthew Watkins during SERI MATS, discovering anomalous tokens like SolidGoldMagikarp. (A rough sketch of the underlying idea follows below.)
I think trends in the AI development space suggest a need for model-agnostic methods.
More broadly, I think this showcases one of the primary benefits of interpretability research: it’s grounded in a way that makes it easy to verify and replicate.
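For readers curious about the anomalous-token work mentioned above, here is a loose reconstruction of the rough idea from public write-ups (not Leap’s actual code): surface candidate anomalous tokens by looking for embeddings that sit unusually close to the centroid of the embedding matrix, then probe how the model behaves on them.

```python
from transformers import GPT2Model, GPT2TokenizerFast

model = GPT2Model.from_pretrained("gpt2")
tok = GPT2TokenizerFast.from_pretrained("gpt2")

emb = model.wte.weight.detach()              # (vocab_size, d_model) embeddings
dists = (emb - emb.mean(dim=0)).norm(dim=1)  # distance of each token to centroid
for i in dists.argsort()[:20].tolist():      # 20 tokens nearest the centroid
    print(repr(tok.convert_ids_to_tokens(i)), round(dists[i].item(), 3))
```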
Daniel Kokotajlo ($10,000): Funding for a research retreat on a decision-theory/cause-prioritisation topic.
We funded a research retreat run by Daniel Kokotajlo on Evidential Cooperation in Large Worlds. I think research retreats like this are both quite productive and quite cheap; we only have to pay for travel and housing costs, and the attendees are filtered on intrinsic interest in the topic.
Grants evaluated by Thomas Larsen
Kaarel Hänni, Kay Kozaronek, Walter Laurito, and Georgios Kaklamanos ($167,480): Implementing and expanding on the research methods of the “Discovering Latent Knowledge” paper.
This is a team which formed during SERI MATS, applying for funding to continue their SERI MATS project researching ways to check for dishonesty in advanced AI systems.
My cruxes for this type of grant are:
(1) If done successfully, would this project help with alignment?
(2) How likely is this team to be successful?
My thoughts on (1):
This is meant to build upon Burns et al.’s Discovering Latent Knowledge paper (DLK), which finds a direction in activation space that is supposed to represent the ‘truth’ of a logical proposition.
I think that Eliciting Latent Knowledge (ELK) is an important subproblem of alignment, and I think it can be directly applied to combat deceptive alignment. My independent impression is that this specific direction towards solving ELK is not very useful for a full alignment solution, but that it may lead to slightly better monitoring. (In particular, I think even in a good outcome, this will only lead to an average-case solution to ELK, meaning that when we explicitly train against this detector, it will fail.) I expect that AGI projects will be in a position where it’s obvious that the systems they are building are capable and dangerous, and it will be apparent that instrumental incentives kick in for e.g. power-seeking and deception. I think that this technique might help us detect this danger, but given that we can’t train against it, it doesn’t let us actually fix the underlying problem. Thus, the lab will be in the difficult position of either continuing on or training against their detection system. I still think that incremental progress on detecting deception is good, because it can help push for a stop in capabilities growth before prematurely continuing to AGI.
My thoughts on (2):
They produced reasonable output during SERI MATS, including the beginning of a replication of the DLK paper. They weren’t that specific in their grant application, but they wrote a number of ideas for ways to extend the paper in the LW post. The two ideas that seem best to me are:
Connecting DLK to mechanistic interpretability. This seems hard, but maybe tinkering around in activation space can be helpful.
Creating a better confidence loss. In the original paper, only one statement was considered, so the loss came from the constraint that P(q) + P(not q) = 1. They propose evaluating two propositions p and q together, and getting more constraints from that. (A sketch of the original loss appears below.)
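For concreteness, here is a minimal sketch of the unsupervised probe loss from the original DLK paper as I understand it; the probe architecture, dimensions, and random “hidden states” below are placeholders:

```python
import torch
import torch.nn as nn

class Probe(nn.Module):
    """Linear probe mapping a hidden state to a 'probability of truth'."""
    def __init__(self, d_model):
        super().__init__()
        self.linear = nn.Linear(d_model, 1)

    def forward(self, h):
        return torch.sigmoid(self.linear(h)).squeeze(-1)

def dlk_loss(p_pos, p_neg):
    # p_pos/p_neg: probe outputs on hidden states for a statement phrased
    # as true and as false. Consistency enforces P(q) + P(not q) = 1;
    # confidence rules out the degenerate answer of always outputting 0.5.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

probe = Probe(d_model=512)
h_pos, h_neg = torch.randn(64, 512), torch.randn(64, 512)  # placeholder states
loss = dlk_loss(probe(h_pos), probe(h_neg))
loss.backward()
```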
These ideas don’t seem amazing, but they seem like reasonable things to try. I expect that the majority of the benefit will come from staring at the model internals and the results of the techniques and then iterating. I hope that this process will churn out more and better ideas.
One reservation I have is that none of the applicants have an established research track record, though they have published several papers:
- Kaarel’s Arxiv page
- Walter’s Google Scholar Profile
- Georgios’s ORCID
This team did get strong references from Colin Burns and John Wentworth, which make me a lot more excited about the project. All things considered, I’m excited about giving this team a chance to work on this project and seeing how they do. I’m also generally enthusiastic about teams trying their hand at alignment research.
Joseph Bloom ($50,000): Funding AI alignment research into circuits in decision transformers.
Joseph applied for independent research funding to continue his research into decision transformer interpretability. I’m happy about Joseph’s initial result, which found circuits in a decision transformer in a simple RL environment. I thought the applicant’s write-up was solid and gave me some updates on what cognitive machinery I expect to be induced by RL. In particular, I was excited about the preference directions in embedding space that they constructed. This seems like a useful initial step for retargeting the search, though more understanding of the circuits that are doing the optimization seems critical for this approach.
I think interpretability on RL models is pretty neglected and very relevant for safety.
According to a reference, the applicant was also in the top 3 ARENA participants, and was very motivated and agentic.
The counterfactual is that Joseph tries to get funding elsewhere and, if that fails, gets a research engineer job at an AI safety org (e.g. Redwood, Conjecture, Ought, etc.). I encouraged him to apply to the AI safety orgs, as I think that working at an org is generally more productive than independent research. These jobs are quite competitive, so it’s likely that Joseph won’t get hired by any of them, and in that case, it seems great to pay him to do independent alignment research.
Overall, I think that Joseph is a promising researcher, and is working on a useful direction, so I feel excited about supporting this.
Since receiving this grant, Joseph has received some more funding (here), and was mentioned in the Anthropic May Update.
Appendix: How we set grant and stipend amounts
(Our legal team requested that we include this section; it was written by Caleb Parikh.)
Over the last year, we have directed a significant portion of our grants toward supporting individuals in the field of AI safety research. When compared to much of the non-profit sector, some of our grants may seem large. However, I believe there are strong justifications for this approach.
Our grantees often have excellent earning potential
Our grantees often exhibit extraordinary earning potential due to their skills and qualifications. Many of them are excellent researchers (or have the potential to become one in a few years) and could easily take jobs in big tech or finance, and some could command high salaries (over $400k/year) while conducting similar research at AI labs. I expect that offering lower grants would push some grantees to take higher-earning options in private industry, creating less altruistic value. My impression is that our grants are not larger than comparable grants or salaries offered by many established AI safety organizations. In fact, I anticipate our grants are likely lower.
Grants have substantive downsides relative to working in an organisation
Grants, while helpful, do have some drawbacks compared to conventional employment. We do not provide additional benefits often found in organizations, such as health insurance, office spaces, or operations support, and our stipends often offer less financial security than full-time employment. Often, a portion of a grant is designed to support grantees’ operational and living expenses while they pursue their research projects.
Generally, we expect our grantees to work full-time on their projects, with similar intensity to the work they’d do at other organizations within EA and AI safety, and we structure our grants to account for this amount of work. There are, of course, benefits, such as our grantees having more flexibility than they would in many organizations.
How we decide on personal stipend size
The fund operates as a collection of fund managers who sometimes have differing views on how much to fund a grantee for.
Our general process is:
The fund manager assigned to a grant reviews the budget provided by the grantee and makes adjustments based on their understanding of the grant, the market rate for similar work and other factors.
The grant size is then reviewed by the fund chair (Asya Bergal) and the director of EA Funds (Caleb Parikh).
One heuristic we commonly use (especially for new, unproven grantees) is to offer roughly 70% of what we anticipate the grantee would earn in an industry role. We want to compensate people fairly and allow them to transition to impactful work without making huge sacrifices, while conserving our funding and discouraging grifters. A relatively common procedure for fund managers to use to decide how much to fund a grantee (assuming a fund manager has already decided they’re overall worth funding) is to do the following (a toy sketch follows the list):
Calculate what we expect the grantee would earn for similar work in an industry role (in the location they’re planning on performing the grant activity).
Look at the amount of funding the applicant has requested, and see if that amount differs significantly from 70% of their industry salary.
If it doesn’t differ significantly, make the grant with the requested number.
If it does differ significantly, consider adjusting the grant upwards or downwards, taking into account other factors that would affect what an appropriate funding ask would be, e.g. their pre-existing track record. (We’re more likely to adjust a grant downwards if we think the requested amount is too high than upwards if we think the requested amount is too low.)
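To illustrate the procedure, here is a toy sketch; the 15% tolerance and the automatic fall-back to the 70% anchor are my simplifications of what is, in practice, a judgment call reviewed by the fund chair and the director of EA Funds:

```python
def recommended_grant(industry_salary: float, requested: float,
                      tolerance: float = 0.15) -> float:
    """Toy version of the 70%-of-industry-salary heuristic described above."""
    anchor = 0.7 * industry_salary
    if abs(requested - anchor) <= tolerance * anchor:
        return requested  # the ask is close to the anchor: grant as requested
    return anchor         # otherwise adjust toward the anchor (in practice,
                          # weighing track record and other factors)

print(recommended_grant(150_000, 100_000))  # 100000 (within ~15% of $105k)
print(recommended_grant(150_000, 160_000))  # 105000.0 (flagged, adjusted down)
```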
Appendix: Eligibility criteria for LTFF grants
(Our legal team requested that we include this section; it was written by Caleb Parikh.)
Career Stage: Our interest lies in assisting grantees who are at the beginning of their careers, are contemplating a career shift towards an area of higher impact, or have accumulated several years of experience in their respective fields.
Demonstrated Skills: We require that prospective grantees exhibit evidence of possessing the skills necessary for the type of work or study they plan to undertake. This evidence could come from previous experiences, credentials, or a particularly remarkable application.
Generally, our grants fulfil one of the following additional criteria:
High-Impact Projects: The central aim of the Long-Term Future Fund is to improve humanity’s odds of a long and flourishing future. We assess proposed projects based on their potential to contribute to this goal. However, it is not mandatory for grantees to share this specific objective or to be entirely focused on improving the long-term future.
Empowering people pursuing impactful work: Grants related to career support (e.g. travel grants for conferences, scholarships for online courses, or funding to allow time for skill development) can enable grantees to increase their positive impact over the course of their careers. Grantees should demonstrate a strong interest in a priority area for the long-term future, such as biosecurity or mitigating risks from advanced AI. This could be evidenced by past experiences, credentials, or an application that shows familiarity with the field they intend to study.
Appendix: Special note on upskilling grants
(Our legal team requested that we include this section.)
One of LTFF’s overall charitable purposes is to encourage qualified and thoughtful individuals to think about and find solutions for global catastrophic risks, such as advanced artificial intelligence. We do this by funding such individuals to research issues like AI alignment so that they become more knowledgeable in and/or potentially change their career path to fully invest in these issues.