Long-Term Future Fund: March 2024 Payout recommendations

Introduction

This payout report covers the Long-Term Future Fund’s grantmaking from May 1 2023 to March 31 2024 (11 months). It follows our previous April 2023 payout report.

  • Total funding recommended: $6,290,550

  • Total funding paid out: $5,363,105

  • Number of grants paid out: 141

  • Acceptance rate (excluding desk rejections): 159/672 = 23.7%

  • Acceptance rate (including desk rejections): 159/825 = 19.3%

  • Report authors: Linchuan Zhang (primary author), Caleb Parikh (fund chair), Oliver Habryka, Lawrence Chan, Clara Collier, Daniel Eth, Lauro Langosco, Thomas Larsen, Eli Lifland
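As a quick check on the acceptance rates above, here is a minimal sketch that recomputes them, reading the figures as 159 accepted applications out of 672 (excluding desk rejections) and 825 (including them). The variable and function names are illustrative, and the implied desk-rejection count assumes the two denominators differ only by desk rejections.

```python
# Figures taken from the summary bullets; names are illustrative only.
accepted = 159
apps_excluding_desk_rejections = 672
apps_including_desk_rejections = 825


def acceptance_rate(accepted_count: int, total_count: int) -> float:
    """Return the acceptance rate as a percentage, rounded to one decimal."""
    return round(100 * accepted_count / total_count, 1)


rate_excluding = acceptance_rate(accepted, apps_excluding_desk_rejections)
rate_including = acceptance_rate(accepted, apps_including_desk_rejections)

# Assumption: the gap between the two denominators is the desk rejections.
implied_desk_rejections = (
    apps_including_desk_rejections - apps_excluding_desk_rejections
)

print(f"Excluding desk rejections: {rate_excluding}%")
print(f"Including desk rejections: {rate_including}%")
print(f"Implied desk rejections: {implied_desk_rejections}")
```

Both rates match the rounded percentages reported in the summary.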

25 of our grantees, who received a total of $790,251, requested that our public reports for their grants be anonymized (the table below includes those grants). 13 grantees, who received a total of $529,819, requested that we not include public reports for their grants. You can read our policy on public reporting here.

We referred at least 2 grants to other funders for evaluation.

Highlighted Grants

(The following grant writeups were written by me, Linch Zhang. They were reviewed by the primary investigators of each grant.)

Below, we highlight some grants that we thought were interesting and that cover a relatively wide scope of LTFF’s activities. We hope that reading the highlighted grants can help donors make more informed decisions about whether to donate to LTFF.[1]

Gabriel Mukobi ($40,680) - 9-month university tuition support for technical AI safety research focused on empowering AI governance interventions

The Long-Term Future Fund provided a $40,680 grant to Gabriel Mukobi from September 2023 to June 2024, originally for 9 months of university tuition support. The grant enabled Gabe to pursue his master’s program in Computer Science at Stanford, with a focus on technical AI governance.

Several factors favored funding Gabe, including his strong academic background (4.0 GPA in Stanford CS undergrad with 6 graduate-level courses), experience in difficult technical AI alignment internships (e.g., at the Krueger lab), and leadership skills demonstrated by starting and leading the Stanford AI alignment group. However, some fund managers were skeptical about the specific proposed technical research directions, although this was not considered critical for a skill-building and career-development grant. The fund managers also had some uncertainty about the overall value of funding Master’s degrees.

Ultimately, the fund managers compared Gabe to marginal MATS graduates and concluded that funding him was favorable. They believed Gabe was better at independently generating strategic directions and being self-motivated for his work, compared to the median MATS graduate. They also considered the downside risks and personal costs of being a Master’s student to be lower than those of independent research, as academia tends to provide more social support and mental health safeguards, especially for Master’s degrees (compared to PhDs). Additionally, Gabe’s familiarity with Stanford from his undergraduate studies was seen as beneficial on that axis. The fund managers also recognized the value of a Master’s degree credential for several potential career paths, such as pursuing a PhD or working in policy. However, a caveat is that Gabe might have less direct mentorship relevant to alignment compared to MATS extension grantees.

Outcomes: In a recent progress report, Gabe noted that the grant allowed him to dedicate more time to schoolwork and research instead of taking on part-time jobs. He produced several new publications that received favorable media coverage and was accepted to 4 out of 6 PhD programs he applied to. The grant also allowed him to graduate in March instead of June. Due to his early graduation, Gabe will not need to use the entire granted amount, saving us money.

Joshua Clymer ($1,500) - Compute funds for a research paper introducing an instruction-following generalization benchmark

The Long-Term Future Fund provided a $1,500 grant to Joshua Clymer for compute funds to rent A100 GPUs for experiments on instruction-following generalization. Although Clymer had previously worked on AI safety field-building and communications, this was his first technical AI safety project.

The fund was interested in whether models trained within a specific distribution can generalize “correctly” out of distribution, such as on benchmarks like TruthfulQA, instead of learning the idiosyncrasies of human evaluators. More broadly, the fund wanted to ensure that models can faithfully follow human instructions even when operating in vastly different contexts from their training data. The grantee believed that training models on general instruction-following could be a plausible approach to aligning AI systems with insufficiently specified rewards.

The proposal employed a method called “sandwiching,” where the model is trained on data from a non-expert who cannot evaluate the model’s performance, but the later evaluation is conducted by an expert. We were excited about the grant as it was cheap and tractable, while addressing an obvious difficulty in AI alignment, making it an attractive funding opportunity.

Outcomes: The paper and benchmark have been published, with the authors finding that reward models do not inherently learn to evaluate ‘instruction-following’ and instead favor personas that resemble internet text. In other words, models trained to follow instructions on easy tasks do not naturally follow instructions on hard tasks, even when we are fairly confident the model has the knowledge to answer the questions correctly. You can also check out Joshua’s thoughts on the project on the Alignment Forum.

Logan Smith ($40,000) - 6-month stipend to create language model (LM) tools to aid alignment research through feedback and content generation

The Long-Term Future Fund has been supporting Logan Smith for 2 years, providing a stipend of $40,000 every 6 months. Recently, Logan and their team published exciting mechanistic interpretability results where they used Sparse Autoencoders (SAEs) to find highly interpretable directions in language models. The fund has also supported Hoagy Cunningham, the first author of the linked paper, for the same work.

This work was quite similar to research later published by Anthropic, which generated significant excitement in the AI safety community (and was a precursor to the recent excitement around Golden Gate Claude). You may find it helpful to read this post comparing the two papers. I believe some of the independent researchers funded by the Long-Term Future Fund to work on SAEs were subsequently hired by Anthropic to continue their interpretability work.

Although I (Linch) am not an expert in the field, my impression is that the work on sparse autoencoders, both from independent researchers and Anthropic, represents some of the most meaningful advances in AI safety and interpretability in 2023.

Note: When an earlier private version of these notes was circulated, a senior figure in technical AI safety strongly contested my description. They believe that the Anthropic SAE work is much more valuable than the independent SAE work, as both were published around the same time, but the Anthropic work provides sufficient evidence to be worth extending by other researchers, whereas the independent research was not dispositive. I find these arguments plausible but not overwhelmingly convincing. Unfortunately, I lack the technical expertise to be well-equipped to form an accurate independent assessment on this matter.

Alignment Ecosystem Development ($99,330) - 1-year stipend for 1.25 FTEs to build and maintain digital infrastructure for the AI safety ecosystem, plus AI Safety-related domains and other expenses

The Long-Term Future Fund provided a $99,330 grant to Alignment Ecosystem Development (AED), an AI safety field-building nonprofit, through their fiscal sponsor, Ashgro Inc. The grant covered a 1-year stipend for 1.25 full-time equivalents (FTEs) to build and maintain digital infrastructure for the AI safety ecosystem, as well as expenses related to AI safety domains and other assorted costs. AED has built, taken ownership of, closely partnered with, and/​or maintained approximately 15 different projects, including the AI Safety Map (aisafety.world), AI Safety Info (aisafety.info), and AI Safety Quest (aisafety.quest), with the overarching objective of growing and improving the AI safety ecosystem. Bryce Robertson, a volunteer, currently works on the ecosystem full-time and would like to transition into a paid role. The grant allocated $66,000 for stipends and the remaining funds for various software and other expenses.

Fund managers had differing opinions on whether this grant represented a good use of marginal funds. The primary investigator strongly supported the grant, while two other fund managers who examined it in detail were not convinced that it met the fund’s high funding bar. Habryka, one of the fund managers, expressed excitement for the grant primarily because it continued the work that AED leadership had historically done in the AI safety space. Although not particularly enthusiastic about any individual project in the bundle, Habryka had heard of many people benefiting from AED’s support in various ways and held a prior belief that good infrastructure work in this space often involves providing a significant amount of illegible and diffuse support. For example, Habryka noted that plex from AED had been quite useful in moderating and assisting with Rob Miles’s Discord, which Habryka considered a valuable piece of community infrastructure, contributing positively to the impact of Rob Miles’s videos in informing people about AI safety.

Arguments in favor of the grant included the websites being useful pieces of community infrastructure that can help people orient towards AI safety and the historically surprising popularity of AED’s work. However, some fund managers raised concerns that AED might be spreading themselves too thin by working on many different projects rather than focusing on just one. They also questioned the quality of the outputs and worried that if newcomers’ first exposure to AI safety was through AED’s work, they might become unimpressed with the field as a result.

In accordance with the fund’s procedure, the project was put to a vote, and the grant ultimately passed the funding bar. At a more abstract level, the fund’s general policy or heuristic has been to lean towards funding when one fund manager is very excited about a grant and other fund managers are more neutral. The underlying implicit model here is that individual excitement is more likely to identify grants with the potential for significant impact or “hits” in a hits-based giving framework, compared to consensus decisions where everybody is happy about a grant, but no one is actively thrilled. The underlying philosophy is explicated further in an earlier comment I wrote in January 2021, about a year before I joined LTFF.

Lisa Thiergart and Monte MacDiarmid ($40,000) - Conference publication of using activation addition for interpretability and steering of language models

The Long-Term Future Fund provided a $40,000 grant, administered by FAR AI, to Lisa Thiergart and Monte MacDiarmid for converting their research on activation addition for interpretability and steering of language models into a format suitable for academic audiences. At the time of funding, the key points of the research had already been completed and written up on LessWrong, including “Interpretability of a maze-solving network” and “Steering GPT2-XL using activation addition.” The grant enabled the team to work with external writers to translate their work for academic publication.

The fund managers identified several points in favor of this grant. Firstly, the research work itself was of high quality. Secondly, having the work accepted and critiqued by academic audiences seems valuable for both increasing the rigor of the alignment subfield and encouraging more mainstream machine learning researchers to work on alignment-related issues. Lastly, working with external authors can be a valuable experiment in creating a reliable pipeline for converting high-quality research (blog) posts into academic papers, without requiring significant time investment from senior alignment researchers, especially ones without significant prior academic ML experience. This approach could potentially provide many of the benefits of mainstream academic publication, at a lower cost than asking senior alignment researchers to conform to academic norms.

However, the grant might come with nontrivial downside risks or (additional, non-monetary) costs. Work of this nature might have significant negative capabilities externalities for the world, as improved model steering could increase commercial viability and accelerate the development of more capable AI systems. Of course, this concern also applies to other commercially relevant alignment efforts, such as RLHF (Reinforcement Learning from Human Feedback) and Constitutional AI. I and other LTFF fund managers have frequently found it difficult to reason about whether to support positive alignment efforts that nonetheless have nontrivial capabilities externalities, and getting this decision right is still a work in progress. (My own guess is that we made the right call in this case.)

Outcomes: The two preprints, “Activation Addition: Steering Language Models Without Optimization” and “Understanding and Controlling a Maze-Solving Policy Network,” are now available on arXiv. However, as of the time of writing this payout report, I think they have not yet been formally published in a conference or journal.

Robert Miles ($121,575) - 1-year stipend + contractor and other expenses to continue his communications and outreach projects

The Long-Term Future Fund provided a grant of $121,575 to Robert Miles, administered through his fiscal sponsor, Ashgro Inc. The grant includes a $71,000 stipend for Rob and $50,000 for contractors and other expenses to support his work in producing YouTube videos, appearing on podcasts, helping researchers communicate their work, growing an online community to help newcomers get more involved, and building a large online FAQ at aisafety.info.

This grant continues the support we’ve given Rob in the past; we were his first significant funder and have been consistently happy with his progress. We believe Rob’s historical impact with his outreach projects has been surprisingly large. His videos are of very high quality, with high overall production value and message fidelity. My understanding is that many technical researchers report satisfaction with the way their ideas are presented. Additionally, his videos are popular, with a typical Rob Miles video garnering between 100,000 and 200,000 views. While this is significantly lower than top technical YouTube channels, it has gained more traction than almost any other technical AI safety outreach to date.

The grant is also fairly cheap relative to the historical impact. Rob requested a $71,000 stipend, which is substantially lower than both his counterfactual earnings and the pay of other individuals who are competent in both technical AI safety and communications.

However, we think Rob’s non-YouTube projects are less successful or impactful than his main YouTube channel. This is hard to definitively assess since much of it is based on private or hard-to-aggregate information, such as Rob privately advising researchers on how to communicate their work. Nonetheless, we (I) think it’s unlikely that the non-YouTube work is as valuable as the videos. That said, the primary investigator of the grant believes Rob’s non-YouTube work is still significantly more valuable than our marginal grants and thus worth funding.

Another point against the grant is that Rob’s YouTube channel productivity has been rather low, especially recently, with no new videos produced in the last year. However, he recently released a very long and high-quality video that, in addition to being a useful summary about many of the important events in AI (safety) over the last year, also goes into some detail about why he hasn’t produced as much content recently.

Anonymous ($17,000) - Top-up stipend for independent research to evaluate the security of new biotechnology advances, outline vulnerabilities, propose solutions, and gain buy-in from relevant stakeholders

I was the primary evaluator of this grant, which provided a top-up stipend to an anonymous grantee for independent research on the security of new biotechnology advances. Originally, we evaluated the grantee for a 1-year stipend, but during our evaluation period, they secured external funding for most of the grant period. The grantee asked for the difference as a top-up stipend, which we provided.

To estimate the impact of this grant, I thought the work was quite valuable but had to defer to our advisors more than I would have liked. I asked an academic in the field to provide frank feedback on the grant and double-checked their reasoning with an external, non-academic biosecurity researcher. Both were positive about the angle of attack and the applicant’s fit for the role.

I believe this is an obviously important area to investigate within biosecurity (“big if true”), and I’m glad someone is looking into it. The applicant seemed fairly competent and impressive by conventional metrics, and advisors noted their unusually strong security mindset and background. We thought they could do a good job discreetly, which is hard to replicate with other funding arrangements (e.g., publish-or-perish in academia). We also thought the applicant was unusually suited for this work specifically, as the combination of biosecurity understanding and security mindset is rare. The applicant also seemed well-connected; a lack of connections would otherwise be a major concern with independent work in threat analysis. Under some reasonable assumptions, I think this might have been LTFF’s highest impact biosecurity grant last year, or at least in the top 3.

However, I thought the downside risks of doing this work well were nontrivial, and I also had to defer more than I’d like to our advisors that it’s worth pushing ahead on. I don’t love giving anonymous grants and have updated slightly against them over the last year. I think the original intent of anonymous/​private grants (offering privacy for personally sensitive concerns or, less frequently, giving people an option to confidentially work on info-hazards) has been somewhat eroded. Plausibly, some of our grantees ask for anonymity just because having public funding sources can be annoying, which I think is understandable individually but creates a worse epistemic environment overall. However, please note that this is a personal update, and other fund managers may hold different views on anonymous grants (in an earlier draft, one fund manager specifically noted their disagreement with my update).

Unfortunately, both the grantee and I think it’s better not to share every detail of this grant. Nonetheless, my hope is that sharing some details about our anonymous grants is helpful for donors, so that the ~10% of our grants that are anonymous do not look completely like a black box to small donors and other community members.

Other updates

  • Our funding bar has been somewhat variable in the last year. It first went up at the end of 2022 in response to a decrease in the overall funding available to long-term future-focused projects, and then increased again a few times in response to liquidity issues within the Long-Term Future Fund itself, peaking around September 2023.

  • There was a 2:1 donation matching from Open Phil. That matching was completely filled, thanks to our generous donors.

  • Thanks to the increase in funding, we were able to decrease our funding bar since our peak. We’re currently at around the early 2023 funding bar. I expect this bar to be relatively stable in the coming months compared to 2023, but I generally expect our funding bar to vary more over time and to depend more on individual donations than it has historically (in 2022 and earlier).

  • Thank you to everybody who donated to us. Your contributions are key in supporting projects that we think are very valuable for the world.

  • Longer-term, I’d like to seek out institutional funding and larger sources of individual funding, to help stabilize the fund and build out a longer runway.

  • We’ve distanced ourselves from Open Phil since Aug 2023, to help increase diversity of perspectives and increase funder and grantee independence.

  • We are in the process of spinning out of Effective Ventures, our fiscal sponsor.

  • I (Linch Zhang) have joined EA Funds full-time, with a focus on LTFF.

Other writings

Compared to past years, LTFF and its fund managers have become substantially more active in writing and public communications. Here are relevant writings since our last payout report period:

  • What Does a Marginal Grant at LTFF Look Like? Funding Priorities and Grantmaking Thresholds at the Long-Term Future Fund: A detailed discussion of “grantmaking thresholds” for marginal grants at LTFF. Essentially, given that we have limited resources and many good projects to fund, how do we choose which grants to make per $X we have? The post covers different projects we might want to fund at different thresholds ($X per 6 months).

  • Select examples of adverse selection in longtermist grantmaking: I reviewed my past experiences with “adverse selection” as a grantmaker, that is, situations where we choose to not fund a project that initially looked good, often due to surprising and private information.

  • LTFF and EAIF are unusually funding-constrained right now: Our fundraising post in September. We are in much less of a funding crunch now than we were in September, but the post may still be helpful for you to decide whether LTFF (or EAIF) are good donation targets relative to your next best alternative.

  • The Long-Term Future Fund is looking for a full-time fund chair: Our hiring post for LTFF fund chair. Mostly a historical curiosity now that we’re no longer looking at new applications, but community members may be interested in reading it to understand the responsibilities and day-to-day of work at LTFF.

  • Hypothetical grants that the Long-Term Future Fund narrowly rejected: A continuation of the marginal grants post, in that it’s a more narrow and tightly scoped list of hypothetical grants that are very close to our current funding bar. If you’re considering whether to fund LTFF or not, I think this post may be the best one for understanding what your marginal dollars would most likely end up funding.

    • Please note that while the post was framed as hypothetical grants the LTFF narrowly rejected, our funding bar has decreased some since the publication of that post. So grants just barely below our past bar should be just barely above our current bar. Going forwards, in the next few months, donors should think of that post as “hypothetical grants that LTFF will narrowly accept.”

  • LessWrong comments discussion of whether longtermism or LTFF work has been net negative so far: A LW comments discussion of whether we should be worried about donating to LTFF, asking for assurances that LTFF will not fund net-negative work. Multiple fund managers offered their individual perspectives. Tl;dr: We are unfortunately unable to provide strong assurances. :/​ Doing robustly good work in a highly speculative domain is very difficult, and fund managers are not confident that we can always be sure our work is good.

    • I want to quickly note that the fund managers who commented (myself included) are not necessarily representative of LTFF, and my guess is we’re overall more negative/uncertain than the fund’s median.

  • Lawrence Chan: What I would do if I wasn’t at ARC Evals: Lawrence Chan, a part-time guest fund manager at LTFF, discusses what he’d likely do if he wasn’t at ARC Evals (his day job). This might be relevant to community members considering career pivots in or into AI Safety/​x-risk reduction.

    • Lawrence, if you’re reading this, I hope you’d consider joining LTFF full-time! 😛

  • Caleb Parikh on AI consciousness: Caleb Parikh, Project Lead of EA Funds and interim LTFF fund chair, discusses why he thinks the broader community is underinvesting in research projects working on AI consciousness.

  • LTFF to be more stringent in evaluating mechanistic interpretability grants. We think funding and hiring in mechanistic interpretability is now less neglected outside of LTFF, thanks in large part to recent advances in mechanistic interpretability (including from LTFF grantees!). So we’re increasing our bar for mechanistic interpretability grants, in part to help encourage technical AI safety work on other agendas.

    • Please note that this is a mild/moderate technical change. We’re still broadly optimistic about mech interp work done and many of us will continue to encourage more people to work on it.

Appendix

Other Grants We Made During This Time Period

(The “grant purpose” of each grant is usually a short description submitted by the grantee at application time, with occasional adjustments from fund managers if the scope has changed, or for clarity.)

GranteeAmountGrant PurposeAward Date
Samuel Brown$82,29812-month stipend to research AI alignment, with a focus on technical approaches to Value Lock-in and minimal Paternalism. Findings will be published either as part of an academic collaboration or independently.May 2023
Jonathan Ng$32,6506-month stipend for Jonathan Ng to continue working on SERI MATS project expanding the “Discovering Latent Knowledge” paperMay 2023
Simon Lermen$13,0003-month stipend + compute expenses to study and publish on shutdown evasion in LLMs and to use LLMs as tools for alignmentMay 2023
Alex Infanger$19,2003-month stipend for upskilling in PyTorch and AI safety research and also running a virtual AGISF cohort for Successif.orgMay 2023
Anonymous$10,000Budget for an EA group to fund outstanding local or high-context grant opportunities which are urgent and/​or low cost but high EV.May 2023
Avery Griffin$32,0004-month stipend for two people to find formalisms for modularity in neural networksMay 2023
Lucius Bushnaq$32,0024-month stipend for two people to find formalisms for modularity in neural networksMay 2023
Anonymous$1,020Funding for productivity/​research related expensesJune 2023
Roman Leventov$6,6696-month stipend to continue developing as an AI safety researcher. Goals include writing a review paper about goal misgeneralisation from the perspective of Active Inference and pursuing collaborative projects on collective decision-making systems.June 2023
Artem Karpov$1,7396-month support for self study and development in ML and AI Safety. Goals include producing an academic paper while working on the “Inducing Human-Like Biases in Moral Reasoning LMs” project run by AI Safety Camp.June 2023
Wikiciv Foundation$16,000Funding for labor to expand content on Wikiciv.org, a wiki for rebuilding civilizational technology after a catastrophe. Project: writing instructions for recreating one critical technology in a post-disaster scenario.June 2023
Anonymous$5,000Funds for research expensesJune 2023
Benjamin Sturgeon$12,0004 month stipend for independent research and AIS field building in South Africa, including working with the AI Safety Hub to coordinate reading groups, research projects and hackathons to empower people to start working on research.June 2023
Michael Parker$40,000Project supportJune 2023
Anonymous$14,0004-month career transition grant to upskill in organization-building and community-building.June 2023
Anonymous$63,2426-month grant to pivot into AI alignment research. Goals include upskilling in linear algebra and probability theory, independent mechanistic interpretability research and working on an AI Safety Camp research project.June 2023
Abhijit Narayan$1,0004-month scholarship to support upskilling in technical AI alignment following either of the following programmes:June 2023
Amritanshu Prasad$7,9716-month Scholarship to support Amritanshu Prasad’s upskilling in technical AI alignment. Amritanshu will study the AGI Safety Fundamentals Alignment Curriculum and create an accessible and informative summary of the curriculum.June 2023
Patricio Vercesi$500Laptop stipend to study ML at university and AIS independently. Short-term goals include writing up and sharing thoughts on AIS strategies and current ecosystem. Long-term goals include pursuing employment /​ funding as an independent AIS researcher.June 2023
Bart Bussmann$9,0572-month stipend to test suitability for technical AI alignment research and identify a research direction. Project output includes writing up a reflection on this processJune 2023
Ben Stewart$3,1383-month (+buffer to prepare project reports) part-time stipend to upskill in biosecurity research and prioritization. Courses include Infectious Disease Modelling by Imperial College London. Projects include UChicago’s Market Shaping Accelerator challenge.June 2023
AI Safety Support Ltd.$50,0006-month stipend plus expenses for Jay Bailey to work on Joseph Bloom’s Decision Transformer interpretability project.June 2023
Guillaume Corlouer$6,800Funding research on understanding search in transformers at the AI Safety Camp. The AIS camp project is about figuring out if transformers are able to learn a search algorithm and whether we can steer such learned search algorithms to different goals.June 2023
Anamarija Kozina$9,8901 year of funding to help cover expenses of transferring to a MSc at TU Munich 2023/​2024, studying Mathematics with a minor in Informatics.July 2023
Anonymous$59,223A grant to support the growth of a YouTube channel that discusses key developments in AI, targeted to the general public.July 2023
Shoshannah Tekofsky$90,0006-month funding for 3 experiments (Enhance Selection, Coordinate Crowds & Model Interhuman Alignment) and write-ups that explore the potential of Collective Human Intelligence to accelerate progress on the alignment problem.July 2023
Hunar Batra$66,046Grant to cover 1 year of tuition fees and living expenses to pursue PhD CS at the University of Oxford. Accelerate alignment research by building Alignment Research tools using expert iteration based amplification from Human-AI collaboration.July 2023
Viktor Rehnberg$19,248This grant will support Viktor Rehnberg’s project identifying key steps in reducing risks from learned optimisation and working towards solutions that seem the most important. Viktor will start working on this project as part of the SERI MATS program.July 2023
Anonymous$4,309Stipend for 4-month placement at the MIT Computer Science and Artificial Intelligence Laboratory. The goal is to produce a paper for ICML, which will be accessible to researchers worldwide and that will aid in the understanding of the geometry of AI thought processes.July 2023
Palisade Research Inc$98,000This grant is funding a 6-month stipend for Jeffrey Ladish and operational expenses for his new organization Palisade Research Inc. Palisade will research offensive AI capabilities to better understand and communicate the threats posed by agentic AI systems. During this initial period, Palisade plans to create 2-3 demos that could be presented to policymakers in time for the Schumer AI bill effort, finish setting up its infrastructure and apply for 501c3 status, and hire a part time research assistant and executive assistant.July 2023
Sviatoslav Chalnev$35,000This grant is funding for a 6-month stipend for Sviatoslav Chalnev to work on independent interpretability research, specifically mechanistic interpretability and open-source tooling for interpretability research.July 2023
Carson Ezell$8,673This grant provides a 3-month part-time stipend for Carson Ezell to conduct 2 research projects related to AI governance and strategy. The path to impact for both of these projects will involve engaging with individuals at AI labs to ensure that the problems are currently unsolved and then develop proposals which members of governance teams at labs can point to as reasonable proposals and which might be implemented.July 2023
Ossian Labs$55,260This grant provides funding for a project exploring debate as a tool that can verify the output of agents which have more domain knowledge than their human counterparts. It covers the first 2 stages of this project, investigating debate as a truth-seeking protocol (WP1) and debate as a method for annotation speedups (WP2).August 2023
Anonymous$115,000This grant provides funding towards a CS Master’s program at NYU, where the grantee will pursue technical AI safety research.August 2023
Cindy Wu$5,004This grant provides a stipend for Cindy Wu to spend 4 months working on AI safety research. During this period, Cindy will extend her Master’s thesis on understanding mechanistically causal knowledge representation in NNs for robust distillation. Other activities include but are not limited to taking on a research team lead position with AI Safety Hub Summer Labs and working with EleutherAI on LLM interpretability. This stipend is given conditional on Cindy spending at least one month working on unpaid projects.August 2023
Anonymous$48,451This grant provides a 6-month stipend to continue SERI MATS research on abstract out-of-context reasoning in large language models as a precursor for treacherous turns.August 2023
Bilal Chughtai$48,084This grant is funding a 6-month stipend for Bilal Chughtai to upskill and work on a mechanistic interpretability project investigating attention head superposition in LLMs. Bilal will be working with Alan Cooney and the project will be supervised by Neel Nanda. The goal of this project is to publish a conference paper.August 2023
Bryce Meyer$50,000This grant is funding a stipend for Bryce Meyer to build and enhance open-source mechanistic interpretability tooling for AI safety researchers.August 2023
Nathaniel Monson$70,000Support to spend 6 months studying to transition to AI alignment research, with a focus on methods for mechanistic interpretability. Nathan will be advised by Professor Goldstein, Director of the UMD center for machine learning, and goals include solving one of Neel Nanda’s 200 Concrete Open Problems in Mechanistic Interpretability and developing a proposal for an interpretability paper.August 2023
Anonymous$8,000This grant provides a 2-month stipend for work on a hardware-related AI governance project. The project will take a detailed look at the current export restrictions on AI chips and seek to understand how they could be improved, with the goal of restricting uncooperative parties from creating cutting-edge models.August 2023
Kristy Loke$10,000This grant is funding a 2-month stipend for Kristy Loke to complete her GovAI project examining the state of AI development.September 2023
Morgan Simpson$32,653This grant is funding a 6-month stipend for Morgan Simpson to produce 2 AI governance white papers and a series of case studies, with additional research costs.September 2023
Steve French$5,000This grant provides one year of technical assistance and capacity building for the ACX meetup in Atlanta, Georgia. The main aim is to increase attendance at sessions and encourage rational thinking.September 2023
Kristy Loke$50,000This grant provides 6 months of funding for Kristy Loke’s research with Fynn Heide on AI development and AI safety engagement.September 2023
Anonymous$7,7003-month unpaid AI Governance internship to build career capital at the Millennium Project, a global futurist think tankSeptember 2023
Codruta Lugoj$7,537This grant provides a 4-month stipend for capacity building in AI alignment. The goals are for the grantee to gain research engineering skills implementing experiments in alignment (which will be shared on Github), understand current alignment agendas, and transition to doing the winter SERI MATS program or becoming an alignment researcher.September 2023
Shashwat Goel$12,000This grant is funding a 3-month stipend for Shashwat Goel’s SERI MATS research with the Center for AI Safety on knowledge removal as a convergent safety technique that can help mitigate diverse risk scenarios. The research results will produce an academic paper to engage and direct the efforts of the wider ML community.September 2023
Ross Nordby$40,000This grant provides funding for 1 year, part-time independent AI safety research focused on interpretability. Research will be published online, with longer-term goals of publishing in an academic journal or similar, if appropriate.September 2023
Charles Whittaker$5,293Funds to support travel for research with the Nucleic Acid Observatory relating to biosecurity and GCBRsOctober 2023
Anonymous$53,400Support for two researchers to work on a paper about (dis)empowerment in relation to Artificial General Intelligence. The paper will aim for top ML conferences such as NeurIPS and will formalize a notion of (dis)empowerment in order to train and evaluate models that do not reduce human agency.October 2023
Alexander Mann$36,000Designing a plan for a longtermist industrial conglomerate aligned via a reputation based economyOctober 2023
Francis Rhys Ward$8,285Funding for (academic/​technical) AI safety community events in LondonOctober 2023
Logan Smith$40,000Support for further pursuing sparse autoencoders for automatic feature findingOctober 2023
Bilal Chughtai$42,460Support to work on mechanistic interpretability research with mentorship from Prof David BauOctober 2023
Thomas Kwa$14,880Grant for past study of Goodhart effects on heavy-tailed distributions.October 2023
Nicky Pochinkov$71,695This grant is funding a 12-month stipend for Nicky Pochinkov’s independent AI Safety research. Nicky is exploring research on LLM Modularity/​Separability & Modelling Goals and Long-term Behavior. Results from Nicky’s research will be published to LessWrong.October 2023
Arran McCutcheon$6,214This grant will support Arran McCutcheon’s work on AI governance projects and activities.October 2023
Drake Thomas$14,880Retroactive grant to study Goodhart effects on heavy-tailed distributionsOctober 2023
Matthias Dellago$25,9076-month stipend for Matthias Dellago working on his Master’s thesis and paper on technical alignment research: mechanistic interpretability of attention. The paper will be published to arXiv and discussed in a blog post. The paper may also be submitted to a conference, with further plans to develop a tool that will allow other researchers to easily leverage and build on the results of this research.October 2023
Zachary Furman$40,0006-month stipend for Zachary Furman’s research as part of Daniel Murfet’s research group at the University of Melbourne, where he is working on developmental interpretability and singular learning theory which will be published to academic ML conferences.October 2023
Ann-Kathrin Dombrowski$27,8923-month stipend for SERI MATS extension to work on internal concept extractionOctober 2023
Jacques Thibodeau$27,108Funding to continue making tools for accelerating alignment and the Supervising AIs Improving AIs agendaOctober 2023
Berkeley Existential Risk Initiative$30,000This grant provides operational support for the mechanistic interpretability and language model steering project by Team Shard.October 2023
Hoagy Cunningham$35,923This grant provides a stipend for the grantee to draft a paper by the end of the SERI MATS phase on the sparse coding project and for supporting future research.October 2023
Kurt Brown$15,0004 weeks dev time to make a cryptographic tool enabling anonymous whistleblowers to prove their credentialsOctober 2023
Harrison Gietz$10,750This grant will support Harrison Gietz with a stipend for AI safety technical and/​or governance research. The research goals are to conduct impactful research to influence AI safety research, governance, and/​or evals in a positive direction, and to upskill in safety-relevant ML, with the aim of producing a paper/​report that is published in a reputable ML/​AI conference or journal.October 2023
Aidan Ewart$7,9296-month stipend for part-time independent research on LM interpretability for AI alignmentOctober 2023
Anonymous$48,423This grant provides a stipend for work generating feature identification methods based on information content useful for the purpose of interpreting neural networks. Results will be shared through blog posts for community review.November 2023
Cole Wyeth$50,000This grant provides 1 year of funding for tuition (~66% of the total grant) and living expenses for Cole Wyeth, who is pursuing a PhD in Computer Science at the University of Waterloo. Cole will be studying extensions of the AIXI model to reflective agents to understand the behavior of self-modifying AGI, supervised by Professor Hutter.November 2023
Scott Viteri$10,000This grant provides 1 year of compute funding to develop a novel training technique that implements incentives towards prosocial behavior to improve the safety and alignment of LLMs. The goal is to provide a strong enough proof-of-concept that OpenAI or Anthropic implements the technique in their next large training run.November 2023
Anonymous$48,000This grant provides 6 months of support for up-skilling in technical AI alignment and independent interpretability research.November 2023
Samotsvety Forecasting$6,000General support for a forecasting teamNovember 2023
Felix Binder$2,000This grant will support Felix Binder’s compute for an experiment about how steganography in large language models might arise as a result of benign optimization.November 2023
Prompt Human Inc.$43,1596 months of funding for Quentin Feuillade—Montixi to continue working on Model Psychology and Evaluation research and publish findings to LessWrong.November 2023
Anonymous$20,000This grant will support the grantee with a 6-month part-time stipend to study, write about, and advise on frontier model regulation and forecasting.December 2023
Christopher Lakin$5,000This grant will support Christopher Lakin facilitating a small workshop in February focused on coordinating/​planning/​applying the concept of «boundaries» to AI safety.December 2023
David Udell$80,000This grant will support David Udell with a one year stipend and compute budget for full-time technical AI alignment research.December 2023
Existential Risk Observatory$24,484This grant will support Existential Risk Observatory in organizing AI x-risk events with experts (such as Stuart Russell), politicians, and journalists in order to inform and influence policymaking.December 2023
Riya Sharma$1,702This grant will provide Riya Sharma funding to attend the 2023 Biological Weapons Convention (BWC) Working Group Meeting to discuss transparency surrounding bioweapons with country representatives, as well as to work on a related research project.December 2023
Brian Tan$61,460This grant is providing nine months of funding for WhiteBox Research (1.9 FTE) to pilot a training program in Manila focused on Mechanistic Interpretability.December 2023
Thomas Kwa$75,000This grant will support Thomas Kwa with a 6-month stipend to research interpretability and control in independent alignment projects. He aims to prove the accuracy of a 1-layer attention-only model, develop a variant of NMF to find human-interpretable features in LMs, and do activation engineering.December 2023
Anonymous$27,450This grant will support 3 months of research in nascent areas of EA and longtermism (such as digital sentience), with the eventual aim of founding a new organization or company.January 2024
The University of Hong Kong$33,000Undergrad buyout for Nathaniel Sharadin to teach AI safety in Hong Kong’s new MA program on AI; China-West AI Safety workshop.January 2024
Logan Strohl$80,000This grant will provide Logan Strohl with a one-year stipend to support work developing materials demonstrating an investigative procedure for advancing the art of rationality. This work has the potential to build capacity for open technical problems like AI alignment, which require new conceptual breakthroughs and defy complete formal theorizing.January 2024
John Wentworth$200,000This grant will provide John Wentworth with a 1-year stipend to continue his research into natural abstractions, with the goal of using products as a feedback mechanism to bridge the theory-practice gap and eventually apply the research to retargeting ML-internal planning processes, oversight, designing AI architectures, and building higher-level agency theory.January 2024
Chris Mathwin$28,325This grant will provide Chris Mathwin with a 6-month stipend to conduct independent mechanistic interpretability projects following SERI MATS 4.1. Chris’s research aims to improve fundamental understanding of the attention mechanism and to develop an appropriate functional unit of analysis for mechanistic interpretability.January 2024
Anonymous$7,4004-month stipend for remote part-time mechanistic interpretability research under Neel Nanda extending SERI MATS researchJanuary 2024
Yuxiao Li$45,000This grant is funding a $35,000 stipend plus $10,000 in compute costs for Yuxiao Li’s independent inference-based AI interpretability research.January 2024
Anonymous$9,675This grant will support the grantee with a living cost top-off stipend while they work on long-term relevant research at a DC think tank.February 2024
Philip Quirke$61,000This grant will provide Philip Quirke with a six-month study grant to speed up his career pivot into AI safety and alignment research. Specific deliverables include a paper on and tooling to help simplify the process of understanding complex ML capabilities.February 2024
Anonymous$2,200This grant will support the grantee with funding to visit MIT FutureTech.February 2024
Marcus Williams$42,000This grant will support Marcus Williams with a 6-month stipend to train Multi-Objective RLAIF (Reinforcement Learning from AI Feedback) models and compare their safety performance to standard RLAIF, with the goal of improving the alignment of future AI systems.February 2024
Rafael Andersson Lipcsey$7,000This grant will support Rafael Andersson Lipcsey with a 4-month stipend for upskilling within the field of economic governance of AI.February 2024
Keith Wynroe$22,3955-month funding to continue upskilling in mechanistic interpretability post-SERI MATS, and to continue open projectsFebruary 2024
Effective Altruism Israel$40,000This grant will provide EA Israel funding to support MentaLeap, a collaborative group of over 100 scholars composed of neuroscientists, AI researchers, and cybersecurity experts. The grant will support MentaLeap with costs associated with their office rental, leadership, compute budget, and food and refreshments.February 2024
Dillon Bowen$30,000This grant will support Dillon with a six-month stipend to transition to a career in AI safety while working on AI safety projects.March 2024
Lukas Fluri$37,120This grant will support Lukas Fluri with a six-month stipend to do an unpaid internship focused on using theory/​interpretability to increase the safety of AI systems.March 2024
Anonymous$275,000This grant will support stipend, compute, and contractor costs for an AI interpretability research platform for LLMs.March 2024
Anonymous$3,008This grant will support the grantee with travel expenses to present at the 2024 Global Health Security Conference in Sydney.March 2024
Aidan Ewart$23,159This grant will support Aidan Ewart with four months of funding for a MATS 5.0 extension. Aidan will work on improving methods in latent adversarial training to advance language model safety.March 2024
Arjun Panickssery$34,100This grant will support Arjun Panickssery with a four-month stipend for MATS extension work. Arjun will be studying the safety implications of LLM self-recognition.March 2024
Skyler Crossman$140,000This grant will support Skyler Crossman with a 12-month stipend to work as a coordinator for global rationality meetups.March 2024
Anonymous$3,821This grant will support the grantee with travel funding to present biosecurity policy research at Global Health Security Conference 2024.March 2024
Hannah Erlebach$25,125This grant will support Hannah Erlebach with a 4-month stipend to continue AI projects relevant to singleMarch 2024
Yoav Tzfati$62,150This grant will fund Yoav Tzfati with a four-month stipend to continue working on a MATS project. Yoav’s project aims to use meta-level adversarial evaluation of debate (a scalable oversight technique) on simple math problems, with the ultimate goal of red-teaming scalable oversight techniques that may be used to align transformative AI.March 2024
Ashgro Inc$272,800This grant will provide Apart Research (through fiscal sponsor Ashgro, Inc.) with funding (salaries & ops costs) for AI Safety talent incubation through research sprints and fellowships.March 2024
Garrett Baker$17,500This grant will support Garrett Baker with a 3-month MATS extension stipend to use singular learning theory to explain & control the development of values in machine learning systems.March 2024
Alfie Lamerton$6,001This grant will support Alfie Lamerton in conducting a one-month literature review on in-context learning and its relevance to AI alignment.March 2024
Anonymous$30,291This project will support the grantee with a 5-month stipend to create a research agenda and conduct research using the IO literature in economics for AI strategy; findings will be published online when done.March 2024
Epistea$103,822This grant will provide one year of funding for PIBBSS (through fiscal sponsor Epistea), a research initiative aiming to leverage insights on the parallels between intelligent behaviour in natural and artificial systems towards progress on important questions about future artificial systems. The grant will support several programs, including the 2024 fellowship, affiliate program, and a reading group.March 2024
Egg Syntax$55,000This grant will support Egg Syntax with four months of funding for research on how much language models can infer about their current user, and interpretability work on such inferences. This work aims to contribute to better understanding of state-of-the-art LLMs, in particular with respect to their capacity to infer information about users, to help detect deceptive or manipulative behavior.March 2024
Oscar Balcells$40,356This grant will support Oscar Balcells with a 4-month stipend to research the mechanisms of refusal in chat LLMs, with the ultimate goal of contributing to models that are more resistant to misuse.April 2024
Anonymous$127,000This grant will support the grantee with a one-year stipend for policy and technical work on biosecurity.April 2024
Sienka Dounia$8,500This grant will support Sienka Dounia with a three-month stipend to support relocation from Chad to London to work on Eliciting Latent Knowledge with Jake Mendel from Apollo Research. This work aims to improve the transparency and reliability of AI systems, reducing the risks of deceptive models, with direct benefits to the AI safety research community.April 2024
Teunis van der Weij$30,458This grant will support Teunis van der Weij with 4-month expenses for AI safety research on personas and sandbagging during the MATS 5.0 extension program.April 2024
Ashgro Inc$100,000This grant will provide support for an increase in Timaeus’ salaries and rates for employees & contractors, enabling them to continue their work investigating the applications of Developmental Interpretability (DevInterp) and Singular Learning Theory (SLT) to AI safety.April 2024
Hayden Peacock$5,000This grant will support Hayden Peacock with a two-month stipend while Hayden works to establish a broad-spectrum antiviral research organization. If the proposed venture is successful, it will improve defenses against future pandemics by providing new broad-spectrum antiviral drugs.April 2024
Joseph Kwon$40,000This grant will support Joseph Kwon with a 6-month stipend to work on a machine learning safety project, with the aim of investigating the limitations of current probing/​interpretability methods of representation engineering in AI.April 2024
Abhay Sheshadri$15,075This grant will support Abhay with a four-month stipend to work on two research projects during the MATS 5.0 extension program, focusing on understanding and mitigating the potential misuse of language models (LMs) and developing tools for the safe pruning of knowledge from AI systems.April 2024
Sviatoslav Chalnev$40,000This grant will support Sviatoslav Chalnev with a six-month stipend to continue independent interpretability research, with the goal of diversifying AI alignment work with more speculative ideas.April 2024
Danielle Ensign$60,000This grant will support Danielle Ensign with a six-month stipend to do circuit-based mechanistic interpretability on MAMBA, as part of the MATS extension program.April 2024
Roman Soletskyi$35,468This grant will provide Roman Soletskyi with a 6-month stipend to conduct research on AI safety, verifying neural network scalability for reinforcement learning and producing a human-to-superhuman scalable oversight benchmark, which will eventually be published.April 2024
  1. ^

    Please note that the highlighted grants are likely to be unrepresentative of our average grant, and certainly of our marginal grant. To get a better sense of what marginal donations are likely to buy, please read Hypothetical grants that the Long-Term Future Fund narrowly rejected, my earlier post on this exact question.

Crossposted to LessWrong (40 points, 0 comments)