Speedrun: AI Alignment Prizes

Introduction

This post is a shallow investigation of the intervention of running a highly-advertised, high-budget AI alignment prize contest. The post is part of a sequence of speedrun research projects by Rethink Priorities’ general longtermism team. If you haven’t already read the introductory post in the sequence, I recommend starting there for context on this post.

A quick context tl;dr:

  1. The aim of this investigation was to help our team decide if we should take steps towards incubating an organization focusing on this project. Keep in mind that some of the conclusions take into account considerations (such as our team’s comparative advantage) that may not be relevant to the reader.

  2. The investigation was intended to be very early-stage and prioritize speed over rigor. We would have conducted a more in-depth investigation before launching any projects in this space (and recommend that others do the same).

  3. Especially important for this speedrun, given the potentially very large costs of the intervention: This post was written in early fall 2022; the funding situation has changed significantly since then, but the investigation has not been updated to reflect this. As such, the funding bar alluded to in the post is probably outdated.

Epistemic status

I spent ~16 hours researching and writing this speedrun. I have no other experience with prizes, and am not intimately familiar with technical alignment. So: I’m a junior generalist who has thought about this for a couple of work days, and as a result, this post should be considered very preliminary and likely to contain mistakes and bad takes. My goal in publishing this regardless is that it may be useful as (a) a primer gathering useful information in one place, and (b) an example of the kind of research done by junior generalists.

I’ve also received feedback on this since I wrote it that has updated my assessment slightly downward, but I haven’t updated my bottom line to reflect this. You can find some of these considerations in the Key Uncertainties section – particularly the last point (re: the Millennium Prizes being a poor baseline to compare against).

Summary

  • The document evaluates the potential for a highly-advertised AI alignment prize contest, which hypothetically would award tens of millions of dollars (or more) to entrants who solve difficult, important open questions in AI alignment. (More)

  • I’m mildly excited about this project, but only if done especially well. If done well, the impact is potentially huge (i.e., identifying novel approaches to AI alignment). Second-order effects could also bring attention & talent to alignment research. (More)

  • The “good” version of this project would need to carefully address several design considerations, such as (a) identifying and clearly describing a tractable question that, if someone solved it, they could produce clear evidence of having done so,[1] (b) establishing transparency and legitimacy in the judging process,[2] and (c) using methods other than mind-boggling amounts of cash to incentivize participation. (More)

  • There are some clear downside risks, some of which may be hard to mitigate, and they mostly stem from anything short of great execution. These include (a) reputational harm to alignment research and associated parties if the challenge is poorly defined or viewed as frivolous, (b) opportunity costs of participants’ time if the challenge fails to make intellectual progress, and (c) potentially incentivizing capabilities research and popularizing dangerous ideas about AI. (More)

  • Others have worked on smaller alignment prizes. There are a handful of these contests underway or already finished. (More)

  • However, there are bottlenecks to making this larger version happen. The main one is actually having a Billion Dollar Question or a set of Million Dollar Questions. Some steps toward this could include:

    • Convening alignment researchers to develop a set of promising questions

    • Surveying target groups to gauge their interest in research questions

    • Supporting independent efforts on this (including identifying and supporting founders to do the above tasks)

    • Incubating a small team dedicated to comparing research agendas and identifying promising questions for contests. (More)

Intro: What is this project?

Running a highly-advertised AI alignment prize contest, which would award tens of millions of dollars (or more) to entrants who solve difficult, important open questions in AI alignment. Similar contests in other domains include the Millennium Prize Problems and the Google Lunar X Prize, which attempt to induce innovation by offering a large bounty on solutions. Examples of alignment problems that the FTX Future Fund mentioned as prize-worthy (on their now-deleted page of project ideas) include: “reverse engineering 100% of a neural network the size of GoogLeNet”, solving the problem described in “Redwood Research’s current project”, “achieving near-perfect robustness on adversarial examples in vision without lowering average performance”, and more.[3]

Design Considerations

I have some preliminary takes on design considerations for this project, held with low credence. These include the following:

  • Alternative incentives. We might want to use methods other than mind-boggling amounts of cash to incentivize participation.

    • The correlation between large cash prizes and contest visibility seems weak– alternative options like high-profile sponsors (e.g., prominent mathematicians) could start to address this.

  • Number of questions. Should this prize be filtered through a single challenge (e.g., $100M prize for carbon removal) or a number of important sub-problems (e.g., Millennium prizes)?

    • I slightly lean toward having a number of sub-problems, as I think this is more suited to the current state of alignment research.

  • Periodicity. Should this be a one-off with no end date, or repeat in ~6-month cycles?

    • While I expect that asking sub-questions of alignment is a reasonable approach in the absence of a clearly most-important and well-scoped question, I think this could lend itself to either (a) a cyclical & updating contest as our priorities change, or (b) a longer-running contest with many subproblems.

  • Question specification and framing: Should this focus on alignment theory, or be closer to an engineering question?

    • Successful inducement prizes have historically (often) dealt with engineering problems that have a clear “victory state”, like the Longitude Rewards. Building on this trend, I imagine that this might be more helpful for empirical research than alignment theory.

Impact

Note that in this section, I focus exclusively on how this project could help prevent existential risk. This is because my team generally operates under a longtermist worldview, and under this worldview, effects on existential risk seem likely to be the most important effects in determining the expected impact of a project.

Table 1: Mechanisms to Impact

| Mechanism to Impact | Reasons for Skepticism |
| --- | --- |
| Significantly increasing the number of total hours spent on AI safety research | It’s possible the hours “bought” by a prize would be lower quality than existing alignment work – though this could also attract brilliant researchers with fresh perspectives. |
| (see above) | It’s possible that the premise (there are tractable questions in alignment research that external people could contribute to) just isn’t true. |
| (see above) | I might be significantly overestimating the hours that a major research prize would buy, by anchoring on the most successful research prizes as a baseline. |
| Nerd-sniping prominent mathematicians, philosophers, computer scientists, etc. to want to solve alignment | Possibly getting highly-skilled people incidentally interested in AI capabilities. |
| Adding prestige to alignment as a field | The correlation between large cash prizes and prestige is weak, according to this prizes review from Rethink Priorities’ Global Health and Development team. |
| Pushing alignment researchers to develop stronger proposals to align prosaic models | This effect relies on better work being done in response to the pressure of a looming prize that has to be good. AFAIK alignment researchers are already feeling the pressure that the world will end soon, and so probably can’t work much harder. This could also distract some research hours from more theoretical work, if the prize is closer to the engineering end. |
| Developing connections between alignment researchers and experts in other disciplines | Other researchers might find existing alignment work to be pretty arcane and/or bad, and mostly decide their own approaches are much better. |

Gaps/​Opportunities for Marginal Impact

Given the amount of activity already in this space (see: Existing And Previous AI Safety Contests), a significant question is how anyone can contribute at the margin. There are several gaps to fill, including:

  • Providing information to funders about the level of success of existing safety prizes, and related lessons (using expert surveys).

  • Coordinating among alignment researchers, or recruiting individuals to do comparative research agenda investigations, to identify research questions that are (1) tractable by outsiders, (2) legitimately hard, (3) could be interesting to extremely smart people, and (4) worth spending massive amounts of money on.

  • Surveying target groups (e.g., ML grad students and professors, or philosophy professors for more theory-oriented prizes) to gauge their interest in questions produced by the above process. (HT to Oliver Zhang for inspiring this point).

    • Note: Some former AI safety prizes by CAIS have seen that being within “mainstream” research is a major factor in participation, which suggests waiting until a topic has been formalized by a well-cited research paper before launching a massive prize on it (HT Oliver again).

Another bottleneck, mentioned by Holden Karnofsky in the Appendices to Important, Actionable Research Questions, is this:

  • He describes that much of the existing prize work fails to employ “comparative AI alignment assessment… trying to argue explicitly that particular alignment agendas seem more valuable than others.” In his assessment, most of the existing work is either not aimed at the hardest part of the problem, or totally incomprehensible to an outsider. He estimates there are 0-5 people doing comparative AI alignment assessment, and it’s a critical step.

  • In his doc, the ideal contest question would be some “Activity that is likely to be relevant for the hardest and most important parts of the problem, while also being the sort of thing that researchers can get up to speed on and contribute to relatively straightforwardly (without having to take on an unusual worldview, match other researchers’ unarticulated intuitions to too great a degree, etc.)”

Cost (~70% CI)

The ambitious/rough upper bound, as explored by Holden and also mentioned by Superlinear, is a ~$1 billion prize, plus other costs (e.g., second-place prizes [also significant], ad campaigns, and the labor to choose problem(s) and review submissions [relatively negligible]). Holden states in his discussion that prizes could be larger or smaller, and that $1B is largely a placeholder to signal that this is highly scalable.

I’d place a 70% CI at $10M-$2B, which encompasses a few scenarios:

  • Scenario 1: Pilot fails, no repeat. Lower bound for this is a $1M prize[4] with a 10x operating cost,[5] so in this scenario, someone hosts a $1M contest, it’s clear from the submissions that the problem is intractable for outside researchers, and for some reason we don’t try it again.

  • Scenario 2: Pilot fails, repeat larger. Same as above, but instead it becomes clear that either (a) not enough money was put up to attract sufficient attention, or (b) society’s failure to answer the question generates a lot of interest, and we try again with a larger prize.

  • Scenario 3: We don’t need $1B. It’s possible that, after the team running this project does some social psychology work, they find that a much smaller prize (e.g., $10M) would be sufficiently motivating to attract whatever researchers and attention we’re looking for (though we might want to investigate how this effect is going with the X prizes [of which the Google Lunar project mentioned above is one]– for example, the Musk Foundation just funded a $100M carbon removal incentive prize on their platform).

  • Scenario 4: Full-scale, including a repeat. This scenario includes a $1B prize that fails, one that succeeds, and one that succeeds so well that it’s worth repeating with a different question.

  • Alternate scenario: One big contest is worse than several small(er) contests. Paraphrasing Akash Wasil from a private conversation: “It seems likely to me that it would be better to have multiple $10M contests rather than one $100M contest. Different contests will appeal to different groups of people. For example, it might be good to have (at least) one contest that appeals to each of (a) theoretical/conceptual researchers, (b) empirical researchers/ML people, (c) mathematicians, and (d) philosophers.”

Cost-effectiveness guesstimate

The main impact: increasing the quantity of research hours spent on AI safety.

  • At 50th percentile success, I believe this could roughly 1.05x the number of total hours spent to date on AI safety research.[6]

  • At 95th percentile success, I believe the factor would be closer to 1.5x.

  • My method for quantifying this involved:

    • Ballparking (A) the number of hours spent to date on AI safety research

    • Ballparking (B) the number of hours garnered by the Millennium Prizes

    • Speculating (C) the level of success of this project as a fraction of the Millennium Prizes (e.g., 1/​40th as successful), in terms of research hours purchased.

    • Calculating (D) the additional percentage of AI safety research hours purchased by this project, compared to existing research hours = (B*C) /​ A

  • See below for more details; a short code sketch of the arithmetic follows this list.
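For transparency, here is a minimal sketch of that arithmetic in Python. The inputs are the rough figures derived in sections (A) and (B) below; they are assumptions, not measured values.

```python
# Rough guesstimate sketch. All inputs are assumptions taken from the sections below.
hours_to_date = 3.1e6      # (A) cumulative AI safety research hours since 2000
millennium_hours = 7e6     # (B) rough total hours attracted by the Millennium Prizes

def added_fraction(success_vs_millennium):
    """(C, D) extra AI safety hours bought, as a fraction of hours spent to date."""
    hours_bought = millennium_hours * success_vs_millennium
    return hours_bought / hours_to_date

print(f"50th percentile (1/40 as successful): ~{1 + added_fraction(1/40):.2f}x total hours")
print(f"95th percentile (1/4 as successful):  ~{1 + added_fraction(1/4):.2f}x total hours")
# Prints roughly 1.06x and 1.56x, in line with the ~1.05x and ~1.5x figures above.
```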

(A) Hours spent on AI safety so far

Context: Very-rough estimate of total hours spent on AI safety work to date: 3.1 million.

  • I use Stephen McAleese’s estimation of technical AI safety researchers per year to ballpark annual FTEs dating back to 2000

  • I multiply that by 1.5x to include non-technical AI safety researchers (roughly the associated post’s estimate for non-technical researchers)

  • Cumulative FTEs spent on technical & non-technical AI safety since 2000: ~1560

    • x 40 hours/​wk, 50 weeks/​yr =

  • Cumulative hours spent on technical & non-technical AI safety since 2000: 3.1 M
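The same calculation as a quick sanity check (the FTE figure is my rough ballpark from above):

```python
cumulative_fte_years = 1560      # technical + non-technical AI safety FTE-years since 2000 (ballpark)
hours_per_fte_year = 40 * 50     # 40 hours/week * 50 weeks/year
total_hours = cumulative_fte_years * hours_per_fte_year
print(f"{total_hours:,}")        # 3,120,000 -> ~3.1 million hours
```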

(B) Estimation of # of hours garnered by other prizes

The literature on prizes is weak when it comes to estimating participant hours. Using the sources available: taking Superlinear’s estimate of hours spent on the P=NP problem (1 million)[7] and multiplying by 7 (to cover all seven Millennium problems) leads to an estimate that the Millennium Prizes have garnered 7 million researcher hours (over the course of a roughly 20-year period).
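In code, the same extrapolation (the per-problem figure is Superlinear’s rough estimate; scaling it to all seven problems and spreading it over ~20 years are my assumptions):

```python
p_vs_np_hours = 1_000_000                 # Superlinear's rough estimate for P = NP alone
millennium_hours = p_vs_np_hours * 7      # extrapolated across all 7 Millennium problems
hours_per_year = millennium_hours / 20    # the prizes have run for roughly 20 years
print(f"{millennium_hours:,} total, ~{hours_per_year:,.0f} per year")
# 7,000,000 total, ~350,000 per year (half of one year's worth is the ~175,000 used below)
```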

Are the Millennium prizes a reasonable baseline to compare this project to? Major uncertainties include:

  • The Millennium prizes have been around since 2000; could we expect to (a) gather a non-negligible fraction of that level of interest (say, one-quarter), and (b) do so in a shorter period of time, using the right incentives?

  • Are the Millennium prizes more successful for other reasons, like being directed toward a clear population (mathematicians)?

    • Could we replicate this success by having strongly-targeted question selection and outreach?

    • Retroactive comment: Oliver Zhang pointed out to me that the Millennium Prizes benefit significantly from the social prestige and status the challenge holds within the academic community. While I believe there are some ways we can actively try to build prestige around a large alignment prize (e.g., recruiting high-status judges/​advisors, or specifying problems that rely on more established literature), the prestige consideration would bump down my bottom line somewhat (though I haven’t considered by how much).

There are other prizes that might be better comparisons, and next steps for this investigation could include doing a more robust cost-effectiveness measurement if information on participant hours for such contests is available.

I’m taking the maybe-controversial stance that these hours aren’t substantially less valuable than previously-spent hours on alignment, for the following reasons:

  • Shared wisdom seems to be that the top 1-5% of researchers produce the majority of valuable insights. I expect that a major prize, if aimed at audiences with major potential (e.g., Fields Medalists), would attract brilliant researchers from other fields who are as good as some of the best researchers already working on alignment.

  • The hours spent by the most brilliant researchers could translate into longer-term interest, effectively buying the opportunity to access many more of the best human labor hours available.

  • Large prizes, if designed and advertised well, can attract unconventional and diverse approaches, and encourage interdisciplinary collaboration. There’s significant value to having approaches from fields that aren’t already well-represented in alignment research– an imaginary example of a project that might add valuable exploration and new insights could be a collaboration between a theoretical mathematician, a psychologist, and an evolutionary biologist.

(C-D) Estimates of cost-effectiveness at different success levels

Retroactive comment: The following estimates use an upper bound of a $1B prize. The prize offered could (and, I think, should) be 1-2 orders of magnitude (10-100x) lower, but I started there because this was roughly the scale that was being floated by the Future Fund at the time of writing.

Caveat: Due to the “speed” nature of the speedrun, many of the calculations below are based on very rough guesswork, aiming to get roughly within an order of magnitude of what we could expect from this project (i.e., to inform us on whether to investigate further).

50th percentile success:

If this is done reasonably well, I imagine it could be 1/40th as attractive as the Millennium Prizes – that is, it attracts half as much attention in total as the Millennium Prizes attract in one year (~175,000 hours).

The cost-effectiveness of the prize contest then depends on the prize size that would be needed to achieve this level of success.

For the upper bound of a $1B prize, the cost per research hour at this level of success would be ~$5700/​hr. At first glance, this looks rather extortionate. But importantly, if the AI alignment community proposes a question that is actually worth a billion dollars to solve, and someone solves it, then this price tag is (by assumption) worth it.

But assuming good execution on outreach, question choice, sponsors, etc., I would be surprised if there was a major difference in submissions between a $1B prize and a $100M prize (or even a smaller one) – and if so, this could improve cost-effectiveness by a factor of 10 or more. (HT Oliver Zhang for inspiring this point)

  • If we assume that a $100M prize would capture 100% of the value (i.e., produce the same quality and quantity of submissions), the cost per research hour would be ~$570/​hr.

  • If we assume that a $100M prize would capture 80% of the value, the cost per research hour would be ~$710/​hr.

95th percentile success:

I place a 5% chance that this prize is ¼ as successful as the Millennium Prizes in terms of total research hours purchased, i.e., that it purchases 1.75 million research hours over several years.[8]

For the upper bound of a $1B prize, that would put the cost per research hour at $570/​h.

  • If we assume that a $100M prize would capture 100% of the value of a $1B prize (which seems less plausible for the 95th percentile success scenario), the cost per research hour would be ~$57/​hr.

  • If we assume that a $100M prize would capture 80% of the value, the cost per research hour would be ~$71/​hr.
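To make the cost-per-hour figures above easy to check, here is a short sketch covering both percentile scenarios; the prize sizes and value-capture fractions are the same assumptions used in the bullets above.

```python
scenarios = {
    "50th percentile": 175_000,     # ~1/40 of the Millennium Prizes' total hours
    "95th percentile": 1_750_000,   # ~1/4 of the Millennium Prizes' total hours
}
cases = [(1e9, 1.0), (1e8, 1.0), (1e8, 0.8)]   # (prize size, fraction of value captured)

for name, hours in scenarios.items():
    for prize, value_captured in cases:
        cost_per_hour = prize / (hours * value_captured)
        print(f"{name}: ${prize:,.0f} prize at {value_captured:.0%} value -> ~${cost_per_hour:,.0f}/hr")
# 50th percentile: ~$5,714, ~$571, ~$714 per hour (the ~$5,700, ~$570, ~$710 figures above)
# 95th percentile: ~$571, ~$57, ~$71 per hour
```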

The calculus is much worse if the prize is less successful than that, and this seems highly (95%+) likely to me. As a result, this project seems like it would only be worth it if:

  • We were buying a reasonable shot at impactful solutions to alignment that we don’t expect to find otherwise,

  • And/​or we believe that the prize is massively expanding the reach and prestige of alignment as a discipline.

We may also want to avoid scenarios that don’t result in large prize disbursement (e.g., possibly 50th percentile scenarios), because saving money in that scenario likely comes at reputational cost, especially because alignment isn’t already a strongly established field.

Other considerations

Downside Risks

I see this project as somewhat risky, but in a way where the risks could be partially mitigated (though I have low confidence in the mitigations). Risks include:

  • Popularizing dangerous ideas. I imagine that major ($1-10M +) funding into AI alignment prizes outside of the alignment community could publicize alignment literature in a way that popularizes ideas like short timelines, which could be bad.

    • Mitigation Strategy: I wonder whether this could be mitigated at the level of question-setting (e.g., not phrasing questions heavily in alignment terms). However, that may have a downside of building excitement about AGI without an alignment context (HT my colleague Linch for this point).

  • Accidentally incentivizing capabilities research. i.e., if non-alignment researchers try to solve hard problems regarding advanced AI systems, without a strong understanding of the risks, they might uncover some pretty dangerous lines of thinking.

    • Mitigation strategy: Not sure, but this seems important to address somehow.

  • Asking bad questions. If the prize fails to identify the correct questions, this could direct a lot of people to focus on aspects of alignment or AI research in general that either aren’t that important, or are actively harmful (e.g., by mischaracterizing the problem, or by opening new lines of dangerous research), or contribute to the sense some non-alignment AI researchers have that alignment research is poor quality (HT to my colleague Marie).

    • Mitigation strategy: Don’t rush the project if a question doesn’t pass the following criteria:

    • (1) accepted by a near-consensus of alignment researchers

    • (2) receives generally positive feedback from a survey of the target audience

    • (3) has been red-teamed by someone who’s highly skeptical of alignment prize questions and edited accordingly

  • Reputational risks. Given the amount of attention/​media requests (and scrutiny) I expect a move like this to get, reputational harm is a risk to all parties involved, if the prize comes across as underspecified in practice, and/​or a frivolous waste of money.

    • Parties that could be at reputational risk include funders, people who specified the question(s), and the field of alignment more generally.

    • Reputational harm seems pretty bad if the project doesn’t produce major results, because it could lead to concrete blocks to recruiting/​researching/​popularizing alignment research in other ways.

    • Mitigation strategy/​opportunity: This might be mitigated in part by having broadly-respected figures in the ML field endorse the contest, i.e., spending reputation points to legitimize the contest and alignment as an important problem to take seriously (without necessarily endorsing the quality of the field as it exists).

    • Mitigation strategy: Avoid large-scale media coverage/outreach; focus more on spreading the contest via academic circles.

  • Opportunity costs. For example, this could just be explicitly worse than an intervention like offering very large salaries to alignment researchers, because (1) salaries ensure full-time-ish work, as opposed to ten hours of optional attention that might translate into further interest and work, and (2) it would probably save a lot of the time spent on running the contest.

  • Judging process issues. Participants should have a clear understanding of what a winning proposal would require, as opposed to something like “these five alignment researchers think your proposal will work” (HT Mark Xu). This is a challenge because AI alignment is pretty arcane. A lack of transparency and legitimacy in the judging process could lead target audiences to (a) never participate, or (b) lose faith in judging decisions.

    • Mitigation strategy: Scope the prize such that challenges are closer to engineering, and answers are more likely to be clearly delineated as “it worked” or “it didn’t”; avoid running an enormous philosophy/​theory prize.

Feasibility

Upsides

There may be money available if the right version of this project comes along. People within EA are excited about this, and Holden Karnofsky has tentatively expressed interest (on behalf of longtermist funders) in potentially spending ~$1B on this. [source]

  • Retroactive comment: This statement is much more questionable given the recent changes to the longtermist funding landscape.

Large amounts of funding would probably attract significant attention relative to existing AI research prizes. Computer science prizes are generally not as large as the kind of prize evaluated here: see the Squirrel AI Award, a “new Nobel” for socially-beneficial AI research, which provides a $1 million prize, while Amazon Research Awards provide ~$20K-$100K per prize.

Downsides

Talent required: This requires time and attention from alignment researchers who don’t have much time and attention to spare. For example, this post by Akash Wasil argues that the question-makers would need: “Ideally a strong understanding of AI safety and the ability to identify/write-up subproblems. But I think this could work if someone was working closely with AI safety researchers to select & present subproblems.”

Question specificity is lacking: Currently, there doesn’t seem to be consensus in alignment research on what the most important questions are, and many candidate questions don’t resemble the engineering-type work that has historically made for good prizes.

Others working in this space

There are a lot of people already working on smaller alignment prizes. Other actors (e.g., the FTX Future Fund and Holden Karnofsky, as mentioned above) have performed early exploration into the prospect of a major prize. The latter work, focused on a huge alignment research prize, has been preliminary, and little research has been done to date to identify the right problem for such a prize.

The table below offers a non-comprehensive overview of AI safety-related contests. The actors listed do not necessarily endorse or agree with the views I have expressed in this post.

Table 2. Existing and Previous AI Safety Contests

| Who? | What? | More Info |
| --- | --- | --- |
| CAIS | Hosting several competitions | Competitions page |
| Future Fund | [withdrawn] Large AI Worldview Prize (up to $1.5M for major changes to their current worldviews on AI risk) | Announcing the Future Fund’s AI Worldview Prize |
| Ethan Perez, Ian McKenzie, Sam Bowman | $250K prize pool for demonstrations of the Inverse Scaling Law | Announcing the Inverse Scaling Prize—LessWrong |
| ARC | [completed] The Alignment Research Center offered prizes for proposed algorithms for eliciting latent knowledge | ELK |
| MIRI | $200K in prizes for the development of datasets of “visible thoughts” for language models, i.e., intermediary annotations of the main output | Visible Thoughts Project and Bounty Announcement—LessWrong |
| FLI | $100K prize pool for designing visions of aspirational futures with strong AI | FLI Worldbuilding Contest |
| Akash Wasil & Olivia Jimenez | Two $100K research contests about goal misgeneralization & corrigibility | Alignment awards |
| EA UC Berkeley | $2,500 prize (and smaller sub-prizes) for distilling complex AI alignment/safety ideas | Distillation Contest—Results and Recap—EA Forum |
| Andy Zou/CAIS | $100K prize for AI as a predictive tool | The Autocast Competition |
| Mantas Mazeika/CAIS | $50K prize pool for detecting and analyzing Trojan attacks on deep neural networks | Trojan Detection Challenge |
| Nonlinear staff | Superlinear, a service for scaling up prizes by allowing people to pledge more money than the initial prize | Superlinear |
| SafeBench | $500K prize pool for outstanding AI safety benchmark ideas | SafeBench |

Key uncertainties /​ Potential next questions

  • Why haven’t people done $1B prizes in other fields? This is unclear in the literature as far as I can tell. To be fair, $1B is a substantial sum of money, so there just aren’t many individuals who could do this; there are, however, institutions with enormous budgets (e.g., governments) that could fund massive prizes. Why haven’t they? (HT Oliver Zhang)

  • Will non-alignment researchers do a good job of solving alignment problems? I’m skeptical, especially if the question is phrased in the more arcane language of some existing alignment agendas. One way to address this would be to take the route of having a very explicit engineering problem (e.g., get this model to behave in this way).

  • Are there tractable questions yet? If we get top alignment researchers together in a room, could they come close to agreeing on a set of questions? It’s possible that the field is currently too pre-paradigmatic to identify the right question.

  • Is there an optimal ratio of prize size to expected value of submissions? E.g., is $1B necessary if $100M, or even $25M, is still significantly larger than any other prize in this area? How does interest scale with size? Where does interest taper off? Are there leaps in interest at certain thresholds (e.g., $1M)?

  • How do different disciplines respond to different prize sizes? For example, a larger prize might be required to attract attention in ML because it’s a well-resourced field, but other fields (e.g., philosophy) might respond well to smaller prizes, so long as they’re large in relation to the field’s typical funding scale. (HT Oliver Zhang)

  • Who are the people who’d participate in a contest like this but wouldn’t do alignment research otherwise?

    • Quote from a private conversation with my colleague Marie Buhl: “It seems to me like a large proportion of the contest participants would have to be not-already-alignment researchers for the project to really be impactful (otherwise we’re just redirecting alignment work towards certain problems). It’s not clear to me that the money will be that big a motivating factor. It also seems plausible that the people for whom money is a major incentive aren’t the best researchers – people with already established CS careers probably already earn a good amount and it’s costly to switch away from what you’re currently working on.”

  • I’m quite uncertain about using the Millennium Prizes as a baseline. Retrospectively, I think it would be helpful for this analysis to instead take numbers from existing alignment and/​or ML prizes as a baseline, and extrapolate from there (HT Oliver Zhang). I’d be interested in seeing someone try to wring some information out of this, perhaps by taking the contests in Table 2 (above) as a starting point.

Appendix: Some Relevant Literature

Acknowledgements

This document benefited significantly from feedback provided by my colleagues Renan Araujo, Marie Buhl, Jenny Kudymowa, Linch Zhang, and Michael Aird, and through conversations with Oliver Zhang, Mark Xu, and Akash Wasil. All errors are my own.

This research is a project of Rethink Priorities. It was written by Joe O’Brien. If you like our work, please consider subscribing to our newsletter. You can explore our completed public work here.

  1. ^

    In the spirit of solving a Millennium Prize question, which comes in the form of a mathematical proof.

  2. ^

    i.e., “These five alignment researchers liked it” is unlikely to encourage external researchers to devote resources to the project [thanks to Mark Xu for this point]

  3. ^

    These are merely some examples that others have mentioned – an important design choice is how problems are chosen, and it is unclear what problems would ultimately create the most value and avoid important risks.

  4. ^

    ~Arbitrarily chosen as a reasonable pilot number because (1) it’s probably large enough to elicit wows from society, and (2) 1/1,000th of the number floating around for a major prize seems reasonable for a pilot.

  5. ^

    Based on insights from a conversation with my colleague Jenny Kudymowa; however, I (Joe) didn’t include this operating cost factor at the high end, because my (low-confidence) personal intuition is that on the high end, operating costs would be negligible compared to the bulk of the prize.

  6. ^

    Re: x-risk

  7. ^

    https://www.super-linear.org/about—though there’s not an original source here.

  8. ^

    P=NP got 1 million hours, and the Millennium Prizes have 7 questions, so 7 million hours total assumed; ¼ of that success for the alignment prize gives 1.75 million hours.