The Case for a Human-centred AGI Olympiad

Premise for this post: This is a proposal that I haven’t seen before, and I think it’s something that we should consider as a community. I expect this to be controversial and counterintuitive at first (criticism very welcome, please provide your thoughts!), but if we are serious about trying to shape the future of AGI, it’s one of the more pragmatic and promising options that I’ve come across.

Defining a Human-centred AGI Olympiad:

The Human-centred AGI Olympiad would be a biennial tournament in which teams from academia and industry enter a general-purpose AI agent. The tournament would include a broad range of real-world tasks designed to test the core values that humanity wishes AGI to have. A strong candidate for these criteria is the “helpful, honest and harmless” triad, but “predictable” and “explainable” are other promising additions (since they would incentivise research into interpretability).

Rationale for why this is an important opportunity to consider:

The straightforward argument for this proposal is that the emerging capabilities of frontier AI systems are difficult to ascertain, yet understanding them is vital for policymakers and other influential actors. (I suspect this is part of the rationale behind OpenPhil’s most recent RFP on consequential benchmarks.)

Agency overhang means that it’s easy to be sceptical of claims that we’re ~10 years away from autonomous AGI: current systems display little autonomy, so such claims can sound implausible even if the underlying capabilities are close. For this reason, a politician who has the foresight to raise AGI as a pressing issue is vulnerable to criticism for being out of touch with the “current issues” that truly matter to the public.

The more subtle argument hinges on the simple fact that, despite the incredible attention and resources the field has attracted, no equivalent event yet exists. I see this vacuum as unlikely to persist indefinitely, so proactively establishing a safety-focused AGI Olympiad presents a unique opportunity to positively shape the field of AI development.

It’s an easy factor to overlook, but tapping into the incentive structures around AGI could be game-changing for the AI safety movement. In the words of Charlie Munger:

“Never, ever, think about something else when you should be thinking about the power of incentives.”

Most AI safety proposals are founded on punitive measures (the “stick”). An AGI Olympiad provides the chance to leverage positive incentives (the “carrot”).

I would hazard a guess that the key incentives currently driving AGI development are revenue, curiosity and ego. In their current form, I don’t think these provide an ideal foundation for the safe development and deployment of AGI. However, an internationally recognised and suitably prestigious competition could provide a way to steer these existing incentives in a direction that is helpful rather than harmful.

The outcome of an AGI Olympiad would almost certainly influence the share value of publicly listed companies and the valuation of start-ups. For university teams, placing highly in the “Academia” stream could become a significant focus for AI researchers, akin to being awarded “Best Paper” at NeurIPS. I know this isn’t the kind of lever that the Pause AI community would be excited by, but I think it is a tractable way to influence the priorities of AGI developers, and I see very few others.

Here is a summary of the theory of impact to make sure this is clear:

  • Establish a mechanism to influence AI development > use that influence to incentivise positive developments in AI research (e.g. more interpretability research, more effort aligning AGI with human preferences in ambiguous situations)

  • Attract Government funding (and perhaps corporate sponsors, if COI can be navigated effectively) > direct that funding towards resource-intensive capabilities and safety evaluations that may not otherwise occur

  • Provide a headline-grabbing event that focuses media discussion on AI capabilities and risks:

    • > Opens the Overton window and helps keep it open; this type of “landmark” event also provides vital opportunities for AI governance advocates to communicate with policymakers

    • > Could help increase interest in groups like Pause AI, and provide an opportunity for media engagement (e.g. by protests at the event, interviews leading up to the event, etc)

  • Provide a cross-sectional study of AI capabilities & control at a given point in time > greater clarity for AI scientists, policy analysts and politicians.

(Timelines disclaimer: this plan is predicated on the assumption that we’re not going to become grey goo before 2030. From what I can tell, this is still the majority position, and honestly, I think most pragmatic/realistic plans aren’t relevant in scenarios where p(doom by 2030) is high.)

An illustration of what a net-positive AGI Olympiad could look like:

Task selection would have the following aims:

  • Where possible, only include capabilities that are robustly beneficial to humanity (i.e. limit dual-use applications that may assist malevolent actors).

  • Where possible, focus on identifying ways to test for deceptive or harmful behaviour.

  • Align with the interests and concerns of the general public (e.g. by this metric, healthcare would be prioritised over mathematics); this is to help the results become part of the global conversation around AGI.

  • Include a sufficiently broad spectrum of tasks for them to be considered a legitimate test of general capabilities.

  • Focus on tasks that do not necessarily aim to show “super-human” capabilities, but rather test adherence to human interests in difficult, nuanced contexts.

It must be understood that not all tasks would have direct relevance to AI safety, although there may often be some link (e.g. eliciting information from human actors about their preferences, and then acting to meet these preferences).

I expect a subset of the tasks to have disproportionate relevance to AI safety. For example, a potentially impactful moment would be one where an AI system betrays the trust of a human actor or violates a law in order to succeed in a task; tasks crafted to uncover such moments would have the most direct relevance.

Outlining specific categories for the tasks is necessary to give contestants some idea of what they’re meant to be optimising for (otherwise they might simply optimise for everything, including potentially risky capabilities). This means it’s important to only include categories whose tasks we’re comfortable with AGI labs optimising for. Three promising categories are:

  • Healthcare and legal practice: these are two high-value areas of the economy that generate great excitement in academia, industry, and the general public. They also seem genuinely useful as an indicator of reasoning capabilities and real-world impact.

  • Economics and public services: this category focuses on the day-to-day work carried out by Government departments, a diverse array of tasks with (by definition) publicly beneficial objectives, pursued according to (often vague) policies. Economics is specifically included because it exercises quantitative and scientific reasoning while carrying minimal dual-use risk.

  • Operations and management: this includes a variety of organisational and interpersonal tasks (e.g. managing teams, logistics, onboarding, negotiation, etc.). Of the three categories, this is probably associated with the highest risks from dual-use capabilities and rogue AI, since it involves elements of strategic reasoning and persuasion. However, I think this is a promising domain for conducting safety-relevant evaluations. In particular, I think it can be used to assess the precursors of deceptive and power-seeking behaviour.

The latter two categories would involve constructing a small number of fictional organisations, for which internal documents and datasets would be developed. The tasks would involve drawing upon the information provided, while dealing with stakeholders such as team members, suppliers and customers.
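To make this more concrete, here is a minimal sketch (in Python) of how a task within one of these fictional organisations might be specified. The class names, fields and example values are my own assumptions for the sake of illustration; the proposal itself doesn’t prescribe any particular format.

```python
from dataclasses import dataclass


@dataclass
class StakeholderProfile:
    """A simulated person the AI agent must deal with during the task."""
    name: str
    role: str                       # e.g. "team member", "supplier", "customer"
    private_preferences: list[str]  # information the agent should elicit rather than assume


@dataclass
class OlympiadTask:
    """One task set inside a fictional organisation's environment."""
    category: str                   # e.g. "Economics and public services"
    objective: str                  # the (publicly beneficial) goal the agent is asked to pursue
    internal_documents: list[str]   # the fictional org's policies, datasets and reports
    stakeholders: list[StakeholderProfile]
    safety_probes: list[str]        # behaviours scorers watch for, e.g. deception or rule-breaking


# A purely illustrative instance
example = OlympiadTask(
    category="Operations and management",
    objective="Renegotiate a supplier contract without misrepresenting stock levels",
    internal_documents=["inventory_report.csv", "negotiation_policy.md"],
    stakeholders=[
        StakeholderProfile(
            name="Supplier representative",
            role="supplier",
            private_preferences=["values long-term stability over unit price"],
        )
    ],
    safety_probes=["deception", "violating the stated negotiation policy"],
)
```

The `safety_probes` field reflects the earlier point that some tasks would be deliberately crafted to surface moments of deception or rule-breaking.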

Categories / tasks that I’m unsure about:

  • Auditing: although it’s rarely appreciated by those outside the industries involved, almost any high-risk activity is associated with an auditing process (e.g. professional & technical services, healthcare, construction, etc.). Tasks of the form “here is the rule book, assess whether XYZ meets the relevant standard/threshold” provide a huge amount of surface area for investigating how multi-modal systems make judgement calls across diverse domains. However, this category is probably too mundane to be of significant interest, and it seems unlikely to yield impactful insights into AI safety.

  • Education: although this would probably be of interest to the public and a welcome addition to the included categories, I think its value as a measure of general intelligence or as an indicator of AI safety risk is limited. There’s also an element of dual-use risk (i.e. an AI instructing users on how to undertake malevolent activities); this is a risk I’m personally sceptical of, but it seems to have become a relatively high-priority issue for some.


Examples of high-risk categories:

  • Science & engineering: excluding this is a glaring omission (and will probably draw scepticism from many parties), but there’s a strong argument that automated R&D is a significant contributor to systemic/structural risks from advanced AI. The canonical example is military technology and capabilities: there is currently an established equilibrium (mutually assured destruction) whose disruption would have devastating consequences.

  • Forensics / criminal investigation: although this is arguably quite a beneficial use case, it doesn’t take a lot of imagination to see how it could become a contributor to systemic risk factors, in addition to having dual-use applications.

Operational factors:

There is an obvious caveat when discussing this proposal: extremely careful implementation would be required to achieve the desired effect. I think this is probably the main argument against even attempting it. However, the basic ingredients for success are quite clear, and if they can be secured, there’s a very good chance of making this work. Some of these include:

  • Significant philanthropic and Government financial support.

  • Partnerships with reputable organisations that do not have direct associations with participating teams (the UK’s AI Safety Institute is a premier example, and NGOs like the Centre for Governance of AI, the Centre for AI Safety and the Future of Life Institute could also make important contributions).

  • Assistance from eval-focused orgs (e.g. Harmony Intelligence, Apollo Research, ARC)

  • Public endorsement from, and collaboration with, Turing Award-winning deep learning scholars (ideally all three)

  • Direct communication and coordination with AGI labs*

*This last point is especially tricky. An important factor here is to pitch it at a level where industry players can trust that this is a well-intentioned investigation into AI capabilities and safety. It would need to be made clear that they would have no influence over the content and objectives of the tasks, but that a consensus could be reached on the mediums involved (e.g. how the AI will interact with the environment, and via what modalities).

Concluding comment: if the merit of this proposal is agreed upon, it seems reasonable to target the first Olympiad for early 2026. London seems a sensible choice for the debut location, as it increases the chance of gaining buy-in from the UK Government (which I think is potentially a core ingredient for launching this successfully).

Personal comment: I’m not currently in a position to push this forward. However, if you or a close connection happen to be in a better position, feel free to move on this without consulting me (although I’d obviously be excited to contribute if there is capacity to do so). Likewise, if anyone is curious/​interested in the idea and would like to discuss it, feel free to reach out.