Announcing the Winners of the 2023 Open Philanthropy AI Worldviews Contest
Introduction
In March 2023, we launched the Open Philanthropy AI Worldviews Contest. The goal of the contest was to surface novel considerations that could affect our views on the timeline to transformative AI and the level of catastrophic risk that transformative AI systems could pose. We received 135 submissions. Today we are excited to share the winners of the contest.
But first: We continue to be interested in challenges to the worldview that informs our AI-related grantmaking. To that end, we are awarding a separate $75,000 prize to the Forecasting Research Institute (FRI) for their recently published writeup of the 2022 Existential Risk Persuasion Tournament (XPT).[1] This award falls outside the confines of the AI Worldviews Contest, but the recognition is motivated by the same principles that motivated the contest. We believe that the results from the XPT constitute the best recent challenge to our AI worldview.
FRI Prize ($75k)
Existential Risk Persuasion Tournament by the Forecasting Research Institute
AI Worldviews Contest Winners
First Prizes ($50k)
AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years by Basil Halperin, Zachary Mazlish, and Trevor Chow
Evolution provides no evidence for the sharp left turn by Quintin Pope (see the LessWrong version to view comments)
Second Prizes ($37.5k)
Deceptive Alignment is <1% Likely by Default by David Wheaton (see the LessWrong version to view comments)
AGI Catastrophe and Takeover: Some Reference Class-Based Priors by Zach Freitas-Groff
Third Prizes ($25k)
Imitation Learning is Probably Existentially Safe by Michael Cohen[2]
‘Dissolving’ AI Risk – Parameter Uncertainty in AI Future Forecasting by Alex Bates
Caveats on the Winning Entries
The judges do not endorse every argument and conclusion in the winning entries. Most of the winning entries argue for multiple claims, and in many instances the judges found some of the arguments much more compelling than others. In some cases, the judges liked that an entry crisply argued for a conclusion the judges did not agree with—the clear articulation of an argument makes it easier for others to engage. One does not need to find a piece wholly persuasive to believe that it usefully contributes to the collective debate about AI timelines or the threat that advanced AI systems might pose.
Submissions were many and varied. We can easily imagine a different panel of judges reasonably selecting a different set of winners. There are many different types of research that are valuable, and the winning entries should not be interpreted to represent Open Philanthropy’s settled institutional tastes on what research directions are most promising (i.e., we don’t want other researchers to overanchor on these pieces as the best topics to explore further).
- ^
We did not provide any funding specifically for the XPT, which ran from June 2022 through October 2022. In December 2022, we recommended two grants totaling $6.3M over three years to support FRI’s future research.
- ^
The link above goes to the version Michael submitted; he’s also written an updated version with coauthor Marcus Hutter.
Very curious to hear how Open Philanthropy has updated as a result of this competition.
The winning entries would seem to suggest that Open Philanthropy is now likely to be less worried about these risks, but it would be interesting to know by how much.
Hi Chris,
Thanks for your question. Two quick points:
(1) I wouldn’t model Open Phil as having a single view on these sorts of questions. There’s a healthy diversity of opinions, and as stated in the “caveats” section, I think different Open Phil employees might have chosen different winners.
(2) Even for the subset of Open Phil employees who served as judges, I wouldn’t interpret these entries as collectively moving our views a ton. We were looking for the best challenges to our AI worldview in this contest, and as such I don’t think it should be too surprising that the winning entries are more skeptical of AI risks than we are.
Hi Jason, thank you for giving a quick response. Both points are very reasonable.
The contest announcement post outlined “several ways an essay could substantively inform the thinking of a panelist”, namely, changing the central estimate or shape of the probability distribution of AGI / AGI catastrophe, or clarifying a concept or identifying a crux.
It would be very interesting to hear whether any of the submissions changed any of the panelists’ (or other Open Phil employees’) minds in these ways, and how. If not, whether that’s because you learned something of a kind you didn’t anticipate or because the contest turned out to be less useful than you initially hoped, I think that would also be very valuable for the community to know.
Thanks!
It would be great if you could put words to this effect—or state your actual current views on AI x-risk—right up front in your winners announcement, because to me (and no doubt many others) it basically looks like OpenPhil are updating away from the problem being urgent right at the point where we’ve hit crunch time and it couldn’t be more urgent! I’m really quite upset about this.
I also got that feeling. I do assume this is just unfortunate optics and they mostly wanted to reward the winners for making good and original arguments, but it would be good at least to state how their views have been influenced, and what the particular arguments of each winning essay were the most relevant for their decision.
Unfortunate optics confirmed. But would still be good to get an update on:
[from the contest announcement]
Congratulations to the winners.
My question is, now that we have winners, how do we make the best use of this opportunity? What further actions would help people think better about such questions, either those at OP or otherwise?
This is complicated by the caveats. We don’t know which of the points made are the ones the judges found to be interesting or useful, or which ones merely crystalized disagreements, and which ones were mostly rejected.
As is, my expectation is that the authors (of both the winners and other entries) put a lot of effort into this, and the judges a lot of effort into evaluations, yet almost no one will know what to do with all of that, so the opportunity will by default be wasted.
So if there’s opportunity here, what is it? For me, or for others?
As another commenter notes, at the time I offered a rebuttal to the interest rates post, which I would still stand by almost verbatim, and I’m confused why this post was still judged so highly—or what we should be paying attention to there, beyond the one line ‘the market’s interest rates do not reflect the possibility of transformative AI.’
I will refrain from commenting on the others since I haven’t given them a proper reading yet (or if I did, I don’t remember it).
As a potential experiment/aid here, I created Manifold markets (1,2,3,4,5,6) on whether my review of these six would be retrospectively considered by me a good use of time.
I was also surprised by how highly the EMH post was received, for a completely different reason – the fact that markets aren’t expecting AGI in the next few decades seems unbelievably obvious, even before we look at interest rates. If markets were expecting AGI, AI stocks would presumably be much further to the moon than they are now (at least relative to non-AI stocks), and market analysts would presumably (at least occasionally) cite the possibility of AGI as the reason why. But we weren’t seeing any of that, and we already knew from just general observation of the zeitgeist that, until a few months ago, the prospect of AGI was overwhelmingly not taken seriously outside of a few niche sub-communities and AI labs (how to address this reality has been a consistent, well-known hurdle within the AI safety community).
So I’m a little confused at what exactly judges thought was the value provided by the post – did they previously suspect that markets were taking AGI seriously, and this post significantly updated them towards thinking markets weren’t? Maybe instead judges thought that the post was valuable for some other reason unrelated to the main claim of “either reject EMH or reject AGI in the next few decades”, in which case I’d be curious to hear about what that reason is (e.g., if the post causes OP to borrow a bunch of money, that would be interesting to know).
Granted, it’s an interesting analysis, but that seems like a different question, and many of the other entries (including both those that did and didn’t win prizes) strike me as having advanced the discourse more, at least if we’re focusing on the main claims.
I think it’s important to verify theories that seem obvious by thinking about precise predictions the theories make. The AI and EMH post attempts to analyze precise predictions made by the theory that “the market doesn’t expect TAI soon”, and for that reason I think the post makes a valuable contribution.
That said, it is still unclear to me whether interest rates will actually rise as investors realize the potential for TAI. If news of TAI causes investors to become more optimistic about investment, potentially because of the promise of longer lifespans, the fear of missing out on extreme relative wealth, etc., that could easily cause a shift in the supply curve for loanable funds to the right, lowering the interest rate. This makes one of the central predictions in the post unreliable IMO, and that undermines the post’s thesis.
So, overall I agree with you that the market is not currently pricing in TAI, and like you I believe that for ordinary informal reasons, such as the fact that investors rarely talk about explosive growth in public. The post itself, however, while interesting, didn’t move my credences as much as the informal evidence.
I found your argument more interesting than the other “rebuttals.” Halperin et al.’s core argument is that there’s a disjunction between the EMH applying to AI and soonish TAI, and they suggest this as evidence against soonish TAI.
The other rebuttals gave evidence for one fork in this disjunction (that EMH does not apply to AI), but your argument, if correct, suggests that the disjunction might not be there in the first place.
Zvi—FWIW, your refutation of the winning essay on AI, interest rates, and the efficient market hypothesis (EMH) seemed very compelling, and I’m surprised that essay was taken seriously by the judges.
Global capital markets don’t even seem to have any idea how to value crypto protocols that might be moderately disruptive to fiat currencies and traditional finance institutions. Some traders think about these assets (or securities, or commodities, or whatever the SEC thinks they are, this week), but most don’t pay any attention to them. And even if most traders thought hard about crypto, there’s so much regulatory uncertainty about how they’ll end up being handled that it’s not even clear how traders could ‘price in’ issues such as how soon Gary Gensler will be replaced at the SEC.
Artificial Superintelligence seems vastly more disruptive than crypto, and much less salient (at least until this year) to most asset managers, bankers, traders, regulators, etc.
Yeah, I think the interest rates post is all but entirely refuted immediately by the obvious argument (the one in your rebuttal) and it both confuses me and slightly irritates me to see it regarded so highly.
If anything, I think a rebuttal should win the contest, because it would present what I guess is a nontrivial observation about how information-free the yield curve is in this context, and so would serve as a correction to people who frankly aren’t trading-literate and who would otherwise spend energy trying to infer things about TAI timelines from securities prices.
“Refuted” feels overly strong to me. The essay says that market participants don’t think TAGI is coming, and those market participants have strong financial incentive to be correct, which feels unambiguously correct to me. So either TAGI isn’t coming soon, or else a lot of people with a lot of money on the line are wrong. They might well be wrong, but their stance is certainly some form of evidence, and evidence in the direction of no TAGI. Certainly the evidence isn’t bulletproof, considering the recent mispricings of NVIDIA and other semi stocks.
In my own essay, I elaborated on the same point using prices set by more-informed insiders: e.g., valuations and hiring by Anthropic/DeepMind/etc., which also seem to imply that TAGI isn’t coming soon. If they have a 10% chance of capturing 10% of the value for 10 years of doubling the world economy, that’s like $10T. And yet investment expenditures and hiring and valuations are nowhere near that scale. The fact that Google has more people working on ads than TAGI implies that they think TAGI is far off. (Or, more accurately, that marginal investments would not accelerate TAGI timelines or market share.)
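(For concreteness, here’s a rough sketch of that arithmetic, using my own round numbers—e.g., an extra ~$100T/year of output from doubling the world economy—rather than anything from the original essay:)

```python
# Back-of-envelope expected value (illustrative round numbers, not a real valuation).
p_tagi = 0.10                    # assumed chance of achieving TAGI
value_share = 0.10               # assumed share of the created value captured
years = 10                       # assumed duration of the windfall
extra_output_per_year = 100e12   # doubling a ~$100T/year world economy (assumption)

expected_value = p_tagi * value_share * years * extra_output_per_year
print(f"${expected_value / 1e12:.0f}T")  # -> $10T
```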
It did not occur to me to enter the rebuttal into the contest, and I doubt anyone else entered one either. In hindsight, given that the original entry won, I regret not doing so.
Congrats to the winners, readers, and writers!
Two big surprises for me:
(1) It seems like 5⁄6 of the essays are about AI risk, and not TAGI by 2043. I thought there were going to be 3 winners on each topic, but perhaps that was never stated in the rules. Rereading, it just says there would be two 1st places, two 2nd places, and two 3rd places. Seems the judges were more interested in (or persuaded by) arguments on AI safety & alignment, rather than TAGI within 20 years. A bit disappointing for everyone who wrote on the second topic. If the judges were more interested in safety & alignment forecasting, that would have been nice to know ahead of time.
(2) I’m also surprised that the Dissolving AI Risk paper was chosen. (No disrespect intended; it was clearly a thoughtful piece.)
To me, it makes perfect sense to dissolve the Fermi paradox by pointing out that the expected # of alien civilizations is a very different quantity than the probability of 0 alien civilizations. It’s logically possible to have both a high expectation and a high probability of 0.
But it makes almost no sense to me to dissolve probabilities by factoring them into probabilities of probabilities, and then take the geometric mean of that distribution. Taking the geometric mean of subprobabilities feels like a sleight of hand to end up with a lower number than what you started with, with zero new information added in the process. I feel like I must have missed the main point, so I’ll reread the paper.
Edit: After re-reading, it makes more sense to me. The paper takes the geometric means of odds ratios in order to aggregate survey entries. It doesn’t take the geometric mean of probabilities, and it doesn’t slice up probabilities arbitrarily (as they are the distribution over surveyed forecasters).
Edit2: As Jaime says below, the greater error is assuming independence of each stage. The original discussion got quite nerd-sniped by the geometric averaging, which is a bit of a shame, as there’s a lot more to the piece to discuss and debate.
(I agree that geometric-mean-of-odds is an irrelevant statistic and ‘Dissolving’ AI Risk’s headline number should be the mean-of-probabilities, 9.7%. I think some commenters noticed that too.)
Question: Do you happen to understand what it means to take a geometric mean of probabilities? In re-reading the paper, I’m realizing I don’t understand the methodology at all. For example, if there is a 33% chance we live in a world with 0% probability of doom, a 33% chance we live in a world with 50% probability of doom, and a 33% chance we live in a world with 100% probability of doom… then the geometric mean is (0% x 50% x 100%)^(1/3) = 0%, right?
Edit: Apparently the paper took a geometric mean of odds ratios, not probabilities. But this still means that had a single surveyed person said 0%, the entire model would collapse to 0%, which is wrong on its face.
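To make the collapse concrete, here’s a toy sketch of my own (not the paper’s code or survey data) comparing aggregation rules when one forecaster answers exactly 0%:

```python
import numpy as np

forecasts = np.array([0.0, 0.5, 0.9])  # toy forecasts; one respondent says exactly 0%

arith_mean_probs = forecasts.mean()                          # ~0.47
geo_mean_probs = forecasts.prod() ** (1 / len(forecasts))    # exactly 0

odds = forecasts / (1 - forecasts)                           # 0% maps to odds of 0
geo_mean_odds = odds.prod() ** (1 / len(odds))               # still 0
geo_mean_odds_as_prob = geo_mean_odds / (1 + geo_mean_odds)  # so the aggregate is 0 too

print(arith_mean_probs, geo_mean_probs, geo_mean_odds_as_prob)
```

Whether you take the geometric mean of probabilities or of odds, a single 0% answer drags the aggregate all the way to zero; only the arithmetic mean of probabilities ignores it.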
Yeah, I agree; I think the geometric mean is degenerate unless your probability distribution quickly approaches density-0 around 0% and 100%. This is an intuition pump for why the geometric mean is the wrong statistic.
Also if you’re taking the geometric mean I think you should take it of the odds ratio (as the author does) rather than the probability; e.g. this makes probability-0 symmetric with probability-1.
[To be clear I haven’t read most of the post.]
I have gripes with the methodology of the article, but I don’t think highlighting the geometric mean of odds over the mean of probabilities is a major fault. The core problem is assuming independence over the predictions at each stage. The right move would have been to aggregate the total P(doom) of each forecaster using the geometric mean of odds (not that I think that asking random people and aggregating their beliefs like this is particularly strong evidence).
The intuition pump that if someone assigns a zero percent chance then the geomean aggregate breaks is flawed:
There is an equally compelling pump the other way around: the arithmetic mean of probabilities defers unduly to people assigning a high chance. A single dissenter among 10 experts can single-handedly force the aggregate to be no more than a factor of 10 below their preferred probability, no matter what the other nine say.
And surely if anyone is assigning a zero percent chance to something, you can safely assume they are not taking the situation seriously and ignore them.
Ultimately, we can theorize all we want, but as a matter of fact the best performance when predicting complex events is achieved by taking the geometric mean of odds, in terms of both log loss and Brier scores. Without more compelling evidence or a very clear theoretical reason that distinguishes between the contexts, it seems weird to argue that we should treat AI risk differently.
And if you are still worried about dissenters skewing the predictions, one common strategy is to winsorize, by clipping the predictions to the 5th and 95th percentiles, for example.
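As a minimal sketch of what that could look like (a hypothetical helper of my own, not the XPT’s or the paper’s actual procedure):

```python
import numpy as np

def winsorized_geo_mean_odds(probs, lower_pct=5, upper_pct=95):
    """Aggregate forecasts via geometric mean of odds, after clipping extreme answers."""
    p = np.asarray(probs, dtype=float)
    lo, hi = np.percentile(p, [lower_pct, upper_pct])
    p = np.clip(p, lo, hi)                   # winsorize: pull outliers toward the pack
    odds = p / (1 - p)
    agg_odds = np.exp(np.log(odds).mean())   # geometric mean of the odds
    return agg_odds / (1 + agg_odds)         # convert back to a probability

# A 0% answer gets clipped to the sample's 5th percentile, so it no longer forces the aggregate to 0.
print(winsorized_geo_mean_odds([0.0, 0.02, 0.05, 0.10, 0.50]))  # ~0.05
```

With clipping, a lone 0% (or 100%) no longer dominates the aggregate, while the geometric mean of odds still discounts extreme answers on both ends.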
Congratulations to the winners.
I haven’t engaged deeply with any of the winning posts like the judges have, but I engaged shallowly with 3–4 when they were written. I thought they were methodologically doomed (‘Dissolving’ AI Risk) or constituted very weak evidence even if they were basically right (AGI and the EMH and especially Reference Class-Based Priors). (I apologize for this criticism-without-justification, but explaining details is not worth it and probably the comments on those posts do a fine job.)
Normally I wouldn’t say this. But OP is high-status and I worry people will defer to its judgment about Good Posts and then defer to these posts (despite the disclaimer that the judges disagreed with some) or add them to reading lists or something, which I think would be mostly bad.
(To avoid being purely negative: I think excellent quantitative work on AI forecasting that directly addresses the big question—when will powerful AI appear—includes bioanchors, the Davidson takeoff report, and Fun with +12 OOMs of Compute.)
[edited: last sentence for explicitness of my point]
I think this worry should be more a critique of the EA community writ-large for being overly deferential than for OP holding a contest to elicit critiques of its views and then following through with that in their own admittedly subjective criteria. OP themselves note in the post that people shouldn’t take this to be OP’s institutional tastes.
No, I don’t have a take on deference in EA. I meant: post contests generally give you evidence about which posts to pay attention to, especially if they’re run by OP. I am sharing that I have reason to believe that (some of) these winners are less worth-paying-attention-to than you’d expect on priors.
(And normally this reason would be very weak because the judges engaged much more deeply than I did, but my concerns with the posts I engaged with seem unlikely to dissolve upon deeper engagement.)
Congrats to the winners! It’s interesting to see how surprised people are. Of these six, I think only David Wheaton on deceptive alignment was really on my radar. Some other highlights that didn’t get much discussion:
Marius Hobbhahn’s Disagreements with Bio Anchors that lead to shorter timelines (1 comment)
In the author’s words: ‘I, therefore, think of the following post as “if bio anchors influence your timelines, then you should really consider these arguments and, as a consequence, put more weight on short timelines if you agree with them”. I think there are important considerations that are hard to model with bio anchors and therefore also added my personal timelines in the table below for reference.’
Even though Bio Anchors doesn’t particularly influence my timelines, I find Hobbhahn’s thoughtful and systematic engagement worthwhile.
Kiel Brennan-Marquez’s Cooperation, Avoidance, and Indifference: Alternate Futures for Misaligned AGI (1 comment)
Maybe this question is seen as elementary or settled by “you are made of atoms”, but even so, I think other equilibria could be better explored. This essay is clear and concise, has some novel (to me) points, and could serve as a signpost for further exploration.
Matt Beard’s AI Doom and David Hume: A Defence of Empiricism in AI Safety (6 comments)
On reasoning described by Tom Davidson: “This is significantly closer to Descartes’ meditative contemplation than Hume’s empiricist critique of the limits of reason. Davidson literally describes someone thinking in isolation based on limited data. The assumption is that knowledge of future AI capabilities can be usefully derived through reason, which I think we should challenge.”
Beard doesn’t mean to pick on Davidson, but I really think his methods deserve more skepticism. Even before specific critiques, I’m generally pessimistic about how informative models like Davidson’s can be. I was also very surprised by some of his comments on the 80,000 Hours podcast (including those highlighted by Beard). Otherwise, Beard’s recommendations are pretty vague but agreeable.
Jason—thanks for the news about the winning essays.
If appropriate, I would appreciate any reactions the judges had to my essay about a moral backlash against the AI industry slowing progress towards AGI. I’m working on refining the argument, so any feedback would be useful (even if only communicated privately, e.g. by email).
Ditto re my entries.
We’re honored to have received a prize for our work on the Existential Risk Persuasion Tournament (XPT)!
We’re in the process of extending this work and better connecting forecasts to policy decisionmaking. If you’re interested in working on this and our other projects, we wanted to mention that we’re currently hiring for full-time Research Analysts and Data Analysts. We’re also hiring for part-time Research Assistants.
Facilitating useful engagement seems like a fine judging criterion, but was there any engagement or rebuttal to the winning pieces that the judges found particularly compelling? It seems worth mentioning such commentary if so.
Neither of the two first-prize pieces significantly updated my own views, and (to my eye) both look sufficiently rebutted that observers taking a more outside view might similarly be hesitant to update on any AI x-risk claims without taking the commentary into account.
On the EMH piece, I think Zvi’s post is a good rebuttal on its own and a good summary of some other rebuttals.
On the Evolution piece, lots of the top LW comments raise good points. My own view is that the piece is a decent argument that AI systems produced by current training methods are unlikely to undergo a sharp left turn (SLT). But the actual SLT argument applies to systems in the human-level regime and above; current training methods do not result in systems anywhere near human-level in the relevant sense. So even if true, the claim that current methods are disanalogous to evolution isn’t directly relevant to the x-risk question, unless you already accept that current methods and trends related to below-human-level AI will scale to human-level AI and beyond in predictable ways. But that’s exactly what the actual SLT argument is intended to argue against!
Speaking as the author of Evolution provides no evidence for the sharp left turn, I find your reaction confusing because the entire point of the piece is to consider rapid capabilities gains from sources other than SGD. Specifically, it consists of two parts:
1. It argues that human evolution provides no evidence for spikiness in AI capabilities gains, because the human spike in capabilities was due to details specific to human evolution which do not appear in the current AI paradigm (or plausible future paradigms).
2. It considers two scenarios for AI-specific sudden capabilities gains (neither due to SGD directly, and both of which would likely involve human or higher levels of AI capabilities), and argues that they’re manageable from an alignment perspective.
On the first point, my objection is that the human regime is special (because human-level systems are capable of self-reflection, deception, etc.) regardless of which methods ultimately produce systems in that regime, or how “spiky” they are.
A small, relatively gradual jump in the human-level regime is plausibly more than enough to enable an AI to outsmart / hide / deceive humans, via e.g. a few key insights gleaned from reading a corpus of neuroscience, psychology, and computer security papers, over the course of a few hours of wall clock time.
The second point is exactly what I’m saying is unsupported, unless you already accept the SLT argument as untrue. You say in the post you don’t expect catastrophic interference between current alignment methods, but you don’t consider that a human-level AI will be capable of reflecting on those methods (and their actual implementation, which might be buggy).
Similarly, elsewhere in the piece you say:
And
But again, the actual SLT argument is not about “extreme sharpness” in capability gains. It’s an argument which applies to the human-level regime and above, so we can’t already be past it no matter what frame you use. The version of the SLT argument you argue against is a strawman, which is what my original LW comment was pointing out.
I think readers can see this for themselves if they just re-read the SLT post carefully, particularly footnotes 3-5, and then re-read the parts of your post where you talk about it.
[edit: I also responded further on LW here.]
On advice of Manifold I read Quintin’s entry and wrote up a response on LessWrong. I would be curious to hear from OpenPhil and others if and how they found this useful at all. Unless I get unexpectedly good feedback on that, I do not anticipate engaging with the other four.