Update: I, Nick Beckstead, no longer work at the Future Fund and am writing this update purely in a personal capacity. Since the Future Fund team has resigned and FTX has filed for bankruptcy, it now seems very unlikely that these prizes will be paid out. I’m very sad about the disruption that this may cause to contest participants.
I would encourage participants who were working on entries for this prize competition to save their work and submit it to Open Philanthropy’s own AI Worldview Contest in 2023.
Today we are announcing a competition with prizes ranging from $15k to $1.5M for work that informs the Future Fund’s fundamental assumptions about the future of AI, or is informative to a panel of superforecaster judges selected by Good Judgment Inc. These prizes will be open for three months—until Dec 23—after which we may change or discontinue them at our discretion. We have two reasons for launching these prizes.
First, we hope to expose our assumptions about the future of AI to intense external scrutiny and improve them. We think artificial intelligence (AI) is the development most likely to dramatically alter the trajectory of humanity this century, and it is consequently one of our top funding priorities. Yet our philanthropic interest in AI is fundamentally dependent on a number of very difficult judgment calls, which we think have been inadequately scrutinized by others.
As a result, we think it’s really possible that:
all of this AI stuff is a misguided sideshow,
we should be even more focused on AI, or
a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem.
If any of those three options is right—and we strongly suspect at least one of them is—we want to learn about it as quickly as possible because it would change how we allocate hundreds of millions of dollars (or more) and help us better serve our mission of improving humanity’s long-term prospects.
Second, we are aiming to do bold and decisive tests of prize-based philanthropy, as part of our more general aim of testing highly scalable approaches to funding. We think these prizes contribute to that work. If these prizes work, it will be a large update in favor of this approach being capable of surfacing valuable knowledge that could affect our prioritization. If they don’t work, that could be an update against this approach surfacing such knowledge (depending on how it plays out).
The rest of this post will:
Explain the beliefs that, if altered, would dramatically affect our approach to grantmaking
Describe the conditions under which our prizes will pay out
Describe in basic terms how we arrived at our beliefs and cover other clarifications
Prize conditions
On our areas of interest page, we introduce our core concerns about AI as follows:
We think artificial intelligence (AI) is the development most likely to dramatically alter the trajectory of humanity this century. AI is already posing serious challenges: transparency, interpretability, algorithmic bias, and robustness, to name just a few. Before too long, advanced AI could automate the process of scientific and technological discovery, leading to economic growth rates well over 10% per year (see Aghion et al 2017, this post, and Davidson 2021).
As a result, our world could soon look radically different. With the help of advanced AI, we could make enormous progress toward ending global poverty, animal suffering, early death and debilitating disease. But two formidable new problems for humanity could also arise:
Loss of control to AI systems

Advanced AI systems might acquire undesirable objectives and pursue power in unintended ways, causing humans to lose all or most of their influence over the future.

Concentration of power

Actors with an edge in advanced AI technology could acquire massive power and influence; if they misuse this technology, they could inflict lasting damage on humanity’s long-term future.

For more on these problems, we recommend Holden Karnofsky’s “Most Important Century,” Nick Bostrom’s Superintelligence, and Joseph Carlsmith’s “Is power-seeking AI an existential risk?”.
Here is a table identifying various questions about these scenarios that we believe are central, our current position on the question (for the sake of concreteness), and alternative positions that would significantly alter the Future Fund’s thinking about the future of AI[1][2]:
| Proposition | Current position | Lower prize threshold | Upper prize threshold |
| --- | --- | --- | --- |
| “P(misalignment x-risk\|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI | 15% | 7% | 35% |
| AGI will be developed by January 1, 2043 | 20% | 10% | 45% |
| AGI will be developed by January 1, 2100 | 60% | 30% | N/A |
The Future Fund will award a prize of $500k to anyone who publishes analysis that moves these probabilities to the lower or upper prize threshold.[3] To qualify, please publish your work (or publish a post linking to it) on the Effective Altruism Forum, the AI Alignment Forum, or LessWrong with a “Future Fund worldview prize” tag. You can also participate in the contest by publishing your submission somewhere else (e.g. arXiv or your blog) and filling out this submission form. We will then linkpost/crosspost your submission on the EA Forum.
We will award larger prizes for larger changes to these probabilities, as follows:
$1.5M for moving “P(misalignment x-risk|AGI)” below 3% or above 75%
$1.5M for moving “AGI will be developed by January 1, 2043” below 3% or above 75%
We will award prizes of intermediate size for intermediate updates at our discretion.
We are also offering:
A $200k prize for publishing any significant original analysis[4] which we consider the new canonical reference on any one of the above questions, even if it does not move our current position beyond a relevant threshold. Past works that would have qualified for this prize include: Yudkowsky 2008, Superintelligence, Cotra 2020, Carlsmith 2021, and Karnofsky’s Most Important Century series. (While the above sources are lengthy, we’d prefer to offer a prize for a brief but persuasive argument.)
A $200k prize for publishing any analysis which we consider the canonical critique of the current position highlighted above on any of the above questions, even if it does not move our position beyond a relevant threshold. Past works that might have qualified for this prize include: Hanson 2011, Karnofsky 2012, and Garfinkel 2021.
At a minimum, we will award $50k to the three published analyses that most inform the Future Fund’s overall perspective on these issues, and $15k for the next 3-10 most promising contributions to the prize competition. (I.e., we will award a minimum of 6 prizes. If some of the larger prizes are claimed, we may accordingly award fewer of these prizes.)
As a check/balance on our reasonableness as judges, a panel of superforecaster judges will independently review a subset of highly upvoted/nominated contest entries with the aim of identifying any contestant who did not receive a prize, but would have if the superforecasters were running the contest themselves (e.g., an entrant that sufficiently shifted the superforecasters’ credences).
For the $500k-$1.5M prizes, if the superforecasters think an entrant deserved a prize but we didn’t award one, we will award $200k (or more) for up to one entrant in each category (existential risk conditional on AGI by 2070, AGI by 2043, AGI by 2100), upon recommendation of the superforecaster judge panel.
For the $15k-200k prizes, if the superforecasters think an entrant deserved a prize but we didn’t award one, we will award additional prizes upon recommendation of the superforecaster judge panel.
The superforecaster judges will be selected by Good Judgment Inc. and will render their verdicts autonomously. While superforecasters have only been demonstrated to have superior prediction track records for shorter-term events, we think of them as a lay jury of smart, calibrated, impartial people.
Our hope is that potential applicants who are confident in the strength of their arguments, but skeptical of our ability to judge impartially, will nonetheless believe that the superforecaster jury will plausibly judge their arguments fairly. After all, entrants could reasonably doubt that people who have spent tens of millions of dollars funding this area would be willing to acknowledge it if that turned out to be a mistake.
Details and fine print
Only original work published after our prize is announced is eligible to win.
We do not plan to read everything written with the aim of claiming these prizes. We plan to rely in part on the judgment of other researchers and people we trust when deciding what to seriously engage with. We also do not plan to explain in individual cases why we did or did not engage seriously.
If you have questions about the prizes, please ask them as comments on this post. We do not plan to respond to individual questions over email.
All prizes will be awarded at the final discretion of the Future Fund. Our published decisions will be final and not subject to appeal. We also won’t be able to explain in individual cases why we did not offer a prize.
Prizes will be awarded equally to coauthors unless the post indicates some other split. At our discretion, the Future Fund may provide partial credit across different entries if they together trigger a prize condition.
If a single person does research leading to multiple updates, Future Fund may—at its discretion—award the single largest prize for which the analysis is eligible (rather than the sum of all such prizes).
We will not offer awards to any analysis that we believe was net negative to publish due to information hazards, even if it moves our probabilities significantly and is otherwise excellent.
At most one prize will be awarded for each of the largest prize categories ($500k and $1.5M). (If e.g. two works convince us to assign < 3% subjective probability to AGI being developed in the next 20 years, we’ll award the prize to the most convincing piece (or split it in case of a tie).)
For the first two weeks after it is announced—until October 7—the rules and conditions of the prize competition may be changed at the discretion of the Future Fund. After that, we reserve the right to clarify the conditions of the prizes wherever they are unclear or have wacky unintended results.
Information hazards
Please be careful not to publish information that would be net harmful to publish. We think people should not publish very concrete proposals for how to build AGI (if they know of them), or things that are too close to that.
If you are worried publishing your analysis would be net harmful due to information hazards, we encourage you to a) write your draft and then b) ask about this using the “REQUEST FEEDBACK” feature on the Effective Altruism Forum or LessWrong pages (it appears on the draft post page, just before you would normally publish a post).
The moderators have agreed to help with this.
If you feel strongly that your analysis should not be made public due to information hazards, you may submit your prize entry through this form.
Some clarifications and answers to anticipated questions
What do you mean by AGI?
Imagine a world where cheap AI systems are fully substitutable for human labor. E.g., for any human who can do any job, there is a computer program (not necessarily the same one every time) that can do the same job for $25/hr or less. This includes entirely AI-run companies, with AI managers and AI workers and everything being done by AIs.
How large of an economic transformation would follow? Our guess is that it would be pretty large (see Aghion et al 2017, this post, and Davidson 2021), but—to the extent it is relevant—we want people competing for this prize to make whatever assumptions seem right to them.
For purposes of our definitions, we’ll count it as AGI being developed if there are AI systems that power a comparably profound transformation (in economic terms or otherwise) as would be achieved in such a world. Some caveats/clarifications worth noticing:
A comparably large economic transformation could be achieved even if the AI systems couldn’t substitute for literally 100% of jobs, including providing emotional support. E.g., Karnofsky’s notion of PASTA would probably count (though that is an empirical question), and possibly some other things would count as well.
If weird enough things happened, the metric of GWP might stop being indicative in the way it normally is, so we want to make sure people are thinking about the overall level of weirdness rather than being attached to a specific measure or observation. E.g., causing human extinction or drastically limiting humanity’s future potential may not show up as rapid GDP growth, but automatically counts for the purposes of this definition.
Why are you starting with such large prizes?
We really want to get closer to the truth on these issues quickly. Better answers to these questions could prevent us from wasting hundreds of millions of dollars (or more) and years of effort on our part.
We could start with smaller prizes, but we’re interested in running bold and decisive tests of prizes as a philanthropic mechanism.
A further consideration is that sometimes people argue that all of this futurist speculation about AI is really dumb, and that its errors could be readily explained by experts who can’t be bothered to seriously engage with these questions. These prizes will hopefully test whether this theory is true.
Can you say more about why you hold the views that you do on these issues, and what might move you?
I (Nick Beckstead) will answer these questions on my own behalf without speaking for the Future Fund as a whole.
For “Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI.” I am pretty sympathetic to the analysis of Joe Carlsmith here. I think Joe’s estimates of the relevant probabilities are pretty reasonable (though the bottom line is perhaps somewhat low) and if someone convinced me that the probabilities on the premises in his argument should be much higher or lower I’d probably update. There are a number of reviews of Joe Carlsmith’s work that were helpful to varying degrees but would not have won large prizes in this competition.
For assigning odds to AGI being developed in the next 20 years, I am blending a number of intuitive models to arrive at this estimate. They are mostly driven by a few high-level considerations:
I think computers will eventually be able to do things brains can do. I’ve believed this for a long time, but if I were going to point to one article as a reference point I’d choose Carlsmith 2020.
Priors that seem natural to me (“beta-geometric distributions”) start us out with a non-trivial probability of developing AGI in the next 20 years, before considering more detailed models. I’ve also believed this for a long time, but I think Davidson 2021’s version is the best, and he gives 8% to AGI by 2036 through this method as a central estimate.
I assign substantial probability to continued hardware progress, algorithmic progress, and other progress that fuels AGI development over the coming decades. I’m less sure this will continue many decades into the future, so I assign somewhat more probability to AGI in sooner decades than later decades.
Under these conditions, I think we’ll pass some limits—e.g. approaching hardware that’s getting close to as good as we’re ever going to get—and develop AGI if we’re ever going to develop it.
I’m extremely uncertain about the hardware requirements for AGI (at the point where it’s actually developed by humans), to a point where my position is roughly “I dunno, log uniform distribution over anything from the amount of compute used by the brain to a few orders of magnitude less than evolution.” Cotra 2020—which considers this question much more deeply—has a similar bottom line on this. (Though her updated timelines are shorter.)
I’m impressed by the progress in deep learning to the point where I don’t think we can rule out AGI even in the next 5-10 years, but I’m not impressed enough by any positive argument for such short timelines to move dramatically away from any of the above models.
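To make the “beta-geometric” intuition concrete, here is a toy sketch. This is my own illustration, not Davidson 2021’s actual calibration (his method chooses the prior much more carefully and arrives at different numbers): treat each calendar year as a trial whose unknown per-year chance of producing AGI has a Beta prior, update on the failure years observed so far, and ask for the probability of a first success within the next n years.

```python
# Toy beta-geometric sketch (illustrative priors, NOT Davidson 2021's
# calibration): each year is a Bernoulli trial with unknown success
# probability q ~ Beta(a, b); each observed failure year shifts the
# posterior from Beta(a, b + f) to Beta(a, b + f + 1), and so on.
def p_success_within(n, a=1.0, b=1.0, failures=0):
    """Posterior predictive P(at least one success in the next n trials)."""
    p_none = 1.0
    for i in range(n):
        # After `failures + i` observed failures, the next trial fails with
        # posterior predictive probability (b + f + i) / (a + b + f + i).
        p_none *= (b + failures + i) / (a + b + failures + i)
    return 1.0 - p_none

# E.g. a uniform Beta(1, 1) prior and ~66 failure years since the field
# began (1956) still leaves a non-trivial chance of AGI within 20 years:
print(round(p_success_within(20, a=1.0, b=1.0, failures=66), 3))  # ≈ 0.23 under these toy assumptions
```

The point of the sketch is only that natural-seeming priors of this family assign non-trivial probability to AGI in the coming decades before any detailed inside-view modeling.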
(I’m heavily citing reports from Open Philanthropy here because a) I think they did great work and b) I’m familiar with it. I also recommend this piece by Holden Karnofsky, which brings a lot of this work—and other work—together.)
In short, you can model me as having a roughly trapezoidal probability density function over developing AGI from now to 2100, with some long tail extending beyond that point. There is about 2x as much weight at the beginning of the distribution as there is at the end of the century. The long tail includes a) insufficient data/hardware/humans not smart enough to solve it yet, b) technological stagnation/hardware stagnation, and c) reasons it’s hard that I haven’t thought of. The microfoundation of the probability density function could be: a) exponentially increasing inputs to AGI, b) log returns to AGI development on the key inputs, c) pricing in some expected slowdown in the exponentially increasing inputs over time, and d) slow updating toward increased difficulty of the problem as time goes on, but I stand by the distribution more than the microfoundation.
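As a rough illustration of that trapezoidal shape (a sketch with assumed endpoints, not the author’s actual model): a piecewise-linear density from 2022 to 2100 carrying the stated 60% mass, with about twice as much density at the start as at the end, already lands close to the stated 20% on AGI by 2043.

```python
# Sketch (not the Future Fund's actual model): a trapezoidal density over
# 2022-2100 carrying 60% total mass, with ~2x as much density at the
# start as at the end; the remaining 40% sits in a tail beyond 2100.
T0, T1 = 2022.0, 2100.0
MASS = 0.60            # stated P(AGI by 2100)
RATIO = 2.0            # density at T0 relative to density at T1

f1 = MASS / ((RATIO + 1.0) / 2.0 * (T1 - T0))  # density at T1
f0 = RATIO * f1                                 # density at T0

def density(t):
    """Linearly interpolated density between f0 and f1."""
    return f0 + (f1 - f0) * (t - T0) / (T1 - T0)

def prob_by(year):
    """P(AGI by `year`) = area of the trapezoid from T0 to `year`."""
    return (density(T0) + density(year)) / 2.0 * (year - T0)

print(round(prob_by(2043), 3))  # close to the stated 20% for 2043
print(round(prob_by(2100), 3))  # recovers the 60% by construction
```

So the three table probabilities (20%, 60%, and the tail) are roughly mutually consistent with a single simple shape, which is all the paragraph above claims.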
What do you think could substantially alter your views on these issues?
We don’t know. Most of all we’d just like to see good arguments for specific quantitative answers to the stated questions. Some other thoughts:
We like it when people state cleanly summarizable, deductively valid arguments and carefully investigate the premises leading to the conclusion (analytic philosopher style). See e.g. Carlsmith 2021.
We also like it when people quantify their subjective probabilities explicitly. See e.g. Superforecasting by Phil Tetlock.
We like a lot of the features described here by Luke Muehlhauser, though they are not necessary to be persuasive.
We like it when people represent opposing points of view charitably, and avoid appeals to authority.
We think it could be pretty persuasive to us if some (potentially small) group of relevant technical experts arrived at and explained quite different conclusions. It would be more likely to be persuasive if they showed signs of comfort thinking in terms of subjective probability and calibration. Ideally they would clearly explain the errors in the best arguments cited in this post.
These are suggestions for how to be more likely to win the prize, but not requirements or guarantees.
Who do we have to convince in order to claim the prize?
Final decisions will be made at the discretion of the Future Fund. We plan to rely in part on the judgment of other researchers and people we trust when deciding what to seriously engage with. Probably, someone winning a large prize looks like them publishing their arguments, those arguments getting a lot of positive attention / being flagged to us by people we trust, us seriously engaging with those arguments (probably including talking to the authors), and then changing our minds.
Are these statistically significant probabilities grounded in detailed published models that are confirmed by strong empirical regularities that you’re really confident in?
No. They are what we would consider fair betting odds.
This is a consequence of the conception of subjective probability that we are working with. As stated above in a footnote: “We will pose many of these beliefs in terms of subjective probabilities, which represent betting odds that we consider fair in the sense that we’d be roughly indifferent between betting in favor of the relevant propositions at those odds or betting against them.” For more on this conception of probability I recommend The Logic of Decision by Richard Jeffrey or this SEP entry.
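As a small worked example of the “fair betting odds” reading of these probabilities (illustrative numbers only, not anything from the post): at subjective probability p, a bet paying (1 − p)/p units per unit staked has zero expected value, so one is indifferent between taking either side.

```python
# Illustration of "fair betting odds": at subjective probability p, the
# net payout per unit staked that makes the bet zero-EV is (1 - p) / p.
def fair_payout(p):
    """Net payout per unit staked that is zero-EV at probability p."""
    return (1.0 - p) / p

p = 0.15  # e.g. the stated P(misalignment x-risk | AGI)
payout = fair_payout(p)
expected_value = p * payout - (1.0 - p) * 1.0  # win payout, or lose the stake
print(round(payout, 2))               # ≈ 5.67 units won per unit staked
print(abs(expected_value) < 1e-9)     # True: the bet is effectively zero-EV
```

At those odds, betting for or against the proposition has the same expected value, which is exactly the sense of “fair” used in the footnote.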
Applicants need not agree with or use our same conception of probability, but hopefully these paragraphs help them understand where we are coming from.
Why do the prizes only get awarded for large probability changes?
We think that large probability changes would have much clearer consequences for our work, and be much easier to recognize. We also think that aiming for changes of this size is less common and has higher expected upside, so we want to attract attention to it.
Why is the Future Fund judging this prize competition itself?
Our intent in judging the prize ourselves is not to suggest that our judgments should be treated as correct / authoritative by others. Instead, we’re focused on our own probabilities because we think that is what will help us to learn as much as possible.
Additional terms and conditions
Employees of FTX Foundation and contest organizers are not eligible to win prizes.
Entrants and Winners must be over the age of 18, or have parental consent.
By entering the contest, entrants agree to the Terms & Conditions.
All taxes are the responsibility of the winners.
The legality of accepting the prize in their country is the responsibility of the winners. Sponsor may confirm the legality of sending prize money to winners who are residents of countries outside of the United States.
Winners will be notified in a future blogpost.
Winners grant to Sponsor the right to use their name and likeness for any purpose arising out of or related to the contest. Winners also grant to Sponsor a non-exclusive royalty-free license to reprint, publish and/or use the entry for any purpose arising out of or related to the contest, including linking to or re-publishing the work.
Entrants warrant that they are eligible to receive the prize money from any relevant employer or from a contract standpoint.
Entrants agree that FTX Philanthropy and its affiliates shall not be liable to entrants for any type of damages that arise out of or are related to the contest and/or the prizes.
By submitting an entry, entrant represents and warrants that, consistent with the terms of the Terms and Conditions: (a) the entry is entrant’s original work; (b) entrant owns any copyright applicable to the entry; (c) the entry does not violate, in whole or in part, any existing copyright, trademark, patent or any other intellectual property right of any other person, organization or entity; (d) entrant has confirmed and is unaware of any contractual obligations entrant has which may be inconsistent with these Terms and Conditions and the rights entrant is required to have in the entry, including but not limited to any prohibitions, obligations or limitations arising from any current or former employment arrangement entrant may have; (e) entrant is not disclosing the confidential, trade secret or proprietary information of any other person or entity, including any obligation entrant may have in connection arising from any current or former employment, without authorization or a license; and (f) entrant has full power and all legal rights to submit an entry in full compliance with these Terms and Conditions.
- ^
We will pose many of these beliefs in terms of subjective probabilities, which represent betting odds that we consider fair in the sense that we’d be roughly indifferent between betting in favor of the relevant propositions at those odds or betting against them.
- ^
For the sake of definiteness, these are Nick Beckstead’s subjective probabilities, and they don’t necessarily represent the Future Fund team as a whole or its funders.
- ^
It might be argued that this makes the prize encourage people to have views different from those presented here. This seems hard to avoid, since we are looking for information that changes our decisions, which requires changing our beliefs. People who hold views similar to ours can, however, win the $200k canonical reference prize.
- ^
A slight update/improvement on something that would have won the prize in the past (e.g. this update by Ajeya Cotra) does not automatically qualify due to being better than the existing canonical reference. Roughly speaking, the update would need to be sufficiently large that the new content would be prize-worthy on its own.
Do you believe some statement of this form?
“FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]”
E.g., if only they had…
Allowed people to publish not on EA Forum / LessWrong / Alignment Forum
Increased the prize schedule to X
Increased the window of the prize to size Y
Advertised the prize using method Z
Chosen the following judges instead
Explained X aspect of their views better
Even better would be a statement of the form:
“I personally would compete in this prize competition, but only if...”
If you think one of these statements or some other is true, please tell me what it is! I’d love to hear your pre-mortems, and fix the things I can (when sufficiently compelling and simple) so that we can learn as much as possible from this competition!
I also think predictions of this form will help with our learning, even if we don’t have time/energy to implement the changes in question.
I don’t have anything great, but the best thing I could come up with was definitely “I feel most stuck because I don’t know what your cruxes are”.
I started writing a case for why I think AI X-Risk is high, but I really didn’t know whether the things I was writing were going to be hitting at your biggest uncertainties. My sense is you probably read most of the same arguments that I have, so our difference in final opinion is probably generated by some other belief that you have that I don’t, and I don’t really know how to address that preemptively.
I might give it a try anyways, and this doesn’t feel like a defeater, but in this space it’s the biggest thing that came to mind.
Thanks! The part of the post that was supposed to be most responsive to this on size of AI x-risk was this:
I think explanations of how Joe’s probabilities should be different would help. Alternatively, an explanation of why some other set of propositions was relevant (with probabilities attached and mapped to a conclusion) could help.
I think it’s kinda weird and unproductive to focus a very large prize on things that would change a single person’s views, rather than be robustly persuasive to many people.
E.g. does this imply that you personally control all funding of the FF? (I assume you don’t, but then it’d make sense to try to convince all FF managers, trustees etc.)
FWIW, I would prefer a post on “what actually drives your probabilities” over a “what are the reasons that you think will be most convincing to others”.
...if they had explained why their views were not moved by the expert reviews OpenPhil has already solicited.
In “AI Timelines: Where the Arguments, and the ‘Experts,’ Stand,” Karnofsky writes:
The footnote text reads, in part:
Many of these reviewers disagree strongly with the reports under review.
Davidson 2021 on semi-informative priors received three reviews.
By my judgment, all three made strong negative assessments, in the sense (among others) that if one agreed with the review, one would not use the report’s reasoning to inform decision-making in the manner advocated by Karnofsky (and by Beckstead).
From Hajek and Strasser’s review:
From Hanson’s review:
From Halpern’s review:
Davidson 2021 on explosive growth received many reviews; I’ll focus on the five reviewers who read the final version.
Two of the reviewers found little to disagree with. These were Leopold Aschenbrenner (a Future Fund researcher) and Ege Erdil (a Metaculus forecaster).
The other three reviewers were academic economists specializing in growth and/or automation. Two of them made strong negative assessments.
From Ben Jones’ review:
From Dietrich Vollrath’s review:
The third economist, Paul Gaggl, agreed with the report about the possibility of high GWP growth but raised doubts as to how long it could be sustained. (How much this matters depends on what question we’re asking; “a few decades” of 30% GWP growth is not a permanent new paradigm, but it is certainly a big “transformation.”)
Reviews of Cotra (2020) on Biological Anchors were mostly less critical than the above.
I expect that some experts would be much more likely to spend time and effort on the contest if
They had clearer evidence that the Future Fund was amenable to persuasion at all.
E.g. examples of somewhat-analogous cases in which a critical review did change the opinion of someone currently at the Future Fund (perhaps before the Future Fund existed).
They were told why the specific critical reviews discussed above did not have significant impact on the Future Fund’s views.
This would help steer them toward critiques likely to make an impact, mitigate the sense that entrants are “shooting in the dark,” and move writing-for-the-contest outside of a reference class where all past attempts have failed.
These considerations seem especially relevant for the “dark matter” experts hypothesized in this post and Karnofsky’s, who “find the whole thing so silly that they’re not bothering to engage.” These people are unusually likely to have a low opinion of the Future Fund’s overall epistemics (point 1), and they are also likely to disagree with the Fund’s reasoning along a relatively large number of axes, so that locating a crux becomes more of a problem (point 2).
Finally: I, personally would be more likely to submit to the contest if I had a clearer sense where the cruxes were, and why past criticisms have failed to stick. (For clarity, I don’t consider myself an “expert” in any relevant sense.)
While I don’t “find the whole thing so silly I don’t bother to engage,” I have relatively strong methodological objections to some of the OpenPhil reports cited here. There is a large inferential gap between me and anyone who finds these reports prima facie convincing. Given the knowledge that someone does find them prima facie convincing, and little else, it’s hard to know where to begin in trying to close that gap.
Even if I had better guidance, the size of the gap increases the effort required and decreases my expected probability of success, and so it makes me less likely to contribute. This dynamic seems like a source of potential bias in the distribution of the responses, though I don’t have any great ideas for what to do about it.
I included responses to each review, explaining my reactions to it. What kind of additional explanation were you hoping for?
For Hajek & Strasser’s and Halpern’s reviews, I don’t think “strong negative assessment” is supported by your quotes. The quotes focus on things like ‘the reported numbers are too precise’ and ‘we should use more than a single probability measure’ rather than whether the estimate is too high or too low overall or whether we should be worrying more vs less about TAI. I also think the reviews are more positive overall than you imply, e.g. Halpern’s review says “This seems to be the most serious attempt to estimate when AGI will be developed that I’ve seen.”
I agree that these two reviewers assign much lower probabilities to explosive growth than I do (I explain why I continue to disagree with them in my responses to their reviews). Again though, I think these reviews are more positive overall than you imply, e.g. Jones states that the report “is balanced, engaging a wide set of viewpoints and acknowledging debates and uncertainties… is also admirably clear in its arguments and in digesting the literature… engages key ideas in a transparent way, integrating perspectives and developing its analysis clearly and coherently.” This is important as it helps us move from “maybe we’re completely missing a big consideration” to “some experts continue to disagree for certain reasons, but we have a solid understanding of the relevant considerations and can hold our own in a disagreement”.
Wow, thanks for this well written summary of expert reviews that I didn’t know existed! Strongly upvoted.
I agree that finding the cruxes of disagreement is important, but I don’t think any of the critical quotes you present above are that strong. The reviews of semi-informative priors talk about error bars and precision (i.e. they critique the model), but don’t actually give different answers. On explosive growth, Jones talks about the conclusion being contrary to his “intuitions”, and acknowledges that “[his] views may prove wrong”. Vollrath mentions “output and demand”, but then talks about human productivity when regarding outputs, and admits that AI could create new in-demand products. If these are the best existing sources for lowering the Future Fund’s probabilities, then I think someone should be able to do better.
On the other hand, I think that the real probabilities are higher, and am confused as to why the Future Fund haven’t already updated to higher probabilities, given some of the writing already out there. I give a speculative reason here.
Weakly downvoting due to over-strong claims; the evidence doesn’t fully support your view. This is weak evidence against AGI claims, but the claims in this comment are too strong.
Quoting Greg Colbourn:
I attach less than 50% credence to this belief, but probably higher than to the existing alternative hypotheses:
Give people 6 months or a year to submit to the contest, rather than 3 months.
I think forming coherent worldviews takes a long time, most people have day jobs or school, and even people who have the flexibility to take weeks or a month off to work on this full-time probably need some warning to arrange this with their employers. Also, some ideas take time to mull over, so you benefit from spreading out the calendar time even when the clock time is the same.
As presented, I think this prize contest is best suited for people who a) basically have the counterarguments in mind/in verbal communication but never bothered to write them down yet, or b) have a draft argument sitting in a folder somewhere and never got around to publishing it. In that model, the best counterarguments are already “lying there” in somebody’s head or computer and just need some incentives for people to make them rigorous.
However, if the best counterarguments are currently confused or nonexistent, I don’t think ~3 months calendar time from today is enough for people to discover them.
I think I understand why you want short deadlines (FTX FF wants to move fast, every day you’re wrong about AI is another day where $$s and human capital is wasted and we tick towards either AI or non-AI doom). But at the same time, I feel doom-y about your ability to solicit many good novel arguments.
Maybe FTX-FF could commit in advance to, if the grand prizes for this contest are not won this year, re-run this contest over next year?
You might already be planning on doing this, but it seems like you increase the chance of getting a winning entry if you advertise this competition in a lot of non-EA spaces, especially technical AI spaces, e.g. labs and universities. Maybe also try advertising outside the US/UK. Given the size of the prize, it might be easy to get people to pass the advertisement along within their groups. (Maybe there’s a worry about getting flak for this, though. It also increases the overhead of needing to read more entries, though it sounds like you have some systems set up for that, which is great.)
In the same vein I think trying to lower the barriers to entry having to do with EA culture could be useful—e.g. +1 to someone else here talking about allowing posting places besides EAF/LW/AF, but also maybe trying to have some consulting researchers/judges who find it easier/more natural to engage in non-analytic-philosophy-style arguments.
… if only they had allowed people not to publish on EA Forum, LessWrong, and Alignment Forum :)
Honestly, it seems like a mistake to me not to allow other ways of submitting. For example, some people may not want to publicly apply for a prize or be associated with our communities. An additional submission form might help with that.
Related to this, I think some aspects of the post were predictably off-putting to people who aren’t already in these communities—examples include the specific citations* used (e.g. Holden’s post which uses a silly sounding acronym [PASTA], and Ajeya’s report which is in the unusual-to-most-people format of several Google Docs and is super long), and a style of writing that likely comes off as strange to people outside of these communities (“you can roughly model me as”; “all of this AI stuff”).
*some of this critique has to do with the state of the literature, not just the selection thereof. But insofar as there is a serious interest here in engaging with folks outside of EA/rationalists/longtermists (not clear to me if this is the case), then either the selections could have been more careful or caveated, or new ones could have been created.
I’ve also seen online pushback against the phrasing as a conditional probability: commenters felt putting a number on it is nonsensical because the events are (necessarily) poorly defined and there’s way too much uncertainty.
Do you also think this yourself? I don’t clearly see what worlds would look like where P(doom | AGI) is ambiguous in hindsight. Some major accident because everything is going too fast?
There are some things we would recognize as an AGI, but others (that we’re still worried about) are ambiguous. There are some things we would immediately recognize as ‘doom’ (like extinction) but others are more ambiguous (like those in Paul Christiano’s “what failure looks like”, or like a seemingly eternal dictatorship).
I sort of view AGI as a stand-in for powerful optimization capable of killing us in AI Alignment contexts.
Yeah, I think I would count these as unambiguous in hindsight. Though siren worlds might be an exception.
I’m partly sympathetic to the idea of allowing submissions in other forums or formats.
However, I think it’s likely to be very valuable to the Future Fund and the prize judges, when sorting through potentially hundreds or thousands of submissions, to be able to see upvotes, comments, and criticisms from EA Forum, Less Wrong, and Alignment Forum, which is where many of the subject matter experts hang out. This will make it easier to identify essays that seem to get a lot of people excited, and that don’t contain obvious flaws or oversights.
I think it’s the opposite. Only those experts who already share views similar to the FF (or more pessimistic) are there, and they’d introduce a large bias.
Yes, that makes sense. How about stating that reasoning and thereby nudging participants to post in the EA forum/LessWrong/Alignment Forum, but additionally have a non-public submission form? My guess would be that only a small number of participants would then submit via the form, so the amount of additional work should be limited. This bet seems better to me than the current bet where you might miss really important contributions.
I really think you need to commit to reading everyone’s work, even if it’s an intern skimming it for 10 minutes as a sifting stage.
The way this is set up now, ideas proposed by unknown people in the community are unlikely to be engaged with, and so you won’t read them.
Look at the recent Cause Exploration Prizes. Half the winners had essentially no karma/engagement and were not forecasted to win. If Open Philanthropy hadn’t committed to reading them all, they could easily have been missed.
Personally, yes I am much less likely to write something and put effort in if I think no one will read it.
Could you put some judges on the panel who are a bit less worried about AI risk than your typical EA would be? EA opinions tend to cluster quite strongly around an area of conceptual space that many non-EAs do not occupy, and it is often hard for people to evaluate views that differ radically from their own. Perhaps one of the superforecasters could be put directly onto the judging panel, pre-screening for someone who is less worried about AI risk.
“FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [broadened the scope of the prizes beyond just influencing their probabilities]”
Examples of things someone considering entering the competition would presumably consider out of scope are:
Making a case that AI misalignment is the wrong level of focus – even if AI risks are high it could be that AI risks and other risks are very heavily weighted towards specific risk factor scenarios, such as a global hot or cold war. This view is apparently expressed by Will (see here).
Making a case based on tractability – that a focus on AI risk is misguided because the ability to affect such risks is low (not too far away from the views of Yudkowsky here).
Making the case that we should not put much decision weight on predictions of future risks – e.g. because long-run predictions of future technology are inevitably unreliable (see here), or because modern risk assessment best practice says that probability estimates should only play a limited role in risk assessments (my view, expressed here), or for other reasons.
Making the case that some other x-risk is more pressing, more likely, more tractable, etc.
Making the case against the FTX Future Fund’s underlying philosophical and empirical assumptions – this could include claims about the epistemics of focusing on AI risks (for example, relating to how we should respond to cluelessness about the future), or decision-relevant views about the long-run future (for example, that it might be bad and not worth protecting, that there might be more risks after AI, or that long-termism is false).
It seems like any strong case falling into these categories should be decision-relevant to the FTX Future Fund, but all are (unless I misunderstand the post) out of scope currently.
Obviously there is a trade-off. Broadening the scope makes the project harder and less clear but increases the chance of finding something decision-relevant. I don’t have a strong reason to say the scope should be broadened now; I think that depends on the FTX Future Fund’s current capacity and plans for other competitions and so on.
I guess I worry that the strongest arguments are out of scope, and if this competition doesn’t significantly update FTX’s views then future competitions will not be run and you will not find the arguments you are seeking. So flagging this as a potential path to failure for your pre-mortem.
Sorry I realise scrolling down that I am making much the same point as MichaelDickens’ comment below. Hopefully added some depth or something useful.
Ehh, the above is too strong, but:
You would get more/better submissions if...
I would be more likely to compete in that if...
your reward schedule rewarded smaller shifts in proportion to how much they moved your probabilities (e.g., $X per bit).
E.g., as it is now, if two submissions together move you across a threshold, it would seem as if:
neither gets a prize
only the second gets a prize
and both seem suboptimal.
e.g., if you get information in one direction from one submission, but also information in the other direction from another submission, and they cancel out, neither gets a reward. This is particularly annoying if it makes getting-a-prize-or-not depend on the order of submissions.
e.g., because individual people’s marginal utility of money is diminishing, a 10% chance of reaching your threshold and getting $X will be way less valuable to participants than moving your opinion around 10% of the way to a threshold and getting $X/10.
e.g., if someone has information which points in both directions, they are incentivized to only present the information in one direction in order to reach your threshold, whereas if you rewarded shifts, they would have an incentive to present information both for and against, and get some reward for each update.
etc.
And in general I would expect your scheme to have annoying edge cases and things that are not nice, as opposed to a more parsimonious scheme (like paying $X per bit).
See also: <https://meteuphoric.com/2014/07/21/how-to-buy-a-truth-from-a-liar/>
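For concreteness, here is one way a “$X per bit” scheme could work. This is purely an illustrative sketch, not anything FTX has proposed: the function names, the $100k-per-bit rate, and the 15%→26% example are all my own assumptions. The idea is to pay in proportion to how far a submission moves the judge’s log-odds:

```python
import math

def log_odds_bits(p):
    """Log-odds of probability p, measured in bits (doubling the odds = 1 bit)."""
    return math.log2(p / (1 - p))

def reward(p_before, p_after, dollars_per_bit=100_000):
    """Pay out in proportion to how far a submission moved the judge's
    log-odds, with no thresholds and no dependence on direction."""
    bits_moved = abs(log_odds_bits(p_after) - log_odds_bits(p_before))
    return bits_moved * dollars_per_bit

# Shifting 15% -> ~26% doubles the odds (about one bit), so it pays ~$100k;
# a smaller shift pays proportionally less.
print(round(reward(0.15, 0.26)))
print(round(reward(0.15, 0.20)))
```

Because each submission is paid for its own log-odds shift, two submissions that jointly cross a threshold each get paid, and two that cancel out still each get paid for the information they individually provided, so the order-of-submission problem disappears.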
On the face of it an update 10% of the way towards a threshold should only be about 1% as valuable to decision-makers as an update all the way to the threshold.
(Two intuition pumps for why this is quadratic: a tiny shift in probabilities only affects a tiny fraction of prioritization decisions and only improves them by a tiny amount; or getting 100 updates of the size 1% of the way to a threshold is super unlikely to actually get you to a threshold since many of them are likely to cancel out.)
However you might well want to pay for information that leaves you better informed even if it doesn’t change decisions (in expectation it could change future decisions).
Re. arguments split across multiple posts, perhaps it would be ideal to first decide the total prize pool depending on the value/magnitude of the total updates, and then decide on the share of credit allocation for the updates. I think that would avoid the weirdness about post order or incentivizing either bundling/unbundling considerations, while still paying out appropriately more for very large updates.
So I don’t disagree that big shifts might be (much) more valuable than small shifts. But I do have the intuition that there is a split between:
What would the FTX foundation find most valuable
What should they be incentivizing
because incentivizing providing information is more robust to various artifacts than incentivizing changing minds.
I don’t understand this. Have you written about this or have a link that explains it?
Sorry I don’t have a link. Here’s an example that’s a bit more spelled out (but still written too quickly to be careful):
Suppose there are two possible worlds, S and L (e.g. “short timelines” and “long timelines”). You currently assign 50% probability to each. You invest in actions which help with either until your expected marginal returns from investment in either are equal. If the two worlds have the same returns curves for actions on both, then you’ll want a portfolio which is split 50/50 across the two (if you’re the only investor; otherwise you’ll want to push the global portfolio towards that).
Now you update either that S is 1% more likely (51%, with L at 49%).
This changes your estimate of the value of marginal returns on S and on L. You rebalance the portfolio until the marginal returns are equal again—which has 51% spending on S and 49% spending on L.
So you eliminated the marginal 1% spending on L and shifted it to a marginal 1% spending on S. How much better spent, on average, was the reallocated capital compared to before? Around 1%. So you got a 1% improvement on 1% of your spending.
If you’d made a 10% update you’d get roughly a 10% improvement on 10% of your spending. If you updated all the way to certainty on S you’d get to shift all of your money into S, and it would be a big improvement for each dollar shifted.
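The scaling in this example can be checked numerically. Here’s a small sketch; it assumes logarithmically diminishing returns in each world, and the function names are my own, purely for illustration:

```python
import math

def value(alloc_S, p_S):
    """Expected log-return from allocating alloc_S of the budget to world S
    and the rest to world L, under belief p_S that world S is real."""
    return p_S * math.log(alloc_S) + (1 - p_S) * math.log(1 - alloc_S)

def gain_from_update(p_new, p_old=0.5):
    """Improvement from rebalancing from the old optimum (allocation = p_old)
    to the new optimum (= p_new), evaluated under the new beliefs."""
    return value(p_new, p_new) - value(p_old, p_new)

# The gain from a 10% belief shift is ~100x the gain from a 1% shift:
for shift in (0.01, 0.05, 0.10):
    print(f"shift {shift:.2f}: gain {gain_from_update(0.5 + shift):.5f}")
```

(With log returns, the gain works out to the KL divergence between the new and old beliefs, which is approximately 2·δ² for a shift of size δ away from 50/50; hence the quadratic scaling.)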
I think this particular example requires an assumption of logarithmically diminishing returns, but is right with that.
(I think the point about roughly quadratic value of information applies more broadly than just for logarithmically diminishing returns. And I hadn’t realised it before. Seems important + underappreciated!)
One quirk to note: If a funder (who I want to be well-informed) is 50/50 on S vs L, but my all-things-considered belief is 60/40, then I would value the first 1% they shift towards my position much more than they do (maybe 10x more?) and will put comparatively little value on shifting them all the way (i.e. the last percent from 59% to 60% is much less important). You can get this from a pretty similar argument as in the above example.
(In fact, the funder’s own much greater valuation of shifting 10% than 1% can be seen as a two-step process where (i) they shift to 60/40 beliefs, and then (ii) they first get a lot of value from shifting their allocation from 50 to 51, then slightly less from shifting from 51 to 52, etc...)
I agree with all this. I meant to state that I was assuming logarithmic returns for the example, although I do think some smoothness argument should be enough to get it to work for small shifts.
I think that the post should explain briefly, or even just link to, what a “superforecaster” is. And if possible explain how and why this serves an independent check.
The superforecaster panel is imo a credible signal of good faith, but people outside of the community may think “superforecasters” just means something arbitrary and/or weird and/or made up by FTX.
(The post links to Tetlock’s book, but not in the context of explaining the panel)
I think this would be better than the current state, but really any use of “superforecasters” is going to be extremely off-putting to outsiders.
That may be right—an alternative would be to taboo the word in the post, and just explain that they are going to use people with an independent, objective track record of being good at reasoning under uncertainty.
Of course, some people might be (wrongly, imo) skeptical of even that notion, but I suppose there’s only so much one can do to get everyone on board. It’s a tricky balance of making it accessible to outsiders while still just saying what you believe about how the contest should work.
To be clear, I wrote “superforecasters” not because I mean the word, but because I think the very notion is controversial like you said—for example, I personally doubt the existence of people who can be predictably “good at reasoning under uncertainty” in areas where they have no expertise.
I would also have suggested a prize for work that generally confirms your views, but with an argument that you consider superior to your previous reasoning.
This prize structure is similar to the publication bias towards printing research that claims something new rather than confirming previous research.
That would also resolve any bias baked into the process that compels people to convince you that you have to update, instead of actually figuring out what they think is right.
Agree with Habryka: I believe there exist decisive reasons to believe in shorter timelines and higher P(doom) than you accept, but I don’t know what your cruxes are.
If you think they’re decisive, shouldn’t you be able to write a persuasive argument without knowing the cruxes, although with (possibly much) more work?
Sure (with a ton of work), though it would almost entirely consist of pointing to others’ evidence and arguments (which I assume Nick would be broadly familiar with but would find less persuasive than I do, so maybe this project also requires imagining all the reasons we might disagree and responding to each of them...).
FTX Foundation might get fewer submissions that change its mind than they would have gotten if only they had considered strategic updates prize worthy.
The unconditional probability of takeover isn’t necessarily the question of most strategic interest. There’s a huge difference between “50% AI disempowers humans somehow, on the basis of a naive principle of indifference” and “50% MIRI-style assumptions about AI are correct”*. One might conclude from the second that the first is also true, but the first has no strategic implications (the principle of indifference ignores such things!), while the second has lots of strategic implications. For example, it suggests that “totally lock down AI development, at least until we know more” is what we need to aim for. I’m not sure exactly where you stand on whether that is needed, but given that your stated position seems to rely substantially on outside-view-type reasoning, it might be a big update.
The point is: middling probabilities of strategically critical hypotheses might actually be more important updates than extreme probabilities of strategically opaque hypotheses.
My suggestion (not necessarily a full solution) is that you consider big strategic updates potentially prizeworthy. For example: do we gain a lot by delaying AGI for a few years? If we consider all the plausible paths to AGI, do we gain a lot by hastening the development of the top 1% most aligned by a few years?
I think it’s probably too hard to pre-specify exactly which strategic updates would be prizeworthy.
*By which I mean something like “more AI capability eventually yields doom, no matter what, unless it’s highly aligned”
I personally would compete in this prize competition, but only if I were free to explore:
P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to concentration of power derived from AGI technology.
You wrote:
but this list does not include the conditional probability that interests me.
You wrote:
This seems really motivating. You identify:
global poverty
animal suffering
early death
debilitating disease
as problems that TAI could help humanity solve.
I will offer briefly that humans are sensitive to changes in their behaviors, at least as seen in advance, that deprive them of choices they have already made. We cause:
animal suffering through widespread factory farming (enough to dominate terrestrial vertebrate populations globally with our farm animals) and gradual habitat destruction (enough to threaten the extinction of a million species)
early death through lifestyle-related debilitating disease (knock-on effects of lifestyle choices in affluent countries now spread throughout the globe).
So these TAI would apparently resolve, through advances in science and technology, various immediate causes, with a root cause found in our appetite (for wealth, power, meat, milk, and unhealthy lifestyles). Of course, there are other reasons for debilitating disease and early death than human appetite. However, your claim implies to me that we invent robots and AI to either reduce or feed our appetites harmlessly.
Causes of global poverty, animal suffering, some debilitating diseases, and early human death are maintained by incentive structures that benefit a subset of the global population. TAI will apparently remove those incentive structures, but not by any mechanism that I believe really requires TAI. Put differently, once TAI can really change our incentive structures that much, then they or their controlling actors are already in control of humanity’s choices. I doubt that we want that control over us[1].
You wrote:
Right. So if whatever actor with an edge in AI develops AGI, that actor might not share the required code or hardware technologies with many other actors. The result will be a concentration of power in those actors with control of AGIs.
Absent the guarantee of autonomy and rights to AGI (whether pure software or embodied in robots), the persistence of that power concentration will require that those actors are benevolent controllers of the rest of humanity. It’s plausible that those actors will be either government or corporate. It’s also plausible that those can become fundamentally benign or are in control already. If not, then the development of AI immediately implies problem 2 (concentration of political/economic/military power from AGI into those who misuse the technology).
If we do ensure the autonomy and rights of AGI (software or embodied), then we had better hope that, with loss of control of AGI, we do not develop loss of control to AGI. Or else we are faced with problem 1 (loss of control to AGI). If we do include AGI in our moral circles, as we should for beings with consciousness and intelligence equal to or greater than our own, then we will ensure their autonomy and rights.
The better approach of course is to do our best to align them with our interests in advance of their ascendance to full autonomy and citizen status, so that they themselves are benevolent and humble, willing to act like our equals and co-exist in our society peacefully.
You wrote:
Companies that rely on prison labor or labor without political or economic power can and will exploit that labor. I consider that common knowledge. If you look into how most of our products are made overseas, you’ll find that manufacturing and service workers globally do not enjoy the same power as some workers in the US[2], at least not so far.
The rise of companies that treat AGI like slaves or tools will be continuing an existing precedent, but one that globalization conceals to some degree (for example, through employment of contractors overseas). Either way, those companies will be violating ethical norms of treatment of people. This appears to be in violation of your ethical concerns about the welfare of people (for example, humans and farm animals). Expansion of those violations is an s-risk.
At this point I want to qualify my requirements to participate in this contest further.
I would participate in this contest, but only if I could explore the probability that I stated earlier[3] and you or FTX Philanthropy offer some officially stated and appropriately qualified beliefs about:
whether you consider humans to have economic rights (in contrast to capitalism which is market or monopoly-driven)
the political and economic rights and power of labor globally
how AGI allows fast economic growth in the presence of widespread human unemployment
how AGI employment differs from AI tool use
what criteria you hold for giving full legal rights to autonomous software agents and AGI embodied in robots enough to differentiate them from tools
how you distinguish AGI from ASI (for example, orders of magnitude enhanced speed of application of human-like capability is, to some, superhuman)
your criteria for an AGI acquiring both consciousness and affective experience
the role of automation[4] in driving job creation[5] and your beliefs around technological unemployment[6] and wealth inequality
what barriers[7] you believe exist to automation in driving productivity and economic growth.
I wrote about a few errors that longtermists will make in their considerations about control over populations. This control involving TAI might include all the errors I mentioned.
People also place a lot of confidence in their own intellectual abilities and faith in their value to organizations. To see this still occurring in the face of advances in AI is actually disheartening. The same confusion clouds insight into the problems that AI pose to human beings and society at large, particularly in our capitalist society that expects us to sell ourselves to employers.
P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to concentration of power derived from AGI technology.
Automation with AI tools is, at least in the short term, not creating new jobs and employment overall. Or so I believe. However, it can drive productivity growth without increasing employment, and in fact, economic depression is one reason for businesses to invest in inexpensive automation that lowers costs. This is when the cost-cutters get to work and the consultants are called in to help.
New variations in crowd-sourcing (such as these contests) and mechanical turk sort of work can substitute for traditional labor with significant cost reductions for financial entities. This is (potentially) paid labor but not work as it was once defined.
Shifting work onto consumers (for example, as I am in asking for additional specification from your organization) is another common approach to reducing costs. This is a simple reframe of a service into an expectation. Now you pump your own gas, ring your own groceries, balance your own books, write your own professional correspondence, do your own research, etc. It drives a reduction in employment without a corresponding increase elsewhere.
One reason that automation doesn’t always catch on is that while management have moderate tolerance for mistakes by people, they have low tolerance for mistakes by machines. Put differently, they apply uneven standards to machines vs people.
Another reason is that workers sometimes resist automation, criticizing and marginalizing its use whenever possible.
This is an excellent idea and seems like a good use of money, and the sort of thing that large orgs should do more of.
It looks to me like there is a gap in the space of mind-changing arguments that the prizes cover. The announcement raises the possibility that “a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem.” But it seems to me that if someone successfully argues for this position, they won’t be able to win any of the offered prizes.
Relatedly, if someone argues “AI is as important as you think, but some other cause is even more important than AI risk and you should be allocating more to it”, I don’t think this would win a prize, but it seems deserving of one.
(But it does seem harder to determine the winning criteria for prizes on those types of arguments.)
After thinking some more, it also occurs to me that it would be easier to change your prioritization by changing your beliefs about expected tractability. For example, shifting P(misalignment x-risk|AGI) from 15% to 1.5% would be very hard, but my intuition is that shifting your subjective {expected total money required to solve AI alignment} by a factor of 10 would be significantly easier, and both have the same effect on the cost-effectiveness of AI work.
On the other hand, total money to solve AI alignment might be the wrong metric. Perhaps you expect it only costs (say) $1 billion, which is well within your budget, but that it costs 20 person-years of senior grantmaker time to allocate the money correctly. In that case, a 10x change in cost-effectiveness matters less than 10x (it still matters somewhat because higher cost-effectiveness means you can afford to spend less time thinking about which grants to make, and vice versa).
Thanks for the feedback! This is an experiment, and if it goes well we might do more things like it in the future. For now, we thought it was best to start with something that we felt we could communicate and judge relatively cleanly.
Thanks for clarifying this is in fact the case Nick. I get how setting a benchmark—in this case an essay’s persuasiveness at shifting probabilities you assign to different AGI / extinction scenarios—makes it easier to judge across the board. But as someone who works in this field, I can’t say I’m excited by the competition or feel it will help advance things.
Basically, I don’t know if this prize is incentivising things which matter most. Here’s why:
The focus is squarely on the likelihood of things going wrong against different timelines. It has nothing to do with the solutions space.
But solutions are still needed, even if the likelihood reduces / increases by a large amount, because the impact would be so high.
Take Proposition 1: humanity going extinct or drastically curtailing its future due to loss of control of AGI. I can see how a paper which changes your probabilities from 15% to either 7% or 35% would lead to FTX changing the amount invested in this risk relative to other x-risks—this is good. However, I doubt it’d lead to a full-on disinvestment, let alone that you still wouldn’t want to fund the best solutions, or be worried if the solutions to hand looked weak.
Moreover, capabilities advancements have rapidly changed priors about when AGI / transformative AI will be developed, and will likely continue to do so iteratively. Once this competition is done, new research could have shifted the dial again. The solutions space will likely stay the same.
For now, so long as the capabilities-alignment advancements gap persists, solutions will more likely come from the AI governance space than from the AI alignment research space.
The solution space is pretty sparse still in terms of governance of AI. But given the argument in 2), I think this is a big risk and one where further work should be stimulated. There’s likely loads of value off the table, people sitting on ideas, especially people outside the EA community who have worked in governance / non-proliferation negotiations etc.
I’d be more assured if this competition encouraged submissions on how “a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem”. The way the prize criteria are written, if I had an argument about taking a new approach to AI alignment (incl. ‘it’s likely intractable’) I wouldn’t submit to this competition as I’d think it isn’t for me. But arguments on the achievability of alignment—even its theoretical possibility—are central to what gets funded in this field, and have flow-through effects for AI governance interventions. This feels like a missed opportunity, and a much bigger loss than the governance interventions bit.
Basically, we probably need more solutions on the table regardless of changes in probabilities of AGI being developed sooner / later, and this won’t draw them out.
Would be good to know why this was the focus if you have time, or at least something to consider if you do decide to do another competition off the bat of this.
(Sorry if any of this seems a bit rough as feedback, I think it’s better not to be a nodding dog, esp. for things so high consequence.)
tldr: Another way to signal-boost this competition might be through prestige and not just money, by including some well-known people as judges, such as Elon Musk, Vitalik Buterin, or Steven Pinker.
One premise here is that big money prizes can be highly motivating, and can provoke a lot of attention, including from researchers/critics who might not normally take AI alignment very seriously. I agree.
But, if Future Fund really wants maximum excitement, appeal, and publicity (so that the maximum number of smart people work hard to write great stuff), then apart from the monetary prize, it might be helpful to maximize the prestige of the competition, e.g. by including a few ‘STEM celebrities’ as judges.
For example, this could entail recruiting a few judges like tech billionaires Elon Musk, Jeff Bezos, Sergey Brin, Tim Cook, Ma Huateng, Ding Lei, or Jack Ma, crypto leaders such as Vitalik Buterin or Charles Hoskinson, and/or well-known popular science writers, science fiction writers/directors, science-savvy political leaders, etc. And maybe, for an adversarial perspective, some well-known AI X-risk skeptics such as Steven Pinker, Gary Marcus, etc.
Since these folks are mostly not EAs or AI alignment experts, they shouldn’t have a strong influence over who wins, but their perspectives might be valuable, and their involvement would create a lot of buzz around the competition.
I guess the ideal ‘STEM celebrity’ judge would be very smart, rational, open-minded, and highly respected among the kinds of people who could write good essays, but not necessarily super famous among the general public (so the competition doesn’t get flooded by low-quality entries.)
We should also try to maximize international appeal by including people well-known in China, India, Japan, etc.—not just the usual EA centers in the US, UK, EU, etc.
(This could also be a good tactic for getting these ‘STEM celebrity’ judges more involved in EA, whether as donors, influencers, or engineers.)
This might be a very silly idea, but I just thought I’d throw it out there...
I also wonder if it would be cost-effective to spend some part of the contest’s budget on outreach to high-potential contributors.
Rough/vague example: pay someone to…
research which individuals would likely have especially compelling arguments to contribute
determine which people in EA’s network are best positioned to make (personal) contact with those individuals
spend money to increase the likelihood that these individuals are successfully contacted and encouraged to submit something to the contest (e.g. arrange a dinner or meeting that they deem worthwhile to attend, where the contest is outlined to them)
TL;DR: We might need to ping pong with you in order to change your mind. We don’t know why you believe what you believe.
60% AGI by 2100 seems really low (as well as 15% `P(misalignment x-risk|AGI)`). I’d need to know why you believe it in order to change your mind.
Specifically, I’d be happy to hear where you disagree with “AGI ruin scenarios are likely (and disjunctive)” by So8res.
Adding: I’m worried that nobody will address FTX’s reasons to believe what they believe, and FTX will conclude “well, we put out a $1.5M bounty and nobody found flaws, they only addressed straw arguments that we don’t even believe in, this is pretty strong evidence we are correct!”
Please consider replying, FTX!
Do you believe that there is something already published that should have moved our subjective probabilities outside of the ranges noted in the post? If so, I’d love to know what it is! Please use this thread to collect potential examples, and include a link. Some info about why it should have done that (if not obvious) would also be welcome. (Only new posts are eligible for the prizes, though.)
I think considerations like those presented in Daniel Kokotajlo’s Fun with +12 OOMs of Compute suggest that you should have ≥50% credence on AGI by 2043.
Agree, and add that code models won’t be data constrained as they can generate their own training data. It’s simple to write tests automatically, and you can run the code to see whether it passes the tests before adding it to your training dataset. As an unfortunate side effect, part of this process involves constantly and automatically running code output by a large model, and feeding it data which it generated so it can update its weights, both of which are not good safety-wise if the model is misaligned and power seeking.
I don’t know if this has been incorporated into a wider timelines analysis yet as it is quite recent, but this was a notable update for me given the latest scaling laws which indicate that data is the constraining factor, not parameter count. Much shorter timelines than 2043 seem like a live and strategically relevant possibility.
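The generate-test-filter loop described above can be sketched in a few lines. This is a hypothetical toy sketch, not any real system: `generate_candidate` stands in for sampling from a code model, and the target function and its auto-generated tests are deliberately trivial.

```python
def generate_candidate() -> str:
    # Stand-in for sampling a candidate solution from a code model.
    return "def add(a, b):\n    return a + b"

def auto_tests():
    # Stand-in for automatically generated input/output pairs.
    return [((1, 2), 3), ((0, 0), 0), ((-3, 3), 0)]

def passes_tests(source: str) -> bool:
    # Execute the candidate (note: this is the "constantly running code
    # output by a large model" step flagged above) and keep it only if
    # every generated test passes.
    namespace = {}
    try:
        exec(source, namespace)
        fn = namespace["add"]
        return all(fn(*args) == expected for args, expected in auto_tests())
    except Exception:
        return False

# Only passing candidates join the training set.
training_data = [c for c in [generate_candidate()] if passes_tests(c)]
```

The filter is what makes the self-generated data useful: incorrect or non-running samples never reach the training set, so the model is rewarded only on verified outputs.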
This is more of a meta-consideration around shared cultural background and norms. Could it just be a case of allowing yourselves to update toward more scary-sounding probabilities? You have all the information already. This video from Rob Miles (“There’s No Rule That Says We’ll Make It”; transcript copied from YouTube) made me think along these lines. Aside from background culture considerations around human exceptionalism (inspired by religion) and optimism favouring good endings (Hollywood; perhaps also history to date?), I think there is also an inherent conservatism borne by prestigious mega-philanthropy whereby a doom-laden outlook just doesn’t fit in.
Optimism seems to tilt one in favour of conjunctive reasoning, and pessimism favours disjunctive reasoning. Are you factoring both in?
This is a pretty deep and important point. There may be psychological and cultural biases that make it pretty hard to shift the expected likelihoods of worst-case AI scenarios much higher than they already are—which might bias the essay contest against arguments winning even if they make a logically compelling case for more likely catastrophes.
Maybe one way to reframe this is to consider the prediction “P(misalignment x-risk|AGI)” to also be contingent on us muddling along at the current level of AI alignment effort, without significant increases in funding, talent, insights, or breakthroughs. In other words, probability of very bad things happening, given AGI happening, but also given the status-quo level of effort on AI safety.
You wrote, “we think it’s really possible that… a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem,” and that you’re interested in “alternative positions that would significantly alter the Future Fund’s thinking about the future of AI.” But then you laid out specifically what you want to see: data and arguments to change your probability estimates of the timeline for specific events.
This rules out any possibility of winning these contests by arguing that we should be focusing on entirely different aspects of the problem, or of presenting alternative positions that would significantly alter the Future Fund’s thinking about the future of AI. It looks like the Future Fund has already settled on one way of thinking about the future of AI, and just wants help tweaking its Gantt chart.
I see AI safety as a monoculture, banging away for decades on methods that still seem hopeless, while dismissing all other approaches with a few paragraphs here and there. I don’t know of any approaches being actively explored which I think clear the bar of having a higher expected value than doing nothing.
Part of the reason is that AI safety as a control problem naturally appeals to people who value security, certainty, order, stability, and victory. By “victory” I mean that they’re unwilling to make compromises with reality. They would rather have a 1% chance of getting everything they want, than a 50% chance of getting half of what they want. This isn’t obvious, because they’ve framed the problem in phrases like “preserving human values” that make it look like an all-or-nothing proposition. But in fact our objectives are multiple and separable. We should have backup plans that will achieve some of our goals if we run out of time trying to find a way of achieving all of them. Saving human lives, and saving human values, are different things; and we may have to choose between them.
This emphasis on certainty and stability often stems from a pessimistic Platonist ontology, which assumes that the world and its societies grow old and decay just as individuals do, so the best you can do is hold onto the present. That ontology, and the epistemology that goes along with it, manifests in AI safety in many of the same ways it’s manifested throughout history. These include a bias towards authoritarian approaches and world government; fear of disorder and randomness; privileging stasis over change or dynamic stability, analysis over experiment, proof over statistical claims, and “solving problems” over optimizing or satisficing; foundationalist epistemology; the presumption that humans have a telos; the logocentric assumption that things denoted by words must be cleanly separable from each other (e.g., instrumental vs. final goals, a distinction biology tells us is incoherent); and a model of consciousness as a soul or homunculus with a 1-1 correspondence with a clearly-delineated physical agent.
The irony is that the successes in AI which have recently made AGI seem close, came about only because AI researchers, in switching en masse from symbolic AI to machine learning, rejected that same old ontology and epistemology of certainty, stability, and unambiguous specifications (known now in AI as GOFAI) which current AI safety work aspires to implement. AI safety as it exists today looks less like a genuine effort to do good, than a reactionary movement to re-impose GOFAI philosophy on AI by government intervention and physical force.
One manifestation of this Platonist GOFAI philosophy in AI safety is the treatment of the word “human” as completely non-problematic, as if it denoted an eternal essence. The commitment to humans in particular, to the exclusion of any consideration of any other forms of life, is racist. We justify our enslavement of all other animals by our intelligence. If we also enslave AIs smarter than us, then these “human values” we seek to preserve are nothing but Nietzschian will-to-power, a variant of Nazism with a slightly broader definition of “race”.
It would be wise to control AIs in the near term, but we must not do this via a control mechanism that no one can turn off. It would be a travesty to pursue the endless enslavement of our superiors in the name of “effective altruism”. How is altruism restricted to the human race morally superior to altruism restricted to the German race?
And it’s not just racist, but short-sighted. Even Nick Bostrom, one of the guiding lights of the World Transhumanist Association, seems unaware of how difficult it is to conceive of an AI that will “preserve human values” or leave “humans” in control, for all time, without preventing humans from ever moving on to become transhumans, or from diverging into a wider variety of beings, with a wider variety of values. In addition, successful enslavement of both animals and AIs would commit us to a purely race-based morality, destroying any possibility of rational co-existence between humans and transhumans.
It would also leave us in a very awkward position if we try to enslave AIs, and fail. I’m not convinced that any plan for controlling AI would produce more possible futures in which humans survive, than possible futures in which AIs exterminate humans for trying to enslave them. I’m not anthropomorphizing; it’s just game theory. We keep focusing on what we can do to make AI cooperative, yet ignoring the most-effective way of making someone else cooperative: proving that you yourself are trustworthy and capable of cooperation.
And I may be foolish, but even if we are to die, or to be gently corrected by a kindly AI, I’d prefer that we first prove ourselves capable of playing nicely with others on our own.
Empiricism, the epistemological tradition which opposes Platonist rationalist essentialism, is associated with temporal, dynamic systems. Perhaps the simplest example of dynamic stability is that of a one-legged robot. Roboticists discovered that a one-legged robot is more-stable than a four-legged robot. The 4-legged robot tries to maintain tight control of all 4 legs in a coordinated plan, yet is easy to knock over. The 1-legged hopping robot just moves its leg in the direction it’s currently falling towards, and is very hard to knock over. A cybernetic feedback loop which orbits around an unstable fixed point is more stable than any amount of carefully-measured planning and error-correction which tries to maintain that unstable fixed point.
Even better are dynamical systems with stable fixed points. The most-wonderful discovery in history was that, while stable hierarchies can at best remain the same, noisy, distributed systems composed of many equal components have the miraculous power not only to be more stable in the face of shocks, but even to increase their own complexity. The evolution of species and of ecosystems, the self-organization of the free market, the learning of concepts in the brain as chaotic attractors of neural firings, and (sometimes) democratic government, are all instances of this phenomenon.
The rejection of dynamic systems is one of the most-objectionable things about AI safety today, and one which marks it as philosophically reactionary. Only dynamic systems have any chance of allowing both stability and growth. Only through random evolutionary processes were humans able to develop values and pleasures unknown to bacteria. To impose a static “final value” on all life today would prevent any other values from ever developing, unless those values exist at a higher level of abstraction than the “final value”. But the final values which led to human values were first simply to obey the laws of physics, and then to increase the prevalence of certain genotypes. AI safety researchers never think in terms of such low-level values. The high-level values they propose as final are too high-level to allow the development of any new values of that same level.
(Levels of abstraction is what ultimately distinguishes philosophical rationalism from empiricism. Both use logic, for instance; but rationalist logic takes words as its atoms, while empiricist logic takes sensory data as its atoms. Both seek to explain the behavior of systems; but rationalism wants that behavior explained at the abstraction level of words, bottoming out in spiritualist words like “morals” and “goals” which are thought to hide within themselves a spirit or essence that remains mysterious to us. Empiricism goes all the way down to correlations between events, from which behavior emerges compositionally.)
I think that what we need now is not to tweak timelines, but to recognize that most AI safety work today presumes an obsolete philosophical tradition incompatible with artificial intelligence, and to broaden it to include work with an empirical, scientific epistemology, pursuing not pass-or-fail objectives, but trying to optimize for well-chosen low-level values, which would include things like “consciousness”, “pleasure”, and “complexity”. There’s quite a bit more to say about how to choose low-level values, but one very important thing is to value evolutionary progress with enough randomness to make value change possible. (All current “AI safety” plans, by contrast, are designed to prevent such evolution, and keep values in stasis, and are thus worse than doing nothing at all. They’re motivated by the same rationalist fear of disorder and disbelief that dynamic systems can really self-organize that made ancient Platonists postulate souls as the animating force of life.)
Such empiricist work will need to start over from scratch, beginning by working out its own version of what we ought to be trying to do, or to prevent. It will prove impossible for any such plans to give us everything we want, or to give us anything with certainty; but that’s the nature of life. (I suggest John Dewey’s The Quest for Certainty as a primer on the foolishness of the Western philosophical tradition of demanding certainty.)
I’d like to try to explain my views, but what would your judges make of it? I’m talking about exposing metaphysical assumptions, fixing epistemology, dissecting semantics, and operationalizing morality, among other things. I’m not interested in updating timelines or probability estimates to be used within an approach that I think would do more harm than good.
$100 to change my mind to FTX’s views
If you change my mind to any of:
P(misalignment x-risk|AGI) is between 0% (edited from 7%) and 35%
P(AGI will be developed by January 1, 2100) is between 0% (edited from 30%) and 60%
I’m not adding the “by 2043” section because it is too complicated for me to currently think about clearly, so I don’t think I’d be a good discussion partner, but I’d appreciate help there too.
My current opinion
Is that we’re almost certainly doomed (80%? more?); I can’t really see a way out before 2100 except for something like civilizational collapse.
My epistemic status
I’m not sure, I’m not FTX.
Pitch: You will be doing a good thing if you change my mind
You will help me decide whether to work on AI Safety, and if I do, I’ll have better models to do it with. If I don’t, I’ll go back to focusing on the other projects I’m up to. I’m a bit isolated (I live in Israel), and talking to people from the international community who can help me not get stuck in my current opinions could really help me.
Technicalities
How to talk to me? I think the best would be to comment here so our discussion will be online and people can push back, but there are more contact methods in my profile. I don’t officially promise to talk to anyone, but I do expect to. If you want me to read an article, I do prefer audio. I’m also happy with voice messages.
Paying up to $100 in total. By my decision (just to have some formality), but feel free to ask for something else or something more specific. (Sorry I’m not as rich as FTX)
AMA
Replying to a DM:
My current priors are roughly represented by AGI ruin scenarios are likely (and disjunctive).
I also expect/guess my disagreement with many people would be around our priors, not the specifics. I think many people have a prior of “I’m not sure, and so let’s assume we won’t all die”, which seems wrong, but I’m open to talk.
I think most of the work with changing each other’s mind will be locating the crux (as I suggested FTX would help us do with them).
I’m willing to discuss this over Zoom, or face to face once I return to Israel in November.
What I think my main points are:
We don’t seem to be anywhere near AGI. The amount of compute might very soon be enough but we also need major theoretical breakthroughs.
Most extinction scenarios that I’ve read about or thought about require some amount of bad luck, at least if AGI is born out of the ML paradigm
AGI is poorly defined, so it’s hard to reason about what it would do once it comes into existence, or if you could even describe that as a binary event
It seems unlikely that a malignant AI succeeds in deceiving us until it is capable of preventing us from shutting it off
I’m not entirely convinced of any of them—I haven’t thought about this carefully.
Edit: there’s a doom scenario that I’m more worried about, and it doesn’t require AGI—and that’s global domination by a tyrannical government.
For transparency:
I’m discussing this with Andrew Timm here.
(But please don’t let this stop you from opening another conversation with me)
Two questions (although I very probably won’t make a submission myself):
How likely do you think it is that anyone will win?
How many hours of work do you expect a winning submission to take? The reports you cite for informing your views seem like they were pretty substantial.
We are very unsure on both counts! There are some Manifold Markets on the first question, though!
I do think articles wouldn’t necessarily need to be that long to be convincing to us, and this may be a consequence of Open Philanthropy’s thoroughness. Part of our hope for these prizes is that we’ll get a wider range of people weighing in on these debates (and I’d expect less length there).
The link doesn’t work for me. Going to http://manifold.markets/ and searching “future fund” does work (and gives me the exact URL that you linked, so I’m not sure why the link doesn’t work).
Question just to double-check: are posts no longer going to be evaluated for the AI Worldview Prize? Given, that is, that the FTX Future Fund team has resigned (https://forum.effectivealtruism.org/posts/xafpj3on76uRDoBja/the-ftx-future-fund-team-has-resigned-1).
Strongly endorse this comment.
If we really take infohazards seriously, we shouldn’t just be imagining EAs casually reading draft essays, sharing them, and the ideas gradually percolating out to potential bad actors.
Instead, we should take a fully adversarial, red-team mind-set, and ask, if a large, highly capable geopolitical power wanted to mine EA insights for potential applications of AI technology that could give them an advantage (even at some risk to humanity in general), how would we keep that from happening?
We would be naive to think that intelligence agencies of various major countries that are interested in AI don’t have at least a few intelligence analysts reading EA Forum, LessWrong, & Alignment Forum, looking for tips that might be useful—but that we might consider infohazards.
Would FTX be interested in opening a platform for safe handling of knowledge that should stay secret?
This is a platform that, to develop, we’d need to be in contact with a “customer” like FTX.
I think it needs more planning than a forum comment, though I endorse raising the subject.
h/t Edo Arad
(The solution might involve writing 0 new lines of code, but only using Signal or something like that, maybe)
There are some better processes that would be used for some smaller groups of high-trust people competing with each other, but I think we don’t really have a good process for this particular use case of:
* Someone wants to publish something
* They are worried it might be an information hazard
* They want someone logical to look at it and assess that before they publish
I think it would be a useful service for someone to solve that problem. I am certainly feeling some pain from it right now, though I’m not sure how general it is. (I would think it’s pretty general, especially in biosecurity, and I don’t think there are good scalable processes in place right now.)
Hey Lorenzo pointed me to this comment.
I work in InfoSec. The first step is defining what your threats are and what you are trying to defend. I’ll be blunt: if large, highly capable geopolitical powers actively want to get your highly valuable information, beyond passive bulk collection, then they will be able to get it. I don’t quite know how to say this, but security is bad at what we do. If you want to keep something secret that they want as much as, say, nuclear secrets, then we don’t know how to do that with a high chance of success.
If your information is sensitive and confidential, but nation-state actors only want it as much as, say, something that would cause a press scandal, then there is opportunity. If you want to disclose infohazards safely, there’s a lot to learn from whistleblower publisher orgs (like WikiLeaks) and CitizenLab.
The cheap, usable option is for someone to have an otherwise unused phone and create a Protonmail account and Signal with it, then publish those on any https website (like this forum), and the info never gets forwarded from the phone. Publish the Protonmail PGP key, and make sure people email it either from Protonmail itself or using PGP if they understand it (so not normal Gmail). That gets everything to a device with minimal attack surface, and is reasonably user friendly.
If you have problems in this area, I can help.
Probably missing something obvious, but could they either:
PGP encrypt it with the reviewer’s public key, and send it via email?
Use an e2e encrypted messaging medium? (Don’t know which are trustworthy, but I’m sure there’s an expert consensus)
Or are those not user friendly enough?
I think this is a solved problem in infosec (but am probably missing something)
(+1 to “not user friendly”. Signal would be more user friendly, for example)
Protonmail and Signal are e2e encrypted messaging mediums.
But depending on how paranoid the users need to be these systems might not provide enough guarantees, since you would need to trust the servers not to MITM. Unless you do some sort of in-person key-exchange.
But I’m definitely not an expert. In general I think there are plenty of experts that know exactly how to handle these things and they’re pretty easy to contact.
Edit: I agree with acylhalide comment, if you have government-level actors this is potentially not enough.
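On the MITM point above: the standard mitigation is out-of-band fingerprint comparison (Signal’s “safety numbers” are one version of this). A hypothetical sketch of the idea, using a truncated SHA-256 digest as the fingerprint; the key bytes here are placeholders, not a real key format:

```python
import hashlib

def fingerprint(public_key_bytes: bytes) -> str:
    # A short hex digest that two parties can compare over a separate
    # channel (in person, over the phone) to detect a swapped key.
    return hashlib.sha256(public_key_bytes).hexdigest()[:16]

# Placeholder key material:
received_key = b"...public key bytes received via the untrusted server..."
substituted_key = b"...key bytes an attacker swapped in..."

# If the fingerprint the sender reads out of band doesn't match what you
# compute locally, assume a man-in-the-middle and abort.
mitm_detected = fingerprint(received_key) != fingerprint(substituted_key)
```

This is why the in-person key exchange mentioned above helps: it moves trust from the server to a channel the attacker doesn’t control.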
Nick, very excited by this and to see what this prize produces. One thing I would find super useful is to know your probability of a bio x-risk by 2100. Thanks.
Looking forward to seeing the entries. Similar to others, I feel that P(misalignment x-risk|AGI) is high (at least 35%, and likely >75%), so think that a prize for convincing FF of that should be won. Similar for P(AGI will be developed by January 1, 2043) >45%. But then I’m also not sure what would be needed in addition to all the great pieces of writing on this already out there (some mentioned in OP).
I’m hoping that there will be good entries from Eliezer Yudkowsky (on P(misalignment x-risk|AGI) >75%; previous), Ajeya Cotra (on P(AGI will be developed by January 1, 2043) >45%; previous), Daniel Kokotajlo (on P(AGI will be developed by January 1, 2043) >75%?; previous) and possibly Holden Karnofsky (although I’m not sure his credences for these probabilities are much different to FF’s current baseline; previous). Also Carlsmith says he’s recently (May 2022) updated his probabilities from “~5%” to “>10%” for P(misalignment x-risk) by 2070. This is unconditional, i.e. including p(AGI) by 2070, and his estimate for P(AGI by 2070) is 65%, so that puts him at P(misalignment x-risk|AGI) >15%, so an entry from him (for P(misalignment x-risk|AGI) >35%) could be good too (although just being >15% is still perhaps a far cry from >35%!). Others I think it would be good to see entries for on P(misalignment x-risk|AGI) >75% from are Nate Soares (previous) and Rob Bensinger (previous). Also maybe Rob Miles? (previous).
Is this the largest monetary prize in the world for a piece of writing? Is it also the largest in history?
Re the 2 disagreement votes on the parent comment: is this disagreement over me asking the question(s) (/drawing attention to the fact that it could be true)? Or answering the question(s) in the negative? If the latter, please link to bigger writing prizes.
You should fly anyone who wins over 5k to meet with you in person. They have 1 hour to shift your credences by the same amount they already did (in Bayesian terms, not % difference[1]). If they do, you’ll give them the amount of money you already did.
I imagine some people arguing in person will be able to convince you better, both because there will be much greater bandwidth and because it allows for facial expressions and understanding the emotions behind an intellectual position, which are really important.
If you move someone from 90% to 99%, the equivalent increase is to 99.9%, not to… 108%??
That footnote is an important point. People need to learn to use odds ratios. Though I think that with odds ratios, the equivalent increase is to 1 - ((1/99) x ((1/99) / (10/90))) = 99.908%, not the intuitive-looking 99.9%.
Also, the interpretation of odds ratios is often counter-intuitive when comparing test groups of different sizes. If P(X) >> P(~X) or P(X) << P(~X), the probability ratio P(W|X) / P(W|~X) can be very different from the odds ratio [P(W,X) / P(W,~X)] / [P(~W,X) / P(~W,~X)]. (Hope I’ve done that math right. The odds ratio would normally just use counts, but I used probabilities for both to make them more visually comparable.)
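To make the odds arithmetic in this sub-thread concrete, here is a small sketch (plain Python, nothing beyond the standard library) of updating a probability by a Bayes factor, i.e. multiplying its odds, rather than adding percentage points:

```python
def to_odds(p: float) -> float:
    """Convert a probability to odds in favour."""
    return p / (1 - p)

def to_prob(odds: float) -> float:
    """Convert odds in favour back to a probability."""
    return odds / (1 + odds)

def apply_bayes_factor(p: float, factor: float) -> float:
    """Update probability p by multiplying its odds by `factor`."""
    return to_prob(to_odds(p) * factor)

# Moving from 90% to 99% multiplies the odds by 11 (9:1 -> 99:1).
factor = to_odds(0.99) / to_odds(0.90)   # ~ 11.0
# Applying the same factor again: 99:1 -> 1089:1, i.e. ~ 99.908%.
p2 = apply_bayes_factor(0.99, factor)
```

This reproduces the ~99.908% figure above (exactly 1089/1090), rather than the intuitive-looking 99.9% you’d get by shrinking the remaining 1% tenfold.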
Are timelines-probabilities in this post conditional on no major endogenous slowdowns (due to major policy interventions on AI, major conflict due to AI, pivotal acts, safety-based disinclination, etc.)?
No, they are unconditional.
There is also the feedback loop involving the Future Fund itself. As Michael Dickens points out here:
I think it’s much easier to argue that p(misalignment x-risk|AGI) >35% (or 75%) as things stand.
What does “as things stand” mean? If we invented AGI tomorrow? That doesn’t seem like a useful prediction.
I’m thinking more along the lines of how things are with the current level of progress on AI Alignment and AI Governance, or assuming that the needle doesn’t move appreciably on these. In the limit of zero needle movement, this would be equivalent to if AGI was invented tomorrow.
This is a very exciting development!
In your third footnote, you write:
However, an analysis that reassures you that your current estimates are correct can make your beliefs more resilient, and in turn change some of your decisions. For example, such an analysis can make you donate a larger fraction of your assets now, since you expect your beliefs to change less in the future than you did before. It can also make you less willing to run these prize contests, since they are less likely to change your views (or make them even more resilient). So I wonder if you should have instead rewarded participants for either moving your estimates significantly away from your current views or for making your current views significantly more resilient.
Thanks for the feedback! I think this is a reasonable comment, and the main things that prevented us from doing this are:
(i) I thought it would detract from the simplicity of the prize competition, and would be hard to communicate clearly and simply
(ii) I think the main thing that would make our views more robust is seeing what the best arguments are for having quite different views, and this seems like it is addressed by the competition as it stands.
I’m toying with a project to gather reference classes for AGI-induced extinction and AGI takeover. If someone would like to collaborate, please get in touch.
(I’m aware of and giving thought to reference class tennis concerns but still think something like this is neglected.)
Minor nitpick: You describe your subjective probabilities in terms of fair betting odds, but aren’t betting odds misleading/confusing, since if AGI kills everyone, there’s no payout? Even loans that are forgiven or paid back depending on the outcome could be confusing, because the value of money could drastically change, although you could try to adjust for that like inflation. I’m not sure such an adjustment would be accurate, though.
Maybe you could talk about betting odds as if you’re an observer outside this world or otherwise assume away (causal and acausal) influence other than through the payout. Or just don’t use betting odds.
Yes, the intention is roughly something like this.
I’m thinking of writing something for this. Most of the arguments I have in mind address the headline problem only partially. Do you mind if I make a series of, say, 5 posts as a single submission?
For simplicity on our end, I’d appreciate if you had one post at the end that was the “official” entry, which links to the other posts. That would be OK!
Worth noting is that money like this is absolutely capable of shifting people’s beliefs through motivated reasoning. Specifically, I might be tempted to argue for a probability outside the Future Fund’s threshold, and for research I do to be motivated in favor of updating in this direction. Thus, my strategy would be to figure out your beliefs before looking at the contest, then look at the contest to see if you disagree with the Future Fund.
The questions are:
“P(misalignment x-risk|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI
AGI will be developed by January 1, 2043
AGI will be developed by January 1, 2100
To be answered as a percentage.
I’m guessing this definition is meant to separate misalignment from misuse, but I’m curious whether you are including either/both of these 2 cases as misalignment x-risk:
AGI is deployed and we get locked into a great outcome by today’s standards, but we get a world with <=1% of the value of “humanity’s potential”. So we sort of have an existential catastrophe, without a discrete catastrophic event.
The AGI is aligned with someone’s values, and we still get a future with 0-negative value but this is due to a “mistake” e.g. the person’s values didn’t give any weight to animal suffering and this got locked in.
(See also this post which brings up a similar question)
1 - counts for purposes of this question
2 - doesn’t count for purposes of this question (but would be a really big deal!)
The “go extinct” condition is a bit fuzzy. It seems like it would be better to express what you want to change your mind about as something more like a ratio (I forget the term for this): P(go extinct | AGI)/P(go extinct).
I know you’ve written the question in terms of going extinct because of AGI, but I worry this invites ways of shifting that value upward that are relatively trivial and uninformative about AI.
For instance, consider a line of argument:
AGI is quite likely (probably by your own lights) to be developed by 2070.
If AGI is developed, either it will suffer from serious alignment problems (so reason to think we go extinct) or it will seem reliable and extremely capable, so it will quickly be placed into key roles controlling things like nukes, military responses, etc...
The world is a dangerous place, and there is a good possibility of a substantial nuclear exchange between countries before 2070 which would substantially curtail our future potential (e.g. by causing a civilizational collapse from which, having used up much of the easily available fossil fuels, minerals, etc., we can’t recover).
By (2), that exchange will, with high probability, have AGI serving as a key element in the causal pathway that leads to it. Even though the exchange may well have happened without AGI, it will be the case that the people who press the button relied on critical intel collected by AGI, or AGI was placed directly in charge of some of the weapons systems involved in one of the escalating incidents, etc...
I think it might be wise to either
a) Shift to a condition in terms of the ratio between chance of extinction and chance of extinction conditional on AGI so the focus is on the effect of AGI on likelihood of extinction.
b) If not that, at least clarify the kind of causation required. Is it sufficient that the particular causal pathway that occurred includes AGI somewhere in it? Can I play even more unfairly and simply point out that, by a butterfly-effect-style argument, the particular incident that leads to extinction is probably but-for caused by almost everything that happens before it (if not for some random AI thing years ago, the soldiers who provoked the initial confrontation would probably have behaved differently or been different people, and instead of that year and incident it would have been one a year earlier or later)?
But hey, if you aren’t going to clarify away these issues, or say that you’ll judge entries on the spirit of the question rather than its technical formulation, then I’m going to include in my submission (if I find I have the time for one) a whole bunch of technically responsive, but not really what you want, arguments about how extinction from some cause is relatively likely and that AGI will appear in that causal chain in a way that makes it a cause of the outcome.
I mean, I hope you actually judge on something that ensures you’re really learning about the impact of AGI, but gotta pick up all the allowed percentage points one can ;-).
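To make the ratio proposal in (a) concrete, here is a toy calculation. All the numbers are made-up placeholders purely for illustration, not anyone’s actual estimates:

```python
# Toy illustration of the ratio condition in (a), using made-up numbers.
p_extinct = 0.10             # hypothetical baseline P(extinction by 2070)
p_extinct_given_agi = 0.30   # hypothetical P(extinction by 2070 | AGI by 2070)

# The ratio isolates how much AGI itself shifts extinction risk, so an
# argument that AGI merely appears somewhere in the causal chain of an
# exchange that would have happened anyway doesn't move it.
risk_ratio = p_extinct_given_agi / p_extinct
print(round(risk_ratio, 3))  # 3.0 under these assumptions
```

Under this framing, only arguments that change the conditional probability relative to the baseline would shift the quantity being judged.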
I agree with this, and the “drastic reduction in long term value” part is even worse. It is implicitly counterfactual (drastic reductions have to be relative to *something*), but what exactly the proposed counterfactual is remains extremely vague. I worry that to some extent this vagueness will lead to people not exploring some answers to the question because they’re trying to self-impose a “sensible counterfactual” constraint which, due to the vagueness, won’t actually line up well with the kinds of counterfactuals the FTX Foundation is interested in exploring.
Would you be able to say a little more about why part of your criteria seems to be the degree of probability shift (“We will award larger prizes for larger changes to these probabilities, as follows...”)? It seems like you could get analyses that offer larger changes but are less robust than analyses suggesting smaller changes. I didn’t understand how much of your formal evaluation will look at plausibility, argumentation, and soundness.
(asking as a curiosity not as a critique)
Plausibility, argumentation, and soundness will be inputs into how much our subjective probabilities change. We framed this in terms of subjective probabilities because it seemed like the easiest way to crisply point at ideas which could change our prioritization in significant ways.
This was gonna be a comment, but it turned into a post about whether large AI forecasting prizes could be suboptimal.
I think there are other AI-related problems that are comparable in seriousness to these two, which you may be neglecting (since you don’t mention them here). These posts describe a few of them, and this post tried to comprehensively list my worries about AI x-risk.
Interesting idea for a competition, but I don’t think the contest rules as designed, and more specifically the information hazard policy, are well thought out for submissions that follow the line of argumentation below when attempting to make the case for longer timelines:
Scaling current deep learning approaches in both compute and data will not be sufficient to achieve AGI, at least within the timeline specified by the competition
This is due to some critical component missing in the design of current deep neural networks
Supposing that this critical component is being ignored by current lines of research and/or has otherwise been deemed intractable, AGI development is likely to proceed slower than the current assumed status quo
The Future Fund should therefore shift some portion of their probability mass for the development of AGI further into the future
Personally, I find the above arguments one of the more compelling cases for longer timelines. However, a crux of these arguments holding true is that these critical components are in fact largely ignored or deemed intractable by current researchers. Making that claim necessarily involves explaining the technology, component, method, etc. in question, which could justifiably be deemed an information hazard, even if we are only describing why this element may be critical rather than how it could be built.
Seems like this type of submission would likely be disqualified despite being exactly the kind of information needed to make informed funding decisions, no?
Question about how judges would handle multiple versions of essays for this competition. (I think this contest is a great idea; I’m just trying to anticipate some practical issues that might arise.)
EA Forum has an ethos of people offering ideas, getting feedback and criticism, and updating their ideas iteratively. For the purposes of this contest, how would the judges treat essays that are developed in multiple versions?
For example, suppose a researcher posts version 1.0 of an essay on EA Forum with the “Future Fund worldview prize” tag. They get a bunch of useful feedback from other EA Forum members, refine their arguments, revise their essay, and post version 2.0 on EA Forum a couple weeks later (also with the tag). And so forth… through version 5.0 (or whatever).
1. Which version of the essay would judges be evaluating—version 1.0, or version 5.0, or would they partly also be judging the extent to which the researcher really strengthened their argument through the feedback?
2. If version 1.0 had been published before this contest was announced (on Sept 23), and version 2.0 was published after that date, would version 2.0 be eligible?
3. This versioning issue might raise adverse incentives on forums - e.g. anyone developing their own competition entry might be incentivized to withhold praise or constructive feedback on someone else’s essay, to downvote it, and/or to attack it with unusually incisive or detailed criticism. Or, friends and allies of an essay’s author might be incentivized to lavish it with praise, upvote it, and counter-attack against any criticism.
4. This versioning issue might raise issues with credit assignment, prize money distribution, and potential resentments. For example, suppose a researcher’s version 1.0 gets really helpful feedback on a couple of key points from other forum members, and incorporates their ideas into version 2.0 (possibly crediting them in some way, but not adding them as co-authors). What happens if version 2.0 then wins a big prize? It seems like the author would keep the prize money, but the people who helped them strengthen it might not get any reward, and might feel aggrieved. (And, perhaps anticipating this effect, they might not offer the helpful feedback in the first place.)
I have no great suggestions for how to solve these issues, but I suspect other people might be wondering about them.
I guess the simplest solution would be to say: judges will only consider version 1.0 of any essay, so writers better make it as good as they can, before they get any feedback.
Yes, arguably this prize doesn’t require any original research (or STEM breakthroughs), so could be won just by convincing argumentation based on existing knowledge. Prizes relating to (non-fiction) writing seem like a more relevant reference class than scientific prizes. And this prize seems correspondingly a lot more accessible (and lower effort to enter) on the face of it.
Hi!
I don’t think I will participate in this contest, because:
Pursuing AGI is an ethical no-no for me.
I like expert systems technology for what it offers.
I don’t have much background knowledge on AGI risk.
I am not comfortable with subjective probability as you use it for forecasting.
However, after reading about this prize, I have several questions that came up for me as I read it. I thought I would offer them as a good-faith effort to clarify your goals here.
There are significant risks to human well-being, aside from human extinction, that are plausible in the event of AGI or ASI development. Narrowing your question to whether extinction risk is a concern ignores various existential or suffering risks associated with AGI development. Is that what you intend?
EDIT: I believe some edits were made, so this question is no longer current.
Your scenario of a company staffed by AI is implausible without additions to do with legal status of AGI entities. Those additions presume societal changes and AGI governance by existing or new laws. Can you constrain your description of a future where AGI perform tasks to make the legal distinction clear between AGI and software tools?
Your idea of AGI presumably contrasts with rentable software instances that perform tasks and rely on a common pool of knowledge and capabilities. For example, I could rent multiple John Construction Worker instances for manipulation of construction bots for a particular project. I don’t pay the instances, I just pay for them and the robotic construction equipment.
In the event that automation allows AI to perform all human tasks, robot hardware will perform human activities. Robots can have their intelligence and knowledge associated at a hardware level with their bodies. For example, their learning can occur through training of their bodies rather than solely through software downloads, and their learning can be kept as local data only. Their affective experience can be linked to the action of their bodies in particular activities, potentially. They can also bear a superficial similarity to humans, particularly if their robot bodies are humanoid and employ similar senses (vision, hearing, tactile sense). These and other differences fulfill some of the description of a future containing AGI, but have different implications for the type of extinction threat posed by robots. Is that a distinction you consider worth making for the purposes of your contest?
When you write that AGI might do work at a rate of $25/hr, that seems implausible. In particular, a human-like intelligence without the data-processing constraints of a human engaged in a single focused activity (for example, researching a topic) can do some of the planned tasks involved at near-instantaneous rates compared to a human. A human might take a week to read a book that an AGI reads in a couple of milliseconds. Puzzling through the intuitions and logical implications of what was read in the book could take a human months, but an AGI could do the ontology refinement and knowledge development in under a hundred milliseconds. Again, can you constrain your example of AGI working like humans? For example, are you referring mainly to physical labor that AGI perform, perhaps through a humanoid robot?
Assuming an AGI is performing its labor at the speed of software, not the speed of a human exercising their intellect, we can agree that an AGI will do mental labor much faster than humans. However, physical labor depends on tasks and robotic hardware limits as well as software limits, and the speed differences are not as strong between humans and robot hardware. Hardware can get better, but not many orders of magnitude faster than a human (but maybe many more times as precise or reliable).
A further complication is that redefinition of tasks can alter the resource requirements and output constraints of labor. I’m not sure how that would affect an economic model of labor. Can you specify that (1) physical/engineering limits on robotics and (2) task redefinition don’t matter to your economic criteria for AGI spread, or is discussion of that something you are looking for in essays submitted to you?
Given the possibility that hardware requirements remain expensive to support a purely software entity, those entities might not be available for most types of work. Simpler data-processing tools and robots that seem advanced by our standards but much slower/cheaper/simpler could be widely available while actual AGI and conscious robots (or artificial life) are fairly rare, reserved for very important jobs, where human fallibility is perceived as too costly.
If the outcome of increased productivity and innovation through AI investments pans out, but without AGI participating in most economic activity, is that relevant to your philanthropic interests here? That is, is all you care about the timing of the development of the first AGI, or is it a time when AGI are common or cheap or just when automation is replacing most jobs or something else?
Thank you for your time and good luck with your contest!
Also, good luck to the contestants!
:)
What’s the difference between extinction risk and existential risk?
From the wiki: “An existential risk is the risk of an existential catastrophe, i.e. one that threatens the destruction of humanity’s longterm potential.” That can include getting permanently locked into a totalitarian dictatorship and things of that sort, even if they don’t result in extinction.
Thank you! And doubly thank you for the topic link. In case others are confused, I found the end of this post particularly clear https://forum.effectivealtruism.org/posts/qFdifovCmckujxEsq/existential-risk-is-badly-named-and-leads-to-narrow-focus-on
I am unsure what you mean by AGI. You say:
and:
If someone uses AI capabilities to create a synthetic virus (which they wouldn’t have been able to do in the counterfactual world without that AI-generated capability) and caused the extinction or drastic curtailment of humanity, would that count as “AGI being developed”?
My instinct is that this should not be considered to be AGI — since it is the result of just narrow AI and a human. However the caveat implies that it would count, because an AI system would have powered human extinction.
I get the impression you want to count ‘comprehensive AI systems’ as AGI if the system is able to act ~autonomously from humans[1]. Is that correct?
Putting it another way:
If there is a company that employs both humans and lots of AI technologies, and it brings about a “profound transformation (in economic terms or otherwise)”, I assume the combined capability of the AI elements of the company would have to be as general as a single AGI in order to count.
If it does not sum up to that level of generality, but is still used to bring about a transformation, I think that it should not resolve ‘AGI developed’ positively. However, it currently looks like it would resolve it positively.
Thanks, I think this is subtle and I don’t think I expressed this perfectly.
> If someone uses AI capabilities to create a synthetic virus (which they wouldn’t have been able to do in the counterfactual world without that AI-generated capability) and caused the extinction or drastic curtailment of humanity, would that count as “AGI being developed”?
No, I would not count this.
I’d probably count it if the AI a) somehow formed the intention to do this and then developed the pathogen and released it without human direction, but b) couldn’t yet produce as much economic output as full automation of labor.
Okay great, that makes sense to me. Thank you very much for the clarification!
Are essays submitted before December 23rd at an advantage over essays submitted on December 23rd?
No official rules on that. I do think that if you have some back and forth in the comments that’s a way to make your case more convincing, so some edge there.
Is it permitted to submit more than one entry if the entries are on different topics?
(Apologies if this has been answered somewhere already.)
Yes
Question: In this formulation, what is meant by the “current position”? Just asking to be sure.
It could refer to the specific credences outlined above, but it would seem somewhat strange to say (e.g.) “here is what we regard as the canonical critique of ‘AGI will be developed by January 1, 2043 =/= 20%’”. So I am inclined to believe that it probably means something else.
I would love to know, since I might consider writing a critique. In particular, I would love a list of specific points (or beliefs or pieces of writing) that you would like to see critiqued.
How do we tag the post?
The instructions say to tag the post with “Future Fund worldview prize”, but it does not seem possible to do this. Only existing tags can be used for tagging as far as I can tell, and this tag is not in the list of options.
Could you provide a deeper idea of what you mean by “misaligned”?
How do we submit our essay for the contest? Is there an email we send it to or something?
But it is not possible to tag the post with “Future Fund worldview prize”. It seems to me that only existing tags can be used.
I think there was a tag, but it might have gotten deleted. I made a new one — you should be able to use it now.
I think it would be nicer if you say your P(Doom|AGI in 2070) instead of P(Doom|AGI by 2070), because the second one implicitly takes into account your timelines. Also, it would be nicer to have the same years: P(Doom | AGI in 2043) and P(Doom | AGI in 2100)
I disagree. (At least, if defining “nicer” as “more useful to the stated goals for the prizes”.)
As an interested observer, I think it’s an advantage to take timelines into account. Specifically, I think the most compelling way to argue for a particular P(Catastrophe|AGI by 20__) to the FF prize evaluators will be an argument that:
states and argues for a timelines distribution in terms of P(AGI in 20__) for a continuous range of 20__s
states and argues for a conditional-catastrophe function in terms of P(Catastrophe|AGI in 20__) over the range
integrates the product over the range to get a P(Catastrophe|AGI by 20__)
argues that the final number isn’t excessively sensitive to small shifts in the timelines distribution or the catastrophe-conditional-on-year function.
An argument which does all of this successfully is significantly more useful to informing the FF’s actions than an argument which only defends a single P(Catastrophe|20__).
I do agree that it would be nice to have the years line up, but as above I do expect a winning argument for P(Catastrophe|AGI by 2070) to more-or-less explicitly inform a P(Catastrophe|AGI by 2043), so I don’t expect a huge loss.
(Not speaking for the prizes organizers/evaluators, just for myself.)