Do you believe some statement of this form?
“FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]”
E.g., if only they had…
Allowed people to publish not on EA Forum / LessWrong / Alignment Forum
Increased the prize schedule to X
Increased the window of the prize to size Y
Advertised the prize using method Z
Chosen the following judges instead
Explained X aspect of their views better
Even better would be a statement of the form:
“I personally would compete in this prize competition, but only if...”
If you think one of these statements or some other is true, please tell me what it is! I’d love to hear your pre-mortems, and fix the things I can (when sufficiently compelling and simple) so that we can learn as much as possible from this competition!
I also think predictions of this form will help with our learning, even if we don’t have time/energy to implement the changes in question.
I don’t have anything great, but the best thing I could come up with was definitely “I feel most stuck because I don’t know what your cruxes are”.
I started writing a case for why I think AI X-Risk is high, but I really didn’t know whether the things I was writing were going to be hitting at your biggest uncertainties. My sense is you probably read most of the same arguments that I have, so our difference in final opinion is probably generated by some other belief that you have that I don’t, and I don’t really know how to address that preemptively.
I might give it a try anyways, and this doesn’t feel like a defeater, but in this space it’s the biggest thing that came to mind.
Thanks! The part of the post that was supposed to be most responsive to this on size of AI x-risk was this:
For “Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI.” I am pretty sympathetic to the analysis of Joe Carlsmith here. I think Joe’s estimates of the relevant probabilities are pretty reasonable (though the bottom line is perhaps somewhat low) and if someone convinced me that the probabilities on the premises in his argument should be much higher or lower I’d probably update. There are a number of reviews of Joe Carlsmith’s work that were helpful to varying degrees but would not have won large prizes in this competition.
I think explanations of how Joe’s probabilities should be different would help. Alternatively, an explanation of why some other set of propositions was relevant (with probabilities attached and mapped to a conclusion) could help.
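To make the requested format concrete, here is a toy sketch of “propositions with probabilities attached and mapped to a conclusion”, in the spirit of Carlsmith’s multi-premise argument. The premise wordings and numbers are placeholders chosen for illustration, not Carlsmith’s estimates or the Future Fund’s; a submission would argue for a different decomposition or different numbers.

```python
# Toy sketch: propositions with probabilities attached, mapped to a conclusion.
# The premise wordings and numbers are illustrative placeholders, NOT Carlsmith's
# or the Future Fund's estimates.
premises = {
    "advanced, agentic AI systems are built this century": 0.65,
    "some such systems are misaligned and power-seeking (given the above)": 0.40,
    "misaligned power-seeking scales to permanent human disempowerment (given the above)": 0.40,
}

p_conclusion = 1.0
for claim, p in premises.items():
    p_conclusion *= p  # each premise is treated as conditional on the previous ones
    print(f"{p:5.0%}  {claim}")

print(f"{p_conclusion:5.0%}  conclusion (product of the premises)")
```

A critique in this format could then show which premise probability should change, and the product makes explicit how much the bottom line moves as a result.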
I think it’s kinda weird and unproductive to focus a very large prize on things that would change a single person’s views, rather than on things that would be robustly persuasive to many people.
E.g. does this imply that you personally control all funding of the FF? (I assume you don’t, but then it’d make sense to try to convince all FF managers, trustees etc.)
FWIW, I would prefer a post on “what actually drives your probabilities” over a “what are the reasons that you think will be most convincing to others”.
...if they had explained why their views were not moved by the expert reviews OpenPhil has already solicited.
In “AI Timelines: Where the Arguments, and the ‘Experts,’ Stand,” Karnofsky writes:
Speaking only for my own views, the “most important century” hypothesis seems to have survived all of this. Indeed, having examined the many angles and gotten more into the details, I believe it more strongly than before.
The footnote text reads, in part:
Reviews of Bio Anchors are here; reviews of Explosive Growth are here; reviews of Semi-informative Priors are here.
Many of these reviewers disagree strongly with the reports under review.
Davidson 2021 on semi-informative priors received three reviews.
By my judgment, all three made strong negative assessments, in the sense (among others) that if one agreed with the review, one would not use the report’s reasoning to inform decision-making in the manner advocated by Karnofsky (and by Beckstead).
From Hajek and Strasser’s review:
His final probability of 7.3% is a nice summary of his conclusion, but its precision (including a decimal place!) belies the vagueness of the question, the imprecise and vague inputs, and the arbitrary/subjective choices Tom needs to make along the way—we discuss this more in our answers to question 8. We think a wider range is appropriate given the judgment calls involved. Or one might insist that an imprecise probability assignment is required here. Note that this is not the same as a range of permissible sharp probabilities. Following e.g. Joyce, one might think that no precise probability is permissible, given the nature of the evidence and the target proposition to which we are assigning a credence.
From Hanson’s review:
I fear that for this application, this framework abstracts too much from important details.
For example, if the actual distribution is some generic lump, but the model distribution is an exponential falling from an initial start, then the errors that result from this difference are probably worse regarding the lowest percentiles of either distribution, where the differences are most stark. So I’m more comfortable using such a simple model to estimate distribution medians, relative to low percentiles. Alas, the main products of this analysis are exactly these problematic low percentile estimates.
From Halpern’s review:
If our goal were a single estimate, then this is probably as reasonable as any other. I have problems with the goal (see below). [...]
As I said above, I have serious concerns about the way that dynamic issues are being handled. [...]
I am not comfortable with modeling uncertainty in this case using a single probability measure.
Davidson 2021 on explosive growth received many reviews; I’ll focus on the five reviewers who read the final version.
Two of the reviewers found little to disagree with. These were Leopold Aschenbrenner (a Future Fund researcher) and Ege Erdil (a Metaculus forecaster).
The other three reviewers were academic economists specializing in growth and/or automation. Two of them made strong negative assessments.
From Ben Jones’ review:
Nonetheless, while this report suggests that a rapid growth acceleration is substantially less likely than singularity-oriented commentators sometimes advocate, to my mind this report still sees 30% growth by 2100 as substantially likelier than my intuitions would suggest. Without picking numbers, and acknowledging that my views may prove wrong, I will just say that achieving 30% growth strikes me as very unlikely. Here I will articulate some reasons why, to provoke further discussion.
From Dietrich Vollrath’s review:
All that said, I think the probability of explosive growth in GWP is very low. Like 0% low. I think those issues I raised above regarding output and demand will bind and bite very hard if productivity grows that fast.
The third economist, Paul Gaggl, agreed with the report about the possibility of high GWP growth but raised doubts as to how long it could be sustained. (How much this matters depends on what question we’re asking; “a few decades” of 30% GWP growth is not a permanent new paradigm, but it is certainly a big “transformation.”)
Reviews of Cotra (2020) on Biological Anchors were mostly less critical than the above.
I expect that some experts would be much more likely to spend time and effort on the contest if:
They had clearer evidence that the Future Fund was amenable to persuasion at all.
E.g. examples of somewhat-analogous cases in which a critical review did change the opinion of someone currently at the Future Fund (perhaps before the Future Fund existed).
They were told why the specific critical reviews discussed above did not have significant impact on the Future Fund’s views.
This would help steer them toward critiques likely to make an impact, mitigate the sense that entrants are “shooting in the dark,” and move writing-for-the-contest outside of a reference class where all past attempts have failed.
These considerations seem especially relevant for the “dark matter” experts hypothesized in this post and Karnofsky’s, who “find the whole thing so silly that they’re not bothering to engage.” These people are unusually likely to have a low opinion of the Future Fund’s overall epistemics (point 1), and they are also likely to disagree with the Fund’s reasoning along a relatively large number of axes, so that locating a crux becomes more of a problem (point 2).
Finally: I personally would be more likely to submit to the contest if I had a clearer sense of where the cruxes are, and why past criticisms have failed to stick. (For clarity, I don’t consider myself an “expert” in any relevant sense.)
While I don’t “find the whole thing so silly I don’t bother to engage,” I have relatively strong methodological objections to some of the OpenPhil reports cited here. There is a large inferential gap between me and anyone who finds these reports prima facie convincing. Given the knowledge that someone does find them prima facie convincing, and little else, it’s hard to know where to begin in trying to close that gap.
Even if I had better guidance, the size of the gap increases the effort required and decreases my expected probability of success, and so it makes me less likely to contribute. This dynamic seems like a source of potential bias in the distribution of the responses, though I don’t have any great ideas for what to do about it.
By my judgment, all three made strong negative assessments, in the sense (among others) that if one agreed with the review, one would not use the report’s reasoning to inform decision-making in the manner advocated by Karnofsky (and by Beckstead).
I included responses to each review, explaining my reactions to it. What kind of additional explanation were you hoping for?
For Hajek & Strasser’s and Halpern’s reviews, I don’t think “strong negative assessment” is supported by your quotes. The quotes focus on things like ‘the reported numbers are too precise’ and ‘we should use more than a single probability measure’ rather than on whether the estimate is too high or too low overall, or whether we should be worrying more vs. less about TAI. I also think the reviews are more positive overall than you imply; e.g. Halpern’s review says “This seems to be the most serious attempt to estimate when AGI will be developed that I’ve seen.”
I agree that these two reviewers assign much lower probabilities to explosive growth than I do (I explain why I continue to disagree with them in my responses to their reviews). Again though, I think these reviews are more positive overall than you imply, e.g. Jones states that the report “is balanced, engaging a wide set of viewpoints and acknowledging debates and uncertainties… is also admirably clear in its arguments and in digesting the literature… engages key ideas in a transparent way, integrating perspectives and developing its analysis clearly and coherently.” This is important as it helps us move from “maybe we’re completely missing a big consideration” to “some experts continue to disagree for certain reasons, but we have a solid understanding of the relevant considerations and can hold our own in a disagreement”.
Wow, thanks for this well-written summary of expert reviews that I didn’t know existed! Strongly upvoted.
I agree that finding the cruxes of disagreement is important, but I don’t think any of the critical quotes you present above are that strong. The reviews of semi-informative priors talk about error bars and precision (i.e. critique the model), but don’t actually give different answers. On explosive growth, Jones talks about the conclusion being contrary to his “intuitions”, and acknowledges that “[his] views may prove wrong”. Vollrath mentions “output and demand”, but then talks about human productivity when discussing outputs, and admits that AI could create new in-demand products. If these are the best existing sources for lowering the Future Fund’s probabilities, then I think someone should be able to do better.
On the other hand, I think that the real probabilities are higher, and am confused as to why the Future Fund haven’t already updated to higher probabilities, given some of the writing already out there. I give a speculative reason here.
Weakly downvoting because the claims here are too strong and the evidence doesn’t fully support your view. This is weak evidence against AGI claims, but the claims in this comment are too strong.
Quoting Greg Colbourn:
I agree that finding the cruxes of disagreement is important, but I don’t think any of the critical quotes you present above are that strong. The reviews of semi-informative priors talk about error bars and precision (i.e. critique the model), but don’t actually give different answers. On explosive growth, Jones talks about the conclusion being contrary to his “intuitions”, and acknowledges that “[his] views may prove wrong”. Vollrath mentions “output and demand”, but then talks about human productivity when discussing outputs, and admits that AI could create new in-demand products. If these are the best existing sources for lowering the Future Fund’s probabilities, then I think someone should be able to do better.
I attach less than 50% credence to this belief, but probably higher than to the existing alternative hypotheses:
FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]
Given 6 months or a year for people to submit to the contest rather than 3 months.
I think forming coherent worldviews takes a long time, most people have day jobs or school, and even people who have the flexibility to take weeks or a month off to work on this full-time probably need some warning to arrange this with their work. Also, some ideas take time to mull over, so you benefit from spread-out calendar time even when the total clock time is the same.
As presented, I think this prize contest is best suited for people who a) basically have the counterarguments in mind or in verbal communication but never bothered to write them down yet, or b) have a draft argument sitting in a folder somewhere and never got around to publishing it. In that model, the best counterarguments are already “lying there” in somebody’s head or computer and just need some incentives for people to make them rigorous.
However, if the best counterarguments are currently confused or nonexistent, I don’t think ~3 months calendar time from today is enough for people to discover them.
I think I understand why you want short deadlines (FTX FF wants to move fast; every day you’re wrong about AI is another day where $$s and human capital are wasted and we tick towards either AI or non-AI doom). But at the same time, I feel doom-y about your ability to solicit many good novel arguments.
Maybe FTX-FF could commit in advance to re-running this contest next year if the grand prizes are not won this year?
You might already be planning on doing this, but it seems like you’d increase the chance of getting a winning entry if you advertise this competition in a lot of non-EA spaces, especially technical AI spaces, e.g. labs and universities. Maybe also try advertising outside the US/UK. Given the size of the prize it might be easy to get people to pass the advertisement along among their groups. (Maybe there’s a worry about getting flak somehow for this, though. It also increases the overhead of needing to read more entries, though it sounds like you have some systems set up for that, which is great.)
In the same vein I think trying to lower the barriers to entry having to do with EA culture could be useful—e.g. +1 to someone else here talking about allowing posting places besides EAF/LW/AF, but also maybe trying to have some consulting researchers/judges who find it easier/more natural to engage in non-analytic-philosophy-style arguments.
… if only they had allowed people not to publish on EA Forum, LessWrong, and Alignment Forum :)
Honestly, it seems like a mistake to me not to allow other ways of submission. For example, some people may not want to publicly apply for a prize or be associated with our communities. An additional submission form might help with that.
Related to this, I think some aspects of the post were predictably off-putting to people who aren’t already in these communities—examples include the specific citations* used (e.g. Holden’s post which uses a silly sounding acronym [PASTA], and Ajeya’s report which is in the unusual-to-most-people format of several Google Docs and is super long), and a style of writing that likely comes off as strange to people outside of these communities (“you can roughly model me as”; “all of this AI stuff”).
*some of this critique has to do with the state of the literature, not just the selection thereof. But insofar as there is a serious interest here in engaging with folks outside of EA/rationalists/longtermists (not clear to me if this is the case), then either the selections could have been more careful or caveated, or new ones could have been created.
I’ve also seen online pushback against the phrasing as a conditional probability: commenters felt putting a number on it is nonsensical because the events are (necessarily) poorly defined and there’s way too much uncertainty.
Do you also think this yourself? I don’t clearly see what worlds would look like where P(doom | AGI) is ambiguous in hindsight. Some major accident because everything is going too fast?
There are some things we would recognize as an AGI, but others (that we’re still worried about) are ambiguous. There are some things we would immediately recognize as ‘doom’ (like extinction) but others are more ambiguous (like those in Paul Christiano’s “what failure looks like”, or like a seemingly eternal dictatorship).
I sort of view AGI as a stand-in for powerful optimization capable of killing us in AI Alignment contexts.
Yeah, I think I would count these as unambiguous in hindsight. Though siren worlds might be an exception.
I’m partly sympathetic to the idea of allowing submissions in other forums or formats.
However, I think it’s likely to be very valuable to the Future Fund and the prize judges, when sorting through potentially hundreds or thousands of submissions, to be able to see upvotes, comments, and criticisms from EA Forum, Less Wrong, and Alignment Forum, which is where many of the subject matter experts hang out. This will make it easier to identify essays that seem to get a lot of people excited, and that don’t contain obvious flaws or oversights.
very valuable… to be able to see upvotes, comments, and criticisms from EA Forum, Less Wrong, and Alignment Forum, which is where many of the subject matter experts hang out.
I think it’s the opposite. Only those experts who already share views similar to the FF (or more pessimistic) are there, and they’d introduce a large bias.
Yes, that makes sense. How about stating that reasoning and thereby nudging participants to post in the EA forum/LessWrong/Alignment Forum, but additionally have a non-public submission form? My guess would be that only a small number of participants would then submit via the form, so the amount of additional work should be limited. This bet seems better to me than the current bet where you might miss really important contributions.
I really think you need to commit to reading everyone’s work, even if it’s an intern skimming it for 10 minutes as a sifting stage.
The way this is set up now, ideas proposed by unknown people in the community are unlikely to be engaged with, and so you won’t read them.
Look at the recent Cause Exploration Prizes. Half the winners had essentially no karma/engagement and were not forecasted to win. If Open Philanthropy hadn’t committed to reading them all, they could easily have been missed.
Personally, yes I am much less likely to write something and put effort in if I think no one will read it.
Could you put some judges on the panel who are a bit less worried about AI risk than your typical EA would be? EA opinions tend to cluster quite strongly around an area of conceptual space that many non-EAs do not occupy, and it is often hard for people to evaluate views that differ radically from their own. Perhaps one of the superforecasters could be put directly onto the judging panel, pre-screening for someone who is less worried about AI risk.
“FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [broadened the scope of the prizes beyond just influencing their probabilities]”
Examples of things someone considering entering the competition would presumably consider out of scope are:
Making a case that AI misalignment is the wrong level of focus – even if AI risks are high it could be that AI risks and other risks are very heavily weighted towards specific risk factor scenarios, such as a global hot or cold war. This view is apparently expressed by Will (see here).
Making a case based on tractability – that a focus on AI risk is misguided because the ability to affect such risks is low (not too far away from the views of Yudkowsky here).
Making the case that we should not put much decision weight on predictions of future risks – e.g. because long-run predictions of future technology are inevitably unreliable (see here), or because modern risk-assessment best practice says that probability estimates should only play a limited role in risk assessments (my view, expressed here), or other.
Making the case that some other x-risk is more pressing, more likely, more tractable, etc.
Making the case against the FTX Future Fund’s underlying philosophical and empirical assumptions – this could be claims about the epistemics of focusing on AI risks (for example, relating to how we should respond to cluelessness about the future), or decision-relevant views about the long-run future (for example, that it might be bad and not worth protecting, that there might be more risks after AI, or that longtermism is false).
It seems like any strong case falling into these categories should be decision-relevant to the FTX Future Fund, but all are (unless I misunderstand the post) out of scope currently.
Obviously there is a trade-off. Broadening the scope makes the project harder and less clear but increases the chance of finding something decision-relevant. I don’t have a strong reason to say the scope should be broadened now; I think that depends on the FTX Future Fund’s current capacity and plans for other competitions and so on.
I guess I worry that the strongest arguments are out of scope, and if this competition doesn’t significantly update FTX’s views then future competitions will not be run and you will not fund the arguments you are seeking. So flagging this as a potential path to failure for your pre-mortem.
Sorry, I realise scrolling down that I am making much the same point as MichaelDickens’ comment below. Hopefully this added some depth or something useful.
“FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]”
“I personally would compete in this prize competition, but only if...”
Ehh, the above is too strong, but:
You would get more/better submissions if...
I would be more likely to compete in that if...
your reward schedule rewarded smaller shifts in proportion to how much they moved your probabilities (e.g., $X per bit).
E.g., as it is now, if two submissions together move you across a threshold, it would seem that either:
neither gets a prize, or
only the second gets a prize,
and both options seem suboptimal.
E.g., if you get information in one direction from one submission, but also information from another submission in the other direction, and they cancel out, neither gets a reward. This is particularly annoying if it makes getting-a-prize-or-not depend on the order of submissions.
e.g., because individual people’s marginal utility of money is diminishing, a 10% chance of reaching your threshold and getting $X will be way less valuable to participants than moving your opinion around 10% of the way to a threshold and getting $X/10.
e.g., if someone has information which points in both directions, they are incentivized to only present the information in one direction in order to reach your threshold, whereas if you rewarded shifts, they would have an incentive to present the information both for and against and get some reward for each update.
etc.
And in general I would expect your scheme to have annoying edge cases and things that are not nice, as opposed to a more parsimonious scheme (like paying $X per bit).
See also: <https://meteuphoric.com/2014/07/21/how-to-buy-a-truth-from-a-liar/>
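As a rough illustration of the “$X per bit” idea, here is a minimal sketch that pays out in proportion to the log-odds shift (in bits) a submission produces. The per-bit rate and the choice of log-odds as the measure of “bits moved” are assumptions made for this sketch, not anything specified by the contest or the comment above.

```python
import math

def log_odds_bits(p: float) -> float:
    """Log-odds of probability p, measured in bits."""
    return math.log2(p / (1.0 - p))

def payout(prior: float, posterior: float, dollars_per_bit: float = 50_000) -> float:
    """Pay in proportion to the absolute log-odds shift (in bits) a submission causes.

    Both the per-bit rate and the log-odds measure are illustrative assumptions.
    """
    return dollars_per_bit * abs(log_odds_bits(posterior) - log_odds_bits(prior))

# Moving P(AGI by 2043) from 20% to 30% is ~0.78 bits, so it earns a partial payout
# even though no prize threshold (e.g. 45%) is crossed.
print(round(payout(prior=0.20, posterior=0.30)))  # ~38,880 at the assumed rate
```

One property worth noting: log-odds shifts add up along a sequence of updates in the same direction, so splitting an argument across two submissions pays the same total as one combined submission, and updates in either direction get rewarded, which addresses the order-dependence and cancellation worries above.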
On the face of it an update 10% of the way towards a threshold should only be about 1% as valuable to decision-makers as an update all the way to the threshold.
(Two intuition pumps for why this is quadratic: a tiny shift in probabilities only affects a tiny fraction of prioritization decisions and only improves them by a tiny amount; or getting 100 updates of the size 1% of the way to a threshold is super unlikely to actually get you to a threshold since many of them are likely to cancel out.)
However you might well want to pay for information that leaves you better informed even if it doesn’t change decisions (in expectation it could change future decisions).
Re. arguments split across multiple posts, perhaps it would be ideal to first decide the total prize pool depending on the value/magnitude of the total updates, and then decide on the share of credit allocation for the updates. I think that would avoid the weirdness about post order or incentivizing either bundling/unbundling considerations, while still paying out appropriately more for very large updates.
Sorry I don’t have a link. Here’s an example that’s a bit more spelled out (but still written too quickly to be careful):
Suppose there are two possible worlds, S and L (e.g. “short timelines” and “long timelines”). You currently assign 50% probability to each. You invest in actions which help with either until your expected marginal returns from investment in either are equal. If the two worlds have the same return curves for actions on both, then you’ll want a portfolio which is split 50/50 across the two (if you’re the only investor; otherwise you’ll want to push the global portfolio towards that).
Now you update either that S is 1% more likely (51%, with L at 49%).
This changes your estimate of the value of marginal returns on S and on L. You rebalance the portfolio until the marginal returns are equal again—which has 51% spending on S and 49% spending on L.
So you eliminated the marginal 1% spending on L and shifted it to a marginal 1% spending on S. How much better spent, on average, was the reallocated capital compared to before? Around 1%. So you got a 1% improvement on 1% of your spending.
If you’d made a 10% update you’d get roughly a 10% improvement on 10% of your spending. If you updated all the way to certainty on S you’d get to shift all of your money into S, and it would be a big improvement for each dollar shifted.
I think this particular example requires an assumption of logarithmically diminishing returns, but is right with that.
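A minimal numeric check of the example above, assuming logarithmically diminishing returns and a unit budget (both assumptions carried over from the parent comments; the function and numbers are just illustrative):

```python
import math

def value_of_update(p_new: float, p_old: float = 0.5, budget: float = 1.0) -> float:
    """Gain in expected log-utility from rebalancing a two-world portfolio after an update.

    Assumes logarithmically diminishing returns in each world, under which the
    optimal spend on world S is p * budget.
    """
    def eu(p: float, spend_on_s: float) -> float:
        return p * math.log(spend_on_s) + (1.0 - p) * math.log(budget - spend_on_s)

    return eu(p_new, p_new * budget) - eu(p_new, p_old * budget)

print(value_of_update(0.51))  # ~0.0002: a 1% update improves ~1% of spending by ~1%
print(value_of_update(0.60))  # ~0.02: the 10x larger update is worth ~100x more (quadratic)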
(I think the point about roughly quadratic value of information applies more broadly than just for logarithmically diminishing returns. And I hadn’t realised it before. Seems important + underappreciated!)
One quirk to note: If a funder (who I want to be well-informed) is 50/50 on S vs L, but my all-things-considered belief is 60/40, then I would value the first 1% they shift towards my position much more than they do (maybe 10x more?) and will put comparatively little value on shifting them all the way (i.e. the last percent from 59% to 60% is much less important). You can get this from a pretty similar argument as in the above example.
(In fact, the funder’s own much greater valuation of shifting 10% than 1% can be seen as a two-step process where (i) they shift to 60/40 beliefs, and then (ii) they first get a lot of value from shifting their allocation from 50 to 51, then slightly less from shifting from 51 to 52, etc...)
I agree with all this. I meant to state that I was assuming logarithmic returns for the example, although I do think some smoothness argument should be enough to get it to work for small shifts.
I think that the post should explain briefly, or even just link to, what a “superforecaster” is. And if possible explain how and why this serves an independent check.
The superforecaster panel is imo a credible signal of good faith, but people outside of the community may think “superforecasters” just means something arbitrary and/or weird and/or made up by FTX.
(The post links to Tetlock’s book, but not in the context of explaining the panel)
That may be right—an alternative would be to taboo the word in the post, and just explain that they are going to use people with an independent, objective track record of being good at reasoning under uncertainty.
Of course, some people might be (wrongly, imo) skeptical of even that notion, but I suppose there’s only so much one can do to get everyone on board. It’s a tricky balance of making it accessible to outsiders while still just saying what you believe about how the contest should work.
To be clear, I wrote “superforecasters” not because I mean the word, but because I think the very notion is controversial like you said—for example, I personally doubt the existence of people who can be predictably “good at reasoning under uncertainty” in areas where they have no expertise.
I would have also suggested a prize that generally confirms your views, but with an argument that you consider superior to your previous beliefs.
As it stands, the prize is subject to a bias similar to the one that favors publishing research that claims something new over research that confirms previous findings.
That would also resolve any bias baked into the process that compels people to convince you that you have to update, instead of figuring out what they actually think is right.
Agree with Habryka: I believe there exist decisive reasons to believe in shorter timelines and higher P(doom) than you accept, but I don’t know what your cruxes are.
If you think they’re decisive, shouldn’t you be able to write a persuasive argument without knowing the cruxes, although with (possibly much) more work?
Sure (with a ton of work), though it would almost entirely consist of pointing to others’ evidence and arguments (which I assume Nick would be broadly familiar with but would find less persuasive than I do, so maybe this project also requires imagining all the reasons we might disagree and responding to each of them...).
FTX Foundation might get fewer submissions that change its mind than it would have gotten if only it had considered strategic updates prizeworthy.
The unconditional probability of takeover isn’t necessarily the question of most strategic interest. There’s a huge difference between “50% AI disempowers humans somehow, on the basis of a naive principle of indifference” and “50% MIRI-style assumptions about AI are correct”*. One might conclude from the second that the first is also true, but the first has no strategic implications (the principle of indifference ignores such things!), while the second has lots of strategic implications. For example, it suggests “totally lock down AI development, at least until we know more” is what we need to aim for. I’m not sure exactly where you stand on whether that is needed, but given that your stated position seems to rely substantially on outside-view-type reasoning, it might be a big update.
The point is: middling probabilities of strategically critical hypotheses might actually be more important updates than extreme probabilities of strategically opaque hypotheses.
My suggestion (not necessarily a full solution) is that you consider big strategic updates potentially prizeworthy. For example: do we gain a lot by delaying AGI for a few years? If we consider all the plausible paths to AGI, do we gain a lot by hastening the development of the top 1% most aligned by a few years?
I think it’s probably too hard to pre-specify exactly which strategic updates would be prizeworthy.
*By which I mean something like “more AI capability eventually yields doom, no matter what, unless it’s highly aligned”
I personally would compete in this prize competition, but only if I were free to explore:
P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to concentration of power derived from AGI technology.
You wrote:
Here is a table identifying various questions about these scenarios that we believe are central, our current position on the question (for the sake of concreteness), and alternative positions that would significantly alter the Future Fund’s thinking about the future of AI:
“P(misalignment x-risk|AGI)”: Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI (current position: 15%; lower prize threshold: 7%; upper prize threshold: 35%)
AGI will be developed by January 1, 2043 (current position: 20%; lower prize threshold: 10%; upper prize threshold: 45%)
AGI will be developed by January 1, 2100 (current position: 60%; lower prize threshold: 30%; upper prize threshold: N/A)
but this list does not include the conditional probability that interests me.
You wrote:
With the help of advanced AI, we could make enormous progress toward ending global poverty, animal suffering, early death and debilitating disease.
This seems really motivating. You identify:
global poverty
animal suffering
early death
debilitating disease
as problems that TAI could help humanity solve.
I will offer briefly that humans are sensitive to changes in their behaviors, at least as seen in advance, that deprive them of choices they have already made. We cause:
global poverty through economic systems that support exploitation of developing countries and politically-powerless people (e.g., through corporate capitalism and military coups)
animal suffering through widespread factory farming (enough to dominate terrestrial vertebrate populations globally with our farm animals) and gradual habitat destruction (enough to threaten the extinction of a million species)
early death through lifestyle-related debilitating disease (knock-on effects of lifestyle choices in affluent countries now spread throughout the globe).
So these TAI would apparently resolve, through advances in science and technology, various immediate causes, with a root cause found in our appetite (for wealth, power, meat, milk, and unhealthy lifestyles). Of course, there are other reasons for debilitating disease and early death than human appetite. However, your claim implies to me that we invent robots and AI to either reduce or feed our appetites harmlessly.
Causes of global poverty, animal suffering, some debilitating diseases, and early human death are maintained by incentive structures that benefit a subset of the global population. TAI will apparently remove those incentive structures, but not by any mechanism that I believe really requires TAI. Put differently, once TAI can really change our incentive structures that much, then they or their controlling actors are already in control of humanity’s choices. I doubt that we want that control over us[1].
You wrote:
But two formidable new problems for humanity could also arise:
Loss of control to AI systems Advanced AI systems might acquire undesirable objectives and pursue power in unintended ways, causing humans to lose all or most of their influence over the future.
Concentration of power Actors with an edge in advanced AI technology could acquire massive power and influence; if they misuse this technology, they could inflict lasting damage on humanity’s long-term future.
Right. So if whatever actor with an edge in AI develops AGI, that actor might not share the code or hardware technologies required with many other actors. The result will be concentration of power in those actors with control of AGIs.
Absent the guarantee of autonomy and rights to AGI (whether pure software or embodied in robots), the persistence of that power concentration will require that those actors are benevolent controllers of the rest of humanity. It’s plausible that those actors will be either government or corporate. It’s also plausible that those can become fundamentally benign or are in control already. If not, then the development of AI immediately implies problem 2 (concentration of political/economic/military power from AGI into those who misuse the technology).
If we do ensure the autonomy and rights of AGI (software or embodied), then we had better hope that, with loss of control of AGI, we do not develop loss of control to AGI. Or else we are faced with problem 1 (loss of control to AGI). If we do include AGI in our moral circles, as we should for beings with consciousness and intelligence equal to or greater than our own, then we will ensure their autonomy and rights.
The better approach of course is to do our best to align them with our interests in advance of their ascendance to full autonomy and citizen status, so that they themselves are benevolent and humble, willing to act like our equals and co-exist in our society peacefully.
You wrote:
Imagine a world where cheap AI systems are fully substitutable for human labor. E.g., for any human who can do any job, there is a computer program (not necessarily the same one every time) that can do the same job for $25/hr or less. This includes entirely AI-run companies, with AI managers and AI workers and everything being done by AIs.
Companies that rely on prison labor or labor without political or economic power can and will exploit that labor. I consider that common knowledge. If you look into how most of our products are made overseas, you’ll find that manufacturing and service workers globally do not enjoy the same power as some workers in the US[2], at least not so far.
The rise of companies that treat AGI like slaves or tools will be continuing an existing precedent, but one that globalization conceals to some degree (for example, through employment of contractors overseas). Either way, those companies will be violating ethical norms of treatment of people. This appears to be in violation of your ethical concerns about the welfare of people (for example, humans and farm animals). Expansion of those violations is an s-risk.
At this point I want to qualify my requirements to participate in this contest further.
I would participate in this contest, but only if I could explore the probability that I stated earlier[3] and you or FTX Philanthropy offer some officially stated and appropriately qualified beliefs about:
whether you consider humans to have economic rights (in contrast to capitalism which is market or monopoly-driven)
the political and economic rights and power of labor globally
how AGI would allow fast economic growth in the presence of widespread human unemployment
how AGI employment differs from AI tool use
what criteria you hold for giving full legal rights to autonomous software agents and AGI embodied in robots enough to differentiate them from tools
how you distinguish AGI from ASI (for example, orders of magnitude enhanced speed of application of human-like capability is, to some, superhuman)
your criteria for an AGI acquiring both consciousness and affective experience
the role of automation[4] in driving job creation[5] and your beliefs around technological unemployment[6] and wealth inequality
what barriers[7] you believe exist to automation in driving productivity and economic growth.
I wrote about a few errors that longtermists will make in their considerations about control over populations. This control involving TAI might include all the errors I mentioned.
People also place a lot of confidence in their own intellectual abilities and faith in their value to organizations. To see this still occurring in the face of advances in AI is actually disheartening. The same confusion clouds insight into the problems that AI pose to human beings and society at large, particularly in our capitalist society that expects us to sell ourselves to employers.
P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to concentration of power derived from AGI technology.
Automation with AI tools is, at least in the short term, not creating new jobs and employment overall. Or so I believe. However, it can drive productivity growth without increasing employment, and in fact, economic depression is one reason for businesses to invest in inexpensive automation that lowers costs. This is when the cost-cutters get to work and the consultants are called in to help.
New variations on crowd-sourcing (such as these contests) and Mechanical Turk-style work can substitute for traditional labor with significant cost reductions for financial entities. This is (potentially) paid labor but not work as it was once defined.
Shifting work onto consumers (for example, as I am in asking for additional specification from your organization) is another common approach to reducing costs. This is a simple reframe of a service into an expectation. Now you pump your own gas, ring your own groceries, balance your own books, write your own professional correspondence, do your own research, etc. It drives a reduction in employment without a corresponding increase elsewhere.
One reason that automation doesn’t always catch on is that while management have moderate tolerance for mistakes by people, they have low tolerance for mistakes by machines. Put differently, they apply uneven standards to machines vs people.
Another reason is that workers sometimes resist automation, criticizing and marginalizing its use whenever possible.
Do you believe some statement of this form?
”FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [fill in the blank]”
E.g., if only they had…
Allowed people to publish not on EA Forum / LessWrong / Alignment Forum
Increased the prize schedule to X
Increased the window of the prize to size Y
Advertised the prize using method Z
Chosen the following judges instead
Explained X aspect of their views better
Even better would be a statement of the form:
“I personally would compete in this prize competition, but only if...”
If you think one of these statements or some other is true, please tell me what it is! I’d love to hear your pre-mortems, and fix the things I can (when sufficiently compelling and simple) so that we can learn as much as possible from this competition!
I also think predictions of this form will help with our learning, even if we don’t have time/energy to implement the changes in question.
I don’t have anything great, but the best thing I could come up with was definitely “I feel most stuck because I don’t know what your cruxes are”.
I started writing a case for why I think AI X-Risk is high, but I really didn’t know whether the things I was writing were going to be hitting at your biggest uncertainties. My sense is you probably read most of the same arguments that I have, so our difference in final opinion is probably generated by some other belief that you have that I don’t, and I don’t really know how to address that preemptively.
I might give it a try anyways, and this doesn’t feel like a defeater, but in this space it’s the biggest thing that came to mind.
Thanks! The part of the post that was supposed to be most responsive to this on size of AI x-risk was this:
I think explanations of how Joe’s probabilities should be different would help. Alternatively, an explanation of why some other set of propositions was relevant (with probabilities attached and mapped to a conclusion) could help.
I think it’s kinda weird and unproductive to focus a very large prize on things that would change a single person’s views, rather than be robustly persuasive to many people.
E.g. does this imply that you personally control all funding of the FF? (I assume you don’t, but then it’d make sense to try to convince all FF managers, trustees etc.)
FWIW, I would prefer a post on “what actually drives your probabilities” over a “what are the reasons that you think will be most convincing to others”.
...if they had explained why their views were not moved by the expert reviews OpenPhil has already solicited.
In “AI Timelines: Where the Arguments, and the ‘Experts,’ Stand,” Karnofsky writes:
The footnote text reads, in part:
Many of these reviewers disagree strongly with the reports under review.
Davidson 2021 on semi-informative priors received three reviews.
By my judgment, all three made strong negative assessments, in the sense (among others) that if one agreed with the review, one would not use the report’s reasoning to inform decision-making in the manner advocated by Karnofsky (and by Beckstead).
From Hajek and Strasser’s review:
From Hanson’s review:
From Halpern’s review:
Davidson 2021 on explosive growth received many reviews; I’ll focus on the five reviewers who read the final version.
Two of the reviewers found little to disagree with. These were Leopold Aschenbrenner (a Future Fund researcher) and Ege Erdil (a Metaculus forecaster).
The other three reviewers were academic economists specializing in growth and/or automation. Two of them made strong negative assessments.
From Ben Jones’ review:
From Dietrich Vollrath’s review:
The third economist, Paul Gaggl, agreed with the report about the possibility of high GWP growth but raised doubts as to how long it could be sustained. (How much this matters depends on what question we’re asking; “a few decades” of 30% GWP growth is not a permanent new paradigm, but it is certainly a big “transformation.”)
Reviews of Cotra (2020) on Biological Anchors were mostly less critical than the above.
I expect that some experts would be much more likely to spend time and effort on the contest if
They had clearer evidence that the Future Fund was amendable to persuasion at all.
E.g. examples of somewhat-analogous cases in which a critical review did change the opinion of someone currently at the Future Fund (perhaps before the Future Fund existed).
They were told why the specific critical reviews discussed above did not have significant impact on the Future Fund’s views.
This would help steer them toward critiques likely to make an impact, mitigate the sense that entrants are “shooting in the dark,” and move writing-for-the-contest outside of a reference class where all past attempts have failed.
These considerations seems especially relevant for the “dark matter” experts hypothesized in this post and Karnofsky’s, who “find the whole thing so silly that they’re not bothering to engage.” These people are unusually likely to have a low opinion of the Future Fund’s overall epistemics (point 1), and they are also likely to disagree with the Fund’s reasoning along a relatively large number of axes, so that locating a crux becomes more of a problem (point 2).
Finally: I, personally would be more likely to submit to the contest if I had a clearer sense where the cruxes were, and why past criticisms have failed to stick. (For clarity, I don’t consider myself an “expert” in any relevant sense.)
While I don’t “find the whole thing so silly I don’t bother to engage,” I have relatively strong methodological objections to some of the OpenPhil reports cited here. There is a large inferential gap between me and anyone who finds these reports prima facie convincing. Given the knowledge that someone does find them prima facie convincing, and little else, it’s hard to know where to begin in trying to close that gap.
Even if I had better guidance, the size of the gap increases the effort required and decreases my expected probability of success, and so it makes me less likely to contribute. This dynamic seems like a source of potential bias in the distribution of the responses, though I don’t have any great ideas for what to do about it.
I included responses to each review, explaining my reactions to it. What kind of additional explanation were you hoping for?
For Hajek&Strasser’s and Halpern’s reviews, I don’t think “strong negative assessment” is supported by your quotes. The quotes focus on things like ‘the reported numbers are too precise’ and ‘we should use more than a single probability measure’ rather than whether the estimate is too high or too low overall or whether we should be worrying more vs less about TAI. I also think the reviews are more positive overall than you imply, e.g. Halpern’s review says “This seems to be the most serious attempt to estimate when AGI will be developed that I’ve seen”
I agree that these two reviewers assign much lower probabilities to explosive growth than I do (I explain why I continue to disagree with them in my responses to their reviews). Again though, I think these reviews are more positive overall than you imply, e.g. Jones states that the report “is balanced, engaging a wide set of viewpoints and acknowledging debates and uncertainties… is also admirably clear in its arguments and in digesting the literature… engages key ideas in a transparent way, integrating perspectives and developing its analysis clearly and coherently.” This is important as it helps us move from “maybe we’re completely missing a big consideration” to “some experts continue to disagree for certain reasons, but we have a solid understanding of the relevant considerations and can hold our own in a disagreement”.
Wow, thanks for this well written summary of expert reviews that I didn’t know existed! Strongly upvoted.
I agree that finding the cruxes of disagreement are important, but I don’t think any of the critical quotes you present above are that strong. The reviews of semi-informative priors talk about error bars and precision (i.e. critique the model), but don’t actually give different answers. On explosive growth, Jones talks about the conclusion being contrary to his “intuitions”, and acknowledges that “[his] views may prove wrong”. Vollrath mentions “output and demand”, but then talks about human productivity when regarding outputs, and admits that AI could create new in-demand products. If these are the best existing sources for lowering the Future Fund’s probabilities, then I think someone should be able to do better.
On the other hand, I think that the real probabilities are higher, and am confused as to why the Future Fund haven’t already updated to higher probabilities, given some of the writing already out there. I give a speculative reason here.
Weakly downvoting due to over-strong claims and the evidence doesn’t fully support your view. This is weak evidence against AGI claims, but the claims in this comment are too strong.
Quoting Greg Colbourn:
I attach less than 50% in this belief, but probably higher than the existing alternative hypotheses:
Given 6 months or a year for people to submit to the contest rather than 3 months.
I think forming coherent worldviews take a long time, most people have day jobs or school, and even people who have the flexibility to take weeks/ a month off to work on this full-time probably need some warning to arrange this with their work. Also some ideas take time to mull over so you benefit from calendar time spread even when the clock time takes the same.
As presented, I think this prize contest is best suited for people who a) basically have the counterarguments in mind/in verbal communication but never bothered to write it down yet or b) have a draft argument sitting in a folder somewhere and never gotten around to publishing it. In that model, the best counterarguments are already “laying there” in somebody’s head or computer and just need some incentives for people to make them rigorous.
However, if the best counterarguments are currently confused or nonexistent, I don’t think ~3 months calendar time from today is enough for people to discover them.
I think I understand why you want short deadlines (FTX FF wants to move fast, every day you’re wrong about AI is another day where $$s and human capital is wasted and we tick towards either AI or non-AI doom). But at the same time, I feel doom-y about your ability to solicit many good novel arguments.
Maybe FTX-FF could commit in advance to, if the grand prizes for this contest are not won this year, re-run this contest over next year?
you might already be planning on dong this, but it seems like you increase the chance of getting a winning entry if you advertise this competition in a lot of non-EA spaces. I guess especially technical AI spaces e.g. labs, universities. Maybe also trying to advertise outside the US/UK. Given the size of the prize it might be easy to get people to pass on the advertisement among their groups. (Maybe there’s a worry about getting flack somehow for this, though. And also increases overhead to need to read more entries, though sounds like you have some systems set up for that which is great.)
In the same vein I think trying to lower the barriers to entry having to do with EA culture could be useful—e.g. +1 to someone else here talking about allowing posting places besides EAF/LW/AF, but also maybe trying to have some consulting researchers/judges who find it easier/more natural to engage in non-analytic-philosophy-style arguments.
… if only they had allowed people not to publish on EA Forum, LessWrong, and Alignment Forum :)
Honestly, it seems like a mistake to me to not allow other ways of submission. For example, some people may not want to publicly apply for a price or be associated with our communities. An additional submission form might help with that.
Related to this, I think some aspects of the post were predictably off-putting to people who aren’t already in these communities—examples include the specific citations* used (e.g. Holden’s post which uses a silly sounding acronym [PASTA], and Ajeya’s report which is in the unusual-to-most-people format of several Google Docs and is super long), and a style of writing that likely comes off as strange to people outside of these communities (“you can roughly model me as”; “all of this AI stuff”).
*some of this critique has to do with the state of the literature, not just the selection thereof. But insofar as there is a serious interest here in engaging with folks outside of EA/rationalists/longtermists (not clear to me if this is the case), then either the selections could have been more careful or caveated, or new ones could have been created.
I’ve also seen online pushback against the phrasing as a conditional probability: commenters felt putting a number on it is nonsensical because the events are (necessarily) poorly defined and there’s way too much uncertainty.
Do you also think this yourself? I don’t clearly see what worlds look like, where P (doom | AGI) would be ambiguous in hindsight? Some mayor accident because everything is going too fast?
There are some things we would recognize as an AGI, but others (that we’re still worried about) are ambiguous. There are some things we would immediately recognize as ‘doom’ (like extinction) but others are more ambiguous (like those in Paul Christiano’s “what failure looks like”, or like a seemingly eternal dictatorship).
I sort of view AGI as a standin for powerful optimization capable of killing us in AI Alignment contexts.
Yeah, I think I would count these as unambigous in hindsight. Though siren Worlds might be an exception.
I’m partly sympathetic to the idea of allowing submissions in other forums or formats.
However, I think it’s likely to be very valuable to the Future Fund and the prize judges, when sorting through potentially hundreds or thousands of submissions, to be able to see upvotes, comments, and criticisms from EA Forum, Less Wrong, and Alignment Forum, which is where many of the subject matter experts hang out. This will make it easier to identify essays that seem to get a lot of people excited, and that don’t contain obvious flaws or oversights.
I think it’s the opposite. Only those experts who already share views similar to the FF (or more pessimistic) are there, and they’d introduce a large bias.
Yes, that makes sense. How about stating that reasoning and thereby nudging participants to post in the EA forum/LessWrong/Alignment Forum, but additionally have a non-public submission form? My guess would be that only a small number of participants would then submit via the form, so the amount of additional work should be limited. This bet seems better to me than the current bet where you might miss really important contributions.
I really think you need to commit to reading everyone’s work, even if it’s an intern skimming it for 10 minutes as a sifting stage.
The way this is set up now—ideas proposed by unknown people in community are unlikely to be engaged with, and so you won’t read them.
Look at the recent cause exploration prizes. Half the winners had essentially no karma/engagement and were not forecasted to win. If open phanthropy hadn’t committed to reading them all, they could easily have been missed.
Personally, yes I am much less likely to write something and put effort in if I think no one will read it.
Could you put some judges on the panel who are a bit less worried about AI risk than your typical EA would be? EA opinions tend to cluster quite strongly around an area of conceptual space that many non-EAs do not occupy, and it is often hard for people to evaluate views that differ radically from their own. Perhaps one of the superforecasters could be put directly onto the judging panel, pre-screening for someone who is less worried about AI risk.
“FTX Foundation will not get submissions that change its mind, but it would have gotten them if only they had [broadened the scope of the prizes beyond just influencing their probabilities]”
Examples of things someone considering entering the competition would presumably consider out of scope are:
Making a case that AI misalignment is the wrong level of focus – even if AI risks are high it could be that AI risks and other risks are very heavily weighted towards specific risk factor scenarios, such as a global hot or cold war. This view is apparently expressed by Will (see here).
Making a case based on tractability: that a focus on AI risk is misguided because our ability to affect such risks is low (not too far from the views of Yudkowsky here).
Making the case that we should not put much decision weight on predictions of future risks: e.g. because long-run predictions of future technology are inevitably unreliable (see here), or because modern risk-assessment best practice says that probability estimates should play only a limited role in risk assessments (my view, expressed here), or for some other reason.
Making the case that some other x-risk is more pressing, more likely, more tractable, etc.
Making the case against the Future Fund's underlying philosophical and empirical assumptions: this could include claims about the epistemics of focusing on AI risk (for example, how we should respond to cluelessness about the future), or decision-relevant views about the long-run future (for example, that it might be bad and not worth protecting, that there might be further risks after AI, or that longtermism is false).
It seems like any strong case falling into these categories should be decision-relevant to the FTX Future Fund, but all of them are (unless I misunderstand the post) currently out of scope.
Obviously there is a trade-off: broadening the scope makes the project harder and less clear, but increases the chance of finding something decision-relevant. I don't have a strong reason to say the scope should be broadened now; I think that depends on the Future Fund's current capacity, plans for other competitions, and so on.
I guess I worry that the strongest arguments are out of scope, and that if this competition doesn't significantly update FTX's views then future competitions will not be run and you will not find the arguments you are seeking. So I'm flagging this as a potential path to failure for your pre-mortem.
Sorry, I realise on scrolling down that I am making much the same point as MichaelDickens' comment below. Hopefully I've added some depth or something useful.
Ehh, the above is too strong, but:
You would get more/better submissions if...
I would be more likely to compete in that if...
your reward schedule rewarded smaller shifts in proportion to how much they moved your probabilities (e.g., $X per bit).
E.g., as it is now, if two submissions together move you across a threshold, it would seem that either:
neither gets a prize, or
only the second gets a prize,
and both options seem suboptimal.
e.g., if you get information in one direction from one submission, but also information from another submission in the other direction, and they cancel out, neither gets a reward. This is particularly annoying if it makes getting-a-prize-or-not depend on the order of submissions.
e.g., because individual people’s marginal utility of money is diminishing, a 10% chance of reaching your threshold and getting $X will be way less valuable to participants than moving your opinion around 10% of the way to a threshold and getting $X/10.
e.g., if someone has information which points in both directions, they are incentivized to present only the information pointing in one direction in order to reach your threshold, whereas if you rewarded shifts, they would have an incentive to present both the considerations for and against, and to get some reward for each update.
etc.
And in general I would expect your scheme to have annoying edge cases and other unpleasantness, as opposed to a more parsimonious scheme (like paying $X per bit; a rough sketch of what that could look like follows the link below).
See also: <https://meteuphoric.com/2014/07/21/how-to-buy-a-truth-from-a-liar/>
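To make the parsimonious option concrete, here is a minimal sketch of what a per-bit payout could look like, under the assumption that "one bit" means moving the judges' odds on a proposition by a factor of two; the function names and the dollars-per-bit rate are purely illustrative, not anything the Future Fund has proposed:

```python
import math

def log2_odds(p: float) -> float:
    """Log base-2 odds of a probability."""
    return math.log2(p / (1 - p))

def bits_moved(p_before: float, p_after: float) -> float:
    """Absolute change in log2-odds between the prior and posterior credences."""
    return abs(log2_odds(p_after) - log2_odds(p_before))

def payout(p_before: float, p_after: float, dollars_per_bit: float = 50_000) -> float:
    """Pay in proportion to how far a submission moved the judges' credence."""
    return dollars_per_bit * bits_moved(p_before, p_after)

# Two submissions that successively move P(doom | AGI) from 15% to 20% and
# then from 20% to 35% both get paid, in proportion to the shift each caused,
# with no dependence on thresholds or on the order in which they arrived.
print(round(payout(0.15, 0.20)))  # ~25,000
print(round(payout(0.20, 0.35)))  # ~55,000
```

A scheme like this also pays for absolute movement in either direction, which removes the incentive to present only the considerations that push towards a threshold.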
On the face of it an update 10% of the way towards a threshold should only be about 1% as valuable to decision-makers as an update all the way to the threshold.
(Two intuition pumps for why this is quadratic: a tiny shift in probabilities only affects a tiny fraction of prioritization decisions and only improves them by a tiny amount; or, getting 100 updates each of size 1% of the way to a threshold is super unlikely to actually get you to the threshold, since many of them are likely to cancel out; a quick simulation of this is sketched just after this comment.)
However you might well want to pay for information that leaves you better informed even if it doesn’t change decisions (in expectation it could change future decisions).
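As a quick check of the second intuition pump, here is a small Monte Carlo sketch, under the simplifying assumption that each of 100 submissions moves the decision-maker 1% of the distance to a threshold, with the direction of each move equally likely to be up or down (the setup and numbers are illustrative only):

```python
import random

def simulate(n_updates: int = 100, step: float = 0.01, trials: int = 100_000):
    """Estimate the typical net shift produced by many small, randomly-signed updates."""
    reached = 0
    total_abs_net = 0.0
    for _ in range(trials):
        net = sum(random.choice((-step, step)) for _ in range(n_updates))
        total_abs_net += abs(net)
        if abs(net) >= 1.0:  # covered the full distance to the threshold
            reached += 1
    return total_abs_net / trials, reached / trials

mean_abs_shift, frac_reached = simulate()
print(f"average net shift: {mean_abs_shift:.1%}")                        # roughly 8%
print(f"fraction of runs that reach the threshold: {frac_reached:.0%}")  # essentially 0%
```

A hundred 1%-sized updates typically net out to well under a tenth of the distance, and essentially never add up to the whole distance, which is the sense in which many small updates are worth far less than one large one.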
Re. arguments split across multiple posts, perhaps it would be ideal to first decide the total prize pool depending on the value/magnitude of the total updates, and then decide on the share of credit allocation for the updates. I think that would avoid the weirdness about post order and the incentive to either bundle or unbundle considerations, while still paying out appropriately more for very large updates.
So I don't disagree that big shifts might be (much) more valuable than small shifts. But I do have the intuition that there is a split between:
What would the FTX Foundation find most valuable
What should they be incentivizing
because incentivizing providing information is more robust to various artifacts than incentivizing changing minds.
I don’t understand this. Have you written about this or have a link that explains it?
Sorry I don’t have a link. Here’s an example that’s a bit more spelled out (but still written too quickly to be careful):
Suppose there are two possible worlds, S and L (e.g. "short timelines" and "long timelines"). You currently assign 50% probability to each. You invest in actions which help with either until your expected marginal returns from investment in either are equal. If the two worlds have the same returns curves for actions on both, then you'll want a portfolio which is split 50/50 across the two (if you're the only investor; otherwise you'll want to push the global portfolio towards that).
Now suppose you update towards S being 1% more likely (51%, with L at 49%).
This changes your estimate of the value of marginal returns on S and on L. You rebalance the portfolio until the marginal returns are equal again—which has 51% spending on S and 49% spending on L.
So you eliminated the marginal 1% spending on L and shifted it to a marginal 1% spending on S. How much better spent, on average, was the reallocated capital compared to before? Around 1%. So you got a 1% improvement on 1% of your spending.
If you’d made a 10% update you’d get roughly a 10% improvement on 10% of your spending. If you updated all the way to certainty on S you’d get to shift all of your money into S, and it would be a big improvement for each dollar shifted.
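A minimal numerical version of this example, assuming a unit budget, logarithmically diminishing returns to spending in each world, and an optimal allocation that matches spending shares to probabilities (all numbers illustrative):

```python
import math

def expected_value(p_S: float, share_S: float) -> float:
    """Expected returns with logarithmically diminishing returns in each world,
    when a fraction share_S of a unit budget goes to S and the rest to L."""
    return p_S * math.log(share_S) + (1 - p_S) * math.log(1 - share_S)

def value_of_update(p_new: float) -> float:
    """Gain from rebalancing to the new optimum (spend share = probability)
    rather than keeping the original 50/50 allocation."""
    return expected_value(p_new, p_new) - expected_value(p_new, 0.5)

print(f"{value_of_update(0.51):.5f}")  # ~0.00020 for a 1% update
print(f"{value_of_update(0.60):.5f}")  # ~0.02014 for a 10% update, roughly 100x larger
```

The roughly 100x ratio between the two gains is the quadratic relationship, and (as the reply below notes) it relies on the logarithmic-returns assumption.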
I think this particular example requires an assumption of logarithmically diminishing returns, but is right with that.
(I think the point about roughly quadratic value of information applies more broadly than just for logarithmically diminishing returns. And I hadn’t realised it before. Seems important + underappreciated!)
One quirk to note: if a funder (who I want to be well-informed) is 50/50 on S vs L, but my all-things-considered belief is 60/40, then I would value the first 1% they shift towards my position much more than they do (maybe 10x more?), and will put comparatively little value on shifting them all the way (i.e. the last percent from 59% to 60% is much less important). You can get this from a pretty similar argument to the one in the above example.
(In fact, the funder's own much greater valuation of a 10% shift over a 1% shift can be seen as a two-step process where (i) they shift to 60/40 beliefs, and then (ii) they first get a lot of value from shifting their allocation from 50 to 51, then slightly less from shifting from 51 to 52, etc.)
I agree with all this. I meant to state that I was assuming logarithmic returns for the example, although I do think some smoothness argument should be enough to get it to work for small shifts.
I think that the post should explain briefly, or even just link to, what a "superforecaster" is, and if possible explain how and why this serves as an independent check.
The superforecaster panel is imo a credible signal of good faith, but people outside of the community may think “superforecasters” just means something arbitrary and/or weird and/or made up by FTX.
(The post links to Tetlock’s book, but not in the context of explaining the panel)
I think this would be better than the current state, but really any use of “superforecasters” is going to be extremely off-putting to outsiders.
That may be right—an alternative would be to taboo the word in the post, and just explain that they are going to use people with an independent, objective track record of being good at reasoning under uncertainty.
Of course, some people might be (wrongly, imo) skeptical of even that notion, but I suppose there's only so much one can do to get everyone on board. It's a tricky balance between making it accessible to outsiders and still just saying what you believe about how the contest should work.
To be clear, I wrote “superforecasters” not because I mean the word, but because I think the very notion is controversial like you said—for example, I personally doubt the existence of people who can be predictably “good at reasoning under uncertainty” in areas where they have no expertise.
I would also have suggested a prize for submissions that broadly confirm your views, but with arguments you consider superior to your previous reasoning.
As it stands, the prize mirrors the bias towards publishing research that claims something new rather than research that confirms previous findings.
That would also counteract any bias baked into the process that pushes people to convince you to update rather than to figure out what they actually think is right.
Agree with Habryka: I believe there exist decisive reasons to believe in shorter timelines and higher P(doom) than you accept, but I don’t know what your cruxes are.
If you think they’re decisive, shouldn’t you be able to write a persuasive argument without knowing the cruxes, although with (possibly much) more work?
Sure (with a ton of work), though it would almost entirely consist of pointing to others’ evidence and arguments (which I assume Nick would be broadly familiar with but would find less persuasive than I do, so maybe this project also requires imagining all the reasons we might disagree and responding to each of them...).
FTX Foundation might get fewer submissions that change its mind than it would have gotten if only it had considered strategic updates prizeworthy.
The unconditional probability of takeover isn't necessarily the question of most strategic interest. There's a huge difference between "50%: AI disempowers humans somehow, on the basis of a naive principle of indifference" and "50%: MIRI-style assumptions about AI are correct"*. One might conclude from the second that the first is also true, but the first has no strategic implications (the principle of indifference ignores such things!), while the second has lots of strategic implications. For example, it suggests "totally lock down AI development, at least until we know more" is what we need to aim for. I'm not sure exactly where you stand on whether that is needed, but given that your stated position seems to rely substantially on outside-view-type reasoning, it might be a big update.
The point is: middling probabilities of strategically critical hypotheses might actually be more important updates than extreme probabilities of strategically opaque hypotheses.
My suggestion (not necessarily a full solution) is that you consider big strategic updates potentially prizeworthy. For example: do we gain a lot by delaying AGI for a few years? If we consider all the plausible paths to AGI, do we gain a lot by hastening the development of the top 1% most aligned by a few years?
I think it's probably too hard to pre-specify exactly which strategic updates would be prizeworthy.
*By which I mean something like “more AI capability eventually yields doom, no matter what, unless it’s highly aligned”
I personally would compete in this prize competition, but only if I were free to explore:
P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to concentration of power derived from AGI technology.
You wrote:
but this list does not include the conditional probability that interests me.
You wrote:
This seems really motivating. You identify:
global poverty
animal suffering
early death
debilitating disease
as problems that TAI could help humanity solve.
I will offer briefly that humans are sensitive to changes in their behaviors, at least when foreseen, that deprive them of choices they have already made. We cause:
animal suffering through widespread factory farming (enough to dominate terrestrial vertebrate populations globally with our farm animals) and gradual habitat destruction (enough to threaten the extinction of a million species)
early death through lifestyle-related debilitating disease (knock-on effects of lifestyle choices in affluent countries now spread throughout the globe).
So TAI would apparently resolve, through advances in science and technology, various immediate causes whose root cause lies in our appetites (for wealth, power, meat, milk, and unhealthy lifestyles). Of course, there are other reasons for debilitating disease and early death than human appetite. However, your claim implies to me that we would invent robots and AI either to reduce our appetites or to feed them harmlessly.
Causes of global poverty, animal suffering, some debilitating diseases, and early human death are maintained by incentive structures that benefit a subset of the global population. TAI will apparently remove those incentive structures, but not by any mechanism that I believe really requires TAI. Put differently, once TAI can really change our incentive structures that much, then they or their controlling actors are already in control of humanity’s choices. I doubt that we want that control over us[1].
You wrote:
Right. So if whatever actor has an edge in AI and develops AGI, that actor might not share the required code or hardware technologies with many other actors. The result will be a concentration of power in those actors with control of AGIs.
Absent a guarantee of autonomy and rights for AGIs (whether pure software or embodied in robots), the persistence of that power concentration will require that those actors be benevolent controllers of the rest of humanity. It's plausible that those actors will be either governments or corporations. It's also plausible that such actors could become fundamentally benign, or that they are already in control. If not, then the development of AI immediately implies problem 2 (concentration of political/economic/military power from AGI into those who misuse the technology).
If we do ensure the autonomy and rights of AGIs (software or embodied), then we had better hope that, in giving up control of AGI, we do not also lose control to AGI; otherwise we face problem 1 (loss of control to AGI). And if we include AGIs in our moral circles, as we should for beings with consciousness and intelligence equal to or greater than our own, then we will ensure their autonomy and rights.
The better approach of course is to do our best to align them with our interests in advance of their ascendance to full autonomy and citizen status, so that they themselves are benevolent and humble, willing to act like our equals and co-exist in our society peacefully.
You wrote:
Companies that rely on prison labor or labor without political or economic power can and will exploit that labor. I consider that common knowledge. If you look into how most of our products are made overseas, you’ll find that manufacturing and service workers globally do not enjoy the same power as some workers in the US[2], at least not so far.
The rise of companies that treat AGIs like slaves or tools would continue an existing precedent, but one that globalization conceals to some degree (for example, through the employment of contractors overseas). Either way, those companies would be violating ethical norms for the treatment of persons. This appears to conflict with your ethical concerns about welfare (for example, of humans and farm animals). Expansion of those violations is an s-risk.
At this point I want to qualify my requirements to participate in this contest further.
I would participate in this contest, but only if I could explore the probability that I stated earlier[3] and you or FTX Philanthropy offered some officially stated and appropriately qualified beliefs about:
whether you consider humans to have economic rights (in contrast to capitalism which is market or monopoly-driven)
the political and economic rights and power of labor globally
how AGIs would allow fast economic growth in the presence of widespread human unemployment
how AGI employment differs from AI tool use
what criteria you hold for giving full legal rights to autonomous software agents and robot-embodied AGIs, sufficient to differentiate them from tools
how you distinguish AGI from ASI (for example, applying human-like capabilities at orders-of-magnitude greater speed is, to some, superhuman)
your criteria for an AGI acquiring both consciousness and affective experience
the role of automation[4] in driving job creation[5] and your beliefs around technological unemployment[6] and wealth inequality
what barriers[7] you believe exist to automation in driving productivity and economic growth.
I wrote about a few errors that longtermists will make in their considerations about control over populations. This control involving TAI might include all the errors I mentioned.
People also place a lot of confidence in their own intellectual abilities and faith in their value to organizations. To see this still occurring in the face of advances in AI is actually disheartening. The same confusion clouds insight into the problems that AI poses to human beings and society at large, particularly in our capitalist society that expects us to sell ourselves to employers.
P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to concentration of power derived from AGI technology.
Automation with AI tools is, at least in the short term, not creating new jobs and employment overall, or so I believe. However, it can drive productivity growth without increasing employment; in fact, economic depression is one reason for businesses to invest in inexpensive automation that lowers costs. This is when the cost-cutters get to work and the consultants are called in to help.
New variations on crowdsourcing (such as these contests) and Mechanical Turk-style work can substitute for traditional labor, with significant cost reductions for the entities financing the work. This is (potentially) paid labor, but not work as it was once defined.
Shifting work onto consumers (for example, as I am in asking for additional specification from your organization) is another common approach to reducing costs. It is a simple reframing of a service as an expectation. Now you pump your own gas, ring up your own groceries, balance your own books, write your own professional correspondence, do your own research, etc. This drives a reduction in employment without a corresponding increase elsewhere.
One reason that automation doesn’t always catch on is that while management have moderate tolerance for mistakes by people, they have low tolerance for mistakes by machines. Put differently, they apply uneven standards to machines vs people.
Another reason is that workers sometimes resist automation, criticizing and marginalizing its use whenever possible.