AI strategy & governance. Blog: Not Optional.
Zach Stein-Perlman
Thanks for your comments!
My first gut reaction is skepticism that [a “pretty good” scenario] is a likely or stable state.
I certainly agree that Earthly utopia won’t happen; I just wrote that to illustrate how prosaic values would be disastrous in some circumstances. But here are some similar things that I think are very possible:
Scenarios where some choices that are excellent by prosaic standards unintentionally make great futures unlikely or impossible.
Scenarios where the choices that would tend to promote great futures are very weird by prosaic standards and fail to achieve the level of consensus necessary for adoption.
In retrospect, I should have thought and written more about failure scenarios instead of just risk factors for those scenarios. I expect to revise this post, and failure scenarios would be an important addition. For now, here’s my baseline intuition for a “pretty good” future:
After an intelligence explosion, a state controls aligned superintelligence. Political elites:
are not familiar with ideas like long reflection and indirect normativity,
do not understand why such ideas are important,
are constrained from pursuing such goals (perhaps because opposed factions can veto such ideas), or
do not get to decide what to do with superintelligence because the state’s decisionmaking system is bound by prior decisions about how powerful AI should be used (either directly, by forbidding great uses of AI, or indirectly, by giving decisionmaking power to groups unlikely to choose a great future).
So the state initially uses AI in prosaic ways and, roughly speaking, thinks of AI in prosaic ways. I don’t have a great model of what happens to our cosmic endowment in this scenario, but since we’re at the point where unwise individuals/institutions are empowered, the following all feel possible:
We optimize for something prosaic
We lock in a choice that disallows intentionally optimizing for anything
We enter a stable state in which we do not choose to optimize for anything
I don’t have much to say about Hanson right now, but I’ll note that a future that involves status-seeking humans making decisions about cosmic-scale policy (for more than a transition period to locking in something great) is probably a failure; success looks more like optimizing the universe.
I suspect that a large fraction of people who seriously start thinking about the longterm future of humanity fall into the camp that you consider “people/institutions that currently want great outcomes”
Historically, sure. But I think that’s due to selection: the people who think about the longterm future are mostly rationalist/EA-aligned. I would be very surprised if a similar fraction of a more representative group had the wisdom/humility/whatever to want a great future, much less the background to understand why we even have a “pretty good” future problem.
If this is true, one might suspect that this will become a much stronger faction
I suspect that humans and institutions will converge considerably towards the “making most of our endowment” stance.
This would surprise me. I expect poor discourse (in the US, at least) about how to use powerful AI. In particular, I expect:
The discourse will focus on prosaic issues like privacy and the future of work.
People will assume that the universe-scale future looks like “humans flying around in spaceships” and debate what those humans should do, rather than assume it looks like “superintelligent von Neumann probes optimizing for something” and debate what the probes should optimize for (much less recognize that we shouldn’t be thinking about what they should optimize for at all; we should delegate that decision to a better system than current human judgment).
(Also, your comment implies that aligned superintelligence will try to optimize for all humans’ preferences. I would be surprised if this occurs; I expect aligned superintelligence to try to do what its controller says.)
I would be very excited to discuss this further on a call. Please PM me if you’re interested.
It seems really useful to me to understand better how likely states will end up calling the shots.
Yes, absolutely. I think this largely depends on the extent to which political elites appreciate AI’s importance; I expect that political elites will come to appreciate AI and take action within a few years, years before an intelligence explosion. I want to read/think/talk about this.
While big tech companies will probably come up with more strategies, I’m skeptical of their ability to avoid being nationalized or closely supervised by states. In response to your specific suggestions:
I think states are broadly able to seize property in their territory. To secure autonomy, I think a corporation would have to get the government to legally bind itself, and I can’t imagine the US or China doing this. Perhaps a US corporation could make a deal with another government and move its relevant hardware to that state before the US appreciates AI or before the US has time to respond? That would be quite radical, and given the major national security implications of AI, even such a move might not guarantee autonomy. But if there were political will and a public mandate for nationalization, I think corporations would probably have to move somehow to maintain autonomy.
I don’t understand. But if the US and China appreciate AI’s national security implications, they won’t be distracted.
I don’t understand “assembling . . . ability,” but corporations intentionally making AI feel nonthreatening is interesting; I hadn’t thought about this. It might be a factor, but there’s only so much that making systems feel nonthreatening can do. If political elites appreciate AI, then it won’t matter whether currently-deployed AI systems feel nonthreatening: there will be oversight. It’s also very possible that the US will have a Sputnik moment for AI, after which there would be strong pressure for a national AI project regardless of the state of private AI in the US at the time.
It’s certainly possible, and I think such analysis is valuable. It’s just not my comparative advantage, and (I think) it’s not especially neglected. Also, I think we don’t lose much analytically by separating foreseeable causes of great power conflict into two distinct categories:
1. Conflict due to specific factors that we recognize as important today (e.g., US-China tension and India-Pakistan tension and their underlying causes)
2. Conflict due to more general forces and phenomena (and given my empirical beliefs, I think emerging-technology-related forces are relatively likely to cause conflict)
This post aims to start a conversation on (2), or to get people to direct me to previous work on (2).
Also, to explain my focus: I would be surprised by major conflict for normal reasons by 2040, but not by major conflict arising because the world is going crazy. I didn’t justify this, and I should have mentioned in my post that I was excluding major conflict for normal reasons; thanks for your comment.
Thanks for your comment. US-China tension currently seems most likely to me to cause great power conflict, and cyber capabilities were mostly what I had in mind for “offense outpaces defense” scenarios. I think this post is more valuable if it’s more general, though, and I don’t know enough about US-China, cyber capabilities, or warfare to say much more specifically.
I think understanding possible futures of cyber capabilities would be quite valuable. I would not be surprised to look back in 2030 or 2040 and say:
Civilization was just devastated by cyberattacks. In retrospect, it should have been obvious; or rather, the rest of us should have listened to those who were sounding the alarm. Since the 2000s, it’s been clear that offense is easy and defense is hard. Since the 2010s, great powers have had the capability to devastate one another’s cities with cyberattacks. In the last few years, offensive capabilities strengthened and proliferated. Then dozens of agents had the capability to cause countless explosions, destroy infrastructure, and take down electric grids almost everywhere, and it was only a matter of time until some unilateral or multilateral pressure led one of them to do it.
But again, such work is not my comparative advantage (and, as a disclaimer for the above paragraph, I don’t know what I’m talking about).
Thank you!
Going forward, we no longer plan to run the “standard” Forum Prize process.
Some things I liked about the old Forum Prize:
Highlighting great content
Recognizing great work
It doesn’t need to have money attached, but I hope a regular way to emphasize good work of all sorts (beyond karma, and more selective than inclusion in the forum newsletter) reappears, for the sake of both authors and readers. As a new writer (hypothetically; I haven’t yet written any polished posts), I would find winning very valuable, and while winning probably wouldn’t really affect my writing much, the existence of the forum prize would make me more excited about writing polished posts. As a reader, I have found that a regular prize helps me find excellent content outside of the stuff I normally read.
“What needs to happen in order for the field of x-risk-motivated AI alignment research to employ a thousand ML researchers and engineers”?
When choosing between projects, we’ll be thinking about questions like “to what extent is this class of techniques fundamentally limited? Is this class of techniques likely to be a useful tool to have in our toolkit when we’re trying to align highly capable systems, or is it a dead end?”
We’re trying to take a language model that has been fine-tuned on completing fiction, and then modify it so that it never continues a snippet in a way that involves describing someone getting injured. (source)
Suppose you successfully modify GPT models as desired, at moderate cost in compute and human classification. How might your process generalize?
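For concreteness, here is a minimal sketch of one shape such a process could take, namely rejection sampling against a learned classifier; the `is_injurious` stub and the choice of base model are my assumptions for illustration, not a description of your actual setup:

```python
# A sketch of classifier-filtered generation (illustration only).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in for the fine-tuned model

def is_injurious(text: str) -> bool:
    """Hypothetical classifier trained on human labels of 'describes injury'."""
    raise NotImplementedError

def safe_continue(snippet: str, tries: int = 10) -> str | None:
    # Sample continuations until one passes the classifier.
    for _ in range(tries):
        out = generator(snippet, max_new_tokens=50)[0]["generated_text"]
        if not is_injurious(out):
            return out
    return None  # refuse rather than risk an injurious continuation
```

My generalization question is then about how the costs of this kind of loop (extra samples, classifier training, residual failures) change as “describes injury” is swapped for other properties and as the models get more capable.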
I strongly encourage you (and everyone) not to call “practical SFE” just “SFE.” It’s much better (analytically) to distinguish the value of causing happiness and preventing suffering from empirical considerations. Under your definition, if (say) utilitarianism is true, then SFE is true given certain empirical circumstances but not others, which is an undesirable property for a definition. Anything called SFE should contain a suffering-focused ranking of possible worlds (for an SF theory of the good) or ranking of possible actions (for an SF theory of the right), not merely a contingent decision procedure. Otherwise the fact that someone accepts SFE is nearly meaningless; it does not imply that they would be willing to sacrifice happiness to prevent suffering, that they should be particularly concerned with s-risks, etc.
Practical SFE views . . . are compatible with a vast range of ethical theories. To adopt a practical SFE view, one just needs to believe that suffering has a particularly high practical priority.
This makes SFE describe the options available to us, rather than how to choose between those options. That is not what an ethical theory does. We could come up with a different term to describe the practical importance of preventing suffering at the margin, but I don’t think it would be very useful: given an ethical theory, we should compare different specific possibilities rather than saying “preventing suffering tends to be higher-leverage now, so let’s just focus on that.” That is, “practical SFE” (roughly defined as the thesis that the best currently-available actions in our universe generally decrease expected suffering much more than they increase expected happiness) has quite weak implications: it does not imply that the best thing we can do involves preventing suffering; to get that implication, we would need to have the truth of “practical SFE” be a feature of each agent (and the options available to them) rather than the universe.
Edit: there are multiple suffering-related ethical questions we could ask. One is “what ought we—humans in 2021 in our particular circumstances—to do?” Another is “what is good, and what is right?” The second question is more general (we can plug empirical facts into an answer to the second to get an answer to the first), more important, and more interesting, so I want an ethical theory to answer it.
We still plan to use our budget to incentivize and reward strong writing
And elsewhere, you’ve said the goal is to “promote the creation of excellent Forum content.” I think it’s a mistake to see the goal as causing more good content at the expense of identifying and helping spread good content. Good writing is only as effective as the attention it gets. LessWrong has curation and an annual review for this purpose; the EA Forum just has the newsletter, which isn’t sufficiently selective to serve the same role. Regardless of whether there’s money attached, I wish the EA Forum did more to identify great content.
If you have ideas along these lines, add a comment here.
I don’t love the idea of themed prizes like the creative writing contest. But if you’re going to do them, I’d like to see one for taxonomies/trees. I think high-quality taxonomies, like this, can have great analytic value. And taxonomies are rare and not widely appreciated, so a contest (making the format visible and highlighting good examples) would leave people aware of the format and likely to use it in the future when it would be useful, in addition to creating content that they otherwise wouldn’t have thought to write.
Edit: importantly, taxonomies/trees have a very high idea-to-length ratio, since the taxonomy/tree itself is essentially just a list of relations. I would happily read every taxonomy/tree produced in a contest, and likely learn something or gain a new perspective from each.
Yes! I’m glad the OP was written, and I agree with many of its points. But if I hadn’t taken extra classes, I wouldn’t have taken CS, which I now know (because I took those extra classes) is something I’m interested in, and something I might develop enough knowledge in to be useful (I’m still an undergraduate), from the point of view of the universe.
How valuable do you think your research to date has been? Which few pieces of your research to date have been highest-impact? What has surprised you or been noteworthy about the impact of your research?
Why do you have the distribution of focus on health/development vs animals vs longtermism vs meta-stuff that you do? How do you feel about it? What might make you change this distribution, or add or remove priority areas?
Thanks for your reply. I think (1) and (2) are doing a ton of work — they largely determine whether expected marginal research is astronomically important or not. So I’ll ask a more pointed follow-up:
Why does RP think it has reason to spend significant resources on both shorttermist and longtermist issues (or is this misleading; e.g., do all of your unrestricted funds go to just one)? What are your “opinions on high level cause prioritization” and the “disagreement inside RP about this topic”? What would make RP focus more exclusively on either short-term or long-term issues?
We have to be a little careful when the probability of existential catastrophe is not near zero, and especially when it is near one. For example, if the probability that unaligned AI will kill us (assuming no other existential catastrophe occurs first) is 90%, then preventing an independent 0.1% probability of climate catastrophe isn’t worth 10 basis points; it’s worth 1, since eliminating the climate risk only matters in the 10% of worlds where AI doesn’t kill us. The simulation argument may raise a similar issue.
I think we can get around this issue by reframing as “change in survival probability from baseline.” Then reducing climate risk from 0.1% to 0 is about a 0.1% improvement, and reducing AI risk from 90% to 89% is about a 10% improvement, no matter what other risks there are or how likely we are to survive overall (assuming the risks are independent).
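To make the arithmetic explicit, here’s a quick sketch (the numbers are the ones from the example above, and independence of the two risks is assumed throughout):

```python
# The example's numbers, assuming the two risks are independent.
p_ai = 0.90        # P(unaligned AI kills us)
p_climate = 0.001  # independent P(climate catastrophe)

baseline = (1 - p_ai) * (1 - p_climate)  # P(survival) = 0.0999

# Eliminating climate risk: naively "10 basis points", actually 1.
no_climate = (1 - p_ai) * 1.0
print(f"{(no_climate - baseline) * 1e4:.1f} bp; {no_climate / baseline - 1:.1%} relative")
# -> 1.0 bp; 0.1% relative

# Reducing AI risk from 90% to 89%: ~100 bp absolute, ~10% relative.
less_ai = (1 - 0.89) * (1 - p_climate)
print(f"{(less_ai - baseline) * 1e4:.1f} bp; {less_ai / baseline - 1:.1%} relative")
# -> 99.9 bp; 10.0% relative
```

The absolute basis-point value of an intervention scales with how likely we are to survive everything else; the relative improvement doesn’t.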
Suppose we believe there is 90% probability that we are living in a simulation, and the simulation will be shut down at some point (regardless of what we do), and this is independent of all other X-risks. Then a basis point suddenly costs ten times what it did before we considered simulation, since nine out of ten basis points go toward preventing other X-risks in worlds in which we’ll be shut down anyway. This is correct in some sense but potentially misleading in some ways, and I think we don’t need to worry about it at all if we reframe as “change in survival probability,” since any intervention produces the same proportional increase in survival probability in the 90%-simulation world as in the 0%-simulation world.
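A quick sketch of that invariance, reusing the survival probabilities from the climate example above (the 90% simulation credence is the only new input):

```python
# Same intervention evaluated with and without the doomed-simulation factor.
baseline, improved = 0.0999, 0.1000  # survival probabilities from above
for p_real in (1.0, 0.10):  # 0%-simulation world vs. 90%-simulation world
    b, i = baseline * p_real, improved * p_real
    print(f"{(i - b) * 1e4:.2f} bp absolute; {i / b - 1:.2%} relative")
# -> 1.00 bp vs. 0.10 bp absolute (10x costlier), but 0.10% relative in both
```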
let’s say we’re looking at existential risks in the next 100 years
This works out fine for me, since for empirical reasons, I place overwhelming probability on the disjunction of “existential catastrophe by 2121” and “long-term future looks very likely to be excellent in 2121.” But insofar as we’re using existential-risk-reduction as a proxy for good-accomplished, I think that 1 basis point should be worth something more like 1/10,000 of the transition from a catastrophic future to a near-optimal future. Those are units that we should fundamentally care about maximizing; we don’t fundamentally care about maximizing no-catastrophe-before-2121. (And then we can make comparisons to interventions that don’t affect X-risk by saying what fraction of a transition from catastrophe to excellent they are worth. I guess I’m saying we should think in terms of something equivalent to utilons, where utilons are whatever we ought to increase, since optimizing for anything not equivalent to them is by definition optimizing imprecisely.)
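For concreteness, here’s a sketch of the normalization I have in mind (the 10% survival probability is a made-up input for illustration): set a catastrophic future to 0, a near-optimal future to 1, and score an intervention by its change in expected value.

```python
# Normalize value: catastrophic future = 0, near-optimal future = 1.
def expected_value(p_survival: float, value_given_survival: float) -> float:
    # Catastrophe contributes 0 by construction.
    return p_survival * value_given_survival

# If the future conditional on survival is near-optimal (value ~1), then one
# basis point of survival probability is worth 1/10,000 of the full
# catastrophe-to-near-optimal transition.
baseline = expected_value(0.10, 1.0)             # made-up 10% survival probability
with_intervention = expected_value(0.1001, 1.0)  # +1 basis point
print(f"{with_intervention - baseline:.4f}")     # 0.0001 = 1/10,000 of a transition
```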
I agree, for empirical reasons: value is roughly binary. But by definition, existential risk reduction could involve moving from catastrophic futures to futures in which we realize a significant fraction of our potential but which are not near-optimal.
I found this post after writing a related post. My post takes a different approach to the issue, fortunately. As a high-level comparison between our positions:
I agree that disappointing futures are very possible.
I agree that disappointing futures are about as bad as existential catastrophes. Indeed, I would definitely categorize disappointing futures as existential catastrophes: disappointing futures involve losing our potential, which is the definition of an existential catastrophe.
I am curious about what you think a disappointing future would look like. I believe it is very likely that we will soon (on the scale of civilization) go extinct or achieve technological maturity. If we achieve technological maturity, and if we do not optimize the universe for value, I expect that we will still use mature technology to make the future excellent by prosaic, provincial-human standards. Regardless, understanding what disappointing futures (or “pretty good” futures, my term matching my expectation) look like and what might cause them is important for avoiding them.
I would be excited to discuss possible disappointing futures (in these comments, in the comments on my post, or on a call) if you’re interested.