Pivotal outcomes and pivotal processes

Andrew Critch, Jun 17, 2022, 11:43 PM

49 points

tl;dr: If you think humanity is on a dangerous path, and needs to “pivot” toward a different future in order to achieve safety, consider how such a pivot could be achieved by multiple acts across multiple persons and institutions, rather than a single act. Engaging more actors in the process is more costly in terms of coordination, but in the end may be a more practicable social process involving less extreme risk-taking than a single “pivotal act”.

Preceded by: “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

[This post is also available on LessWrong.]

In the preceding post, I argued for the negative consequences of the intention to carry out a pivotal act, i.e., a single, large world-changing act sufficient to ‘pivot’ humanity off of a dangerous path onto a safer one. In short, there are negative side effects of being the sort of institution aiming or willing to carry out a pivotal act, and those negative side effects alone might outweigh the benefit of the act, or prevent the act from even happening.

In this post, I argue that it’s still a good idea for humanity-as-a-whole to make a large / pivotal change in its developmental trajectory in order to become safer. In other words, my main concern is not with the “pivot”, but with trying to get the whole “pivot” from a single “act”, i.e., from a single agent-like entity, such as a single human person, institution, or AI system.

Pivotal outcomes and processes

To contrast with pivotal acts, here’s a simplified example of a pivotal outcome that one could imagine making a big positive difference to humanity’s future, which in principle could be brought about by a multiplicity of actors:

(the “AI immune system”) The whole internet — including space satellites and the internet-of-things — becomes way more secure, and includes a distributed network of non-nuclear electromagnetic pulse emitters that will physically shut down any tech infrastructure appearing to be running rogue AI agents.

(For now, let’s set aside debate about whether this outcome on its own would be pivotal, in the sense of pivoting humanity onto a safe developmental trajectory… it needs a lot more details and improvements to be adequate for that! My goal in this post is to focus on how the outcome comes about. So for the sake of argument I’m asking to take the “pivotality” of the outcome for granted.)

If a single institution imposed the construction of such an AI immune system on its own, that would constitute a pivotal act. But if a distributed network of several states and companies separately instituted different parts of the change — say, designing and building the EMP emitters, installing them in various jurisdictions, etc. — then I’d call that a pivotal distributed process, or pivotal process for short.

In summary, a pivotal outcome can be achieved through a pivotal (distributed) process without a single pivotal act being carried out by any one institution. Of course, the “can” there is very difficult, and involves solving a ton of coordination problems that I’m not saying humanity will succeed in solving. However, aiming for a pivotal outcome via a pivotal distributed process definitely seems safer to me, in terms of the dynamics it would create between labs and militaries, compared to a single lab planning to do it all on their own.

Revisiting the consequences of pivotal act intentions

In AGI Ruin, Eliezer writes the following, I believe correctly:

The reason why nobody in this community has successfully named a ‘pivotal weak act’ where you do something weak enough with an AGI to be passively safe, but powerful enough to prevent any other AGI from destroying the world a year later—and yet also we can’t just go do that right now and need to wait on AI—is that nothing like that exists. There’s no reason why it should exist. There is not some elaborate clever reason why it exists but nobody can see it. It takes a lot of power to do something to the current world that prevents any other AGI from coming into existence; nothing which can do that is passively safe in virtue of its weakness.

I think the above realization is important. The un-safety of trying to get a single locus of action to bring about a pivotal outcome all on its own is important, and it pretty much covers my rationale for why we (humanity) shouldn’t advocate for unilateral actors doing that sort of thing.

Less convincingly-to-me, Eliezer then goes on to (seemingly) advocate for using AI to carry out a pivotal act, which he acknowledges would be quite a forceful intervention on the world:

If you can’t solve the problem right now (which you can’t, because you’re opposed to other actors who don’t want [it] to be solved and those actors are on roughly the same level as you) then you are resorting to some cognitive system that can do things you could not figure out how to do yourself, that you were not close to figuring out because you are not close to being able to, for example, burn all GPUs. Burning all GPUs would actually stop Facebook AI Research from destroying the world six months later; weaksauce Overton-abiding stuff about ‘improving public epistemology by setting GPT-4 loose on Twitter to provide scientifically literate arguments about everything’ will be cool but will not actually prevent Facebook AI Research from destroying the world six months later, or some eager open-source collaborative from destroying the world a year later if you manage to stop FAIR specifically. There are no pivotal weak acts.

I’m not entirely sure if the above is meant to advocate for AGI development teams planning to use their future AGI to burn other people’s GPUs, but it could certainly be read that way, and my counterargument to that reading has already been written, in “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments. Basically, a lab X with the intention to burn all the world’s GPUs will create a lot of fear that lab X is going to do something drastic that ends up destroying the world by mistake, which in particular drives up the fear and desperation of other AI labs to “get there first” to pull off their own version of a pivotal act. Plus, it requires populating the AGI lab with people willing to do some pretty drastically invasive things to other companies, in particular violating private property laws and state boundaries. From the perspective of a tech CEO, it’s quite unnerving to employ and empower AGI developers who are willing to do that sort of thing. You’d have to wonder if they’re going to slip out with a thumb drive to try deploying an AGI against you, because they have their own notion of the greater good that they’re willing to violate your boundaries to achieve.

So, thankfully-according-to-me, no currently-successful AGI labs are oriented toward carrying out pivotal acts, at least not all on their own.

Back to pivotal outcomes

Again, my critique of pivotal acts is not meant to imply that humanity has to give up on pivotal outcomes. Granted, it’s usually harder to get an outcome through a distributed process spanning many actors, but in the case of a pivotal outcome for humanity, I argue that:

- it’s safer to aim for a pivotal outcome to be carried out by a distributed process spanning multiple institutions and states, because the process can happen in a piecemeal fashion that doesn’t change the whole world at once, and
- it’s easier as well, because
  1. you won’t be constantly setting off alarm bells of the form “Those people are going to try to unilaterally change the whole world in a drastic way”, and
  2. you won’t be trying to populate a lab with AGI developers who, in John Wentworth’s terms, think like “villains” (source).

I’m not arguing that we (humanity) are going to succeed in achieving a pivotal outcome through a distributed process; only that it’s a safer and more practical endeavor than aiming for a single pivotal act from a single institution.


AI safety AI governance

RobBensinger Jun 22, 2022, 10:24 AM
20 points

An example of a possible “pivotal act” I like that isn’t “melt all GPUs” is:
Use AGI to build fast-running high-fidelity human whole-brain emulations. Then run thousands of very-fast-thinking copies of your best thinkers. Seems to me this plausibly makes it realistic to keep tabs on the world’s AGI progress, and locally intervene before anything dangerous happens, in a more surgical way rather than via mass property destruction of any sort.
Looking for pivotal acts that are less destructive (and, more importantly for humanity’s sake, less difficult to align) than “melt all GPUs” seems like a worthy endeavor to me. But I prefer the framing ‘let’s discuss the larger space of pivotal acts, brainstorm new ideas, and try to find options that are easier to achieve, because that particular toy proposal seems suboptimally dangerous and there just hasn’t been very much serious analysis and debate about pathways’. In the course of that search, if it then turns out that the most likely-to-succeed option is a process, then we should obviously go with a process.
But I don’t like constraining that search to ‘processes only, not acts’, because:
- (a) I’m guessing something more local, discrete, and act-like will be necessary, even if it’s less extreme than “melt all GPUs”;
- (b) insofar as I’m uncertain about which paths will be viable and think the problem is already extremely hard and extremely constrained, I don’t want to further narrow the space of options that humanity can consider and reason through;
- (c) I worry that the “processes” framing will encourage more Rube-Goldberg-machine-like proposals, where the many added steps and layers and actors obscure the core world-saving cognition and action, making it harder to spot flaws and compare tradeoffs;
- and (d) I worry that the extra steps, layers, and actors will encourage “design by committee” and slow-downs that doom otherwise-promising projects.
I suspect we also have different intuitions about pivotal acts because we have different high-level pictures of the world’s situation.
I think that humanity as it exists today is very far off from thinking like a serious civilization would about these issues. As a consequence, our current trajectory has a negligible chance of producing good long-run outcomes. Rather than trying to slightly nudge the status quo toward marginally better thinking, we have more hope if we adopt a heuristic like ‘speak candidly and realistically about things, as though we lived on the Earth that does take these issues seriously’, and hope that this seriousness and sanity might be infectious.
On my model, we don’t have much hope if we continue to half-say-the-truth, and continue to make small steady marginal gains, and continue to talk around the hard parts of the problem; but we do have the potential within us to just drop the act and start fully sharing our models and being real with each other, including being real about the parts where there will be harsh disagreements.
I think that a large part of the reason humanity is currently endangering itself is that everyone is too focused on ‘what’s in the Overton window?’, and is too much trying to finesse each other’s models and attitudes, rather than blurting out their actual views and accepting the consequences.
This makes the situation I described in The inordinately slow spread of good AGI conversations in ML much stickier: very little of the high-quality / informed public discussion of AGI is candid and honest, and people notice this, so updating and epistemic convergence is a lot harder; and everyone is dissembling in the same direction, toward ‘be more normal’, ‘treat AGI more like business-as-usual’, ‘pretend that the future is more like the past’.
All of this would make me less eager to lean into proposals like “yes, let’s rush into establishing a norm that large parts of the strategy space are villainous and not to be talked about” even if I agreed that pivotal processes are a better path to long-run good outcomes than pivotal acts. This is inviting even more of the central problem with current discourse, which is that people don’t feel comfortable even talking about their actual views.
You may not think that a pivotal act is necessary, but there are many who disagree with you. Of those, I would guess that most aren’t currently willing to discuss their thoughts, out of fear that the resultant discussion will toss norms of scholarly discussion out the window. This seems bad to me, and not like the right direction for a civilization to move into if it’s trying to emulate ‘the kind of civilization that handles AGI successfully’. I would rather a world where humanity’s best and brightest were debating this seriously, doing scenario analysis, assigning probabilities and considering specific mainline and fallback plans, etc., over one where we prejudge ‘discrete pivotal acts definitely won’t be necessary’ and decide at the outset to roll over and die if it does turn out that pivotal acts are necessary.
My alternative proposal would be: Let’s do scholarship at the problem, discuss it seriously, and not let this topic be ruled by ‘what is the optimal social-media soundbite?’.
If the best idea sounds bad in soundbite form, then let’s have non-soundbite-length conversations about it. It’s an important enough topic, and a complex enough one, that this would IMO be a no-brainer in a world well-equipped to handle developments like AGI.
“it’s safer to aim for a pivotal outcome to be carried out by a distributed process spanning multiple institutions and states, because the process can happen in a piecemeal fashion that doesn’t change the whole world at once”
We should distinguish “safer” in the sense of “less likely to cause a bad outcome” from “safer” in the sense of “less likely to be followed by a bad outcome”.
E.g., the FDA banning COVID-19 testing in the US in the early days of the pandemic was “safer” in the narrow sense that it legitimately reduced the risk that COVID-19 tests would cause harm. But the absence of testing resulted in much more harm, and was “unsafe” in that sense.
Similarly: I’m mildly skeptical that humanity refusing to attempt any pivotal acts makes us safer from the particular projects that enact this norm. But I’m much more skeptical that humanity refusing to attempt any pivotal acts makes us safer from harm in general. These two versions of “safer” need to be distinguished and argued for separately.
Any proposal that adds red tape, inefficiencies, slow-downs, process failures, etc. will make AGI projects “safer” in the first sense, inasmuch as it cripples the project or slows it down to the point of irrelevance.
As someone who worries that timelines are probably way too short for us to solve enough of the “pre-AGI alignment prerequisites” to have a shot at aligned AGI, I’m a big fan of sane, non-adversarial ideas that slow down the field’s AGI progress today.
But from my perspective, the situation is completely reversed when you’re talking about slowing down a particular project’s progress when they’re actually building, aligning, and deploying their AGI.
At some point, a group will figure out how to build AGI. When that happens, I expect an AGI system to destroy the world within just a few years, if no pivotal act or process finishes occurring first. And I expect safety-conscious projects to be at a major speed disadvantage relative to less safety-conscious projects.
Adding any unnecessary steps to the process—anything that further slows down the most safety-conscious groups—seems like suicide to me, insofar as it either increases the probability that the project fails to produce a pivotal outcome in time, or increases the probability that the project cuts more corners on safety because it knows that it has that much less time.
I obviously don’t want the first AGI projects to rush into a half-baked plan and destroy the world. First and foremost, do not destroy the world by your own hands, or commit the fallacy of “something must be done, and this is something!”.
But I feel more worried about AGI projects insofar as they don’t have a lot of time to carefully align their systems (so I’m extremely reluctant to tack on any extra hurdles that might slow them down and that aren’t crucial for alignment), and also more worried insofar as they haven’t carefully thought about stuff like this in advance. (Because I think a pivotal act is very likely to be necessary, and I think disaster is a lot more likely if people don’t feel like they can talk candidly about it, and doubly so if they’re rushing into a plan like this at the last minute rather than having spent decades prior carefully thinking about and discussing it.)
What links here?
- RobBensinger's comment on Don’t leave your fingerprints on the future by So8res (Oct 15, 2022, 9:42 PM; 9 points)
- RobBensinger's comment on What does it mean for an AGI to be ‘safe’? by So8res (Oct 15, 2022, 5:20 PM; 8 points)