On the Vulnerable World Hypothesis

This is a cross-post from my Substack.

Summary

Nick Bostrom presents the Vulnerable World Hypothesis (VWH): if technological progress continues without safeguards, we should expect the means and information necessary to cause a catastrophic event to become sufficiently available for malicious actors to cause such an event. He argues that to the extent that the VWH is true, we should consider implementing ubiquitous real-time worldwide surveillance (henceforth just surveillance) to mitigate the risk of these catastrophic events.

I have some misgivings about this argument, which I divide into three parts. First, I consider the premises and assumptions of the VWH. I argue that:

  • not many people want to cause global catastrophes

  • it’s not clear that we must assume continuing technological development

Second, I consider the costs and benefits of surveillance. I start with a best case scenario, where I assume surveillance is:

  • implemented by good actors

  • extremely effective (if not perfect)

  • democratically agreed upon

  • global

and argue that even under these conditions, surveillance:

  • limits free thought

  • disincentivises weirdness

  • potentially erodes trust norms and diminishes the value of trust

I also make a more general argument against systems with a single point of failure.

Third, I relax these assumptions one by one.

  • Relaxing the good actors assumption increases totalitarianism risk and (more weakly) misuse risk

  • Relaxing the effectiveness assumption suggests surveillance would not be worth its costs, and may lead to a false sense of security

  • Relaxing the democratic assumption increases totalitarianism risk and would be bad in itself

  • Relaxing the global assumption reduces surveillance’s effectiveness

My main concerns are that:

  • It’s not clear that surveillance is the best response to the VWH

  • Even if the best response is surveillance, its costs probably outweigh its benefits:

    • Even the best case scenario has distinct costs and seems very unlikely

    • In particular, it seems really hard to achieve global democratic surveillance

  • Relaxing the assumptions of the best case scenario increases the costs

  • There’s a real risk that surveillance increases the likelihood of totalitarianism, including totalitarian lock-in

So as things stand, I don’t think ubiquitous real-time worldwide surveillance is a good idea.

A quick disclaimer/​some context

I wrote this post in a semi-academic style, and I’m worried that this signals undue seriousness or confidence in my thoughts. I’m really not very confident in what I’ve said! Everything here was written up quickly after some initial thoughts and discussion with others. At times I make claims and hardly defend them; at other times I defend my claims, but I expect given some time to reflect on them, I’d change my mind. Interpret this as ‘some thoughts I had and wanted to write up’, rather than ‘a carefully reasoned critique of an academic paper’.

I’d like to hear criticism and corrections, particularly if you think they undermine key parts of my argument.

Thinking about the VWH

Bostrom defines the VWH as follows:

VWH: If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semianarchic default condition.

Bostrom later stipulates that the ‘devastation of civilization’ here refers to:

any destructive event that is at least as bad as the death of 15 per cent of the world population or a reduction of global GDP by >50 per cent lasting for more than a decade.

Such events would be considered global catastrophic risks (GCRs) under most definitions of GCRs. Given this (and for simplicity), I’ll refer to them as global catastrophes.

Bostrom presents the VWH as just that—a hypothesis—and states explicitly that he thinks its truth is still an open question. For my purposes it’s helpful to restate the VWH as an argument with three premises, remembering that Bostrom would not commit to it:

P1: The information necessary to cause an event which is itself a global catastrophe, or likely to lead to a global catastrophe, is sufficiently available

P2: The means to cause an event which is itself a global catastrophe, or likely to lead to a global catastrophe, is sufficiently available[1]

P3: There is a non-zero number of actors who want to cause such events

Drawing conclusions from the VWH

To get from the VWH to a claim about ubiquitous real-time worldwide surveillance, we’ll need to add another premise.

P4: Ubiquitous real-time worldwide surveillance is the best way to decrease the risk of global catastrophes[2]

Which allows us to conclude that:

C1: We should consider implementing ubiquitous real-time worldwide surveillance.

These are deliberately pretty vague premises! Before considering the costs and benefits of surveillance, I think it’s worth clarifying some of them.

The technological development assumption

Underpinning P1 and P2 is an assumption of continuing technological development, which is not uncontroversial. Unless we’re very confident that failing to reach technological maturity would be an existential catastrophe, we should consider whether alternatives to unrestricted technological development (including differential technological development) are feasible.[3]

Thinking about P3

P3: There is a non-zero number of actors who want to cause such events

Who are the bad guys?

It’s unclear to me what the reference class for these malicious actors should be. Different actors will have different motivations for causing these events, which in turn suggests different risk levels and different possible responses. Conflating all malicious actors into one category loses nuance.

Suppose we mean groups willing to cause highly destructive events to achieve a particular goal (e.g. terrorists). It’s no longer the case that surveillance is the only possible intervention. If these events are a means to these actors’ desired ends, then we can (for example) negotiate with them and compromise to partially satisfy their ends, rather than implement surveillance to cut off their ability to do such acts entirely.

I’m not saying we should pursue other interventions—surveillance might still be best! But I think it’s important to note that we do have alternatives, and they should be considered before we accept surveillance.

Now suppose instead that we mean actors who want to commit these acts as an end in themselves. (I’m imagining classic world-ending cults.) We can’t negotiate with these people: here, our only intervention is to prevent them from having the opportunity to do really bad things. But this group is a lot smaller than the previous group, and so all else being equal, we should expect them to be less well-connected/​competent and thus less dangerous.

This doesn’t mean that we should ignore the risk from world-ending cults, but it does suggest this risk is a lot smaller. This should give us reason to pause and re-evaluate the case for global surveillance to mitigate this risk.

We care about a subset of these groups

Here I make the obvious point that actually, we don’t care about all malicious actors—we care about malicious actors who _also_ have the information and means necessary to cause global catastrophes.

These two things don’t necessarily go hand-in-hand. Under a specific set of assumptions, they might: if we think that the information and means necessary to cause global catastrophes are going to be easily accessible, then we can conclude that any sufficiently motivated malicious actor would be able to cause global catastrophes. But these assumptions aren’t obviously correct, and so the number of people we care about may well be smaller than even my previous point suggested.

How easy is it to make terrible things happen?

It currently seems pretty doable to make really bad things happen, and at least some people would like to make really bad things happen. Yet despite this, really bad things keep not happening.

Does this matter? I think at least a little. Sceptics can say that the information and means to commit really bad acts, including global catastrophes, haven’t been around for very long. This makes a direct comparison to long-run risk unfair, since we’re worried about global catastrophes happening over a span of centuries (if not longer), rather than a span of decades. That’s a fair complaint. But I think at the very least, we should conclude from this that at least one of the following claims is true:

  • Causing global catastrophes is harder than it seems in theory

  • People who want to cause global catastrophes are less competent than we might imagine

  • Current efforts to prevent global catastrophes are more effective than we might imagine

All of these considerations affect our evaluation of the costs and benefits of different interventions.

Other possible interventions

Interventions which target P1:

  • Could we implement differential technological development?

    • e.g. banning some kinds of scientific research (perhaps with exceptions for government laboratories)

  • Limiting access to information

    • e.g. requiring some scientific research to _not_ be published

Interventions which target P1 and P2:

  • If the means of causing global catastrophes involves some physical preparation (e.g. (bio)weapons), could we ban the required materials?

  • Or monitor who has access to them, or who buys them?

    • This seems to have worked well(ish) for nuclear risk

  • If the means of causing global catastrophes is primarily digital (i.e. doesn’t require many/​any physical materials, beyond the commonplace—computer/​internet connection, etc), could we implement internet monitoring or even digital surveillance?

    • Digital surveillance would have many associated costs, but it would be less costly than ubiquitous surveillance

Interventions which target P3:

  • Maybe we could change human values so nobody (or almost nobody) wants to cause global catastrophes?

    • I’m assuming here that a non-zero risk of global catastrophes is acceptable, if the risk is sufficiently small (which I think is a reasonable assumption: I don’t think global surveillance could reduce the risk to zero, and I also don’t think Bostrom thinks this, so it seems only fair to apply the same standards to all possible interventions)

    • I don’t really know how we could do this. Moral values change? Weird psychological stuff?

    • I’m definitely not saying this is a good intervention, or one we could do now, just one that is an alternative to surveillance and which seems plausible under the same set of assumptions required for surveillance to be plausible (i.e. continuing technological progress)

Interventions which target more than one premise

  • We could restrict access to the relevant information or means (P1, P2) to certain groups of people, and rigorously psychologically screen these people (P3) to make sure they won’t misuse this access

    • e.g. similarly to how intelligence agencies screen applicants

Do the costs of surveillance outweigh the benefits?

Even if we accept premises 1 through 3, to conclude that we should implement global surveillance to guard against this vulnerability, we must also accept P4 (that surveillance is the best response).

The best case surveillance scenario

I’ll start with the following assumptions:

  1. Surveillance is implemented by good actors

  2. Surveillance is extremely effective (if not perfect)

  3. Surveillance is democratically agreed upon

  4. Surveillance is global

Even under this set of assumptions, surveillance is still pretty scary, and I think it gets scarier when these assumptions are relaxed.

What I’m worried about even in the best case scenario

Even assuming that:

  1. Surveillance is implemented by good actors

  2. Surveillance is extremely effective (if not perfect)

  3. Surveillance is democratically agreed upon

  4. Surveillance is global

Surveillance limits free thought, disincentivises weirdness, and potentially erodes trust norms.

The free thought effect

This argument comes from a paper by Neil Richards: surveillance is bad because it harms intellectual privacy, which is itself necessary for free thought. New ideas and beliefs often form best away from public exposure, not least because they tend to challenge the mainstream. Intellectual privacy is therefore necessary to form new ideas and beliefs, and thus necessary for free thought.

Note that we can’t avoid this cost via secret surveillance, for the following reasons:

  1. Secret surveillance is unlikely to remain secret in the long-term

  2. Secret surveillance is undemocratic

  3. Secret surveillance has unique harms (I think most saliently the risk of blackmail)

The weirdness effect

This is one of Richards’ empirical claims: surveillance ‘inclines us to the mainstream and the boring’, and thus threatens what he calls intellectual diversity and ‘eccentric individuality’.

If we value epistemics or personal intellectual freedom (and I think we should!), we should worry about these effects.

Erodes trust norms and reduces the value of trust

Bostrom argues that surveillance could in fact enhance public trust, but I think it’s unclear which way this would go. Maybe surveillance would lead to everyone feeling comfortable and happy trusting others; maybe instead it would create an atmosphere of paranoia and an erosion of traditional trust norms.

More speculatively, there seems to be something valuable about earning and granting trust which is lost with the introduction of surveillance. The process of forming relationships and being vulnerable with others is distinctive precisely because it’s not universal; trusting others matters because it doesn’t happen by default. I think this trust would lose its distinctiveness, and thus some of its value, if global surveillance and thus a generic ‘trust in others’ became the default.

Single point of control = single point of failure

Having one defence mechanism against global catastrophes seems unnecessarily risky. I’d prefer we adopt a Swiss cheese model of defence, with multiple overlapping mechanisms to prevent global catastrophe.

Of course, implementing surveillance doesn’t preclude implementing other defence mechanisms. But I think that in practice, a surveillance system (especially one which seems extremely effective) would provide a false sense of security, which would in turn make implementing other defence mechanisms less likely.

What’s this, no mention of the privacy cost?

I don’t think a privacy/security trade-off is the right framing for thinking about the costs and benefits of surveillance. (This is a moderately hot take.) The trade-off framing has informed a lot of work on surveillance: most surveillance scholars assume that surveillance infringes on people’s privacy, and are pretty worried about the corresponding harms.

I’m not convinced this is the best way to think about surveillance for the following reasons:

  • I don’t think the privacy/security trade-off always holds. There’s been a lot of cool work recently on privacy-preserving ways of analysing data, drawing on differential privacy, cryptography, and new ways of training large models.

    • A nice example: the police often want to compute the intersection of different datasets (e.g. the intersection of lists of phone numbers which used the phone masts near the scenes of crimes that were probably all committed by the same person). Previously, this cross-checking had to be done on unencrypted data, which meant that every number using those masts was shared with the police, even if its owner had nothing to do with the crime. Segal, Feigenbaum, and Ford developed a neat method for computing set intersections on encrypted data, so that data outside the intersection can remain private. (I sketch the general idea in toy code after this list.)

    • Or a non-technical example: bag searches and those airport scanners share the same function (identifying dangerous material in people’s bags), but the latter is much more privacy-preserving: only people flagged on the scanner will have their bags searched, rather than everyone having their bag searched.

    • These methods obviously aren’t perfect, but they’re a Pareto improvement on alternatives. And I’m pretty optimistic that we can expect even more privacy-preserving solutions, given the rate of progress recently!

  • People don’t always think in terms of a privacy/security trade-off

    • This doesn’t mean that privacy isn’t important: maybe people who don’t think in terms of this trade-off are just thinking wrongly, or (more likely) this finding doesn’t generalise to other countries. But insofar as we want our intervention to be democratically approved, it suggests that privacy isn’t some inviolable right.

  • There is no universally agreed-upon concept of privacy, let alone one which is always violated by surveillance[4]

    • So talking about a privacy trade-off in general terms, without being clear about which conception of privacy we mean, is unlikely to be helpful
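To make the encrypted set intersection example above a bit more concrete, here is a minimal toy sketch of the general ‘double blinding’ idea behind private set intersection. To be clear, this is my own illustration, not the Segal, Feigenbaum, and Ford protocol, and the parameters are chosen for readability rather than security: each party blinds its hashed phone numbers with a secret exponent, the parties exchange blinded values and blind them again, and because exponentiation commutes, the doubly blinded values match exactly when the underlying numbers match, so only the intersection is revealed.

```python
import hashlib
import secrets

# Toy sketch of "double blinded" private set intersection (illustrative only:
# this is NOT the Segal/Feigenbaum/Ford protocol, and the parameters are not
# secure). Two parties learn which phone numbers appear on both of their lists
# without revealing the numbers outside the intersection in the clear.

P = 2**127 - 1  # a Mersenne prime standing in for a proper cryptographic group

def hash_to_group(item: str) -> int:
    """Map an item (e.g. a phone number) to a group element."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(value: int, secret: int) -> int:
    """Raise a group element to a secret exponent (the blinding step)."""
    return pow(value, secret, P)

# Each party holds a list of phone numbers and its own private exponent.
police_numbers = ["+441111", "+442222", "+443333"]   # numbers near crime scenes
telco_numbers  = ["+442222", "+443333", "+449999"]   # numbers on one phone mast
police_secret = secrets.randbelow(P - 2) + 1
telco_secret  = secrets.randbelow(P - 2) + 1

# Round 1: each party blinds its own items and sends only the blinded values.
police_blinded = [blind(hash_to_group(n), police_secret) for n in police_numbers]
telco_blinded  = [blind(hash_to_group(n), telco_secret)  for n in telco_numbers]

# Round 2: each party blinds the other's values with its own secret. Because
# exponentiation commutes, doubly blinded values match iff the numbers match.
police_double = {blind(v, telco_secret)  for v in police_blinded}
telco_double  = [blind(v, police_secret) for v in telco_blinded]

# Only numbers in the intersection are revealed; the rest stay blinded.
shared = [n for n, v in zip(telco_numbers, telco_double) if v in police_double]
print(shared)  # ['+442222', '+443333']
```

Real systems add authorisation steps (e.g. warrants), hashing to proper elliptic-curve groups, and protections against misbehaving parties, but the basic point stands: the parties can compute the intersection without one side handing over its whole dataset.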

Moving beyond the best case: relaxing the ‘good actors’ assumption

Assume that:

  1. Surveillance is extremely effective (if not perfect)

  2. Surveillance is democratically agreed upon

  3. Surveillance is global

but that surveillance is not necessarily implemented by good actors.

Totalitarianism risk

I don’t think surveillance guarantees totalitarianism, but it makes it much more likely. Implementing this surveillance system would concentrate power in the hands of its enforcers, and if they wanted to, they could use the system to suppress actions they disapprove of. This could easily be abused to keep the enforcers in power, greatly increasing the risk of totalitarian lock-in.

Totalitarian lock-in is described by Bostrom as an existential risk: even without lock-in, lasting totalitarianism could be a global catastrophe. So there’s an interesting tension here where attempts to reduce existential risk themselves increase global catastrophic risks. I’m not sure if this is true for many other attempts at reducing existential risk (biosecurity proposals don’t seem to increase GCR, some AI governance proposals might do?), but I think it’s worth thinking seriously about.

Misuse risk

Even assuming we don’t end up in some totalitarian dystopia, I think surveillance creates a new vulnerability that malicious actors can exploit. Suppose that surveillance effectively eliminates the possibility of causing other global catastrophes, but is partly implemented by humans. In this case, surveillance opens a new opportunity to cause harm while curtailing the others. Given this, we’d expect malicious actors to switch their focus to gaining positions of power within the surveillance administration, where they can cause harm.

Seen this way, surveillance may reduce the risk of malicious actors using other harmful technologies, but it also introduces a new risk of malicious actors using surveillance to cause harm.

What if we put AI systems in charge?

One response to this risk is to eliminate or minimise the human element in surveillance. Bostrom proposes using AI to monitor live video and audio and detect potentially harmful actions, only involving humans to check data flagged as possibly harmful. The assumption here is that most people don’t want global catastrophes to happen, so using a sufficiently large (and sufficiently non-malicious) sample of people to check flagged footage reduces to an acceptably low level the likelihood that a checker would let something awful through.

I think this works to reduce misuse risk, but introduces its own risks. Current image classification systems are notoriously easy to attack, and bad actors would be strongly incentivised to figure out ways of getting around AI surveillance systems.[5] If we could be very confident that the AI systems used for classification were both highly accurate and robust to such attacks, then I’d be much more optimistic about this. But absent evidence that future models will be, I’m a little nervous about this response.
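To give a flavour of why this worries me, here is a minimal toy sketch (my own construction with made-up numbers, not a model of any real surveillance system) of the gradient-sign idea behind adversarial examples: in high dimensions, a perturbation that is tiny for each individual feature can accumulate into a large change in a classifier’s score and flip its decision.

```python
import numpy as np

# Toy illustration of classifier fragility: a per-feature perturbation that is
# small in absolute terms flips the decision of a linear "detector".
# All numbers are made up; this is not any real surveillance model.

rng = np.random.default_rng(0)
d = 1000                                   # number of input features (toy)
w = rng.normal(size=d)                     # weights of the toy linear detector

def flagged(x: np.ndarray) -> bool:
    """The toy detector flags an input when its linear score is positive."""
    return float(x @ w) > 0.0

# Start from a generic input, then project it so the detector flags it
# with a modest positive margin (score exactly 2.0).
x = rng.normal(size=d)
x = x + (2.0 - x @ w) * w / (w @ w)

# Gradient-sign style perturbation: nudge every feature by at most epsilon in
# the direction that lowers the score. Each change is tiny, but across 1,000
# features the score drops by roughly epsilon * sum(|w_i|), i.e. about 40 here.
epsilon = 0.05
x_adv = x - epsilon * np.sign(w)

print("original input flagged: ", flagged(x))      # True
print("perturbed input flagged:", flagged(x_adv))  # False
print("perturbation size relative to input:",
      round(np.linalg.norm(x_adv - x) / np.linalg.norm(x), 3))  # ~0.05
```

Real image classifiers are non-linear, but analogous attacks exist for them, which is part of why I’d want strong evidence of robustness before relying on such a system.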

Surveillance for what ends?

Once actors have surveillance capabilities, they face incentives to surveil, and once they start surveilling, they face incentives to expand the range of activities they detect and prevent or punish. I’m worried that this has pretty bad effects.

Governments which surveil their citizens to prevent global catastrophes face incentives to expand surveillance to wider issues: the infrastructure would already be in place, so the marginal cost would be low, and the value of preventing ordinary crime is high. As Bostrom argues, ubiquitous real-time worldwide surveillance for all kinds of criminal activities (i.e. not just global catastrophes) could be cost-effective overall.

But we lack global agreement on what acts should be surveilled and stopped. Imagine giving a semi-authoritarian leader surveillance capabilities. What’s to stop them using surveillance to identify and prevent acts others would consider acceptable and even valuable, e.g. peaceful protests?

I think we should expect surveillance to be used to prevent non-catastrophic events, and this is likely to be harmful.

Just put the good guys in charge!

One response to this is to argue that we should decide globally (perhaps via democratic means) what actions may and may not be punished or stopped.

I think this is a better idea, but still has some major flaws:

  • I’m sceptical we could get democratic agreement to surveil, let alone on what actions should be punished or stopped

  • Even if we did achieve democratic agreement, states still face strong incentives to violate this agreement and misuse their surveillance capabilities

Another response would be centralising the administration of surveillance. I think this reduces risk from repressive states, but increasing power concentration increases misuse and totalitarianism risks from the surveillance administration. Also, abandoning the democratic requirement sounds risky!

Relaxing the ‘extremely effective’ assumption

Assume that:

  1. Surveillance is implemented by good actors

  2. Surveillance is democratically agreed upon

  3. Surveillance is global

but that surveillance is not necessarily extremely effective.

This surveillance faces three key problems:

  • both kinds of ineffectiveness (false negatives and false positives) are harmful

  • not knowing it’s ineffective is dangerous, as it leads to a false sense of security

  • ineffective surveillance isn’t worth the costs

False negatives

Misidentifying catastrophic acts as innocent acts leads to, well, global catastrophe.

False positives

Misidentifying innocent acts as catastrophic acts limits free thought and disincentivises weirdness.

If we don’t know it’s ineffective, surveillance provides a false sense of security

If our surveillance system appears to work, and we collectively relax a bit, we’re likely to be underprepared for any failure of the system.

Another way of looking at this is that it’s rational to spend a lot of time preparing for the outcomes of very bad events in the no surveillance world: they seem sufficiently likely that the benefits of planning for them outweigh the costs. If we were to implement effective global surveillance, then planning for the outcomes of really bad events no longer seems worth it. But we could be wrong about how effective our surveillance is, which makes failure to plan much worse; and even if we’re right about effectiveness, bad events we haven’t planned for could still occur, and would be much worse given we haven’t planned.

Ineffective surveillance isn’t worth the harms

Surveillance has to reach a certain level of effectiveness to justify its harms. I think what that level is, and whether it would be technically feasible to reach it, are still open questions.

Relaxing the ‘democratically agreed upon’ assumption

Assume that:

  1. Surveillance is implemented by good actors

  2. Surveillance is extremely effective (if not perfect)

  3. Surveillance is global

but that surveillance is not necessarily democratically agreed upon.

Democratic agreement is an unrealistic assumption

Global surveillance is a kind of collective action problem. The cost to each individual is high: the surveilled sacrifice privacy, weirdness, and trust, as I have argued. The benefit to each individual, however, is relatively small.[6] Given this, it seems unlikely that surveillance would be democratically consented to.

Undemocratic global surveillance is really quite bad

Three key claims here:

  1. Democracy is good

  2. Democracy is important for surveillance, since misuse of surveillance threatens democracy

  3. Democracy is important for surveillance, since democratic approval increases the likelihood of compliance

I’m not going to argue for democracy from first principles. I think it’s pretty good! And more contentiously, I think it’s also uniquely important for surveillance, since misused surveillance can directly threaten democracy.

Relaxing the ‘global’ assumption

Assume that:

  1. Surveillance is implemented by good actors

  2. Surveillance is extremely effective (if not perfect)

  3. Surveillance is democratically agreed upon

but that surveillance is not necessarily global.

Non-global surveillance is ineffective

Two problems here:

  • Non-global surveillance is not effective at preventing global catastrophes in countries which do not surveil

  • It is still possible to cause global catastrophes in countries which do surveil, if these catastrophes can be triggered remotely (e.g. from a non-surveilling country)

I’m not sure whether some countries implementing surveillance increases or decreases the risk of global catastrophe in other countries: it seems like it could go either way. On one hand, implementing surveillance imposes a kind of risk externality onto non-surveilling countries (as all else being equal, malicious actors will prefer to act in locations where they’re more likely to succeed). On the other hand, implementing surveillance may provide countries with information on malicious actors which they can share globally, and so reduce their corresponding risk. But even if some countries implementing surveillance decreases global catastrophe risk in non-surveilling countries, I would expect the magnitude of the effect to be small. Thus, the first problem remains. Under our assumption of technological progress, I think the second problem is very plausible. So ultimately, non-global surveillance is ineffective.

Conclusion

I think there are a few key points here:

  • Having examined Bostrom’s VWH, I don’t think it’s clear that surveillance is the best intervention

  • Even if the best intervention is surveillance, the costs probably outweigh the benefits

    • The best case scenario has distinct costs

    • The best case scenario seems very unlikely; in particular, it seems really hard to achieve global democratic surveillance

    • Relaxing these assumptions increases the costs

  • Surveillance is one example of an intervention to mitigate some kinds of existential risk which itself increases other kinds of existential risk (or, at minimum, global catastrophic risks)

    • It would be interesting to see if other interventions fit this description, and if so, if that tells us anything about our conceptualisation of existential and global catastrophic risks

In one line: ubiquitous real-time worldwide surveillance is probably a bad idea.

Notes


  1. ↩︎

    The means/​information distinction is pretty arbitrary—I introduce it to clarify my thinking later. I draw the distinction by assuming that to cause a global catastrophe you need information on how to do so, and the physical implementation of that information (e.g. assembling a bomb, or writing malicious code). ‘Means’ is supposed to capture everything in the second category, i.e. everything necessary to cause a global catastrophe which isn’t just information.

  2. ↩︎

    There are a few different ways it could be the ‘best’ response:

    1. If it is the most effective response

      1. i.e. reduces vulnerability risk the most

      2. i.e. reduces overall risk the most

    2. If it is the least costly response

      1. i.e. weighing the benefits of risk reduction against the harms of possible interventions, it has the highest expected utility

      2. i.e. for a given ‘intervention harm budget’, it delivers the greatest risk reduction

      3. i.e. for a given ‘risk reduction quota’, it has the smallest intervention harm

    I think my arguments stand regardless of which notion we choose, and I try to remain neutral between these interpretations.
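    One rough way to formalise the difference between these notions (my own notation, not Bostrom’s): write ΔR(i) for the risk reduction achieved by intervention i and H(i) for the harm the intervention itself causes. Then:

    ```latex
    i^*_{\text{most effective}} = \arg\max_i \, \Delta R(i)
    i^*_{\text{highest net value}} = \arg\max_i \, \big[\Delta R(i) - H(i)\big]
    i^*_{\text{harm budget}} = \arg\max_i \, \Delta R(i) \quad \text{subject to } H(i) \le \bar{H}
    i^*_{\text{risk quota}} = \arg\min_i \, H(i) \quad \text{subject to } \Delta R(i) \ge \bar{R}
    ```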

  3. ↩︎
  4. ↩︎

    See, for example, The Right to Privacy

  5. ↩︎

    See, for example, this project

  6. ↩︎

    Here I assume that really bad events are very unlikely to happen to any particular person, and only become extremely likely over the long term. It’s unlikely that I will personally experience a global catastrophe: the risk is high only when considered over time spans much longer than a lifetime. The long-run perspective is relevant for existential risk mitigation, while the lifetime perspective is relevant for individual action.
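    As a toy illustration with made-up numbers: if the annual probability of a global catastrophe is p, the probability of at least one occurring within T years is

    ```latex
    P(\text{at least one catastrophe within } T \text{ years}) = 1 - (1 - p)^T
    ```

    With p = 0.1% per year, this comes to roughly 8% over an 80-year lifetime but about 63% over 1,000 years, which is the gap between the two perspectives above.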