#191 (Part 2) – Government and society after AGI (Carl Shulman on the 80,000 Hours Podcast)

We just published an interview: Carl Shulman on government and society after AGI. Listen on Spotify or click through for other audio options, the transcript, and related links. Below are the episode summary and some key excerpts.

Episode summary

The AI advisor would point out all of these places where the system is making the top-level objective of getting a vaccine quickly, where that’s going wrong, and clarifies which changes will make it happen quicker. “If you replace person X with person Y; if you cancel this regulation, these outcomes will happen, and you’ll get the vaccine earlier. People’s lives will be saved, the economy will be rebooted,” et cetera.
There’s just all kinds of ways in which the thing is self-destructive, and only sustainable by deep epistemic failures and the corruption of the knowledge system that very often happens to human institutions. But making it as easy as possible to avoid that would improve it. And then going forward, I think these same sort of systems advise us to change our society such that we will never again have a pandemic like that, and we would be robust even to an engineered pandemic and the like.
- Carl Shulman

This is the second part of our marathon interview with Carl Shulman. The first episode is on the economy and national security after AGI. You can listen to them in either order!

If we develop artificial general intelligence that’s reasonably aligned with human goals, it could put a fast and near-free superhuman advisor in everyone’s pocket. How would that affect culture, government, and our ability to act sensibly and coordinate together?

It’s common to worry that AI advances will lead to a proliferation of misinformation and further disconnect us from reality. But in today’s conversation, AI expert Carl Shulman argues that this underrates the powerful positive applications the technology could have in the public sphere.

As Carl explains, today the most important questions we face as a society remain in the “realm of subjective judgement” — without any “robust, well-founded scientific consensus on how to answer them.” But if AI ‘evals’ and interpretability advance to the point that it’s possible to demonstrate which AI models have truly superhuman judgement and give consistently trustworthy advice, society could converge on firm or ‘best-guess’ answers to far more cases.

If the answers are publicly visible and confirmable by all, the pressure on officials to act on that advice could be great.

That’s because when it’s hard to assess if a line has been crossed or not, we usually give people much more discretion. For instance, a journalist inventing an interview that never happened will get fired because it’s an unambiguous violation of honesty norms — but so long as there’s no universally agreed-upon standard for selective reporting, that same journalist will have substantial discretion to report information that favours their preferred view more often than that which contradicts it.

Similarly, today we have no generally agreed-upon way to tell when a decision-maker has behaved irresponsibly. But if experience clearly shows that following AI advice is the wise move, not seeking or ignoring such advice could become more like crossing a red line — less like making an understandable mistake and more like fabricating your balance sheet.

To illustrate the possible impact, Carl imagines how the COVID pandemic could have played out in the presence of AI advisors that everyone agrees are exceedingly insightful and reliable.

To start, advance investment in preventing, detecting, and containing pandemics would likely have been at a much higher and more sensible level, because it would have been straightforward to confirm which efforts passed a cost-benefit test for government spending. Politicians refusing to fund such efforts when the wisdom of doing so is an agreed and established fact would seem like malpractice.

Low-level Chinese officials in Wuhan would have been seeking advice from AI advisors instructed to recommend actions that are in the interests of the Chinese government as a whole. As soon as unexplained illnesses started appearing, that advice would be to escalate and quarantine to prevent a possible new pandemic escaping control, rather than stick their heads in the sand as happened in reality. Having been told by AI advisors of the need to warn national leaders, ignoring the problem would be a career-ending move.

From there, these AI advisors could have recommended stopping travel out of Wuhan in November or December 2019, perhaps fully containing the virus, as was achieved with SARS-1 in 2003. Had the virus nevertheless gone global, President Trump would have been getting excellent advice on what would most likely ensure his reelection. Among other things, that would have meant funding Operation Warp Speed far more than it in fact was, as well as accelerating the vaccine approval process, and building extra manufacturing capacity earlier. Vaccines might have reached everyone far faster.

These are just a handful of simple changes from the real course of events we can imagine — in practice, a significantly superhuman AI might suggest novel approaches better than any we can suggest here.

In the past we’ve usually found it easier to predict how hard technologies like planes or factories will change than to imagine the social shifts that those technologies will create — and the same is likely happening for AI.

Carl Shulman and host Rob Wiblin discuss the above, as well as:

The risk of society using AI to lock in its values.
The difficulty of preventing coups once AI is key to the military and police.
What international treaties we need to make this go well.
How to make AI superhuman at forecasting the future.
Whether AI will be able to help us with intractable philosophical questions.
Whether we need dedicated projects to make wise AI advisors, or if it will happen automatically as models scale.
Why Carl doesn’t support AI companies voluntarily pausing AI research, but sees a stronger case for binding international controls once we’re closer to ‘crunch time.’
Opportunities for listeners to contribute to making the future go well.

Producer and editor: Keiran Harris
Audio engineering team: Ben Cordell, Simon Monsour, Milo McGuire, and Dominic Armstrong
Transcriptions: Katy Moore

Highlights

How AI advisors could have saved us from COVID-19

Carl Shulman: With Operation Warp Speed, which was the effort under the Trump administration to put up the money to mass produce vaccines in advance, they went fairly big on it by historical standards, but not nearly as big as they should have. Spending much more money to get the vaccines moderately faster and end lockdowns slightly earlier would be hugely valuable. You’d save lives, you’d save incredible quantities of money. It was absolutely worth it to spend an order of magnitude or more additional funds on that. And then many other countries, European countries, that haggled over the price on these things, and therefore didn’t get as much vaccine early, at a cost of losing $10 or $100 for every dollar you “save” on this thing.
If you have the AI advisors, and they are telling you, “Look, this stuff is going to happen; you’re going to regret it.” The AI advisor is credible. It helps navigate between politicians not fully understanding the economics and politics. It helps the politicians deal with the public, because the politicians can cite the AI advice, and that helps to deflect blame from them, including controversial decisions.
So one reason why there was resistance to Operation Warp Speed and similar efforts is you’re supporting the development of vaccines before they’ve been fully tested on the actual pandemic. And you may be embarrassed if you paid a lot of money for a vaccine that turns out not actually to be super useful and super helpful. And if you’re risk averse, you’re very afraid of that outcome. You’re not as correspondingly excited about saving everybody as long as you’re not clearly blameworthy. Well, having this publicly accessible thing, where everyone knows that HonestGPT is saying this, then you look much worse to the voters when you go against the advice that everyone knows is the best estimate, and then things go wrong.
I think from that, you get vaccines being produced in quantity earlier and used. And then at the level of their deployment, I think you get similar things. So Trump, Donald Trump, in his last year in office, he was actually quite enthusiastic about getting a vaccine out before he went up for reelection. And of course, his opponents were much less enthusiastic about rapid vaccine development before election day, compared to how they were after. But the president wanted to get the system moving as quickly as possible in the direction of vaccines being out and deployed quickly, and then lockdowns reduced and such early — and in fact, getting vaccines fast, and lockdowns and NPIs over quickly, was the better policy.
And so if he had access to AI advisors telling him what is going to maximise your chances of reelection, they would suggest, “The timeline for this development is too slow. If you make challenge trials happen to get these things verified very early, you’ll be able to get the vaccine distributed months earlier.” And then the AI advisor would tell you about all the problems that actually slowed the implementation of challenge trials in the real pandemic. They say, “They’ll be quibbling about producing a suitable version of the pathogen for administration in the trial. There’ll be these regulatory delays, where commissions just decide to go home for the weekend instead of making a decision three days earlier and saving enormous numbers of lives.”
And so the AI advisor would point out all of these places where the system is making the top-level objective of getting a vaccine quickly, where that’s going wrong, and clarifies which changes will make it happen quicker. “If you replace person X with person Y; if you cancel this regulation, these outcomes will happen, and you’ll get the vaccine earlier. People’s lives will be saved, the economy will be rebooted,” et cetera.
And then we go from there out through the end. You’d have similar benefits on the effect of school closures, on learning loss. You’d have similar effects on anti-vaccine sentiment. So processing the data about the demonisation of vaccines that happened in the United States later on, and so having that as a very systematic, trusted source — where even the honest GPT made by conservatives, maybe Grok, Elon Musk’s new AI system, Grok would also be telling you, if you are a Republican conservative who’s suspicious of vaccines, especially after the vaccine is no longer associated with the Trump administration, but the Biden administration, then having Grok, an equivalent, telling you these things, having it tell you that anti-vaccine things are to the disadvantage of conservatives, because they were getting disproportionately killed and reducing the number of conservative voters…
There’s just all kinds of ways in which the thing is self-destructive, and only sustainable by deep epistemic failures and the corruption of the knowledge system that very often happens to human institutions. But making it as easy as possible to avoid that would improve it. And then going forward, I think these same sort of systems advise us to change our society such that we will never again have a pandemic like that, and we would be robust even to an engineered pandemic and the like.
This is my soup to nuts how AI advisors would have really saved us from COVID-19 and the mortality and health loss and economic losses, and then just the political disruption and ongoing derangement of politics as a result of that kind of dynamic.

Why Carl doesn’t support enforced pauses on AI research

Carl Shulman: The big question that one needs to answer is what happens during the pause. I think this is one of the major reasons why there was a much more limited set of people ready to sign and support the open letter calling for a six-month pause in AI development, and suggesting that governments figure out their regulatory plans with respect to AI during that period. Many people who did not sign that letter then went on to sign the later letter noting that AI posed a risk of human extinction and should be considered alongside threats of nuclear weapons and pandemics. I think I would be in the group that was supportive of the second letter, but not the first.
I’d say that for me, the key reason is that when you ask, when does a pause add the most value? When do you get the greatest improvements in safety or ability to regulate AI, or ability to avoid disastrous geopolitical effects of AI? Those make a bigger difference the more powerful the AI is, and they especially make a bigger difference the more rapid change in progress in AI becomes.
I think the pace of technological, industrial, and economic change is going to intensify enormously as AI becomes capable of automating the processes of further improving AI and developing other technologies. And that’s also the point where AI is getting powerful enough that, say, threats of AI takeover or threats of AI undermining nuclear deterrence come into play. So it could make an enormous difference whether you have two years rather than two months, or six months rather than two months, to do certain tasks in safely aligning AI — because that is a period when AI might hack the servers it’s operating on, undermine all of your safety provisions, et cetera. It can make a huge difference, and the political momentum to take measures would be much greater in the face of clear evidence that AI had reached such spectacular capabilities.
To the extent you have a willingness to do a pause, it’s going to be much more impactful later on. And even worse, it’s possible that a pause, especially a voluntary pause, then is disproportionately giving up the opportunity to do pauses at that later stage when things are more important. So if we have a situation where, say, the companies with the greatest concern about misuse of AI or the risk of extinction from AI — and indeed the CEOs of several of these leading AI labs signed the extinction risk letter, while not the pause letter — if those companies, only the signatories of the extinction letter do a pause, then the companies with the least concern about these downsides gain in relative influence, relative standing.
And likewise in the international situation. So right now, the United States and its allies are the leaders in semiconductor technology and the production of chips. The United States has been restricting semiconductor exports to some states where it’s concerned about their military use. And a unilateral pause is shifting relative influence and control over these sorts of things to those states that don’t participate — especially if, as in the pause letter, it was restricted to training large models rather than building up semiconductor industries, building up large server farms and similar.
So it seems this would be reducing the slack and intensifying the degree to which international competition might otherwise be close, which might make it more likely that things like safety get compromised a lot.
Because the best situation might be an international deal that can regulate the pace of progress during that otherwise incredible rocket ship of technological change and potential disaster that would happen near when AI was fully automating AI research.
Second best might be you have an AI race, but it’s relatively coordinated — it’s at least at the level of large international blocs — and where that race is not very close. So the leader can afford to take six months rather than two months, or 12 months or more to not cut corners with respect to safety or the risk of a coup that overthrows their governmental system or similar. That would be better.
And then the worst might be a very close race between companies, a corporate free-for-all.
So along those lines, it doesn’t seem obvious that that is a direction that increases the ability for later explosive AI progress to be controlled or managed safely, or even to be particularly great for setting up international deals to control and regulate AI.

Value lock-in

Carl Shulman: When I think about, say, an application in North Korea or in the People’s Republic of China, it is already the official doctrine that information needs to be managed in many ways to manipulate the opinions and loyalties of the population. And you might say that these issues of epistemic propaganda and whatnot, a lot of what we’ve been talking about is not really relevant there, because it’s already just a matter of government policy.
But you could see how that could distort things even within the regime. So the Soviet Union collapsed because Gorbachev rose to the top of the system while thinking it was terrible in many ways. Good in many ways: he did want to preserve the Soviet Union; he just was not willing to use violence to keep it together.
But if the ruling party in some of these places sets conditions for, say, loyalty indexes, and then has an AI system that optimised to generate as high a loyalty index as possible — and it gives this result where the loyalty index is higher for someone who really can believe the party line in various ways, although of course changing it whenever the party authorities want something different, then you can wind up with successors or later decisions made by people who have been to some extent driven mad by these things that were mandated as part of the apparatus of loyalty and social control.
And you can imagine, say, in Iran, if the ruling clerics are getting AI advice and just visible evidence that some AIs systematically undermine the faith of people who use them, and that AIs directed to strengthen people’s faith really work, that could result relatively quickly in a collective decision for more of the latter, less of the former. And that gets applied also to the people making these decisions, and results in a sort of runaway ideological shift, in the same way that many of many groups became ideologically extreme in the first place: where there’s competitive signalling to be more loyal to the system, more loyal to the regime than others.
Rob Wiblin: Is there anything that we can do to try to make this kind of misuse less likely to occur?
Carl Shulman: So where a regime is already set up that would have a strong commitment to causing itself to have various delusions, there may be only so much you can do. But by developing the scientific and technical understanding of these kinds of dynamics and communicating that, you could at least help avoid the situations where the leadership of authoritarian regimes get high on their own supply, and wind up accidentally driving themselves into delusions that they might have wanted to avoid.
And at a broader level, to the extent these places are using AI models that are developed in much less oppressive locations, this can mean do not provide models that will engage in this sort of behaviour. Which may mean API access to the very powerful models: do not provide it to North Korea to provide propaganda to its population.
And then there’s a more challenging issue where very advanced open source models are then available to all the dictatorships and oppressive regimes. That’s an issue that recurs for many kinds of potential AI misuse, like bioterrorism and whatnot.

How democracies avoid coups

Carl Shulman: In general, the problem of how democracies avoid coups, avoid the overthrow of the liberal democratic system, tends to work through a setup where different factions expect that the outcomes will be better for them by continuing to follow along with the rules rather than going against them. And part of that is that, when your side loses an election, you expect not to be horribly mistreated on the next round. Part of it is cultivating principles of civilian control of the military, things like separating military leadership from ongoing politics.
Now, AI disrupts that, because you have this new technology that can suddenly replace a lot of the humans whose loyalties previously were helping to defend the system, who would choose not to go along with a coup that would overthrow democracy. So there it seems one needs to be embedding new controls, new analogues to civilian control of the military, into the AI systems themselves, and then having the ability to audit and verify that those rules are being complied with — that the AIs being produced are motivated such that they would not go along with any coup or overthrow the rules that were being set, and that setting and changing those rules required a broad buy-in from society.
So things like supermajority support. There are some institutions — for example, in the United States, the Federal Elections Commission — and in general, election supervisors have to have representation from both parties, because single-party referees for a two-party competitive election is not very solid. But this may mean passing more binding legislation, enabling very rapid judicial supervision and overview of violations of those rules may be necessary, because you need them to happen quite quickly, potentially. This also might be a situation where maybe you should be calling elections more often, when technological change is accelerating tenfold, a hundredfold, and maybe make some provisions for that.
That’s the kind of, unfortunately, human, social, political move that is large; it would require a lot of foresight and buy-in to it being necessary to make the changes. And then there’s just great inertia and resistance. So the difficulty of arranging human and legal and political institutions to manage these kinds of things is one reason why I think it’s worthwhile to put at least a bit of effort into paying attention to where we might be going. But at the same time, I think there are limits to what one can do, and we should just try to pursue every option we can to have the development of AI occur in a context where legal and political authority, and then enforcement mechanisms for that, reflect multiple political factions, multiple countries — and that reduces the risk to pluralism of one faction in one country suddenly dragging the world indefinitely in an unpleasant way.
Carl Shulman: In a world where there are thousands or millions of robots per human, to have a military and security forces that don’t depend on AI is pretty close to just disarmament and banning war. And I hope we do ban war and have general disarmament, but it could be quite difficult to avoid. And in avoiding it, just like the problem of banning nuclear weapons, if you’re going to restrict it, you have to set up a system such that any attempt to break that arrangement is itself stopped.
So I think we do have to think about how we would address the problem when security forces are largely automated, and therefore the protection of constitutional principles like democracy is really dependent on the loyalties of those machines.
Rob Wiblin: At some point, once most of the military power basically is just AI making decisions, having it saying that the way we’re going to keep it safe is that it will always follow human instructions, well, if all of the equipment is following the instructions of the same general, then that’s an extremely unstable situation. And in fact, you need to say no, we need them to follow principles that are not merely following instructions; we need them to reject instructions when those instructions are bad.
Carl Shulman: Indeed. And human soldiers are obligated to reject illegal orders, although it can be harder to implement in practice sometimes than to specify that as a goal. And yes, to the extent that you automate all of these key functions, including the function of safeguarding a democratic constitution, then you need to incorporate that same capacity to reject illegal orders, and even to prevent an illegal attempt to interfere with the processes by which you reject illegal orders. It’s no good if the AIs will refuse an order to, say, overthrow democracy or kill the population, but they will not defend themselves from just being reprogrammed by an illegal attempt.
So that poses deep challenges and is the reason why you want, A, problems of AI alignment and honest AI advice to be solved, and secondly, to have institutional procedures whereby the motives being put into those AIs reflect a broad, pluralistic set of values and all the different interests and factions that need to be represented.

Building trust between adversaries about which models you can believe

Carl Shulman: Right now this is a difficult problem — and you can see that with respect to large software products. So if Windows has backdoors, say, to enable the CIA to route machines running it, Russia or China cannot just purchase off-the-shelf software and have their cybersecurity agencies go through it and find every single zero-day exploit and bug. That’s just quite beyond their capabilities. They can look, and if they find even one, then say, “Now we’re no longer going to trust commercial software that is coming from country X,” they can do that, but they can’t reliably find every single exploit that exists within a large piece of software.
And there’s some evidence that may be true with these AIs. For one thing, there will be software programs running the neural network and providing the scaffolding for AI agents or networks of AI agents and their tools, which can have backdoors in the ordinary way. There are issues with adversarial examples, data poisoning and passwords. So a model can be trained to behave normally, classify images accurately, or produce text normally under most circumstances, but then in response to some special stimulus that would never be produced spontaneously, it will then behave in some quite different way, such as turning against a user who had purchased a copy of it or had been given some access.
So that’s a problem. And developing technical methods that either are able to locate that kind of data poisoning or conditional disposition, or are able to somehow moot it — for example, by making it so that if there are any of these habits or dispositions, they will wind up unable to actually control the behaviour of the AI, and you give it some additional training that restricts how it would react to such impulses. Maybe you have some majority voting system. You could imagine any number of techniques, but right now, I think technically you have a very difficult time being sure that an AI provided by some other company or some other country genuinely had the loyalties that were being claimed — and especially that it wouldn’t, in response to some special code or stimulus, suddenly switch its behaviour or switch its loyalties.
So that is an area where I would very much encourage technical research. Governments that want to have the ability to manage that sort of thing, which they have very strong reasons to do, should want to invest in it. Because if government contractors are producing AIs that are going to be a foundation not just of the public epistemology and political things, but also of industry, security, and military applications, the US military should be pretty wary of a situation where, for all they know, one of their contractors supplying AI systems can give a certain code word, and the US military no longer works for the US military. It works for Google or Microsoft or whatnot. That’s just a situation that is just not very appealing. It’s not one that would arise for a Boeing.
Even if there were a sort of sabotage or backdoor placed in some systems, the potential rewards or uses of that would be less. But if you’re deploying these powerful AI systems at scale, they’re having an enormous amount of influence and power in society — eventually to the point where ultimately the instruments of state hinge on their loyalties — then you really don’t want to have this kind of backdoor or password, because it could actually overthrow the government, potentially. So this is a capability that governments should very much want, almost regardless, and this is a particular application where they should really want it.
But it also would be important for being sure that AI systems deployed at scale by a big government, A, will not betray that government on behalf of the companies that produce them; will not betray the constitutional or legal order of that state on behalf of, say, the executive officials who are nominally in charge of those: you don’t want to have AI enabling a coup that overthrows democracy on behalf of a president against a congress. Or, if you have AI that is developed under international auspices, so it’s supposed to reflect some agreement between multiple states that are all contributing to the endeavour or have joined in the treaty arrangement, you want to be sure that AIs will respect the terms of behaviour that were specified by the multinational agreement and not betray the larger project on behalf of any member state or participating organisation.
So this is a technology that we really should want systematically, just because empowering AIs this much, we want to be able to know their loyalties, and not have it be dependent on no one having inserted an effective backdoor anywhere along a chain of production.

Opportunities for listeners

Carl Shulman: There is huge social value potentially to be provided by predicting the political consequences and economic consequences of different policies. So when we talked earlier about the application to COVID, if politicians were continuously getting smart feedback about how this will affect the public’s happiness two years later, four years later, six years later, and their political response to the politician, that could really shift discourse.
But it’s not the kind of thing that’s likely to result in an enormous amount of financing, unless you might have some government programme to fight misinformation that attempts to create models, or fine-tune open source models, or contract large AI companies to produce AI that appears trustworthy on all of the easy examinations and probes and tests one can make for bias. And it might be that different political actors in government could demand that sort of thing as a criterion for AI being deployed in government, and that could be potentially significant.
Rob Wiblin: Yeah. Are there any other opportunities for listeners potentially to cause this epistemic revolution to happen sooner or better that are worth shouting out?
Carl Shulman: Yeah. Some small academic research effort or the like is going to have difficulty comparing to the resources that these giant AI companies can mobilise. But one enormous advantage they have is independence. So watchdog agencies or organisations that systematically probe the major corporate AI models for honesty, dishonesty, bias of various kinds — and attempt also to fine-tune and scaffold those models to do better on metrics of honesty of various kinds — those could be really helpful, and provide incentives for these large companies to produce models that both do very well on any probe of honesty that one can muster from the outside, and secondly, do so in a way that is relatively robust or transparent to these outside auditors.
But right now this is something that is, I think, not being evaluated in a good systematic way, and there’s a lot of room for developing metrics.