Persuasion Tools: AI takeover without AGI or agency?

[epistemic status: speculation]
Crossposted from LessWrong

I’m envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won’t be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

--Wei Dai

What if most people already live in that world? A world in which taking arguments at face value is not a capacity-enhancing tool, but a security vulnerability? Without trusted filters, would they not dismiss highfalutin arguments out of hand, and focus on whether the person making the argument seems friendly, or unfriendly, using hard-to-fake group-affiliation signals?


1. AI-powered memetic warfare makes all humans effectively insane.

--Wei Dai, listing nonstandard AI doom scenarios

This post speculates about persuasion tools—how likely they are to get better in the future relative to countermeasures, what the effects of this might be, and what implications there are for what we should do now.

To avert eye-rolls, let me say up front that I don’t think the world is likely to be driven insane by AI-powered memetic warfare. I think progress in persuasion tools will probably be gradual and slow, and defenses will improve too, resulting in an overall shift in the balance that isn’t huge: a deterioration of collective epistemology, but not a massive one. However, (a) I haven’t yet ruled out more extreme scenarios, especially during a slow takeoff, and (b) even small, gradual deteriorations are important to know about. Such a deterioration would make it harder for society to notice and solve AI safety and governance problems, because it would be worse at noticing and solving problems in general. Such a deterioration could also be a risk factor for world war three, revolutions, sectarian conflict, terrorism, and the like. Moreover, such a deterioration could happen locally, in our community or in the communities we are trying to influence, and that would be almost as bad. Since the date of AI takeover is not the day the AI takes over, but the point at which it’s too late to reduce AI risk, these things basically shorten timelines.

Six examples of persuasion tools

Analyzers: Political campaigns and advertisers already use focus groups, A/B testing, demographic data analysis, etc. to craft and target their propaganda. Imagine a world where this sort of analysis gets better and better, and is used to guide the creation and dissemination of many more types of content.
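To make the "Analyzers" category concrete, here is a minimal sketch of the kind of comparison such a tool automates: a standard two-proportion z-test on how well two message variants convert. All names and numbers here are hypothetical illustrations, not from the post.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z-statistic comparing two conversion rates (pooled two-proportion test)."""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (successes_b / n_b - successes_a / n_a) / se

# Hypothetical campaign data: variant B persuades 12% of viewers vs 10% for A.
z = two_proportion_z(100, 1000, 120, 1000)
print(f"z = {z:.2f}")  # |z| > 1.96 would indicate a real difference at p < 0.05
```

A real analyzer would run thousands of such comparisons continuously, across demographic slices, which is exactly what makes the iteration loop so much faster than old-fashioned focus groups.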

Feeders: Most humans already get their news from various “feeds” of daily information, controlled by recommendation algorithms. Even worse, people’s ability to seek out new information and find answers to questions is also to some extent controlled by recommendation algorithms: Google Search, for example. There’s a lot of talk these days about fake news and conspiracy theories, but I’m pretty sure that selective/biased reporting is a much bigger problem.

Chatbot: Thanks to recent advancements in language modeling (e.g. GPT-3), chatbots might become actually good. It’s easy to imagine chatbots with millions of daily users, continually optimized to maximize user engagement—see e.g. Xiaoice. The systems could then be retrained to persuade people of things, e.g. that certain conspiracy theories are false, that certain governments are good, that certain ideologies are true. Perhaps no one would do this, but I’m not optimistic.

Coach: A cross between a chatbot, a feeder, and an analyzer. It doesn’t talk to the target on its own, but you give it access to the conversation history and everything you know about the target, and it coaches you on how to persuade them of whatever it is you want to persuade them of.

Drugs: There are rumors of drugs that make people more suggestible, like scopolamine. Even if these rumors are false, it’s not hard to imagine new drugs being invented that have a similar effect, at least to some extent. (Alcohol, for example, seems to lower inhibitions. Other drugs make people more creative, etc.) Perhaps these drugs by themselves would not be enough, but they might work in combination with a Coach or Chatbot. (You meet the target for dinner and slip some drug into their drink. It is mild enough that they don’t notice anything, but it primes them to be more susceptible to the ask you’ve been coached to make.)

Imperius Curse: These are a kind of adversarial example that gets the target to agree to an ask (or even switch sides in a conflict!), or adopt a belief (or even an entire ideology!). Presumably they wouldn’t work against humans, but they might work against AIs, especially if meme theory applies to AIs as it does to humans. The reason this would work better against AIs than against humans is that you can steal a copy of the AI and then use massive amounts of compute to experiment on it, finding exactly the sequence of inputs that maximizes the probability that it’ll do what you want.

We might get powerful persuasion tools prior to AGI

The first thing to point out is that many of these kinds of persuasion tools already exist in some form or another. And they’ve been getting better over the years, as technology advances. Defenses against them have been getting better too. It’s unclear whether the balance has shifted to favor these tools, or their defenses, over time. However, I think we have reason to think that the balance may shift heavily in favor of persuasion tools, prior to the advent of other kinds of transformative AI. The main reason is that progress in persuasion tools is connected to progress in Big Data and AI, and we are currently living through a period of rapid progress in those things; that progress will probably continue to be rapid (and possibly accelerate) prior to AGI.

Beyond that general trend, here are some more specific reasons to think persuasion tools may become relatively more powerful:

Substantial prior: Shifts in the balance between things happen all the time. For example, the balance between weapons and armor has oscillated at least a few times over the centuries. Arguably persuasion tools got relatively more powerful with the invention of the printing press, and again with radio, and now again with the internet and Big Data. Some have suggested that the printing press helped cause religious wars in Europe, and that radio assisted the violent totalitarian ideologies of the early twentieth century.

Consistent with recent evidence: A shift in this direction is consistent with the societal changes we’ve seen in recent years. The internet has brought with it many inventions that improve collective epistemology, e.g. Google Search, Wikipedia, the ability of communities to create forums… Yet on balance it seems to me that collective epistemology has deteriorated in the last decade or so.

Lots of room for growth: I’d guess that there is lots of “room for growth” in persuasive ability. There are many kinds of persuasion strategy that are tricky to use successfully. Like a complex engine design compared to a simple one, these strategies might work well, but only if you have enough data and time to refine them and find the specific version that works at all, on your specific target. Humans never have that data and time, but AI+Big Data does, since it has access to millions of conversations with similar targets. Persuasion tools will be able to say things like “In 90% of cases where targets in this specific demographic are prompted to consider and then reject the simulation argument, and then challenged to justify their prejudice against machine consciousness, the target gets flustered and confused. Then, if we make empathetic noises and change the subject again, 50% of the time the subject subconsciously changes their mind so that when next week we present our argument for machine rights they go along with it, compared to 10% baseline probability.”
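The arithmetic implicit in that imagined report can be made explicit. This is a toy calculation: the 90%, 50%, and 10% figures come straight from the hypothetical quote, not from any real system.

```python
# Toy calculation for the hypothetical persuasion-tool report quoted above.
# All probabilities are from the hypothetical example, not real data.

p_flustered = 0.90                   # target gets flustered after the two-step prompt
p_persuaded_given_flustered = 0.50   # subconscious shift, given flustered
p_baseline = 0.10                    # chance of accepting the argument cold

# Probability the full strategy lands (ignoring any persuasion of
# non-flustered targets, so this is a lower bound):
p_strategy = p_flustered * p_persuaded_given_flustered
lift = p_strategy / p_baseline

print(f"strategy success: {p_strategy:.0%} vs {p_baseline:.0%} baseline ({lift:.1f}x lift)")
```

Even this crude compounding of two conditional steps yields a several-fold lift over baseline, which is the sense in which fine-grained data on "similar targets" is valuable.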

Plausibly pre-AGI: Persuasion is not an AGI-complete problem. Most of the types of persuasion tools mentioned above already exist, in weak form, and there’s no reason to think they can’t gradually get better well before AGI. So even if they won’t improve much in the near future, plausibly they’ll improve a lot by the time things get really intense.

Language modeling progress: Persuasion tools seem to be especially benefited by progress in language modeling, and language modeling seems to be making even more progress than the rest of AI these days.

More things can be measured: Thanks to said progress, we now have the ability to cheaply measure nuanced things like user ideology, enabling us to train systems towards those objectives.

Chatbots & Coaches: Thanks to said progress, we might see some halfway-decent chatbots prior to AGI. Thus an entire category of persuasion tool that hasn’t existed before might come to exist in the future. Chatbots too stupid to make good conversation partners might still make good coaches, by helping the user predict the target’s reactions and suggesting possible things to say.

Minor improvements still important: Persuasion doesn’t have to be perfect to radically change the world. An analyzer that helps your memes have a 10% higher replication rate is a big deal; a coach that makes your asks 30% more likely to succeed is a big deal.
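Why a mere 10% edge in replication rate is a big deal: per-generation advantages compound exponentially, like interest. The sketch below uses illustrative numbers only (the 10% figure is from the paragraph above; the generation count is my own choice).

```python
# A meme that replicates 10% faster per generation than a rival,
# starting from equal prevalence. Illustrative numbers only.

advantage = 1.10   # 10% higher replication rate per generation
generations = 20   # hypothetical number of replication cycles

relative_prevalence = advantage ** generations
print(f"after {generations} generations: {relative_prevalence:.1f}x more prevalent")
```

After twenty replication cycles the advantaged meme is several times more prevalent than its rival, despite never winning any single round by more than 10%.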

Faster feedback: One way defenses against persuasion tools have strengthened is that people have grown wise to them. However, the sorts of persuasion tools I’m talking about seem to have significantly faster feedback loops than the propagandists of old; they can learn constantly, from the entire population, whereas past propagandists (if they were learning at all, as opposed to evolving) relied on noisier, more delayed signals.

Overhang: Finding persuasion drugs is costly, immoral, and not guaranteed to succeed. Perhaps this explains why it hasn’t been attempted outside a few cases like MKULTRA. But as technology advances, the cost goes down and the probability of success goes up, making it more likely that someone will attempt it, and giving them an “overhang” with which to achieve rapid progress if they do. (I hear that there are now multiple startups built around using AI for drug discovery, by the way.) A similar argument might hold for persuasion tools more generally: we might be in a “persuasion tool overhang” in which they have not been developed for ethical and riskiness reasons, but at some point the price and riskiness drop low enough that someone does it, and then that triggers a cascade of more and richer people building better and better versions.

Speculation about effects of powerful persuasion tools

Here are some hasty speculations, beginning with the most important one:

Ideologies & the biosphere analogy:

The world is, and has been for centuries, a memetic warzone. The main factions in the war are ideologies, broadly construed. It seems likely to me that some of these ideologies will use persuasion tools—both on their hosts, to fortify them against rival ideologies, and on others, to spread the ideology.

Consider the memetic ecosystem—all the memes replicating and evolving across the planet. Like the biological ecosystem, some memes are adapted to, and confined to, particular niches, while other memes are widespread. Some memes are in the process of gradually going extinct, while others are expanding their territory. Many exist in some sort of equilibrium, at least for now, until the climate changes. What will be the effect of persuasion tools on the memetic ecosystem?

For ideologies at least, the effects seem straightforward: the ideologies will become stronger, harder to eradicate from hosts and better at spreading to new hosts. If all ideologies got access to equally powerful persuasion tools, perhaps the overall balance of power across the ecosystem would not change, but realistically the tools will be unevenly distributed. The likely result is a rapid transition to a world with fewer, more powerful ideologies. They might be more internally unified as well, having fewer spin-offs and schisms due to the centralized control and standardization imposed by the persuasion tools. An additional force pushing in this direction is that ideologies that are bigger are likely to have more money and data with which to make better persuasion tools, and the tools themselves will get better the more they are used.

Recall the quotes I led with:

… At that point people won’t be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

--Wei Dai

What if most people already live in that world? A world in which taking arguments at face value is not a capacity-enhancing tool, but a security vulnerability? Without trusted filters, would they not dismiss highfalutin arguments out of hand … ?


1. AI-powered memetic warfare makes all humans effectively insane.

--Wei Dai, listing nonstandard AI doom scenarios

I think the case can be made that we already live in this world to some extent, and have for millennia. But if persuasion tools get better relative to countermeasures, the world will be more like this.

This seems to me to be an existential risk factor. It’s also a risk factor for lots of other things, for that matter. Ideological strife can get pretty nasty (e.g. religious wars, gulags, genocides, totalitarianism), and even when it doesn’t, it still often gums things up (e.g. suppression of science, zero-sum mentality preventing win-win solutions, virtue-signalling death spirals, refusal to compromise). This is bad enough already, but it’s doubly bad when it comes at a moment in history when big new collective action problems need to be recognized and solved.

Obvious uses: Advertising, scams, propaganda by authoritarian regimes, etc. will improve. This means more money and power for those who control the persuasion tools. Another important implication may be that democracies would be at a major disadvantage on the world stage compared to totalitarian autocracies. One of many reasons for this is that scissor statements and other divisiveness-sowing tactics may not technically count as persuasion tools, but they would probably get more powerful in tandem.

Will the truth rise to the top: Optimistically, one might hope that widespread use of more powerful persuasion tools will be a good thing, because it might create an environment in which the truth “rises to the top” more easily. For example, if every side of a debate has access to powerful argument-making software, maybe the side that wins is more likely to be the side that’s actually correct. I think this is a possibility, but I do not think it is probable. After all, it doesn’t seem to be what’s happened in the last two decades or so of widespread internet use, big data, AI, etc. Perhaps, however, we can make it true for some domains at least, by setting the rules of the debate.

Data hoarding: A community’s data (chat logs, email threads, demographics, etc.) may become even more valuable. It can be used by the community to optimize their inward-targeted persuasion, improving group loyalty and cohesion. It can be used against the community if someone else gets access to it. This goes for individuals as well as communities.

Chatbot social hacking viruses: Social hacking is surprisingly effective. The classic example is calling someone pretending to be someone else and getting them to do something or reveal sensitive information. Phishing is like this, only much cheaper (because automated) and much less effective. I can imagine a virus that is close to as good as a real human at social hacking while being much cheaper and able to scale rapidly and indefinitely as it acquires more compute and data. In fact, a virus like this could be made with GPT-3 right now, using prompt programming and “mothership” servers to run the model. (The prompts would evolve to match the local environment being hacked.) Whether GPT-3 is smart enough for it to be effective remains to be seen.


I doubt that persuasion tools will improve discontinuously, and I doubt that they’ll improve massively. But minor and gradual improvements matter too.

Of course, influence over the future might not disappear all on one day; maybe there’ll be a gradual loss of control over several years. For that matter, maybe this gradual loss of control began years ago and continues now...

--Me, from a previous post

I think this is potentially (5% credence) the new Cause X, more important than (traditional) AI alignment even. It probably isn’t. But I think someone should look into it at least, more thoroughly than I have.

To be clear, I don’t think it’s likely that we can do much to prevent this stuff from happening. There are already lots of people raising the alarm about filter bubbles, recommendation algorithms, etc., so maybe it’s not super neglected and maybe our influence over it is small. However, at the very least, it’s important for us to know how likely it is to happen, and when, because it helps us prepare. For example, if we think that collective epistemology will have deteriorated significantly by the time crazy AI stuff starts happening, that influences what sorts of AI policy strategies we pursue.

Note that if you disagree with me about the extreme importance of AI alignment, or if you think AI timelines are longer than mine, or if you think fast takeoff is less likely than I do, you should, all else equal, be more enthusiastic about investigating persuasion tools than I am.

Thanks to Katja Grace, Emery Cooper, Richard Ngo, and Ben Goldhaber for feedback on a draft.

Related previous work:

Epistemic Security report

Aligning Recommender Systems

Stuff I’d read if I were investigating this in more depth:

Not Born Yesterday

The stuff here and here