[Link] How understanding valence could help make future AIs safer

A blog post by Mike Johnson, Director of the Qualia Research Institute: https://opentheory.net/2015/09/fai_and_valence/ (a)

Excerpt:

What makes some patterns of consciousness feel better than others? I.e. can we crisply reverse-engineer what makes certain areas of mind-space pleasant, and other areas unpleasant?

If we make a smarter-than-human Artificial Intelligence, how do we make sure it has a positive impact?… The following outlines some possible ways that progress on the first question could help us with the second question.
...
1. Valence research could simplify the Value Problem and the Value Loading Problem.* If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI**...
2. Valence research could form the basis for a well-defined ‘sanity check’ on AGI behavior. Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness. If there would be a lot less of that pattern afterwards, the intervention is probably a bad idea (a minimal sketch of such a check appears after this list)...
3. Valence research could help us be humane to AGIs and WBEs*. There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering. Unfortunately, many of these early systems won’t work well; i.e., they’ll be insane. It would be great if we had a good way to detect profound suffering in such cases and halt the system...
4. Valence research could help us prevent Mind Crimes. Nick Bostrom suggests in Superintelligence that AGIs might simulate virtual humans to reverse-engineer human preferences, but that these virtual humans might be sufficiently high-fidelity that they themselves could meaningfully suffer. We can tell AGIs not to do this, but knowing the exact information-theoretic pattern of suffering would make it easier to specify what not to do.
5. Valence research could enable radical forms of cognitive enhancement. Nick Bostrom has argued that there are hard limits on traditional pharmaceutical cognitive enhancement, since if the presence of some simple chemical would help us think better, our brains would probably already be producing it. On the other hand, there seem to be fewer a priori limits on motivational or emotional enhancement. And sure enough, the most effective “cognitive enhancers” such as Adderall, modafinil, and so on seem to work by making cognitive tasks seem less unpleasant or more interesting. If we had a crisp theory of valence, this might enable particularly powerful versions of these sorts of drugs.
6. Valence research could help align an AGI’s nominal utility function with visceral happiness. There seems to be a lot of confusion with regard to happiness and utility functions. In short: they are different things! Utility functions are goal abstractions, generally realized either explicitly through high-level state variables or implicitly through dynamic principles. Happiness, on the other hand, seems like an emergent, systemic property of conscious states, and like other qualia but unlike utility functions, it’s probably highly dependent upon low-level architectural and implementational details and dynamics...
7. Valence research could help us construct makeshift utility functions for WBEs and Neuromorphic* AGIs...
8. Valence research could help us better understand, and perhaps prevent, AGI wireheading. How can AGI researchers prevent their AGIs from wireheading (direct manipulation of their utility functions)? I don’t have a clear answer, and it seems like a complex problem which will require complex, architecture-dependent solutions, but understanding the universe’s algorithm for pleasure might help clarify what kind of problem it is, and how evolution has addressed it in humans...
9. Valence research could help reduce general metaphysical confusion. We’re going to be facing some very weird questions about philosophy of mind and metaphysics when building AGIs, and everybody seems to have their own pet assumptions on how things work. The better we can clear up the fog which surrounds some of these topics, the lower our coordinational friction will be when we have to directly address them...
10. Valence research could change the social and political landscape AGI research occurs in. This could take many forms: at best, a breakthrough could lead to a happier society where many previously nihilistic individuals suddenly have “skin in the game” with respect to existential risk. At worst, it could be a profound information hazard, and irresponsible disclosure or misuse of such research could lead to mass wireheading, mass emotional manipulation, and totalitarianism. Either way, it would be an important topic to keep abreast of.
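
One way to picture the ‘sanity check’ from point 2 is as a veto gate on proposed interventions. The following is a minimal, hypothetical Python sketch, assuming a valence model that can estimate how much of the pleasure/happiness pattern the world would contain before and after an intervention; the names `estimate_expected_valence` and `world_model.simulate` are placeholders for capabilities that do not yet exist.

```python
# Purely illustrative sketch of the 'sanity check' heuristic in point 2.
# Everything here is hypothetical: `estimate_expected_valence` stands in for
# a valence model we do not yet have, and `world_model.simulate(...)` stands
# in for the AGI's ability to predict the outcome of an intervention.

def valence_sanity_check(intervention, world_model, estimate_expected_valence,
                         max_acceptable_drop=0.2):
    """Return True if the intervention passes the valence sanity check,
    i.e. it is not predicted to destroy a large share of whatever pattern
    corresponds to pleasure/happiness."""
    baseline = estimate_expected_valence(world_model)
    predicted = estimate_expected_valence(world_model.simulate(intervention))

    # If there would be a lot less of that pattern afterwards, the
    # intervention is probably a bad idea, so veto it.
    relative_drop = (baseline - predicted) / max(baseline, 1e-9)
    return relative_drop <= max_acceptable_drop
```

As a heuristic rather than a terminal value, a check like this would only flag large predicted losses of the pleasure/happiness pattern; it would not by itself tell the AGI what to maximize.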