Take care with notation for uncertain quantities


BLUF: 3 000 and 3 × 10^3 are equivalent; so too a probability of 10% and odds of 9 to 1 against; and so too 0.002 and 1/500. Despite their objective equivalence, they can look very different to our mind’s eye. Different notation implies particular scales or distributions are canonical, and so can frame our sense of uncertainty about them. As the wrong notation can bewitch us, it should be chosen with care.

Introduction

I presume it is common knowledge that our minds are woeful with numbers in general and any particular use of them in particular.

I highlight one example of clinical risk communication, where I was drilled in medical school to present risks in terms of absolute risks or numbers needed to treat. Communication of relative risks was strongly discouraged, as:

Risk factor X increases your risk of condition Y by 50%

Can be misinterpreted as:

With risk factor X, your risk of condition Y is 50% (+ my baseline risk)

This is mostly base-rate neglect: if my baseline risk was 1 in a million, increasing this to 1.5 in a million (i.e. an absolute risk increase of 1 in 2 million, and a number needed to treat of 2 million) probably should not worry me.
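
A minimal sketch of this arithmetic in code (the numbers are the illustrative ones above, not clinical figures): the absolute risk increase is the baseline risk scaled by the relative increase, and the ‘number needed’ figure is its reciprocal.

```python
# Illustrative only: the same relative risk, restated in absolute terms.

def absolute_risk_increase(baseline_risk: float, relative_risk: float) -> float:
    """Absolute increase in risk when the baseline is scaled by a relative risk."""
    return baseline_risk * (relative_risk - 1)

def number_needed(baseline_risk: float, relative_risk: float) -> float:
    """Reciprocal of the absolute risk increase (the 'number needed to treat' above)."""
    return 1 / absolute_risk_increase(baseline_risk, relative_risk)

baseline = 1e-6   # 1 in a million
rr = 1.5          # "increases your risk by 50%"

print(absolute_risk_increase(baseline, rr))  # ~5e-07, i.e. 1 in 2 million
print(number_needed(baseline, rr))           # ~2,000,000
```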

But it is also partly an issue of notation: if risk factor X increased my risk of Y by 100% or more, people would be less likely to make the mistake above. Likewise if one uses a different phrasing than ‘increases risk by Z%’ (e.g. “multiplies your risk by 1.5”).

I think this general issue applies elsewhere.

Implicit scaling and innocent overconfidence

A common exercise to demonstrate overconfidence is to ask participants to write down 90% credible intervals for a number of ‘fermi-style’ estimation questions (e.g. “What is the area of the Atlantic Ocean?”, “How many books are owned by the British Library?”). Almost everyone finds the true value lies outside their 90% interval much more often than 10% of the time.

I do not think poor calibration here is solely attributable to colloquial-sense overconfidence (i.e. an epistemic vice of being more sure of yourself than you should be). I think a lot of it is a more innocent ineptitude at articulating their uncertainty numerically.

I think most people will think about an absolute value for (e.g.) how many books the British Library owns (“A million?”). Although they recognise they have basically no idea, life in general has taught them variance (subjective or objective) around absolute values will be roughly symmetrical, and not larger than the value itself (“they can’t have negative books, and if I think I could be overestimating by 500k, I should think I could be underestimating by about the same”). So they fail to translate (~appropriate) subjective uncertainty into corresponding bounds: 1 000 000 ± (some number less than 1 million).

I think they would fare better if they put their initial guess in scientific notation: 1 × 10^6. This emphasises the leading digit and exponent, and naturally prompts one to think in orders of magnitude (e.g. “could it be a thousand? A billion? 100 million?”), and that one could be out by a multiple or order of magnitude rather than a particular value (compare to the previous approach: it is much less natural to think you should be unsure about how many digits one should write down). An interval like 10^(6 ± 2), i.e. ten thousand to a hundred million, would include the true value of 25 million.
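
A small sketch of the contrast (the ±900k additive interval is a made-up stand-in for the ‘symmetric, not larger than the value itself’ bounds described above):

```python
# Sketch: an additive interval vs an order-of-magnitude interval around a guess of 1e6,
# checked against the post's figure of 25 million books.

guess = 1_000_000
true_value = 25_000_000

# The 'symmetric, not larger than the value itself' interval most people reach for:
additive_interval = (guess - 900_000, guess + 900_000)

# The same guess with uncertainty on the exponent: 10^(6 +/- 2), i.e. 1e4 to 1e8.
order_of_magnitude_interval = (10**4, 10**8)

def contains(interval, x):
    lo, hi = interval
    return lo <= x <= hi

print(contains(additive_interval, true_value))            # False
print(contains(order_of_magnitude_interval, true_value))  # True
```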

Odds are better than probabilities for small chances

Although there are some reasons (especially for Bayesians) to prefer odds to probability in general, reasons parallel to the above recommend them for weighing up small chances, especially sub-percentile chances. I suspect two pitfalls:

  1. Translating appropriate levels of uncertainty is easier: it is more natural to contemplate adding zeros to the denominator of 1/x than adding zeros after the decimal point of 0.x.

  2. Percentages are not a very useful notation for sub-percentile figures (e.g. 0.001% might make one wonder whether the ‘%’ is a typo). Using percentages for probabilities ‘by reflex’ risks anchoring one to the [1%, 99%] range (“well, it’s not impossible, so it can’t be 0%, but it is very unlikely...”).

Also similar to the above, although people especially suck with rare events, I think they may suck less if they develop the habit of at least cross-checking with odds. As a modest claim, I think people can at least discriminate 1 in 1,000 versus 1 in 500,000 better than 0.1% versus 0.0002%. There is expression in the ratio scale of odds that the absolute scale of probability compresses out.
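
As a sketch of that cross-check (for small probabilities, ‘1 in N’ and odds of roughly N to 1 against are near enough the same thing):

```python
# Sketch: restating small probabilities as '1 in N' figures and odds.

def as_one_in_n(p: float) -> float:
    """Express a probability p as '1 in N' (N = 1/p)."""
    return 1 / p

def as_odds_against(p: float) -> float:
    """Odds against the event: (1 - p) / p. For small p this is close to 1/p."""
    return (1 - p) / p

for p in (0.001, 0.000002):  # 0.1% and 0.0002%
    print(f"{p:.4%}  ~  1 in {as_one_in_n(p):,.0f}  (odds against ~{as_odds_against(p):,.0f} : 1)")
# 0.1000%  ~  1 in 1,000  (odds against ~999 : 1)
# 0.0002%  ~  1 in 500,000  (odds against ~499,999 : 1)
```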

The ‘have you tried fractions?’ solution to inter-theoretic value comparison

Consider this example from an Open Philanthropy report [h/t Ben Pace]:

The “animal-inclusive” vs. “human-centric” divide could be interpreted as being about a form of “normative uncertainty”: uncertainty between two different views of morality. It’s not entirely clear how to create a single “common metric” for adjudicating between two views. Consider:

Comparison method A: say that “a human life improved” is the main metric valued by the human-centric worldview, and that “a chicken life improved” is worth >1% of these (animal-inclusive view) or 0 of these (human-centric view). In this case, a >10% probability on the animal-inclusive view would lead chickens to be valued >0.1% as much as humans, which would likely imply a great deal of resources devoted to animal welfare relative to near-term human-focused causes.

Comparison method B: say that “a chicken life improved” is the main metric valued by the animal-inclusive worldview, and that “a human life improved” is worth <100 of these (animal-inclusive view) or an astronomical number of these (human-centric view). In this case, a >10% probability on the human-centric view would be effectively similar to a 100% probability on the human-centric view.

These methods have essentially opposite practical implications.

There is a typical two-envelopes style problem here (q.v. Tomasik): if you take the ‘human value’ or ‘chicken value’ envelope first, method A and method B (respectively) recommend switching despite saying roughly the same thing. But something else is also going on. Although method A and method B have similar verbal descriptions, their mathematical sketches imply very different framings of the problem.

Consider the trade ratio of chicken value : human value. Method A uses percentages, which implies we are working in absolute values, and something like the real number line should be the x-axis for our distribution. Using this as the x-axis also suggests our uncertainty should be distributed non-crazily across it: the probability density is not going up or down by multiples of zillions as we traverse it, and a value like 0.12 is more salient than one vanishingly close to zero. It could look something like this:

One possible implicit distribution of method A

By contrast, method B uses a ratio, furthermore a potentially astronomical ratio. It suggests the x-axis has to be on something like a log scale (roughly unbounded below, although it does rule out zero or negative values: it is an astronomical ratio, not an infinite one). This axis, alongside the intuition the distribution should not look crazy, suggests our uncertainty should be smeared over orders of magnitude. An order of magnitude like 10^-6 is salient, whilst an arbitrary point on the number line is not. It could look something like:

One possible implicit distribution for method B

Yet these intuitive responses do look very crazy if plotted against one another on a shared axis. On method A’s implied scale, method B naturally leads us to concentrate a lot of probability density into tiny intervals near zero: e.g. much higher density at the pixel next to the origin than across ranges like [0.2, 0.3]. On method B’s implied scale, method A places infinite density infinitely far along the axis, virtually no probability mass on most of the range, and a lot of mass towards the top (e.g. much more between 0.1 and 1 than between 0.01 and 0.1).
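
A rough way to see the clash numerically (the two priors below are hypothetical stand-ins I have picked, not anything from the report): give method A a prior spread evenly over a linear range, give method B a prior spread evenly over orders of magnitude, and ask how much probability each puts on the same intervals of the ratio.

```python
# Sketch with made-up priors: a linear-scale prior (method A) and a log-scale prior
# (method B) assign wildly different probabilities to the same ranges of the ratio.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Method A: uncertainty spread 'non-crazily' over a linear axis, e.g. uniform on [0, 0.5].
ratio_a = rng.uniform(0.0, 0.5, n)

# Method B: uncertainty smeared over orders of magnitude, e.g. the exponent uniform
# on [-12, 0], so the ratio is 10**exponent.
ratio_b = 10.0 ** rng.uniform(-12.0, 0.0, n)

def prob_in(samples, lo, hi):
    """Fraction of samples falling in [lo, hi]."""
    return float(np.mean((samples >= lo) & (samples <= hi)))

for lo, hi in [(0.2, 0.3), (1e-9, 1e-6)]:
    print(f"[{lo}, {hi}]  method A: {prob_in(ratio_a, lo, hi):.3f}   "
          f"method B: {prob_in(ratio_b, lo, hi):.3f}")
# Method A puts ~0.20 on [0.2, 0.3] and ~0 on [1e-9, 1e-6];
# method B puts ~0.015 on [0.2, 0.3] and ~0.25 on [1e-9, 1e-6].
```

Whichever particular numbers one picks, the qualitative clash is the point: each prior looks ‘non-crazy’ on its own axis and crazy on the other’s.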


Which one is better? I think most will find method A’s framing leads them to more intuitive results on reflection. Contra method B, we think there is some chance chickens have zero value (so humans count ‘infinitely more’ as a ratio), and we think there’s essentially no chance of them being between (say) a billionth and a trillionth of a human, where they would ‘only’ count astronomically more. In this case, method B’s framing of an unbounded ratio both misleads our intuitive assignment, and its facially similar description to method A conceals the stark differences between them.

Conclusions

It would be good to have principles to guide us, beyond collecting particular examples. Perhaps:

  1. One can get clues for which notation to use from one’s impression of the distribution (or generator). If the quantity results from multiplying (rather than adding) a lot of things together, you’re probably working on a log scale (see the sketch after this list).

  2. If one is not sure, one can compare and contrast by mentally converting into equivalent notations. One can then cross-check these against one’s initial intuition to see which is a better fit.
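
A quick sketch of the first principle (the uniform factors are an arbitrary choice of mine): sum ten uncertain inputs and the spread stays modest and roughly symmetric; multiply the same inputs and the spread covers a large multiple, which reads more naturally as 10^(a ± b) than as x ± y.

```python
# Sketch with arbitrary inputs: sums of uncertain factors stay on an additive scale;
# products of the same factors spread out multiplicatively.
import numpy as np

rng = np.random.default_rng(0)
factors = rng.uniform(0.5, 2.0, size=(100_000, 10))  # 10 uncertain inputs per estimate

sums = factors.sum(axis=1)
products = factors.prod(axis=1)

for name, x in [("sum", sums), ("product", products)]:
    lo, hi = np.quantile(x, [0.05, 0.95])
    print(f"{name:8s} 90% interval: {lo:10.2f} to {hi:10.2f}   (upper/lower ratio {hi / lo:.1f}x)")
# The sum's interval spans a ratio of roughly 1.5x; the product's spans roughly 50x,
# getting on for two orders of magnitude.
```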

Perhaps all this is belabouring the obvious, as it arises as pedestrian corollaries of other ideas in and around EA-land. We often talk about heavy-tailed distributions, managing uncertainty, the risk of inadvertently misleading with information, and so on. Yet despite being acquainted with all this since medical school, developing these particular habits of mind only struck me in the late-middle-age of my youth. By the principle of mediocrity (despite being exceptionally mediocre) I suspect I am not alone.