Biases in our estimates of Scale, Neglectedness and Solvability?

I suspect that using logarithmic scales for our estimates of Scale, Neglectedness and Solvability can make us prone to a few errors, although I’m not sure how bad this is in practice in the EA community. I describe three such possible errors. I also describe another error related only to correlations between the factors and not the log scale. In fact, as a rule, we aren’t accounting for possible correlations between the factors, or we’re effectively assuming the factors aren’t correlated, and there’s no general constant upper bound on how much this could bias our estimates. As a general trend, the more uncertainty in the factors, the greater the bias can be.

These could affect not just cause area analyses, but also grant-making and donations informed by these factors, as done by the Open Philanthropy Project.


Background

Scale/Importance, Neglectedness/Crowdedness and Solvability/Tractability are estimated on a logarithmic scale (log scale) and then added together. See the 80,000 Hours article about the framework. What we really care about is their product, on the linear scale (the regular scale), but since $\log(x \cdot y \cdot z) = \log(x) + \log(y) + \log(z)$, the logarithm of the product of the factors is the sum of the log scale factors, so using the sum of the log scale factors makes sense:

$$\log(S \cdot N \cdot T) = \log(S) + \log(N) + \log(T),$$

where $S$, $N$ and $T$ are the Scale, Neglectedness and Solvability of a problem on the linear scale.


1. Using the expected values of the logarithms instead of the logarithms of the expected values, biasing us against high-risk high-reward causes.

If there’s uncertainty in any of the linear scale factors (or their product), say $X$, then using the expected value of the logarithm, $E[\log(X)]$, underestimates the quantity we care about, $\log(E[X])$, because the logarithm is a strictly concave function, so we have (the reverse) Jensen’s inequality:

$$E[\log(X)] \leq \log(E[X]),$$

with equality only if $X$ is (almost surely) constant.

When there is a lot of uncertainty, we may be prone to underestimating the log scale factors. The difference can be significant. Consider a 10% probability of 1 billion and a 90% probability of 1 million: with natural logarithms, $E[\log(X)] \approx 14.5$, while $\log(E[X]) \approx 18.4$. 80,000 Hours uses base $\sqrt{10}$, incrementing by 2 for every factor of 10, so on their scale this would be about 12.6 vs 16, which is quite significant, since some of the top causes differ by around this much or less (according to 80,000 Hours’ old estimates).
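
To check the arithmetic, here’s a minimal sketch in Python (just the example above; nothing here is specific to any real cause):

```python
import math

# The two-point lottery from the text: 10% chance of 10^9, 90% chance of 10^6.
p, high, low = 0.1, 1e9, 1e6

expectation = p * high + (1 - p) * low  # E[X] ≈ 1.009e8
log_of_expectation = math.log(expectation)  # log(E[X]) ≈ 18.43
expectation_of_log = p * math.log(high) + (1 - p) * math.log(low)  # E[log(X)] ≈ 14.51

def to_score(nats):
    """Convert natural log units to 80,000 Hours' base-sqrt(10) scale (2 per factor of 10)."""
    return 2 * nats / math.log(10)

print(to_score(expectation_of_log), to_score(log_of_expectation))  # ≈ 12.6 and 16.0
```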

People could also make this mistake without actually doing any explicit expected value calculations. They might think "this cause looks like somewhere between a 3 and a 5 on Tractability, so I’ll use 4 as the average", while having a symmetric distribution centred at 4 in mind (i.e. the distribution looks the same if you reflect it left and right through 4). This actually corresponds to using a skewed distribution over the linear scale, with more probability given to the lower values, whereas a uniform distribution over the corresponding linear-scale interval (from $10^3$ to $10^5$, using base 10) would give you about 4.7 on the log scale, since $\log_{10}((10^3 + 10^5)/2) \approx 4.7$. That being said, I think it does make sense to generally have these distributions more skewed towards lower values on the linear scale, and we might otherwise be biased towards more symmetric distributions over the linear scale, so these two biases could work in opposite directions. Furthermore, we might already be biased in favour of high-risk high-reward interventions, since we aren’t sufficiently skeptical and are subject to the optimizer’s curse.
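
To illustrate, a small sketch comparing the two readings of "between 3 and 5", using base 10 as above:

```python
import math

# "Between 3 and 5 on the log scale" (base 10) means linear values
# between 10**3 and 10**5.

# Uniform over the log scale [3, 5]: E[10^U] = (10^5 - 10^3) / (2 * ln(10)),
# from integrating 10^u / 2 over [3, 5].
mean_if_uniform_on_log_scale = (10**5 - 10**3) / (2 * math.log(10))
print(math.log10(mean_if_uniform_on_log_scale))  # ≈ 4.33, above the "average" of 4

# Uniform over the linear interval [10^3, 10^5]:
mean_if_uniform_on_linear_scale = (10**3 + 10**5) / 2
print(math.log10(mean_if_uniform_on_linear_scale))  # ≈ 4.70, as in the text
```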

The solution is to always make sure to deal with the uncertainty before taking logarithms, or be aware that a distribution over the log scale corresponds to a distribution that’s relatively more skewed towards lower values in the linear scale.


2. Upwards bias of log scale factors, if the possibility of negative values isn’t considered.

Logarithms can be negative, but I’ve never once seen a negative value for any of the log scale factors (except in comments on 80,000 Hours’ problem framework page). If people mistakenly assume that negative values are impossible, this might push up their view of what kind of values are "reasonable", i.e. "range bias", or cause us to prematurely filter out causes that would have negative log scale factors.

80,000 Hours does provide concrete examples for the values on the log scale factors, which can prevent this. It’s worth noting that according to their Crowdedness score, $100 billion in annual spending corresponds to a 0, but Health in poor countries gets a 2 on Neglectedness with "The least developed countries plus India spend about $300 billion on health each year (PPP)." So, should this actually be negative? Maybe it shouldn’t, if these resources aren’t generally being spent effectively, or there aren’t a million people working on the problem.
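
For illustration, here’s what a literal reading of the scale would give, assuming the 0 point sits at $100 billion per year and each further factor of 10 in spending lowers the score by 2:

```python
import math

def neglectedness_score(annual_spending_usd):
    # Assumed anchoring: $100 billion/year scores 0, and each factor of 10
    # more spending lowers the score by 2.
    return -2 * math.log10(annual_spending_usd / 100e9)

print(neglectedness_score(300e9))  # ≈ -0.95: slightly negative, not 2
```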

We could also be biased away from further considering causes (or actions) that have negative log scale factors but make up for them with other factors. In particular, some small acts of kindness/decency or helping an individual could be very low in Scale, but when they’re basically free in terms of time or resources spent, like thanking people when they do things for you, they’re still worth doing. However, I do expect that writing an article defending small acts of kindness probably has much less impact than writing one defending one of the typical EA causes, and may even undermine EA if it causes people to focus on small acts of kindness which aren’t basically free. $1 is not free, and is still generally better given to an EA charity, or used to make doing EA more sustainable for you or someone else.

Furthermore, when you narrow the scope of a cause area further and further, you expect the Scale to decrease and the Neglectedness to increase. At the level of individual decisions, the Scale could often be negative. At the extreme end:

  • New cause area: review a specific EA article and provide feedback to the author.

  • New cause area: donate this $1 to the Against Malaria Foundation.


3. Neglecting the possibility that work on a cause could do more harm than good, if the possibility of undefined logarithms isn’t considered.

Where its argument is negative (or 0), the logarithm is not defined. If people are incorrectly taking expectations after taking logarithms instead of before, as in 1, then they should expect undefined values. The fact that we aren’t seeing them could be a sign that they aren’t seriously considering the possibility of net harm. If log scale values are incorrectly assumed to always be defined, this is similar to the range bias in 2, and could bias our estimates upwards.

On the other hand, if they are correctly taking expectations before logarithms, then since the logarithm is undefined if and only if its argument is negative (or 0), if it would have been undefined, it’s because work on the cause does more harm than good in expectation, and we wouldn’t want to pursue it anyway. So, as above, if log scale values are incorrectly assumed to always be defined, then this may also prevent us from considering the possibility that work on a cause does more harm than good in expectation, and could also bias our estimates upwards.
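
As a toy illustration, with hypothetical payoffs, the logarithm simply fails on a net-harmful prospect:

```python
import math

# Hypothetical intervention: +100 with probability 0.6, -200 with probability 0.4.
expected_value = 0.6 * 100 + 0.4 * (-200)  # = -20: net harm in expectation

try:
    math.log(expected_value)
except ValueError as error:
    print("log undefined:", error)  # math domain error

# Taking the expectation after the logarithm is no escape here: log(-200)
# for the harmful outcome is already undefined.
```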

Note: I’ve rewritten this section since first publishing this post on the EA Forum to consider more possibilities of biases.


Bonus: Ignoring correlations between factors.

Summary: When there’s uncertainty in the factors and they correlate positively, we may be underestimating the marginal cost-effectiveness. When there’s uncertainty in the factors and they correlate negatively, we may be overestimating the marginal cost-effectiveness.

What we care about is $E[S \cdot N \cdot T]$, but the expected value of the product is not in general equal to the product of the expected values, i.e. in general,

$$E[S \cdot N \cdot T] \neq E[S] \cdot E[N] \cdot E[T].$$

They are equal if the factors are independent (for two factors, being uncorrelated is enough). If it’s the case that the higher in Scale a problem is, the lower in Solvability it is, i.e. the better it is to solve, the harder it is to make relative progress, we might overestimate the overall score. As an example, letting $X$ be a uniform random variable over $[0, 1]$, and $Y = 1 - X$, also uniform over the same interval but perfectly anticorrelated with $X$, we have

$$E[X \cdot Y] = E[X - X^2] = \frac{1}{2} - \frac{1}{3} = \frac{1}{6} < \frac{1}{4} = E[X] \cdot E[Y].$$

Of course, this is less than a factor of 2 on the linear scale, so less than a difference of 1 on the (base $\sqrt{10}$) log scale.
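
A quick Monte Carlo check of this example (the exact values are 1/6 and 1/4):

```python
import random

random.seed(0)
n = 10**6
xs = [random.random() for _ in range(n)]  # X uniform over [0, 1]
ys = [1 - x for x in xs]                  # Y = 1 - X: perfectly anticorrelated

e_xy = sum(x * y for x, y in zip(xs, ys)) / n
e_x_times_e_y = (sum(xs) / n) * (sum(ys) / n)
print(e_xy, e_x_times_e_y)     # ≈ 0.1667 < 0.25
print(e_x_times_e_y / e_xy)    # ratio ≈ 1.5, less than a factor of 2
```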

(EDIT: everything below was added later.)

However, the quotient can be arbitrarily large (or arbitrarily small), so there’s no constant upper bound on how wrong we could be. For example, let $X = 1/p$ and $Y = p$ with probability $p$, and $X = 0$ and $Y = 1$ with probability $1 - p$. Then,

$$E[X] \cdot E[Y] = 1 \cdot (1 - p + p^2) > p = E[X \cdot Y],$$

and the quotient goes to $\infty$ as $p$ goes to $0$, since the left-hand side goes to $1$ and the right-hand side goes to $0$.
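
Numerically, for the family just described:

```python
def sides(p):
    # With probability p: X = 1/p and Y = p; otherwise: X = 0 and Y = 1.
    e_x = p * (1 / p) + (1 - p) * 0       # = 1
    e_y = p * p + (1 - p) * 1             # = 1 - p + p**2
    e_xy = p * (1 / p) * p + (1 - p) * 0  # = p
    return e_x * e_y, e_xy

for p in (0.1, 0.01, 0.001):
    lhs, rhs = sides(p)
    print(lhs, rhs, lhs / rhs)  # quotient ≈ 9.1, 99.0, 999.0: unbounded as p -> 0
```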

On the other hand, if the distributions are more concentrated over the same interval, the gap is lower. Comparing to the first example, let $X$ have probability density function $f(x) = 6x(1 - x)$ over $[0, 1]$ and again $Y = 1 - X$. Then, we have:

$$E[X \cdot Y] = \frac{1}{5} < \frac{1}{4} = E[X] \cdot E[Y].$$

With anticorrelation, I think this would bias us towards higher-risk higher-reward causes.

On the other hand, positive correlations lead to the opposite bias and underestimation. If $X = Y$, identically, and uniform over $[0, 1]$, we have:

$$E[X \cdot Y] = E[X^2] = \frac{1}{3} > \frac{1}{4} = E[X] \cdot E[Y].$$


The sign of the difference between $E[X \cdot Y]$ and $E[X] \cdot E[Y]$ is the same as the sign of the correlation, since we have, for the covariance between $X$ and $Y$,

$$\mathrm{Cov}(X, Y) = E[X \cdot Y] - E[X] \cdot E[Y],$$

and the correlation is

$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y},$$

where $\sigma_X = \sqrt{\mathrm{Var}(X)}$ and $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$ are the standard deviations of $X$ and $Y$, which are positive when $X$ and $Y$ aren’t constant.
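
As a quick numerical sanity check of the sign claim, reusing the two examples above:

```python
import random
import statistics  # statistics.correlation requires Python 3.10+

random.seed(0)
us = [random.random() for _ in range(10**5)]

for label, ys in [("Y = 1 - X", [1 - u for u in us]),  # negative correlation
                  ("Y = X", list(us))]:                # positive correlation
    e_x = statistics.fmean(us)
    e_y = statistics.fmean(ys)
    e_xy = statistics.fmean(x * y for x, y in zip(us, ys))
    difference = e_xy - e_x * e_y
    print(label, difference, statistics.correlation(us, ys))
    # difference ≈ -1/12 with correlation -1; ≈ +1/12 with correlation +1
```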

So, as a general trend, the more uncertainty in the factors, the greater the possibility for larger biases.


Possible examples.

One example could be that we don’t know the exact Scale of wild animal suffering, in part because we aren’t sure which animals are actually sentient, and if it does turn out that many more animals are sentient than expected, that might mean that relative progress on the problem is harder. It could actually turn out to be the opposite, though; if we think we could get more cost-effective methods to address wild invertebrate suffering than for wild vertebrate suffering (invertebrates are generally believed to be less (likely to be) sentient than vertebrates, with a few exceptions), then the Scale and Solvability might be positively correlated.

Similarly, there could be a relationship between the Scale of a global catastrophic risk or x-risk and its Solvability. If advanced AI can cause value lock-in, how long the effects last might be related to how difficult it is to make relative progress on aligning AI, and more generally, how powerful AI will be is probably related to both the Scale and Solvability of the problem. How bad climate change or a nuclear war could be might be related to its Solvability, too, if worse risks are relatively more or less difficult to make progress on.


Some independence?

It might sometimes be possible to define a cause, its scope and the three factors in such a way that $S$ and $N$ are independent (and uncorrelated), or at least independent (or uncorrelated) given $T$. For example, in the case of wild animal suffering, we should count all funding for invertebrate welfare towards $N$ even if it turns out to be the case that invertebrates aren’t sentient. However, $T$ is defined in terms of the other two factors, so it should not in general be expected to be independent from or uncorrelated with either. The independence of $S$ and $N$ (given $T$) and the law of iterated expectations allow us to write

$$E[S \cdot N \cdot T] = E[E[S \cdot N \cdot T \mid T]] = E[T \cdot E[S \mid T] \cdot E[N \mid T]].$$
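
As a sanity check of this identity, here’s a small Monte Carlo sketch with made-up distributions in which $S$ and $N$ are independent given $T$ (the specific distributions are just for illustration):

```python
import random
import statistics

random.seed(0)
samples = []
for _ in range(10**5):
    t = random.choice([1.0, 2.0])  # toy Solvability-like factor
    s = random.uniform(0, t)       # Scale and Neglectedness both depend on T...
    n = random.uniform(0, t)       # ...but are independent of each other given T
    samples.append((s, n, t))

e_snt = statistics.fmean(s * n * t for s, n, t in samples)
# Given T = t, E[S | T] = E[N | T] = t / 2, so E[T * E[S|T] * E[N|T]] = E[T^3] / 4.
e_conditional = statistics.fmean(t ** 3 / 4 for _, _, t in samples)
print(e_snt, e_conditional)  # both ≈ 1.125, equal up to sampling noise

# The naive product of unconditional means differs, since T correlates with
# both S and N:
print(statistics.fmean(s for s, _, _ in samples)
      * statistics.fmean(n for _, n, _ in samples)
      * statistics.fmean(t for _, _, t in samples))  # ≈ 0.84
```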

However, $N$ should, in theory, consider future work, too, and the greater in $S$ or $T$ a problem is, the more resources we might expect to go to it in the future, so $N$ might actually be anticorrelated with the other two factors.

This is something to keep in mind for marginal cost-effectiveness analysis, too.


Bounding the error.

Without actually calculating the covariance and thinking about how the factors may depend on one another, we have the following bound on the difference in terms of the standard deviations (by the Cauchy-Schwarz inequality, which is also why correlations are bounded between -1 and 1):

$$|E[X \cdot Y] - E[X] \cdot E[Y]| = |\mathrm{Cov}(X, Y)| \leq \sigma_X \sigma_Y,$$

and this bounds the bias in our estimate. We can further divide each side by $E[X] \cdot E[Y]$ to get

$$\left| \frac{E[X \cdot Y]}{E[X] \cdot E[Y]} - 1 \right| \leq \frac{\sigma_X \sigma_Y}{E[X] \cdot E[Y]},$$

and so

$$1 - \frac{\sigma_X \sigma_Y}{E[X] \cdot E[Y]} \leq \frac{E[X \cdot Y]}{E[X] \cdot E[Y]} \leq 1 + \frac{\sigma_X \sigma_Y}{E[X] \cdot E[Y]},$$

which bounds our bias as a ratio without considering any dependence between $X$ and $Y$. Taking logarithms bounds the difference $\log(E[X \cdot Y]) - (\log(E[X]) + \log(E[Y]))$.
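
A numerical sketch of these bounds, with a made-up pair of positively correlated factors:

```python
import random
import statistics

random.seed(0)
n = 10**5
xs = [random.random() for _ in range(n)]
ys = [0.5 * x + 0.5 * random.random() for x in xs]  # positively correlated with X

e_x, e_y = statistics.fmean(xs), statistics.fmean(ys)
e_xy = statistics.fmean(x * y for x, y in zip(xs, ys))
sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)

print(abs(e_xy - e_x * e_y) <= sd_x * sd_y)  # True: |Cov(X, Y)| <= sd_x * sd_y
ratio = e_xy / (e_x * e_y)
slack = sd_x * sd_y / (e_x * e_y)
print(1 - slack <= ratio <= 1 + slack)       # True: the ratio form of the bound
```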

Other bounds can be derived using the maximum values (or essential suprema) and minimum values (or essential infima) of the factors, or Hölder’s inequality, which generalizes the Cauchy-Schwarz inequality. For example, assuming $Y$ is always positive (or nonnegative), because $\min(X) \cdot Y \leq X \cdot Y \leq \max(X) \cdot Y$, we have

$$\min(X) \cdot E[Y] \leq E[X \cdot Y] \leq \max(X) \cdot E[Y],$$

and so, dividing by $E[X] \cdot E[Y]$,

$$\frac{\min(X)}{E[X]} \leq \frac{E[X \cdot Y]}{E[X] \cdot E[Y]} \leq \frac{\max(X)}{E[X]}.$$

The inequality is reversed if $E[X] < 0$.

Similarly, if the other two factors $Y$ and $Z$ are always positive (or nonnegative, so that the product $Y \cdot Z$ is always nonnegative), then

$$\min(X) \cdot E[Y \cdot Z] \leq E[X \cdot Y \cdot Z] \leq \max(X) \cdot E[Y \cdot Z],$$

and so, dividing by $E[X] \cdot E[Y \cdot Z]$,

$$\frac{\min(X)}{E[X]} \leq \frac{E[X \cdot Y \cdot Z]}{E[X] \cdot E[Y \cdot Z]} \leq \frac{\max(X)}{E[X]}.$$

And again, the inequality is reversed if $E[X] < 0$.

Here, we can take $X$ to be any of $S$, $N$ or $T$, and $Y$ and $Z$ to be the other two. So, if at most one factor ranges over multiple orders of magnitude, then taking the product of the expected values should be within a few orders of magnitude of the expected value of the product, since we can use the above bounds with $X$ as the factor with the widest log scale range.
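
A small numerical sketch of the two-factor bound, with a made-up $X$ spanning two orders of magnitude and a positive $Y$ that depends on it:

```python
import random
import statistics

random.seed(0)
n = 10**5
xs = [10 ** random.uniform(0, 2) for _ in range(n)]  # spans roughly 1 to 100
ys = [1 / (1 + x) for x in xs]                       # positive, anticorrelated with X

e_x, e_y = statistics.fmean(xs), statistics.fmean(ys)
e_xy = statistics.fmean(x * y for x, y in zip(xs, ys))

ratio = e_xy / (e_x * e_y)
print(min(xs) / e_x, ratio, max(xs) / e_x)      # the ratio sits between the bounds
print(min(xs) / e_x <= ratio <= max(xs) / e_x)  # True
```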

With some care, I think similar bounds can be derived when $Y$ is not always positive (or nonnegative), i.e. $Y$ can be negative.

Finally, if one of the three factors, say $X$, is independent of the other two (or of their product), then we have $E[X \cdot Y \cdot Z] = E[X] \cdot E[Y \cdot Z]$, and it suffices to bound the gap between $E[Y \cdot Z]$ and $E[Y] \cdot E[Z]$, since

$$E[X \cdot Y \cdot Z] - E[X] \cdot E[Y] \cdot E[Z] = E[X] \cdot (E[Y \cdot Z] - E[Y] \cdot E[Z]).$$