Forecasting Newsletter: September 2020.

Highlights

Index

  • Highlights

  • Prediction Markets & Forecasting Platforms

  • In The News

  • Hard To Categorize

  • Long Content

Sign up here or browse past newsletters here.

Prediction Markets & Forecasting Platforms

Metaculus updated their track record page. You can now look at accuracy across time, at the distribution of Brier scores, and at a calibration graph. They also have a new black swan question: When will US Metaculus users face an emigration crisis?
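
As a quick refresher (a minimal sketch in Python, not Metaculus’s actual code): the Brier score of a binary forecast is the squared distance between the stated probability and the 0/1 outcome, and a calibration curve bins forecasts by stated probability and compares each bin with the observed frequency of the outcome.

```python
from collections import defaultdict

def brier_score(prob: float, outcome: int) -> float:
    """Squared error between a stated probability and a 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2

def calibration_table(forecasts, n_bins: int = 10):
    """Group (probability, outcome) pairs into bins and report the observed frequency per bin."""
    bins = defaultdict(list)
    for prob, outcome in forecasts:
        bins[min(int(prob * n_bins), n_bins - 1)].append(outcome)
    return {
        (b / n_bins, (b + 1) / n_bins): sum(outcomes) / len(outcomes)
        for b, outcomes in sorted(bins.items())
    }

# Toy data: (stated probability, resolved outcome).
data = [(0.8, 1), (0.8, 1), (0.8, 0), (0.2, 0), (0.2, 1)]
print(sum(brier_score(p, o) for p, o in data) / len(data))  # mean Brier score
print(calibration_table(data, n_bins=5))  # e.g. {(0.2, 0.4): 0.5, (0.8, 1.0): ~0.67}
```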

Good Judgment Open has a thread in which forecasters share and discuss tips, tricks, and experiences. An account is needed to browse it.

Augur modifications in response to higher ETH prices. Some unfiltered comments on Reddit.

An overview of PlotX, a new decentralized prediction protocol/marketplace. PlotX focuses on non-subjective markets that can be programmatically determined, like the exchange rate between currencies or tokens.

A Replication Markets participant wrote What’s Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers. See also: An old long-form introduction to Replication Markets.

Georgetown’s CSET is attempting to use forecasting to influence policy. A seminar discussing their approach, Using Crowd Forecasting to Inform Policy with Jason Matheny, is scheduled for the 19th of October. But their current forecasting tournament, foretell, isn’t yet very well populated, and the aggregate isn’t that good because participants don’t update all that often, leading to sometimes clearly outdated aggregates. Perhaps because of this relative lack of competition, my team is in 2nd place at the time of this writing (with myself at #6, Eli Lifland at #12, and Misha Yagudin at #21). You can join foretell here.

There is a new contest on Hypermind, The Long Fork Project, which aims to predict the impact of a Trump or a Biden victory in November, with $20k in prize money. H/t to user ChickCounterfly.

The University of Chicago’s Effective Altruism group is hosting a forecasting tournament among all interested EA college groups, starting October 12th, 2020. More details here.

In the News

News media sensationalize essentially random fluctuations in US election odds caused by big bettors entering prediction markets such as Betfair, where bets on the order of $50k can visibly alter the market price. Simultaneously, polls/models and prediction market odds have diverged, because a substantial fraction of bettors lend credence to the thesis that polls will be biased as in previous elections, even though polling firms seem to have improved their methods.

Red Cross and Red Crescent societies have been trying out forecast-based financing. The idea is to create forecasts and early-warning indicators for some negative outcome, such as a flood, using weather forecasts, satellite imagery, climate models, etc., and then release funds automatically if the forecast reaches a given threshold, allowing the funds to be put to work before the disaster happens, in a more automatic, fast, and efficient manner. Their goals and modus operandi might resonate with the Effective Altruism community (a toy sketch of the trigger logic follows the quote below):

“In the precious window of time between a forecast and a potential disaster, FbF releases resources to take early action. Ultimately, we hope this early action will be more effective at reducing suffering, compared to waiting until the disaster happens and then doing only disaster response. For example, in Bangladesh, people who received a forecast-based cash transfer were less malnourished during a flood in 2017.” (bold not mine)
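
As a toy illustration of the automatic-trigger idea (my own sketch; the threshold, budget, and function names are hypothetical, not the Red Cross / Red Crescent system):

```python
# Hypothetical numbers, for illustration only.
FLOOD_PROBABILITY_THRESHOLD = 0.7   # assumed trigger level for the forecast
PRE_APPROVED_FUNDS = 250_000        # assumed budget released on trigger, in USD

def funds_to_release(flood_probability: float) -> int:
    """Release the pre-approved budget automatically once the forecast crosses the threshold."""
    return PRE_APPROVED_FUNDS if flood_probability >= FLOOD_PROBABILITY_THRESHOLD else 0

print(funds_to_release(0.75))  # forecast above threshold -> 250000 released early
print(funds_to_release(0.40))  # below threshold -> 0
```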

Prediction Markets’ Time Has Come, but They Aren’t Ready for It. Prediction markets could have been useful for predicting the spread of the pandemic (see: coronainformationmarkets.com), or for informing about the consequences of the presidential election (see: Hypermind above), but their relatively small size makes them less informative. Blockchain-based prediction technologies, like Augur, Gnosis, or Omen, could have helped bypass US regulatory hurdles (which ban many kinds of gambling), but the recent increase in transaction fees means that “everything below a $1,000 bet is basically economically unfeasible”.

Floods in India and Bangladesh:

The many tribes of 2020 election worriers: An ethnographic report by the Washington Post.

Electricity time series demand and supply forecasting startup raises $8 million. I keep seeing this kind of announcement; doing forecasting well in an underforecasted domain seems to be somewhat profitable right now, and it’s not like there is an absence of domains to which forecasting can be applied. This might be a good idea for an earning-to-give startup.

NSF and NASA partner to address space weather research and forecasting. Together, NSF and NASA are investing over $17 million in six three-year awards, each of which contributes to key research that can expand the nation’s space weather prediction capabilities.

In its monthly report, OPEC said it expects the pandemic to reduce demand by 9.5 million barrels a day, forecasting a fall in demand of 9.5% from last year, reports the Wall Street Journal.

Some criticism of Gnosis, a decentralized prediction markets startup, by early investors who want to cash out. Here is a blog post by said early investors; they claim that “Gnosis took out what was in effect a 3+ year interest-free loan from token holders and failed to deliver the products laid out in its fundraising whitepaper, quintupled the size of its balance sheet due simply to positive price fluctuations in ETH, and then launched products that accrue value only to Gnosis management.”

What a study of video games can tell us about being better decision makers ($), a frustratingly well-paywalled, yet exhaustive and informative overview of IARPA’s FOCUS tournament:

To study what makes someone good at thinking about counterfactuals, the intelligence community decided to study the ability to forecast the outcomes of simulations. A simulation is a computer program that can be run again and again, under different conditions: essentially, rerunning history. In a simulated world, the researchers could know the effect a particular decision or intervention would have. They would show teams of analysts the outcome of one run of the simulation and then ask them to predict what would have happened if some key variable had been changed.

Negative Examples

Why Donald Trump Isn’t A Real Candidate, In One Chart, wrote 538 in 2015.

For this reason alone, Trump has a better chance of cameoing in another “Home Alone” movie with Macaulay Culkin — or playing in the NBA Finals — than winning the Republican nomination.

Travel CFOs Hesitant on Forecasts as Pandemic Fogs Outlook, reports the Wall Street Journal.

“We’re basically prevented from saying the word ‘forecast’ right now because whatever we forecast...it’s wrong,” said Shannon Okinaka, chief financial officer at Hawaiian Airlines. “So we’ve started to use the word ‘planning scenarios’ or ‘planning assumptions.’”

Long Content

Andrew Gelman et al. release Information, incentives, and goals in election forecasts.

  • Neither The Economist’s model nor 538’s is fully Bayesian. In particular, they are not martingales; that is, their current probability is not the expected value of their future probability (see the formula after these excerpts).

    campaign polls are more stable than ever before, and even the relatively small swings that do appear can largely be attributed to differential nonresponse

    Regarding predictions for 2020, the creator of the Fivethirtyeight forecast writes, “we think it’s appropriate to make fairly conservative choices especially when it comes to the tails of your distributions. Historically this has led 538 to well-calibrated forecasts (our 20%s really mean 20%)” (Silver, 2020b). But conservative prediction can produce a too-wide interval, one that plays it safe by including extra uncertainty. In other words, conservative forecasts should lead to underconfidence: intervals whose coverage is greater than advertised. And, indeed, according to the calibration plot shown by Boice and Wezerek (2019) of Fivethirtyeight’s political forecasts, in this domain 20% for them really means 14%, and 80% really means 88%.
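
To spell out the martingale property mentioned above: if $p_t$ is a fully Bayesian forecaster’s probability at time $t$, then for any later time $t+s$ the current forecast should equal the expectation of the future forecast given today’s information,

$$p_t = \mathbb{E}\left[\, p_{t+s} \mid \text{information available at time } t \,\right],$$

so a Bayesian forecast should not be expected to drift in any predictable direction.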

The Literary Digest Poll of 1936. A poll so bad that it destroyed the magazine.

  • Compare the Literary Digest and Gallup polls of 1936 with The New York Times’s model of 2016 and 538’s 2016 forecast, respectively.

    In retrospect, the polling techniques employed by the magazine were to blame. Although it had polled ten million individuals (of whom 2.27 million responded, an astronomical total for any opinion poll),[5] it had surveyed its own readers first, a group with disposable incomes well above the national average of the time (shown in part by their ability to afford a magazine subscription during the depths of the Great Depression), and those two other readily available lists, those of registered automobile owners and that of telephone users, both of which were also wealthier than the average American at the time.

    Research published in 1972 and 1988 concluded that as expected this sampling bias was a factor, but non-response bias was the primary source of the error—that is, people who disliked Roosevelt had strong feelings and were more willing to take the time to mail back a response.

    George Gallup’s American Institute of Public Opinion achieved national recognition by correctly predicting the result of the 1936 election, while Gallup also correctly predicted the (quite different) results of the Literary Digest poll to within 1.1%, using a much smaller sample size of just 50,000.[5] Gallup’s final poll before the election also predicted Roosevelt would receive 56% of the popular vote: the official tally gave Roosevelt 60.8%.

    This debacle led to a considerable refinement of public opinion polling techniques, and later came to be regarded as ushering in the era of modern scientific public opinion research.

Feynman in 1985, answering questions about whether machines will ever be more intelligent than humans.

Why Most Published Research Findings Are False, back from 2005. The abstract reads:

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
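
For reference, the paper’s core quantity (stated here from memory, in its simplest form, before the bias and multiple-teams corrections) is the post-study probability that a claimed finding is true, as a function of the pre-study odds $R$ of a true relationship, the significance threshold $\alpha$, and the Type II error rate $\beta$:

$$\text{PPV} = \frac{(1-\beta)\,R}{(1-\beta)\,R + \alpha},$$

so a finding is more likely true than false only when $(1-\beta)R > \alpha$, which is easy to violate when $R$ is small (few true relationships among those probed) or when power $1-\beta$ is low.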

Reference class forecasting. Reference class forecasting or comparison class forecasting is a method of predicting the future by looking at similar past situations and their outcomes. The theories behind reference class forecasting were developed by Daniel Kahneman and Amos Tversky. The theoretical work helped Kahneman win the Nobel Prize in Economics. Reference class forecasting is so named as it predicts the outcome of a planned action based on actual outcomes in a reference class of similar actions to that being forecast.

Reference class problem

In statistics, the reference class problem is the problem of deciding what class to use when calculating the probability applicable to a particular case. For example, to estimate the probability of an aircraft crashing, we could refer to the frequency of crashes among various different sets of aircraft: all aircraft, this make of aircraft, aircraft flown by this company in the last ten years, etc. In this example, the aircraft for which we wish to calculate the probability of a crash is a member of many different classes, in which the frequency of crashes differs. It is not obvious which class we should refer to for this aircraft. In general, any case is a member of very many classes among which the frequency of the attribute of interest differs. The reference class problem discusses which class is the most appropriate to use.

  • See also some thoughts on this here.
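
A minimal sketch of the problem in Python, with made-up counts: the estimated probability changes with the reference class you condition on, and nothing in the data itself tells you which class is the right one.

```python
# Hypothetical crash counts for nested reference classes (illustration only).
reference_classes = {
    "all aircraft":                 {"crashes": 500, "flights": 50_000_000},
    "this make of aircraft":        {"crashes": 12,  "flights": 2_000_000},
    "this company, last ten years": {"crashes": 0,   "flights": 150_000},
}

for name, counts in reference_classes.items():
    # Laplace smoothing so the empty class doesn't give exactly zero.
    p = (counts["crashes"] + 1) / (counts["flights"] + 2)
    print(f"{name}: estimated crash probability ~ {p:.1e}")
```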

The Base Rate Book by Credit Suisse.

This book is the first comprehensive repository for base rates of corporate results. It examines sales growth, gross profitability, operating leverage, operating profit margin, earnings growth, and cash flow return on investment. It also examines stocks that have declined or risen sharply and their subsequent price performance. We show how to thoughtfully combine the inside and outside views. The analysis provides insight into the rate of regression toward the mean and the mean to which results regress.
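
One simple way to combine the two views (my own sketch, not necessarily the book’s procedure) is to shrink the case-specific, inside-view estimate toward the reference-class base rate, with the weight reflecting how much you trust the case-specific evidence:

```python
def combine_views(inside_estimate: float, base_rate: float, weight_on_inside: float) -> float:
    """Shrink an inside-view estimate toward the outside-view base rate.

    weight_on_inside is between 0 and 1; the values below are hypothetical.
    """
    return weight_on_inside * inside_estimate + (1 - weight_on_inside) * base_rate

# Example: management projects 25% sales growth (inside view), the base rate for
# comparable firms is 6%, and we put 30% weight on the inside view: ~11.7%.
print(combine_views(0.25, 0.06, 0.30))
```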

Hard To Categorize

Improving decisions with market information: an experiment on corporate prediction markets (sci-hub; archive link)

We conduct a lab experiment to investigate an important corporate prediction market setting: A manager needs information about the state of a project, which workers have, in order to make a state-dependent decision. Workers can potentially reveal this information by trading in a corporate prediction market. We test two different market designs to determine which provides more information to the manager and leads to better decisions. We also investigate the effect of top-down advice from the market designer to participants on how the prediction market is intended to function. Our results show that the theoretically superior market design performs worse in the lab—in terms of manager decisions—without top-down advice. With advice, manager decisions improve and both market designs perform similarly well, although the theoretically superior market design features less mis-pricing. We provide a behavioral explanation for the failure of the theoretical predictions and discuss implications for corporate prediction markets in the field.

The nonprofit Ought organized a forecasting thread on existential risk, in which participants display and discuss their probability distributions for existential risk; Ought also outlines some reflections on a previous forecasting thread on AI timelines.

A draft report on AI timelines, summarized in the comments.

Gregory Lewis has a series of posts related to forecasting and uncertainty:

Estimation of probabilities to get tenure track in academia: baseline and publications during the PhD.

How to think about an uncertain future: lessons from other sectors & mistakes of longtermist EAs. The central thesis is:

Expected value calculations, the favoured approach for EA decision making, are all well and good for comparing evidence-backed global health charities, but they are often the wrong tool for dealing with situations of high uncertainty, the domain of EA longtermism.

Discussion by a PredictIt bettor on how he made money by following Nate Silver’s predictions, from r/TheMotte.

Also on r/TheMotte, on the promises and deficiencies of prediction markets:

Prediction markets will never be able to predict the unpredictable. Their promise is to be better than all of the available alternatives, by incorporating all available information sources, weighted by experts who are motivated by financial returns.

So, you’ll never have a perfect prediction of who will win the presidential election, but a good prediction market could provide the best possible guess of who will win the presidential election.

To reach that potential, you’d need to clear away the red tape. It would need to be legal to make bets on the market, fees for making transactions would need to be low, participants would need faith in the bet adjudication process, and there can’t be limits to the amount you can bet. Signs that you’d succeeded would include sophisticated investors making large bets with a narrow bid/ask spread.

Unfortunately prediction markets are nowhere close to that ideal today; they’re at most “barely legal,” bet sizes are limited, transaction fees are high, getting money in or out is clumsy and sketchy, trading volumes are pretty low, and you don’t see any hedge funds with “prediction market” desks or strategies. As a result, I put very little stock in political prediction markets today. At best they’re populated by dumb money, and at worst they’re actively manipulated by campaigns or partisans who are not motivated by direct financial returns.

Nate Silver, in a short Twitter thread on prediction markets: “Most of what makes political prediction markets dumb is that people assume they have expertise about election forecasting because they a) follow politics and b) understand “data” and “markets”. Without more specific domain knowledge, though, that combo is a recipe for stupidity.”

  • Interestingly, I’ve recently found out that 538’s political predictions are probably underconfident, i.e., their 80% forecasts happen 88% of the time.

Deloitte forecasts US holiday season retail sales (but doesn’t provide confidence intervals).

Solar forecast. Sun to leave the quietest part of its cycle, but still remain relatively quiet and not produce world-ending coronal mass ejections, the New York Times reports.

The Foresight Institute organizes weekly talks; here is one with Samo Burja on long-lived institutions.

Some examples of failed technology predictions.

Last, but not least, Ozzie Gooen on Multivariate estimation & the Squiggly language:
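
The talk is about writing estimates as compositions of probability distributions rather than point values. A rough Monte Carlo analogue in Python (this is not Squiggly/Squiggle syntax, and the quantities below are made up):

```python
import random

def sample_total_cost() -> float:
    """One draw of total cost = hourly_rate * hours, with both factors uncertain."""
    hourly_rate = random.lognormvariate(3.9, 0.3)  # roughly $50/h, right-skewed
    hours = random.uniform(80, 140)                # uncertain project scope
    return hourly_rate * hours

samples = sorted(sample_total_cost() for _ in range(10_000))
print("median:", round(samples[len(samples) // 2]))
print("90% interval:", (round(samples[500]), round(samples[9_500])))
```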


Note to the future: All links are added automatically to the Internet Archive. In case of link rot, go there and input the dead link.


Littlewood’s law states that a person can expect to experience events with odds of one in a million (defined by the law as a “miracle”) at the rate of about one per month.
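
The arithmetic behind the law, under its usual assumptions (a person is alert for roughly eight hours a day and experiences about one event per second):

$$8 \ \text{hours/day} \times 3600 \ \text{seconds/hour} \approx 3\times 10^{4} \ \text{events/day}, \qquad \frac{10^{6} \ \text{events}}{3\times 10^{4} \ \text{events/day}} \approx 35 \ \text{days},$$

or about one “miracle” a month.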