Help improve reasoning evaluation in intelligence organisations

TL;DR: My research group at the University of Melbourne is working to improve methods for evaluating quality of reasoning, particularly for use within government intelligence organisations. We’re conducting a study to compare a new evaluation method with the method currently used by intelligence agencies in the US. By participating, you get access to training materials in both the existing and proposed methods. You can sign up here.

Study Motivation

It is important that conclusions reached by analysts working in professional intelligence organisations are accurate, so that the resulting decisions made by governments and other decision-makers are grounded in reality. Historically, intelligence failures have contributed to decisions or oversights that wasted resources and often caused significant harm. Prominent examples from US history include the attack on Pearl Harbor, the 1961 Bay of Pigs invasion, 9/11, and the Iraq War.

Such events are at least partly the result of institutional decisions made based on poor reasoning. To reduce the risk of such events, it is important that the analysis informing those decisions is well reasoned. We use the phrase well reasoned to mean that the arguments articulated establish the stated conclusion. (If the arguments fail to establish the stated conclusion, we say the analysis is poorly reasoned.)

The ‘industry standard’ method for evaluating quality of reasoning (QoR) amongst intelligence organisations in the US is the IC Rating Scale, a rubric based on a set of Analytic Standards issued by the US Office of the Director of National Intelligence (ODNI) in 2015. There are significant question marks over the extent to which the IC Rating Scale is (and can be) operationalised to improve QoR in intelligence organisations. See here for a detailed summary, but in brief:

  • Inter-rater reliability of the Rating Scale is poor. (Though reliability between aggregated ratings, constructed by averaging the ratings of multiple raters, is better.)

  • Information is lacking on whether the Rating Scale is valid (whether it in fact measures QoR, as intended).

  • Ambiguities in the specification of the Rating Scale can make it difficult for raters to apply.

  • The Rating Scale can be overly prescriptive and detailed, making it difficult to quickly distinguish well reasoned from poorly reasoned analytic products.

Our research group has been developing an alternative method for evaluating QoR, notionally called the Reasoning Stress Test (RST), which focuses on detecting particular types of flaws in written reasoning. The RST is designed to be an easy-to-apply, efficient method, but this approach comes at a cost: raters do not consider the degree to which the reasoning displays other reasoning virtues, nor work through a checklist of the necessary and sufficient conditions of good reasoning.

We are conducting a study to compare the ability of participants trained in each method to discriminate between well reasoned and poorly reasoned intelligence-style products (among other research questions).

We are offering training in both the current and novel methods for evaluating QoR in return for participation in the study. The training has been designed primarily for intelligence analysis, so it will give you insight into how reasoning is evaluated in such institutions. However, the principles of reasoning quality it teaches are much more broadly applicable: they apply to all types of reasoning, and can be used to assess QoR in any institution with intelligence or analytical roles.

Methodological Note

We are aware that by publicly describing the potential limitations of the two methods, as we have done above, we risk prejudicing participants’ responses to either method in the study. The alternative, not to provide such information, would make it harder for you to decide whether the training is of interest. We decided to provide the information because:

  • we will be modelling the effect of existing familiarity with either method, rather than excluding participants on that basis;

  • in the context of our study design, it is difficult to articulate a plausible mechanism through which such prejudice could influence good-faith participation in the study; and

  • at the current stage of research into methods for evaluating QoR, we believe that the value of the additional data gained by explaining the study motivation outweighs the limitations imposed on that data by this potential prejudicing effect.

Significant work has gone into developing polished, insightful training in both methods, and we are confident that learning the principles and application of both methods will help you evaluate the reasoning of others.

What does participation involve?

Participating in the study involves:

  • Random allocation to one of the two reasoning evaluation methods.

  • Training in how to use the method, including some simple review questions.

  • A series of challenging fictional intelligence products (i.e. reports or assessments) to evaluate. In previous testing, we have found many of these very difficult to evaluate.

  • Expert responses to each question to compare with your own.

  • Access to the training material for the other method, once you have completed all the training on the first. You can choose whether or not to complete the training in the second method.

You can sign up here.


Any questions, comments or suggestions welcome.
