Estimating value from pairwise comparisons

How can you estimate the value of research output? You could use pairwise comparisons, for example by asking specialists how much more valuable Darwin’s On the Origin of Species is than Dembski’s Intelligent Design. Then you can use these relative valuations to estimate absolute valuations.

Summary

1. Estimating values is hard. One way to elicit value estimates is to ask researchers to compare two different items A and B, asking how much better A is than B. This makes the problem more concrete than just asking “what is the value of A?”. The Quantified Uncertainty Institute has made an app for doing this kind of thing, described here.

2. Nuño Sempere had a post about eliciting comparisons of research value from effective altruism researchers. There is also a more recent post about AI risk, which uses distributions instead of point estimates.

3. This post proposes some technical solutions to problems introduced to me in Nuño’s post. In particular, it includes principled ways to

   1. estimate subjective values,

   2. measure consistency in pairwise value judgments,

   3. measure agreement between the raters,

   4. aggregate subjective values.

   5. I also propose to use weighted least squares when the raters supply distributions instead of numbers. It is not clear to me that it is worth asking for distributions in these kinds of questions, though, as your uncertainty level can be modelled implicitly by comparing different pairwise comparisons.

4. I use these methods on the data from the 6 researchers post.
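To make the estimation and weighted least squares ideas in the summary concrete, here is a minimal sketch. It is not the code behind the results discussed in this post. It assumes each answer is a ratio ("item i is r times as valuable as item j"), treats log r as the difference of two log-values plus noise, and fixes item 0 as the reference item. The function name `estimate_log_values`, the `(i, j, ratio)` tuple layout, and the interpretation of `weights` as inverse variances are all assumptions made for illustration.

```python
import numpy as np


def estimate_log_values(comparisons, n_items, weights=None):
    """Estimate log-values from ratio judgments by (weighted) least squares.

    comparisons: list of (i, j, ratio), read as
        "item i is `ratio` times as valuable as item j".
    weights: optional per-comparison weights (assumed here to be inverse
        variances, e.g. derived from a rater's stated distribution).
    Item 0 is the reference item; its log-value is fixed at 0.
    """
    X = np.zeros((len(comparisons), n_items))
    y = np.empty(len(comparisons))
    for row, (i, j, ratio) in enumerate(comparisons):
        X[row, i] += 1.0   # log-value of the first item enters positively
        X[row, j] -= 1.0   # log-value of the second item enters negatively
        y[row] = np.log(ratio)

    X = X[:, 1:]  # drop the reference item's column for identifiability

    if weights is not None:
        # Weighted least squares: scale each row by the square root of its weight.
        w = np.sqrt(np.asarray(weights, dtype=float))
        X = X * w[:, None]
        y = y * w

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.concatenate([[0.0], beta])


# Three perfectly consistent judgments about items worth 1, 2 and 8.
comparisons = [(1, 0, 2.0), (2, 1, 4.0), (2, 0, 8.0)]
values = np.exp(estimate_log_values(comparisons, n_items=3))

# Two conflicting judgments about the same pair; the first gets three times the weight.
conflicting = [(1, 0, 2.0), (1, 0, 8.0)]
weighted = estimate_log_values(conflicting, n_items=2, weights=[3.0, 1.0])
```

With consistent answers the estimates reproduce the ratios exactly. With conflicting answers, the size of the least squares residuals gives a rough sense of how inconsistent the rater is.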

I’m assuming you have read the 6 researchers post recently. I think this post will be hard to read if you haven’t.

• In my previous job, we used the technique described below to prioritize feature requests and estimate their relative value. Feel free to skip this comment if you’re not interested in slightly related survey techniques.

    • Show a random sample of five items to a survey participant

    • Participant selects the most important and least important (leaving three items “somewhere in-between”)

    • Repeat

Each iteration creates six links between items (A > B, A > C, A > D, B > E, C > E, D > E) plus, transitively, A > E. After enough iterations, a preference order can be established using something like the Schulze Method.
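The tallying step can be sketched roughly as follows. This is an illustration under assumptions, not the tool we used: `links_from_response` and `schulze_ranking` are hypothetical names, each response is assumed to be a `(shown, best, worst)` tuple, and reading off the final order by counting strongest-path wins is just one simple way to turn the Schulze comparison into a ranking.

```python
def links_from_response(shown, best, worst):
    """Return the (winner, loser) pairs implied by one best/worst answer."""
    middle = [x for x in shown if x not in (best, worst)]
    links = [(best, m) for m in middle]     # best beats each in-between item
    links += [(m, worst) for m in middle]   # each in-between item beats worst
    links.append((best, worst))             # the transitive link
    return links


def schulze_ranking(items, responses):
    """Rank items from best/worst responses using Schulze strongest paths."""
    # d[a][b]: how many responses placed a above b
    d = {a: {b: 0 for b in items} for a in items}
    for shown, best, worst in responses:
        for winner, loser in links_from_response(shown, best, worst):
            d[winner][loser] += 1

    # p[a][b]: strength of the strongest path from a to b
    p = {a: {b: (d[a][b] if d[a][b] > d[b][a] else 0) for b in items}
         for a in items}
    for k in items:              # k is the intermediate item on the path
        for i in items:
            if i == k:
                continue
            for j in items:
                if j == i or j == k:
                    continue
                p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))

    # Order items by how many others they beat on strongest paths.
    wins = {a: sum(p[a][b] > p[b][a] for b in items if b != a) for a in items}
    return sorted(items, key=lambda a: -wins[a])


items = ["A", "B", "C", "D", "E"]
responses = [
    (items, "A", "E"),
    (items, "A", "E"),
    (items, "B", "E"),
]
ranking = schulze_ranking(items, responses)
```

Items that are never compared, or that tie on wins, keep their input order here, so in practice you would want enough iterations that every pair is compared several times.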

I’ve forgotten the name of this survey method, but found it quite neat. It is both easy to use for participants and yields rich information. I remember participants saying that it was “hard to cheat” in this type of survey, and so it might result in fewer inconsistencies than using the utility function extractor.

• Thank you for telling me about this! In economics, the discrete choice model is used to estimate a scale-free utility function in a similar way. It is used in health research for estimating QALYs, among other things; see e.g. this review paper.

But discrete choice / the Schulze method should probably not be used by themselves, as they cannot give us information about scale, only ordering. A possibility, which I find promising, is to combine the methods. Say that I have ten items I want you to rate. Then I can ask “Do you prefer A to B?” for some pairs and “How many times better is A than B?” for other pairs, hopefully in an optimal way. This would lessen the cognitive load on the study participants and make it easier to scale this kind of thing up.

(The cognitive load of using distributions is the main reason why I’m skeptical about having participants use them in place of point estimates when doing pairwise comparisons.)

• I liked the rigor in your post and learned a lot from it, thank you for writing it.