The obvious improvement is to do the above, followed by a discussion on each of the largest divergent ratings, followed by a post-test where I expect more of a consensus.*
This is because it’s hard to generate all the relevant facts on your own. Divergences are likely to be due to some crucial factual consideration (e.g. “Thinking Fast & Slow was only read by 5% of the people who bought it”; “Thinking Fast & Slow is >40% false”) or a value disagreement. (Most value disagreements are inert on short timescales—but not over years.)
* This fails to be useful to the extent we’re not all equally persuasive, biddable, high-status.
The sharing of information can—sometimes—lead to more conservative funding, due to people weighting other people’s weak points more heavily than their strong points. See here for a really fascinating paper in the economics of science: https://pubsonline.informs.org/doi/pdf/10.1287/mnsc.2021.4107
This is a risk, but we’ll still have the pre-test rankings and can probably do something clever here.
Fwiw, I’d imagine you are all less likely to succumb to weighting other evaluators’ negative points (different interests at play than for journal reviewers), but there may still be a bias here.
Piqued my curiosity: what sort of clever thing?
Babbling:
Allocating some of the funding using the pre-test rankings;
or the other way, using the diff between pre and post as a measure for how bad/fragile the pre was;
otherwise, working out whether each evaluator leans under- or over-confident and using this to correct their post ranking (see the sketch below).
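A minimal sketch of that last idea, assuming toy pre/post rating matrices; the numbers, the lean estimate, and the shrinkage rule are all made up for illustration rather than a worked-out method:

```python
import numpy as np

# Toy data: rows = evaluators, columns = projects; 0-10 ratings (all made up).
pre = np.array([[7.0, 4.0, 8.5],
                [6.0, 5.0, 6.5],
                [9.0, 2.0, 9.5]])
post = np.array([[6.5, 4.5, 8.0],
                 [6.0, 5.0, 7.0],
                 [8.5, 3.0, 9.0]])

# How far each evaluator sits from the pre-test consensus, on average:
# positive = tends to rate above the group, negative = below.
pre_consensus = pre.mean(axis=0)
lean = (pre - pre_consensus).mean(axis=1, keepdims=True)

# How fragile each evaluator's pre ratings were: how much discussion moved them.
fragility = np.abs(post - pre).mean(axis=1)

# Remove each evaluator's estimated lean from their post ratings, then
# aggregate, down-weighting the evaluators whose ratings moved the most.
corrected = post - lean
weights = 1.0 / (1.0 + fragility)
final_scores = (weights[:, None] * corrected).sum(axis=0) / weights.sum()

print("evaluator lean:", lean.ravel())
print("fragility (mean |post - pre|):", fragility)
print("aggregated scores:", final_scores)
```

Since the pre-test rankings are kept either way, any correction like this can be compared against the plain post-test aggregate to see how much it actually changes the ordering.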
Thanks Gavin!