I expect that once there have been quite a few Unjournal reviews, people will attempt to compare scores across projects. That is, I can imagine a world in which a report or paper receives a 40/100 and people point out, “This is below the 60/100 of the average article in the same area.” How useful do you expect such comparisons to be?
I’m hoping these will be very useful, if we scale up enough. I also want to work to make these scores more concretely grounded and tied to specific benchmarks and comparison groups. I hope to do better at operationalizing specific predictions[1] and to use well-justified tools for aggregating individual evaluation ratings into reliable metrics (e.g., potentially partnering with initiatives like RepliCATS).
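As a rough illustration of the kind of aggregation I have in mind (this is just a sketch of one candidate approach, not a settled Unjournal method), one could precision-weight each evaluator’s rating on a given dimension by the width of their stated 90% credible interval:

```python
# Illustrative sketch only: combine several evaluators' ratings on one
# dimension into a single precision-weighted score. Each evaluator gives a
# midpoint and a 90% credible interval on a 0-100 scale; we treat the interval
# as roughly +/- 1.645 standard deviations of a normal distribution and weight
# each midpoint by the inverse of the implied variance.

import math

def aggregate_ratings(ratings):
    """ratings: list of (midpoint, lower_90, upper_90) tuples on a 0-100 scale."""
    weights, weighted_sum = [], 0.0
    for mid, lo, hi in ratings:
        sd = (hi - lo) / (2 * 1.645)           # implied standard deviation
        w = 1.0 / (sd ** 2) if sd > 0 else 0.0  # inverse-variance weight
        weights.append(w)
        weighted_sum += w * mid
    total = sum(weights)
    pooled_mean = weighted_sum / total
    pooled_sd = math.sqrt(1.0 / total)          # naive pooled uncertainty
    return pooled_mean, pooled_sd

# Example: three evaluators rate the same paper on one dimension.
print(aggregate_ratings([(40, 30, 55), (55, 45, 65), (48, 35, 60)]))
```

Whether something this simple is “well-justified” is exactly the kind of question I’d want to work through with meta-science researchers; evaluators’ intervals may be miscalibrated and their errors correlated, which a naive weighting ignores.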
Do you think there could / should be a way to control for the depth of the analysis?
I’m not sure precisely what you mean by ‘control for the depth’. The ratings we currently elicit are multidimensional, and depth should be captured in some of these. But these were basically a considered first pass; I suspect we can do better at coming up with a well-grounded and meaningful set of categories to rate. (And I hope to discuss this further and incorporate the ideas of particular meta-science researchers and initiatives.)
[1] For things like citations, measures of impact, replicability, votes X years later on ‘how impactful was this paper’, etc., perhaps leveraging prediction markets.