“Ideally, you’d want the star ratings to be based on the calibration and resolution that now-resolved questions from that platform (or of that type, or similar) have tended to have in the past. But there’s not yet enough data to allow that. So you asked people who know about each platform to give their best guess as to how each platform has historically compared in calibration and resolution.”
Yes. Note that I didn’t actually ask them for their guess directly; I asked them to guess a function from various parameters to stars (e.g. “3 stars, but 2 stars when the probability is higher than 90% or less than 10%”).
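To make "a function from various parameters to stars" concrete, here is a minimal sketch of the example rule quoted above. The function name, its signature, and the choice of probability as the only parameter are assumptions for illustration; the functions people actually gave may have used other parameters.

```python
def stars_for_forecast(probability: float, base_stars: int = 3) -> int:
    """Hypothetical rule: base_stars, minus one star for extreme probabilities.

    Mirrors the quoted example: "3 stars, but 2 stars when the probability
    is higher than 90% or less than 10%".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # "Extreme" means strictly above 90% or strictly below 10%.
    is_extreme = probability > 0.90 or probability < 0.10
    return base_stars - 1 if is_extreme else base_stars


print(stars_for_forecast(0.50))  # mid-range probability keeps the base rating
print(stars_for_forecast(0.95))  # extreme probability loses one star
```

A rule like this lets the rating vary within a platform rather than assigning one fixed number of stars to everything it hosts.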
Or maybe people gave their best guess as to how the platforms will compare on those fronts, based on who uses each platform, what incentives it has, etc.?
Also yes, past performance is highly indicative of future performance.
Also, unlike some other uses for platform comparison, if one platform systematically had much easier questions which it almost always got right, I’d want to give it a higher score (but perhaps show its questions later in the search results, because they might be somewhat trivial).