TBH, I think the time spent scoring rationales is probably quite manageable: I don’t think it should take more than 30 person-minutes to judge each rationale decently (e.g., three judges each spending 10 minutes on it), and maybe less. It might be difficult to have results within 1-2 hours if you don’t have many judges, but they should probably be available by the end of the day.
To be clear, I was thinking that only a small number (no more than three, maybe just two) of the total questions should be “rationale questions.”
But the answer to “do rationale scores correlate with performance?” would definitely be valuable information! I’m not sure whether the literature has ever examined this (I don’t think I’ve encountered anything like that, but I haven’t actively searched for it).
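
For what it's worth, checking the “do rationale scores correlate with performance” question could be pretty simple once the judges' scores are in. Here's a rough sketch, assuming each participant ends up with one averaged rationale score and one overall performance score where higher is better (both of those are my assumptions, and the numbers below are made-up placeholders, not real data). I used a rank (Spearman) correlation since judge scores are probably only roughly ordinal, but that's just one reasonable choice:

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical placeholder data: one entry per participant.
    rationale_scores = [3.0, 4.5, 2.0, 5.0, 3.5]    # mean of the judges' scores
    performance = [0.62, 0.71, 0.55, 0.68, 0.70]    # higher = better (assumed)
    print(f"Spearman rho = {spearman(rationale_scores, performance):.2f}")
```

With only two or three rationale questions and a modest number of participants, any coefficient from this would be very noisy, so I'd treat it as a first look rather than a real finding.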