I’m curious whether there is a viable way to add a short quick-resolving round, either with a subset of questions that will resolve within days or by adding another discipline like Fermi estimation (e.g., https://www.quantifiedintuitions.org/estimation-game)
Human psychology being what it is, getting some sort of results sooner may be reinforcing. From a recruiting/buzz perspective, “our school’s team won a prize at a forecasting tournament last week!” is a lot easier to pitch to the campus newspaper than “we actually won this thing three months ago but just found out...”
I was also going to recommend this, but I’ll just add an implementation idea (which IDK if I fully endorse): you could try to recruit a few superforecasters or subject-matter experts (SMEs) in a given field to provide forecasts on the questions at the same time, then have a reciprocal scoring element (i.e., who came closest to the superforecasters’/SMEs’ forecasts). This is basically what was done in the 2022 Existential Risk Persuasion/Forecasting Tournament (XPT), which Philip Tetlock ran (and I participated in). IDK when the study results for that tournament will be out, and maybe it won’t recommend reciprocal scoring, but it definitely seems worth considering.
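To make that mechanic concrete, a minimal version of reciprocal scoring is just ranking teams by distance from the experts’ aggregate forecast. This is only a toy sketch (not the XPT’s actual method; the names and numbers below are invented), and a real version would probably score against the expert aggregate with a proper scoring rule rather than raw absolute distance:

```python
import statistics

def reciprocal_rankings(team_forecasts, expert_forecasts):
    """Rank teams by total distance from the experts' median forecast.

    team_forecasts:   {team_name: {question_id: probability}}
    expert_forecasts: {question_id: [each expert's probability]}
    Lower total distance = closer to the experts = better rank.
    """
    expert_median = {q: statistics.median(ps) for q, ps in expert_forecasts.items()}
    totals = {
        team: sum(abs(p - expert_median[q]) for q, p in forecasts.items())
        for team, forecasts in team_forecasts.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1])

# Toy usage: team A sits closer to the expert medians, so it ranks first.
teams = {"A": {"q1": 0.7, "q2": 0.2}, "B": {"q1": 0.4, "q2": 0.35}}
experts = {"q1": [0.6, 0.65, 0.7], "q2": [0.25, 0.3, 0.4]}
print(reciprocal_rankings(teams, experts))
```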
A separate idea (which again IDK if I fully endorse but was also in the XPT): have people provide dense rationales for a few big forecasts, then you can rate them on the merits of their rationales. (Yes, this involves subjectivity, but it’s not very different from speech and debate tournaments; the bigger problem could be the time required to review the rationales, but even this definitely seems manageable, especially if you provide a clear rubric, as is common in some competitive speech leagues.)
A trial of #2 would have some information value—you could discern how strong the correlation was between the rationale scores and final standings to decide if rationales were a good way to produce a same-week result.
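If you did run that trial, the analysis itself would be cheap: e.g., a rank correlation between rationale scores and final standings. A quick sketch (the numbers are made up, and scipy is just one convenient way to do it):

```python
from scipy.stats import spearmanr

# Hypothetical post-tournament data, one entry per team (same order in both lists).
rationale_scores = [8.5, 6.0, 9.0, 4.5, 7.0]  # judges' rubric scores, higher = better
final_ranks = [2, 4, 1, 5, 3]                 # final tournament standings, 1 = best

# A strongly negative rho is the hoped-for result here: higher rationale scores
# lining up with better (numerically lower) final ranks.
rho, p_value = spearmanr(rationale_scores, final_ranks)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```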
Maybe you could also use idea #1 with only the top-scoring teams making it to the rationale round, to cut down on time spent scoring rationales?
TBH, I think the time spent scoring rationales is probably quite manageable: I don’t think it should take longer than 30 person-minutes to decently judge each rationale (e.g., have three judges each spend 10 minutes on it), maybe less? It might be difficult to have results within 1-2 hours if you don’t have that many judges, but they should probably be available by the end of the day.
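(And for what it’s worth, the aggregation side is trivial once you have a rubric; something like the sketch below, where the criteria and scores are entirely made up.)

```python
# Hypothetical rubric: each judge scores a rationale 0-5 on a few criteria.
RUBRIC = ("evidence", "reasoning", "calibration awareness")

def rationale_score(judge_sheets):
    """Average the judges' total rubric scores for a single rationale."""
    totals = [sum(sheet[criterion] for criterion in RUBRIC) for sheet in judge_sheets]
    return sum(totals) / len(totals)

# Three judges, ~10 minutes each, one sheet apiece:
print(rationale_score([
    {"evidence": 4, "reasoning": 5, "calibration awareness": 3},
    {"evidence": 3, "reasoning": 4, "calibration awareness": 4},
    {"evidence": 5, "reasoning": 4, "calibration awareness": 3},
]))  # (12 + 11 + 12) / 3 ≈ 11.67
```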
To be clear, I was thinking that only a small number (no more than three, maybe just two) of the total questions should be “rationale questions.”
But the information value of “do rationale scores correlate with performance” would definitely be interesting! I’m not sure whether the literature has ever examined this (I don’t think I’ve encountered anything like it, but I haven’t actively searched for it).
great points!
agreed, quick feedback loops are vital for good engagement + learning. we couldn’t figure out a good way to do it for the pilot, but this is definitely something we’re interested in building out for the next competition.
also, fermi estimation is a great idea — jane street sometimes runs an (unaffiliated) estimathon, but it would be cool to build in an estimation round, or something along those lines. do you have any other ideas for quickly-resolving rounds?
thanks for your thoughts!
~ saul
Metaculus is getting better at writing quickly-resolving questions, and we can probably help write some good ones for the next iteration of OPTIC.
There’s a certain eye one develops for news that is interesting, forecastable, and short-term. Our Beginner tournaments (current, 2023 Q1, 2022 Q4) explicitly only have questions that resolve within 1 week, so you can find some inspiration there.
yeah, i agree — i think we’ll probably rely more heavily on questions in that style for the next iteration of OPTIC. i don’t think we relied enough on existing questions/tournaments (see here).