Hi Vasco, I hope you do not mind two follow-up questions: Why does Metaculus default to “resolve time” when in your analysis you think it is better to present “all times”? And given my goal of using Metaculus, which “evaluated at” setting should I pick?
The first vibe I get from this is that Metaculus is cherry-picking a method of evaluation that makes their predictions look better than they are. But then I think that it cannot be that bad, the crew behind Metaculus seem really scientifically minded and high integrity. So I guess the reason for different methods is that they serve different purposes.
I then spent 10 minutes thinking about what the difference was, got a headache and thought I would ask you in case it takes you 2 minutes to respond or refer me to some online explanation.
My goal is to give “regular” people (university educated and well read, but who have not spent time thinking about risks or forecasting) confidence in Metaculus’ ability to predict future catastrophes (>10% population decline in <5 years) as well as the source of these (these types of questions). I want to demonstrate to people these are probably the best estimates available of what threats society and individuals are most likely to face in the coming decades and therefore a good way to think about how to build resilience against these threats.
Thanks again for your excellent work and for your patience with my questions.
Thanks for the follow-up questions!
The Brier score evaluated at “all times” applies to the whole period during which the question was open. It is the mean Brier score over that period, i.e. the one I would expect to see if I selected a random time during which the question was open. I used it because it contains more information than a score evaluated at a single point in time.
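To make the difference concrete, here is a minimal sketch in Python. It is not Metaculus’ actual implementation: the function names and the forecast data are made up for illustration, and I am assuming the forecast is a step function that holds its value until the next update.

```python
from bisect import bisect_right


def brier_at(times, probs, outcome, t):
    """Brier score of the forecast in force at time t (hypothetical helper)."""
    i = bisect_right(times, t) - 1  # index of the latest update at or before t
    if i < 0:
        raise ValueError("t is before the first forecast")
    return (probs[i] - outcome) ** 2


def brier_all_times(times, probs, outcome, close_time):
    """Time-weighted mean Brier score from the first forecast until close_time."""
    total = 0.0
    for i, (start, p) in enumerate(zip(times, probs)):
        # Each forecast counts for as long as it stayed in force.
        end = times[i + 1] if i + 1 < len(times) else close_time
        total += (p - outcome) ** 2 * (end - start)
    return total / (close_time - times[0])


# Made-up question: forecast updated on days 0, 60 and 90, closing on day 100,
# and resolving positively (outcome = 1).
times = [0, 60, 90]
probs = [0.5, 0.8, 0.9]
outcome = 1

score_all_times = brier_all_times(times, probs, outcome, close_time=100)
score_at_close = brier_at(times, probs, outcome, t=100)
```

In this made-up example the score at close is about 0.01, whereas the “all times” score is about 0.163, because the early and less informed forecast of 0.5 was in force for most of the period. A single evaluation time only captures one point of that history.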
I think the setting one should pick depends on the context. If you are looking into:
A question which has already closed, but has not yet resolved, I would pick “close time”.
A question which is still open, I would check “all times”, and “other time” matching your current conditions (for example, 1 year “prior to resolve time”). The less data I had for the “other time” option, the more weight I would give to “all times” (everything else equal).
I think it is hard to know how reliable Metaculus’ predictions will be with respect to these questions, as Metaculus’ track record does not yet contain data about long-range questions. There are only 8 questions whose Brier score can be evaluated 5 years prior to resolve time. For communicating risk to your audience, one could try to make a case for the possibility of the next few decades being wild (if Metaculus’ near-term predictions about AI are to be trusted), and the possibility of this being the most important century.
No worries; you are welcome!