By the way, the report “Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament” was evaluated by The Unjournal – see unjournal.pubpub.org. Please let us know if you found our evaluation useful and how we can do better; we’re working to measure and boost our impact. You can email us at contact@unjournal.org, and we can schedule a chat. (Semi-automated comment)
We’re working to track our impact on evaluated research (see coda.io/d/Unjournal-...). As part of this, we asked Claude 4.5 to consider the differences across versions of the paper, how those differences relate to the Unjournal evaluators’ suggestions, and whether the evaluation was likely to have caused the changes.
See Claude’s report here, generated from the prompts here. Claude’s assessment: some changes appear to reflect the evaluators’ comments, but several major suggestions, such as the request for additional statistical metrics and inference, were not implemented.
The evaluators (or Claude) may have gotten this wrong, or these changes may not have been warranted under the circumstances. We’re all about an open research conversation, and we invite responses from the authors and others.
Here’s the Unjournal evaluation package.
A version of this work has been published in the International Journal of Forecasting under the title “Subjective-probability forecasts of existential risk: Initial results from a hybrid persuasion-forecasting tournament”.