Thanks, Peter!
To your questions:
I’m fairly confident (let’s say 80%) that Metaculus has underestimated progress on benchmarks so far. That doesn’t mean it will keep doing so in the future, because (i) forecasters may have learned from this experience to be more bullish and/or (ii) AI progress might slow down. I wouldn’t bet on (ii), but I expect (i) has already happened to some extent; it has certainly happened to me!
The other categories have fewer questions and some have special circumstances that make the evidence of bias much weaker in my view. Specifically, the biggest misses in “compute” came from GPU price spikes that can probably be explained by post-COVID supply chain disruptions and increased demand from crypto miners. Both of these factors were transient.
I like your example with the two independent dice. My takeaway is that, if you have access to a prior that’s more informative than a uniform distribution (in this case, “both dice are unbiased so their sum must follow a triangular distribution”), then you should compare your performance against that prior instead. My assumption when writing this was that a (log-)uniform prior over the relevant range was the best we could do for these questions. This is in line with the fact that Metaculus’s log score on continuous questions is normalized using a (log-)uniform distribution.
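To make the dice point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not Metaculus’s actual scoring code: the `forecaster` distribution is a hypothetical 50/50 blend of the uniform and triangular distributions, and the “relative score” is simply the difference in expected log score (in nats) between the forecaster and a baseline.

```python
import math


def two_dice_sum_distribution():
    """Exact distribution of the sum of two fair six-sided dice (triangular)."""
    counts = {}
    for a in range(1, 7):
        for b in range(1, 7):
            counts[a + b] = counts.get(a + b, 0) + 1
    return {s: c / 36 for s, c in counts.items()}


def expected_log_score(forecast, truth):
    """Expected log score (nats) of `forecast` when outcomes are drawn from `truth`."""
    return sum(p * math.log(forecast[s]) for s, p in truth.items())


# The informative prior: the true distribution of the sum of two fair dice.
triangular = two_dice_sum_distribution()

# The uninformative baseline: uniform over the 11 possible sums (2 through 12).
uniform = {s: 1 / 11 for s in range(2, 13)}

# Hypothetical forecaster (an assumption for illustration): halfway between the two.
forecaster = {s: 0.5 * uniform[s] + 0.5 * triangular[s] for s in uniform}

# Score relative to each baseline; positive means the forecaster beats the baseline.
score_vs_uniform = (
    expected_log_score(forecaster, triangular)
    - expected_log_score(uniform, triangular)
)
score_vs_triangular = (
    expected_log_score(forecaster, triangular)
    - expected_log_score(triangular, triangular)
)

print(f"relative to uniform baseline:    {score_vs_uniform:+.4f}")
print(f"relative to triangular baseline: {score_vs_triangular:+.4f}")
```

The same forecaster looks skilled against the uniform baseline (positive relative score) and unskilled against the triangular one (negative relative score), so the verdict depends on which prior is the fair comparison.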
That’s a good point re: different time horizons. I didn’t bother to check the average time between close and resolution for all questions on the platform, but, assuming it’s <<1 year as you suggest, I agree it’s an important caveat. If you know that number off the top of your head, I’ll add it to the post.
Thanks for writing this!
Since your decision seems to come down to the expected positive effect on your happiness, I’m curious whether you considered even cheaper happiness-boosting interventions. For example, hundreds (thousands?) of hours of meditation might give you the “love, belonging, connection” and “personal growth” benefits with fewer downsides, though this might work less reliably than having kids.