Yeah, good find. I also think that passes the bar, although I do think people have generally overestimated GPT’s essay-writing ability compared to humans, and I might be falling for that here.
I’m not planning to change the doc because Bing’s AI wasn’t released by Feb 23, but if you think it should be included (which would be reasonable given OpenAI pretty obviously made this before Feb 23), it would mean:
Experts expected 9 milestones to be met, versus 11 actually met
The calibration curve looks four percentage points worse at the 10% mark
Bulls’ Brier score: 0.29
Experts’ Brier score: 0.24
Bears’ Brier score: 0.29
I’ve added it to this tracker of milestones (feel free to request edit access).
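For anyone who wants to check how adding or removing a milestone moves those Brier scores, here is a minimal sketch using the standard definition (mean squared difference between the forecast probability and the 0/1 outcome). The function name and the example numbers are hypothetical, not the survey data, and the exact calculation behind the figures above may differ.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect score, 1.0 is the worst possible.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical probabilities for four milestones (NOT the survey numbers),
# with 1 meaning the milestone was met and 0 meaning it was not.
example_forecasts = [0.1, 0.5, 0.9, 0.3]
example_outcomes = [0, 1, 1, 0]
example_score = brier_score(example_forecasts, example_outcomes)

# Reference point: always answering 50% scores 0.25 whatever the outcomes are.
coin_flip_score = brier_score([0.5] * 4, example_outcomes)
```

Reclassifying a milestone as met just flips its entry in the outcomes list from 0 to 1, so any forecaster who gave it a low probability gets a worse score.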