The o1-preview and Claude 3.5-powered template bots did pretty well relative to the rest of the bots.
As I think about it, this surprises me a bit. Did participants have access to these early on?
If so, it seems like many participants underperformed the examples/defaults? That seems kind of underwhelming. I guess it’s easy to make a lot of changes that seem good at the time but wind up hurting performance when tested. Of course, this raises the point that it’s concerning that there wasn’t any faster/cheaper way of testing these bots first. Something seems a bit off here.
Yes, they’ve had access to the template from the get-go, and I believe a lot of people built their bots on the template. I guess it doesn’t surprise me that much. Just another case of KISS.
That said, pgodzinai did layer quite a lot of things, albeit in under 40 hours, and did remarkably well, peer score-wise (compared to his bot peers). And no one did any fine-tuning afaik, which plausibly could improve performance.
As for a faster/cheaper way to test the bots: we’re working on something to address this!