You don’t necessarily need to collect training data; that’s why RL works. And simulating an environment is potentially cheap, as you noted. So again, I’m unconvinced that this is actually a problem with bio anchors, at least above and beyond what Cotra says in the report itself.