It might be very costly, perhaps impractically so, to collect enough training data to make up for what a simulated environment gives you, namely responsiveness to the choices an agent makes. An agent can actively test and explore its environment, and a fixed dataset can’t offer that flexibility without possibly impractical amounts of data. You’d need to anticipate how the environment would respond to each choice, and you’d basically be filling in the entries of a giant lookup table with the responses you anticipate.
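A toy sketch of the lookup-table point, purely illustrative: the environment and the names `step`, `rollout`, and `build_lookup_table` are all made up for this example, and the numbers say nothing about realistic tasks. It just shows that a fixed set of logged trajectories covers a small fraction of the possible action sequences, while a simulator can answer any of them on demand.

```python
import itertools
import random

ACTIONS = ("left", "right", "stay")  # hypothetical action set


def step(state: int, action: str) -> int:
    """Toy simulator: a position on a line that responds to any action."""
    return state + {"left": -1, "right": 1, "stay": 0}[action]


def rollout(actions) -> int:
    """Run the simulator from state 0 over a sequence of actions."""
    state = 0
    for action in actions:
        state = step(state, action)
    return state


def build_lookup_table(num_logged: int, horizon: int, seed: int = 0) -> dict:
    """Stand-in for collected data: outcomes for randomly logged sequences only."""
    rng = random.Random(seed)
    table = {}
    for _ in range(num_logged):
        seq = tuple(rng.choice(ACTIONS) for _ in range(horizon))
        table[seq] = rollout(seq)
    return table


horizon = 8
table = build_lookup_table(num_logged=500, horizon=horizon)

total = len(ACTIONS) ** horizon  # 3**8 = 6561 possible action sequences
coverage = len(table) / total
print(f"logged sequences cover {coverage:.1%} of {total} possibilities")

# Pick a sequence the logged data never anticipated.
novel = next(
    seq
    for seq in itertools.product(ACTIONS, repeat=horizon)
    if seq not in table
)
print("lookup table:", table.get(novel))  # the fixed data has no answer
print("simulator:   ", rollout(novel))    # the simulator still responds
```

The number of possible sequences grows exponentially with the horizon, so the logged data would have to grow just as fast to keep the same coverage, while the simulator stays the same few lines.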
AlphaGo was originally pretrained to mimic experts, but extra performance came from simulated self-play, and a later version, AlphaGo Zero, skipped the expert-mimicking pretraining entirely.
It’s plausible, though, that the environments don’t need to be very complex or detailed, to the point that most of the operations would still be spent running the AI rather than the environment.
You don’t necessarily need to collect training data; that’s why RL works. And simulating an environment is potentially cheap, as you noted. So again, I’m unconvinced that this is actually a problem with bio anchors, at least above and beyond what Cotra says in the report itself.