It might be very costly, perhaps impractically costly, to collect training data that can make up for the responsiveness of a simulated environment to the choices an agent makes. An agent can actively test and explore its environment in a way that collected training data can’t flexibly support without possibly impractical amounts of it. You’d need to anticipate how the environment would respond, and you’d basically be filling in the entries of a giant lookup table for the responses you anticipate.
AlphaGo was originally pretrained to mimic experts, but extra performance came from simulated self-play, and a later version, AlphaGo Zero, skipped the expert-mimicking pretraining entirely.
It’s plausible, though, that the environments don’t need to be very complex or detailed, to the point that most of the operations are still in the AI.
You don’t necessarily need to collect training data; that’s why RL works. And simulating an environment is potentially cheap, as you noted. So again, I’m unconvinced that this is actually a problem with bio anchors, at least above and beyond what Cotra says in the report itself.