I’m entirely unconvinced that this is a relevant concern—if training data is the equivalent of complex environments, we kind-of get it for free, and even where we don’t, we can simulate natural environments and other agents much more cheaply than nature.
if training data is the equivalent of complex environments, we kind-of get it for free
Don’t disagree
we can simulate natural environments and other agents much more cheaply than nature
Also don’t disagree, but this is a matter of degree, no? For example, I’m thinking that having an environment with many agents acting on each other and on the environment would make the training process less parallelizable.
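To make the degree-of-parallelism point concrete, here is a toy sketch (all names and dynamics are made up for illustration): N independent single-agent environments can be stepped on separate workers with no coordination, whereas a shared environment with interacting agents forces a single synchronised step over the joint action.

```python
from concurrent.futures import ThreadPoolExecutor

def step_independent(state, action):
    # each copy's transition depends only on its own state and action
    return state + action

def step_shared(states, actions):
    # each agent's next state depends on everyone's actions, so the whole
    # joint step has to be computed together
    total = sum(actions)
    return [s + (total - a) for s, a in zip(states, actions)]

states = [0] * 8
actions = [1, 2, 1, 3, 1, 2, 1, 3]

# independent environments: embarrassingly parallel across workers
with ThreadPoolExecutor() as pool:
    stepped = list(pool.map(step_independent, states, actions))

# shared multi-agent environment: one synchronised joint step
stepped_shared = step_shared(states, actions)
print(stepped, stepped_shared)
```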
Personally I found it pretty hard to give a number to “the least complex environment which could give rise to intelligent life”; if you have thoughts on how to bound this I’d be keen to hear them.
That makes sense, and I think we’re mostly agreeing—it just seemed like you were skipping this entirely in your explanation.
It might be very costly, perhaps impractically costly, to collect training data that can make up for the responsiveness of a simulated environment to the choices an agent makes. An agent can actively test and explore their environment in a way that collected training data can’t support flexibly without possibly impractical amounts of it. You’d need to anticipate how the environment would respond to whatever the agent might try, and you’d basically be filling in the entries of a giant lookup table of anticipated responses.
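As a rough illustration of that lookup-table worry (toy numbers of my own choosing, not anything from the report): a simulator computes responses on demand for whatever the agent actually tries, while a pre-collected dataset would have to anticipate every action sequence the agent might explore.

```python
def simulator_step(state, action):
    # toy dynamics, computed on demand for whatever the agent actually tries
    return (state * 31 + action) % 1_000

# the simulator answers arbitrary probes as the agent makes them
state = 0
for action in (3, 1, 4):
    state = simulator_step(state, action)

branching = 10       # assumed number of actions available per step
horizon = 20         # assumed episode length

# a pre-collected "lookup table" would need a response entry for every
# action sequence the agent could choose to explore...
entries_needed = branching ** horizon
print(f"~{entries_needed:.1e} anticipated responses")   # ~1.0e+20

# ...while a simulator only pays for the trajectories actually taken
episodes = 1_000_000
print(f"{episodes * horizon:.1e} simulated steps")       # 2.0e+07
```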
AlphaGo was originally pretrained to mimic experts, but the extra performance came from simulated self-play, and the next version (AlphaGo Zero) skipped the expert-mimicking pretraining entirely.
It’s plausible, though, that the environments don’t need to be very complex or detailed, to the point that most of the operations are still in the AI rather than in the environment.
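To put a (made-up) number on that: even a fairly small policy network dwarfs the cost of a simple simulated environment step, so the environment can be quite crude and the compute budget is still dominated by the model. Both figures below are assumptions chosen only for illustration.

```python
env_ops_per_step = 10                      # assumed cost of a simple gridworld-style step

# assumed small policy network: input -> two hidden layers -> action logits
layer_sizes = [1_000, 4_096, 4_096, 10]
policy_ops_per_step = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(f"environment: {env_ops_per_step} ops/step")
print(f"policy network: {policy_ops_per_step:,} multiply-adds/step")
print(f"ratio: ~{policy_ops_per_step // env_ops_per_step:,}x")
```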
You don’t necessarily need to collect training data; that’s why RL works. And simulating an environment is potentially cheap, as you noted. So again, I’m unconvinced that this is actually a problem with bio anchors, at least above and beyond what Cotra says in the report itself.
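For concreteness, here is a minimal self-play sketch in the spirit of the AlphaGo point above (my own toy example, not AlphaGo’s actual algorithm; the game, the tabular value function, and the update rule are all assumptions chosen for brevity). The same value table plays both sides of a one-pile Nim variant and improves purely from the outcomes of its own games, with no collected expert data at all.

```python
import random
from collections import defaultdict

ACTIONS = (1, 2, 3)    # a player removes 1-3 stones per turn
START = 12             # assumed starting pile size

def play_game(value, eps=0.1):
    """Both sides share the same value table; returns the move history and the winner."""
    pile, player, history = START, 0, []
    while pile > 0:
        moves = [a for a in ACTIONS if a <= pile]
        if random.random() < eps:
            action = random.choice(moves)
        else:
            # pick the move that leaves the opponent in the worst-looking position
            action = min(moves, key=lambda a: value[pile - a])
        history.append((pile, player))
        pile -= action
        player ^= 1
    return history, player ^ 1   # whoever took the last stone wins

def train(games=50_000, lr=0.05):
    value = defaultdict(float)   # value[pile] = estimated value for the player to move
    value[0] = -1.0              # an empty pile means the player to move has already lost
    for _ in range(games):
        history, winner = play_game(value)
        for pile, player in history:
            target = 1.0 if player == winner else -1.0
            value[pile] += lr * (target - value[pile])
    return value

if __name__ == "__main__":
    v = train()
    # under optimal play, piles that are multiples of 4 are losses for the player
    # to move, so their learned values should come out negative
    print({p: round(v[p], 2) for p in range(1, START + 1)})
```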