I haven’t gone through any of this in detail so I’m sure I’m missing quite a bit of the technical specifics and maybe some of the big picture, but here’s what I’m seeing. It doesn’t seem surprising that a large neural network would perform well on scenarios in the training distribution but generalize poorly. It also does not surprise me that models somehow constrained or incentivised to search for conceptually simpler solutions would discover physical laws that do generalize better. But this just sounds like you’ve re-discovered Occam’s razor. Yes, all science is oriented around searching for laws that have a sort of conceptual simplicity to them, a simplicity that your standard neural network is just not interested in, and that bias toward conceptual simplicity is what allows generalization. So were the systems really designed around the physical laws, or were they simply designed around the scientific method? A human scientist has to be taught Occam’s razor in their training too, so perhaps it should not surprise us that the same holds for ML models.