This paper has at least two significant flaws when used to estimate relative complexity for useful purposes. In the authors' defense, such an estimate wasn't the main motivation of the paper, but the Quanta article is all about estimation, and the paper doesn't mention the flaws.
Flaw one: no reversed control
Say we have two parameterized model classes A_n and B_n, and ask what values of n are necessary for A_n to approximate B_1 and for B_n to approximate A_1. It is trivial to construct model classes for which n is large in both directions, simply because A_1 is a much better algorithm for approximating A_1 than any B_n is, and vice versa. I'm not sure how much this cuts the 1000 estimate down, but it could easily be 10x.
Brief Twitter thread about this: https://twitter.com/geoffreyirving/status/1433487270779174918
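To make the construction concrete, here is a minimal numerical sketch. Everything in it is a toy of my own choosing, not anything from the paper: A_n is a sum of n ReLU units, B_n is a sum of n sinusoid units (stood in for by fixed Fourier features so the fit is linear least squares), and we fit one unit of each class with n units of the other.

```python
# Toy reversed-control experiment (my construction, not the paper's):
#   A_n = sums of n ReLU units (piecewise-linear functions)
#   B_n = sums of n sinusoid units, approximated by fixed Fourier features
# Fit one unit of each class with n units of the other by linear least
# squares, and watch how slowly the error shrinks in *both* directions.
import numpy as np

x = np.linspace(-np.pi, np.pi, 2000)

def relu_features(n):
    """Design matrix for A_n: n ReLU units with biases across the interval."""
    biases = np.linspace(-np.pi, np.pi, n)
    return np.maximum(x[:, None] - biases[None, :], 0.0)

def fourier_features(n):
    """Design matrix standing in for B_n: constant + n sine/cosine pairs."""
    cols = [np.ones_like(x)]
    for k in range(1, n + 1):
        cols.append(np.sin(k * x))
        cols.append(np.cos(k * x))
    return np.stack(cols, axis=1)

def rms_error(features, target):
    """Least-squares fit of target on the features; root-mean-square error."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return np.sqrt(np.mean((features @ coef - target) ** 2))

b_one = np.sin(8 * x)        # a single sinusoid unit, one member of the B class
a_one = np.maximum(x, 0.0)   # a single ReLU unit, one member of the A class

for n in (2, 8, 32, 128):
    print(f"n={n:4d}   A_n -> B_1 error: {rms_error(relu_features(n), b_one):.4f}"
          f"   B_n -> A_1 error: {rms_error(fourier_features(n), a_one):.4f}")
```

Here A_1 reproduces itself with a single unit while B_n needs many, and vice versa, so a large one-directional n says as much about the mismatch between the two families as it does about the "complexity" of the target.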
Flaw two: no scaling w.r.t. multiple neurons
I don’t see any reason to believe the 1000 factor would remain constant as you add more neurons, so that we’re approximating many real neurons with many (more) artificial neurons. In particular, it’s easy to construct model classes where the factor decays to 1 as you add more real neurons: if, say, most of the 1000 artificial units go into building machinery that the real neurons could share, then k real neurons need roughly 1000 + k artificial units, and the per-neuron factor (1000 + k)/k tends to 1 (see the sketch below). I don’t know how strong this effect is, but again there is no discussion or estimation of it in the paper.
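Here is that amortization argument as a few lines of arithmetic. This is a hypothetical cost model, not anything measured in the paper; the shared = 1000 figure just reuses the paper's headline number for illustration.

```python
# Hypothetical cost model: suppose most of the ~1000 artificial units per
# biological neuron build shared machinery (e.g. a common feature basis) that
# k biological neurons could reuse. Then k neurons cost roughly shared + k
# units rather than 1000 * k, and the per-neuron factor decays toward 1.
def per_neuron_factor(k, shared=1000):
    """Artificial units per real neuron when the expensive part is shared."""
    return (shared + k) / k

for k in (1, 10, 100, 1000, 100000):
    print(f"k={k:6d}   artificial units per real neuron: {per_neuron_factor(k):.2f}")
```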
Thanks, these are both excellent points. I did hint at the first one, and I specifically came back to this post to mention the second, but you beat me to it. ;)
I’ve edited my post.
EDIT: Also edited again to emphasize the weaknesses.