Hi Charles, thanks for all the comments! I’ll reply to this one first since it seems like the biggest crux. I completely agree with you that feedforward NNs != RNN/LSTM… and that I haven’t given a crisp argument that the latter can’t scale to TAI. But I don’t think I claim to in the piece! All I wanted to do here was to (1) push back against the claim that the UAT for feedforward networks provides positive evidence that DL->TAI, and (2) give an example of a strategy that could be used to argue in a more principled way that other architectures won’t scale up to certain capabilities, if one is able to derive effective theories for them as was done for MLPs by Roberts et al. (I think it would be really interesting to show this for other architectures and I’d like to think more about it in the future.)
Is the UAT mentioned anywhere in the bio anchors report as a reason for thinking DL will scale to TAI? I didn’t find any mentions of it when quickly Ctrl+F-ing through the 4 parts and the appendices.
Yes, it’s mentioned on page 19 of part 4 (as point 1, and my main concern is with point 2b).
Ah, thanks for the pointer.