I didn’t claim that there isn’t plenty more data. But a relevant question is: plenty more data for what? He says that the data situation looks pretty good, which I trust is true in many domains (e.g. video data), and that this data would probably, in turn, improve performance in those domains. But I don’t see him claiming that the data situation looks good in terms of ensuring significant performance gains across all domains, which would be a more specific and stronger claim.
Moreover, the deference question could be posed in the other direction as well, e.g. do you not trust the careful data collection and projections of Epoch? (Though again, Ilya saying that the data situation looks pretty good is arguably not in conflict with Epoch’s projections, nor with any claim I made above, mostly because his brief “pretty good” remark is quite vague.)
Note also that, at least in some domains, OpenAI could end up having less data to train their models with going forward, as they might have been using data illegally.
Let’s hope that OpenAI is forced to pull GPT-4 over the illegal data harvesting used to create it.