Thanks for your response. I’ll just respond to a couple things.
Re Constitutional AI: I agree normatively that it seems bad to hand over judging AI debates to AIs[1]. I also think this will happen. To quote from the original AI Safety via Debate paper,
Human time is expensive: We may lack enough human time to judge every debate, which we can address by training ML models to predict human reward as in Christiano et al. [2017]. Most debates can be judged by the reward predictor rather than by the humans themselves. Critically, the reward predictors do not need to be as smart as the agents by our assumption that judging debates is easier than debating, so they can be trained with less data. We can measure how closely a reward predictor matches a human by showing the same debate to both.
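(As an aside, a minimal sketch of that last measurement, showing the same debate to both the human and the reward predictor and checking how often they agree, could look like the below. This is my own illustration, not code from the paper, and all names in it are hypothetical.)

```python
# Hypothetical sketch: agreement between a human judge and a reward predictor
# when both are shown the same debates (an illustration, not code from the paper).
from typing import Callable, Sequence

def judge_agreement(
    debates: Sequence[str],
    human_judge: Callable[[str], int],       # returns index of the winning debater
    reward_predictor: Callable[[str], int],  # trained to predict the human's pick
) -> float:
    """Fraction of debates where the predictor picks the same winner as the human."""
    matches = sum(human_judge(d) == reward_predictor(d) for d in debates)
    return matches / len(debates)
```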
Re
We’d also really contest the ‘perform very similarly to human raters’ is enough—it’d be surprising if we already have a free lunch, no information lost, way to simulate humans well enough to make better AI.
I also find this surprising, or at least I did the first 3 times I came across medium-quality evidence pointing this direction. I don’t find it as surprising any more because I’ve updated my understanding of the world to “welp, I guess 2023 AIs actually are that good on some tasks.” Rather than making arguments to try and convince you, I’ll just link some of the evidence that I have found compelling, maybe you will too, maybe not: Model Written Evals, MACHIAVELLI benchmark, Alpaca (maybe the most significant for my thinking), this database, Constitutional AI.
I’m far from certain that this trend, of LLMs being useful for making better LLMs and for replacing human feedback, continues rather than hitting a wall in the next 2 years, but it does seem more likely than not to me, based on my read of the evidence. Some important decisions in my life depend on how soon this AI stuff happens (for instance, if we have 20+ years, I should probably aim to do policy work), so I’m pretty interested in having correct views. Currently, LLMs improving the next generation of AIs via more and better training data is one of the key factors in how I’m thinking about this. If you don’t find these particular pieces of evidence compelling and can explain why, that would be useful to me!
I’m actually unsure here. I expect there are cases where it’s fine to have no humans in the loop and others where it’s critical. Taking humans out of the loop generally gives me the ick, but I expect there are cases where I would think it’s the right call.
Makes sense that this would be a big factor in AI timelines and in what to do with our time. We’re surprised too by how AI can outperform expectations, as in the sources you cited.
We’d still characterize creating synthetic data as a wide-open problem, rather than having high confidence that naive approaches using current LMs will just work. Rather than parsing individual sources, here’s a general intuition. We wouldn’t expect making the dataset bigger by just repeating the same example over and over to work. We generate data by having ‘models’ of the original data generators, humans. If we knew exactly what made human data ‘good,’ we could optimize directly for it and simplify massively (this runs into the well-defined-eval problem again: we can craft datasets to beat benchmarks, of course).
An analogy (a disputed one, to be fair) is Ted Chiang’s lossy compression. So for every case of synthetic data working, there are also cases where it fails, like the Shumailov et al. paper we cited. If we knew exactly what made human data ‘good,’ we’d argue you wouldn’t see labs continue to ramp up hiring contractors specifically to generate high-quality data in expert domains like programming.
A fun exercise: take a very small open-source dataset, train your own very small LM, and have it augment (double!) its own dataset. Try different prompts, and plot n-gram distributions against the original data. Can you get a behavior out of the next generation that looks like magic compared to the previous one, or does improvement plateau? You may have nitpicks with this experiment, but I don’t think it’s that different from what’s happening at large scale.
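In case it’s useful, here’s a minimal sketch of roughly that experiment. It uses a simple bigram model as a stand-in for ‘a very small LM’ (an assumption on my part; the qualitative question about distribution drift should carry over), and the tiny corpus and names are hypothetical.

```python
# Minimal sketch of the exercise, with a bigram model standing in for a tiny LM.
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, how often each successor follows it."""
    model = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        model[a][b] += 1
    return model

def sample(model, start, length):
    """Generate `length` tokens by sampling successors proportional to counts."""
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:
            break
        choices, weights = zip(*successors.items())
        out.append(random.choices(choices, weights=weights)[0])
    return out

def ngram_dist(tokens, n=2):
    """Normalized n-gram frequency distribution."""
    grams = Counter(zip(*(tokens[i:] for i in range(n))))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

# Tiny "original dataset"; swap in any small open-source corpus.
original = "the cat sat on the mat and the dog sat on the log".split()

gen0 = train_bigram(original)
synthetic = sample(gen0, start=original[0], length=len(original))  # "double" the data
augmented = original + synthetic

# How far has the bigram distribution drifted after one generation of augmentation?
d_orig, d_aug = ngram_dist(original), ngram_dist(augmented)
support = set(d_orig) | set(d_aug)
drift = 0.5 * sum(abs(d_orig.get(g, 0) - d_aug.get(g, 0)) for g in support)
print(f"Total-variation drift after one generation: {drift:.3f}")
```

Iterate the train/sample step for a few generations and watch whether the drift shrinks or compounds; that is roughly the question Shumailov et al. study at scale.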