This one might belong on LW or the AF instead of (or as well as) here, but I'd like to write a post about:
- Should we try to keep some or all alignment research from casually making it into the training sets of frontier AI models?
- If so, what means do we have to do this, and how do they fare on the trade-off between reducing AI access and reducing human access?
I made this into two posts, my first on LessWrong:
- Keeping content out of LLM training datasets
- Should we exclude alignment research from LLM training datasets?
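
To make the "what means" question concrete: one commonly discussed mechanism is a robots.txt opt-out, which several documented training crawlers (e.g. OpenAI's GPTBot and Common Crawl's CCBot) state they respect. Below is a minimal sketch, assuming the example.com URLs are placeholders and that the listed user-agent tokens are still the relevant ones; it checks whether a given page is blocked for those crawlers using Python's standard-library robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens publicly associated with LLM training data collection.
# This list is an assumption and would need to be kept up to date.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai"]

def training_crawler_access(robots_url: str, page_url: str) -> dict[str, bool]:
    """Return, per crawler token, whether robots.txt permits fetching page_url."""
    parser = RobotFileParser(robots_url)
    parser.read()  # fetches and parses the live robots.txt
    return {agent: parser.can_fetch(agent, page_url) for agent in AI_CRAWLERS}

if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    access = training_crawler_access(
        "https://www.example.com/robots.txt",
        "https://www.example.com/alignment-post",
    )
    for agent, allowed in access.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Note that this mechanism only constrains compliant crawlers going forward; it does nothing about mirrors of the content or scrapes already in existing dumps, which is part of why the AI-access vs. human-access trade-off is worth examining.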