This one might belong on LW or the AF instead of (or as well as) here, but I'd like to write a post about:
- Should we try to keep some or all alignment research from casually making it into the training sets of frontier AI models?
- If so, what means do we have to do this, and how do they fare on the trade-off between reducing AI access and reducing human access?
I made this into two posts, my first on LessWrong:
- Keeping content out of LLM training datasets
- Should we exclude alignment research from LLM training datasets?
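
To make the "what means" question concrete: one commonly discussed mechanism is a robots.txt opt-out, which several documented training crawlers (e.g. OpenAI's GPTBot and Common Crawl's CCBot) state they respect. Below is a minimal sketch, assuming the example.com URLs are placeholders and that the listed user-agent tokens are still the relevant ones; it checks whether a given page is blocked for those crawlers using Python's standard-library robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens publicly associated with LLM training data collection.
# This list is an assumption and would need to be kept up to date.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai"]

def training_crawler_access(robots_url: str, page_url: str) -> dict[str, bool]:
    """Return, per crawler token, whether robots.txt permits fetching page_url."""
    parser = RobotFileParser(robots_url)
    parser.read()  # fetches and parses the live robots.txt
    return {agent: parser.can_fetch(agent, page_url) for agent in AI_CRAWLERS}

if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    access = training_crawler_access(
        "https://www.example.com/robots.txt",
        "https://www.example.com/alignment-post",
    )
    for agent, allowed in access.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Note that this mechanism only constrains compliant crawlers going forward; it does nothing about mirrors of the content or scrapes already in existing dumps, which is part of why the AI-access vs. human-access trade-off is worth examining.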