Buck comments on Longtermist EA needs more Phase 2 work

Buck 25 Apr 2022 15:44 UTC
3 points
0 ∶ 0
FWIW I think that compared to Chris Olah’s old interpretability work, Redwood’s adversarial training work feels more like phase 2 work, and our current interpretability work is similarly phase 2.
- Owen Cotton-Barratt 25 Apr 2022 21:14 UTC
  2 points
  0 ∶ 0
  Thanks for this; it made me notice that I was analyzing Chris’s work more in far mode and Redwood’s more in near mode. Maybe you’re right about these comparisons. I’d be interested to understand whether/how you think the adversarial training work could most plausibly be directly applied (or if you just mean “fewer intermediate steps till eventual application”, or something else).