That if there was a pause, alignment research would magically revert back to what it was back in the MIRI days
The claim is more like, “the MIRI days are a cautionary tale about what may happen when alignment research isn’t embedded inside a feedback loop with capabilities.” I don’t literally believe we would revert back to pure theoretical research during a pause, but I do think the research would get considerably lower quality.
However, I’m worried that your [white box] framing is confusing and will cause people to talk past each other.
Perhaps, but I think the current conventional wisdom that neural nets are “black box” is itself a confusing and bad framing and I’m trying to displace it.
AI safety currently seems to lean heavily towards empirical work, and this emphasis only seems to be growing, so I’m rather skeptical that a bit more theoretical work on the margin would be some kind of catastrophe. I’d actually expect it to be a net positive.
There are probably hundreds of AI Alignment / Interpretability PhD theses that could be done on GPT-4 alone. That’s five years of empirical work right there, without any further advances in capabilities.