I’m skeptical that corporate AI safety commitments work like @Holden Karnofsky suggests. The “cage-free” analogy breaks: one temporary defector can erase ~all progress, unlike with chickens.
I’m less confident in corporate commitments to AI safety than Karnofsky. In the latest 80,000 Hours podcast episode, Karnofsky uses the cage-free egg campaign as an example of why pushing frontier AI companies on safety might be effective. I think the analogy may fail in a significant way: it breaks down in how many companies need to be convinced:
- For cage-free eggs, convincing one company, even just for a few months, is a big win.
- For frontier AI companies, you might not win until every single company is convinced, forever. If even one company declines to commit, perhaps only for a few months, the risk reduction could evaporate, or at least take a significant hit.
I recognize the situation may be more nuanced, but I felt the 80k interview overstated optimism on this front. For example, steel-manning his argument: maybe getting 60% “coverage” during a critical period still reduces risk significantly. But if it is largely a “cat-out-of-the-bag” situation, the bag only needs to be open briefly.
Perhaps I am missing something obvious, so I’d find it useful if people could correct me.