So, to me, the main question for knowing if work on AI safety is net positive is whether the intermediate scenario, where AI is aligned with humanity's current values but not with ‘Great values’, is better than the scenario where AI is not aligned at all.
AI not being aligned at all is not exactly a live option? Pre-training relies on lots of human data, so it alone leads to some alignment with humanity. Beyond that, I would say current frontier models, after alignment training, already have better values than a random human, so I assume alignment techniques will be enough to end up with at least better-than-typical human values, even if not great values.
I suppose that, in the vast majority of cases, trying to make a technology safer does in fact make it safer. So I believe there should be a strong prior that working on AI safety is good. However, I still think corporate campaigns for chicken welfare are more cost-effective.
Interesting points!