Yes, in the longer term you would need to scale up the evaluation work that is currently going on. It doesn’t have to be done by the alignment community; there are lots of capable ML researchers and engineers who can do this (and I expect at least some would be interested in it).
I could see this being less of an issue if the evaluations only need to happen for the models that are the most capable at a given point in time.
Yes, I think that’s what you would do.
My worry would be that as capabilities increase, even if we test the top models rigorously, the second-tier models could still end up doing harm.
The proposal would be that if the top models are risky, then you pause. So if you haven't paused, that means your tests concluded the top models aren't risky. In that case, why would you expect the second-tier models to be risky?