Ok, cool, that’s helpful to know. Is your intuition that these examples will definitely occur and we just haven’t seen them yet (due to model size or something like this)? If so, why?
My intuition is that they will occur, hopefully before it’s too late (though it’s possible that, due to incentives for deception etc., we may not see them before it’s too late). More here: Evaluating LM power-seeking.