A single really convincing demonstration of something like deceptive alignment could make a big difference to the case for standards and monitoring (next section).
This struck me as a particularly good example of a small improvement having a meaningful impact. On a personal note, seeing a demonstration of deceptive alignment like the one you describe would immediately move me into the hit-the-emergency-brakes/burn-it-all-down camp. I imagine many people would react similarly, which might put a lot of pressure on AI labs to collectively start implementing strict (not just for show) standards.