Paul, I think deceptive alignment (or other spontaneous, stable-across-situations goal pursuit) after just pretraining is very unlikely. I am happy to take bets if you're interested. If so, email me (alex@turntrout.com), since I don't check this very much.
I think that "deceptively aligned during pre-training" is closer to, e.g., Eliezer's historical views.
I agree, and the actual published arguments for deceptive alignment I've seen don't depend on any difference between pretraining and finetuning, so they can't apply to only one of them. (People have tried to claim to me, unsurprisingly, that the arguments haven't historically focused on pretraining.)
The EA community has a significant undersupply of information from victims of abusive conduct, since the victims are often branded as "triggered" or "irrational". I've heard this from female friends, I've read about this (e.g. in the TIME article), and I myself paid social costs in sharing a different kind of negative experience. Victims often pay significant social costs to talk about their experiences.
Community norms should not impose costs on sharing such information. I'm sorry you had to pay these costs, Frances. Thank you for speaking out. Hopefully this post decreases those costs in these communities. In fact, such important information should be socially subsidized, not taxed, since speaking out often requires reliving trauma, which is unpleasant, and most of the benefit goes to others.