I previously did some work on model diffing (base vs. chat models) on Llama 2, Llama 3, and Mistral (chosen because they have similar architectures) for the final project of AISES (https://www.aisafetybook.com/), and found some interesting patterns:
https://docs.google.com/presentation/d/1s-ymk45r_ekdPAdCHbX1hP5ZaAPb82ta/edit#slide=id.p3
I'm planning to explore this further and expand on it. Any thoughts, comments, or discussion are welcome.