yz comments on MATS Applications + Research Directions I’m Currently Excited About

yz 8 Feb 2025 16:16 UTC
3 points
0 ∶ 0
I previously did some work on model diffing (base vs. chat models) on Llama 2, Llama 3, and Mistral (chosen because they have similar architectures) for the final project of AISES (https://www.aisafetybook.com/), and found some interesting patterns:
https://docs.google.com/presentation/d/1s-ymk45r_ekdPAdCHbX1hP5ZaAPb82ta/edit#slide=id.p3
I'm planning to explore this further and expand on it. Any thoughts, comments, or discussion are welcome.