I couldn’t attend the interpretability hackathon, and as a software dev with no experience in interpretability or transformers, I was hoping to get acquainted with LLM interpretability research. So here’s a starting point, following in the footsteps of this submission (see their writeup here):
https://github.com/neelnanda-io/Easy-Transformer
Neel Nanda live-coding and testing hypotheses about transformer learning (1.5 hrs): https://www.youtube.com/watch?v=yo4QvDn-vsU&t=3403s
Go through Transformer theory:
“Attention Is All You Need” paper: https://arxiv.org/abs/1706.03762
Good summary: https://nostalgebraist.tumblr.com/post/185326092369/the-transformer-explained
Positional embeddings explained (9 min): https://www.youtube.com/watch?v=1biZfFLPRSY
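For anyone working through the theory links above, here is a minimal sketch of the sinusoidal positional encoding from the “Attention Is All You Need” paper, in plain Python. The function name and the small sizes are my own choices for illustration and don’t come from any of the linked resources; other position schemes (learned embeddings, for example) work differently.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal encodings.

    Illustrative helper (hypothetical name), following
    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model // 2):
            # Each sin/cos pair shares one frequency, which shrinks as i grows.
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle))  # even index 2i
            row.append(math.cos(angle))  # odd index 2i + 1
        table.append(row)
    return table


# Small sizes chosen only so the table is easy to inspect by hand.
encodings = sinusoidal_positional_encoding(num_positions=4, d_model=6)
```

Position 0 comes out as alternating 0 and 1, since sin(0) = 0 and cos(0) = 1, which makes for a quick sanity check.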
Basically, I’m thinking we can use the hackathon as a collaborative study session to become more familiar with transformers and interpretability, culminating in a replication of the results in the linked submission. It took them 3 days, but since we have a starting point, we can possibly replicate their project and grok what they did much more quickly.
I’m not wedded to this idea, though. If you think there is a better way to use the hackathon to upskill in LLM interpretability and transformers, do share.
Nice! This seems ambitious, and I really like the idea.
Maybe you can start a study group in GatherTown to continue this virtually as well. I’m sure you’d get takers from other folks interested in ML research.