There are some relevant awesome lists (AIS, Alignment, ML Interpretability), but none of them are both up to date and on topic. There’s also alignment.dev, but not all the projects are open source, and it’s very infrastructure-oriented.
I wouldn’t be that surprised if I’m missing such a list, but AFAIK it doesn’t exist, and plausibly someone should work on this! (Maybe coordinate through AED?)
Inspect is open source and should be exactly what you’re looking for, given your stated interest in METR.