There are some relevant awesome lists (AIS, Alignment, ML Interpretability), but none of them are both up to date and on topic. There’s also alignment.dev, but not all the projects are open source, and it’s very infrastructure-oriented.
I wouldn’t be that surprised if I’m missing such a list, but AFAIK it doesn’t exist, and plausibly someone should work on this! (Maybe coordinate through AED?)