My first 2 posts for this project went live on the Alignment Forum today:
1. Introduction to the sequence: Interpretability Research for the Most Important Century
2. (main post) Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios