Since AI X-risk is a main cause area for EA, shouldn't significant money be going into mechanistic interpretability? After reading the AI 2027 forecast, it seems to me that the opacity of AI systems is the main source of the risk they pose. Making significant progress in this field seems very important for alignment.
I took the Giving What We Can Pledge, and I want to say there should be something like it but specifically for mechanistic interpretability. Realistically, though, probably only very few people could be convinced to give 10% of their income to it.
I’ve had similar considerations. Manifund has projects you can fund directly, some of which are about interpretability. Without specialized knowledge, though, I find it difficult to trust my own judgement over that of people whose job it is to research and think strategically about marginal impact.
What do you mean by the opacity of AIs appearing to be the main source of risk?