I agree with

> if you want to make a widget that’s 5% better, you can specialize in widget making and then go home and believe in crystal healing and diversity and inclusion after work.

and

> if you want to make impactful changes to the world and you believe in crystal healing and so on, you will probably be drawn away from correct strategies because correct strategies for improving the world tend to require an accurate world model including being accurate about things that are controversial.

and

> many people seriously believed that communism was good, and they believed that so much that they rejected evidence to the contrary. Entire continents have been ravaged as a result.
A crux seems to be that I think AI alignment research is a fairly narrow domain, more akin to bacteriology than e.g. “finding EA cause X” or “thinking about whether newly invented systems of government will work well”. This seems more true if I imagine my AI alignment researcher as someone trying to run experiments on sparse autoencoders, and less true if I imagine someone trying to have an end-to-end game plan for how to make transformative AI as good as possible for the lightcone, which is obviously a more interdisciplinary topic, more likely to require correct contrarianism in a variety of domains. But I think most AI researchers are more in the former category, and will be increasingly so.
If you mean some parts of technical alignment, then perhaps that’s true, but I mean alignment in the broad sense of creating good outcomes.
> AI researchers are more in the former category
Yeah, but people in the former category don’t matter much in terms of outcomes. Making a better sparse autoencoder won’t change the world at the margin, just as technocrats working to make Soviet central planning better ultimately didn’t change the world, because they were making incremental progress in the wrong direction.