I like this take: if AI is dangerous enough to kill us in three years, no feasible amount of additional interpretability research would save us.
Our efforts should instead go to limiting the amount of damage that initial AIs could do. That might involve work securing dangerous human-controlled technologies. It might involve creating clever honey pots to catch unsophisticated-but-dangerous AIs before they can fully get their act together. It might involve lobbying for processes or infrastructure to quickly shut down Azure or AWS.
I like this take: if AI is dangerous enough to kill us in three years, no feasible amount of additional interpretability research would save us.
Our efforts should instead go to limiting the amount of damage that initial AIs could do. That might involve work securing dangerous human-controlled technologies. It might involve creating clever honey pots to catch unsophisticated-but-dangerous AIs before they can fully get their act together. It might involve lobbying for processes or infrastructure to quickly shut down Azure or AWS.