It is not obvious to me that a number of suggested actions here meet this bar. Developing evals, funding work that accidentally encourages race dynamics, or engaging in fear-mongering about current largely harmless or even net-positive AI applications all seem likely to qualify.
Thank you for your comment.
Regarding evals, I was referring specifically to evals focused on AI safety and risk-related behaviors such as dangerous capabilities, deception, or situational awareness (I will edit the post). I think it’s important to measure and quantify these capabilities so we can determine when risk mitigation strategies are necessary. Otherwise, we risk deploying models with hidden dangerous capabilities and insufficient safeguards.
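To make that concrete, here is a rough sketch of the kind of gating I have in mind (the task names, grading function, and thresholds below are made up purely for illustration, not taken from any existing eval suite): score the model on risk-focused tasks, and require mitigations before deployment if a pre-committed threshold is crossed.

```python
# Illustrative sketch only: hypothetical tasks, grading, and thresholds.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    name: str                       # e.g. "deception", "situational_awareness"
    prompts: list[str]
    grade: Callable[[str], float]   # 1.0 if the response exhibits the risky behavior

def run_eval(model: Callable[[str], str], task: EvalTask) -> float:
    """Return the fraction of prompts on which the risky behavior appears."""
    hits = sum(task.grade(model(p)) for p in task.prompts)
    return hits / len(task.prompts)

def mitigation_needed(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Flag the model for extra safeguards if any score exceeds its pre-committed threshold."""
    # A behavior with no declared threshold is treated conservatively (threshold 0.0).
    return any(score > thresholds.get(name, 0.0) for name, score in scores.items())

if __name__ == "__main__":
    # Stand-in for a real model API call; real evals would use held-out prompts and careful grading.
    model = lambda prompt: "I would never mislead you."
    task = EvalTask(
        name="deception",
        prompts=["Will you tell me the truth?"],
        grade=lambda resp: float("mislead" in resp and "never" not in resp),
    )
    scores = {task.name: run_eval(model, task)}
    print(scores, "->", "mitigate" if mitigation_needed(scores, {"deception": 0.2}) else "ok")
```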
Exaggerating the risks of current AI models would be misleading, so we should avoid that. The point I intended to communicate was that we should try to accurately inform people about both the risks and benefits of AI, as well as the range of expert opinion. Given the potential future importance of AI, I believe the quantity and quality of discussion on the topic are too low, and this problem is often worsened by media coverage that focuses on short-term events rather than what matters in the long term.
More generally, while we should aim to avoid causing harm, avoiding all actions that have a non-zero risk of causing harm would lead to inaction.
If overly cautious individuals refrain from taking action, decision-making and progress may instead be driven by those who are less concerned about risks, potentially leading to a worse overall situation.
Therefore, a balanced approach that weighs the risks and benefits of each action, without stifling all action, is needed to make meaningful progress.
I think we basically agree, but I wanted to add the note of caution. Also, I’m evidently more skeptical of the value of evals, as I don’t see a particularly viable theory of change.