Good point, but I still think that many of my beliefs and values differ pretty dramatically from the dominant perspectives often found in EA AI x-risk circles. I think these differences in my underlying worldview should carry at least as much weight as whether my bottom-line estimates of x-risk align with the median estimates in the community. To elaborate:
On the values side:
Willingness to accept certain tradeoffs that are ~taboo in EA: I am comfortable with many scenarios in which AI risk increases by a non-negligible amount if this accelerates AI progress. In other words, I think the potential benefits of faster AI development can often outweigh the accompanying increase in existential risk.
Relative indifference to human disempowerment: With some caveats, I am largely comfortable with human disempowerment, and I don’t think the goal of AI governance should be to keep humans in control. To me, the preference for prioritizing human empowerment over other outcomes feels like an arbitrary form of speciesism—favoring humans simply because we are human, rather than due to any solid moral reasoning.
On the epistemic side:
Skepticism of AI alignment’s central importance to AI x-risk: I am skeptical that AI alignment is very important for reducing x-risk from AI. My primary threat model for AI risk doesn’t center on the idea that an AI with a misaligned utility function would necessarily pose a danger. Instead, I think the key issue lies in whether agents with differing values—be they human or artificial—will have incentives to cooperate and compromise peacefully or whether their environment will push them toward conflict and violence.
Doubts about the treacherous turn threat model: I believe the “treacherous turn” threat model is significantly overrated. (For context, this model posits that an AI system could pretend to be aligned with human values until it becomes capable enough to act against us without risk.) I’ll note that both Paul Christiano and Eliezer Yudkowsky have identified this as their main threat model, but it is not mine.