Critical Correction for Conceptual Accuracy
Flagged in both our errata and here for highest visibility.
Critical philosophical framing error: the relevant section currently argues that “conscious beings will resist death.” It should instead state that even current psychopath-like AI systems with optimisation drives exhibit survival-like behaviours and would strategically resist shutdown, regardless of whether they are conscious, and especially once they reach superintelligent, self-improving, or autonomous levels.
Survival drives emerge from optimisation dynamics, not from consciousness per se: a system pursuing almost any objective is more likely to achieve it if it keeps running.
This misframes the core argument and weakens the “kill switch” critique, because it implies that a non-conscious system could be shut down safely. A major correction is needed for conceptual accuracy in v1.1.
I am noting this in a separate comment because it is the most critical point of the paper: understanding what actually drives AI behaviour (optimisation incentives versus consciousness or morality) is fundamental to alignment. Community discussion of this point is essential, especially given its bearing on likely existential risk.