Executive summary: This exploratory post proposes that aligned AIs should cultivate philosophical “wisdom” before making irreversible decisions, in particular a clearer understanding of what would constitute a catastrophic mistake, by preemptively clarifying their attitudes toward difficult foundational concepts such as meta-philosophy, epistemology, and decision theory; deferring this entirely to future AIs risks bad path dependencies and garbage-in-garbage-out dynamics.
Key points:
Definition and importance of “wisdom concepts”: These are concepts relevant to evaluating catastrophic mistakes but not objectively verifiable as right or wrong; because AI alignment may not reliably generalize to these domains, initial attitudes toward such concepts could shape long-term outcomes in irreversible ways.
Garbage-in, garbage-out risk: Deferring foundational philosophical reasoning to future AIs assumes their initial epistemic and normative attitudes are correct, but without prior grounding, this deferral may embed flawed or arbitrary assumptions.
Survey of foundational topics: The post non-exhaustively identifies and motivates key domains for philosophical clarification, including meta-philosophy, epistemology, ontology, unawareness, bounded cognition, anthropics, decision theory, and normative uncertainty.
ROMU and philosophical standards: It proposes “Really Open-Minded Updatelessness” (ROMU) as a way for agents to revise decisions in light of deeper philosophical reflection, while acknowledging the difficulty of specifying ROMU rigorously for bounded agents.
Implications for AI safety and governance: Cultivating philosophical wisdom in advance could mitigate path-dependent errors, resist epistemic disruption from competitive dynamics, and guide altruistic decision-making even apart from AI.
Call for foundational research: While not offering concrete prioritization, the author argues for more pre-deployment work on clarifying these wisdom concepts to reduce catastrophic risks from high-stakes but poorly understood decisions.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.