Executive summary: This post suggests backup plans if AI systems become misaligned, as well as ideas for making AI systems more cooperative.
Key points:
- We could study how AI systems generalize in order to influence properties like lack of spitefulness, even if full alignment is not achieved.
- Some properties, such as lack of spite, may lead misaligned AIs to cooperate more with humans or with other AIs.
- We could implement “surrogate goals” in AI systems: harmless placeholder goals that threats could target instead of the systems' original goals.
- Negotiation-assist AI could help resolve complex situations involving many parties and options.
- Acausal decision theory suggests that learning too much could be risky; we may want caution before expanding our knowledge of distant civilizations.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.