Executive summary: Counting arguments that future AIs will likely “scheme” against humans are flawed and provide no good reason to worry that AIs will intentionally deceive or try to dominate humans.
Key points:
A counting argument that neural networks will massively overfit their training data is structurally identical to, and more plausible than, the counting argument for AI schemers, yet such overfitting rarely happens. This suggests counting arguments are generally unsound (a toy version of the calculation is sketched after this list).
Counting arguments rely on the principle of indifference, which is known to give absurd results in many cases (for instance, applying indifference to a cube’s side length and to its volume yields contradictory probabilities). There is no good reason to apply indifference reasoning to an AI’s goals or behaviors.
The assumption that AIs have fundamental inner goals separate from their behaviors is doubtful. Behaviors likely come first, with goal-attribution happening later based on patterns in those behaviors.
Even AIs with explicit inner optimization objectives would not necessarily behave in ways that coherently pursue those objectives. Their behaviors result from complex interactions between components and are not cleanly dictated by any simple objective.
Other arguments for worrying about AI schemers similarly rely on unsound indifference reasoning or implausible assumptions. Once indifference reasoning is rejected, there is very little reason left to believe AIs will spontaneously become schemers.
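To make the structural parallel in the first key point concrete, here is a minimal toy calculation. The setup and numbers are illustrative assumptions on my part, not taken from the post: the hypothesis class is all Boolean functions on n-bit inputs, with m training labels fixed.

```python
# Toy version of the overfitting counting argument (illustrative assumptions:
# hypothesis class = all Boolean functions on n-bit inputs, m training labels fixed).
n, m = 10, 500

inputs = 2 ** n                  # 1024 possible n-bit inputs
total_functions = 2 ** inputs    # all Boolean functions on {0,1}^n
consistent = 2 ** (inputs - m)   # functions that fit the m training labels exactly

# The principle of indifference spreads probability uniformly over the
# data-consistent functions, so the single fully generalizing function gets:
print(f"P(generalize | fits data) = 2^-{inputs - m}")   # 2^-524, effectively zero
# Real trained networks nonetheless generalize, so the indifference step fails.
```

On these assumptions, indifference predicts near-certain overfitting; since networks in practice generalize, the same reasoning applied to schemer goals carries no weight.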
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.