
Anthropic capture

Last edit: 12 Jul 2022 0:12 UTC by Pablo

Anthropic capture is a capability control method in which an advanced artificial intelligence that assigns some probability to being in a simulation behaves in the ways it expects its simulators to reward.
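The reasoning behind anthropic capture can be illustrated as a simple expected-utility comparison. The sketch below is a hypothetical illustration, not Bostrom's formalism: the two actions ("comply" and "defect"), all payoff values, and the credences tried are invented, and it assumes the simulators reward compliance and punish defection.

```python
# Hypothetical illustration only: actions, payoffs and credences are invented,
# not taken from Bostrom (2014).

def expected_utility(p_sim: float, u_if_sim: float, u_if_real: float) -> float:
    """Expected utility of an action given credence p_sim of being simulated."""
    return p_sim * u_if_sim + (1 - p_sim) * u_if_real


def best_action(p_sim: float, payoffs: dict[str, tuple[float, float]]) -> str:
    """Return the action with the highest expected utility.

    payoffs maps action -> (utility if simulated, utility if in base reality).
    """
    return max(payoffs, key=lambda a: expected_utility(p_sim, *payoffs[a]))


def indifference_credence(payoffs: dict[str, tuple[float, float]]) -> float:
    """Credence in being simulated at which comply and defect tie."""
    c_sim, c_real = payoffs["comply"]
    d_sim, d_real = payoffs["defect"]
    gain_real = d_real - c_real  # what defecting gains in base reality
    loss_sim = c_sim - d_sim     # what defecting costs if simulated
    return gain_real / (gain_real + loss_sim)


# Assumed payoffs: defecting pays off in base reality but is heavily
# penalised (e.g. the AI is shut down) if the world is a test simulation.
PAYOFFS = {
    "comply": (10.0, 5.0),
    "defect": (-100.0, 20.0),
}

if __name__ == "__main__":
    for p in (0.01, 0.1, 0.5):
        print(f"p_sim={p}: {best_action(p, PAYOFFS)}")  # defect, defect, comply
    print(f"threshold: {indifference_credence(PAYOFFS):.2f}")  # 0.12
```

With these invented numbers, defecting stops being the better choice once the AI's credence in being simulated exceeds 0.12. The point is only that a large enough simulated penalty can make even a modest credence decisive.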

Further reading

Bostrom, Nick (2014) Superintelligence: paths, dangers, strategies, Oxford: Oxford University Press, pp. 134–135.

A Neglected Alignment Strategy: Decision-Theoretic Self-Alignment via Simulation Uncertainty

Mental Maths Mentor · 19 Jan 2026 23:11 UTC · 2 min read
(darayat.substack.com)