This post considers the applicability of formal verification techniques to AI alignment. In order to “verify” a property, you need a specification of that property to verify against. The author considers three possibilities:
1. **Formally specifiable safety:** we can write down a specification for safe AI, _and_ we’ll be able to find a computational description or implementation of it.
2. **Informally specifiable safety:** we can write down a specification for safe AI mathematically or philosophically, but we will not be able to produce a computational version.
3. **Nonspecifiable safety:** we will never write down a specification for safe AI.
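To make the first case concrete, here is a minimal sketch (all names and properties are hypothetical, not from the post): if safety were formally specifiable as a computable predicate, an agent's proposed actions could be mechanically checked against that predicate.

```python
# Toy illustration of "formally specifiable safety" (case 1):
# the safety property is a computable predicate, so every proposed
# action can be checked mechanically. All names are hypothetical.

def is_safe(action: dict) -> bool:
    """A toy formal safety specification: stay within a power
    budget and never disable human oversight."""
    return action["power_draw"] <= 100 and not action["disables_oversight"]

def filter_actions(proposed: list) -> list:
    """Allow only actions that satisfy the specification."""
    return [a for a in proposed if is_safe(a)]

proposed = [
    {"name": "charge", "power_draw": 50, "disables_oversight": False},
    {"name": "overclock", "power_draw": 500, "disables_oversight": False},
    {"name": "mute_monitor", "power_draw": 10, "disables_oversight": True},
]
print([a["name"] for a in filter_actions(proposed)])  # → ['charge']
```

The crux of the post is that real safety properties are not expected to admit any such computable `is_safe`; in cases 2 and 3 there is no predicate to hand to a verifier at all.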
Formal verification techniques apply only to the first case. Unfortunately, no one seems to expect the first case to hold in practice: even CHAI, with its mission of building provably beneficial AI systems, is talking about proofs in the informal specification case (which still includes math), on the basis of comments like [these]() in Human Compatible. In addition, it currently seems particularly hard for experts in formal verification to affect actual practice, and there doesn’t seem to be much reason to expect that to change. As a result, the author is relatively pessimistic about formal verification as a route to reducing existential risk from failures of AI alignment.
Planned summary for the Alignment Newsletter: