Different approaches. ARC, Anthropic, and Redwood seem to be more in the “prosaic alignment” field (see e.g. Paul Christiano’s post on that). ARC seems to be focusing on eliciting latent knowledge (getting human-relevant information out of the AI that the AI knows but has no reason to inform us of). Redwood is aligning text-based systems and hoping to scale up. Anthropic is looking at a lot of interlocking smaller problems that will (hopefully) be of general use for alignment. MIRI seems to focus on some key fundamental issues (logical uncertainty, inner alignment, corrigibility), and, undoubtedly, a lot of stuff I don’t know about. (Apologies if I have mischaracterised any of these organisations.)
Our approach is to solve values extrapolation, which we see as a comprehensive and fundamental problem, and to address the other specific issues as applications of this solution (MIRI’s stuff being the main exception—values extrapolation has pretty weak connections with logical uncertainty and inner alignment).
But the different approaches should be quite complementary—progress by any group should make the task easier for the others.