Comment copied to new “Stuart Armstrong” account:
Different approaches. ARC, Anthropic, and Redwood seem to be more in the “prosaic alignment” field (see e.g. Paul Christiano’s post on that). ARC seems to be focusing on eliciting latent knowledge (getting human-relevant information out of the AI that the AI knows but has no reason to inform us of). Redwood is aligning text-based systems and hoping to scale up. Anthropic is looking at a lot of interlocking smaller problems that will (hopefully) be of general use for alignment. MIRI seems to focus on some key fundamental issues (logical uncertainty, inner alignment, corrigibility), and, undoubtedly, a lot of stuff I don’t know about. (Apologies if I have mischaracterised any of these organisations.)
Our approach is to solve values extrapolation, which we see as a comprehensive and fundamental problem, and to address the other specific issues as applications of this solution (MIRI’s stuff being the main exception: values extrapolation has pretty weak connections with logical uncertainty and inner alignment).
But the different approaches should be quite complementary: progress by any group should make the task easier for the others.
Comment copied to new “Stuart Armstrong” account:
Interesting! And nice to see ADT make an appearance ^_^
I want to point out where ADT+total utilitarianism diverges from SIA. Basically, SIA has no problem with extreme “Goldilocks” theories: theories implying that only worlds almost exactly like the Earth have inhabitants. These theories are a priori unlikely (they carry a complexity penalty), but SIA is fine with them: if h1 is “only the Earth has life, and has it with certainty” while h2 is “every planet has life with 50% probability”, then SIA loves h1 twice as much as h2.
ADT+total utilitarianism, however, cares about agents that reason similarly to us, even if they didn’t evolve in exactly the same circumstances. So h2 is weighted much more heavily than h1 under that theory.
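To make the divergence concrete, here is a toy calculation (a minimal sketch only; the planet count N and the one-observer-per-planet simplification are illustrative assumptions, not part of the argument above):

```python
# Toy comparison of how SIA and ADT+total utilitarianism weight h1 and h2.
# Illustrative assumptions: N candidate planets, at most one
# "agent reasoning like us" per inhabited planet.

N = 10**6  # assumed number of candidate planets; any large number works

# h1: only the Earth has life, with certainty.
# h2: every planet independently has life with probability 0.5.

# SIA: weight each hypothesis by the probability that an observer in
# *our exact situation* (on Earth) exists under it.
sia_h1 = 1.0   # Earth certainly inhabited under h1
sia_h2 = 0.5   # Earth inhabited with probability 0.5 under h2
print(sia_h1 / sia_h2)    # 2.0 -> SIA favours h1 two to one

# ADT + total utilitarianism: what matters is the expected number of
# agents reasoning like us, wherever they evolved.
adt_h1 = 1.0        # one inhabited planet under h1
adt_h2 = 0.5 * N    # expected number of inhabited planets under h2
print(adt_h2 / adt_h1)    # N/2 -> h2 dominates for large N
```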
This may be relevant to further developments of the argument.