Precisely. And supporting subsidized contraception is a long way away from both the formal definition of eugenics and its common understanding.
I feel that saying “subsidized contraception is not eugenics” is rhetorically better and more accurate than this approach.
Ah, you made the same point I did, but better :-)
>Most people endorse some form of ‘eugenics’
No, they don’t. It is akin to saying “most people endorse some form of ‘communism’.” We can point to a lot of overlap between theoretical communism and values that most people endorse; this doesn’t mean that people endorse communism. That’s because communism covers a lot more stuff, including a lot of historical examples and some related atrocities. Eugenics similarly covers a lot of historical examples, including some atrocities (not only in fascist countries), and this is what the term means to most people—and hence, in practice, what the term means.
Many people endorse screening embryos for genetic abnormalities. Those same people would respond angrily if you said they endorsed eugenics, just as people who endorse minimum wages would respond angrily if you said they endorsed communism. Eugenics is evil because the term, as it is actually used, describes something evil; trying to force it into some other technical meaning is incorrect.
Thanks! Should be corrected now.
Thanks, that makes sense.
I’ve been aware of those kinds of issues; what I’m hoping is that we can get a framework that includes these subtleties automatically (eg by having the AI learn them from observations or from papers published by humans) without having to put them all in by hand ourselves.
Hey there! It is a risk, but the reward is great :-)
Value extrapolation makes most other AI safety approaches easier (eg interpretability, distillation and amplification, low impact...). Many of these methods also make value extrapolation easier (eg interpretability, logical uncertainty,...). So I’d say the contribution is superlinear—solving 10% of AI safety our way will give us more than 10% progress.
I think it already has reframed AI safety from “align AI to the actual (but idealised) human values” to “have an AI construct values that are reasonable extensions of human values”.
Can you be more specific here, with examples from those fields?
I see value extrapolation as including almost all my previous ideas—it would be much easier to incorporate model fragments into our value function, if we have decent value extrapolation.
An AI that is aware that value is fragile will behave in a much more cautious way. This gives a different dynamic to the extrapolation process.
Nothing much to add to the other post.
Imagine that you try to explain to a potential superintelligence that we want it to preserve a world with happy people in it by showing it videos of happy people. It might conclude that it should make people happy. Or it might conclude that we want more videos of happy people. The latter is more compatible with the training that we have given it. The AI will be safer if it hypothesizes that we may have meant the former, despite having given it evidence more compatible with the latter, and pursues both goals rather than merely the latter. This is what we are working towards.
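As a toy illustration of that hedging idea (my invented example, not our actual algorithm; the hypothesis names, actions, and reward numbers are all made up):

```python
# Toy sketch: keep both reward hypotheses that fit the training signal,
# and prefer actions that score well under all of them, instead of
# committing to the single best-fitting hypothesis.

from typing import Dict, List

Action = str

# Hypothetical rewards for three candidate actions under the two readings
# of "we showed it videos of happy people" (numbers are invented).
REWARDS: Dict[str, Dict[Action, float]] = {
    "make more videos of happy people": {"film": 1.0, "help": 0.1, "help_and_film": 0.9},
    "actually make people happy":       {"film": 0.1, "help": 1.0, "help_and_film": 0.9},
}


def best_fit_action(actions: List[Action], favoured_hypothesis: str) -> Action:
    """Optimise only the hypothesis most compatible with the training data."""
    return max(actions, key=lambda a: REWARDS[favoured_hypothesis][a])


def hedged_action(actions: List[Action]) -> Action:
    """Optimise the worst case across every hypothesis still on the table."""
    return max(actions, key=lambda a: min(table[a] for table in REWARDS.values()))


actions = ["film", "help", "help_and_film"]
print(best_fit_action(actions, "make more videos of happy people"))  # -> film
print(hedged_action(actions))                                        # -> help_and_film
```

The shape is the point: an agent that keeps both hypotheses alive prefers an action that is acceptable under either reading, rather than the one that merely matches the training data best.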
Value alignment. Good communication and collaboration skills. Machine learning skills. Smart, reliable, and creative. Good at research. At present we are looking for a Principal ML Engineer and other senior roles.
The ability to move quickly from theory to model to testing the model and back
Most of the alignment research pursued by other EA groups (eg Anthropic, Redwood, ARC, MIRI, the FHI,...) would be useful to us if successful (and vice versa: our research would be useful for them). Progress in inner alignment, logical uncertainty, and interpretability is always good.
A fast increase in AI capabilities might result in a superintelligence before our work is ready. If the top algorithms become less interpretable than they are today, this might make our work harder.
Whole brain emulations would change things in ways that are hard to predict, and could make our approach either less or more successful.
A problem here is that values that are instrumentally useful can become terminal values that humans hold for their own sake.
For example, equality under the law is very useful in many societies, especially modern capitalistic ones; but a lot of people (me included) feel it has strong intrinsic value. In more traditional and low-trust societies, the tradition of hospitality is necessary for trade and other exchanges; yet people come to really value it for its own sake. Family love is evolutionarily adaptive, yet also something we value.
So just because some value has developed from a suboptimal system does not mean that it isn’t worth keeping.
Nick Bostrom’s “Superintelligence” is an older book, but still a good overview. Stuart Russell’s “Human Compatible” is a more modern take. I touch upon some of the main issues in my talk here. Paul Christiano’s excellent “What Failure Looks Like” tackles the argument from another angle.
We’re Aligned AI, AMA
Comment copied to new “Stuart Armstrong” account:
Different approaches. ARC, Anthropic, and Redwood seem to be more in the “prosaic alignment” field (see eg Paul Christiano’s post on that). ARC seems to be focusing on eliciting latent knowledge (getting human relevant information out of the AI that the AI knows but has no reason to inform us of). Redwood is aligning text-based systems and hoping to scale up. Anthropic is looking at a lot of interlocking smaller problems that will (hopefully) be of general use for alignment. MIRI seems to focus on some key fundamental issues (logical uncertainty, inner alignment, corrigibility), and, undoubtedly, a lot of stuff I don’t know about. (Apologies if I have mischaracterised any of these organisations).
Our approach is to solve value extrapolation, which we see as a comprehensive and fundamental problem, and to address the other specific issues as applications of this solution (MIRI’s stuff being the main exception: value extrapolation has pretty weak connections with logical uncertainty and inner alignment).
But the different approaches should be quite complementary: progress by any group should make the task easier for the others.
Comment copied to new “Stuart Armstrong” account:
Interesting! And nice to see ADT make an appearance ^_^
I want to point to where ADT+total utilitarianism diverges from SIA. Basically, SIA has no problem with extreme “Goldilocks” theories—theories that imply that only worlds almost exactly like the Earth have inhabitants. These theories are a priori unlikely (complexity penalty) but SIA is fine with them (if $T_1$ is “only the Earth has life, but has it with certainty”, while $T_2$ is “every planet has life with probability $1/2$”, then SIA loves $T_1$ twice as much as $T_2$).
ADT+total ut, however, cares about agents that reason similarly to us, even if they don’t evolve in exactly the same circumstances. So $T_2$ weighs much more than $T_1$ for that theory.
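Spelling out the toy arithmetic (a back-of-the-envelope illustration on my part, assuming $N$ candidate planets, the $1/2$ per-planet probability from the example above, and $w$ as a rough decision weight): SIA, using the reference class of observers in exactly our situation, multiplies each theory’s prior by the probability that such an observer exists,

$$\frac{P_{\mathrm{SIA}}(T_1)}{P_{\mathrm{SIA}}(T_2)} = \frac{P(T_1)\cdot 1}{P(T_2)\cdot \tfrac{1}{2}} = 2\,\frac{P(T_1)}{P(T_2)},$$

while ADT+total utilitarianism effectively weights each theory by the expected number of linked agents whose welfare is at stake,

$$w(T_1) \propto P(T_1)\cdot 1, \qquad w(T_2) \propto P(T_2)\cdot \frac{N}{2},$$

so for large $N$ the Goldilocks theory loses its appeal.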
This may be relevant to further developments of the argument.
I argue that it’s entirely the truth, the way that the term is used and understood.