Do you have instincts, or perhaps even analysis, about what interventions to expand “good” values would look like? I am interested because an interest in values is how I came into EA thinking to begin with, and I have since thought more and more that it is too large a task to tackle.
I know of the Sentience Institute, but my feeling is that they are more about research and less about actually going out and spreading positive values.
Hi Ulrik,
Thanks for the question, and for sharing your story! I do not think I have great insights here, but I can at least share some relevant resources (you may well be aware of them already, but they could still be useful to other readers):
80,000 Hours’ profiles on s-risks and promoting positive values.
CLR’s beginner’s guide to reducing s-risks. Maybe people at @Center on Long-Term Risk would be keen to expand the above 80,000 Hours’ profiles, which as of now are quite short?
Websites of organisations working in the area:
Center on Long-Term Risk.
Centre for Reducing Suffering.
Sentience Institute.
I have the impression the field is still quite nascent, and share your sense that the above organisations are mostly doing research. CLR’s guide has a section on approaches to s-risk reduction, but it seems to point towards further investigation rather than specific interventions. Cooperative AI was the approach in the guide which seemed most like an intervention with direct applications, but it is targeted at improving cooperation among advanced AI systems, and you may be looking for something broader.
Maybe spreading concern for the suffering of factory-farmed animals is a decent way to promote positive values, but I have not thought much about this. I mostly think the best animal welfare interventions are a super cost-effective way of decreasing near-term suffering.
It would be useful to ensure that frontier AI models have good values. So I have wondered whether people at frontier AI labs (namely OpenAI, Anthropic, and DeepMind) and organisations like ARC Evals should be running a few tests to assess:
The underlying morality of LLMs, such as whether they support the total or average view, expected value maximisation with or without risk aversion, which theory of wellbeing they favour, and how they answer thought experiments like the repugnant conclusion or the trolley problem.
Their stance on real-world problems. For example, what are the views of LLMs on factory farming, wild animal suffering, extreme poverty, high global catastrophic risks, current wars, and digital sentience?
I have not checked whether people are looking into this, but it seems worthwhile as a more empirical type of intervention to influence future values. Then maybe the models could be (partly) aligned based on their views on questions like the above.
I really like your suggestion that plain AI value alignment may be the most effective approach. In a sense, even though these are perilous times, we may have the opportunity to massively influence the values of millions of machine intelligences, so that even if humans stick with pretty much the same values (something I perceive as really hard to change), the average values globally will “improve”. Thanks for your thoughtful response; it has certainly made me see this issue from a new perspective!
Thanks for the kind words, and also for clarifying my own views!