Research to determine which human cultures minimize the risks of major catastrophes
Great Power Relations, Values and Reflective Processes, Artificial Intelligence
I posit that human cultures differ and that there’s a chance that some cultures are more likely to punish in minor ways and to adapt to new situations peacefully, while others may be more likely to wage wars. This may be completely wrong.
But if it is not, we could investigate what processes can be used to foster the sort of culture that is less likely to immanentize global catastrophes, and to structure the cultural learning of future AI systems such that they also learn that culture, so that (seeming) cooperation failures between AIs are frequent, minor, and really part of their bargaining process rather than infrequent and civilization-ending. It might be even more important to set a clear cultural Schelling point for AIs if only some cultures play well with all other cultures but every culture plays well with itself.
Some more detail on my inspiration for the idea (copied from my blog):
Herrmann et al. (2008) found that in games that resemble collective prisoner’s dilemmas with punishment, cultures worldwide fall into different groups. Those with antisocial punishment fail to realize the gains from cooperation, but two other cultures succeed: In the first (the cities Boston, Copenhagen, and St. Gallen), participants cooperated at a high level from the start and used occasional punishments to keep it that way. In the second (the cities Seoul, Melbourne, and Chengdu), the prior appeared to be low cooperation, but through punishment participants reached, within a few rounds, the same level of cooperation as the first group.
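To make the game structure concrete, here is a minimal sketch of one round of a public goods game with punishment, the general kind of design Herrmann et al. used. The group size, endowment, multiplier, and punishment parameters are typical illustrative values, not necessarily the paper’s:

```python
# Minimal sketch of one round of a public goods game with punishment.
# All parameter values are illustrative, not taken from Herrmann et al.

def public_goods_round(contributions, punishments, endowment=20,
                       multiplier=1.6, punish_cost=1, punish_impact=3):
    """contributions[i]: tokens player i puts into the common pot.
    punishments[i][j]: punishment points player i assigns to player j."""
    n = len(contributions)
    pot_share = multiplier * sum(contributions) / n  # everyone gets an equal share
    payoffs = []
    for i in range(n):
        payoff = endowment - contributions[i] + pot_share
        payoff -= punish_cost * sum(punishments[i])  # punishing others is costly
        payoff -= punish_impact * sum(punishments[j][i] for j in range(n))  # being punished hurts more
        payoffs.append(payoff)
    return payoffs

# Three full contributors and one free-rider; each contributor spends
# three points punishing the free-rider.
contributions = [20, 20, 20, 0]
punishments = [[0, 0, 0, 3],
               [0, 0, 0, 3],
               [0, 0, 0, 3],
               [0, 0, 0, 0]]
print(public_goods_round(contributions, punishments))  # [21.0, 21.0, 21.0, 17.0]
```

Without the punishment stage, the free-rider would earn 44 to the contributors’ 24, so defection dominates; coordinated punishment is what flips the incentive, which is why both successful groups relied on it.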
These two strategies appear to me to map (somewhat imperfectly) to the successful Tit for Tat and Pavlov strategies in iterated prisoner’s dilemmas.
Sarah Constantin writes:
In Wedekind and Milinski’s 1996 experiment with human subjects, playing an iterated prisoner’s dilemma game, a full 70% of them engaged in Pavlov-like strategies. The human Pavlovians were smarter than a pure Pavlov strategy — they eventually recognized the DefectBots and stopped cooperating with them, while a pure-Pavlov strategy never would — but, just like Pavlov, the humans kept “pushing boundaries” when unopposed.
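To make the two strategies concrete, here is a minimal sketch of Tit for Tat, Pavlov (win-stay, lose-shift), and a DefectBot in an iterated prisoner’s dilemma; the payoff values and round count are my own illustrative choices:

```python
# Payoff to the row player, indexed by (my move, opponent's move):
# mutual cooperation 3, mutual defection 1, temptation 5, sucker 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(mine, theirs):
    """Cooperate first, then copy the opponent's previous move."""
    return theirs[-1] if theirs else "C"

def pavlov(mine, theirs):
    """Win-stay, lose-shift: repeat my last move after a good payoff
    (3 or 5), switch moves after a bad one (0 or 1)."""
    if not mine:
        return "C"
    if PAYOFF[(mine[-1], theirs[-1])] >= 3:  # "win": stay
        return mine[-1]
    return "D" if mine[-1] == "C" else "C"   # "lose": shift

def defect_bot(mine, theirs):
    """Always defect."""
    return "D"

def play(strat_a, strat_b, rounds=20):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print("Pavlov vs. DefectBot:", play(pavlov, defect_bot))       # (10, 60)
print("TFT vs. DefectBot:   ", play(tit_for_tat, defect_bot))  # (19, 24)
```

Note the pattern from the quote: pure Pavlov never writes DefectBot off and keeps probing it with cooperation every other round, while Tit for Tat is exploited only once.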
As mentioned, I think these strategies map somewhat imperfectly to human behavior, but I feel that I can often classify the people around me as tending toward one or the other strategy.
Pavlovian behaviors:
Break rules until the cost to yourself from punishments exceeds the profits from the rule-breaking.
View rules as rights for other people to request that you stop a behavior if they disapprove of it. Then stop if anyone invokes a rule.
Push boundaries, the Overton window, or unwritten social rules habitually or for fun, but then take note if someone looks hurt or complains. Someone else merely looking unhappy with the situation is a form of punishment for an empathetic person. (I’m thinking of things like “sharp culture.”)
Don’t worry about etiquette because you expect others to give you frank feedback if they are annoyed/hurt/threatened by something you do. Don’t see it as a morally relevant mistake so long as you change your behavior in response to the feedback. (This seems to me like it might be associated with low agreeableness.)
Tit for Tat behaviors:
Try to anticipate the correct behavior in every situation. Feel remorse over any mistakes.
Attribute rule-breaking, boundary-pushing behaviors to malice.
Keep to very similar people to be able to anticipate the correct behaviors reliably and to avoid being exploited (if only for a short number of “rounds”).
This way of categorizing behaviors has led me to think that there are forms of both strategies that seem perfectly nice to me. In particular, I’ve met socially astute agents who noticed that I’m a “soft culture” tit-for-tat type of person and adjusted to my interaction style. I don’t think it would make sense for an empathetic tit-for-tat agent to adjust to a Pavlovian agent in such a way, but it’s a straightforward self-modification for an empathetic Pavlovian agent.
Further, Pavlovian agents probably have a much easier time navigating areas like entrepreneurship, where you’re always moving in innovative spaces that don’t yet have any hard and fast rules you could anticipate; rather, the rules need to be renegotiated all the time.
Pavlov also seems more time-consuming and cognitively demanding, so it may be more attractive for socially astute agents and for situations where there are likely gains to be had compared with a tit-for-tat approach.
The idea is that one type of culture may be safer than another for AIs to learn from through, e.g., inverse reinforcement learning. My tentative hypothesis is that the Pavlovian culture is safer because punishments are small and routine with little risk of ideological, fanatical retributivism emerging.