Effective altruism is similar to the AI alignment problem and suffers from the same difficulties [Criticism and Red Teaming Contest entry]

This post is an entry for the Criticism and Red Teaming Contest

EA as an AI safety case

Effective altruism is similar to the AI alignment problem in some sense: we are trying to determine what the greatest good for humanity would be, and we are searching for a way to implement this good safely and effectively.

EA treats both of these tasks as essentially solved for the social activities of a public organization (otherwise it would not be doing anything at all), while treating them as completely unsolved for AI alignment.

However, EA runs into the same problems as AI safety does: Goodharting, unforeseen consequences, wireheading (and the problem of internal conflicts). It seems surprising to me that the problems we are trying to solve for AI are treated as already solved for a human organization – one that, moreover, intends to solve them for AI.

Utilitarianism’s failure modes

One of these AI-misalignment-style failures of EA is the general acceptance of utilitarianism, as if we really knew what is good and how to measure it. Absolutizing utilitarianism as a moral principle is subject to Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.”

For example, if a king wants to know the welfare of his country, it is reasonable to survey what percentage of people are happy and observe how this percentage changes in response to government decisions. However, it is a mistake to transfer this to the individual: a human may have a life goal that implies a large number of unpleasant moments – sports, parenthood, climbing Everest, or winning a war. If we absolutize the percentage-of-happy-people principle, many state decisions begin to look absurd: needlessly increasing the population, pouring opium into the water supply, refusing to procreate, and declining to keep an army.

Now I will list a few cases where utilitarianism fails:

The Trolley Problem in the fog

The trolley-in-the-fog problem is the following: I see the usual trolley, but the whole scene is covered in fog, and the five people I am going to save are farther away, so I see them less clearly than the one person I would kill by pulling the lever. After I pull the lever, it turns out that the five people were not real, just a bush on the tracks. The lesson is that I should count not only the number of people saved but also my uncertainty about that number, and this uncertainty grows with the distance to the people who are supposed to be saved. In many cases such a fog discount can completely cancel the expected gain in the number of people saved, especially once we account for typical human biases like overconfidence.
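Here is a minimal sketch of this fog discount in code; the probabilities are invented purely for illustration and are not claims from the post:

```python
# Expected lives saved with an uncertainty ("fog") discount.
# All numbers are hypothetical illustrations.

def expected_saved(n_far, p_far_real, n_near, p_near_real=1.0):
    """Net expected lives saved by pulling the lever:
    (far group saved if it is real) minus (near person killed)."""
    return n_far * p_far_real - n_near * p_near_real

# Clear view: almost certainly five real people vs one.
print(expected_saved(n_far=5, p_far_real=0.95, n_near=1))  # positive: pulling looks good

# Thick fog plus overconfidence: maybe only a 15% chance the
# "five people" are real.
print(expected_saved(n_far=5, p_far_real=0.15, n_near=1))  # negative: pulling no longer looks good
```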

The real solution to the trolley problem, as we know from the memes, is either to derail the trolley by pulling the lever when only the first pair of wheels has passed the switch, or to find and stop the person who keeps designing such experiments.

Blurring the line between remote people and possible people

In the original trolley problem all the people are real, just on different tracks. In its real-life analogues we often cannot observe the five people who will be saved, so it is a variant of the trolley-in-the-fog problem. Often they are not even born yet: if I invest in anti-malaria drug development, I am saving the lives of yet unborn children.

Saving the unborn is great, but in this way EA slowly blurs the line between remote people and possible people: if we have less pollution now, there will be fewer cancer deaths in future generations, and all future people are possible, right?

But here the risk of illusion is high. A real person is real, but future possible people currently exist only in my imagination, and my imagination is likely to be affected by all kinds of biases, including selfish ones. As a result, I can start to think that I am saving thousands of people just by writing a post, and feel – I really have had that feeling – a moral orgasm at my perceived extreme goodness.

An example of a bad application of trolley-problem-style thinking: a man is sentenced to death, but he says that he will ensure the birth of five new people through surrogacy if he is freed – should we release him? The problem is that exchanging real humans for a larger number of possible people gives criminals carte blanche to do whatever they want, as long as it can be compensated later. And of course the compensation can be postponed indefinitely and, in the end, will never happen.

The consequences of utilitarianism

In some sense, consequentialism contradicts utilitarianism. If everybody becomes a utilitarian, they will constantly fall into traps like the trolley in the fog, because most people cannot calculate the consequences of their actions.

From the utilitarian point of view, it would be better if most people were deontologists. Most people cannot correctly calculate the sum of the consequences of their actions, so they will compute expected utility in the wrong way, skewing the result toward their own benefit. Therefore only the leaders of countries or of charitable foundations should be utilitarians, and everyone will be better off if most people simply follow the rules.

The value of happiness is overrated

Another way to critique utilitarianism is to notice that the value of happiness is overrated: happiness is only a measure of success. It is a signal for reinforcement learning. But learning sometimes also requires pain. There is nothing wrong with small and mild pain; only prolonged and/or unbearable suffering is terrible, because it is destructive to the individual.

Covering the universe with happy observer-moments is the same failure mode as covering it with pictures of smiley faces, which was once suggested as an example of false AI friendliness.

Utilitarian EA treats the disvalue of pain as a linear function of its intensity (see the dust specks vs. torture debate); I view it as closer to a step function.
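The difference can be stated as a toy model; the threshold and numbers below are purely illustrative assumptions:

```python
# Two toy models of the disvalue of a pain of given intensity (0..1).
# The threshold and scales are hypothetical, chosen only to show the contrast.

def linear_disvalue(intensity):
    return intensity                                 # dust specks add up linearly

def step_disvalue(intensity, threshold=0.9):
    return 0.0 if intensity < threshold else 1.0     # only unbearable pain counts

# Specks-vs-torture style aggregation over astronomically many tiny pains:
print(10**20 * linear_disvalue(1e-9))   # ~1e11: huge under the linear model
print(10**20 * step_disvalue(1e-9))     # 0.0: negligible under the step model
```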

Linearity

One of EA’s failure modes is the linearity of its utility functions. In real life, personal utility is asymptotic: the more of something I get, the less valuable each additional unit is to me, so after a certain amount I have enough and turn to something else. This keeps different desires and needs in balance.

Linearity of utility can produce undesired outcomes. For example, if ten elephants can be saved for a total of $10,000 but only one rhino for the same price, then the rhinoceros will go extinct, as all the investment will go to saving elephants. This will reduce biodiversity. And in a real-life refuge, having too many elephants is itself bad; they may even have to be hunted.
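As a hedged sketch, here is how a concave (diminishing-returns) utility splits the budget, while a linear one sends everything to the cheaper species; the prices come from the example above, but the budget and utility shapes are invented:

```python
import math

# $10,000 saves 10 elephants or 1 rhino (prices from the example above).
ELEPHANTS_PER_DOLLAR = 10 / 10_000
RHINOS_PER_DOLLAR = 1 / 10_000
BUDGET = 100_000  # hypothetical total budget

def linear_value(e, r):
    return e + r                            # every animal counts the same, linearly

def diminishing_value(e, r):
    return math.log1p(e) + math.log1p(r)    # each extra animal of a crowded species matters less

def best_split(value_fn, steps=100):
    # Brute-force search over the share of the budget that goes to elephants.
    return max(
        (value_fn(BUDGET * (s / steps) * ELEPHANTS_PER_DOLLAR,
                  BUDGET * (1 - s / steps) * RHINOS_PER_DOLLAR), s / steps)
        for s in range(steps + 1)
    )

print(best_split(linear_value))       # elephant share 1.0: rhino gets nothing and goes extinct
print(best_split(diminishing_value))  # roughly a 55/45 split: both species keep funding
```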

The paperclip maximiser is an example of an AI that falls into the linearity trap: the more paperclips, the higher the utility.

Future evolution is good

By measuring good through the amount of pleasure, utilitarianism ignores human complexity and the need for future evolution. In biology, pleasure helps survival, and survival ensures the appearance of descendants and, eventually, the capability of the species to adapt and evolve. In other words, pain and pleasure are tiny bits of evolution.

Longtermism, astronomical waste and surviving the end of the universe

There is also the problem of the consequences of consequences: something good may have bad consequences, but those bad consequences may lead to an even greater good. To take all the consequences into account one would have to calculate the whole future, and this is impossible. Here the butterfly effect appears: any action has endless consequences. In the end, we are either myopic, or, if we start calculating consequences, we become longtermists.

I am a longtermist, but my view of longtermism is different. There is an idea that we should cover the whole universe with simulations full of happy human minds (escaping “astronomical waste”), and that after we do so we will have “fulfilled the human endowment” and “achieved full human potential” and can happily die in the heat death of the universe. My view is that finding ways to survive the end of the universe is more important and will eventually give us a chance to have even more happy minds. I am also skeptical that merely creating an astronomical number of human minds is good (except in the case where we run resurrectional simulations).

The more humans there are, the lower the relative value of each one, and since humans are social, status-driven beings, knowing that you are very insignificant is a heavy moral burden. Even now, many orders of magnitude more art is created than I can consume, which is embarrassing for the creators as well as for me. I think that if people live longer and have time to evolve into more complex and more varied beings, this will be less of a problem. Also, the thought that I have unlimited potential for evolution but must die to make way for other “happy minds” is painful, not only for me but for those minds too.

Where are the effects?

A friend asked me: where are the effects? While EA calls itself “effective”, we rarely see its effects, because the biggest effects are supposed to happen in the remote future, in remote countries, and to be statistical. This creates a feedback problem: we never know whether we have saved some future generation. Weak feedback signals can easily be hacked by malicious, egoistic agents who just want to get paid.

EA as arbitrage of the price of life in different countries

In some sense, we can view Effective Altruism as cost-of-life arbitrage. EA-as-giving only works if someone earns more money than they need for survival. And EA-as-giving works as long as there is a difference between the rich and the poor, and especially between rich and poor countries. That is why the most impressive examples of EA efficiency are about Africa, where saving a life is cheaper. But in order to give, you need to earn, and you can earn a lot only under capitalism. Thus, effective altruism needs a system in which there is inequality and exploitation.

EA as a world government

At the same time, EA acts like a wannabe world government, since it takes care of all people and of future generations. Because of this, it comes into conflict with the goals of local states. It is thus unsurprising that EA tries to go into politics, since governments distribute very large amounts of money.

The market is more efficient than a gift

EA promotes gratuitous aid, but such aid is less effective: mutual aid will eventually win over gratuitous aid. If I constantly help certain animals, I spend my resources but get no resources in return, and ultimately I will not be able to help anymore. However, you can help those who can then help you. Such an exchange can go on longer and ultimately generate more good. For example, a foundation gives a loan to a poor person, who then builds a business, repays the loan, and the money can be lent to another person.

EA’s misalignment: we forget about death

This has probably already been said elsewhere, but EA misses people’s main values: the need to live longer and not to die, and the secret, repressed dream of the resurrection of the dead. Traditional religions are not afraid to make such promises, although only the super-technologies of the future could actually realize them.

That is, the focus of EA is misaligned with human values: people’s main need is not “happiness” but not dying. A mortal being cannot be happy: the thought of death is a worm within. It would be better to redefine happiness as harmonious, eternal development in a perfect world. EA largely ignores the badness of death.

The resurrection of the dead is good and could be cheap

EA ignores the importance of the resurrection of the dead, yet there are two ways to increase the chance of resurrection cheaply. The first is plastination, the preservation of the brain in a chemical solution, which is organizationally better than cryonics: it needs no constant care and thus should be cheaper. The second is life-logging as an instrument of digital immortality. Both could be done starting from a few thousand USD per person.

Several people now advocate accepting the fight against aging as an EA cause area, so the idea is not new. Still, it is worth mentioning that a relatively small life extension (a few years) could be achieved via simple interventions, and each year of life extension increases one’s personal chances of surviving until radical life-extension technologies are developed, so the utility of each such year is more than just one year.
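A rough, purely illustrative expected-value sketch of this “a year is worth more than a year” point; the lifespans, probabilities, and bonus below are hypothetical assumptions, not figures from the post:

```python
# Toy expected-value model: an intervention adds a few years of life, which
# also raises the chance of surviving until hypothetical radical
# life-extension technology arrives. All numbers are made up.

def expected_years(base_life_years, extra_years, p_reach_radical_tech, radical_bonus_years):
    """Expected total years = ordinary years + (chance of reaching
    radical life extension) * (the large bonus it would give)."""
    # Crude assumption: each extra year adds 2% to the chance of still
    # being alive when the technology arrives.
    p_reach = min(1.0, p_reach_radical_tech + 0.02 * extra_years)
    return base_life_years + extra_years + p_reach * radical_bonus_years

baseline = expected_years(80, 0, 0.10, 1000)
with_boost = expected_years(80, 3, 0.10, 1000)
print(with_boost - baseline)   # ~63 expected years gained from a 3-year intervention
```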

EA pumps resources from near to far

EA pumps resources from near to far: to distant countries, to the distant future, to other beings. At the same time, the volume of the “far” is always greater than the volume of the near, so the pumping will never stop and the good of one’s “neighbours” will never come. This causes a muted protest from the general public, which already feels robbed by taxes and the like.

But sometimes helping a neighbour is cheaper than helping a distant person, because we have unique knowledge and opportunities within our inner circle. For example, if a person is drowning and I am standing on the shore, I can throw him a life buoy, and it costs me nothing. I think (though I am not sure) that I have personally prevented several accidents by telling cab drivers that a pedestrian was ahead, and so on.

If all the billionaires are so committed to the good, why hasn’t the problem of homelessness in San Francisco been solved yet? And if this problem is particularly complex and unsolvable, then maybe other people’s problems only seem simple?

The subject of help is more important than help

We can help more effectively either by improving the quality of the help or by changing whom we help. EA tries to find cheaper new subjects of help: people in poor countries, animals, and future generations. This is the opposite of the typical human kind of commitment, in which I care not about pain in the abstract but about a particular person.

A rather separate idea: Insects are more likely to be copies of each other and thus have less moral value

The number of possible states of consciousness in insects is smaller than in humans, as they presumably have a smaller field of attention. Therefore they have less moral value, since their mental states are more likely to be copies of each other. If there are one hundred copies of one virus, we can count them as one virus.

Now take an ant. The number of its possible states of consciousness is (most likely) much smaller than a human’s. Due to combinatorial effects it can still be large, but it is astronomically smaller than the number of possible human states of consciousness. Say a trillion trillion… trillion ant states are possible; if we create more ants than that, some ants will only be copies of each other. While it is unlikely that we will create so many happy ants, any single ant represents a much larger share of all possible ants than a single human does of all possible humans. Thus, if we want to save an equal share of all possible ants and of all possible humans, we have to save one ant and billions of billions of humans. This reinforces the intuition that humans have more moral value, even if animals also have some, and it helps us avoid falling into an “effectiveness trap” of preserving ever smaller animals in ever greater numbers.
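A toy calculation of this “share of all possible minds” idea; both state counts below are invented placeholders, chosen only so that the ratio matches the “billions of billions” in the text:

```python
# Toy model: moral weight as the share of all *possible* distinct minds
# that one individual represents. Both state counts are hypothetical.

ANT_POSSIBLE_STATES = 10**36     # "a trillion trillion ... trillion" possible ant-minds
HUMAN_POSSIBLE_STATES = 10**54   # astronomically more possible human minds (made-up figure)

def share_of_possible_minds(n_individuals, possible_states):
    # Each individual occupies at most one distinct state, so the saved
    # share of the space of possible minds is capped at 1.
    return min(n_individuals, possible_states) / possible_states

# One ant is a far larger fraction of "all possible ants" than one human
# is of "all possible humans":
print(share_of_possible_minds(1, ANT_POSSIBLE_STATES))    # 1e-36
print(share_of_possible_minds(1, HUMAN_POSSIBLE_STATES))  # 1e-54

# To save an equal *share* of both spaces, one ant corresponds to
# 10**18 humans, i.e. "billions of billions":
print(HUMAN_POSSIBLE_STATES / ANT_POSSIBLE_STATES)        # 1e+18
```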