Results of the Effective Altruism Outreach Survey
This article reports the results of an online survey with 167 respondents on the influence that different styles of effective altruism outreach have on them. While we could not find evidence for our hypotheses, exploratory data analysis yielded a ranking of the levels of motivation and curiosity our prompts induced. (Cross-posted from my blog.)
Topic
The aim of our survey was to determine what form of effective altruism outreach was most effective for what type of audience.
As types of outreach, we distinguished:
the “obligation style,” which aims to reveal altruistic values in people by helping them overcome biases, a style that is epitomized by Peter Singer’s Child in the Pond analogy, and
the “opportunity style,” which assumes that people are altruistic and helps them overcome biases that keep them locked in lethargy anyway, a style that is epitomized by Toby Ord’s appeal that people can save hundreds of lives over their lifetime if they invest their money wisely.
Styles that we did not investigate are the use of humor to better convey topics that would otherwise be met with defensiveness (suggested by Rob Mather) and a style that is similar to the opportunity style but puts a stronger emphasis on personal discovery, as in Melanie Joy’s TED talk.
Such an evaluation could help any group engaged in effective altruism outreach to communicate more effectively with their respective audiences.
Our hypotheses were:
The obligation style leads to defensiveness, which would cause a negative reaction at least in the short term and at least among less rationally minded people. (If it is also more emotionally salient, later reflection might still make it more effective, but we cannot measure that.)
The opportunity style has a positive effect but only on people who already show a strong altruistic inclination.
Pitches targeted at specific demographics have a stronger effect on these people than on others.
On the exploratory side, we were also interested in the correlation between respondents’ rational inclination and trust in their intuition on the one hand and their attitude toward our prompts on the other, as well as in any correlation between respondents’ reactions to the prompts and the degree to which the prompts informed them or, like teasers, withheld information.
Since we could not find evidence of these correlations, it would be interesting to see whether others can. Additionally, there are a number of prompts that seem very powerful that we did not include (e.g., a comparison of prioritization with triage), and a different sample of prompts might be more representative of the taxonomy. A qualitative study might also shed more light on the way people react to our prompts.
Design and Implementation
One of our worries was that if obligation-style prompts really make people defensive, this defensiveness might color the responses to later prompts. Hence we introduced a page break and placed the critical prompts on the second of the two pages.
The length of the prompts, especially the ones borrowed from Peter Singer, was another problem. We slightly shortened them where possible and otherwise reduced the number of prompts from the original eight per category to five. In the interest of reducing the number of fields people had to tick, we removed a scale for how much people like a prompt, which we found dispensable.
To measure rational and experiential (intuition-related) proclivities, we relied on 10 prompts from the Rational-Experiential Inventory (REI) by Norris, Pacini, and Epstein (1998). To measure altruistic inclination, we selected 10 prompts from the Adapted Self-Report Altruism Scale (Rushton, 1981; adapted by Witt and Boleman, 2009).
Like these two scales, our own prompts used five-point Likert scales.
The full survey and recruitment letter can be found here.
We at first used the original Rushton scale but, after receiving 15 responses, switched to the modified one, which meant turning sentences from the present perfect into the conditional (“I have donated blood” became “I would donate blood”). The change is fairly localized, the responses obtained after the change greatly outnumber those obtained before it, and we did not see any noticeable differences in the spread of the answers, so we decided to include the first 15 in our final analysis.
We advertised the survey on Reddit, Twitter, and Facebook, also using paid advertisement on Facebook to reach more people. Most respondents, however, were recruited through an email a friend sent to a mailing list of the Humboldt-Universität zu Berlin. We tried to counterbalance this and get more people without an academic background into our sample by targeting younger people on Facebook, but we only recruited about 26 people that way (at a rate of almost €1 per person), as opposed to 85 via the mailing list.
Please contact us if you would like to play around with the raw data.
Analysis
Our R script for cleaning and analysis can be found in this Bitbucket snippet.
After a first section of type conversions and the reversal of questions that were asked in the negative for validation purposes, we engaged in the controversial practice of interpreting the ordinal Likert items as an interval scale in order to compute means. This implies that the differences between the five options we gave are identical. We have no basis for this assumption, so the results should be taken with the appropriate absolute-scale number of grains of salt.
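For illustration, the following is a minimal sketch of what this step can look like in R. It is not taken from our actual script; the column names, Likert labels, and toy data are all hypothetical.

```r
# Minimal sketch (not our actual script): reverse-code negatively worded
# items and treat the five Likert options as the integers 1-5 to compute
# per-respondent means. Column names and toy data are hypothetical.
library(dplyr)

likert_levels <- c("Strongly disagree", "Disagree",
                   "Neither agree nor disagree", "Agree", "Strongly agree")

responses <- data.frame(
  rei_rational_1     = c("Agree", "Strongly agree", "Neither agree nor disagree"),
  rei_rational_2_neg = c("Disagree", "Strongly disagree", "Agree")  # asked in the negative
)

responses <- responses %>%
  # The controversial ordinal-to-interval step: map the five labels onto 1-5.
  mutate(across(starts_with("rei_"),
                ~ as.integer(factor(.x, levels = likert_levels)))) %>%
  # Reverse-code the items that were asked in the negative for validation.
  mutate(across(ends_with("_neg"), ~ 6L - .x))

# Per-respondent mean of the rationality items, treated as interval data.
responses$rei_rational <- rowMeans(select(responses, starts_with("rei_rational")))
```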
Apart from more cleaning, we also combined answers into coarser categories that seemed intuitive enough to us not to be motivated by the data. However, in all cases except for political views, we had seen the data before deciding on the categories. The intervals used for the respondents’ ages are not ours but intervals often used in the literature. These coarser categories allowed us to compensate for the low sample sizes per cohort.
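As a purely illustrative example of such coarsening, the sketch below bins ages into brackets of the kind often used in the literature; the exact boundaries and labels here are assumptions, not necessarily the ones we used.

```r
# Illustrative only: coarsen numeric ages into literature-style brackets.
# The boundaries and labels are assumptions, not our exact categories.
ages <- c(19, 24, 25, 31, 46, 62)
age_group <- cut(ages,
                 breaks = c(-Inf, 24, 34, 44, 54, Inf),
                 labels = c("24 or younger", "25-34", "35-44", "45-54", "55+"))
table(age_group)
```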
Finally the script produces some eighty graphs.
When the analyses showed that we could not find evidence for any of our hypotheses, we turned to exploratory data analysis, the results of which are detailed below.
Evaluation
Exploratory data analysis has the inevitable drawback that in all likelihood we’ll find significant-looking correlations in our data simply by chance.
Nonetheless, the overall ranking of the prompts, which we asked our respondents to rate on scales of the curiosity and motivation they either induced or failed to induce, has the power of our full sample size of 167 behind it, so we are somewhat confident that conclusions drawn about prompts close to its extreme points are valuable.
The graph above shows the distribution of respondents’ votes, with each prompt described by a four-part key: the first part is a keyword that makes clear which prompt is meant; the second is our taxonomic classification of whether the prompt focuses on the donor’s opportunity or moral obligation; the third is either “info” or “teaser,” depending on whether the prompt explains something or withholds information; and the fourth indicates whether the respondent gauged their motivation or their curiosity. The first and last parts are restrictive while the second and third are descriptive.
There are also some post-hoc rationalizations that make the rankings of the top prompts plausible.
The absolute top prompt in terms of motivation and curiosity is Peter Singer’s famous Child in the Pond analogy, which would probably not have made it into our survey had it not proved its persuasive power by turning Singer’s essay “Famine, Affluence, and Morality” into a seminal paper of moral philosophy well-known to philosophers worldwide.
In third place is a slightly adapted version of a sentence that Giving What We Can uses as one of its slogans, “Studies have found that top charities are up to a thousand times more effective than others,” except that the organization omits the weasel words “studies have found.” It is also a time-tested prompt.
In fourth place is an almost verbatim quote from Toby Ord’s TED talk and surely a statement that the Giving What We Can founder has honed in hundreds of conversations with potential pledge-takers: “You can save someone’s life without even changing your career.”
The final spots in the ranking can be explained as an aversive reaction to an insulting prompt. Interestingly, the rather popular prompt comparing the training of a guide dog to sight-restoring surgery ranks very low in terms of the motivation it induces.
Threats to our external validity are that we have in our sample:
3.7 times as many academics as people who only graduated from school, if we count as academics anyone who has attended a university or college irrespective of whether they have attained a degree yet,
3 times as many nonreligious as religious respondents, and
a mean age of 25 (σ = 7), with only two respondents over 45.
There are likely more biases that we can’t recognize.
Main Hypotheses
In our data exploration, we have generated over eighty graphs that can be found in this gallery.
Based on experiences in the Less Wrong community and REG’s experiences with poker players as well as our inside view of the effective altruism movement itself, we expected to see a clear correlation between rationality and effective altruism inclination (the “all” vs. “rational” plots above).
We did not expect to see such a clear correlation with our data on the respondents’ altruistic inclination (the “all” vs. “altruistic” plots above), because that scale tested a very elementary, naive empathetic skill, which may be necessary to a degree but is otherwise unhelpful for understanding effective altruism.
Neither correlation showed up in our data. Not even the square root of the product of the two features (their geometric mean) was correlated with responses to our prompts. If these results can be taken at face value, then it seems to us that rationality and altruism may be little more than necessary conditions for becoming an effective altruist, and that something else is just as necessary, maybe the principle of “taking ideas seriously,” which is common on Less Wrong, or any number of other such traits. More likely, though, the results are simply meaningless.
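For concreteness, the combined feature is simply the geometric mean of the two scores, along the lines of this sketch (the variable names and values are hypothetical):

```r
# Hypothetical sketch of the combined feature: the geometric mean of the
# rationality and altruism scores, i.e., the square root of their product.
rationality <- c(3.2, 4.1, 2.8, 3.6)
altruism    <- c(4.0, 3.5, 2.5, 3.0)
combined    <- sqrt(rationality * altruism)

# Correlate the combined feature with some response measure, e.g., a mean
# prompt rating (values here are made up).
prompt_rating <- c(3.1, 4.3, 2.2, 3.4)
cor(combined, prompt_rating)
```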
The strong correlations between “altruistic” and the two REI dimensions may just be artifacts of people’s different inclinations to answer Likert scales with extreme or moderate values. Surprisingly, however, the same tendency is not evident between the two REI dimensions. Perhaps they are sufficiently contradictory to offset this tendency. Please let us know if you have other explanations.
Qualitative Results
The only nonquantitative question in our survey was the one asking for comments and suggestions. A few interesting comments:
One respondent made the good point that the questions that focus on opportunities in effective altruism put the donor at the center rather than the beneficiary, and changing exactly that is a crucial part of effective altruism.
Five respondents made suggestions that seemed to go in the opposite direction (though that is my interpretation), largely for pragmatic purposes. Two of them seemed to take this position despite being fairly aware of the privilege of their birth.
One respondent said fairly directly that the distance of suffering was morally relevant to them.
Conclusion
While we could not find evidence for our hypotheses, we were able to generate a ranking of prompts commonly used by effective altruists according to how much motivation and curiosity they induce, based on self-reports. Due to biases in our sample, the external validity of these results is probably higher for populations of academics than for the general population.