Marketing Messages Trial for GWWC Giving Guide Campaign

The trial was run in conjunction with Josh Lewis (NYU). Thanks to David Moss and others for feedback on this post, and to Jamie Elsey for support with the Bayesian analysis.

TL;DR

Giving What We Can, together with the EA Market Testing Team (EAMT), tested marketing and messaging themes on Facebook in GWWC’s Effective Giving Guide Facebook Lead campaigns, which ran from late November 2021 to January 2022. GWWC’s Giving Guide answers key questions about effective giving and includes the latest effective giving recommendations to teach donors how to do the most good with their donations. These were exploratory trials to identify promising strategies to recruit people for GWWC and engage people with EA more broadly.[1] We report the most interesting patterns from these trials to provide insight into which hypotheses might be worth exploring more rigorously in future (‘confirmatory analysis’) work.

Across four trials we compared the effectiveness of different types of (1) messages, (2) videos, and (3) targeted audiences. The key outcomes were (i) email addresses per dollar (when a Facebook user provides an email lead) and (ii) link clicks per dollar. Based on our analysis of 682,577 unique Facebook ‘impressions’, we found:[2]

  • The cost of an email address was as low as $8.00 across campaigns, but it seemed to vary substantially across audiences, videos, and messages.

  • The message “Only 3% of donors give based on charity effectiveness, yet the best charities can be 100x more impactful” generated more link clicks and email addresses per dollar than other messages.[3] In contrast, the message “Giving What We Can has helped 6,000+ people make a bigger impact on the causes they care about most” was less cost-effective than the other messages.

  • A ‘short video with facts about effective giving’ generated more email addresses per dollar than either (1) a long video with facts about effective giving or (2) a long video that explained how GWWC can help maximize charitable impact, the GWWC ‘brand video.’

  • On a per-dollar basis, ‘Animal’ audiences that were given animal-related cause videos performed among the best, both overall and in the most comparable trials. ‘Lookalike’ audiences (those with profiles similar to people currently engaging with GWWC) performed best overall, for both cause and non-cause videos.[4] However, ‘Climate’ and ‘Global Poverty’ audiences generally underperformed the ‘Philanthropy’ audience when presented videos ‘for their own causes.’ The animal-related cause video performed particularly poorly on the ‘Philanthropy’ audience.

  • Demographics were mostly not predictive of either email addresses per dollar or link clicks per dollar.

  • See our Quarto dynamic document[5] linked here for more details and ongoing analyses.

Purpose and Interpretation of this Report

One of the primary goals of the EAMT is to identify the most effective, scalable strategies for marketing EA. Our main approach is to test marketing and messaging themes in naturally occurring settings (such as advertising campaigns on Facebook, YouTube, etc.), targeting large audiences, to determine which specific strategies work best in the most relevant contexts. In this report, we share key patterns and insights about the effectiveness of different marketing and messaging approaches used in GWWC’s Effective Giving Guide Facebook Lead campaigns. The patterns we share here serve as a starting point to consider themes and hypotheses to test more rigorously in our ongoing research project.

We are hoping for feedback and suggestions from the EA community on these trials and their implementation and analysis. We continue to conduct detailed analyses of this data.[6] We’d like to get ideas from the community about how to improve trials like these and what other analyses would be informative. Additionally, we hope this report will give other groups ideas for messaging themes to try, as well as tools and processes for starting and analyzing marketing campaigns.[7] Finally, we seek to keep the community abreast of what we (at EA Market Testing and at GWWC) are up to (see the EA Market Testing Gitbook page for more details and resources).

Research questions

In these trials, we aimed to test two different approaches to messaging, (1) presenting facts about effective giving and (2) presenting cause-focused messages, in order to get people to provide their email to download GWWC’s Giving Guide, which also subscribed them to GWWC’s email list. We tested (i) six short animated videos with facts about effective giving or a focus on specific cause areas and (ii) seven messages displayed with the videos, across (iii) different audience segments: a general Facebook audience, audiences based on specific interests (animal rights, climate change, poverty, philanthropy), and a lookalike audience (people with profiles similar to those currently engaging with GWWC).

Content

There were two dimensions of treatment content: (1) the text displayed above the videos and (2) the video ad’s theme and content.

Text

  1. Bigger difference next year: Want to make a bigger difference next year? Start with our Effective Giving Guide and learn how to make a remarkable impact just by carefully choosing the charities you give to.

  2. 100x impact: Did you know that the best charities can have a 100x greater impact? Download our free Effective Giving Guide for the best tips on doing the most good this holiday season.

  3. 6000 people: Giving What We Can has helped 6,000+ people make a bigger impact on the causes they care about most. Download our free guide and learn how you can do the same.

  4. Cause list: Whether we’re moved by animal welfare, the climate crisis, or worldwide humanitarian efforts, our community is united by one thing: making the biggest impact we can. Make a bigger difference in the world through charitable giving. Start by downloading our Effective Giving Guide. You’ll learn how to approach charity research and smart giving. And be sure to share it with others who care about making a greater impact on the causes closest to their hearts.

  5. Learn: Use our free guide to learn how to make a bigger impact on the causes you care about most.

  6. Only 3% research: Only 3% of donors give based on charity effectiveness yet the best charities can be 100x more impactful. That’s incredible! Check out the Effective Giving Guide 2021. It’ll help you find the most impactful charities across a range of causes.

  7. Overwhelming: It can be overwhelming with so many problems in the world. Fortunately, we can do *a lot* to help, if we give effectively. Check out the Effective Giving Guide 2021. It’ll help you find the most impactful charities across a range of causes.

Video ads

Facts about effective giving

  1. Charity research facts short video (8 seconds): Only 3% of donors research charity effectiveness, yet the best charities can 100x your impact, learn how to give effectively

  2. Charity research facts long video (22 seconds): Trivial things we search (shows someone searching how to do Gangnam style), things we should research (shows someone searching how to donate effectively), only 3% of donors research charity effectiveness, yet the best charities can 100x your impact, learn how to give effectively.

Cause-focus

  1. Climate change (cause focus video) (15 seconds): Care about climate change? You don’t have to renounce all your possessions, But you could give to effective environmental charities, Learn how to maximize your charitable impact, Download the Effective Giving Guide

  2. Animal welfare (cause focus video) (16 seconds): Care about animals? You don’t have to adopt 100 cats, But you could give to effective animal charities, Learn how to maximize your charitable impact, Download the Effective Giving Guide

  3. Poverty (cause focus video) (16 seconds): Want to help reduce global poverty? You don’t have to build a village, But you could give to effective global development charities, Learn how to maximize your charitable impact, Download the Effective Giving Guide

Arguments, rich content from brand video

  1. Brand video (1 min 22 seconds): Animated and voiceover video that explains how GWWC can help maximize charitable impact (support, community, and information) and the problems GWWC addresses (good intentions don’t always produce the desired outcomes, there are millions of charities that have varying degrees of impact and some can even cause harm). Call to action: Check out givingwhatwecan.org to learn how you can become an effective giver.

Outcome measures

Outcome measures used in our analysis of the messages and videos were (1) email addresses per dollar and (2) link clicks per dollar. When we say ‘results’ below, we refer to these outcome measures. Other measures collected were amount spent, cost per impression, cost per link click, link click-through rate, and 3-second video plays.

Costs were determined in the Facebook ad auction for each impression, and are reported on a per-result basis. However, the ‘cost per result’ is determined in part by the likelihood of getting the result: results cost less if you are expected to get more of them, so ultimately, marketers are paying for the value of impressions.[8] Generally, when we talk about a segment being ‘expensive’ we mean that it’s expensive to serve them an impression, not that it’s necessarily expensive on a cost-per-result basis.
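To make these measures concrete, here is a minimal sketch in R, using invented numbers rather than our actual campaign data, of how the per-dollar outcomes relate to impression costs and response rates:

```r
# Two hypothetical segments -- invented numbers, not our campaign data
ads <- data.frame(
  segment     = c("A", "B"),
  spend_usd   = c(500, 500),     # total ad spend
  impressions = c(40000, 25000), # unique impressions ('reach')
  emails      = c(60, 75)        # email leads ('results')
)

transform(
  ads,
  cost_per_impression = spend_usd / impressions,
  emails_per_dollar   = emails / spend_usd,
  cost_per_email      = spend_usd / emails
)
# Segment B's impressions cost more (2.0 vs 1.25 cents), but its higher
# response rate makes it cheaper per email ($6.67 vs $8.33): an 'expensive'
# segment need not be expensive on a cost-per-result basis.
```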

Analysis: data collection and caveats

Facebook allows you to generate pivot tables showing the number of results for different segments. We used these pivot tables to generate datasets with a row for each impression and downloaded these into our repo for further analysis.[9] We chose to treat each unique user impression (aka ‘reach’) as an independent observation and each result or link click as if it came from a different ‘reach.’ This enabled us to do more sophisticated statistical analyses than those available on the Facebook ads platform. Still, our approach has its limitations. Facebook only gives the total number of results of all people in a segment who saw an ad, so we do not know if the same person contributes more than one result to the total. (However, in this context it seems unlikely that many people would give their email to sign up for giving guides more than once.)
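As a concrete sketch of this expansion step (the column names here are hypothetical; the actual pipeline is in our repo), one way to turn an aggregated segment row into one binary-outcome row per unique impression:

```r
library(dplyr)
library(tidyr)

# Aggregated rows as exported from Facebook's pivot tables (invented numbers)
pivot <- tibble(
  audience = c("Animal", "Climate"),
  reach    = c(1000, 1500),  # unique impressions
  emails   = c(12, 9)        # total email leads in the segment
)

# Expand to one row per unique impression with a 0/1 email outcome,
# treating each result as coming from a distinct person (see caveat above)
impressions <- pivot %>%
  mutate(no_email = reach - emails) %>%
  pivot_longer(c(emails, no_email), names_to = "outcome", values_to = "n") %>%
  mutate(email = as.integer(outcome == "emails")) %>%
  uncount(n) %>%
  select(audience, email)

# This unit of observation supports standard models, e.g.:
# glm(email ~ audience, family = binomial, data = impressions)
```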

Divergent delivery: Facebook does not facilitate randomized (or ‘balanced’) assignment. Unlike in standard experiments and RCTs, Facebook ads do not ensure that each participant has the same chance of ending up in each condition, nor does it allow an easy way to re-weight for imbalance. Specifically, Facebook’s algorithm will always try to assign a particular content variation to people it thinks will be most receptive to that specific variant. All our results may come from…

  • ‘how easy it is for Facebook’s algorithm to find the most receptive people within a given audience to target a particular message’ rather than

  • how effective the message or receptive the audience was generally.

Thus, given these idiosyncrasies, our results may not generalize much beyond Facebook, nor even to an evolving Facebook environment. (We discuss this more fully in our Gitbook here, where we maintain and update a knowledge base on these design and implementation issues.)
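A toy simulation, with all numbers invented, illustrates the concern: if the platform serves one ad mostly to users it predicts are receptive, the naive per-ad comparison will favor that ad even when the two ads are equally effective for everyone.

```r
set.seed(1)
n <- 100000
receptive <- rbinom(n, 1, 0.2)                 # latent receptiveness

# Both ads have the *same* true conversion probability for every user...
p_convert <- ifelse(receptive == 1, 0.02, 0.002)

# ...but the algorithm serves ad "B" disproportionately to receptive users
ad <- ifelse(rbinom(n, 1, ifelse(receptive == 1, 0.8, 0.3)) == 1, "B", "A")
converted <- rbinom(n, 1, p_convert)

tapply(converted, ad, mean)  # ad "B" looks ~3x better purely via targeting
```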

All analyses were conducted on an exploratory, non-preregistered basis. We did not specify hypotheses before conducting the research, and the results we highlight were not predicted in advance. We made many comparisons between different segments, most of which did not show any substantial interesting difference. Although we did conduct statistical tests, we don’t report them in detail here, both for brevity and because of the limited potential for generalizable causal inference, given the caveats above. We will report a richer statistical analysis and summary, along with a complete pipeline of code, data, and results, in the transparent ‘Quarto’ dynamic document HERE, where we are working on a Bayesian ‘decision relevant’ approach. (See, e.g., plots here and here.) More information about treatment assignment by campaign is explained in the Appendix to this post.
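For reference, comparisons of the kind reported below can be checked with a standard two-sample proportion test on the per-impression data; a minimal sketch with invented counts:

```r
emails      <- c(210, 140)        # email leads for two messages (invented)
impressions <- c(120000, 115000)  # unique impressions for each message

# Test whether the two messages have equal email rates per impression
prop.test(emails, impressions)
```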

Full results

Note: We include charts that depict ‘results’ (link clicks or emails) as well as ‘results per dollar.’ [10]

Demographics

  • Women and older people were more responsive to the ads (with more results per impression) but also more expensive to target, which means the results approximately balanced out on a ‘cost per email’ or ‘cost per link click’ basis. There is some evidence of interaction effects; e.g., the 45-54 age group appears to contain particularly uninterested men, even on a cost-per-result basis.

  • People aged 65+ click more, both per impression and per ad dollar spent.

  • Caveats: The graphs below pool results across earlier and later campaigns[11]; the latter only included ages 18-44. However, the above results basically continue to hold (see Quarto here) when we separate these groups of campaigns, with the additional finding that the 18-24 age group was particularly unpromising in the earlier trials.

Audiences

We defined the following audiences. Within each of these groups, Facebook chose which ad content to allocate according to its maximizing algorithm.[12]

  • Lookalike: An audience that we set by telling Facebook to identify people whose characteristics resembled GWWC’s ‘core audience’, i.e., who resembled pledgers, ‘try givers’, or people who liked GWWC’s page.

  • Animal: People interested in animal rights (according to Facebook interests[13])

  • Climate Change: People interested in climate change (according to Facebook interests)

  • Poverty: People interested in global poverty (according to Facebook interests)

  • Philanthropy: People interested in philanthropy (according to Facebook interests)

  • General: All Facebook users

Results

  • Predictably, the ‘Lookalike’ audience was the most responsive.[14]

  • On a per-dollar basis, ‘Animal’ audiences given animal-related cause videos performed among the best, both overall and in the most comparable trials.

  • The ‘Climate’ audience was highly engaged (in terms of link clicks), but this didn’t translate into more results per impression (or per cost) than for the ‘Poverty’ or ‘Philanthropy’ audiences.

When ‘Climate’ and ‘Global Poverty’ audiences were presented videos ‘for their own causes’, they performed worse than the ‘Philanthropy’ audience when presented these same videos.[15]

Aggregated across all trials, when presented with the Climate videos, ‘Climate’ audiences underperformed the ‘Philanthropy’ audience. In the most comparable trial (trial 4), they similarly underperformed both the ‘Philanthropy’ and ‘General’ audiences for this video. Aggregated across trials, the ‘Global Poverty’ audience appears to have done OK with the Global Poverty-related video. However, in the most comparable (4th) trial they did so poorly on this video that Facebook stopped administering it to them!

Texts (messages)

  • “Only 3% of donors give based on charity effectiveness, yet the best charities can be 100x more impactful” performed better than the other messages. However, there was some heterogeneity by audience, with this message doing poorly on ‘Poverty’, ‘Lookalikes’, and ‘Climate’ audiences in a comparable trial; see tables in Quarto here.

  • “Giving What We Can has helped 6,000+ people make a bigger impact on the causes they care about most” was the least effective message.

As reported in the Quarto, some messages show strong heterogeneity across audiences, while others were fairly consistent. The second-best overall message, ‘100x impact’, did adequately on all audiences. ‘Learn’ showed some variation, doing pretty well on ‘Poverty’ and ‘Lookalike’ audiences, OK on ‘Animal’ audiences, but poorly on the ‘General’, ‘Philanthropy’, and ‘Climate’ audiences. The ‘6000+ people’ and ‘Bigger Difference’ messages performed poorly on nearly all audiences, doing at best OK on a few.

In the Quarto, we also graph HDI intervals (somewhat like confidence intervals) for the ‘results per unique impression’ by message. These intervals appear extremely narrow; the HDIs do not overlap.[16] As each message is delivered at very nearly the same cost per impression (1.5-1.6 cents), the differences in ‘results per cost’ closely reflect the differences in ‘results per 1k impressions’.
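To see why such intervals are so narrow, here is a rough sketch of the beta-binomial calculation behind an interval for one message’s rate, using invented counts, a flat prior, and an equal-tailed interval rather than a true HDI (the two nearly coincide for posteriors this concentrated):

```r
# Invented totals for a single message
emails <- 180
reach  <- 150000

# With a flat Beta(1, 1) prior, the posterior for the email rate is
# Beta(1 + successes, 1 + failures)
post_a <- 1 + emails
post_b <- 1 + (reach - emails)

# 95% equal-tailed credible interval, scaled to results per 1,000 impressions
qbeta(c(0.025, 0.975), post_a, post_b) * 1000
```

With hundreds of thousands of impressions per message, posteriors like this are tightly concentrated, so per-message intervals can easily be narrow enough not to overlap.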

Videos

  1. The ‘factual long’ video and the brand video were the least effective. All other videos were about the same.

  2. The ‘factual long’ video was served to expensive people. It is possible that only more expensive people had the attention span for it, or that it was better received by older people (who are more expensive to reach).

  3. The animal video was also sent to relatively expensive people; it was targeted more at women.

  4. On a per-dollar basis ‘Animal’ audiences that were given animal-related cause videos performed among the best, both overall and in the most comparable trials. However, ‘Climate’ and ‘Global Poverty’ audiences underperformed the ‘Philanthropy’ audience when presented videos ‘for their own causes.’ [17]

Conclusions from video and age breakdown

  • The ‘factual short’ seems to be the best video for older people.

  • The poverty video seems to be less interesting to older people.

  • The climate video produces more engagement from older people (in terms of clicks), but this doesn’t translate into more emails (a common theme with the climate video).

Conclusions about Video Results by Audience

  • The brand video seemed to be served to the cheaper people in the ‘Philanthropy’ audience but otherwise seemed to do as poorly as the ‘factual long’ video.

  • For the ‘Philanthropy’ audience (in the most comparable trials: see Quarto here) the Animal video performed extremely poorly, and the Climate video performed rather well, especially on a per-cost basis.

  • Otherwise, for each audience, the brand and factual long videos both tended to do poorly.

  • ‘Climate’ and ‘Global Poverty’ audiences generally underperformed the ‘Philanthropy’ audience when presented videos ‘for their own causes.’

Next steps

We are looking for collaborators with particular interests and expertise to help us design and analyze future campaigns as well as contribute to the ongoing analyses of trials we have completed (including the one discussed in this report). We are particularly interested in: adaptive trial designs to maximize ‘value of information’, Bayesian and robust simulation-based statistical analysis, meta-analysis and mixed models, social media advertising (e.g., Facebook ‘pixels’) and web analytics, and open data pipelines and visualizations (especially in R and Quarto). If you are interested in working with the EA Market Testing Team on this project or on similar research projects, please reach out to David Reinstein at dreinstein@rethinkpriorities.org.

In the future, we intend to test questions generated from these trials. For example:

  • Is the success of the “only 3% of donors” message being driven by the desire to be part of a small group of like-minded people?

  • Does the effectiveness of short factual messaging generalize to other campaigns?

  • Are audiences interested in animal welfare particularly promising?

  • Do our results (for clicks and email signups) carry over to more impactful outcomes?

Appendix: Treatment assignment/​campaigns

Video content was manipulated across three ‘split tests’. (For a breakdown of treatment assignments by campaign, date, etc., see the tables in the dynamic document here.)

  • Test 1 (Nov 30, 2021 – Dec 8, 2021) displayed either the long factual video or a cause focus video. In the cause focus condition, cause-specific audiences for animal rights, climate change, and poverty (based on their behavior on Facebook) were shown the relevant cause video.

  • Test 2 (Dec 8 – 20, 2021) was the same as Test 1 but used the short factual video instead of the cause-focus videos.

  • Test 3 (Dec 23, 2021 - Jan 4, 2022) was the same as Test 2 but it had a new version of the videos, and it was restricted to ages 18-44.[18]

  • Test 4 (Jan 7 - Jan 18, 2022): The brand video was displayed in a separate ‘brand video’ campaign, which was tested against another campaign that allowed the algorithm to optimize between the short factual and cause-focus videos (although not allowing each cause-specific audience to see the ads for other cause areas).

In all tests, the treatment assignment (which text and video were displayed to which of the users within the chosen audience) was determined by Facebook’s algorithm. In the split tests, the extent to which each experimental condition was displayed[19] was set to equalize the total cost across conditions. The other features (e.g., ‘what cause video to show’ or ‘how to introduce each video’) were determined by Facebook’s algorithm to optimize the rate of people leaving their emails (thus these were not balanced). None of the treatments were fully randomly assigned.[20]

The videos were adapted across the trials as we learned. First, we updated the factual video to be shorter for Trial 2; then, for Trial 3, we replaced all the videos with versions in which Luke held up signs spelling out the voiceover. In many of our analyses, we pooled data across trials, yielding greater statistical power.

  1. ^

    Learn more about GWWC’s efforts to identify strategies to engage people with EA by watching Grace’s talk at EAGxOxford 2022, “What we’re learning about spreading EA ideas.”

  2. ^

    These results are largely based on simple comparisons: we continue the careful statistical analysis of this trial in the Quarto dynamic document linked here, along with the analysis of other EAMT-linked trials.

  3. ^

    Although there was some heterogeneity by audience, with this message doing poorly on ‘Poverty’, ‘Lookalikes’, and ‘Climate’ audiences in a comparable trial.

  4. ^

    However, our confidence/credible intervals (reported in the Quarto) are wider for this smaller group.

  5. ^

    ‘Dynamic documents’ combine code, text, and results, making it clear exactly how the results are produced.

  6. ^

    We continue and extend the presentation and statistical analysis of this trial in the Quarto dynamic document linked here, along with other chapters analyzing our other EAMT-linked trials. (This is publicly hosted on the Github repo here, which contains all code and data for this project). We are eager to have you engage with that resource. You can leave comments on the Quarto with the embedded ‘Hypothesis’ tool. You can also engage with the Github, and reach out to us if you want to get further involved in this and other analysis and reporting.

  7. ^

    Also see the resources on ‘implementing ads, messages, and designs’ that we are building in the public Gitbook here.

  8. ^

    You can read more about how Facebook’s ad auction works here.

  9. ^

    See our walk-through on how to extract such data from Facebook here. (You can also import Facebook ad data into R directly via an API and helper tools; we intend to do this in our later work.)

  10. ^

    We recognize (as noted above) that these charts do not contain confidence/credible intervals, statistical tests, or metrics like ‘probability of superiority’. This presents particular challenges in this context; we have some preliminary analysis of this (as well as forest and ridge plots) in the Quarto, e.g., here and here.

  11. ^

    More information about treatment assignment by campaign is explained in the Appendix to this post.

  12. ^

    We also set a ‘Retargeting’ audience, consisting of people who had already been on GWWC’s website. This was very small and obviously more promising than other audiences, so we do not include this audience in the analysis.

  13. ^

    Facebook interests are determined by pages the user ‘likes’ and the content they engage with on Facebook as well as off the platform. Facebook’s definition of ‘being interested’ is very broad and it might be different from what one would perceive as ‘being interested in something’ in the real world.

  14. ^

    ‘Lookalike’ audiences performed best overall, for both cause and non-cause videos. However, our confidence/​credible intervals are wider for this smaller group. See Quarto for more details, e.g., the table ‘Results by audience; cause vs non-cause (and overall)’.

  15. ^

    See the last table in the Quarto (go here and scroll to bottom); note that the ‘Climate’ audience did worse than the ‘Philanthropy’ audience when both saw climate videos, while the global poverty and philanthropy audiences performed similarly. The table here focuses on the most comparable trials. If you filter by “video_theme = poverty” or “=climate” you see the results discussed below, suggesting each of these cause audiences underperformed the ‘Philanthropy’ audience, even when presented their ‘own cause’s videos.’

  16. ^

    Are these ‘real differences in the population?’ This is subject to other caveats about divergent delivery etc.

  17. ^

    As discussed in our ‘Audiences’ section … aggregated across all trials, when presented the Climate videos, ‘Climate’ audiences underperformed the ‘Philanthropy’ audience. In the most comparable trial (trial 4), they similarly underperformed both the ‘Philanthropy’ and ‘General’ audiences for this video. Aggregated across trials, the ‘Global Poverty’ audience appears to have done OK with the Global Poverty video. However, in the most comparable (4th) trial they did so poorly on this video that Facebook stopped administering it to them!

  18. ^

    The original videos in Tests 1 and 2 and the new videos in Test 3 used the same script. The videos in Tests 1 and 2 were animations; in Test 3, the videos instead had a voiceover, with Luke holding up signs showing the words of the script.

  19. ^

    I.e., factual vs. cause-focus in Tests 1-3, and brand video vs. factual and cause-based in Test 4.

  20. ^

    See our comments about random assignment in the data collection and caveats section.