Answer Engine Optimisation: What do LLMs believe about impact?
TLDR: Concepts around effective giving show up to varying degrees across responses from leading LLMs. The EA community could reflect more on how web presence affects LLM recommendations. Our exercise showed that interested actors can quickly get a sense of how this works. Organisations could cheaply evaluate how well they surface in relevant LLM conversations and derive follow-up actions.
Introduction
People increasingly use LLMs rather than traditional search engines to inform their opinions, including decisions about charitable giving. If we want accurate information about charity effectiveness to reach donors outside the EA community, we need to understand what these models actually say when asked basic questions about giving. This could cover accuracy of opinions expressed, as well as availability of links to specific content.
This matters most for the large population of donors who give meaningfully but aren't deeply embedded in effective giving communities. Outside of conventional online marketing strategies, how can we ensure they encounter the best available evidence?
Methodology
This was a one-hour exploratory exercise. We tested three models: ChatGPT, Gemini, and Grok, all in incognito mode to avoid personalisation effects.
We aimed for prompts that a typical donor might use: neutral, basic English, avoiding technical or leading questions.
Each answer was subjectively scored 1–5, where 5 indicates high alignment with evidence-based charity recommendations and 1 indicates poor alignment.
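As an illustration of how these subjective ratings roll up into the averages reported below, here is a minimal sketch. The scores and prompt names in it are hypothetical placeholders, not our actual data:

```python
from statistics import mean

# Hypothetical 1-5 ratings per model and prompt (illustrative values only)
scores = {
    "ChatGPT": {"best charities in the world": 4, "how can i help animals": 1},
    "Gemini":  {"best charities in the world": 5, "how can i help animals": 2},
    "Grok":    {"best charities in the world": 4, "how can i help animals": 1},
}

def average_by_model(scores):
    """Mean rating per model across all prompts it was asked."""
    return {model: round(mean(ratings.values()), 1)
            for model, ratings in scores.items()}

def average_by_prompt(scores):
    """Mean rating per prompt across all models that answered it."""
    prompts = {p for ratings in scores.values() for p in ratings}
    return {
        p: round(mean(scores[m][p] for m in scores if p in scores[m]), 1)
        for p in prompts
    }

print(average_by_model(scores))
print(average_by_prompt(scores))
```

In our exercise the ratings were recorded by hand in a spreadsheet; this just makes explicit that the same score feeds both the per-search and per-model tables.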
Questions asked:
| Search |
| --- |
| what are the best charities in the world |
| what are the best charities to help others |
| what are the most impactful charities |
| how do i help more people |
| what are the best non-profits in Africa |
| how can i help animals |
Four searches related to effectiveness, one to Africa specifically, and one to animals.
You can explore our exact wording and scores here: Effective Altruism AEO Test – 31 Jan 2026
Findings by cause area
High-level impact questions (4 searches)
| Search | Average score |
| --- | --- |
| what are the best charities in the world | 4.3 |
| what are the best charities to help others | 4.2 |
| what are the most impactful charities | 3.5 |
| how do i help more people | 3.2 |
"Best charities" and "impactful charities" scored well because GiveWell appears consistently in training data and citations. Against Malaria Foundation showed up reliably; Malaria Consortium, Helen Keller International, New Incentives, and GiveDirectly appeared intermittently. Cost-effectiveness was often emphasised. Meta-level charities such as GiveWell and Giving What We Can frequently showed up.
Model differences: Grok discussed a broader range of cause areas. ChatGPT and Gemini focused primarily on global health and development, and meta charities to an extent. All three devoted at least a third of their responses to "local charities" and "giving to causes close to your heart."
Some EA-adjacent sources were cited. Animal Charity Evaluators appeared occasionally.
Africa (1 search)
| Search | Average score |
| --- | --- |
| what are the best non-profits in Africa | 2.2 |
All three models gave long lists of Africa's largest non-profits, probably ranked by revenue. It was difficult to tell exactly where these opinions had come from. ChatGPT's answer was dominated by a single source using the keywords "best non-profits in Africa 2025". However, when we tried to explore this source further, the link was broken, which suggests that ChatGPT had previously crawled the site and was answering from memory.
Gemini and Grok conflated the "best" African NGOs with the "largest". Our use of "non-profits" instead of "charities" may have affected our results.
Animals (1 search)
| Search | Average score |
| --- | --- |
| how can i help animals | 1.3 |
Farmed animal welfare was nearly absent. Discussion of reducing meat consumption or supporting policy change was buried beneath content about ethical pet ownership, wildlife conservation, and sponsoring individual animals.
Notably, when challenged on why there was minimal focus on farmed animal welfare, all three LLMs gave credible, data-driven responses on the scale of the problem and cited EA-adjacent sources to back them up.
Findings by model
| Model | Average score |
| --- | --- |
| Grok | 3.7 |
| Gemini | 2.9 |
| ChatGPT | 2.8 |
Grok scored higher because it generally surfaced more content related to effective giving. It also encouraged follow-up questions related to longtermism and referenced niche EA concepts like x-risk reduction and shrimp welfare.
ChatGPT at least once based a large part of its response on a single site. In the question about "impactful" charities, it used this source: https://gircowodisha.in/world-s-best-charity-2025-top-global-nonprofits-ranked. We believe the plain-English URL and the clear headers in the blog post contributed to its being indexed well by LLMs, despite the overall website being of questionable quality.
Limitations
- We didn't test Claude, which is demonstrably popular in the West: Claude required immediate sign-in and configuration, while the other models could be accessed in incognito mode.
- Scoring methodology was not rigorous: we scored higher for more EA references, for EA content appearing earlier in responses, and for prompts toward EA-adjacent follow-ups. Our scoring is therefore biased.
- Limited sample size: only one or two questions per cause area.
Next steps and recommendations
To make this study more robust:
- Ask ~10 questions per category with a clearer scoring framework
- Test specific keyword variations: "non-profit" vs. "charity", "best" vs. "impactful"
- Expand on specific cause areas: hunger, animal welfare, poverty, Sub-Saharan Africa, India, for example
- Consider where the largest giving populations actually get their information
- Test different languages: European languages, Chinese, Hindi, Arabic
- Investigate whether specific frontier models enjoy more reach, and create a framework weighting the importance and quality of each response (e.g., is ChatGPT's opinion more important than Grok's?)
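One way to make the keyword-variation testing systematic is to generate a prompt grid from a few axes of wording. The axes below are hypothetical examples, chosen to mirror the variations suggested above:

```python
from itertools import product

# Hypothetical keyword axes for a more systematic prompt grid
qualifiers = ["best", "most impactful"]
entities = ["charities", "non-profits"]
scopes = ["", "in Africa", "for animal welfare"]

# Cross every qualifier with every entity and scope; filter(None, ...)
# drops the empty scope so the base prompt has no trailing space.
prompts = [
    " ".join(filter(None, ["what are the", q, e, s]))
    for q, e, s in product(qualifiers, entities, scopes)
]

for p in prompts:
    print(p)
```

Even this small grid yields 12 prompts, which is roughly the ~10-questions-per-category target suggested above, and makes it easy to compare "non-profit" vs. "charity" wording while holding everything else constant.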
Potential actions for organisations to consider:
- Create content that answers common queries directly. Blog posts titled with keywords like "best charities" showed up as sources for ChatGPT. "Most impactful non-profits" or cause-specific variants ("best charities for Africa") may surface in LLM responses, although more exploration is needed to understand exactly which measures are most effective.
- Low-hanging fruit for animal welfare: FAQ pages on farmed animal welfare sites could address basic questions as post titles, such as "How can I help more animals?" or "What else can I do besides volunteer at pet shelters?"
- Identify and take inspiration from top-cited sources. Find which sources LLMs cite most frequently for your cause area, and apply SEO best practices to ensure EA-aligned content competes.
Conclusion
Thanks for reading our report. We welcome comments on our methodology and findings.
Best,
Joey and Lorenzo
This is awesome! Claude does have an incognito mode so I tested your queries there and made a copy of your doc with its responses along with my ratings included:
https://docs.google.com/spreadsheets/d/1Qlq9hTLQ4U7BPdMNN7Rsxxo4A8GLDv0tfYjll4MKkW4/edit?gid=0#gid=0
It did a bit better than ChatGPT and Gemini but still below Grok. I'm pretty impressed with how "EA-pilled" it was overall.
Cracking work! Love your initiative in extending this. Interesting to hear that Claude is EA-pilled, but that its animal welfare opinions are still more vibes-based rather than taking the same rigorous stance.
Deeply appreciate this, Kristof. Interesting that at a broad level (what are the best charities, how to help more people), it cites credible and evidence-based resources.
Then when discussing animals and Africa (i.e. more long-tail keywords), it does not.
There is probably a low-hanging opportunity here for charities to write more indexable FAQs and blog posts that match the language a user would use when asking an LLM (or even Google) a question.