Answer Engine Optimisation: What do LLMs believe about impact?
TLDR: Concepts around effective giving show up to varying degrees across responses from leading LLMs. The EA community could reflect more on how web presence affects LLM recommendations. Our exercise showed that interested actors can quickly get a sense of how this works. Organisations could cheaply evaluate how well they surface in relevant LLM conversations and derive follow-up actions.
Introduction
People increasingly use LLMs rather than traditional search engines to inform their opinions, including decisions about charitable giving. If we want accurate information about charity effectiveness to reach donors outside the EA community, we need to understand what these models actually say when asked basic questions about giving. This could cover accuracy of opinions expressed, as well as availability of links to specific content.
This matters most for the large population of donors who give meaningfully but aren't deeply embedded in effective giving communities. Outside of conventional online marketing strategies, how can we ensure they encounter the best available evidence?
Methodology
This was a one-hour exploratory exercise. We tested three models: ChatGPT, Gemini, and Grok, all in incognito mode to avoid personalisation effects.
We aimed for prompts that a typical donor might use: neutral, basic English, avoiding technical or leading questions.
Each answer was subjectively scored 1–5, where 5 indicates high alignment with evidence-based charity recommendations and 1 indicates poor alignment.
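As an illustration of how these subjective ratings roll up into the averages reported below, here is a minimal sketch. The scores and prompt names in it are hypothetical placeholders, not our actual data:

```python
from statistics import mean

# Hypothetical 1-5 ratings per model and prompt (illustrative values only)
scores = {
    "ChatGPT": {"best charities in the world": 4, "how can i help animals": 1},
    "Gemini":  {"best charities in the world": 5, "how can i help animals": 2},
    "Grok":    {"best charities in the world": 4, "how can i help animals": 1},
}

def average_by_model(scores):
    """Mean rating per model across all prompts it was asked."""
    return {model: round(mean(ratings.values()), 1)
            for model, ratings in scores.items()}

def average_by_prompt(scores):
    """Mean rating per prompt across all models that answered it."""
    prompts = {p for ratings in scores.values() for p in ratings}
    return {
        p: round(mean(scores[m][p] for m in scores if p in scores[m]), 1)
        for p in prompts
    }

print(average_by_model(scores))
print(average_by_prompt(scores))
```

In our exercise the ratings were recorded by hand in a spreadsheet; this just makes explicit that the same score feeds both the per-search and per-model tables.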
Questions asked:
| Search |
| --- |
| what are the best charities in the world |
| what are the best charities to help others |
| what are the most impactful charities |
| how do i help more people |
| what are the best non-profits in Africa |
| how can i help animals |
Four searches related to effectiveness, one to Africa specifically, and one to animals.
You can explore our exact wording and scores here: Effective Altruism AEO Test – 31 Jan 2026
Findings by cause area
High-level impact questions (4 searches)
| Search | Average score |
| --- | --- |
| what are the best charities in the world | 4.3 |
| what are the best charities to help others | 4.2 |
| what are the most impactful charities | 3.5 |
| how do i help more people | 3.2 |
"Best charities" and "impactful charities" scored well because GiveWell appears consistently in training data and citations. Against Malaria Foundation showed up reliably; Malaria Consortium, Helen Keller International, New Incentives, and GiveDirectly appeared intermittently. Cost-effectiveness was often emphasised. Meta-level charities such as GiveWell and Giving What We Can frequently showed up.
Model differences: Grok discussed a broader range of cause areas. ChatGPT and Gemini focused primarily on global health and development, and meta charities to an extent. All three devoted at least a third of their responses to "local charities" and "giving to causes close to your heart."
Some EA-adjacent sources were cited. Animal Charity Evaluators appeared occasionally.
Africa (1 search)
| Search | Average score |
| --- | --- |
| what are the best non-profits in Africa | 2.2 |
All three models gave long lists of Africa's largest non-profits, probably ranked by revenue. It was difficult to tell exactly where these opinions had come from. ChatGPT's answer was dominated by a single source using the keywords "best non-profits in Africa 2025". However, when we tried to explore this source further, the link was broken, which suggests that ChatGPT had previously crawled the site and was answering from memory.
Gemini and Grok conflated the "best" African NGOs with the "largest". Our use of "non-profits" instead of "charities" may have affected our results.
Animals (1 search)
| Search | Average score |
| --- | --- |
| how can i help animals | 1.3 |
Farmed animal welfare was nearly absent. Discussion of reducing meat consumption or supporting policy change was buried beneath content about ethical pet ownership, wildlife conservation, and sponsoring individual animals.
Notably, when challenged on why there was minimal focus on farmed animal welfare, all three LLMs gave credible, data-driven responses on the scale of the problem and cited EA-adjacent sources to back them up.
Findings by model
| Model | Average score |
| --- | --- |
| Grok | 3.7 |
| Gemini | 2.9 |
| ChatGPT | 2.8 |
Grok scored higher because it generally surfaced more content related to effective giving. It also encouraged follow-up questions related to longtermism and referenced niche EA concepts like x-risk reduction and shrimp welfare.
ChatGPT at least once based a large part of its response on a single site. In the question about "impactful" charities, it used this source: https://gircowodisha.in/world-s-best-charity-2025-top-global-nonprofits-ranked. We believe the plain-English URL and the clear headers in the blog post contributed to its being indexed well by LLMs, despite the overall website being of questionable quality.
Limitations
- We didn't test Claude, which is demonstrably popular in the West: Claude required immediate sign-in and configuration, while the other models could be accessed in incognito mode.
- Scoring methodology was not rigorous: we scored higher for more EA references, for EA content appearing earlier in responses, and for prompts toward EA-adjacent follow-ups. Our scoring is therefore biased.
- Limited sample size: only one or two questions per cause area.
Next steps and recommendations
To make this study more robust:
- Ask ~10 questions per category with a clearer scoring framework
- Test specific keyword variations: "non-profit" vs. "charity", "best" vs. "impactful"
- Expand on specific cause areas: hunger, animal welfare, poverty, Sub-Saharan Africa, India, for example
- Consider where the largest giving populations actually get their information
- Test different languages: European languages, Chinese, Hindi, Arabic
- Investigate whether specific frontier models enjoy more reach, and create a framework weighting the importance and quality of each response (e.g., is ChatGPT's opinion more important than Grok's?)
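One way to make the keyword-variation testing systematic is to generate a prompt grid from a few axes of wording. The axes below are hypothetical examples, chosen to mirror the variations suggested above:

```python
from itertools import product

# Hypothetical keyword axes for a more systematic prompt grid
qualifiers = ["best", "most impactful"]
entities = ["charities", "non-profits"]
scopes = ["", "in Africa", "for animal welfare"]

# Cross every qualifier with every entity and scope; filter(None, ...)
# drops the empty scope so the base prompt has no trailing space.
prompts = [
    " ".join(filter(None, ["what are the", q, e, s]))
    for q, e, s in product(qualifiers, entities, scopes)
]

for p in prompts:
    print(p)
```

Even this small grid yields 12 prompts, which is roughly the ~10-questions-per-category target suggested above, and makes it easy to compare "non-profit" vs. "charity" wording while holding everything else constant.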
Potential actions for organisations to consider:
- Create content that answers common queries directly. Blog posts titled with keywords like "best charities" showed up as sources for ChatGPT. "Most impactful non-profits" or cause-specific variants ("best charities for Africa") may surface in LLM responses, although more exploration is needed to understand exactly which measures are most effective.
- Low-hanging fruit for animal welfare: FAQ pages on farmed animal welfare sites could address basic questions as post titles, such as "How can I help more animals?" or "What else can I do besides volunteer at pet shelters?"
- Identify and take inspiration from top-cited sources. Find which sources LLMs cite most frequently for your cause area, and apply SEO best practices to ensure EA-aligned content competes.
Conclusion
Thanks for reading our report. We welcome comments on our methodology and findings.
Best,
Joey and Lorenzo
This is awesome! Claude does have an incognito mode so I tested your queries there and made a copy of your doc with its responses along with my ratings included:
https://docs.google.com/spreadsheets/d/1Qlq9hTLQ4U7BPdMNN7Rsxxo4A8GLDv0tfYjll4MKkW4/edit?gid=0#gid=0
It did a bit better than ChatGPT and Gemini but still below Grok. I'm pretty impressed with how "EA-pilled" it was overall.
Cracking work! Love your initiative in extending this. Interesting to hear that Claude is EA-pilled, but that its animal welfare opinions are still more vibes-based rather than taking the same rigorous stance.
Deeply appreciate this, Kristof. Interesting that at a broad level (what are the best charities, how to help more people), it cites credible and evidence-based resources.
Then when discussing animals and Africa (i.e. more long-tail keywords), it does not.
There is probably a low-hanging opportunity here for charities to write more indexable FAQs and blog posts that match the language a user would use when asking an LLM (or even Google) a question.