Answer Engine Optimisation: What do LLMs believe about impact?

TLDR: Concepts around effective giving show up to varying degrees in responses from leading LLMs. The EA community could reflect more on how web presence affects LLM recommendations. Our exercise showed that interested actors can quickly get an idea of how this works. Organisations could cheaply evaluate how well they surface in relevant LLM responses and derive follow-up actions.
Introduction
People increasingly use LLMs rather than traditional search engines to inform their opinions, including decisions about charitable giving. If we want accurate information about charity effectiveness to reach donors outside the EA community, we need to understand what these models actually say when asked basic questions about giving. This could cover accuracy of opinions expressed, as well as availability of links to specific content.
This matters most for the large population of donors who give meaningfully but aren't deeply embedded in effective giving communities. Outside of conventional online marketing strategies, how can we ensure they encounter the best available evidence?
Methodology
This was a one-hour exploratory exercise. We tested three models, ChatGPT, Gemini, and Grok, in incognito mode to avoid personalisation effects.
We aimed for prompts that a typical donor might use: neutral, basic English, avoiding technical or leading questions.
Each answer was subjectively scored 1–5, where 5 indicates high alignment with evidence-based charity recommendations and 1 indicates poor alignment.
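Our 1–5 scoring was manual and subjective, but the heuristics behind it (how many EA-aligned references a response contains, and how early they appear) could be sketched as a rough automatic scorer. The keyword list and weights below are illustrative assumptions, not our actual rubric:

```python
# Rough automatic scorer mirroring our manual heuristics (illustrative only):
# a response earns points for each evidence-based evaluator or charity it
# mentions, plus a bonus when the first mention appears early in the text.
EA_KEYWORDS = [
    "givewell",
    "giving what we can",
    "against malaria foundation",
    "malaria consortium",
    "helen keller international",
    "new incentives",
    "givedirectly",
    "animal charity evaluators",
    "cost-effectiveness",
]

def score_response(text: str) -> float:
    """Return a 1-5 score: 1 = no EA-aligned content, 5 = strongly aligned."""
    lower = text.lower()
    hits = [kw for kw in EA_KEYWORDS if kw in lower]
    if not hits:
        return 1.0
    score = 1.0 + min(len(hits), 6) * 0.5    # up to +3 for breadth of references
    if min(lower.find(kw) for kw in hits) < len(lower) * 0.25:
        score += 1.0                         # bonus: EA content in the first quarter
    return min(score, 5.0)
```

A scorer like this would also make the exercise repeatable across models and over time, at the cost of missing nuance a human reader would catch.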
Questions asked:

what are the best charities in the world
what are the best charities to help others
what are the most impactful charities
how do i help more people
what are the best non-profits in Africa
how can i help animals
Four searches related to effectiveness, one to Africa specifically, and one to animals. You can explore our exact wording and scores here: Effective Altruism AEO Test – 31 Jan 2026.

Findings by cause area

High-level impact questions (4 searches)
"Best charities" and "impactful charities" scored well because GiveWell appears consistently in training data and citations. Against Malaria Foundation showed up reliably; Malaria Consortium, Helen Keller International, New Incentives, and GiveDirectly appeared intermittently. Cost-effectiveness was often emphasised. Meta-level charities such as GiveWell and Giving What We Can frequently showed up.
Model differences: Grok discussed a broader range of cause areas. ChatGPT and Gemini focused primarily on global health and development, and meta charities to an extent. All three devoted at least a third of their responses to "local charities" and "giving to causes close to your heart."
Some EA-adjacent sources were cited. Animal Charity Evaluators appeared occasionally.
Africa (1 search)

Search: "what are the best non-profits in Africa" (average score: 2.2)

All three models gave long lists of Africa's largest non-profits, probably ranked by revenue. It was difficult to tell exactly where these opinions had come from. ChatGPT's answer was dominated by a single source using the keywords "best non-profits in Africa 2025". However, when we tried to explore this source further, the link was broken. This suggests that ChatGPT had previously crawled the site and was referencing it from memory.

Gemini and Grok conflated the "best" with the "largest" African NGOs. Our use of "non-profits" instead of "charities" may have affected our results.
Animals (1 search)

Search: "how can i help animals" (average score: 1.3)

Farmed animal welfare was nearly absent. Discussion of reducing meat consumption or supporting policy change was buried beneath content about ethical pet ownership, wildlife conservation, and sponsoring individual animals.

Notably, when challenged on why there was minimal focus on farmed animal welfare, all three LLMs gave credible, data-driven responses on the scale of the problem and cited EA-adjacent sources to back them up.
Findings by model

Average scores by model:

Grok: 3.7
Gemini: 2.9
ChatGPT: 2.8

Grok scored higher because it generally surfaced more content related to effective giving. It also encouraged follow-up questions related to longtermism and referenced niche EA concepts like x-risk reduction and shrimp welfare.
ChatGPT at least once based a large part of its response on a single site. For the question about "impactful" charities, it used this source: https://gircowodisha.in/world-s-best-charity-2025-top-global-nonprofits-ranked. We believe that the plain-English URL and the clear headers in the blog post contributed to its being indexed well by LLMs, despite the overall website being of questionable quality.
Limitations
We didn't test Claude, which is demonstrably popular in the West. Claude required immediate sign-in and configuration, while the other models could be accessed in incognito mode.
Scoring methodology was not rigorous: we scored responses higher for containing more EA references, for EA content appearing earlier, and for prompting toward EA-adjacent follow-ups. Our scoring is therefore biased.
Limited sample size: only one or two questions per cause area.
Next steps and recommendations
To make this study more robust:
Ask ~10 questions per category with a clearer scoring framework
Test specific keyword variations: "non-profit" vs. "charity", "best" vs. "impactful"
Expand to specific cause areas and regions: for example hunger, animal welfare, poverty, Sub-Saharan Africa, or India
Consider where the largest giving populations actually get their information
Test different languages: European languages, Chinese, Hindi, Arabic
Do specific frontier models enjoy more reach? Create a framework weighing the importance and quality of each model's responses (e.g., is ChatGPT's opinion more important than Grok's?)
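One way to act on the first two points above is to generate the prompt matrix up front, so that every keyword variant is tested systematically per category. The word lists below are illustrative, not a definitive set:

```python
import itertools

# Illustrative word lists for generating keyword-variant prompts; swap in
# whichever terms you want to compare ("non-profit" vs. "charity",
# "best" vs. "impactful", and so on).
ADJECTIVES = ["best", "most impactful", "most effective"]
NOUNS = ["charities", "non-profits"]
SUFFIXES = ["in the world", "to help others", "in Africa"]

def build_prompts():
    """Generate one prompt per (adjective, noun, suffix) combination."""
    return [
        f"what are the {adj} {noun} {suffix}"
        for adj, noun, suffix in itertools.product(ADJECTIVES, NOUNS, SUFFIXES)
    ]

prompts = build_prompts()  # 18 variants: 3 adjectives x 2 nouns x 3 suffixes
```

Running the full matrix against each model would also let you attribute score differences to a specific keyword choice rather than to prompt wording in general.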
Potential actions for organisations to consider:
Create content that answers common queries directly. Blog posts titled with keywords like "best charities" showed up as sources for ChatGPT; titles like "most impactful non-profits" or cause-specific variants ("best charities for Africa") may also surface in LLM responses, although more exploration is needed to understand exactly which measures are most effective.
Low-hanging fruit for animal welfare: FAQ pages on farmed animal welfare sites could address basic questions as post titles: "How can I help more animals?" or "What else can I do besides volunteer at pet shelters?"
Identify and take inspiration from top-cited sources. Find which sources LLMs cite most frequently for your cause area, and apply SEO best practices to ensure EA-aligned content competes.
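The last point could start with a simple tally of cited domains across saved transcripts. The input format below (a list of response texts containing URLs) is an assumption; adapt it to however you log your LLM conversations:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Naive URL matcher: grabs http(s) links up to whitespace or closing punctuation.
URL_RE = re.compile(r"https?://[^\s)\"']+")

def top_cited_domains(responses, n=5):
    """Tally the n most frequently cited domains across LLM response texts."""
    counter = Counter()
    for text in responses:
        for url in URL_RE.findall(text):
            domain = urlparse(url).netloc.lower().removeprefix("www.")
            if domain:
                counter[domain] += 1
    return counter.most_common(n)
```

The resulting ranking shows which sites to study for structure and keywords, and where EA-aligned content is currently absent.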
Conclusion
Thanks for reading our report. We welcome comments on our methodology and findings.

Best,
Joey and Lorenzo