One GHW example: The impact of AI tutoring on educational interventions (via Arjun Panickssery on LessWrong).
There have been at least two studies/impact evaluations of AI tutoring in African countries finding extraordinarily large effects:
Summer 2024 — 15–16-year-olds in Nigeria: The study had 800 students total. The treatment group studied English with the GPT-based Microsoft Copilot twice weekly for six weeks. Students were given only an initial prompt to start chatting—teachers had a minimal “orchestra conductor” role—yet they achieved “the equivalent of two years of typical learning in just six weeks.”
February–August 2023 — 8–14-year-olds in Ghana: An educational network called Rising Academies tested Rori, their WhatsApp-based AI math tutor, with 637 students in Ghana. Students in the treatment group received access to the tutor during study hall. After eight months, 25% of the subjects had attrited due to inconsistent school attendance. Among the remainder, the treatment group increased their scores on a 35-question assessment by 5.13 points, versus 2.12 points for the control group. This difference was “approximately equivalent to an extra year of learning” for the treatment group.
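For a rough sense of scale, here is a minimal back-of-envelope on those headline numbers. This is a sketch only: the point estimates come from the write-ups above, the year-to-weeks conversion is my own assumption, and the “years of learning” equivalences are the studies’ own framings.

```python
# Back-of-envelope on the two studies' headline numbers.
# Point estimates are from the write-ups above; the "years of
# learning" equivalences are the study authors' framings.

# Ghana (Rori): 35-question assessment, ~25% attrition from 637 students
students_enrolled = 637
completers = round(students_enrolled * (1 - 0.25))  # ~478 analyzed

treatment_gain = 5.13  # points
control_gain = 2.12    # points
extra_gain = treatment_gain - control_gain  # ~3.01 points

# The authors equate this ~3-point difference to "approximately
# an extra year of learning" for the treatment group.
print(f"Completers analyzed: ~{completers}")
print(f"Extra gain over control: {extra_gain:.2f} points "
      f"({extra_gain / 35:.1%} of the 35-question assessment)")

# Nigeria (Copilot): "two years of typical learning in six weeks"
# implies a rough learning-rate multiplier. Assumption (mine, not
# the study's): treating a "year" as 52 calendar weeks; the study's
# own benchmark (e.g. school-year weeks) may differ.
weeks_of_typical_learning = 2 * 52
multiplier = weeks_of_typical_learning / 6
print(f"Implied learning-rate multiplier (Nigeria): ~{multiplier:.0f}x")
```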
Should this significantly change how excited EAs are about educational interventions? I don’t know, but I’ve also not seen a discussion of this on the forum (aside from this post about MOOCs & AI tutors, which received ~zero engagement).
This writeup by Vadim Albinsky at Founders Pledge seems related: Are education interventions as cost effective as the top health interventions? Five separate lines of evidence for the income effects of better education [Founders Pledge]
The part that seems relevant is the charity Imagine Worldwide’s use of the “adaptive software” OneBillion app to teach numeracy and literacy. Despite Vadim’s several discounts and general conservatism throughout his CEA, he still gets ~11x GiveDirectly cost-effectiveness (a rough sketch of the shape of that calculation follows the quoted excerpts below). (I’d honestly thought, given the upvotes and engagement on the post, that Vadim had changed some EAs’ minds on the promisingness of non-deworming education interventions.) The OneBillion app doesn’t seem to use AI, but they already (paraphrasing) use “software to provide a complete, research-based curriculum that adapts to each child’s pace, progress, and cultural and linguistic context”, so I’m not sure how much better Copilot or Rori would be?
Quoting some parts that stood out to me (emphasis mine):
This post argues that if we look at a broad enough evidence base for the long term outcomes of education interventions we can conclude that the best ones are as cost effective as top GiveWell grants. …
… I will argue that the combined evidence for the income impacts of interventions that boost test scores is much stronger than the evidence GiveWell has used to value the income effects of fighting malaria, deworming, or making vaccines, vitamin A, and iodine more available. Even after applying very conservative discounts to expected effect sizes to account for the applicability of the evidence to potential funding opportunities, we find the best education interventions to be in the same range of cost-effectiveness as GiveWell’s top charities. …
When we apply the above recommendations to our median recommended education charity, Imagine Worldwide, we estimate that it is 11x as cost effective as GiveDirectly at boosting well-being through higher income. …
Imagine Worldwide (IW) provides adaptive software to teach numeracy and literacy in Malawi, along with the training, tablets and solar panels required to run it. They plan to fund a six-year scale-up of their currently existing program to cover all 3.5 million children in grades 1-4 by 2028. The Malawi government will provide government employees to help with implementation for the first six years, and will take over the program after 2028. Children from over 250 schools have received instruction through the OneBillion app in Malawi over the past 8 years. Five randomized controlled trials of the program have found learning gains of an average of 0.33 standard deviations. The OneBillion app has also undergone over five additional RCTs in a broad range of contexts with comparable or better results.
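To make the ~11x figure concrete, here is a minimal sketch of the general shape of such a CEA. Everything below except the 0.33 SD learning gain is a hypothetical placeholder, chosen purely so the output matches the headline number; Vadim’s actual model applies many more steps and discounts.

```python
# Minimal sketch of the shape of an education CEA, normalized to
# GiveDirectly. ALL parameters besides the 0.33 SD learning gain
# (from the quote above) are hypothetical placeholders -- NOT
# Vadim's actual numbers.

learning_gain_sd = 0.33        # from the RCTs cited above
income_gain_per_sd = 0.20      # hypothetical: 20% income boost per SD of test scores
evidence_discount = 0.50       # hypothetical: external-validity discount
cost_per_child = 10.0          # hypothetical: USD per child reached

# Hypothetical benchmark: income gain GiveDirectly produces per
# dollar, in the same (heavily simplified) units.
gd_income_gain_per_dollar = 0.0003

income_gain = learning_gain_sd * income_gain_per_sd * evidence_discount
multiple_of_gd = (income_gain / cost_per_child) / gd_income_gain_per_dollar
print(f"Implied multiple of GiveDirectly: ~{multiple_of_gd:.0f}x")  # ~11x
```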
I suspect this might be two distinct uses of “AI” as a term. While GPT-type chatbots can be helpful (such as in the educational examples you refer to), they are very different from the artificial general intelligence that most AI alignment/safety work anticipates.
To paraphrase AI Snake Oil,[1] it is like one person talking about how improved spacecraft will open up new possibilities for humanity, while a second person mentions how vehicles are also helping his area because cars are becoming more energy-efficient. While both fall under the category of “vehicles,” they are quite different concepts. So I’m wondering if this might be verging on talking-past-each-other territory.
The full quote is this: “Imagine an alternate universe in which people don’t have words for different forms of transportation—only the collective noun “vehicle.” They use that word to refer to cars, buses, bikes, spacecraft, and all other ways of getting from place A to place B. Conversations in this world are confusing. There are furious debates about whether or not vehicles are environmentally friendly, even though no one realizes that one side of the debate is talking about bikes and the other side is talking about trucks. There is a breakthrough in rocketry, but the media focuses on how vehicles have gotten faster—so people call their car dealer (oops, vehicle dealer) to ask when faster models will be available. Meanwhile, fraudsters have capitalized on the fact that consumers don’t know what to believe when it comes to vehicle technology, so scams are rampant in the vehicle sector. Now replace the word “vehicle” with “artificial intelligence,” and we have a pretty good description of the world we live in.”
Thanks for the comment! I might be missing something, but GPT-type chatbots are based on large language models, which play a key role in scaling toward AGI. I do think that extrapolating progress from them is valuable but also agree that tying discussions about future AI systems too closely to current models’ capabilities can be misleading.
That said, my post intentionally assumes a more limited claim: that AI will transform the world in significant ways relatively soon. This assumption seems both more likely and increasingly foreseeable. In contrast, assumptions about a world ‘incredibly radically’ transformed by superintelligence are less likely and less foreseeable. There have been lots of arguments around why you should work on AI Safety, and I agree with many of them. I’m mainly trying to reach the EAs who buy into the limited claim but currently act as if they don’t.
Regarding the example: it would likely be a mistake to focus only on current AI capabilities for education. However, it could be important to seriously evaluate scenarios like ‘AI teachers better than every human teacher, soon’.
That strikes me as very reasonable, especially considering the likelihood and foreseeability, and since the education examples you mentioned really are already capable of transforming parts of the world.
I guess the issue with arguing for AI tutoring interventions to increase earnings is that they would have to compete against AI tutoring interventions that assist folk working directly on high-priority issues, and that comparison is unlikely to come out favourably (though the former has the advantage of being more sellable to traditional funders).
EA charities can also combine education and global health, like https://healthlearn.org/blog/updated-impact-model
HealthLearn builds a mobile app for health workers (nurses, midwives, doctors, community health workers) in Nigeria and Uganda. Health workers use it to learn clinical best practices. This leads to better outcomes for patients.
I’m personally very excited by this. Health workers in developing countries often have few training resources available. There are several clinical practices that can improve patient outcomes while being easy to implement (such as initiating breastfeeding immediately after birth). These are not as widely used as we would like.
HealthLearn uses technology to faithfully scale the intervention to thousands of health workers. AI does not yet play a significant role in the learning process; courses are designed manually. This was important both to get started quickly and to get approval from government health agencies and professional organizations such as nursing councils.
The impact model that I’ve linked to above estimates that the approach has been cost-effective so far, and could become better with scale.
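For intuition, here is a heavily simplified sketch of what a training-app impact model can look like. Every number below is a hypothetical placeholder, not a figure from HealthLearn’s linked model, which is far more detailed.

```python
# Heavily simplified sketch of a training-app impact model.
# Every number is a hypothetical placeholder, NOT from
# HealthLearn's actual model (linked above).

health_workers_trained = 10_000   # hypothetical reach
adoption_rate = 0.30              # hypothetical: share who change practice
births_per_worker_per_year = 50   # hypothetical caseload
effect_per_birth = 0.0005         # hypothetical: deaths averted per affected birth
program_cost = 200_000.0          # hypothetical USD

births_affected = (health_workers_trained * adoption_rate
                   * births_per_worker_per_year)
deaths_averted = births_affected * effect_per_birth
print(f"Cost per death averted: ~${program_cost / deaths_averted:,.0f}")
```

One structural point the sketch makes visible: the software cost is largely fixed, so cost per outcome falls as reach grows, which is where the “better with scale” claim comes from.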
(disclaimer: I’m one of the software engineers building the app)