It would be interesting to see a more detailed and systematic report on the activity and findings so far.
In some respects, it seems like a strange thing for GiveDirectly to be piloting. On the one hand, GiveDirectly has expertise in systematic studies of behavioural change in LDCs, and the chatbot possibly also performed programmatic functions in a cost-effective manner. On the other hand, it involves a charity known for its “let local people decide how to use money spent on their behalf, Western aid agencies doing it can be disempowering and often wrong” ethos asking “which parameters should we use to fine tune this [adaptation of a commercial] product we’ve designed to give them the most suitable answers before scaling up its deployment”… which seems like a very different ethos and approach.[1]
The conclusions highlighted from the research so far seem unsurprising: both that if you give poor Rwandans access to ChatGPT they have a similar range of interactions to other humans[2] and that responses generated by an LLM with no meaningful local training dataset were often inadequate. I am sympathetic to arguments that people make better decisions with access to information, but I am also sympathetic to arguments that a ChatGPT derivative is not the most valuable information Rwandans could receive (and may have minimal or even negative value).
I’m not actually sure what the costs of acquiring relevant local data, training a chatbot to achieve greater fluency in spoken Kinyarwanda dialects and safeguarding against advice that is very bad in a local context are,[3] but they seem like a pretty relevant benchmark. They might actually be considerable on a per-user basis, and the alternative for critical information like “what is the nearest health centre” might be something like signing people up to email lists, or a small number of human agents in Kigali costing surprisingly little.[4] I guess there’s also the “who’s paying?” question, especially when the current implementation appears to involve providing training data for one of the world’s most valuable companies (and obscure languages may or may not add value to their model).
I feel one relevant benchmark for GiveDirectly specifically might be “what is the estimated cost per person reached to improve it: would locals rather have a better chatbot or the cash?”. It’s possible the insights they’re getting are extremely valuable, particularly in the context of limited or no web access, but it’s possible they’re not…
- ^
The relevant comparator might be the One Laptop Per Child project. It was well-intentioned, with a theory of change centred on the idea that people in LEDCs can be empowered by interacting with modern technology and better information too, but perhaps the actual educational benefits didn’t really stack up against the costs, and the participants would have chosen to have something other than a computer.
- ^
I must admit, I am curious about the extent to which Rwandans engaged in “witty banter” or attempts to manipulate the chatbot into saying something silly...
- ^
I don’t know how bad the spoken-language performance and dataset are, and whether an adequate “solution” looks like a fine-tuning prompt with some info or developing a corpus of services data and synthetic idiosyncratic Kinyarwanda to fix the model, but the latter option could be very expensive relative to the number of people it would actually reach...
- ^
I suspect you get many person years of Rwandan human call centre time for a month or two of a mid-level AI engineer’s time...
I’m not sure naive total utility maximization [in a static framework] is the best framework for thinking about how to deal with existential risk over time.[1]
Assuming the number of risks and the error bars are not trivially small, the universal outcome of concentrating all your risk mitigation on one risk is that most risks remain as high as they could possibly be. The modal outcome is that the ignored risks include at least one risk greater than the one all efforts are concentrated on mitigating. Some reasonable assumptions in the article above show this can hold even where the actual biggest risk is orders of magnitude greater than the one targeted. In the diversified approach, less money is devoted to reducing the perceived biggest risk, but the rest is apportioned to reducing other risks. This seems more robust to conventional assumptions like uncertainty and some risks being easier to mitigate than others.
And tbh, without ancillary assumptions like increasing returns to risk reduction expenditure or the actual value of many risks under consideration being 0, I’m not even seeing an average utility boost from concentrating on the single largest risk as opposed to mitigating lots of risks.
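As a rough illustration of the point about error bars, here is a toy simulation of how often the risk that looks biggest is not actually the biggest. Everything in it is an assumption made up for the sketch rather than taken from the article: the function name `share_mis_targeted`, the lognormal distribution of true risk sizes, the multiplicative lognormal estimation noise and the default of ten risks.

```python
import random


def share_mis_targeted(n_risks=10, noise_sd=1.0, trials=10_000, seed=0):
    """Fraction of trials in which the risk perceived as largest
    is not the truly largest one (toy model, all inputs assumed)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Assumed: true risk sizes are lognormally distributed.
        true_risks = [rng.lognormvariate(0.0, 1.0) for _ in range(n_risks)]
        # Assumed: we only observe each risk through multiplicative noise.
        estimates = [r * rng.lognormvariate(0.0, noise_sd) for r in true_risks]
        # Concentrating everything on one risk means picking the largest estimate.
        targeted = max(range(n_risks), key=lambda i: estimates[i])
        # A "miss" means at least one ignored risk is truly larger.
        if true_risks[targeted] < max(true_risks):
            misses += 1
    return misses / trials


if __name__ == "__main__":
    for sd in (0.0, 0.5, 1.0, 2.0):
        print(f"noise_sd={sd}: mis-targeted share = {share_mis_targeted(noise_sd=sd):.2f}")
```

With `noise_sd=0.0` the estimates equal the true values, so the targeted risk is always the largest; the share of mis-targeted trials grows as the assumed noise widens. The sketch says nothing about how much mitigation each dollar buys, which is a separate assumption.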