[Disclaimer: I’m the Chief Economist of IDinsight, an M+E provider that has worked with GiveWell and many others. I have a LOT of experience with evaluators being pressured to sugarcoat results, or not being pressured at all.]
Strong disagree with the conclusion that M+E providers are inherently biased.
Yes, there are situations where M+E providers have incentives that can lead to bias. For instance, if an NGO hires an M+E provider to do an external evaluation of itself, the NGO is the ‘client’ of the researchers. This can be problematic, since the NGO needs to approve deliverables before payments are made. I’ve been involved in these situations and they are tricky.
But in general, arrangements can be made to align incentives with the truth. For instance, if a funder (like GiveWell) hires an M+E provider to evaluate one of its grantees, the M+E provider’s incentives are aligned with the funder’s, and the funder hopefully wants to know the unvarnished truth. We’ve done numerous evaluations for GiveWell (most notably the New Incentives RCT) and have never felt any incentive to skew results one way or another.
From an organizational perspective, a well-run evaluation organization has much stronger long-term incentives to build a reputation for being honest, transparent, and truth-seeking than to win repeat business from any particular client.
Thanks so much Dan, honoured to get a reply from someone with so much experience on the topic and doing such important work. There’s also a decent chance that IDinsight has higher standards than many other orgs.
I agree with a decent amount of this. I agree that an NGO hiring its own M&E provider directly is “problematic” and “tricky”; I would just use stronger language ;). Personally I think it’s a waste of resources for an NGO to hire an external M&E provider, because the incentives are so skewed that I don’t think there’s much added value compared with internal M&E. Yes, the incentives are of course all wrong there too, but at least internal staff will know and understand the operation better than an external provider, and the org may be more likely to take up reforms if they are driven from the inside.
I also agree that it is far better if a funder commissions the M&E provider. At the management level, the incentives of the funder and the M&E organisation are likely to be aligned. I’m sure you don’t feel any incentive at your level to skew a result, but from what I have seen, positive skew is very hard to remove because of skewed incentives at the local level.
“We’ve done numerous evaluations for GiveWell (most notably the New Incentives RCT) and have never felt any incentive to skew results one way or another.” I’m sure you don’t at that top level, but at the local, “on-the-ground” assessor level it is very difficult to avoid positive skew.
From both my personal experience and theory (see incentives below), I think there is likely to be some degree of positive skew even among the best M&E orgs. This doesn’t necessarily mean those orgs shouldn’t exist, but we should be doing much more to mitigate the “on-the-ground” positive bias.
Unfortunately, M&E providers at the local level face strong, almost unidirectional incentives to skew assessments positively. I can’t think of clear incentives towards a negative assessment; maybe someone else can?
Incentives for external M&E positive bias (a lot of overlap between these)
Belief that a positive assessment will bring more work. Although this belief may not be true, the most obvious reasoning I have seen for skewing M&E positively goes something like this:
Positive assessment = more funding for NGO locally = more M&E work in future for me
Negative assessment = less funding for NGO locally = less M&E work in future for me
The equation you hope your employees will follow might be (correct me if I’m wrong):
Honest assessment = correct funding for NGO = increased trust in assessment org = more work for assessment org = more work for all our staff, including me
But I think that chain of reasoning is VERY difficult for local staff doing the assessing to compute. There’s also an element of the prisoner’s dilemma here: the long-term view only works for you personally if everyone doing assessments in your org is on board with it.
Long-term relationship maintenance. The pool of educated people in NGO and M&E jobs is often small, with a lot of overlap, especially in low-income, low-education settings. Here in Northern Uganda, educated people are likely to know each other. People are incentivised against negative assessments because these may make things relationally harder with their friends and community in future. This is understandable!
Short- and long-term job security. I know one absolute legend here who worked in M&E within an org and straight up blew the whistle on corruption to a funder. Not only was he thrown out of that organisation, but no other NGO here would hire him for years because they feared he might do the same to them; he was even told so directly in 2 interviews! Eventually he left this part of Uganda to get a job somewhere he wasn’t known. Fear for long-term job security and employability is a strong incentive against reporting negative findings, especially extremely negative ones like corruption, or an org not actually having done the work it claimed.
Incentives for external M&E negative bias (I genuinely find it hard to think of good ones)
Perhaps, if an organisation really is amazing, the assessor could overemphasise some negative aspects to provide a sense of balance.
You don’t like the organisation or its people, so you give a disingenuously negative assessment.
I won’t get into ways of mitigating these biases now (my comment is too long already haha), but I think the natural lean towards positive skew in M&E is quite heavy.
Would love to hear specific rebuttals to this if you have time, but all good if you don’t!
Thanks again
Nick.
Interesting discussion. I agree incentives can be tricky and I have seen my fair share of bad evaluations and evaluation organisations with questionable practices. Some thoughts from me as an evaluator who has worked in a few different country contexts:
I don’t think M&E is either purely internal or completely external. A lot of the time, M&E orgs are hired to provide expert support and work alongside an organisation to develop an evaluation. M&E can be complex, and it can really help orgs to have experts guide them through the process and clarify their thinking. And as you say, when we have internal buy-in we are more likely to see the findings taken up and actioned. When we see M&E only as an outside judgement commissioned by a funder, with no input from the org being evaluated, we make M&E out to be antagonistic or adversarial, which can be an unhelpful dynamic. I have seen orgs who were unhappy with an external evaluation because they felt the evaluators made judgements without fully understanding the operating context (and how could they, often with only a fly-by visit to project locations), or did not properly take into account the values of the organisation or the community but only listened to the funder. This can be very disempowering and may not lead to positive changes.
I think many organisations do want to learn and improve but fear harsh judgement, which is quite a natural, human response. I think it helps to bring partners/orgs on board early and, before the evaluation, to establish a pre-evaluation plan (see here) setting out your standards of evidence and what actions you will take in response to particular findings. This also gives the organisation ownership over the evaluation results. I think it is important to frame your evaluation so that feedback can be taken on board in a culturally appropriate way. The last thing you want is for an organisation to feel harshly judged with absolutely no input or right of reply.
We speak as if M&E is clear-cut, but M&E assessments often don’t come out fully positive or negative. A lot of evaluations occupy a messy middle. There are often some good things, some not-so-good things, and some things we think are good or bad but lack conclusive evidence for. Sensemaking can be subjective, as it often comes down to how you weigh different values or the standards you set for what good or bad looks like. These can differ between the funder, the org, the community and the evaluators. For example, if you find an education project is cost-effectively increasing test scores, but only for female students and not for struggling male students, what do you say? Is it good? Is it bad? Is this difference practically significant? Should the program be changed? What if changing it makes it less cost-effective? This comes down to how you weigh different values and standards of performance.
I agree with Dan and think integrity is a very important internal driver. While I agree with Nick that acting this way can be more difficult for local staff, given the professional and personal connections and relationships they have to navigate, I don’t think integrity as an incentive is hard for them to compute; it is just harder for them to action. I don’t think the response should be that all evaluations are done by non-local/international firms. This is highly disempowering, would drain local capacity, and again puts decision-making in the hands of people often from high-income and supposedly ‘more objective’ contexts, re-hashing problematic colonial power dynamics rather than building a strong local ecosystem of accountability.
These kinds of dilemmas exist everywhere. Evaluation is always a tricky tightrope walk where you are trying to balance the rigour of evidence, the weight of different values, and the broader political and relational ecosystem so that what you say is actually used and put into practice to improve programs and impact.
Wow thanks so much again for the great insights. So good to have experienced development practitioners here!
To give some background, I came to EA partly because I saw how useless most NGOs are here where I live, and the EA framework answers many of the questions as to why, and some of the questions as to how to fix the problem. If I were the one doing M&E and had a magic wand, I would probably shut down over 80% of the NGOs and programs I assessed.
Also, we have had a bunch of USAID-funded and other donor-funded M&E pass through many of our health centers, and they have almost never found our biggest problems or suggested good solutions. The one exception was a specifically focused financial management assessment, which was actually really helpful.
I won’t respond to everything, but I’ll just make a few comments :)
Your M&E might just be better
First, the M&E you do might be so much better than what I have seen that some of the issues I describe don’t apply as much.
“For example if you find an education project is cost-effectively increasing test scores, but only for female students and not struggling male students what do you say?”
That you have even done the kind of analysis that lets you ask this kind of great question would put you above nearly any M&E I have ever seen here in Northern Uganda. Even the concept of calculated “cost-effectiveness” as we know it is rarely (if ever) considered here. Of everyone who has assessed either the bigger health centers we operate or OneDay Health, I can’t think of anyone who has included this in an assessment.
I’m not sure how you would answer that question, but the fact that you have even reached that point means that in my eyes you are already winning to some degree. Also, this analysis is fantastic, thanks for sharing; I hadn’t seen it before! My only comment is that I don’t think the analysis generated “mixed” results; they seem very clear to me :D!
External assessors for data collection, local assessors for analysis and change?
For an assessment like this one of MiracleFeet, I favour external assessors gathering the basic data, and then perhaps local assessors could take over the analysis? Data collection needs to be squeaky clean, otherwise everything else falls down. This particular assessment should be fairly straightforward if you start by gathering these data:
1. Have the clubfoot procedures actually been done as stated? This needs a random sample of all patients allegedly treated (say 100 randomly selected from a list of 5,000 patients provided by MiracleFeet), and each of those patients should then be physically followed up at home and checked. This isn’t difficult, and anything else is open to bias.
2. What has the “average intervention” achieved? Those same 100 patients should then be assessed for impact: what is their objective level of functionality, and what is their subjective improvement in wellbeing/quality of life after the procedure?
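A minimal sketch of how steps 1 and 2 might be set up in Python, assuming the organisation supplies a plain list of patient IDs. The ID format, the seed value of 2024, the confirmed count of 90, and the helper names `draw_verification_sample` and `wilson_interval` are all hypothetical, chosen only for illustration. Drawing the sample with a recorded seed means anyone can reproduce exactly which patients were chosen, and a Wilson score interval gives a rough sense of how precisely 100 home visits pin down the share of claimed procedures that actually happened.

```python
import math
import random


def draw_verification_sample(patient_ids, sample_size=100, seed=2024):
    """Return a reproducible simple random sample (without replacement) of IDs.

    Recording the seed alongside the results lets anyone re-draw the same
    list later, so the set of households visited cannot be quietly swapped.
    """
    if sample_size > len(patient_ids):
        raise ValueError("sample_size cannot exceed the number of patients listed")
    rng = random.Random(seed)
    return sorted(rng.sample(list(patient_ids), sample_size))


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion successes / n.

    Ignores the finite-population correction, so it is slightly conservative
    when sampling 100 out of a few thousand.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical list of 5,000 patient IDs supplied by the implementing organisation.
claimed_patients = [f"P{i:05d}" for i in range(1, 5001)]
to_visit = draw_verification_sample(claimed_patients)
print(len(to_visit))  # 100

# Purely illustrative count: suppose home visits confirm 90 of the 100 procedures.
confirmed = 90
low, high = wilson_interval(confirmed, len(to_visit))
print(f"Confirmed {confirmed}/{len(to_visit)}; 95% CI roughly {low:.0%} to {high:.0%}")
```

In this illustrative case the interval comes out at roughly 83% to 94%, which gives a feel for the precision 100 home visits can buy. The same sampled list would then be reused for the impact assessment in step 2.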
Once these 2 pieces of data are gathered, the organisational analysis and discussion you speak of can start, and that might be more productive on a local-to-local level, provided the local expertise is available.
Integrity there but comes second?
I know integrity is an important driver, like you say, and I love your comment that it is easy to compute and hard to action. In my experience integrity is usually there, but it often falls behind the other “positive skew” motivating factors. I also agree that M&E shouldn’t always be done by external firms, partly for the reasons you state. An added reason is that external firms often hire lots of local people to do much of the work anyway, so the same issues I outlined remain.
A small disagreement?
“I have seen orgs who have been unhappy with an external evaluation because they feel the evaluators made judgements when they didn’t fully understand the operating context (and how can they with often only a fly by visit to project locations) or did not properly take into account the values of the organisation or the community but rather only listened to the funder.”
In my experience this response can be a red flag: a sign that the org might be dodging and weaving after failing to perform. I believe almost every organisation should be doing pre-specified actions A, B and C, which produce impacts X, Y and Z. If these actions aren’t happening and the impact isn’t produced, then that needs to be fixed, or maybe the work needs to stop. External evaluators’ job isn’t to understand the context (how could they possibly do that? It’s not realistic. I’ve been in Uganda for 10 years and in many ways I still don’t understand the local context); that is our job as practitioners. Their job is to see what the org is doing and whether the impact is happening.
As a side note, I’m a little disappointed that we don’t have more engagement on this discussion. The “M&E question” is so important, but perhaps it’s not sexy and probably isn’t accessible to many.
Hi Nick, thanks for the thoughtful response. I think you make a lot of good points, and I agree that there are numerous incentives that can lead an M+E provider to bias results positively. That’s why there is a ton of bad M+E out there.
One main reaction: for an employee who works in an M+E org, there is arguably no worse situation than being pressured to skew your results positively, or, even worse, taking on projects where you know a certain result is expected by your clients. It makes you feel your work is meaningless, and it really sucks. And when you are put in these situations, you sure as hell don’t want to work for the same client again.
Yes, I hear you that for bean-counters in an organization (or those who get dividends in a for-profit org), there are strong incentives to make clients happy and win more contracts. But I think the job-satisfaction incentive for rank-and-file employees skews the other way. And in my experience, it is this latter incentive toward truth-telling that has dominated in most cases.
Perhaps, like the rules for auditors established after accounting scandals, funders should adopt a policy requiring changes in the M&E provider at certain intervals, maybe with some random selection of interval? Knowing that next year’s assessment may be done by a different firm may create a disincentive for gaming the system (and a pathway for easier detection of any gaming). That may only work for projects with longer-term M&E efforts though.
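One hypothetical way a funder might operationalise a rotation rule like this, sketched in Python: each provider serves a term whose length is drawn at random between a minimum and a maximum number of years, and the next provider is always a different firm. The function name `rotation_schedule`, the 2 to 4 year term bounds, and the firm names are assumptions for illustration, not an existing funder policy.

```python
import random


def rotation_schedule(providers, years, min_term=2, max_term=4, seed=None):
    """Assign one M&E provider per year, rotating after randomly drawn terms.

    Each term length is drawn uniformly from min_term..max_term (inclusive),
    so an incumbent cannot be sure which assessment is its last, and the
    incoming provider is always different from the outgoing one.
    """
    if len(set(providers)) < 2:
        raise ValueError("rotation needs at least two distinct providers")
    if not 1 <= min_term <= max_term:
        raise ValueError("need 1 <= min_term <= max_term")
    rng = random.Random(seed)
    schedule = []
    current = rng.choice(providers)
    while len(schedule) < years:
        term = rng.randint(min_term, max_term)
        schedule.extend([current] * term)
        # Pick the successor from everyone except the incumbent.
        current = rng.choice([p for p in providers if p != current])
    return schedule[:years]  # the final term may be cut off at the horizon


firms = ["Firm A", "Firm B", "Firm C"]  # hypothetical provider pool
plan = rotation_schedule(firms, years=10, seed=7)
for year, firm in enumerate(plan, start=1):
    print(f"Year {year}: {firm}")
```

Drawing the term length at random, rather than fixing it at, say, three years, is what keeps the incumbent from knowing which assessment is its last. Each handover also gives a fresh firm the chance to spot any gaming by its predecessor.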