FP Research Director here.
I think Aidan and the GWWC team did a very thorough job on their evaluation, and in some respects I think the report serves a valuable function in pushing us towards various kinds of process improvements.
I also understand why GWWC came to the decision they did: to not recommend GHDF as competitive with GiveWell. But I’m also skeptical that any organization other than GiveWell could pass this bar in GHD, since it seems that in the context of the evaluation GiveWell constitutes not just a benchmark for point-estimate CEAs but also a benchmark for various kinds of evaluation practices and levels of certainty.
I think this comes through in three key differences in perspective:
Can a grant only be identified as cost-effective in expectation if lots of time is spent making an unbiased, precise estimate of its cost-effectiveness?
Should CEAs be the singular determinant of whether or not a grant gets made?
Is maximizing calculated EV in the case of each individual grant the best way to ensure cost-effectiveness over the span of an entire grantmaking programme?
My claim is that, although I’m fairly sure GWWC would not explicitly say “yes” to each of these questions, the implication of their approach suggests otherwise. FP, meanwhile, thinks the answer to each is clearly “no.” I should say that GWWC has been quite open in saying that they think GHDF could pass the bar or might even pass it today — but I share other commenters’ skepticism that this could be true by GWWC’s lights in the context of the report! Obviously, though, we at FP think the GHDF is >10x.
The GHDF is risk-neutral. Consequently, we think that spending time reducing uncertainty about small grants is not worthwhile: it trades off against time that could be spent evaluating and making more plausibly high-EV grants. As Rosie notes in her comment, a principal function of the GHDF has been to provide urgent stopgap funding to organizations that quite often end up actually receiving funding from GW. Spending GW-tier effort getting more certain about $50k-$200k grants literally means that we don’t spend that time evaluating new high-EV opportunities. If these organizations die or fail to grow quickly, we miss out on potentially huge upside of the kind that we see in other orgs of which FP has been an early supporter. Rosie lists several such organizations in her comment.
The time and effort that we don’t spend matching GiveWell’s time expenditure results in higher variance around our EV estimates, and one component of that variance is indeed human error. We should reduce that error rate — but the existence of mistakes isn’t prima facie evidence of lack of rigor. In our view, the rigor lies in optimizing our processes to maximize EV over the long-term. This is why we have, for instance, guidelines for time expenditure based on the counterfactual value of researcher time. This programme entails some tolerance for error. I don’t think this is special pleading: you can look at GHDF’s list of grantees and find a good number that we identified as cost-effective before having that analysis corroborated by later analysis from GiveWell or other donors. This historical giving record, in combination with GWWC’s analysis, is what I think prospective GHDF donors should use to decide whether or not to give to the Fund.
Finally—a common (and IMO reasonable) criticism of EA-aligned or EA-adjacent organizations is an undue focus on quantification: “looking under the lamppost.” We want to avoid this without becoming detached from the underlying numbers, and one way we do so is by allowing difficult-to-quantify considerations to tilt us toward or away from a prospective grant. We do CEAs in nearly every case, but for the GHDF they serve an indicative purpose (as they often do at, e.g., Open Phil) rather than a determinative one (as they often do at, e.g., GiveWell). Non-quantitative considerations are elaborated and assessed in our internal recommendation template, which GWWC had access to but which I feel they somewhat underweighted in their analysis. These kinds of considerations find their way into our CEAs as well, particularly in the form of subjective inputs that GWWC, for their part, found unjustified.
[highly speculative]
It seems plausible to me that the existence of higher degrees of random error could inflate a more error-tolerant evaluator’s CEAs for funded grants as a class. Someone could probably quantify that intuition a whole lot better, but here’s one thought experiment:
Suppose ResourceHeavy and QuickMover [which are not intended to be GiveWell and FP!] are evaluating a pool of 100 grant opportunities and have room to fund 16 of them. Each has a policy of selecting the grants that score highest on cost-effectiveness. ResourceHeavy spends a ton of resources and determines the precise cost-effectiveness of each grant opportunity. To keep the hypo simple, let’s suppose that all 100 have a true cost-effectiveness of 10.00-10.09 Units, and ResourceHeavy nails it on each candidate. QuickMover’s results, in contrast, include a normally-distributed error with a mean of 0 and a standard deviation of 3.
In this hypothetical, QuickMover is the more efficient operator because the underlying opportunities were ~indistinguishable anyway. However, QuickMover will erroneously claim that its selected projects have a cost-effectiveness of ~13+ Units because it unknowingly selected the 16 projects with the highest positive error terms (i.e., those with an error of +1 SD or above). Moreover, the random distribution of error determined which grants got funded and which did not—which is OK here since all candidates were ~indistinguishable but will be problematic in real-world situations.
While the hypo is unrealistic in some ways, it suggests that, given a significant error term, which grants clear a 10-Unit bar may be strongly influenced by random error, and that might undermine confidence in QuickMover’s selections. Moreover, significant error could result in inflated CEAs on funded grants as a class (as opposed to all evaluated grants as a class) because the error is in some ways a one-way ratchet—grants with significant negative error terms generally don’t get funded.
I’m sure someone with better quant skills than I could emulate a grant pool with variable cost-effectiveness in addition to a variable error term. And maybe these kinds of issues, even if they exist outside of thought experiments, could be too small in practice to matter much?
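The thought experiment is straightforward to emulate. Here is a minimal sketch, using only the Python standard library and exactly the parameters from the hypo (100 opportunities, 16 funded, true CE ~10.05 Units, error SD of 3); the function and variable names are my own invention:

```python
import random
import statistics

def simulate_round(n_opps=100, n_funded=16, true_ce=10.05,
                   error_sd=3.0, rng=None):
    """One round of QuickMover's process: score every opportunity
    with a noisy CE estimate, then fund the top scorers."""
    rng = rng or random.Random()
    estimates = [true_ce + rng.gauss(0, error_sd) for _ in range(n_opps)]
    funded = sorted(estimates, reverse=True)[:n_funded]
    return statistics.mean(funded)

# Average over many rounds to smooth out sampling noise.
rng = random.Random(0)
mean_claimed = statistics.mean(simulate_round(rng=rng) for _ in range(500))
print(f"True CE of every grant:   10.05 Units")
print(f"Mean claimed CE (funded): {mean_claimed:.2f} Units")
```

Running this, the average error among the 16 selected grants sits well into the upper tail of the noise distribution, so the claimed CE of the funded portfolio lands several Units above the true ~10, consistent with the "~13+ Units" intuition above.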
It’s definitely true that all else equal, uncertainty inflates CEAs of funded grants, for the reasons you identify. (This is an example of the optimizer’s curse.) However:
This risk is lower when the variance in true CE is large, especially if it’s larger than the variance due to measurement error. To the extent we think this is true of the opportunities we evaluate, this reduces the quantitative contribution of measurement error to CE inflation. More elaboration in this comment.
Good CEAs are conservative in their choices of inputs for exactly this reason. The goal should be to establish the minimal conditions for a grant to be worth making, as opposed to providing precise point estimates of CE.
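The first point can be checked by extending the thought experiment upthread: when true CE itself varies across the pool, selecting on noisy estimates picks up mostly real differences rather than error, and the inflation among funded grants shrinks. A rough stdlib sketch, with an assumed and purely illustrative SD of 6 for true CE (names are mine, not from the report):

```python
import random
import statistics

def mean_inflation(true_sd, error_sd=3.0, n_opps=100,
                   n_funded=16, rounds=500):
    """Average gap between claimed and true CE among funded grants,
    when true CE varies across the pool with SD `true_sd`."""
    rng = random.Random(1)
    gaps = []
    for _ in range(rounds):
        true_ces = [10 + rng.gauss(0, true_sd) for _ in range(n_opps)]
        scored = [(t + rng.gauss(0, error_sd), t) for t in true_ces]
        funded = sorted(scored, reverse=True)[:n_funded]
        gaps.append(statistics.mean(est - t for est, t in funded))
    return statistics.mean(gaps)

print(f"Inflation, ~identical true CE (sd=0): {mean_inflation(0.0):.2f}")
print(f"Inflation, variable true CE   (sd=6): {mean_inflation(6.0):.2f}")
```

With identical true CE all of the selection pressure lands on the error term; once true CE varies more than the noise, the inflation drops to a fraction of that, matching the point about the ratio of the two variances.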
Thanks for the comment, Matt! We are very grateful for the transparent and constructive engagement we have received from you and Rosie throughout our evaluation process.
I did want to flag that you are correct in anticipating that we do not agree with the three differences in perspective that you note here, nor do we think our approach implies that we do:
1) We do not think a grant can only be identified as cost-effective in expectation if a lot of time is spent making an unbiased, precise estimate of cost-effectiveness. As mentioned in the report, we think a rougher approach to BOTECing intended to demonstrate opportunities meet a certain bar under conservative assumptions is consistent with a GWWC recommendation. When comparing the depth of GiveWell’s and FP’s BOTECs we explicitly address this:
[This difference] is also consistent with FP’s stated approach to creating conservative BOTECs with the minimum function of demonstrating opportunities to be robustly 10x GiveDirectly cost-effectiveness. As such, this did not negatively affect our view of the usefulness of FP’s BOTECs for their evaluations.
Our concern is that, based on our three spot checks, it is not clear that FP GHDF BOTECs do demonstrate that marginal grants in expectation surpass 10x GiveDirectly under conservative assumptions.
2) We would not claim that CEAs should be the singular determinant of whether a grant is made. However, considering that CEAs seem decisive in GHDF grant decisions (in that grants are only made from the GHDF when they are shown by BOTEC to be >10x GiveDirectly in expectation), we think it is fair to assess these as important decision-making tools for the FP GHDF as we have done here.
3) We do not think maximising calculated EV in the case of each grant is the only way to maximise cost-effectiveness over the span of a grantmaking program. We agree some risk-neutral grantmaking strategies should be tolerant of some errors and ‘misses’, which is why we checked three grants rather than only one. Even after finding issues in the first grant, we were still open to relying on FP GHDF if these issues seemed likely to occur only to a limited extent, but in our view their frequency across the three grants we checked was too high to currently justify a recommendation.
I hope these clarifications make it clear that we do think evaluators other than GiveWell (including FP GHDF) could pass our bar, without requiring GiveWell levels of certainty about individual grants.
Hey Aidan,
I want to acknowledge my potential biases for any new comment thread readers (I used to be the senior researcher running the fund at FP, most or all of the errors highlighted in the report are mine, and I now work at GiveWell). These are personal views.
I think getting further scrutiny and engagement on key grantmaking cruxes is really valuable. I also think the discussion this has prompted is cool. A few points from my perspective:
As Matt’s comment points out, there is a historical track record for many of these grants. Some have gone on to be GiveWell-supported, or (imo) have otherwise demonstrated success in a way that suggests they were a ‘hit’. In fact, with the caveat that there are a good number of recent ones where it’s too early to tell, there hasn’t yet been one that I consider a ‘miss’. Is it correct to update primarily from 3 spot checks of early-stage BOTECs (my read of this report) versus updating from what actually happened after the grant was made? Is this risking Goodharting?
Is this really comparing like for like? In my view, small grants shouldn’t require as strong an evidence base as, say, a multimillion-dollar grant, mainly for the time-expenditure reasons that Matt points out. I am concerned that this report moves us further toward a point where (due to the level of rigour, and therefore time expenditure, required) the incentives for grantmaking orgs are to only make really large grants. I think this systematically disadvantages smaller orgs, and I think that is a negative thing (I guess your view here partially depends on your view on point ‘3’ below).
In my view, a crucial crux here is the value of supporting early-stage work, alongside other potentially riskier items, such as advocacy and giving multipliers. I am genuinely uncertain, and think that smart and reasonable people can disagree here. But I agree with Matt’s point that there’s significant upside through potentially generating large future room for funding at high cost-effectiveness. This kind of long-term optionality benefit isn’t typically included in an early-stage BOTEC (because doing a full VOI is time-consuming), and I think it’s somewhat underweighted in this report.
I no longer have access to the BOTECs to check (since I’m no longer at FP), and again I think the focus on BOTECs is a bit misplaced. I do want to briefly acknowledge, though, that I’m not sure all of these are actually errors (but I still think it’s true that there are likely some BOTEC errors, and I think this would be true for many/most orgs making small grants).
Hi Rosie, thanks for sharing your thoughts on this! It’s great to get the chance to clarify our decision-making process so it’s more transparent, in particular so readers can make their own judgement as to whether or not they agree with our reasoning about FP GHDF. Some of my thoughts on each of the points you raise:
We agree there is a positive track record for some of FP GHDF’s grants and this is one of the key countervailing considerations against our decision not to rely on FP GHDF in the report. Ultimately, we concluded that the instances of ‘hits’ we were aware of were not sufficient to conclude that we should rely on FP GHDF into the future. Some of our key reasons for this included:
These ‘hits’ seemed to fall into clusters for which we expect there is a limited supply of opportunities, e.g., several that went on to be supported by GiveWell were AIM-incubated charities. This means we expect the opportunities FP GHDF would fund on the margin with additional funding to be less likely to be of this kind.
We were not convinced that these successes would be replicated in the future under the new senior researcher (see our crux relating to consistency of the fund).
Ultimately, what we are trying to do is establish where the next dollar can be best spent by a donor. We agree it might not be worth it for a researcher to spend as much time on small grants, but this by itself should not be a justification for us to recommend small grants over large ones (agree point 3 can be a relevant consideration here though).
We agree that the relative value donors place on supporting early stage and riskier opportunities compared to more established orgs could be a crux here. However, we still needed a bar against which we could assess FP GHDF (i.e., we couldn’t have justifiably relied on FP GHDF on the basis of this difference in worldview, independent of the quality of FP GHDF’s grantmaking). As such, we tried to assess whether FP GHDF grant evaluations convincingly demonstrated that opportunities met their self-stated bar. As we have acknowledged in the report, just because we don’t think the grant evaluations convincingly show opportunities meet the bar, doesn’t mean they really don’t (e.g., the researcher may have considered information not included in the grant evaluation report). However, we can only assess on the basis of the information we reviewed.
Regarding our focus on the BOTECs potentially being misplaced, I want to be clear that we did review all of these grant evaluations in full, not just the BOTECs. If we thought the issues we identified in the BOTECs were sufficiently compensated for by reasoning included in the grant evaluations more generally, this would have played a part in our decision-making. I think assessing how well the BOTECs demonstrate opportunities surpass Founders Pledge’s stated bar was a reasonable evaluation strategy because:

a) As mentioned above, these BOTECs were highly decision-relevant — grants were only made if BOTECs showed opportunities to surpass 10x GiveDirectly, and we know of no instances where an opportunity scored above 10x GiveDirectly and would not have been eligible for FP GHDF funding.

b) The BOTECs are where many of the researcher’s judgements are made explicit and so can be assessed. At least for the three evaluations we reviewed in detail, a significant fraction of the work in the grant evaluation was justifying inputs to the BOTECs.

On the other point raised here, it is true that not all of the concerns we had with the BOTECs were errors. Some of our concerns related to inputs that seemed (to us) optimistic and were, in our view, insufficiently justified considering the decision-relevant effect they had on the overall BOTEC. While not errors, these made it more difficult for us to justifiably conclude that the FP GHDF grants were in expectation competitive with GiveWell.