Jason comments on Discussion about AI Safety funding (FB transcript)

Jason 1 May 2023 1:32 UTC
9 points
3 ∶ 3
Asking people in the funding network to discuss applications they think are net negative seems like a big ask. At present, I assume that each grantmaker applies a pass/fail evaluation function relative to their own funding bar.
I’m sure this is simplified/imprecise/overquantified, but their thought process might look something like this: As the grantmaker evaluates each application, they implicitly develop a point estimate with a confidence interval (let’s say 90% CI for this example). As the grantmaker spends more time on evaluation, the CI shrinks but may remain quite broad. The grantmaker will stop their evaluation early if the entire CI falls below their funding bar at any time. At that point, the risk of erroneously rejecting the application is low enough that devoting further resources to the evaluation is not a good use of their time from the perspective of their grantmaking program.
However, the grantmaker may often conclude that the entire 90% CI is below the funding bar well before they have investigated deeply enough to feel comfortable stating that the project is net-negative. Understandably, many grantmakers will want to be fairly confident that an application is net-negative before making a semi-public statement that could unilaterally torpedo the applicant and/or application.
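To make the gap between those two thresholds concrete, here is a minimal sketch of the stopping rule described above. Everything in it is an illustrative assumption on my part: the names `Estimate` and `review_status`, the arbitrary value units, and the choice of zero as the line for "net-negative." It is not a claim about how any grantmaker actually works.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical 90% interval for a project's value, in arbitrary units."""
    low: float
    high: float


def review_status(est: Estimate, funding_bar: float, harm_threshold: float = 0.0) -> str:
    """Classify an application given the grantmaker's current interval.

    Assumes funding_bar > harm_threshold, i.e. the bar for funding is
    higher than the line below which a project counts as net-negative.
    """
    if est.high < harm_threshold:
        # The whole interval is below zero: firm enough to say "net-negative".
        return "confidently net-negative"
    if est.high < funding_bar:
        # The whole interval is below the bar: safe to reject, but the
        # interval may still include positive values.
        return "reject and stop evaluating"
    return "keep evaluating"


# Illustrative numbers only: the interval narrows as review time accumulates.
bar = 5.0
first_pass = Estimate(low=-10.0, high=12.0)
second_pass = Estimate(low=-6.0, high=4.0)

print(review_status(first_pass, bar))
print(review_status(second_pass, bar))
```

In this toy case the second pass already justifies rejection, yet the interval still extends above zero, so the grantmaker would need further (otherwise pointless) review before they could responsibly call the project net-negative.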
As can be seen in some of the comments above, a grantmaker who calls out a grant to others is also exposing themselves to reputational risk. Being human, they might feel the need to mitigate that risk by conducting additional evaluation on a grant they have already decided to reject. ^[1] Plus there’s the work of writing up a rationale in enough detail to mitigate reputational risk and actually persuade others. Therefore, it seems that asking grantmakers to identifiably communicate their net-negativity findings would be asking them to do a meaningful amount of extra work. That would detract to some extent from the time they spend evaluating grants they might actually make.
I would be more inclined to rely on a more robust process than mere communication. If we assume all funders are “informed, smart, and value aligned EAs,”^[2] then a randomly selected three-member screening panel could be an option. Conditional on all three members selecting “best estimate is net harmful,” ^[3] what are the odds that asking 100 “informed, smart, and value aligned EAs” would yield a majority finding the proposal net positive? ^[4] Probably low enough to justify refusing to send the application on to the funding network. That will lead to some erroneous rejections, but that risk has to be weighed against the probability that the proposal truly is net harmful in expectation and that forwarding it onward risks a unilateralist funding it.
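For a rough sense of scale on that question, here is a back-of-the-envelope sketch. It rests on assumptions that are mine rather than anything established above: panelists and the 100 evaluators are treated as independent draws from one population, each evaluator votes either "net harmful" or "net positive," and we start from a uniform Beta(1, 1) prior over the share of that population who would vote "net harmful." The function name `prob_majority_positive` is hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_majority_positive(harmful_votes: int, positive_votes: int, n: int = 100,
                           prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Probability that a strict majority of n new evaluators vote net positive.

    Uses the beta-binomial predictive distribution implied by a Beta prior
    (assumed uniform by default) updated on the panel's votes.
    """
    a = prior_a + harmful_votes   # posterior weight on "net harmful"
    b = prior_b + positive_votes  # posterior weight on "net positive"
    max_harmful = (n - 1) // 2    # strict positive majority means harmful count < n / 2
    total = 0.0
    for k in range(max_harmful + 1):
        # P(exactly k of n vote "net harmful") under the posterior.
        total += comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))
    return total


# A unanimous three-member panel voting "net harmful".
print(f"{prob_majority_positive(3, 0):.3f}")
```

Under these assumptions the answer comes out at roughly 6%. A prior that reflects how rare genuinely divisive proposals are, or correlated judgment among evaluators, would move that figure, so treat it as an illustration of the calculation rather than an estimate.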
Of course, one could use a randomish-sample-of-evaluators method in other ways, such as asking each funder who evaluates an application to anonymously rate it as likely net beneficial or likely net negative, or to decline to opine. The current balance could be displayed with the application, and if the balance ever reveals a high enough probability that a majority of the whole group would vote net harmful, the proposal is removed. That has a lower chance than a screening panel of removing a majority-net-harmful application before it is funded, since the unilateralist might be one of the first few evaluators. But it at least mitigates some of the disadvantages of expecting grantmakers who think the application net negative to take extra time away from their own grants and incur personal reputational risk for the collective good of mitigating unilateralist risk.
1. ^
  This is not inconsistent with the idea that a substantial fraction of applications are overall harmful in expectation. It is much easier to be confident on a shallow review that (say) about 25% of all applications one sees would be net harmful than to be confident on a shallow review that a specific application would be net harmful.
2. ^
  If not all funders would qualify, that raises other concerns.
3. ^
  A 2-1 split in either direction could go to a second panel (although experience might eventually reveal that a 2-1 split led to a certain outcome in a high percentage of cases).
4. ^
  Moreover, erroneously rejecting a proposal that 51 or even 60 of these EAs would find net positive would be at most a fairly minor error. (I express no opinion here as to whether a percentage of evaluators somewhat under 50% concluding that an application was net harmful should also justify its rejection.)