To onlookers: I’m not exactly answering the OP’s question, and there’s a high potential for mansplaining, but my guess is that this comment has a chance of being useful.
Below is a narrative, not something I fully believe, because I think it’s incomplete (and I didn’t actually read the sources I quoted). But I think there’s a chance this narrative is useful for the perspective it tries to share.
Narrative:
Basically, the root underlying value of evaluators comes from deep competence and powerful “models of reality”; it does not come from a method.
Evaluators always share something like a “public method”, such as spreadsheets and long essays, but these are far from a complete presentation of how they reached their views, so relying on them is problematic.
To be clear, all evaluators use numbers and tools like spreadsheets, and develop systems and processes. But in the end, these are surface methods: a sort of window dressing, explanation, or heuristic for decisions that actually rest on a much deeper latent pool of skill and understanding. This skill and understanding isn’t legible to most people.
This is directly relevant to some motivations of the OP’s question, if they are trying to find and compare each org’s “Way to Measure Charities”. In the narrative you are reading, it’s hard to learn where the actual content comes from, and copying the surface presentation is an anti-pattern.
To see this, here are some stories:
A. GiveWell has deep understanding of science
This statement is incredible if you stop to think about it:

[Rob Wiblin says that GiveWell said:]

“There’s a high probability that the impact that this has is much smaller than what’s suggested in that study. And it might even be nothing.”

“There’s actually a decent chance that this will just not replicate, and in fact, that won’t have any impact at all.”

(I’m not sure what GiveWell actually said, but let’s go with this paraphrase.)
Notice how much skill these statements convey. You would have to understand the process of scientific replication, and the quality and robustness of the specific paper and subdomain. The scientific literature contains many one-off papers like this; how did they choose this one?
Also, even though GiveWell had this understanding, it’s very hard for anyone else to measure the meta-knowledge of science that lets them make this statement. How would you even go about asking about it?
Note that this is the right thing to do: getting a replication or doing a further investigation might take impractically long, during which many lives might be lost. It’s a deeply impressive statement. It’s hard to say publicly that the impact could collapse with high probability and still respectfully ask for donations.
B. GiveWell is backing off EEV
https://blog.givewell.org/2011/08/18/why-we-cant-take-expected-value-estimates-literally-even-when-theyre-unbiased/
The author says that EEV (explicit expected value) has flaws: we can’t just take expected value estimates literally to make decisions.
They then use “Bayesian priors” in a labored, lengthy way to explain how decisions are made. (Because this Bayesian explanation is poorly grounded, it stirs up a wobbly consequent debate.)
No one actually thinks anyone is updating a statistical model here. Maybe a shorter explanation the author could have given is: “Look, we are really good, and have deep models of the world; the data is only one layer.”
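For readers who want the gist of the linked post's Bayesian adjustment, here is a toy sketch. This is my own illustration, not GiveWell's actual model: it assumes a normal prior over cost-effectiveness and a normally distributed, noisy estimate, and shows how a spectacular claim gets shrunk back toward the prior when the estimate is noisy. All numbers are made up for illustration.

```python
# Toy sketch (my own, not GiveWell's code) of normal-normal Bayesian
# shrinkage: a noisy cost-effectiveness estimate is pulled back toward a
# broad prior, and the noisier the estimate, the stronger the pull.

def bayesian_adjust(prior_mean, prior_var, estimate, estimate_var):
    """Normal-normal update: return the posterior mean and variance."""
    precision = 1.0 / prior_var + 1.0 / estimate_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + estimate / estimate_var)
    return post_mean, post_var

# A charity claims 100x typical effectiveness, but the estimate is very noisy
# (large variance), so the adjusted figure collapses most of the way back.
adjusted, _ = bayesian_adjust(prior_mean=1.0, prior_var=4.0,
                              estimate=100.0, estimate_var=2500.0)
print(round(adjusted, 2))  # → 1.16
```

The point of the sketch is only the qualitative behavior: a surprising estimate with high variance barely moves the conclusion, which matches the post's argument that explicit expected values can't be taken literally.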
(I didn’t actually read most of the above links).
Note that I’ve used GiveWell for all these examples. I think this is because GiveWell is the institution that, at the very least, stands well above the rest of EA’s evaluators.
In less generous views, which EAs should be aware of, other evaluators might be much lower simulacra, sort of like hiring a webmaster to copy the patterns onto a new website.
GiveWell’s quality comes from the people who moved through it over the years, not from a spreadsheet or formula that can be handed down.
So this comment claims to touch on the OP’s question.
In addition to touching on the OP’s question, I think this is extremely important to point out because, of all the meta institutions that can go wrong, evaluators, or a “GiveWell for X”, are structurally the most prone to misuse, and this is dangerous.
To explain:
Recall the narrative above, that evaluator skill is hard to see. As a result, the public face of an evaluator is inherently superficial (and often deliberately dressed for public consumption and socialization). Few actually see the underlying knowledge or ability.
Evaluators must also be independent—they can’t just be voted down by the entities they review. They must resist public pressure, sort of like a judge.
The danger is that these features can be captured by parties who merely talk the talk and occupy the spaces meant for these evaluator institutions.
There is the further danger that, once they start polishing their public copy, it’s hard to draw the line. It’s a slippery place to be. It’s not difficult to glide into long or wacky worldviews, or to introduce shallow technical methods, and then to defeat examination through rhetoric or obfuscation.
While malice can occur, the issues often aren’t driven by it.
To see this, click on my name. You’ll see writing in many comments. Some of it touches on purported deep topics. It sounds impressive, but is it true?
What if most of my comments and ideas are sophomoric and hardly informative at all? If I lack deep understanding, I’ll never be able to see that I’m a hack.
I can’t tell whether I’m merely a person with just enough knowledge to write a coherent story. Many observers will know even less; but regardless, I’ll continue on, whether I’m right or wrong.
In the same way that someone can write endlessly about meta topics or talk about politics, it can be tempting to feel that evaluation is easy.
The underlying patterns are similar; unfortunately, the inertia and power of the institutions created can be large. Once started, it’s unclear what mechanism, if any, is available to stop it.