Nice post!

Re: sequestration, OpenPhil has written about the difficulty of getting honest, critical feedback as a grantmaker. This seems like something all grantmakers should keep in mind. The danger seems especially high for an organization like OpenPhil or CEA, which makes grants across the EA movement through EA Grants and EA Funds. Unfortunately, some reports from ex-employees of CEA on Glassdoor give me the impression CEA is not as proactive in its self-skepticism as OpenPhil:
Not terribly open to honest self-assessment, but no more so than the average charity.
...
As another reviewer mentioned, ironically hostile to honest self-assessment, let alone internal concerns about effectiveness—I saw and heard of some people who’d got significant grief for this. Groupthink and back-patting was more rewarded.
I’ve also heard an additional anecdote about CEA, independent of Glassdoor, which is compatible with this impression.
The question of whether and how much to prioritize those who appear most talented is tricky. I get the impression there has been a gradual but substantial update away from mass outreach over the past few years (though some answers in Will’s AMA make me wonder if he and maybe others are trying to push back against what they see as excessive “hero worship”, etc.). Anyway, some thoughts on this:
I think it’s not always obvious how much of the work attributed to one famous person should really be credited to a much larger team. For example, one friend of mine cited the massive amount of money Bill Gates made as evidence that impact is highly disproportionate. However, I would guess in many cases, successful entrepreneurs at the $100M+ scale are distinguished by their ability to identify & attract great people to work for their company. I think maybe there is some quirk of our society where we want to credit just a few individuals with an impressive accomplishment even when the “correct” assignment of credit doesn’t actually follow a power law distribution. [For a concrete example where we have data available, I think claims about Wikipedia editor contributions following a power law distribution have been refuted.]
Even in cases where individual impact will be power law distributed, that doesn’t mean we can reliably identify the people at the top of the distribution in advance. For example, this paper apparently found that work sample tests only correlated with job performance at around 0.26-0.33! (Not sure what “attenuation” means in this context.) Anyway, maybe we could do some analysis: If you have an applicant pool of N applicants, and you’re going to hire the top K applicants based on a work sample test which correlates with job performance at 0.3, what does K need to be for you to have a 90% chance of hiring the best applicant? (I’d actually argue that the premise of this question is flawed, because the hypothetical 10x applicant is probably going to achieve 10x performance through some creative insights which the work sample test predicts even less well, but I’d still be interested in seeing the results of the analysis. Actually, speaking of creativity, have any EA organizations experimented with using tests of creative ability in their hiring?)
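The proposed analysis can be sketched with a quick Monte Carlo simulation. This is purely illustrative (it isn’t from the cited paper): I’m assuming performance and test scores are jointly normal with correlation r, and the function name `prob_best_hired` is my own invention.

```python
import math
import random

def prob_best_hired(n_applicants, k_hired, r=0.3, n_trials=20000, seed=0):
    """Monte Carlo estimate of the chance that the single best performer
    ends up among the top-k applicants ranked by a test that correlates
    r with true performance (both modeled as standard normals)."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - r * r)
    hits = 0
    for _ in range(n_trials):
        # True performance, and a test score constructed to correlate r with it
        perf = [rng.gauss(0.0, 1.0) for _ in range(n_applicants)]
        test = [r * p + noise_sd * rng.gauss(0.0, 1.0) for p in perf]
        best = max(range(n_applicants), key=lambda i: perf[i])
        top_k = sorted(range(n_applicants), key=lambda i: test[i],
                       reverse=True)[:k_hired]
        hits += best in top_k
    return hits / n_trials
```

To answer the original question, one could sweep K upward for a fixed pool size until `prob_best_hired(n, k)` first exceeds 0.9; with a validity as low as 0.3, I’d expect the required K to be a large fraction of the pool.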
Finally, I think it could be useful to differentiate between “elitism” and “exclusivity”. For example, I once did some napkin math suggesting that less than 0.01% of the people who watch Peter Singer’s TED talk later become EAs. So arguably, this is actually a pretty strong signal of dedication & willingness to take ideas seriously compared to, say, someone who was persuaded to become an EA through an element of peer pressure after several friends became interested. But the second person is probably going to be better connected within EA. So if the movement becomes more “exclusive”, in the sense of using someone’s position in the social scene as a proxy for their importance, I suspect we’d be getting it wrong. When I think of the EAs who seem very dedicated to making an impact, people I’m excited about, they’re often people who came to EA on their own and in some cases still aren’t very well-connected.
I’m glad people want to look for evidence that CEA (and other orgs) is being adequately self-reflective. However, I’d like to give some additional context on Glassdoor. Of the five CEA reviews posted there:
Two are from people who have confused CEA with other organizations (neither of those were cited in John’s comment)
One is fairly recent and positive (also not cited)
One is from September 2016, at which point only three of CEA’s current staff were employed by the organization (three-and-a-half if you count Owen Cotton-Barratt, who is currently a part-time advisor to CEA).
One is from March 2018 -- more recent, but still representing a substantial departure from CEA’s current staff list, including a different executive director. A lot can change over the course of 18 months.
I’ll refrain from going into too much detail, but my experience is that CEA circa late 2019 is intensely self-reflective; I’m prompted multiple times in the average week to put serious thought into ways we can improve our processes and public communication.
my experience is that CEA circa late 2019 is intensely self-reflective; I’m prompted multiple times in the average week to put serious thought into ways we can improve our processes and public communication.
I imagine that the ex-staff who complain about CEA being “hostile to honest self-assessment, let alone internal concerns about effectiveness” are likely referring to something more like self-criticism, rather than simply self-reflection. Even an org or individual that was entirely cynically dedicated to maximising its prestige rather than doing good would self-reflect about how to communicate more effectively.
It does seem a bit weird to me for an organization to claim to be self-critical but put relatively little effort into soliciting external critical feedback. Like, CEA has a budget of $5M. To my knowledge, not even 0.01% of that budget is going into cash prizes for the best arguments that CEA is on the wrong track with any of its activities. This suggests either (a) an absurd level of confidence, on the order of 99.99%, that all the knowledge + ideas CEA needs are in the heads of current employees or (b) a preference for preserving the organization’s image over actual effectiveness. Not to rag on CEA specifically—just saying if an organization claims to be self-critical, maybe we should check to see if they’re putting their money where their mouth is.
(One possible counterpoint is that EAs are already willing to provide external critical feedback. However, Will recently said he thought EA was suffering too much from deference/information cascades. Prizes for criticism seem like they could be an effective way to counteract that.)
Is putting some non-trivial budget into cash prizes for arguments against what you do the only way to show you’re self-critical? Your statement suggests you believe something like that, but it doesn’t seem to be the only way. I can’t think of any other organisation that has ever done that, so if it is the only way to show you’re self-critical, that suggests no organisation (that I’ve heard of) is self-critical, which seems false. I wonder if you’re holding CEA to a peculiarly high standard; would you expect MIRI, 80k, the Gates Foundation, Google, etc. to do the same?
I’m suggesting that the revealed preferences of most organizations, including CEA, indicate they aren’t actually very self-critical. Hence the “Not to rag on CEA specifically” bit.
I think we’re mostly in agreement that CEA isn’t less self-critical than the average organization. Even one of the Glassdoor reviewers wrote: “Not terribly open to honest self-assessment, but no more so than the average charity.” (emphasis mine) However, aarongertler’s reply made it sound like he thought CEA was very self-critical… so I think it’s reasonable to ask why less than 0.01% of CEA’s cash budget goes to self-criticism, if someone makes that claim.
How meaningful is an organization’s commitment to self-criticism, exactly? I think the fraction of their cash budget devoted to self-criticism gives us a rough upper bound.
I agree that the norm I’m implicitly promoting, that organizations should offer cash prizes for the best criticisms of what they’re doing, is an unusual one. So to put my money where my mouth is, I’ll offer $20 (more than 0.01% of my annual budget!) for the best arguments for why this norm should not be promoted or at least experimented with. Enter by replying to this comment. (Even if you previously appeared to express support for this idea, you’re definitely still allowed to enter!) I’ll judge the contest at some point between Sept 20 and the end of the month, splitting $20 among some number of entries which I will determine while judging. Please promote this contest wherever you feel is appropriate. I’ll set up a reminder for myself to do judging, but I appreciate reminders from others also.
GiveWell used to solicit external feedback a fair bit years ago, but (as I understand it) stopped doing so because it found that it generally wasn’t useful. Their blog post External evaluation of our research goes some way to explaining why. I could imagine a lot of their points apply to CEA too.
I think you’re coming at this from a point of view of “more feedback is always better”, forgetting that making feedback useful can be laborious: figuring out which parts of a piece of feedback are accurate and actionable can be at least as hard as coming up with the feedback in the first place, and while soliciting comments can give you raw material, if your bottleneck is not on raw material but instead on which of multiple competing narratives to trust, you’re not necessarily gaining anything by hearing more copies of each.
Certainly you won’t gain anything for free, and you may not be able to afford the non-monetary cost.
Upvoted for relevant evidence.

However, I don’t think you’re representing that blog post accurately. You write that GiveWell “stopped [soliciting external feedback] because it found that it generally wasn’t useful”, but at the top of the blog post, it says GiveWell stopped because “The challenges of external evaluation are significant” and “The level of in-depth scrutiny of our work has increased greatly”. Later it says “We continue to believe that it is important to ensure that our work is subjected to in-depth scrutiny.”
I also don’t think we can generalize from GiveWell to CEA easily. Compare the number of EAs who carefully read GiveWell’s reports (not that many?) with the number of EAs who are familiar with various aspects of CEA’s work (lots). Since CEA’s work is the EA community, we should expect a lot of relevant local knowledge to reside in the EA community—knowledge which CEA could try & gather in a proactive way.
Check out the “Improvements in informal evaluation” section for some of the things GiveWell is experimenting with in terms of critical feedback. When I read this section, I get the impression of an organization which is eager to gather critical feedback and to experiment with different means of doing so. It doesn’t seem like CEA is trying as many things here as GiveWell is—despite the fact that I expect external feedback would be more useful for CEA.
if your bottleneck is not on raw material but instead on which of multiple competing narratives to trust, you’re not necessarily gaining anything by hearing more copies of each.
I would say just the opposite. If you’re hearing multiple copies of a particular narrative, especially from a range of different individuals, that’s evidence you should trust it.
If you’re worried about feedback not being actionable, you could tell people that if they offer concrete suggestions, that will increase their chance of winning the prize.
The main barrier to self improvement isn’t knowing your weaknesses, it’s fixing them.
I believe CEA is aware of several of its weaknesses. Publicly pointing out weaknesses they’re already aware of is a waste of donors’ money and critics’ time. It’s also a needless reputational risk.
If I’m right, and CEA is already aware of its main flaws, then they should focus on finding and implementing solutions. Focusing instead on crowdsourcing more flaws won’t help; it will only distract staff from implementing solutions.
These are good points, upvoted. However, I don’t think they undermine the fundamental point: even if this is all true, CEA could publish a list of their known weaknesses and what they plan to do to fix them, and offer prizes for either improved understanding of their weaknesses (e.g. issues they weren’t aware of), or feedback on their plans to fix them. I would guess they would get their money’s worth.
Placing a bounty for writing criticisms casts doubt on whether those criticisms are actually sincere, or whether they’re just bs-ing and overstating certain things and omitting other considerations to write the most compelling criticism they can. It’s like reading a study written by someone with a conflict of interest – it’s very easy to dismiss it out of hand. If CEA were to offer a financial incentive for critiques, then all critiques of CEA become less trustworthy. I think it would be more productive to encourage people to offer the most thoughtful suggestions on how to improve, even if that means scaling up certain things because they were successful, and not criticism per se.
Thanks for the feedback, these are points worth considering.
bs-ing and overstating certain things and omitting other considerations to write the most compelling criticism they can
Hm, my thought was that CEA would be the ones choosing the winners, and presumably CEA’s definition of a “compelling” criticism could be based on how insightful or accurate CEA perceives the criticism to be rather than how negative it is.
It’s like reading a study written by someone with a conflict of interest – it’s very easy to dismiss it out of hand.
An alternative analogy is making sure that someone accused of a crime gets a defense lawyer. We want people who are paid to tell both sides of the story.
In any case, the point is not whether we should overall be pro/con CEA. The point is what CEA should do to improve. People could have conflicts of interest regarding specific changes they’d like to see CEA make, but the contest prize seems a bit orthogonal to those conflicts, and indeed could surface suggestions that are valuable precisely because no one currently has an incentive to make them.
If CEA were to offer a financial incentive for critiques, then all critiques of CEA become less trustworthy.
I don’t see how critiques which aren’t offered in the context of the contest would be affected.
I think it would be more productive to encourage people to offer the most thoughtful suggestions on how to improve, even if that means scaling up certain things because they were successful, and not criticism per se.
Maybe you’re right and this is a better scheme. I guess part of my thinking was that there are social incentives which discourage criticism, and cash could counteract those, and additionally people who are pessimistic about your organization could have some of the most valuable feedback to offer, but because they’re pessimistic they will by default focus on other things and might only be motivated by a cash incentive. But I don’t know.
There is an established industry committed to providing criticism from outside (well, kind of): external auditors, commonly known as the Big 4. These companies are paid, usually by large firms, to evaluate the firms’ financial statements for accuracy and unlawful activity. While these accountants are supposed to serve the shareholders of the company and the public, they are remunerated and chosen by the companies themselves, which creates an obvious incentive problem. Empirically, this has led to serious doubt about the quality of their work, even after poor audits forced governments to step in and impose stringent legal requirements on auditors. See: https://www.economist.com/leaders/2018/05/24/reforming-the-big-four
Essentially, a similar problem would arise if CEA paid external people to provide feedback, which is something GiveWell also ran into (from memory: the page somebody below already linked notes how hard it is to find people who are qualified AND willing to provide free criticism). If you pay a reviewer beforehand, how do you choose that reviewer? Having such a reviewer might actually be a net negative if it provides a false sense of security (in probabilistic terms: from the outside it would seem that estimates A and B are independent of each other, but in fact they are not, since the first evaluator chooses the second). If you use a format like the current one, where everybody is free to submit criticism but the organization itself chooses the best arguments, there is no incentive for the organization to pick the most scathing criticisms, when it could just as well pick only moderate ones. (Although it is probably better to incorporate moderate criticism than none at all.)
Even if you solve the incentive problem somehow, there is a danger to public criticism campaigns like this: they may give a negative impression of the organization to outsiders who never read about the positive aspects of the organization/movement. There are several reasons to consider this a realistic danger: 1) On the internet, people seem to love reading negative pieces; they capture our interest and are shared more often. 2) The more negative the opinion expressed, the more salient it is in memory. 3) With EA, this might well end up being one of the first impressions people have of the movement.
4) All of this is what happened above with the link to the Glassdoor reviews of CEA: we now have a discussion in this thread about the negative reviews there, but not really about the positive ones. Previously I had no particular information about whether CEA was internally open to self-criticism, but now I have only these negative reviews to go on, and I expect that in a year I will still remember them.
I realize that these points do not necessarily apply to asking for external criticism in itself, just for certain ways to go about it, but I do believe that avoiding the aforementioned problems requires clever and nontrivial design.
there is no incentive for the organization to pick the most scathing criticisms, when it could just as well pick only moderate ones.
If a particular criticism gets a lot of upvotes on the forum, but CEA ignores it and doesn’t give it a prize, that looks a little suspicious.
Even if you solve the incentive problem somehow, there is a danger to public criticism campaigns like that: that they will provide a negative impression of the organization to outside people that do not read about the positive aspects of the organization/movement.
You could be right. However, I haven’t seen anyone get in this kind of trouble for having a “mistakes” page. It seems possible to me that these kinds of measures can proactively defuse the discontent that can lead to real drama if suppressed long enough. Note that the thing that stuck in your head was not any particular criticism of CEA, but rather just the notion that criticism might be being suppressed—I wonder if that is what leads to real drama! But you could have a good point; maybe CEA is too important an organization to be the first to experiment with doing this kind of thing.
Thanks to everyone who entered this contest! I decided to split the prize money evenly between the four entries. Winners, please check your private messages for payment details!
Thanks for raising these points, John! I hadn’t considered the “cash prize for criticism” idea before, but it does seem like it’s worth more consideration.
I agree that CEA could do better on the front of generating criticisms from outside the organization, as well as making it easier for staff to criticize leadership. This is one of the key things that we have been working to improve since I took up the Interim Executive Director role in early 2019. Back in January/February, we did a big push on this, logging around 100 hours of user interviews in a few weeks, and sending out surveys to dozens of community members for feedback. Since then, we’ve continued to invest in getting feedback, e.g. staff regularly talk to community members to get feedback on our projects (though I think we could do more); similarly, we reach out to donors and advisors to get feedback on how we could improve our projects; we also have various (including anonymous) mechanisms for staff to raise concerns about management decisions. Together, I think these represent more than 0.1% of CEA’s staff time. None of this is to say that this is going as well as we’d like—maybe I’d say one of CEA’s “known weaknesses” is that I think we could stand to do more of this.
I agree that more of this could be public and transparent also—e.g. I’m aware that our mistakes page (https://centreforeffectivealtruism.org/our-mistakes) is incomplete. We’re currently nearing the end of our search for a new CEO, and one of the things that I think they’re likely to want to do is to communicate more with the community, and solicit the community’s thoughts on future plans.
I guess a practical way to measure creativity could be to give candidates a take-home problem which is a description of one of the organization’s current challenges :P I suspect take-home problems are in general a better way to measure creativity, because if it’s administered in a conversational interview context, I imagine it’d be more of a test of whether someone can be relaxed & creative under pressure.
BTW, another point related to creativity and exclusivity is that outsiders often have a fresh perspective which brings important new ideas.
Not sure what “attenuation” means in this context.
It’s probably correction for attenuation: ‘Correction for attenuation is a statistical procedure … to “rid a correlation coefficient from the weakening effect of measurement error”.’
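Concretely, that procedure is Spearman’s correction: divide the observed correlation by the square root of the product of the two measures’ reliabilities. A minimal sketch, where the 0.8 reliabilities are made-up illustrative values (not from the paper):

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman's correction for attenuation: estimate the correlation
    between the underlying true scores, given the observed correlation
    and the reliability of each measure."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Illustrative only: an observed validity of 0.26, with an assumed
# reliability of 0.8 for both measures, corrects to 0.26 / 0.8 = 0.325.
print(disattenuate(0.26, 0.8, 0.8))
```

Note that the corrected value is always at least as large as the observed one, which is consistent with the 0.26-0.33 range quoted above being an uncorrected/corrected pair.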
Ah, thanks! So as a practical matter it seems like we probably shouldn’t correct for attenuation in this context, and should lean towards the correlation coefficient being more like 0.26? Honestly that seems a bit implausibly low. Not sure how much stock to put in this paper even if it is a meta-analysis. Maybe better to read it before taking it too seriously.
I’d correct for attenuation, as we care more about getting the people who in fact will perform the best, rather than those who will seem like they are performing the best by our imperfect measurement.
Also selection procedures can gather other information (e.g. academic history, etc.) which should give incremental validity over work samples. I’d guess this should boost correlation, but there are countervailing factors (e.g., range restriction).
Oh interesting, I was thinking it would be bad to correct for measurement error in the work sample (since measurement error is a practical concern when it comes to how predictive it is). But I guess you’re right that it would be reasonable to correct for measurement error in the measure of employee performance.
Nice post!
Re: sequestration, OpenPhil has written about the difficulty of getting honest, critical feedback as a grantmaker. This seems like something all grantmakers should keep in mind. The danger seems especially high for an organization like OpenPhil or CEA, which is grantmaking all over the EA movement with EA Grants and EA Funds. Unfortunately, some reports from ex-employees of CEA on Glassdoor give me the impression CEA is not as proactive in its self-skepticism as OpenPhil:
I’ve also heard an additional anecdote about CEA, independent of Glassdoor, which is compatible with this impression.
The question of whether and how much to prioritize those who appear most talented is tricky. I get the impression there has been a gradual but substantial update away from mass outreach over the past few years (though some answers in Will’s AMA make me wonder if he and maybe others are trying to push back against what they see as excessive “hero worship” etc.) Anyway, some thoughts on this:
I think it’s not always obvious how much of the work attributed to one famous person should really be credited to a much larger team. For example, one friend of mine cited the massive amount of money Bill Gates made as evidence that impact is highly disproportionate. However, I would guess in many cases, successful entrepreneurs at the $100M+ scale are distinguished by their ability to identify & attract great people to work for their company. I think maybe there is some quirk of our society where we want to credit just a few individuals with an impressive accomplishment even when the “correct” assignment of credit doesn’t actually follow a power law distribution. [For a concrete example where we have data available, I think claims about Wikipedia editor contributions following a power law distribution have been refuted.]
Even in cases where individual impact will be power law distributed, that doesn’t mean we can reliably identify the people at the top of the distribution in advance. For example, this paper apparently found that work sample tests only correlated with job performance at around 0.26-0.33! (Not sure what “attenuation” means in this context.) Anyway, maybe we could do some analysis: If you have applicant pool with N applicants, and you’re going to hire the top K applicants based on a work sample test which correlates with job performance at 0.3, what does K need to be for you to have a 90% chance of hiring the best applicant? (I’d actually argue that the premise of this question is flawed, because the hypothetical 10x applicant is probably going to achieve 10x performance through some creative insights which the work sample test predicts even less well, but I’d still be interested in seeing the results of the analysis. Actually, speaking of creativity, have any EA organizations experimented with using tests of creative ability in their hiring?)
Finally, I think it could be useful to differentiate between “elitism” and “exclusivity”. For example, I once did some napkin math suggesting that less than 0.01% of the people who watch Peter Singer’s TED talk later become EAs. So arguably, this is actually a pretty strong signal of dedication & willingness to take ideas seriously compared to, say, someone who was persuaded to become an EA through an element of peer pressure after several friends became interested. But the second person is probably going to better connected within EA. So if the movement becomes more “exclusive”, in the sense of using someone’s position in the social scene as a proxy for their importance, I suspect we’d be getting it wrong. When I think of the EAs who seem very dedicated to making an impact, people I’m excited about, they’re often people who came to EA on their own and in some cases still aren’t very well-connected.
I’m glad people want to look for evidence that CEA (and other orgs) is being adequately self-reflective. However, I’d like to give some additional context on Glassdoor. Of the five CEA reviews posted there:
Two are from people who have confused CEA with other organizations (neither of those were cited in John’s comment)
One is fairly recent and positive (also not cited)
One is from September 2016, at which point only three of CEA’s current staff were employed by the organization (three-and-a-half if you count Owen Cotton-Barratt, who is currently a part-time advisor to CEA).
One is from March 2018 -- more recent, but still representing a substantial departure from CEA’s current staff list, including a different executive director. A lot can change over the course of 18 months.
I’ll refrain from going into too much detail, but my experience is that CEA circa late 2019 is intensely self-reflective; I’m prompted multiple times in the average week to put serious thought into ways we can improve our processes and public communication.
I imagine that the ex-staff who complain about them being “hostile to honest self-assessment, let alone internal concerns about effectiveness” are likely referring more to something like self-criticism, rather than simply self-reflection. Even an org or individual that was entirely cynically dedicated to maximising their prestige rather than doing good, would self-reflect about how to communicate more effectively.
It does seem a bit weird to me for an organization to claim to be self-critical but put relatively little effort into soliciting external critical feedback. Like, CEA has a budget of $5M. To my knowledge, not even 0.01% of that budget is going into cash prizes for the best arguments that CEA is on the wrong track with any of its activities. This suggests either (a) an absurd level of confidence, on the order of 99.99%, that all the knowledge + ideas CEA needs are in the heads of current employees or (b) a preference for preserving the organization’s image over actual effectiveness. Not to rag on CEA specifically—just saying if an organization claims to be self-critical, maybe we should check to see if they’re putting their money where their mouth is.
(One possible counterpoint is that EAs are already willing to provide external critical feedback. However, Will recently said he thought EA was suffering too much from deference/information cascades. Prizes for criticism seem like they could be an effective way to counteract that.)
Is putting some non-trivial budget into cash prizes for arguments against what you do the only way to show you’re self-critical? Your statement suggests you believe something like that. But that doesn’t seem the only way to show you’re self-critical. I can’t think of any other organisation that have ever done that, so if it is the only way to show you’re self-critical, that suggests no organisation (I’ve heard of) is self-critical, which seems false. I wonder if you’re holding CEA to a peculiarly high standard; would you expect MIRI, 80k, the Gates Foundation, Google, etc. to do the same?
I’m suggesting that the revealed preferences of most organizations, including CEA, indicate they aren’t actually very self-critical. Hence the “Not to rag on CEA specifically” bit.
I think we’re mostly in agreement that CEA isn’t less self-critical than the average organization. Even one of the Glassdoor reviewers wrote: “Not terribly open to honest self-assessment, but no more so than the average charity.” (emphasis mine) However, aarongertler’s reply made it sound like he thought CEA was very self-critical… so I think it’s reasonable to ask why less than 0.01% of CEA’s cash budget goes to self-criticism, if someone makes that claim.
How meaningful is an organization’s commitment to self-criticism, exactly? I think the fraction of their cash budget devoted to self-criticism gives us a rough upper bound.
I agree that the norm I’m implicitly promoting, that organizations should offer cash prizes for the best criticisms of what they’re doing, is an unusual one. So to put my money where my mouth is, I’ll offer $20 (more than 0.01% of my annual budget!) for the best arguments for why this norm should not be promoted or at least experimented with. Enter by replying to this comment. (Even if you previously appeared to express support for this idea, you’re definitely still allowed to enter!) I’ll judge the contest at some point between Sept 20 and the end of the month, splitting $20 among some number of entries which I will determine while judging. Please promote this contest wherever you feel is appropriate. I’ll set up a reminder for myself to do judging, but I appreciate reminders from others also.
GiveWell used to solicit external feedback a fair bit years ago, but (as I understand it) stopped doing so because it found that it generally wasn’t useful. Their blog post External evaluation of our research goes some way to explaining why. I could imagine a lot of their points apply to CEA too.
I think you’re coming at this from a point of view of “more feedback is always better”, forgetting that making feedback useful can be laborious. Figuring out which parts of a piece of feedback are accurate and actionable can be at least as hard as coming up with the feedback in the first place. Soliciting comments can give you raw material, but if your bottleneck is not raw material but deciding which of multiple competing narratives to trust, you don’t necessarily gain anything by hearing more copies of each.
Certainly you won’t gain anything for free, and you may not be able to afford the non-monetary cost.
Upvoted for relevant evidence.
However, I don’t think you’re representing that blog post accurately. You write that Givewell “stopped [soliciting external feedback] because it found that it generally wasn’t useful”, but at the top of the blog post, it says Givewell stopped because “The challenges of external evaluation are significant” and “The level of in-depth scrutiny of our work has increased greatly”. Later it says “We continue to believe that it is important to ensure that our work is subjected to in-depth scrutiny.”
I also don’t think we can generalize from Givewell to CEA easily. Compare the number of EAs who carefully read Givewell’s reports (not that many?) with the number of EAs who are familiar with various aspects of CEA’s work (lots). Since CEA’s work is the EA community, we should expect a lot of relevant local knowledge to reside in the EA community—knowledge which CEA could try & gather in a proactive way.
Check out the “Improvements in informal evaluation” section for some of the things Givewell is experimenting with in terms of critical feedback. When I read this section, I get the impression of an organization which is eager to gather critical feedback and experiment with different means for doing so. It doesn’t seem like CEA is trying as many things here as Givewell is—despite the fact that I expect external feedback would be more useful for it.
I would say just the opposite. If you’re hearing multiple copies of a particular narrative, especially from a range of different individuals, that’s evidence you should trust it.
If you’re worried about feedback not being actionable, you could tell people that if they offer concrete suggestions, that will increase their chance of winning the prize.
The main barrier to self improvement isn’t knowing your weaknesses, it’s fixing them.
I believe CEA is aware of several of its weaknesses. Publicly pointing out weaknesses they’re already aware of is a waste of donors’ money and critics’ time. It’s also a needless reputational risk.
If I’m right, and CEA is already aware of its main flaws, then they should focus on finding and implementing solutions. Focusing instead on crowdsourcing more flaws won’t help; it will only distract staff from implementing solutions.
These are good points, upvoted. However, I don’t think they undermine the fundamental point: even if this is all true, CEA could publish a list of their known weaknesses and what they plan to do to fix them, and offer prizes for either improved understanding of their weaknesses (e.g. issues they weren’t aware of), or feedback on their plans to fix them. I would guess they would get their money’s worth.
Placing a bounty for writing criticisms casts doubt on whether those criticisms are actually sincere, or whether their authors are just bs-ing, overstating certain things and omitting other considerations, to write the most compelling criticism they can. It’s like reading a study written by someone with a conflict of interest: it’s very easy to dismiss out of hand. If CEA were to offer a financial incentive for critiques, then all critiques of CEA become less trustworthy. I think it would be more productive to encourage people to offer the most thoughtful suggestions on how to improve, even if that means scaling up certain things because they were successful, and not criticism per se.
Thanks for the feedback, these are points worth considering.
Hm, my thought was that CEA would be the ones choosing the winners, and presumably CEA’s definition of a “compelling” criticism could be based on how insightful or accurate CEA perceives the criticism to be rather than how negative it is.
An alternative analogy is making sure that someone accused of a crime gets a defense lawyer. We want people who are paid to tell both sides of the story.
In any case, the point is not whether we should overall be pro/con CEA. The point is what CEA should do to improve. People could have conflicts of interest regarding specific changes they’d like to see CEA make, but the contest prize seems a bit orthogonal to those conflicts, and indeed could surface suggestions that are valuable precisely because no one currently has an incentive to make them.
I don’t see how critiques which aren’t offered in the context of the contest would be affected.
Maybe you’re right and this is a better scheme. I guess part of my thinking was that there are social incentives which discourage criticism, and cash could counteract those, and additionally people who are pessimistic about your organization could have some of the most valuable feedback to offer, but because they’re pessimistic they will by default focus on other things and might only be motivated by a cash incentive. But I don’t know.
There is an established industry committed to providing criticism from outside (well, kind of): external auditors, commonly known as the Big 4. These companies are paid by usually big firms to evaluate their financial statements with regards to accuracy and unlawful activity. While these accountants are supposed to serve the shareholders of the company and the public, they are remunerated and chosen by the companies themselves, which creates an obvious incentive problem. Empirically, this has led to serious doubt about the quality of their work, even after governments had to step in because of poor audits and provide stringent legal requirements for auditors. See: https://www.economist.com/leaders/2018/05/24/reforming-the-big-four
Essentially, a similar problem would arise if CEA paid external people to provide feedback, which is something GiveWell also ran into (from memory: the page somebody below already linked outlines that finding people who are qualified AND willing to provide free criticism is really hard). If you pay a reviewer beforehand, how do you choose a reviewer? Having such a reviewer might actually be a net negative if it provides a false sense of security (in probabilistic terms: from the outside, estimates A and B would seem independent of each other, but in fact they are not, since the first evaluator chooses the second). If you use a format like the current one, where everybody is free to submit criticism but the organization itself chooses the best arguments, there is no incentive for the organization to pick the most scathing criticisms when it could just as well pick only moderate ones (although it is probably better to incorporate moderate criticism than none at all).
Even if you solve the incentive problem somehow, there is a danger to public criticism campaigns like that: they may give a negative impression of the organization to outside people who do not read about the positive aspects of the organization/movement. There are several reasons to consider this a realistic danger: 1) On the internet people seem to really love reading negative pieces; they capture our interest and are shared more often. 2) The more negative the opinion expressed, the more salient to the memory it is. 3) With EA, it’s likely that this might end up being one of the first impressions people have of it.
4) All of this is what happened above with the link to the Glassdoor reviews of CEA: we now have a discussion in this thread about the negative reviews on there, but not really of the positive ones. Previously I had no special information about whether CEA was internally open to self-criticism, but now I only have these negative reviews to go on, and I expect that in a year I will still remember them.
I realize that these points do not necessarily apply to asking for external criticism in itself, just for certain ways to go about it, but I do believe that avoiding the aforementioned problems requires clever and nontrivial design.
Thanks, interesting points!
If a particular criticism gets a lot of upvotes on the forum, but CEA ignores it and doesn’t give it a prize, that looks a little suspicious.
You could be right. However, I haven’t seen anyone get in this kind of trouble for having a “mistakes” page. It seems possible to me that these kinds of measures can proactively defuse the discontent that can lead to real drama if suppressed long enough. Note that the thing that stuck in your head was not any particular criticism of CEA, but rather just the notion that criticism might be being suppressed—I wonder if that is what leads to real drama! But you could have a good point; maybe CEA is too important an organization to be the first to experiment with this kind of thing.
Thanks to everyone who entered this contest! I decided to split the prize money evenly between the four entries. Winners, please check your private messages for payment details!
Thanks for raising these points, John! I hadn’t considered the “cash prize for criticism” idea before, but it does seem like it’s worth more consideration.
I agree that CEA could do better on the front of generating criticisms from outside the organization, as well as making it easier for staff to criticize leadership. This is one of the key things that we have been working to improve since I took up the Interim Executive Director role in early 2019. Back in January/February, we did a big push on this, logging around 100 hours of user interviews in a few weeks, and sending out surveys to dozens of community members for feedback. Since then, we’ve continued to invest in getting feedback, e.g. staff regularly talk to community members to get feedback on our projects (though I think we could do more); similarly, we reach out to donors and advisors to get feedback on how we could improve our projects; we also have various (including anonymous) mechanisms for staff to raise concerns about management decisions. Together, I think these represent more than 0.1% of CEA’s staff time. None of this is to say that this is going as well as we’d like—maybe I’d say one of CEA’s “known weaknesses” is that I think we could stand to do more of this.
I agree that more of this could be public and transparent also—e.g. I’m aware that our mistakes page (https://centreforeffectivealtruism.org/our-mistakes) is incomplete. We’re currently nearing the end of our search for a new CEO, and one of the things that I think they’re likely to want to do is to communicate more with the community, and solicit the community’s thoughts on future plans.
Nice!
Glad to hear it!
I guess a practical way to measure creativity could be to give candidates a take-home problem which is a description of one of the organization’s current challenges :P I suspect take-home problems are in general a better way to measure creativity, because if it’s administered in a conversational interview context, I imagine it’d be more of a test of whether someone can be relaxed & creative under pressure.
BTW, another point related to creativity and exclusivity is that outsiders often have a fresh perspective which brings important new ideas.
It’s probably correction for attenuation: ‘Correction for attenuation is a statistical procedure … to “rid a correlation coefficient from the weakening effect of measurement error”.’
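For concreteness, Spearman’s correction for attenuation can be sketched as below. The reliability values are purely hypothetical, just to show how a modest observed validity inflates once measurement error in both variables is divided out:

```python
import math

def correct_for_attenuation(r_observed, rel_x, rel_y):
    """Spearman's correction for attenuation: estimate the correlation
    between two underlying constructs, given the observed correlation
    and the reliability (e.g. test-retest) of each measure."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical numbers: an observed validity of 0.26, with reliabilities
# of 0.80 for the predictor and 0.52 for the job-performance criterion.
r_corrected = correct_for_attenuation(0.26, 0.80, 0.52)  # ≈ 0.40
```

Note the correction only makes sense if you care about the true construct (actual future performance) rather than the noisy measurement of it, which is exactly the point under debate here.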
Ah, thanks! So as a practical matter it seems like we probably shouldn’t correct for attenuation in this context and lean towards the correlation coefficient being more like 0.26? Honestly that seems a bit implausibly low. Not sure how much stock to put in this paper even if it is a meta-analysis. Maybe better to read it before taking it too seriously.
I’d correct for attenuation, as we care more about getting the people who in fact will perform the best, rather than those who will seem like they are performing the best by our imperfect measurement.
Also selection procedures can gather other information (e.g. academic history, etc.) which should give incremental validity over work samples. I’d guess this should boost correlation, but there are countervailing factors (e.g., range restriction).
Oh interesting, I was thinking it would be bad to correct for measurement error in the work sample (since measurement error in the work sample is a practical concern when it comes to how predictive it is). But I guess you’re right that it would be reasonable to correct for measurement error in the measure of employee performance.