Red teaming papers as an EA training exercise?
I think a plausibly good training exercise for EAs wanting to be better at empirical/conceptual research is to deep dive into seminal papers/blog posts and attempt to identify all the empirical and conceptual errors in past work, especially writings either a) by other respected EAs or b) that we otherwise think of as especially important.
I’m not sure how knowledgeable you have to be to do this well, but I suspect it’s approachable for smart people who finish high school, and certainly by the time they finish undergrad^ with a decent science or social science degree.
I think this is good career building for various reasons:
you can develop a healthy skepticism of the existing EA orthodoxy
I mean skepticism that’s grounded in specific beliefs about why things ought to be different, rather than just vague “weirdness heuristics” or feeling like the goals of EA conflict with other tribal goals.
(I personally have not found high-level critiques of EA, and I have read many, to be particularly interesting or insightful, but this is just a personal take).
you actually deeply understand at least one topic well enough to point out errors
For many people and personalities, critiquing a specific paper/blog post may be a less hairy “entry point” into doing EA-research adjacent work than plausible alternatives like trying to form your own deep inside views on questions that are really open-ended and ambiguous like “come up with a novel solution in AI alignment” or “identify a new cause X”
creates legible career capital (at least within EA)
requires relatively little training/guidance from external mentors, meaning
our movement devotes less scarce mentorship resources to this
people with worse social skills/network/geographical situation don’t feel (as much) at a disadvantage for getting the relevant training
you can start forming your own opinions/intuitions of both object-level and meta-level heuristics for what things are likely to be correct vs wrong.
In some cases, the errors are actually quite big, and worth correcting (relevant parts of) the entire EA movement on.
Main “cons” I can think of:
I’m not aware of anybody successfully doing a really good critique for the sake of doing a really good critique. The most exciting things I’m aware of (publicly, zdgroff’s critique of Ng’s original paper on wild animal suffering, alexrjl’s critique of Giving Green. I also have private examples) mostly come from people trying to deeply understand a thing for themselves, and then along the way spotting errors with existing work.
It’s possible that doing deliberate “red-teaming” would make one predisposed to spot trivial issues rather than serious ones, or falsely identify issues where there aren’t any.
Maybe critiques are a less important skill to develop than forming your own vision/research direction and executing on it, and telling people to train for this skill might actively hinder their ability to be bold & imaginative?
^ Of course, this depends on field. I think even relatively technical papers within EA are readable to a recent undergrad who cares enough, but this will not be true for e.g. (most) papers in physics or math.
One additional risk: if done poorly, harsh criticism of someone else’s blog post from several years ago could be pretty unpleasant and make the EA community seem less friendly.
I’m actually super excited about this idea though—let’s set some courtesy norms around contacting the author privately before red-teaming their paper and then get going!
I think I agree this is a concern. But just so we’re on the same page here, what’s your threat model? Are you more worried about
The EA community feeling less pleasant and friendly to existing established EAs, so we’ll have more retention issues with people disengaging?
The EA community feeling less pleasant and friendly to newcomers, so we have more issues with recruitment and people getting excited to join novel projects?
Criticism makes being open about your work less pleasant, and open Red Teaming about EA projects makes EA move even further in the direction of being less open than we used to be. See also Responsible Transparency Consumption.
It’s actually a bit of numbers 1-3; I’m imagining decreased engagement generally, especially sharing ideas transparently.
Thanks for the excitement! I agree that contacting someone ahead of time might be good (so at least someone doesn’t learn about their project being red teamed until social media blows up), but I feel like it might not mitigate most of the potential unpleasantness/harshness. Like I don’t see a good cultural way to both incentivize Red Teaming and allow a face-saving way to refuse to let your papers be Red Teamed. Like if Red Teaming is opt-in by default, I’d worry a lot about this not getting off the ground, while if Red Teaming is opt-out by default I’d just find it very suss for anybody to refuse (speaking for myself, I can’t imagine ever refusing Red Teaming even if I would rather it not happen).
Easy steps could be to add a “red team” tag on the forum and point to this post to encourage people to do this.
I have at times given advice similar to this to early-career EAs, mostly in AI Safety. When people have trouble coming up with something they might want to write about on the forum, I encourage them to look for the things they don’t think are true. Most people are passively reading the forum anyway, but actively looking for something they don’t think is true or are unconvinced by can be a good starting point for a post. It may be that they end up convinced of the point, but they can still write a post making it clearer and adding the arguments they found.
Having said this, most people’s first reaction is a terrified look. Encouraging someone’s first post to be a criticism is understandably scary.
It may be hard to get both the benefit to the participants and to the orgs. Anyone not intimidated by this might already have enough experience and career capital. To give juniors the experience you might have to make it more like comfortable schoolwork, where the paper is written but only read by one other person. This makes it harder to capture the career capital.
I’d expect it to be unlikely that someone does this individually and of their own accord. At the very least, it’s best to do this in small groups to create social accountability and commitment pressures, while also defusing the intimidation. Alternatively, it could be part of an existing program like an EA Fellowship. Even better would be its own program, with all the overhead that comes with that.
I would be very excited about someone experimenting with this and writing up the results. (And would be happy to provide EAIF funding for this if I thought the details of the experiment were good and the person a good fit for doing this.)
If I had had more time, I would have done this for the EA In-Depth Fellowship seminars I designed and piloted recently.
I would be particularly interested in doing this for cases where there is some amount of easily transmissible ‘ground truth’ people can use as feedback signal. E.g.
You first let people red-team deworming papers and then give them some more nuanced ‘Worm Wars’ stuff. (Where ideally you want people to figure out “okay, despite paper X making that claim we shouldn’t believe that deworming helps with short/mid-term education outcomes, but despite all the skepticism by epidemiologists here is why it’s still a great philanthropic bet overall”—or whatever we think the appropriate conclusion is.)
You first let people red-team particular claims about the effects on hen welfare from battery cages vs. cage-free environments and then you show them Ajeya’s report.
You first let people red-team particular claims about the impacts of the Justinian plague and then you show them this paper.
You first let people red-team particular claims about “X is power-law distributed” and then you show them Clauset et al., Power-law distributions in empirical data.
(Collecting a list of such examples would be another thing I’d be potentially interested to fund.)
Hmm I feel more uneasy about the truthiness grounds of considering some of these examples as “ground truth” (except maybe the Clauset et al example, not sure). I’d rather either a) train people to Red Team existing EA orthodoxy stuff and let their own internal senses + mentor guidance decide whether the red teaming is credible or b) for basic scientific literacy stuff where you do want clear ground truths, let them challenge stuff that’s closer to obvious junk (Why We Sleep, some climate science stuff, maybe some covid papers, maybe pull up examples from Calling Bullshit, which I have not read).
That seems fair. To be clear, I think “ground truth” isn’t the exact framing I’d want to use, and overall I think the best version of such an exercise would encourage some degree of skepticism about the alleged ‘better’ answer as well.
Assuming it’s framed well, I think there are both upsides and downsides to using examples that are closer to EA vs. clearer-cut. I’m uncertain which would seem better overall if I could only do one of them.
Another advantage of my suggestion in my view is that it relies less on mentors. I’m concerned that having mentors who are less epistemically savvy than the best participants can detract a lot from the optimal value that exercise might provide, and that it would be super hard to ensure adequate mentor quality for some audiences I’d want to use this exercise for. Even if you’re less concerned about this, relying on any kind of plausible mentor seems less scalable than a version that only relies on access to published material.
Upon (brief) reflection I agree that relying on the epistemic savviness of the mentors might be too much and the best version of the training program will train a sort of keen internal sense of scientific skepticism that’s not particularly reliant on social approval. If we have enough time I would float a version of a course that slowly goes from very obvious crap (marketing tripe, bad graphs) into things that are subtler crap (Why We Sleep, Bem ESP stuff) into weasely/motivated stuff (Hickel? Pinker? Sunstein? popular nonfiction in general?) into things that are genuinely hard judgment calls (papers/blog posts/claims accepted by current elite EA consensus). But maybe I’m just remaking the Calling Bullshit course but with a higher endpoint.
(I also think it’s plausible/likely that my original program of just giving somebody an EA-approved paper + say 2 weeks to try their best to Red Team it will produce interesting results, even without all these training wheels.)
This also reminds me of a recent shortform by Buck:
I want to have a program to fund people to write book reviews and post them to the EA Forum or LessWrong. (This idea came out of a conversation with a bunch of people at a retreat; I can’t remember exactly whose idea it was.)
Basic structure:
Someone picks a book they want to review.
Optionally, they email me asking how on-topic I think the book is (to reduce the probability of not getting the prize later).
They write a review, and send it to me.
If it’s the kind of review I want, I give them $500 in return for them posting the review to EA Forum or LW with a “This post sponsored by the EAIF” banner at the top. (I’d also love to set up an impact purchase thing but that’s probably too complicated.)
If I don’t want to give them the money, they can do whatever with the review.
[...]
Suggested elements of a book review:
One paragraph summary of the book
How compelling you found the book’s thesis, and why
The main takeaways that relate to vastly improving the world, with emphasis on the surprising ones
Optionally, epistemic spot checks
Optionally, “book adversarial collaborations”, where you actually review two different books on the same topic.
(I think the full shortform and the comments below it are also worth reading.)
This is another example of a Shortform that could be an excellent top-level post (especially as it’s on-theme with the motivated reasoning post that was just published). I’d love to see this spend a week on the front page and perhaps convince some readers to try doing some red-teaming for themselves. Would you consider creating a post?
I think your cons are good things to have noted, but here are reasons why two of them might matter less than one might think:
I think the very fact that “It’s possible that doing deliberate “red-teaming” would make one predisposed to spot trivial issues rather than serious ones, or falsely identify issues where there aren’t any” could actually also make this useful for skill-building and testing fit; people will be forced to learn to avoid those failure modes, and “we” (the community, potential future hirers, etc.) can see how well they do so.
E.g., to do this red teaming well, they may have to learn to identify how central an error is to a paper/post’s argument, to think about whether a slightly different argument could reach the same conclusion without needing the questionable premise, etc.
I have personally found that the line between “noticing errors in existing work” and “generating novel research” is pretty blurry.
A decent amount of the research I’ve done (especially some that is unfortunately nonpublic so far) has basically followed these steps:
“This paper/post/argument seems interesting and important”
“Oh wait, it actually requires a premise that they haven’t noted and that seems questionable” / “It ignores some other pathway by which a bad thing can happen” / “Its concepts/definitions are fuzzy or conflate things in a way that may obscure something important”
[I write a post/doc that discusses that issue, provides some analysis in light of this additional premise being required or this other pathway being possible or whatever, and discusses what implications this has, e.g., whether some risk is actually more or less important than we thought, or what new intervention ideas this alternative risk pathway suggests might be useful]
Off the top of my head, some useful pieces of public work by other people that I feel could be roughly described as “red teaming that turned into novel research” include A Proposed Adjustment to the Astronomical Waste Argument and The long-term significance of reducing global catastrophic risks.
I’d guess that the same could also sometimes happen with this red teaming, especially if that was explicitly encouraged, people were given guidance on how to lean into this more “novel research” element when they notice something potentially major during the red teaming, people were given examples of how that has happened in the past, etc.
Strong upvote for an idea that seems directly actionable and useful for addressing an important problem.
I’m gonna quote your shortform in full (with a link and attribution, obviously) in a comment on my post about Intervention options for improving the EA-aligned research pipeline.
I think by default good ideas like this never really end up happening, which is sad. Do you or other people have thoughts on how to make your idea actually happen? Some quick thoughts from me:
Just highlight this idea on the Forum more often/prominently
People giving career advice or mentorship to people interested in EA-aligned research careers mention this as one way of testing fit, having an impact, etc.
I add the idea to Notes on EA-related research, writing, testing fit, learning, and the Forum [done!]
Heap an appropriate amount of status and attention on good instances of this having been done
That requires it to be done at least once first, of course, but can then increase the rate
E.g., Aaron Gertler could feature it in the EA Forum Digest newsletter, people could make positive comments on the post, someone can share it in a Facebook group and/or other newsletter
I know I found this sort of thing a useful and motivating signal when I started posting stuff (though not precisely this kind of stuff)
Publicly offer to provide financial prizes for good instances of this having been done
One way to do this could mirror Buck’s idea for getting more good book reviews to happen (see my other comment): “If it’s the kind of review I want, I give them $500 in return for them posting the review to EA Forum or LW with a “This post sponsored by the EAIF” banner at the top. (I’d also love to set up an impact purchase thing but that’s probably too complicated).”
Find case studies where someone found such a post useful or having written it helped someone get a good job or something, and then publicise those
See also my thoughts on discovering, writing, and/or promoting case studies of people successfully using interventions for improving the EA-aligned research pipeline
Thanks for linking my idea in your sequence! (onlookers note: MichaelA and I are coworkers)
Heap an appropriate amount of status and attention on good instances of this having been done
That requires it to be done at least once first, of course, but can then increase the rate
This arguably happened to alexrjl’s critique of Giving Green, though it was a conjunction of a critique of an organization and a critique of research done. As an aside, I decided to focus my shortform on critiques of public research rather than critiques of organizations/people, even though I think the latter is quite valuable too, since a) my intuition is that the former is less acrimonious, b) relatedly, critiques of organizations may be worse at training dispassionate analysis skills (vs eg tribalistic feelings or rhetoric), c) critiques of orgs or people might be easier for newbies to fuck up and d) I think empirically, critiques of organizations have a worse hit rate than critiques of research posts.
I think by default good ideas like this never really end up happening, which is sad. Do you or other people have thoughts on how to make your idea actually happen?
As you know, one of my interns is doing something adjacent to this idea (though framed in a different way), and I may convince another intern to do something similar (depending on their interests and specific project ideas in mind).
Yeah, good point—I guess a more directed version of “People giving career advice or mentorship to people interested in EA-aligned research careers mention this as one way of testing fit, having impact, etc.” is just people encouraging people they manage to do this, or maybe even hiring people with this partly in mind.
Though I think that that wouldn’t capture most of the potential value of this idea, since part of what’s good about it is that, as you say, this idea:
requires relatively little training/guidance from external mentors, meaning
our movement devotes less scarce mentorship resources to this
people with worse social skills/network/geographical situation don’t feel (as much) at a disadvantage for getting the relevant training
(People who’ve already gone through a hiring process and have an at least somewhat more experienced researcher managing them will have an easier time than other people in testing fit, having impact, building skills, etc. in other ways as well.)
Yeah I agree that a major upside to this idea (and a key differentiator between it and other proposed interventions for fixing early stages of the research pipeline) is that it ought to be doable without as much guidance from external mentors. I guess my own willingness to propose this as an intern project implies that I believe it must be comparatively even more exciting for people without external guidance.
Another possible (but less realistic?) way to make this happen:
Organisations/researchers do something like encouraging red teaming of their own output, setting up a bounty/prize for high-quality instances of that, or similar
An example of something roughly like this is a post on the GiveWell blog that says at the start: “This is a guest post by David Barry, a GiveWell supporter. He emailed us at the end of December to point out some mistakes and issues in our cost-effectiveness calculations for deworming, and we asked him to write up his thoughts to share here. We made minor wording and organizational suggestions but have otherwise published as is; we have not vetted his sources or his modifications to our spreadsheet for comparing deworming and cash. Note that since receiving his initial email, we have discussed the possibility of paying him to do more work like this in the future.”
But I think GiveWell haven’t done that since then?
It seems like this might make sense and be mutually beneficial
Orgs/researchers presumably want more ways to increase the accuracy of their claims and conclusions
A good red teaming of their work might also highlight additional directions for further research and surface someone who’d be a good employee for that org or collaborator for that researcher
Red teaming of that work might provide a way for people to build skills and test fit for work on precisely the topics that the org/researcher presumably considers important and wants more people working on
But I’d guess that this is unlikely to happen in this form
I think this is mainly due to inertia plus people feeling averse to the idea
But there may also be good arguments against
This post is probably relevant: https://forum.effectivealtruism.org/posts/gTaDDJFDzqe7jnTWG/some-thoughts-on-public-discourse
Another argument against is that, for actually directly improving the accuracy of some piece of work, it’s probably more effective to pay people who are already known to be good at relevant work to do reviewing / red-teaming prior to publication
Yeah I think this is key. I’m much more optimistic about getting trainees to do this being a good training intervention than a “directly improve research quality” intervention. There are some related arguments for why you’d want to pay people who are either a) already good at the relevant work or b) specialized reviewers/red-teamers:
paying people to criticize your work would risk creating a weird power dynamic, and more experienced reviewers would be better at navigating this
For example, trainees may be afraid of criticizing you too harshly.
Also, if the critique is in fact bad, you may be placed in a somewhat awkward position when deciding whether to publish/publicize it.
This idea sounds really cool. Brainstorming: a variant could be several people red teaming the same paper and not conferring until the end.