I think it’s a nice op-ed; I also appreciate the communication strategy here: anticipating that SBF’s sentencing will reignite discourse around his ties to EA, and trying to elevate that discourse (in particular by highlighting the reforms EA has undertaken over the past year and a half).
First of all, kudos on writing an op-ed! I think it’s a good thing to do, and I think earning to give is a much better path than what most Ivy League grads wind up doing, so if you persuade a few people, that’s good.
My basic problem with the argument you make here (and with earning to give in general) is that some bad things tend to go along with “selling out” (as you put it), rendering it difficult to maintain one’s initial commitment to earning to give. Some worries I have about college students deciding to do this:
(1) Erosion of values. When your social group fills with Meta employees (rather than idealistic college students), you find a partner (who may or may not be EA), you have kids, and so on, your values shift, and it becomes easier to justify not donating. I have seen a lot of people grow gradually less motivated to do good between the ages of 20 and 30. Committing to a career in, e.g., global health makes it harder for this value shift to be accompanied by a shift in the social value of one’s work (since most global health jobs are at least somewhat socially valuable); committing to earning to give presents no such barrier.
(2) Lifestyle creep. Relatedly, as you get richer (and befriend your colleagues at Meta and so on), people start inviting you to expensive birthday dinners and on nice trips and stuff. Your ability to maintain a relatively frugal lifestyle can thus be compromised by the desire, and pressure, to buy nice things.
In other words, I think it’s harder to maintain your EA values when you’re earning to give than when you’re working at, e.g., an NGO. These challenges are then further compounded by:
(3) Selection bias. I suspect that the group of EA-interested people who are drawn to earning to give in the first place are more interested in having a bougie lifestyle (etc) than the average EA who isn’t drawn to earning to give. And, correspondingly, I think they’re more likely to be affected by (1) and (2).
Again, I think this post is missing nuance; for example:
Induction of fetal demise is done through a variety of means: different medications (e.g., digoxin, lidocaine, or KCl) are given via different routes (e.g., intra-fetal vs. intra-amniotic). (Given that lidocaine is a painkiller, I could see a different version of this post compellingly making the case that, to the extent clinicians have discretion in choosing which agents to use to induce fetal demise, they should prioritize ones likely to have off-target analgesic effects.)
So the link you posted refers to a small minority of abortions, as it’s only routine to inject the amniotic fluid (specifically) with potassium chloride (specifically) prior to the delivery of anesthesia in some second-trimester abortions.
Potassium chloride is a medication that’s routinely given via IV to replete potassium. How painful this is depends significantly on the dose and on the route of administration; people tolerate oral potassium fine. Importantly, the fetus is not even being given KCl intravenously (as opposed to intra-amniotically or intra-fetally), so it’s hard for me to infer from “it is sometimes painful to get KCl via IV” that it would be painful for a fetus to get potassium via a different route. Correspondingly, I don’t think the claim that it “inflames the potassium ions in the sensory nerve fibers, literally burning up the veins as it travels to the heart” applies here.
I don’t have time to research this in depth, but am pretty sure this post is missing a lot of nuance about how anesthesia works in abortion. Importantly, because mother and fetus share a circulation, IV sedation that is given to the mother will—to some extent—sedate the fetus as well, depending on the specific regimen used. So it’s not quite right to say “The fetus is administered a lethal injection with no anesthesia.” Correspondingly, I think this post overstates the risk of fetal suffering associated with abortion.
Yeah, I think this is basically right. EA orgs probably favor Profile 1 people because they’ve demonstrated more EA alignment, meaning: (1) the Profile 1 people will tend to be more familiar with EA orgs than the Profile 2 people, so may be better positioned to assess their fit for any given org/role, (2) conversely, EA orgs will tend to be more familiar with Profile 1 people, since they’ve been in the community for a while, meaning orgs may be better able to assess a prospective Profile 1 employee’s fit, and (3) if the Profile 1 employee leaves/is fired, they’ll be less inclined to trash/sue the EA org.
Favoring Profile 1 people because of (3) would be bad (and I hope orgs aren’t explicitly or implicitly doing this!), but favoring them because of (1) + (2) seems pretty reasonable, even though there are downsides associated with this (e.g., bad norms are less likely to get challenged, insights/innovations from other spheres won’t make it into EA, etc).
That said, I think one thing your post misses is that there are a lot of people who are closer to Profile 2 (professionally) but pretty embedded in EA (socially, academically, extracurricularly, etc.). And I think orgs also tend to favor these people, which may mitigate at least some of the aforementioned downsides of EA being an insular ecosystem (the lack of insights/innovations from other spheres, if not the failure to challenge bad norms).
A final piece of speculation: getting a job at an EA org is a lot more prestigious for EAs than it is for people outside of EA, and the career capital conferred by working at EA orgs has a much lower exchange rate outside of EA. As a result, it wouldn’t shock me if top Profile 2 candidates are applying to EA jobs at much lower rates and are much less likely to take EA jobs they’re offered. If this is the case, the discrepancy you’re observing may not reflect an unwillingness of EA orgs to hire impressive Profile 2 candidates, but rather a lack of interest from Profile 2 candidates whose backgrounds are on par with the Profile 1 candidates’.
I really appreciate your and @Katja_Grace’s thoughtful responses, and wish more of this discussion had made it into the manuscript. (This is a minor thing, but I also didn’t love that the response rate/related concerns were introduced on page 20 [right?], since it’s standard practice—at least in my area—to include a response rate up front, if not in the abstract.) I wish I had more time to respond to the many reasonable points you’ve raised, and will try to come back to this in the next few days if I do have time, but I’ve written up a few thoughts here.
“Note that we didn’t tell them the topic that specifically.”
I understand that, and think this was the right call. But there seems to be consensus that in general, a response rate below ~70% introduces concerns of non-response bias, and when you’re at 15%—with (imo) good reason to think there would be non-response bias—you really cannot rule this out. (Even basic stuff like: responders probably earn less money than non-responders, and are thus probably younger, work in academia rather than industry, etc.; responders are more likely to be familiar with the prior AI Impacts survey, and all that that entails; and so on.) In short, there is a reason many medical journals have a policy of not publishing surveys with response rates below 60%; e.g., JAMA asks for >60%, less prestigious JAMA journals also ask for >60%, and BMJ asks for >65%. (I cite medical journals because their policies are the ones I’m most familiar with, not because I think there’s something special about medical journals.)
“Tried sending them $100 last year and if anything it lowered the response rate.”
I find it a bit hard to believe that this lowered response rates (was the difference statistically significant?), although I would buy that it didn’t increase response rates much, since I think I remember reading that the marginal benefit of compensating survey respondents falls off pretty quickly as compensation increases. I also appreciate that you’re studying a high-earning group of experts, which makes it difficult to incentivize participation. That said, my reaction to this is: determine what the higher-order goals of this kind of project are, and adopt a methodology that aligns with them. I have a hard time believing that, at this price point, conducting a survey with a 15% response rate is the optimal methodology.
“If you are inclined to dismiss this based on your premise ‘many AI researchers just don’t seem too concerned about the risks posed by AI’, I’m curious where you get that view from, and why you think it is a less biased source.”
My impression stems from conversations I’ve had with two CS professor friends about how concerned the CS community is about the risks posed by AI. For instance, last week, I was discussing the last AI Impacts survey with a CS professor (who has conducted surveys, as have I); I was defending the survey, and they were criticizing it for reasons similar to those outlined above. They said something to the effect of: the AI Impacts survey results do not align with my impression of people’s level of concern based on discussions I’ve had with friends and colleagues in the field. And I took that seriously, because this friend is EA-adjacent; extremely competent, careful, and trustworthy; and themselves sympathetic to concerns about AI risk. (I recognize I’m not giving you enough information for this to be at all worth updating on for you, but I’m just trying to give some context for my own skepticism, since you asked.)
Lastly, as someone immersed in the EA community myself, I think my bias is—if anything—in the direction of wanting to believe these results, but I just don’t think I should update much based on a survey with such a low response rate.
I think this is going to be my last word on the issue, since I suspect we’d need to delve more deeply into the literature on non-response bias/response rates to progress this discussion, and I don’t really have time to do that, but if you/others want to, I would definitely be eager to learn more.
I earn about $15/hour and donate much more than 1%. I don’t think it’s that hard to do this, and it seems weird to set such a low bar.
No, because the response rate wouldn’t be 100%; if only respondents are compensated, then even if the response rate doubled to 30% (which I doubt it would), only ~1,200 of the 4,000 people sampled would be paid, so the cost would still be lower ($120k vs. $138k).
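To make the arithmetic explicit, here’s a minimal sketch using the numbers from my suggestion elsewhere in this thread (a random sample of 4,000 people offered $100 each); it assumes only people who actually respond get paid, which is how I intended the proposal:

```python
# Back-of-the-envelope cost comparison. Assumes only actual
# respondents are compensated; figures are illustrative.
actual_cost = 138_000   # reported spend on participant compensation
sample_size = 4_000     # proposed random sample
incentive = 100         # proposed payment per respondent ($)

for response_rate in (0.15, 0.30):
    respondents = round(sample_size * response_rate)
    cost = respondents * incentive
    print(f"{response_rate:.0%} response rate: {respondents} respondents, "
          f"${cost:,} (vs. ${actual_cost:,} actually spent)")
```

At a 15% response rate this comes to $60k, and even at 30% it’s $120k, i.e., still under the $138k reportedly spent.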
I appreciate that a ton of work went into this, and the results are interesting. That said, I am skeptical of the value of surveys with low response rates (in this case, 15%), especially when those surveys are likely subject to non-response bias, as I suspect this one is, given: (1) many AI researchers just don’t seem too concerned about the risks posed by AI, so may not have opened the survey and (2) those researchers would likely have answered the questions on the survey differently. (I do appreciate that the authors took steps to mitigate the risk of non-response bias at the survey level, and did not find evidence of this at the question level.)
I don’t find the “expert surveys tend to have low response rates” defense particularly compelling, given: (1) the loaded nature of the content of the survey (meaning bias is especially likely), (2) the fact that such a broad group of people were surveyed that it’s hard to imagine they’re all actually “experts” (let alone have relevant expertise), (3) the fact that expert surveys often do have higher response rates (26% is a lot higher than 15%), especially when you account for the fact that it’s extremely unlikely other large surveys are compensating participants anywhere close to this well, and (4) the possibility that many expert surveys just aren’t very useful.
Given the non-response bias issue, I am not inclined to update very much on what AI researchers in general think about AI risk on the basis of this survey. I recognize that the survey may have value independent of its knowledge value—for instance, I can see how other researchers citing these kinds of results (as I have!) may serve a useful rhetorical function, given readers of work that cites this work are unlikely to review the references closely. That said, I don’t think we should make a habit of citing work that has methodological issues simply because such results may be compelling to people who won’t dig into them.
Given my aforementioned concerns, I wonder whether the cost of this survey can be justified (am I calculating correctly that $138,000 was spent just compensating participants for taking this survey, and that doesn’t include other costs, like those associated with using the outside firm to compensate participants, researchers’ time, etc?). In light of my concerns about cost and non-response bias, I am wondering whether a better approach would instead be to randomly sample a subset of potential respondents (say, 4,000 people), and offer to compensate them at a much higher rate (e.g., $100), given this strategy could both reduce costs and improve response rates.
I like the general idea, but a bit of feedback:
I’m not super compelled by the arguments for only asking people to donate 1%, which strikes me as a trivial amount of money, especially in the context of (1) doctors’ and other healthcare workers’ salaries (in the US, physicians make $350k on average, so 1% is $3,500/year) and (2) the fact that 2/3 of the US population gives an average of 4% per year (I dunno how reliable this stat is, or how this rate compares to other countries, but I’m inclined to think that the 1/3 who don’t give will not respond to this initiative).
I understand not wanting to make the perfect the enemy of the good here, but I think the biggest risk of only asking people to donate 1% is inadvertently normalizing 1% as a reasonable amount for healthcare workers to donate. (I am still a trainee, and I comfortably donate way more than this! I’m also a bit reluctant to take the pledge myself, because I don’t want people to think that I only donate 1%, or that I endorse donating 1%.)
I think it makes the most sense to target the healthcare workers who already donate some (and who on average likely donate >1% already), and I suspect the best approach would be to: (1) get them to pledge a more significant amount than they’re already donating (5%?), (2) specifically encourage them to give to charities that are supported by a lot of evidence (e.g., GiveWell’s Top Charities fund rather than GiveWell’s All Grants Fund), and (3) focus on health-oriented interventions (most of GiveWell’s already are, but this would be good to highlight explicitly).
“double-voting would surge as people learned you get a freebie.”
I just don’t see this happening?
Separately, one objection I have to cracking down hard on self-voting is that it’s not very harmful relative to other ways in which people don’t vote how they’re “supposed to.” E.g., we know the correlation between upvotes and agree votes is incredibly high, and downvoting something solely because you disagree with it strikes me as more harmful to discourse on the forum than self-voting. I think self-voting gets highlighted not because it’s especially harmful, but because it’s especially catchable.
If the mods want to improve people’s voting behavior on the forum, I wish they’d both target different voting behavior (i.e., the agree-vote/upvote correlation) and use different means to do it (i.e., generating reports for people of their own voting correlations, whether they tend to upvote/downvote certain people, etc.), rather than naming and shaming people for self-voting.
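To gesture at what such a report could look like, here’s a minimal sketch; the vote-record format is entirely hypothetical (I’m not assuming anything about the forum’s actual data model):

```python
# Hypothetical sketch of a per-user voting report. The vote-record
# fields ("karma", "agree") are made up for illustration.
from statistics import correlation  # Python 3.10+

def voting_report(votes):
    """votes: one dict per comment the user voted on, with
    'karma' and 'agree' each in {-1, 0, 1}."""
    karma = [v["karma"] for v in votes]
    agree = [v["agree"] for v in votes]
    r = correlation(karma, agree)
    return f"Your karma votes and agree votes correlate at r = {r:.2f}"

# Example: a user whose upvotes closely track their agree votes.
votes = [
    {"karma": 1, "agree": 1},
    {"karma": 1, "agree": 1},
    {"karma": -1, "agree": -1},
    {"karma": 1, "agree": 0},
    {"karma": -1, "agree": -1},
]
print(voting_report(votes))  # r = 0.91
```

A report like this (sent privately) could let people notice, e.g., that they almost never upvote things they disagree with, without any public naming and shaming.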
I feel like this is getting really complicated and ultimately my point is very simple: prevent harmful behavior via the least harmful means. If you can get people to not vote for themselves by telling them not to, then just… do that. I have a really hard time imagining that someone who was warned about this would continue to do it; if they did, it would be reasonable to escalate. But if they’re warned and then change their behavior, why do I need to know this happened? I just don’t buy that it reflects some fundamental lack of integrity that we all need to know about (or something like this).
This is just a weird way to think about evidence, imo. I think the original post would’ve been more useful and persuasive (and would have generated better discourse) if it had been one-fifth as long. Throwing evidence (even high-quality evidence) at people does not always make them reason better, and often makes them reason worse. (I also don’t think it works here to say “just have better epistemics!”, because (a) one important sense in which we’re all boundedly rational is that our ability to process information well decreases as the volume of information increases, and (b) a writer acting in good faith, who wants you to reach the right conclusions, should account for this in how they present information.)
Critically, as previously stated, I think the photos constitute particularly poor evidence: the ratio of useful information they provide to their likelihood of swaying people in irrational ways is very low. This is why my comment wasn’t just “shorten your post so people can understand it better,” but rather “I think these photos will lead to vibes-based reasoning.” (This is also why trial lawyers use this kind of evidence; it’s meant to make the jury think “aw, they look so happy together! He couldn’t possibly have done that,” when in reality, the photo of the smiling couple on vacation has ~0 bearing on whether he murdered her.)
Thanks, this is also helpful! One thing to think about (no need to tell me) is whether making the checks public could effectively disincentivize the bad behavior (much as warnings about speed cameras may disincentivize speeding as effectively as the cameras themselves). But if there are easy workarounds, I can see why this wouldn’t be viable.
Thanks; this is helpful and reassuring, especially re: the DMs. I had read this section of the norms page, and it struck me that the “if we have reason to believe that someone is violating norms around voting” clause was doing a lot of work. I would appreciate more clarification about what would lead mods to believe something like this (and maybe some examples of how you’ve come to have such beliefs). But this is not urgent, and thanks for the clarification you’ve already provided.
Thanks, community health team. I’m wondering if it’d be helpful for the CHT +/- forum mods to develop guidelines regarding standards of evidence for sensitive forum posts, e.g.: under what circumstances (if any) should mods censor a post/parts of a post for making insufficiently substantiated and potentially harmful allegations? Perhaps the answer is “under no circumstances,” but even this would be worth clarifying, I think, so readers know never to expect this and understand the rationale for never doing so.
The forum does have guidance on infohazards, and I assume a post that contained serious infohazards would be censored. Given there are presumably limitations on harmful true things people might say, it seems prima facie plausible that there should be limitations on harmful potentially false things people might say, but I’m not sure when/whether/how that’s right, and it seems worth devoting some serious thought to this. (Sorry if this guidance does exist somewhere, or if this would be outside the purview of what the CHT does, but thanks for considering it.)
I think that if we were all completely rational, you’d be right. But we’re not, and I think the photos were included in an attempt to exploit that fact.
If the post just argued “there were s’mores and iguanas; Chloe and Alice must be lying about how bad their experience was!” my brain would go “that argument sucks; obviously people can be unhappy in a land of iguanas.” But the photos hijack my reasoning by conjuring a vivid image of a tropical paradise (brain: “hm, this looks pretty nice! It’s cold here and I wish I was there right now! Maybe this was an awesome job.”)
This is bad because the photos don’t tell us anything relevant that we didn’t already know: we knew they were hanging out in tropical places, and the presence of s’mores has zero bearing on the veracity of Chloe and Alice’s claims. No one ever disputed that there were s’mores, and there having been s’mores is entirely compatible with this job having been a nightmare. The pictures just undermine my ability to immediately recognize that fact.
Sorry, I don’t think I got this quite right in my initial comment; let me try again:
I think something really messed up is going on here, in that both Ben and Kat’s posts include some serious allegations that are supported by very limited evidence (like “anonymous person said X”). (Other allegations in these posts are supported by good evidence, like screenshots.) These accusations have the potential to seriously harm people’s professional lives, relationships, and mental health. And in both cases, the general message of both posts could be relayed without relying on the anecdotes that aren’t supported by good evidence.
The forum moderators have allowed this mudslinging to occur more or less unchecked. To the extent mods have been involved, their involvement has been limited to telling us bystanders not to lose our heads. I think this is very bad! The evidentiary standards these posts are being held to wouldn’t come close to passing muster on Wikipedia (let alone in a newspaper or court). And there’s a reason for that: baselessly smearing people is bad. It is especially bad when the most plausible explanation for the behavior is vengeance. For the mods to then issue a warning for a take saying as much (packaged in combative language) while allowing the libel (packaged in Forum-y language) to go unchecked strikes me as exactly backwards, especially when Forum users can readily police the former (through voting), but cannot police the latter. Given the stakes of these kinds of posts for people’s lives, I really hope this situation prompts some kind of post-mortem on the evidentiary standards posts should be held to.
I really enjoyed this series; thanks for writing it!
One piece of stylistic feedback on Anti-Philanthropic Misdirection: I think the piece’s hostile tone (e.g., “Wenar is here promoting a general approach to practical reasoning that is very obviously biased, stupid, and harmful: a plain force for evil in the world”) will make it less persuasive to non-EA readers, for two reasons. First, I suspect all the italics and adjectives will trigger readers’ bias radars, making people who aren’t already sympathetic to EA approach the piece more critically and less open-mindedly than they would have otherwise (e.g., if you had written: “Wenar promotes a general approach to practical reasoning that is both incorrect and harmful”). Second, it reads as hypocritical, since in the piece you criticize “the hostile, dismissive tone of many critics.” (And unless readers have read Wenar’s piece pretty closely and are pretty familiar with EA, they won’t be well positioned to assess whose hostility and dismissiveness are justified.) So, while I understand the frustration, and think the tone is in some sense warranted, I suspect the piece would be more effective at morally redirecting people if it read as more neutral and measured. The arguments speak for themselves.