I’m glad you found my comment useful. With respect, though, I think you should consider retracting some of your previous comments, or at least reframing them to be more circumspect and to make clear that you’re taking issue with a particular framing/subset of the AIXR community rather than with EA as a whole.
As for the points in your comment, there’s a lot of good stuff here. I think a post about the NRRC, or even an insider’s view into how the US administration thinks about and handles nuclear risk, would be really useful content on the Forum, and also incredibly interesting! Similarly, I think a post about how a community handles making ‘right-tail recommendations’ when those recommendations may erode its collective and institutional legitimacy would be really valuable. (Not saying that you should write these posts, they’re just examples off the top of my head. In general I think you have a professional perspective a lot of EAs could benefit from.)
I think one thing where we agree is that there’s a need to ask and answer a lot more questions, some of which you mention here (beyond ‘is AIXR valid’):
What policy options do we have to counteract AIXR if true?
How does the effectiveness of these policy options change as our estimate of the risk changes?
What is the median view in the AIXR/broader EA/broader AI communities on risk?
And so on.
Some people in EA might write this off as ‘optics’, but I think that’s wrong.
I’m sorry you encountered this, and I don’t want to minimise your personal experience.
I think once any group becomes large enough there will be people associated with it who harbour all sorts of sentiments, including the ones you mention.
On the whole though, I’ve found the EA community (both online and those I’ve met in person) to be incredibly pro-LGBT and pro-trans. Both the underlying moral views (e.g. non-traditionalism, impartiality, cosmopolitanism) and the underlying demographics (e.g. young, highly educated, socially liberal) point that way.
I think where there might be a split is in progressive (as in, politically leftist) framings of issues and the type of language used to talk about these topics. I think those framings often find it difficult to gain purchase in EA, especially on the rationalist/LW-adjacent side. But I don’t think that means that the community as a whole, or even that sub-section, is ‘anti-LGBT’ or ‘anti-trans’, and I think there are historical and multifaceted reasons why there’s enmity between the ‘progressive’ and ‘EA’ camps/perspectives.
Nevertheless, I’m sorry that you experience this sentiment, and I hope you’re feeling ok.
Thanks for sharing your thinking, David!
For donations I defer to cause-neutral experts: I usually give via GiveWell, and split my giving between their top charities.
Have you considered donating to Rethink Priorities? I would say it is much more cause-neutral than GiveWell, which focuses only on global health and development.
This article is behind a paywall; do you have a summary that we can read?
The dormant period occurred between applying and getting referred for the position, and between getting referred and receiving an email for an interview. These periods were unexpectedly long and I wish there had been more communication or at least some statement regarding how long I should expect to wait. However, once I had the interview, I only had to wait a week (if I am remembering correctly) to learn if I was to be given a test task. After completing the test task, it was around another week before I learned I had performed competently enough to be hired.
Not Ofer but I think he laid it out pretty clearly:
The author mentioned they do not want the comments to be “a discussion of the war per se” and yet the post contains multiple contentious pro-Israel propaganda talking points, and includes arguments that a cease-fire is net-negative. Therefore it seems to me legitimate to mention here the following.
I feel similarly to Ofer—this post has many interesting personal reflections, which I’m glad the author shared. At the same time, there are several pro-Israel comments that feel similar to the rhetoric used to justify the killing of large numbers of civilians in Gaza (as a reminder for readers, roughly 17,000 Palestinians have been killed, with 70% of them being women or children under 18, relative to approx. 1,150 in Israel).
Some examples of these comments:
But now I also think much more about good and evil, and if stopping evil can justify many lives lost (if yes, how many? How do you even start to answer that?).
There’s at least one potential scenario that comes to mind in which protests end up being net negative in the long run. If global protests cause an early long term ceasefire, in the short term, fighting will stop, and lives will be saved. However, terror groups all over the world will learn that if they embed themselves within a civilian population, take hostages and use human shields, Western public opinion will protect them from a military response for even the most barbaric of attacks. In the long run, the chance of more frequent and more vicious attacks, and the use of human shields, will go up significantly, leading to even higher death tolls.
Without getting into it too much, the second comment seems to totally overlook the fact that Israel has been illegally encroaching on Palestinian land, forcing people out of their homes and restricting access to basic rights like food and water for the past few decades. In my view, it’s the allowance of this by the international community which has been net negative, and led to the ongoing occupation of Palestine and the war we currently have.
I agree that GHW is an excellent introduction to effectiveness, and we should watch out for the practical limitations of going too meta. But I want to flag that seeing GHW as a pipeline to animal welfare and longtermism is problematic, both from a common-sense/moral-uncertainty view (it feels deceitful, and that’s something to avoid for its own sake) and from a long-run strategic consequentialist view (I think the EA community would last longer and look better if it focused on being transparent, honest, and upfront about what most members care about, and it’s really important for the long-term future of society that the core EA principles don’t die).
Throwaway account named after an Islamic Sultan who took back Jerusalem from the Christians.
Hi again Jason,
When we said “Excluding outliers is thought sensible practice here; two related meta-analyses, Cuijpers et al., 2020c; Tong et al., 2023, used a similar approach”—I can see that what we meant by “similar approach” was unclear. We meant that, conditional on removing outliers, they identify a similar or greater range of effect sizes as outliers as we do.
This was primarily meant to address the question raised by Gregory about whether to include outliers: “The cut data by and large doesn’t look visually ‘outlying’ to me.”
To rephrase, I think that Cuijpers et al. and Tong et al. would agree that the data we cut looks outlying. Obviously, this is a milder claim than our comment could be interpreted as making.
Turning to the wider implications of these meta-analyses: as you rightly point out, they don’t have a “preferred specification” and are mostly presenting the options for doing the analysis. They present analyses with and without outlier removal in their main analysis, and they adjust for publication bias without outliers removed (which is not what we do). The first analytic choice doesn’t clearly support including or excluding outliers, and the second, if it supports any option, favors Greg’s proposed approach of correcting for publication bias without outliers removed.
I think one takeaway is that we should consider surveying the literature and some experts in the field, in a non-leading way, about what choices they’d make if they didn’t have “the luxury of not having to reach a conclusion”.
I think it seems plausible to give some weight to analyses with and without excluding outliers – if we can find a reasonable way to treat the 2 out of 7 publication bias correction methods that produce results suggesting the effect of psychotherapy is in fact sizably negative. We’ll look into this more before our next update.
Cutting the outliers here was part of our first-pass attempt at minimising the influence of dubious effects, which we’ll follow up with a Risk of Bias analysis in the next version. Our working assumption was that effects greater than ~2 standard deviations are suspect on theoretical grounds (that is, if they behave anything like SDs in a normal distribution), and seem more likely to be the result of some error-generating process (e.g. data-entry error, bias) than a genuine effect.
We’ll look into this more in our next pass, but for this version we felt outlier removal was the most sensible choice.
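To make the rule above concrete, here is a minimal sketch of the ~2 SD cutoff as a magnitude filter on standardized effect sizes. The effect sizes below are invented for illustration and are not data from the report.

```python
# Hypothetical illustration of the outlier rule described above:
# flag standardized effect sizes whose magnitude exceeds ~2 SDs, on the
# theory that such effects more likely reflect an error-generating process
# (data-entry error, bias) than a genuine treatment effect.
# These numbers are made up, not taken from the report.
effects = [0.35, 0.70, 1.10, 2.40, -0.15, 3.05]

THRESHOLD = 2.0
kept = [d for d in effects if abs(d) <= THRESHOLD]
flagged = [d for d in effects if abs(d) > THRESHOLD]

print(kept)     # [0.35, 0.7, 1.1, -0.15]
print(flagged)  # [2.4, 3.05]
```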
“Would it have been better to start with a stipulated prior based on evidence of short-course general-purpose psychotherapy’s effect size generally, update that prior based on the LMIC data, and then update that on charity-specific data?”
1. To your first point: I think adding another layer of priors is a plausible way to do things, but given that the effects of psychotherapy in general appear similar to the estimates we come up with, it’s not clear how much this would change our estimates.
There are probably two issues with using HIC RCTs as a prior. First, incentives that could bias results probably differ across countries. I’m not sure how this would pan out. Second, in HICs, the control group (“treatment as usual”) is probably a lot better off. In a HIC RCT, there’s not much you can do to stop someone in the control group of a psychotherapy trial from getting prescribed antidepressants. However, the standard of care in LMICs is much lower (antidepressants typically aren’t an option), so we shouldn’t be terribly surprised if control groups appear to do worse (and the treatment effect is thus larger).
“To my not-very-well-trained eyes, one hint to me that there’s an issue with application of Bayesian analysis here is the failure of the LMIC effect-size model to come anywhere close to predicting the effect size suggested by the SM-specific evidence.”
2. To your second point, does our model predict charity specific effects?
In general, I think it’s a fair test of a model to say it should do a reasonable job of predicting new observations. We can’t yet discuss the forthcoming StrongMinds RCT (we’ll know how well our model predicts that RCT when it’s released), but for the Friendship Bench (FB) situation, it is true that we predict a considerably lower effect for FB than the FB-specific evidence would suggest. This is in part because we use charity-specific evidence to inform both our prior and the data. Let me explain.
We have two sources of charity specific evidence. First, we have the RCTs, which are based on a charity programme but not as it’s deployed at scale. Second, we have monitoring and evaluation data, which can show how well the charity intervention is implemented in the real world. We don’t have a psychotherapy charity at present that has RCT evidence of the programme as it’s deployed in the real world. This matters because I think placing a very high weight on the charity-specific evidence would require that it has a high ecological validity. While the ecological validity of these RCTs is obviously higher than the average study, we still think it’s limited. I’ll explain our concern with FB.
For Friendship Bench, the most recent RCT (Haas et al. 2023, n = 516) reports an attendance rate of around 90% to psychotherapy sessions, but the Friendship Bench M&E data reports an attendance rate more like 30%. We discuss this in Section 8 of the report.
So in the Friendship Bench case we have a couple of reasonable-quality RCTs, but based on the M&E data it seems something is wrong with the implementation. This evidence of lower implementation quality should be adjusted for, which we do. But we include this adjustment in the prior, so we’re injecting charity-specific evidence into both the prior and the data. Note that this is part of the reason why we don’t think it’s wild to place a decent amount of weight on the prior. This is something we should probably clean up in a future version.
We can’t discuss the details of the Baird et al. RCT until it’s published, but we think there may be an analogous situation to Friendship Bench where the RCT and M&E data tell conflicting stories about implementation quality.
This is all to say, judging how well our predictions fare when predicting the charity-specific effects isn’t entirely straightforward, since we are trying to predict the effects of the charity as it is actually implemented (something we don’t directly observe), not simply the effects from an RCT.
If we try to predict the RCT effects for Friendship Bench (which have much higher attendance than the “real” programme), then the gap between the predicted RCT effects and actual RCT effects is much smaller, but it still suggests that we can’t completely explain why the Friendship Bench RCTs find their large effects.
So, we think the error in our prediction isn’t quite as bad as it seems if we’re predicting the RCTs, and stems in large part from the fact that we are actually predicting the charity implementation.
Cuijpers et al. 2023 finds an effect of psychotherapy of 0.49 SDs for studies with low RoB in low-, middle-, and high-income countries (comparisons = 218), and Tong et al. 2023 find an effect of 0.69 SDs for studies with low RoB in non-western countries (primarily low and middle income; comparisons = 36). Our estimate of the initial effect is 0.70 SDs (before publication bias adjustments).

The results tend to be lower (between 0.27 and 0.57 SDs, or 0.42 and 0.60 SDs) when the authors of the meta-analyses correct for publication bias. In both meta-analyses (Tong et al. and Cuijpers et al.) the authors present the effects after applying three publication bias correction methods: trim-and-fill (0.6; 0.38 SDs), a limit meta-analysis (0.42; 0.28 SDs), and a selection model (0.49; 0.57 SDs). If we averaged their publication-bias-corrected results (which they obtained without removing outliers beforehand), the estimated effects of psychotherapy would be 0.5 and 0.41 SDs for the two meta-analyses. Our estimate of the initial effect (which is most comparable to these meta-analyses), after removing outliers, is 0.70 SDs, and our publication bias correction is 36%, implying a corrected effect of 0.46 SDs. You can play around with the data they use on the metapsy website.
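For concreteness, the averaging step mentioned above is just the mean of the three corrected estimates from each meta-analysis. The values are as quoted in the text; which set belongs to Tong et al. versus Cuijpers et al. follows the order given there.

```python
# Mean of the three publication-bias-corrected estimates quoted above
# (trim-and-fill, limit meta-analysis, selection model), in SDs.
# The pairing with Tong et al. vs Cuijpers et al. follows the order
# in which the two values are quoted in the text.
first_listed = [0.60, 0.42, 0.49]   # trim-and-fill, limit, selection
second_listed = [0.38, 0.28, 0.57]

avg_first = sum(first_listed) / len(first_listed)     # ~0.50 SDs
avg_second = sum(second_listed) / len(second_listed)  # ~0.41 SDs

print(round(avg_first, 2), round(avg_second, 2))  # 0.5 0.41
```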
Thanks for the correction! I have fixed it and added a link (the link was in the main document, but it’s good to have it in the executive summary as well).
This link is dead; here’s an archived version I found!
I don’t think helping people who feel an obligation to give zakat do so in the most effective way possible would constitute “endorsing” the awarding of strong preference to members of one’s religion as recipients of charity. It merely recognizes that the donor has already made this precommitment, and we want their donation to be as effective as possible given that precommitment.
I personally think LLMs will plateau around human level, but that they will be made agentic and self-teaching, and therefore self-aware (in sum, “sapient”) and truly dangerous, by being scaffolded into language model agents or language model cognitive architectures. See Capabilities and alignment of LLM cognitive architectures for my logic in expecting this.
That would be a good outcome. We’d have agents with their own goals, capable enough to do useful and dangerous things, but probably not quite capable enough to self-exfiltrate, and probably initially under the control of relatively sane people. That would scare the pants off the world, and we’d see some real efforts to align the things. Which is uniquely doable, since they’d take top-level goals in natural language and be readily interpretable by default (with plenty of real concerns still there, including Waluigi effects and their utterances not reliably reflecting their underlying cognition).
Thanks! To be clear, this is a ‘plan’ instead of something we are 100% committed to delivering on in the way it’s presented below. I think there are some updates to be made here, but I would feel bad if you made large irreversible decisions based on this post. We will almost certainly have a more official announcement if we do decide to commit to this plan.
Ofer, I’m an Israeli and a leftist perhaps as much as you are. Perhaps not, since I think the war is a necessary evil (though at the same time think some of the acts taken by Israel in it are unnecessary and horrific). Point is, I wouldn’t be surprised to discover you’re right. But I don’t understand what this all has to do with anything in Ezra’s post.
This is very exciting. A key point in our draft strategy for 2024 was the apparent lack of principles-first EA funding (beyond CEA’s CBG programme). This is quite the update, I’m glad you posted it when you did!
Our (HLI’s) comment was in reference to these quotes:
The literature on PT in LMICs is a complete mess.
Trying to correct the results of a compromised literature is known to be a nightmare.
I think it is valid to describe these as saying the literature is compromised and (probably) uninformative. I can understand your complaint about the word “bunk”. Apologies to Gregory if this is a mischaracterization.
Regarding our comment:
If one insisted only on using charity evaluations that had every choice pre-registered, there would be none to choose from.
And your comment:
I don’t think anyone has claimed lack of certain choices being pre-registered is somehow fatal, only a factor to consider.
Yeah, I think this is a valid point, and the post should have quoted Gregory directly. The point we were hoping to make is that we’ve attempted to provide a wide range of sensitivity analyses throughout our report, to an extent we think goes beyond most charity evaluations, so it’s not surprising that we’ve missed some in this draft that others would like to see. Gregory’s comment that “Even if you didn’t pre-specify, presenting your first cut as the primary analysis helps for nothing up my sleeve reasons” seemed to us to imply that we were deliberately hiding something, but in retrospect our interpretation was overly pessimistic.
Cheers for keeping the discourse civil.
“Resistance Raid” is a bizarre framing of deliberately targeting and slaughtering defenceless women and children in their homes with the deliberate goal of mass terror.
This is not what Hamas’ plan was. It was a hostage-taking raid for a hostage exchange, and also meant to provoke a response from the Muslim world and put Palestine back on the map, since your ilk would just want to commit a slow genocide while ignoring it. This was all extremely clear, as Scott Ritter points out, and Hamas literally spelled out their plans in documents like Jericho Wall. https://www.scottritterextra.com/p/the-october-7-hamas-assault-on-israel
I don’t believe this is true, given the contentious posts I’ve seen here over the years. I presume you have evidence of someone who is Palestinian and identifies as an EA who was perma-banned for writing from the Palestinian side? (i.e. not a political bot, but someone who is actually part of the community.) Because I’d be just as interested in reading that as I was in reading this piece. And I wouldn’t be pitting the two against each other, but extending empathy to both authors as fellow human beings.
I’ve been banned from multiple rationalist communities for pointing this out (from these alt accounts). Maybe not EA yet, but same type of people.

> I had to do a double-take and am now only rereading this part after writing my response. You actually believe Israel deliberately perpetuated part of the Oct 7 raid? I’m at a complete loss for words...
There was friendly fire which caused many civilian deaths, possibly the majority of them. Please do some basic research: there are multiple lines of evidence, the overall picture is extremely clear, and the destruction could only have been caused by the IDF. They didn’t do it on purpose; they had a panicked response and acted similarly to the Hannibal Directive. There are now even reports from hostages about how they were being fired at. Does truth matter at all to this community?

Incredible how the Palestinians’ crimes are so exaggerated, while all of the unending horrors from the Zionist side are either downplayed or ignored. There are thousands of hostages still held by Israel while they bomb innocents, steal land, strip Palestinians of their basic rights, etc. Even if Oct 7th had been a pure “kill civilians” terrorist attack it wouldn’t come close to what the Zionists do constantly, yet even though it wasn’t, it still gets portrayed that way for propaganda purposes. This community is extremely biased due to Western propaganda and especially Jewish overrepresentation, but imagine how this looks from the Muslim side, or to anyone with basic human decency willing to check both sides of the story.