Doctor from NZ, independent researcher (grand futures / macrostrategy) collaborating with FHI / Anders Sandberg. Previously: Global Health & Development research @ Rethink Priorities.
Feel free to reach out if you think there’s anything I can do to help you or your work, or if you have any Qs about Rethink Priorities! If you’re a medical student / junior doctor reconsidering your clinical future, or if you’re quite new to EA / feel uncertain about how you fit in the EA space, have an especially low bar for reaching out.
Outside of EA, I do a bit of end-of-life care research and climate change advocacy, and outside of work I enjoy some casual basketball, board games and good indie films. (Very) washed-up classical violinist and Oly-lifter.
All comments in personal capacity unless otherwise stated.
bruce
Just also want to emphasise Lizka’s role in organising and spearheading this, as well as her conscientiousness and clear communication at every step of the process—I’ve enjoyed being part of this, and am personally super grateful for all the work she has put into this contest.
This sounds like a terribly traumatic experience. I’m so sorry you went through this, and I hope you are in a better place and feel safer now.
Your self-worth is so, so much more than how well you can navigate what sounds like a manipulative, controlling, and abusive work environment.
spent months trying to figure out how to empathize with Kat and Emerson, how they’re able to do what they’ve done, to Alice, to others they claimed to care a lot about. How they can give so much love and support with one hand and say things that even if I’d try to model “what’s the worst possible thing someone could say”, I’d be surprised how far off my predictions would be.
It sounds like despite all of this, you’ve tried to be charitable to people who have treated you unfairly and poorly—while this speaks to your compassion, I know this line of thought can often lead to things that feel like you are gaslighting yourself, and I hope this isn’t something that has caused you too much distress.
I also hope that Effective Altruism as a community becomes a safer space for people who join it aspiring to do good, and I’m grateful for your courage in sharing your experiences, despite it (very reasonably!) feeling painful and unsafe for you.[1] All the best for whatever is next, and I hope you have access to enough support around you to help with recovering what you’ve lost.
============
[Meta: I’m aware that there will likely be claims around the accuracy of these stories, but I think it’s important to acknowledge the potential difficulty of sharing experiences of this nature with a community that rates itself highly on truth-seeking, possibly acknowledging your own lived experience as “stories” accordingly; as well as the potential anguish it might be for these experiences to have been re-lived over the past year and possibly again in the near future, if/when these claims are dissected, questioned, and contested.]
- ^
That being said, your experience would be no less valid had you chosen not to share these. And even though I’m cautiously optimistic that the EA community will benefit from you sharing these experiences, your work here is supererogatory, and improving Nonlinear’s practices or the EA community’s safety is not your burden to bear alone. In a different world it would have been totally reasonable for you to not have shared this, if that was what you needed to do for your own wellbeing. I guess this comment is more for past Chloes or other people with similar experiences who may have struggled with these kinds of decisions than it is for Chloe today, but thought it was worth mentioning.
Thanks for writing this post!
I feel a little bad linking to a comment I wrote, but the thread is relevant to this post, so I’m sharing in case it’s useful for other readers, though there’s definitely a decent amount of overlap here.
TL;DR
I personally default to being highly skeptical of any mental health intervention that claims to have a ~95% success rate + a PHQ-9 reduction of 12 points over 12 weeks, as this is a clear outlier among treatments for depression. The effectiveness figures from StrongMinds are also based on studies that are non-randomised and poorly controlled. There are other questionable methodology issues, e.g. surrounding adjusting for social desirability bias. The topline figure of $170 per head for cost-effectiveness is also possibly an underestimate, because while ~48% of clients were treated through SM partners in 2021, and Q2 results (pg 2) suggest StrongMinds is on track for ~79% of clients treated through partners in 2022, the expenses and operating costs of partners responsible for these clients were not included in the methodology.
(This mainly came from a cursory review of StrongMinds documents, and not from examining HLI analyses, though I do think “we’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money” seems a little overconfident. This is also not a comment on the appropriateness of recommendations by GWWC / FP)
(commenting in personal capacity etc)
Edit:
Links to existing discussion on SM. Much of this ends up touching on discussions around HLI’s methodology / analyses as opposed to the strength of evidence in support of StrongMinds, but I’m including these as they are ultimately relevant for the topline conclusion about StrongMinds (inclusion =/= endorsement etc):
StrongMinds should not be a top-rated charity (yet)
Comments (1, 2) about outsider perception of HLI as an advocacy org
Comment about the ideal role of an org like HLI, as well as trying to decouple the effectiveness of StrongMinds from whether or not WELLBYs / subjective wellbeing scores are valuable or worth more research on the margin.
Twitter exchange between Berk Özler and Johannes Haushofer, particularly relevant given Özler’s role in an upcoming RCT of StrongMinds in Uganda (though only targeted towards adolescent girls)
Evaluating StrongMinds: how strong is the evidence? and the comment section. In particular:
James Snowden’s analysis of household spillovers
GiveWell’s Assessment of Happier Lives Institute’s Cost-Effectiveness Analysis of StrongMinds
Comments in the post: The Happier Lives Institute is funding constrained and needs you!
Greg claims “study registration reduces expected effect size by a factor of 3”
Topline finding weighted 13% from StrongMinds RCT, where d = 1.72
“this is a very surprising mistake for a diligent and impartial evaluator to make”
Greg commits to: “donat[ing] 5k USD if the Ozler RCT reports an effect size greater than d = 0.4 − 2x smaller than HLI’s estimate of ~ 0.8, and below the bottom 0.1% of their monte carlo runs.”
Comment thread on discussion being harsh and “epistemic probation”
James and Alex push back on some claims they consider to be misleading.
If this comment is more about “how could this have been foreseen”, then this comment thread may be relevant. I should note that hindsight bias means it’s much easier to look back and assess problems as obvious and predictable ex post, especially when powerful investment firms and individuals who had skin in the game also missed this.
TL;DR:
1) There were entries that were relevant (this one also touches on it briefly)
2) They were specifically mentioned
3) There were comments relevant to this. (notably one of these was apparently deleted because it received a lot of downvotes when initially posted)
4) There have been at least two other posts on the forum prior to the contest that engaged with this specifically
My tentative take is that these issues were in fact identified by various members of the community, but there isn’t a good way of turning identified issues into constructive actions—the status quo is that we just have to trust that organisations have good systems in place for this, and that EA leaders are sufficiently careful and willing to make changes or consider them seriously, such that all the community needs to do is “raise the issue”. And I think looking at the systems within the relevant EA orgs or leadership is what investigations or accountability questions going forward should focus on—all individuals are fallible, and we should be looking at how we can build systems such that the community doesn’t have to just trust that the people who have power and who are steering the EA movement will get it right, and such that there are ways for the community to hold them accountable to their ideals or stated goals if this appears not to be playing out in practice, or risks not doing so.
i.e. if there are good processes and systems in place and documentation of these processes and decisions, it’s more acceptable (because other organisations that presumably have very good due diligence processes also missed it). But if there weren’t good processes, or if these weren’t careful + intentional decisions, then that’s comparatively more concerning, especially in the context of specific criticisms that have been raised,[1] or previous precedent. For example, I’d be especially curious about the events surrounding Ben Delo,[2] and the processes that were implemented in response. I’d be curious about whether there are people in EA orgs involved in steering who keep track of potential risks and early warning signs to the EA movement, in the same way the EA community advocates for in the case of pandemics, AI, or even general ways of finding opportunities for impact. For example, SBF, who is listed as an EtG success story on 80,000 Hours, has publicly stated he’s willing to go 5x over the Kelly bet, and described yield farming in a way that Matt Levine interpreted as a Ponzi. Again, I’m personally less interested in the object level decision (e.g. whether or not we take SBF’s Kelly bet comments as serious, or whether Levine’s interpretation was appropriate), but more in what the process was, how this was considered at the time with the information they had, etc. I’d also be curious about the documentation of any SBF-related concerns that were raised by the community, if any, and how these concerns were managed and considered (as opposed to critiquing the final outcome).
Outside of due diligence and ways to facilitate whistleblowers, decision-making processes around the steering of the EA movement are crucial as well. When decisions are made by orgs that bring clear benefits to one part of the EA community while bringing clear risks that are shared across wider parts of the EA community,[3] it would probably be of value to look at how these decisions were made and what tradeoffs were considered at the time of the decision. Going forward, it would be worth thinking about how to either diversify those risks, or make decision-making more inclusive of a wider range of stakeholders,[4] keeping in mind the best interests of the EA movement as a whole.
(this is something I’m considering working on in a personal capacity along with the OP of this post, as well as some others—details to come, but feel free to DM me if you have any thoughts on this. It appears that CEA is also already considering this)
If this comment is about “are these red-teaming contests in fact valuable for the money and time put into it, if it misses problems like this”
I think my view here (speaking only for the red-teaming contest) is that even if this specific contest was framed in a way that it missed these classes of issues, the value of the very top submissions[5] may still have made the efforts worthwhile. The potential value of a different framing was mentioned by another panelist. If it’s the case that red-teaming contests are systematically missing this class of issues regardless of framing, then I agree that would be pretty useful to know, but I don’t have a good sense of how we would try to investigate this.
- ^
This tweet seems to have aged particularly well. Despite supportive comments from high-profile EAs on the original forum post, the author seemed disappointed that nothing came of it in that direction. Again, without getting into the object level discussion of the claims of the original paper, it’s still worth asking questions around the processes. If there were actions planned, what did these look like? If not, was that because of a disagreement over the suggested changes, or over the extent to which it was an issue at all? How were these decisions made, and what was considered?
- ^
Apparently a previous EA-aligned billionaire ?donor who got rich by starting a crypto trading firm, and who pleaded guilty to violating the Bank Secrecy Act
- ^
Even before this, I had heard from a primary source in a major mainstream global health organisation that there were staff who wanted to distance themselves from EA because of misunderstandings around longtermism.
- ^
This doesn’t have to be a lengthy deliberative consensus-building project, but it should at least include internal comms across different EA stakeholders to allow discussions of risks and potential mitigation strategies.
- ^
I don’t follow Timnit closely, but I’m fairly unconvinced by much of what I think you’re referring to RE: “Timnit Gebru-style yelling / sneering”, and I don’t want to give the impression that my uncertainties are strongly influenced by this, or by AI-safety community pushback to those kinds of sneering. I’d be hesitant to agree that I share these views that you are attributing to me, since I don’t really know what you are referring to RE: folks who share “the same skepticism” (but some more negative version).
When I talk about uncertainty, some of these are really just the things that Nuno is pointing out in this post. Concrete examples of what some of these uncertainties look like in practice for me personally include:
I don’t have a good inside view on timelines, but when EY says our probability of survival is ~0% this seems like an extraordinary claim that doesn’t seem to be very well supported or argued for, and something I intuitively want to reject outright, but don’t have the object level expertise to meaningfully do so. I don’t know the extent to which EY’s views are representative or highly influential in current AI safety efforts, and I can imagine a world where there’s too much deferring going on. It seems like some within the community have similar thoughts.
When Metaculus suggests a median prediction of 2041 for AGI and a lower quartile prediction of 2030, I don’t feel like I have a good way of working out how much I should defer to this.
I was surprised to find that there seems to be a pretty strong correlation between people who work on AI safety and people who have been involved in either the EA/LW community, and there are many people outside of the EA/LW space who have a much lower P(doom) than those inside these spaces, even those who prima facie have strong incentives to have an accurate gauge of what P(doom) actually is.
To use a highly imperfect analogy, if being an EA doctor was highly correlated with a belief that [disease X] is made up, and if a significant majority of mainstream medicine believes [disease X] is a real phenomenon, this would make me more uncertain about whether EAs are right, as I have to weigh the likelihood that a significant majority of mainstream medicine is wrong against the possibility that EA doctors somehow have a way of interpreting noise / information in a much less biased way than non-EA doctors.
Of course, it’s plausible that EAs are on to something legit, and everyone else is underweighting the risks. But all I’m saying is that this is an added uncertainty.
It doesn’t help that, unlike COVID (another area where EAs say they were ahead of the curve), it’s much easier to retreat into some unfalsifiable position when it comes to P(doom) and AI safety, and that it’s also (very reasonably) not seen as helpful to point to base rates.
While I wouldn’t suggest that 80,000 hours has “gone off the guardrails” in the way Nuno suggests CFAR did, on particularly uncharitable days I do get the sense that 80,000 hours feels more like a recruitment platform for AI and longtermist careers for a narrow readership that fit a particular philosophical view, which was not what it felt like back in ~2015 when I first read it (at that time, perhaps a more open, worldview-diversified career guide for a wider audience). Perhaps this reflects a deliberate shift in strategy, but I do get the sense that if this is the case, additional transparency about this shift would be helpful, since the target audience are often fairly impressionable, idealistic high school students or undergrads looking for a highly impactful career.
Another reason I’m slightly worried about this, and related to the earlier point about deferral, is that Ben Todd says: “For instance, I agree it’s really important for EA to attract people who are very open minded and curious, to keep EA alive as a question. And one way to do that is to broadcast ideas that aren’t widely accepted.” [emphasis added]
But my view is that this approach does not meaningfully differentiate between “attracting open-minded and curious people” VS “attracting highly deferring + easily influenced people”, and 80,000 hours may inadvertently contribute to the latter if not careful.
Relatedly, there have been some recent discussions around deference on things like AI timelines, as well as in the EA community more generally.
------------
I don’t want this comment thread to get into a discussion about Timnit, because I think that detracts from engagement with Nuno’s post, but a quick comment:[1]
Regardless of one’s views on the quality of Timnit-style engagement, it’s useful to consider the extent to which the AI safety community may need to coordinate with people who might be influenced by that category of “skepticism”. This could either be at the object level, or in terms of considering why some classes of “skepticism” seem to gain so much traction despite AI safety proponents’ disagreements with them. It might point to better ways for the AI safety community to communicate its ideas, or to help maximise buy-in from key stakeholders and focus on shared goals. (Thinking about some of these things has been directly influential in how some folks view ongoing UN engagement processes, for example.)
- ^
Which should not be read as an endorsement of Timnit-style engagement.
To add sources for some recent examples that come to mind that broadly support MHR’s point above RE: visible (ex post) failures that don’t seem to have been harshly punished (responses mostly seem somewhere between neutral and supportive, at least publicly):
Lightcone
Alvea
ALERT
AI Safety Support
EA hub
No Lean Season
Some failures that came with a larger proportion of critical feedback probably include the Carrick Flynn campaign (1, 2, 3), but even here “harshly punish” seems like an overstatement. HLI also comes to mind (and despite highly critical commentary in earlier posts, I think the highly positive response to this specific post is telling).
============
On the extent to which Nonlinear’s failures relate to integrity / engineering, I think I’m sympathetic to both Rob’s view:
I think the failures that seem like the biggest deal to me (Nonlinear threatening people and trying to shut down criticism and frighten people) genuinely are matters of character and lack of integrity, not matters of bad engineering.
As well as Holly’s:
If you wouldn’t have looked at it before it imploded and thought the engineering was bad, I think that’s the biggest thing that needs to change. I’m concerned that people still think that if you have good enough character (or are smart enough, etc), you don’t need good boundaries and systems.
but do not think these are necessarily mutually exclusive.
Specifically, it sounds like Rob is mainly thinking about the source of the concerns, and Holly is thinking about what to do going forwards. And it might be the case that the most helpful actionable steps going forward are things that look more like improving boundaries and systems, regardless of whether you believe the failures specific to Nonlinear are caused by deficiencies in integrity or engineering.
That said, I agree with Rob’s point that the most significant allegations raised about Nonlinear quite clearly do not fit the category of ‘appropriate experimentation that the community would approve of’, under almost all reasonable perspectives.
Thanks for the apology Julia.
I’m mindful that there’s an external investigation that is ongoing at present, but I had a few questions that I think would provide useful transparency for the EA community, even if answering them may be detrimental to CEA / the community health team. I’m sorry if this comes across as piling on in what I’m sure is a very stressful time for you and the team, and I want to emphasise and echo Kirsten’s comment above about this ultimately being a “lack of adequate systems” issue, and not a responsibility that should be fully borne by you as an individual.
Shortly after the article came out, Julia Wise (CEA’s community liaison) informed the EV UK board that this concerned behaviour of Owen Cotton-Barratt; the incident occurred more than 5 years ago and was reported to her in 2021. (Owen became a board member in 2020.)
From the EV UK board’s statement, it sounds like the board did not know about this until Feb this year. Can you clarify the extent to which not informing the EV UK board was a result of the victim explicitly requesting something along these lines, and if so, whether you spoke to the victim before informing the EV UK board when the article came out?
What actions did you take to reduce the risks associated with these events (whether to individual / potential victims in the EA community, to CEA, or the EA movement more broadly)? It sounds like the actions consisted of the following, but I wanted to make sure I wasn’t missing anything important:[1]
Conversations with Owen Cotton-Barratt (OCB) and his colleagues
Some clarity here would be useful also—what’s the role of OCB’s colleagues here? Were they complicit, or was this for harm-mitigating reasons?
A written post about power dynamics
An update to Nicole when she became your manager in 2021
Are you also happy to comment on whether your CoI with OCB was disclosed with Nicole when you informed her of this situation, or with anyone else in the CH team at any stage? What details did you share with Nicole in 2021, when she became your manager?
Given OCB’s status and position in the community, the seemingly direct access to potential victims via mentoring / “picking out promising students and funneling them towards highly coveted jobs” / his role as Director for FHI’s Research Scholars Programme, and your COIs with him (both from a friendship and EV / CEA organisational perspective), this seems to clearly tick many important boxes of where I’d expect to err on the side of full disclosure. Were there extenuating circumstances at the time that meant you didn’t feel comfortable sharing more than you did?
Did the complaints from the woman in the Time article come before or after other feedback you heard about OCB? The timeline sounds something like:
TIME magazine case, reported to you in 2021
Learnt about other situations (in the cases not from OCB, were these as a result of your investigation, or spontaneous reports by other community members?)
OCB raised concerns to you that he had made another woman uncomfortable—reported a few months ago.
Accordingly, I also just want to flag this set of questions as important; it has been raised in the past as a potential cause of insufficient action. When the TIME article came out, you clarified that one cause for confusion was that this consideration didn’t apply to sexual assault but to things like “someone has made some inappropriate comments and gotten feedback about it”. To what extent do you think these considerations played a role in the decisions you made around managing risk?
You mentioned that you had been “taking a renewed look at possible steps to take here”. When did this start? I’m mainly interested in clarifying whether this was something ongoing, (e.g., prompted by finding out about other situations or hearing from OCB himself about making another woman uncomfortable a few months ago), or was this prompted by knowledge of the existence (or possible existence) of the TIME article.
(commenting in personal capacity etc)
- ^
For example:
-notifying the EV board
-a discussion with other CH colleagues around reducing his exposure to possible victims or level of risk, given his role as Director for FHI’s Research Scholars Programme, such as considering a temporary ban from EAGs (it also seems like shared responsibility for that decision would be appropriate, and not a burden that should fall solely on your shoulders)
Looks fun! Thanks for this. Curious about EA forum alignment methodology!
(also happy new year to the team, thanks for all your work on the forum!)
As requested, here are some submissions that I think are worth highlighting, or considered awarding but ultimately did not make the final cut. (This list is non-exhaustive, and should be taken more lightly than the Honorable mentions, because by definition these posts are less strongly endorsed by those who judged it. Also commenting in personal capacity, not on behalf of other panelists, etc):
Bad Omens in Current Community Building
I think this was a good-faith description of some potential / existing issues that are important for community builders and the EA community, written by someone who “did not become an EA” but chose to go to the effort of providing feedback with the intention of benefitting the EA community. While these problems are difficult to quantify, they seem important if true, and pretty plausible based on my personal priors/limited experience. At the very least, this starts important conversations about how to approach community building that I hope will lead to positive changes, and a community that continues to strongly value truth-seeking and epistemic humility, which is personally one of the benefits I’ve valued most from engaging in the EA community.
Seven Questions for Existential Risk Studies
It’s possible that the length and academic tone of this piece detracts from the reach it could have, and it (perhaps aptly) leaves me with more questions than answers, but I think the questions are important to reckon with, and this piece covers a lot of (important) ground. To quote a fellow (more eloquent) panelist, whose views I endorse: “Clearly written in good faith, and consistently even-handed and fair—almost to a fault. Very good analysis of epistemic dynamics in EA.” On the other hand, this is likely less useful to those who are already very familiar with the ERS space.
Most problems fall within a 100x tractability range (under certain assumptions)
I was skeptical when I read this headline, and while I’m not yet convinced that 100x tractability range should be used as a general heuristic when thinking about tractability, I certainly updated in this direction, and I think this is a valuable post that may help guide cause prioritisation efforts.
The Effective Altruism movement is not above conflicts of interest
I was unsure about including this post, but I think this post highlights an important risk of the EA community receiving a significant share of its funding from a few sources, both for internal community epistemics/culture considerations as well as for external-facing and movement-building considerations. I don’t agree with all of the object-level claims, but I think these issues are important to highlight and plausibly relevant outside of the specific case of SBF / crypto. That it wasn’t already on the forum (afaict) also contributed to its inclusion here.
I’ll also highlight one post that was awarded a prize, but I thought was particularly valuable:
Red Teaming CEA’s Community Building Work
I think this is particularly valuable because of the unique and difficult-to-replace position that CEA holds in the EA community, and as Max acknowledges, it benefits the EA community for important public organisations to be held accountable (and to a standard that is appropriate for their role and potential influence). Thus, even if listed problems aren’t all fully on the mark, or are less relevant today than when the mistakes happened, a thorough analysis of these mistakes and an attempt at providing reasonable suggestions at least provides a baseline to which CEA can be held accountable for similar future mistakes, or help with assessing trends and patterns over time. I would personally be happy to see something like this on at least a semi-regular basis (though am unsure about exactly what time-frame would be most appropriate). On the other hand, it’s important to acknowledge that this analysis is possible in large part because of CEA’s commitment to transparency.
Hey Ollie! Hope you’re well.
I think there’s a tricky trade-off between clarity and scope here....if we state guidelines that are very specific (e.g. a list of things you mustn’t do in specific contexts), we might fail to prevent harmful behaviour that isn’t on the list.
I want to gently push back on this a bit—I don’t think this is necessarily a tradeoff. It’s not clear to me that the guidelines have to be “all-inclusive or nothing”. As an example, just because the guidelines say you can’t use the swapcard app for dating purposes, it would be pretty unreasonable for people to interpret that as “oh, the guidelines don’t say I can’t use the swapcard app to scam people, that must mean this is endorsed by CEA”.
And even if it’s the case that the current guidelines don’t explicitly comment against using swapcard to scam other attendees, and this contributes to some degree of “failing to prevent harmful behaviour that isn’t on the list”, that seems like a bad reason to choose to not state “don’t use swapcard for sexual purposes”.
RE: guidelines that include helpful examples, here’s one that I found from 10 seconds of googling.
First it defines harassment and sexual harassment fairly broadly. Of course, what exactly counts as “reasonably be expected or be perceived to cause offence or humiliation” can differ between people, but this is a marginal improvement compared to current EAG guidelines that simply state “unwanted sexual attention or sexual harassment”.
It then gives a non-exhaustive list of fairly uncontroversial actions for its context—CEA can adopt its own standard! But I think it’s fair to say that just because this list doesn’t cover every possibility it doesn’t necessarily mean the list is not worth including.
Notably, it also outlines a complaint process and details possible actions that may reasonably occur in response to a complaint.
As I responded to Julia’s comment that you linked, I think these lists can be helpful because most reported cases are likely not from people intentionally wishing to cause harm, but from differences in norms or communication or expectations around what might be considered harmful. Having an explicit list of actions helps get around these differences by being more precise about actions that are likely to be considered net negative in expectation. If it’s the case that there are a lot of examples that are in a grey area, then this may be an argument to exclude those examples, but it isn’t really an argument against having a list that contains less ambiguous examples.
Ditto RE: different settings—this is an argument to have narrower scope for the guidelines, and to not write a single guideline that is intended to cover both the career fair and the afterparty, but not an argument against expressing what’s unacceptable under one specific setting (especially when that setting is something as crucial as “EAG conference time”)
Lastly, RE: “Responses should be shaped by the wishes of the person who experienced the problem”—of course they should be! But a list of possible actions that might be taken can be helpful without committing the team to a set response, and the inclusion of potential actions is still reassuring and helps people know what is possible.
Again, this was just the first link I clicked, I don’t think it’s perfect, but I think there are multiple aspects of this that CEA could use to help with further iterations of its guidelines.
Another challenge is that CEA is the host of some events but not the host of some others associated with the conferences. We can’t force an afterparty host or a bar manager to agree to follow our guidelines though we sometimes collaborate on setting norms or encourage certain practices.
I think it’s fine to start from CEA’s circle of influence and have good guidelines + norms for CEA events—if things go well this may incentivise other organisers to adopt these practices (or perhaps they won’t adopt them, because the context is sufficiently different, which is fine too!) But even if other organisers don’t adopt better guidelines, this doesn’t seem like a particularly strong argument against adopting clearer guidelines for CEA events. The UNFCCC presumably aren’t using “oh, we can’t control what happens in UN Youth events globally, and we can’t force them to agree to follow our guidelines” as an excuse to not have guidelines. But because they have their own guidelines, and many UN Youth events try to emulate what the UN event proper looks like, they will (at least try to) adopt a similar level of formality.
One last reason to err on the side of more precise guidelines echoes point 3 in what lilly shared above—if guidelines are vague and more open to interpretation by the Community Health team, this requires a higher level of trust in the CH team’s track record and decision-making and management of CoIs, etc. To whatever extent recent events may reflect actual gaps in this process or even just a change in the perception here, erring on the side of clearer guidelines can help with accountability and trust building.
Many thanks for doing this AMA!
I’m personally excited about more work in the EA space on topics around mental health and subjective well-being, and was initially excited to see StrongMinds (SM) come so strongly recommended. I do have a few Qs about the incredible success the pilots have shown so far:[1]
I couldn’t find number needed to treat (NNT)[2] figures anywhere (please let me know if I’ve missed this!), so I’ve had a rough go based on the published results, and came to an NNT of around 1.35.[3] Limitations of the research aside, this suggests StrongMinds is among the most effective interventions in all of medicine in terms of achieving its stated goals.
If later RCTs and replications showed much higher NNT figures, what do you think would be the most likely reason for this? For comparison:
This meta-analysis suggests an NNT of 3 when comparing IPT to a control condition;
This systematic review suggests an NNT of 4 for interpersonal therapy (IPT) compared to treatment as usual[4];
This meta-analysis suggests a response rate of 41% and an NNT of 4 when comparing therapy to ‘waitlist’ conditions (and lower when only considering IPT in subgroup analyses); or
this meta-analysis, which suggests an NNT of 7 when comparing psychotherapies to a placebo pill.
Admittedly, there are many caveats here—the various linked studies aren’t a perfect comparison to SM’s work, NNT clearly shouldn’t be used as the sole basis for comparison between interventions, and I haven’t done enough work here to feel super confident about the quality of SM’s research. But my initial reaction upon skimming and seeing response to treatment in the range of 94-99%, or 100+ people with PHQ-9 scores of over 15 basically all dropping down to 1-4[5] (edit: an average improvement of 12 points after conclusion of therapy) after 12-16 weeks of group IPT by lay counsellors, was that this seemed far “too good to be true”, and fairly incongruent with ~everything I’ve learnt or anecdotally seen in clinical practice about the effectiveness of mental health treatments (though clearly I could be wrong!). This is especially surprising given SM dropped the group of participants with minimal or mild depression from the analysis.[6]
Were these concerns ever raised by the researchers when writing up the reports? Do you have any reason to believe that the Ugandan context or something about the SM methodology makes your intervention many times more effective than basically any other intervention for depression?
[Edit: I note that the 99% figure in the phase 2 trial was disregarded, but the 94% figure in phase 1 trial wasn’t, despite presumably the same methodology? Also curious about the separate analysis that came to 92%, which states: “Since this impact figure was collected at a regular IPT group meeting, as had been done bi-weekly throughout the 12- week intervention, it is unlikely that any bias influenced the figure.” I don’t quite understand how collection at a regular IPT group meeting makes bias unlikely—could you clarify this? Presumably participants knew in advance how many weeks the intervention would be?]
How did you come to the 10% figure when adjusting for social desirability bias?
Was there a reason an RCT couldn’t have been done as a pilot? Just noting that “informal control populations were established for both Phase 1 and Phase 2 patients, consisting of women who screened for depression but did not participate”, and the control group in both the pilots were only 36 people, compared to the 244 and 270 in the treatment arm for phase 1 and phase 2 respectively. As a result, 11 / 24 of the villages where the interventions took place did not have a control arm at all. (pg 9)
Are you happy to go into a bit more detail about the background of the lay counsellors? E.g. what background they had prior to the SM pilots, how much training (in number of hours) they receive, and who runs it (what relevant qualifications / background? How did the trainers get their IPT-G certification—e.g. is this a postgrad psychology qualification, or a one-off training course?) I briefly skimmed the text (appendix A + E) but also got a bit confused over the difference between “lay counsellor”, “mental health facilitator”, “mental health supervisor” and “senior technical advisor” and how they’re relevant for the intervention.
Can you give us a cost breakdown of the $170 / person figure for delivering the programme (or $134 for 2021)? See Joel’s response and subsequent discussion for more details. Specifically, whether the methodology for working out the cost / client (dividing SM’s total expenses by total clients reached) means that this includes the clients reached by the partners, but not their operating costs / expenses. For example, ~48% of clients were treated through partners in 2021, and Q2 results (pg 2) suggest StrongMinds is on track for ~79% of clients treated through partners in 2022.[7] Or are all expenses of SM partners covered by SM and included in the tax returns?
In the most recent publication (pg 5), published 2017, the report says: “Looking forward, StrongMinds will continue to strengthen our evaluation efforts and will continue to follow up with patients at 6 or 12 month intervals. We also remain committed to implementing a much more rigorous study, in the form of an externally-led, longitudinal randomized control trial, in the coming years.”
Have either the follow-up or the externally-led longitudinal RCT happened yet? If so, are the results shareable with the public? (I note that there has been a qualitative study done on a teletherapy version published in 2021, but no RCT.)
The pivot to teletherapy in light of COVID makes sense, though the evidence-base for its effectiveness is ?presumably weaker.
What’s the breakdown of % clients reached via teletherapy versus clients reached via group IPT as per the original pilots (i.e. in person)?
In the 2021 report on a qualitative assessment of teletherapy (pg 2), it says: “Data from StrongMinds shows that phone-based IPT-G is as effective as in-person group therapy in reducing depression symptoms among participants”. Is this research + methodology available to the public? (I searched for phone and telehealth in the other 4 reports which returned no hits)
Does StrongMinds have any other unpublished research?
What’s the plan with telehealth going forward? Was this a temporary thing for COVID, or is this a pivot into a more / similarly effective approach?
I also saw in the HLI report that SM defines treated patients for the purpose of cost analysis as “attending more than six sessions (out of 12) for face-to-face modes and more than four (out of 8) for teletherapy.”—is this also the definition used for the treatment outcomes? i.e. how did SM assess the effectiveness of SM for people who attended 7 sessions and then dropped out? Do we have more details about how many people didn’t do all the sessions, how they responded, and how this was incorporated into SM’s analyses?
Thanks again!
(Commenting in personal capacity etc)
[Edited after Joel’s response to include Q7, Q8, and an update to Q1c and Q5, mainly to put all the unresolved Qs in one place for Sean and other readers’ convenience.]
[Edited to add this disclaimer.]
[Edited to include a link to a newer post StrongMinds should not be a top-rated charity (yet), which includes additional discussion.]
- ^
Apologies in advance if I’ve missed anything—I’ve only briefly skimmed your website’s publications, and I haven’t engaged with this literature for quite a while now!
- ^
Quick primer on NNT for other readers. Lower = better, where NNT = 1 means your treatment gets the desired effect 100% of the time.
- ^
SM’s results of 95% depression-free (85% after the 10% adjustment for social desirability bias) give an EER of 0.15 after adjustment. By a more conservative estimate, based on this quote (pg 3): “A separate control group, which consisted of depressed women who received no treatment, experienced a reduction of depressive symptoms in only 11% of members over the same 12-week intervention period” and assuming all of those are clinically significant reductions in depressive symptoms, the CER is 0.89, which gives an NNT of 1 / (0.89 − 0.15) = 1.35. The EER can be adjusted upwards because not all who started in the treatment group were depressed, but this is only 2% and 6% for phase 1 and 2 respectively—so in any case the NNT is unlikely to go much higher than 1.5 even by the most conservative estimate.
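(For readers who want to sanity-check this arithmetic, here is a minimal sketch using only the figures quoted above; treating “still depressed at follow-up” as the event of interest, and the variable names, are my own framing rather than SM’s methodology.)

```python
# Rough NNT sanity check using the figures quoted above (not SM's own methodology).
treatment_depression_free = 0.85  # 95% "depression-free", minus the 10% social desirability adjustment
control_depression_free = 0.11    # reported improvement rate in the informal control group

# Treat "still depressed at follow-up" as the (undesirable) event.
eer = 1 - treatment_depression_free  # experimental event rate = 0.15
cer = 1 - control_depression_free    # control event rate = 0.89

arr = cer - eer   # absolute risk reduction ≈ 0.74
nnt = 1 / arr     # ≈ 1.35

print(f"ARR = {arr:.2f}, NNT = {nnt:.2f}")
```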
- ^
They also concluded: “We did not find convincing evidence supporting or refuting the effect of interpersonal psychotherapy or psychodynamic therapy compared with ‘treatment as usual’ for patients with major depressive disorder. The potential beneficial effect seems small and effects on major outcomes are unknown. Randomized trials with low risk of systematic errors and low risk of random errors are needed.”
- ^
See Appendix B, pg 30. for more context about what the PHQ-9 scoring is like.
- ^
As pointed out in the report (pg 9):
A total of 56 participants with Minimal or Mild Depression (anyone with total raw scores between 1-9) at baseline in both the treatment intervention (46 participants) and control (10 participants) groups were dropped from the GEE analysis of determining the depression reduction impact. In typical practice around the world, individuals with Minimal/Mild Depression are not considered for inclusion in group therapy because their depressive symptoms are relatively insignificant. StrongMinds consciously included these Minimal/Mild cases in Phase Two because these patients indicated suicidal thoughts in their PHQ-9 evaluation. However, their removal from the GEE analysis serves to ensure that the Impact Evaluation is not artificially inflated, since reducing the depressive symptoms of Minimal/Mild Depressive cases is generally easier to do.
- ^
% clients reached by partners:
20392 / 42482 in 2021
33148 / (33148+8823) in 2022
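(A minimal sketch of the arithmetic behind the ~48% and ~79% figures, using only the client counts quoted above; the 2022 share is projected from the Q2 partner and direct counts, and the variable names are my own.)

```python
# Share of SM clients treated through partner organisations, from the counts quoted above.
partners_2021, total_2021 = 20392, 42482
partners_2022, direct_2022 = 33148, 8823   # 2022 figures projected from Q2 results

share_2021 = partners_2021 / total_2021                      # ≈ 0.48
share_2022 = partners_2022 / (partners_2022 + direct_2022)   # ≈ 0.79

print(f"2021: {share_2021:.0%} of clients via partners")
print(f"2022 (projected): {share_2022:.0%} of clients via partners")
```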
[Edit: wrote this before I saw lilly’s comment, would recommend that as a similar message but ~3x shorter].
============
I would consider Greg’s comment as “brought up with force”, but would not consider it an “edge case criticism”. I also don’t think James / Alex’s comments are brought up particularly forcefully.
I do think it is worth making the case that pushing back on comments that are easily misinterpreted or misleading is also not an edge case criticism, especially when these are comments that directly benefit your organisation.
Given the stated goal of the EA community is “to find the best ways to help others, and put them into practice”, it seems especially important that strong claims are sufficiently well-supported, and made carefully + cautiously. This is in part because the EA community should reward research outputs when they are helpful for finding the best ways to do good, not solely because they are strongly worded; in part because EA donors who don’t have capacity to engage at the object level may be happy to defer to EA organisations/recommendations; and in part because the counterfactual impact of money diverted from an EA donor is likely higher than that of the average donor.
For example:
“We’re now in a position to confidently recommend StrongMinds as the most effective way we know of to help other people with your money”.[1]
Michael has expressed regret about this statement, so I won’t go further into this than I already have. However, there is a framing in that comment that suggests this is an exception, because “HLI is quite well-caveated elsewhere”, and I want to push back on this a little.
HLI has previously been mistaken for an advocacy organisation (1, 2). This isn’t HLI’s stated intention (which is closer to a “Happiness/Wellbeing GiveWell”). I outline why I think this is a reasonable misunderstanding here (including important disclaimers that outline HLI’s positives).
Despite claims that HLI does not advocate for any particular philosophical view, I think this is easily (and reasonably) misinterpreted.
James’ comment thread below: “Our focus on subjective wellbeing (SWB) was initially treated with a (understandable!) dose of scepticism. Since then, all the major actors in effective altruism’s global health and wellbeing space seem to have come around to it”
See alex’s comment below, where TLYCS is quoted to say: “we will continue to rely heavily on the research done by other terrific organizations in this space, such as GiveWell, Founders Pledge, Giving Green, Happier Lives Institute [...]”
I think excluding “to identify candidates for our recommendations, even as we also assess them using our own evaluation framework” [emphasis added] gives a fairly different impression to the actual quote, in terms of whether or not TLYCS supports WELLBYs as an approach.
While I wouldn’t want to exclude careless communication / miscommunication, I can understand why others might feel less optimistic about this, especially if they have engaged more deeply at the object level and found additional reasons to be skeptical.[2] I do feel like I subjectively have a lower bar for investigating strong claims by HLI than I did 7 or 8 months ago.
(commenting in personal capacity etc)
============
Adding a note RE: Nathan’s comment below about bad blood:
Just for the record, I don’t consider there to be any bad blood between me and any members of HLI. I previously flagged a comment I wrote with two HLI staff, worrying that it might be misinterpreted as uncharitable or unfair. Based on positive responses there and from other private discussions, my impression is that this is mutual.[3]
- ^
-This was the claim that originally prompted me to look more deeply into the StrongMinds studies. After <30 minutes on StrongMinds’ website, I stumbled across a few things that stood out as surprising, which prompted me to look deeper. I summarise some thoughts here (which has been edited to include a compilation of most of the critical relevant EA forum commentary I have come across on StrongMinds), and include more detail here.
-I remained fairly cautious about the claims I made, because the research process behind this recommendation reportedly took three years / 10,000 hours, so I assumed by default that I was missing information or that there was a reasonable explanation.
-However, after some discussions on the forum / in private DMs with HLI staff, I found it difficult to update meaningfully towards believing this statement was a sufficiently well-justified one. I think a fairly charitable interpretation would be something like “this claim was too strong, it is attributable to careless communication, but unintentional.”
- ^
Quotes above do not imply any particular views of commentors referenced.
- ^
I have not done this for this message, as I view it as largely a compilation of existing messages that may help provide more context.
TL;DR
I think an outsider may reasonably get the impression that HLI thinks its value is correlated with its ability to showcase the effectiveness of mental health charities, or of WELLBYs as an alternative metric for cause prioritisation. It might also be the case that HLI believes this, based on their published approach, which seems to assume that 1) happiness is what ultimately matters and 2) subjective wellbeing scores are the best way of measuring this. But I don’t personally think this is the case—I think the main value of an organisation like HLI is to help the GH research community work out the extent to which SWB scores are valuable in cause prioritisation, and how we best integrate these with existing measures (or indeed, replace them if appropriate). In a world where HLI works out that WELLBYs actually aren’t the best way of measuring SWB, or that actually we should weigh DALYs to SWB at a 1:5 ratio or a 4:1 ratio instead of replacing existing measures wholesale or disregarding them entirely, I’d still see these research conclusions as highly valuable (even if the money-shifted metric might not be similarly high). And I think these should be possibilities that HLI remains open to in its research and considers in its theory of change going forward—though this is based mainly on a truth-seeking / epistemics perspective, and not because I have deep enough knowledge of the SWB / happiness literature to have a well-formed view on this (though my sense is that it’s also not a settled question). I’m not suggesting that HLI is not already considering this or doing this, just that from reading the HLI website / published comments, it’s hard to clearly tell that this is the case (and I haven’t looked through the entire website, so I may have missed it).
======
Longer:
Some things that I think may support Elliot’s views here:
HLI was founded with the mission of finding something better than GiveWell top charities under a subjective wellbeing (SWB) method. That means it’s beneficial for HLI, in terms of achieving its phase 1 goal and mission, that StrongMinds is highly effective. GiveWell doesn’t have this pressure of finding something better than its current best charities (or not to the same degree).
HLI’s investigation of various mental health programmes led to its strong endorsement of StrongMinds. This was in part based on StrongMinds being the only organisation on HLI’s shortlist (of 13 orgs) to respond and engage with HLI’s request for information. Two potential scenarios for this:
HLI’s hypothesis that mental health charities are systematically undervalued is right, and thus it’s not necessarily that StrongMinds is uniquely good (acknowledged by HLI here), but that the very best mental health charities are all better than non-mental health charities under WELLBY measurements, which is HLI’s preferred approach RE: “how to do the most good”. However, this might bump up against priors or base rates or views around how good the mental health charities on HLI’s shortlist are likely to be compared with existing GiveWell charities, whether all of global health prioritisation, aid, or EA aid has been getting things wrong and is in need of a paradigm shift, and whether WELLBYs and SWB scores alone should be a sufficient metric for “doing the most good”.
Mental health charities are not systematically undervalued, and current aid / EA global health work isn’t in need of a huge paradigm shift, but StrongMinds is uniquely good, and HLI was fortunate that the one organisation that responded happened to be that uniquely good one. However, if an outsider’s priors on the effectiveness of good mental health interventions generally are much lower than HLI’s, it might seem like this result is very fortuitous for HLI’s mission and goals. On the other hand, there are some reasons to think responding and being effective might be at least somewhat correlated:
well-run organisations are more likely to have capacity to respond to outside requests for information
organisations with good numbers are more likely to share their numbers etc
HLI have never published any conclusions that are net harmful for WELLBYs or mental health interventions. Depending on how much an outsider thinks GiveWell is wrong here, they might expect GiveWell to be wrong in different directions, and not only in one direction. Some pushback: HLI is young, and would reasonably focus on organisations that are most likely to be successful and most likely to change GiveWell funding priorities. These results are also what you’d expect if GiveWell IS in fact wrong on how charities should be measured.
I think ultimately the combination could contribute to an outsider’s uncertainty around whether they can take HLI’s conclusions at face value, or whether they believe these are the result of an unbiased search optimising for truth-seeking, e.g. if they don’t know who HLI researchers are or don’t have any reason to trust them beyond what they see from HLI’s outputs.
Some important disclaimers:
-All of these discussions are made possible because of HLI (and SM)’s transparency, which should be acknowledged.
-It seems much harder to defend against claims that paint HLI as an “advocacy org” or suggest motivated reasoning etc than it is to make such claims. It’s also the case that these findings are consistent with what we would expect if the claims 1) “WELLBYs or subjective wellbeing score alone is the best metric for ‘doing the most good’” and 2) “Existing metrics systematically undervalue mental health charities” are true, and HLI is taking a dispassionate, unbiased view towards this. All I’m saying is that an outsider might prefer not to default to believing this.
-It’s hard to be in a position to be challenging the status quo, in a community where reputation is important, and the status quo is highly trusted. Ultimately, I think this kind of work is worth doing, and I’m happy to see this level of engagement and hope it continues in the future.
-Lastly, I don’t want this message (or any of my other messages) to be interpreted to be an attack on HLI itself. For example, I found HLI’s Deworming and decay: replicating GiveWell’s cost-effectiveness analysis to be very helpful and valuable. I personally am excited about more work on subjective wellbeing measures generally (though I’m less certain if I’d personally subscribe to HLI’s founding beliefs), and I think this is a valuable niche in the EA research ecosystem. I also think it’s easy for these conversations to accidentally become too adversarial, and it’s important to recognise that everyone here does share the same overarching goal of “how do we do good better”.
(commenting in personal capacity etc)
Thanks for this! I echo Lizka’s comment about linkposting.
In light of the recent events I’m struggling a bit with taking my hindsight-bias shades off, and while I scored it reasonably highly, I don’t think I can fairly engage with whether it should have received a prize over other entries even if I had the capacity to (let alone speak for other panelists). I do remember including it in the comment mainly because I thought it was a risk that didn’t receive enough attention and was worth highlighting (though I have a pretty limited understanding of the crypto space and had ~0 clue that things would happen in the way they did).
I think it’s worth noting that there has been at least one other post on the forum that engaged with this specifically, but unfortunately didn’t receive much attention. (Edit: another one here)
Ultimately though, I think it’s more important to think about what actionable and constructive steps the EA community can take going forward. I think there are a lot of unanswered questions wrt accountability from EA leaders in terms of due diligence, what was known or could have been suspected prior to Nov 9th this year, and what systems or checks/balances were in place, which need to be answered so the community can work out the best next steps to minimise the likelihood of something like this happening again. I also think there are questions around how these kinds of decisions are made when the benefits affect one part of the EA community but the risks are pertinent to all, and how to either diversify these risks or make decision-making more inclusive of more stakeholders, keeping in mind the best interests of the EA movement as a whole.
This is something I’m considering working on at the moment and will try and push for—do feel free to DM me if you have thoughts, ideas, or information.
(Commenting in personal capacity etc)
Both Kat and Emerson are claiming that there have been edits to this post.[1]
I wonder whether an appendix or summary of changes to important claims would be fair and appropriate, given the length of the post and the severity of the allegations? It’d help readers keep up with these changes, and it is likely most time-efficient for the author making the edits to document these as they go along.
[Edit: Kat has since retracted her statement.]
- ^
his original post (he’s quietly redacted a lot of points since publishing) had a lot of falsehoods that he knew were not true. He has since removed some of them after the fact, but those have still been causing us damage. Ben has also been quietly fixing errors in the post, which I appreciate, but people are going around right now attacking us for things that Ben got wrong, because how would they know he quietly changed the post?
This is why every time newspapers get caught making a mistake they issue a public retraction the next day to let everyone know. I believe Ben should make these retractions more visible
Thanks for this! Useful to get some insight into the FP thought process here.
The effect sizes observed are very large, but it’s important to place in the context of StrongMinds’ work with severely traumatized populations. Incoming PHQ-9 scores are very, very high, so I think … 2) I’m not sure that our general priors about the low effectiveness of therapeutic interventions are likely to be well-calibrated here.
(emphasis added)
Minor nitpick (I haven’t personally read FP’s analysis / work on this):
Appendix C (pg 31) details the recruitment process, where they teach locals about what depression is prior to recruitment. The group they sample from are groups engaging in some form of livelihood / microfinance programmes, such as hairdressers. Other groups include churches and people at public health clinic wait areas. It’s not clear to me based on that description that we should take at face value that the reason for very, very high incoming PHQ-9 scores is that these groups are “severely traumatised” (though it’s clearly a possibility!)
RE: priors about low effectiveness of therapeutic interventions—if the group is severely traumatised, then while I agree this might make us feel less skeptical about the astounding effect size, it should also make us more skeptical about the high success rates, unless we have reason to believe that severe depression in severely traumatised populations in this context is easier to treat than moderate / mild depression.
If I write a message like that because I find someone attractive (in some form), does that seem wrong to you? :) Genuinely curious about your reaction and am open to changing my mind, but this seems currently fine to me. I worry that if such a thing is entirely prohibited, so much value in new beautiful relationships is lost.
Yes, you’re still contributing to harm (at least probabilistically), because the norm and expectation is currently that EAG / Swapcard shouldn’t be used as a speed-dating tool. So if you’re reaching out only because you find them attractive despite that, you are explicitly going against what other parties expect when engaging with Swapcard, and they don’t have a way to opt out of receiving your norm-breaking message.
I’ll also mention that you’re arguing for the scenario of asking people for 1-1s at EAGs “only because you find them attractive”. This means it would also allow for messages like, “Hey, I find you attractive and I’d love to meet.” Would you also defend this? If not, what separates the two messages, and why did you choose the example you gave?
Sure, a new beautiful relationship is valuable, but how many non-work Swapcard messages lead to a new beautiful relationship? Put yourself in the shoes of an undergrad who is attending EAG for the first time, wishing to learn more about a potential career in biosecurity or animal welfare or AI safety. Now imagine they receive a message from you, and from 50 other people who also find them attractive. This doesn’t seem like a good conference experience, nor a good introduction to the EA community. It also complicates things with the people they do want to reach out to, as it increases uncertainty around whether those people are responding in a purely professional sense or are just being opportunistic. Then there’s an additional layer of complexity when you add in things around power dynamics etc. Having shared professional standards and norms goes some way to reducing this uncertainty, but people need to actually follow them.
If you are worried that you’ll lose the opportunity for beautiful relationships at EAGs, then there’s nothing stopping you from attending something after the conference wraps up for the day, or even organising some kind of speed-dating thing yourself. But note how your organised speed-dating event would be something people choose to opt in to, unlike sending solicitation DMs via an app intended to be used for professional / networking purposes (or some other purpose explicit on their profile—i.e. if you’re sending that DM to someone whose profile says “DM me if you’re interested in dating me”, then this doesn’t apply. The appropriateness of that is a separate convo though).
Some questions for you:
You say you’re “open to changing your mind”—what would this look like? What kind of harm would need to be possible for you to believe that the expected benefit of a new beautiful relationship isn’t worth it?
What’s the case that it’s the role of CEA and EAG to facilitate new beautiful relationships? Do you apply this standard to other communities and conferences you attend?
I’ll also note Kirsten’s comment above, which already talks about why it could plausibly be bad “in general”:
”The EAG team have repeatedly asked people not to use EAG or the Swapcard app for flirting. 1-1s at EAG are for networking, and if you’re just asking to meet someone because you think they’re attractive, there’s a good chance you’re wasting their time. It’s also sexualizing someone who presumably doesn’t want to be because they’re at a work event.”
And Lorenzo’s comment above:
”Because EAG(x) conferences exist to enable people to do the most good, conference time is very scarce, misusing a 1-1 slot means someone is missing out on a potentially useful 1-1. Also, these kinds of interactions make it much harder for me to ask extremely talented and motivated people I know to participate in these events, and for me to participate personally. For people that really just want to do the most good, and are not looking for dates, this kind of interaction is very aversive.”
These are largely anecdotal, and are NOT endorsements of all listed critiques, just an acknowledgement that they exist and may contribute to negative shifts in EA’s public image. They skew towards left-leaning views, and aren’t representative of all critiques out there, just a selection from people I’ve talked to and commentary I’ve seen / what’s front of mind due to recent conversations.
FTX events are clearly a net negative to EA’s reputation from the outside. This was probably a larger reputational hit to longtermism than to animal welfare or GHW (though not necessarily a larger harm : benefit ratio). But even before this, a lot of left-leaning folks viewed EA’s ties to crypto with skepticism (often this is around views of whether crypto is net positive for humanity, not the extent to which crypto is a sound investment).
EA generally is subject to critiques around measuring impact through a utilitarian lens by those who deem the value of lives “above” measurement, as well as by those who think EA undervalues non-utilitarian moral views or person-affecting views. There are also general criticisms of it being an insufficiently diverse place (usually something like: too white / too western / too male / too elitist) for a movement that cares about doing the most good it can.
EA global health + wellbeing / development is subject to critiques around top-down Western aid (e.g., Easterly, Deaton), and general critiques around the merits of randomista development. Some conclusions are seen as unintuitive, e.g. by those who think donating to local causes or one’s own community is preferable because of a moral obligation to those closer to us, or to those responsible for our success in some way.
Within the animal welfare space, there’s discussion around whether continued involvement with EA and its principles is a good thing or not, for similar reasons (lacking diversity, too top-down, too utilitarian) - these voices largely come from the more left-leaning / social justice inclined (e.g. placing value on intersectionality). Accordingly, some within the farmed animal advocacy space also think involvement with EA is contributing to a splintering within the FAAM movement. I’m not sure why this seems more relevant in FAAM than in GHW, but some possibilities are that EA funders are a more important player in the animal space than in the GHD space, and that FAAM members are generally more left-leaning and see a larger divide between social justice approaches and EA’s utilitarian EV-maximising approaches. Some conclusions are seen as unintuitive, e.g. shrimp welfare (“wait, you guys actually care about shrimp?”), or wild animal suffering.
Longtermism is subject to critiques from those uncomfortable with valuing the future at the cost of people today, valuing artificial sentience more than humans, a perceived reliance on EV-maximising views, “tech bros” valuing science fiction ideas over real suffering and justifying such spending as “saving the world”, and questions about the extent to which the future they want to preserve actually involves all of humanity, rather than just a version of humanity that a limited subculture cares about. Unintuitive conclusions may involve anything from thinking about the future more than 1000 years from now, to expansion outside the solar system, to artificial sentience. The general critiques around diversity and a lean towards tech fixes instead of systemic approaches are perhaps most pronounced in longtermism, in part due to AI safety being a large focus of longtermism, and in part due to associations with the Bay Area. The general critiques around utilitarianism are perhaps also most pronounced in longtermism, and the media attention around WWOTF probably made more people engage with longtermism and its critiques. On recent EA involvement in politics as a pushback RE: favouring tech fixes > systemic approaches, the Flynn campaign was seen as a negative update for some left-leaning outsiders in terms of EA’s ability to engage in this space.
Outside of cause area considerations, some people get the impression that EA leans young, unprofessional, too skeptical to defer to existing expertise / too eager to defer to a smart generalist to first-principles their way through a complicated problem, that EA is a community that is too closely knit and subject to nepotism, and that EA unfairly favours “EA insiders” or “EA alignment”. Other people think EA is too fervent with outreach, and consider university and high school messaging or even 80,000 Hours akin to cult recruitment. In a similar vein, some think that the EA movement is too morally demanding, that this may lead to burnout, or that it insufficiently values individuals’ flourishing. Some others think that EA lacks direction, isn’t well steered, or has an inconsistent theory of change.
Thanks for writing this! RE: We would advise against working at Conjecture
We think there are many more impactful places to work, including non-profits such as Redwood, CAIS and FAR; alignment teams at Anthropic, OpenAI and DeepMind; or working with academics such as Stuart Russell, Sam Bowman, Jacob Steinhardt or David Krueger. Note we would not in general recommend working at capabilities-oriented teams at Anthropic, OpenAI, DeepMind or other AGI-focused companies.
Additionally, Conjecture seems relatively weak for skill building [...] We expect most ML engineering or research roles at prominent AI labs to offer better mentorship than Conjecture. Although we would hesitate to recommend taking a position at a capabilities-focused lab purely for skill building, we find it plausible that Conjecture could end up being net-negative, and so do not view Conjecture as a safer option in this regard than most competing firms.
I don’t work in AI safety and am not well-informed on the orgs here, but did want to comment on this, as the recommendation might benefit from some clarity about who the target audience is.
As written, the claims sound something like:
CAIS et al., alignment teams at Anthropic et al., and working with Stuart Russell et al., are better places to work than Conjecture
Though not necessarily recommended, capabilities research at prominent AI labs is likely to be better than working at Conjecture for skill building, since Conjecture is not necessarily safer.
However:
The suggested alternatives don’t seem like they would be able to absorb a significant amount of additional talent, especially given the increase in interest in AI.
I have spoken to a few people working in AI / AI field building who perceive mentoring to be a bottleneck in AI safety at the moment.
If both of the above are true, what would your recommendation be to someone who had an offer from Conjecture, but not your recommended alternatives? E.g., choosing between independent research funded by LTFF VS working for Conjecture?
Just seeking a bit more clarity about whether this recommendation is mainly targeted at people who might have a choice between Conjecture and your alternatives, or whether this is a blanket recommendation that one should reject offers from Conjecture, regardless of seniority and what their alternatives are, or somewhere in between.
Thanks again!
I think it is entirely possible that people are being unkind because they updated too quickly on claims from Ben’s post that are now being disputed, and I’m grateful that you’ve written this (ditto chinscratch’s comment) as a reminder to be empathetic. That being said, there are also some reasons people might be less charitable than you are for reasons that are unrelated to them being unkind, or the facts that are in contention:
Without commenting on whether Ben’s original post should have been approached better or worded differently or was misleading etc, this comment from the Community Health/Special Projects team might add some useful additional context. There are also previous allegations that have been raised.[1]
Perhaps you are including both of these as part of the same set of allegations, but some may suggest that not being permitted to run sessions / recruit at EAGs, and the consideration of blocking attendance (especially given the reference class of actions that have prompted various responses that you can see here), is qualitatively important and may affect whether commenters are being charitable or not (as opposed to if they just considered the contents of Ben’s post VS Nonlinear (NL)’s response). Of course, this depends on how much you think the Community Health/Special Projects team are trustworthy with their judgement / investigation, or how likely this is all just an information cascade etc.
It is possible for altruistic people to be poor managers, poor leaders, make bad decisions about professional boundaries, have a poor understanding of power dynamics, or indeed, be abusive. The extent to which people at NL are altruistic is (afaict) not a major point of contention, and it is possible to not update about how altruistic someone is while also wanting to hold them accountable to some reasonable standard like “not being abusive or manipulative towards people you manage”.
The claims in question from Alice/Chloe/Ben are not anonymous, the identities of Alice and Chloe are known to the Nonlinear team.
Independent of my personal views on these issues, I do think the pushback around ‘stylistic mistakes’ is reasonable insofar as people interpret these to be indicative of something concerning about NL’s approach towards managing staff / criticism / conflict (1, 2, 3), rather than e.g. just being nitpicky about tone, though I appreciate both interpretations are plausible.
I think (much) less is more in this case.[2] I think there are parts of this current post that feel more subjective and not supported by facts, and may be reasonably interpreted by a cynical outsider to look like a distraction or a defensive smear campaign. I think these choices are counterproductive (both for a truth-seeking outsider, and for NL’s own interests), especially given the allegations of frame control and being retaliatory.
There are other parts that might similarly be reasonably interpreted to range from irrelevant (Alice’s personal drug use habits), to unproductive (links to Kathy Forth), to misleading (inclusion of photos, inconsistent usage of quotation marks, unnecessary paraphrasing, usage of quotes that miss the full context). I disagreed with the approaches here, though I acknowledge there were competing opinions and I wasn’t privy to the internal discussions that led to the decisions.
I think a cleaner version of this would have probably been something 5 to 10x shorter (not including the appendix), and looked something like:[3]
Apology for harms done
Acknowledgement of which allegations are seen as the most major (much closer to top 3-5 than all 85)
Responses to major allegations, focusing only on factual differences and claims that are backed up by ~irrefutable evidence
Charitable interpretations of Alice/Chloe/Ben’s position, despite the above factual disagreements (what kinds of things would need to be true for their allegations to be plausibly reasonable or fair from their perspective)
Lessons learnt, and things NL will do differently in future (some expression of self-awareness / reflection)
An appendix containing a list of unresolved but less critical allegations
Disclaimer: I offered to (and did) help review an early draft, in large part because I expected the NL team to (understandably!) be in panic mode after Ben’s post/getting dogpiled, and I wanted further community updates to be based on as much relevant information as was possible.
This footnote was added in response to Jeff’s comment: I agree that it’s likely not double counting, because the story there appears to be one where Kat left the working relationship, which is inconsistent with the accounts of Alice / Chloe’s situations, but also makes it unlikely that the “current employee of NL / Kat” hypothesis is correct.
Perhaps hypocritical given the length of this comment
Acknowledging that I have no PR expertise