Having a savings target seems important. (Not financial advice.)
I sometimes hear people in/around EA rule out taking jobs due to low salaries (sometimes implicitly, sometimes a little embarrassedly). Of course, it’s perfectly understandable not to want to take a significant drop in your consumption. But in theory, people with high salaries could be saving up so they can take high-impact, low-paying jobs in the future; it just seems like, by default, this doesn’t happen. I think it’s worth thinking about how to set yourself up to be able to do it if you do find yourself in such a situation; you might find it harder than you expect.
(Personal digression: I also notice my own brain paying a lot more attention to my personal finances than I think is justified. Maybe some of this traces back to some kind of trauma response to being unemployed for a very stressful ~6 months after graduating: I just always could be a little more financially secure. A couple weeks ago, while meditating, it occurred to me that my brain is probably reacting to not knowing how I’m doing relative to my goal, because 1) I didn’t actually know what my goal is, and 2) I didn’t really have a sense of what I was spending each month. In IFS terms, I think the “social and physical security” part of my brain wasn’t trusting that the rest of my brain was competently handling the situation.)
So, I think people in general would benefit from having an explicit target: once I have X in savings, I can feel financially secure. This probably means explicitly tracking your expenses, both now and in a “making some reasonable, not-that-painful cuts” budget, and gaming out the most likely scenarios where you’d need to use a large amount of your savings, beyond the classic 3 or 6 months of expenses in an emergency fund. For people motivated by EA principles, the most likely scenarios might be for impact reasons: maybe you take a public-sector job that pays half your current salary for three years, or maybe you’d need to self-fund a new project for a year; how much would it cost to maintain your current level of spending, or a not-that-painful budget-cut version? Then you could target that amount (in addition to the emergency fund, so you’d still have that at the end of the period); once you have that, you could feel more secure/spend less brain space on money, donate more of your income, and be ready to jump on a high-impact, low-paying opportunity.
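For concreteness, here is a minimal sketch of that arithmetic in Python. All numbers are made-up placeholders (budgets, salaries, and scenario lengths are illustrative only, not recommendations); plug in your own tracked expenses.

```python
# Minimal sketch of the savings-target arithmetic described above.
# Every number below is an illustrative placeholder.

monthly_spending_lean = 4_500                 # "not-that-painful cuts" budget
emergency_fund = 6 * monthly_spending_lean    # classic 6-month emergency fund, kept intact throughout

# Scenario A: a public-sector job paying roughly half your current salary for 3 years.
half_salary_monthly = 4_000
scenario_a = max(0, monthly_spending_lean - half_salary_monthly) * 36

# Scenario B: self-funding a new project (no income) for a year.
scenario_b = monthly_spending_lean * 12

# Target = emergency fund plus the costlier of the scenarios you consider most likely.
savings_target = emergency_fund + max(scenario_a, scenario_b)
print(f"Emergency fund:  ${emergency_fund:,}")
print(f"Scenario buffer: ${max(scenario_a, scenario_b):,}")
print(f"Savings target:  ${savings_target:,}")
```

The point isn’t the specific numbers; it’s that once the target is explicit, your brain can check progress against it instead of vaguely worrying.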
Of course, you can more easily hit that target if you can bring down your expenses—you both lower the required amount in savings and you save more each month. So, maybe some readers would also benefit from cutting back a bit, though I think most EAs are pretty thrifty already.
(This is hardly novel—Ben Todd was publishing related stuff on 80k in 2015. But I guess I had to rediscover it, so posting here in case anyone else could use the refresher.)
One dynamic worth considering here is that a person with near-typical longtermist views likely also believes that the future holds a large number of salient risks, including sub-extinction AI catastrophes, pandemics, war with China, authoritarian takeover, a “white collar bloodbath,” etc.
It can be very psychologically hard to spend all day thinking about these risks without also internalizing that these risks may very well affect oneself and one’s family, which in turn implies that typical financial advice and financial lifecycle planning are not well-tailored to the futures that longtermists think we might face. For example, the typical suggestion to save around 6 months in an emergency fund makes sense for the economy of the last hundred years, but if there is widespread white collar automation, what are the odds that there will be job disruption lasting longer than six months? If you think that your country may experience authoritarian takeover, might you want to save enough to buy residence elsewhere?
None of this excuses not making financial sacrifices. But I do think it’s hard to simultaneously think “the future is really risky” and “there is a very achievable (e.g., <<$1M) amount of savings that would make me very secure.”
That’s a fair point, but a lot of the scenarios you describe would mean rapid economic growth and equities going up like crazy. The expectation of my net worth in 40 years on my actual views is way, way higher than it would be if I thought AI was totally fake and the world would look basically the same in 2065. That doesn’t mean you shouldn’t save up though (higher yields are actually a reason to save, not a reason to refrain from saving).
For what it’s worth: a lot of people think an emergency fund means cash in a normal savings account, but this is not a good approach. Instead, buy bonds or money market funds with your emergency savings, or put them in a specialized high-yield savings account (which, to repeat, is likely NOT the savings account you get by default from your bank).
Or just put the money in equities in a liquid brokerage account.
Relevant: I’ve been having some discussions with (non-EA) friends on why they don’t donate more.
Some argue that they want enough money to take care of themselves in the extreme cases of medical problems and political disasters, but still with decent Bay Area lifestyles. I think the implication is that they will wait until they have around $10 million or so to begin thinking about donations. And if they have kids, maybe $30 million.
I obviously find this very frustrating, but also interesting.
Of course, I’d expect that if they made more money, their bar would increase. Perhaps if they made $1M/year they would get used to a different lifestyle, then assume that they needed $50M/$150M accordingly.
It feels like, “I don’t have an obligation or reason to be an altruistic person until I’m in the top 0.01% of wealthy individuals”
A friend recently shared this reason for not giving (fear of an expensive medical crisis). I think if a good resource existed with the base rates of events that can cause financial hardship and solutions for reducing their likelihood (e.g., long-term care insurance), this might help some people feel more comfortable with giving.
I passed this along to someone at GWWC and they said this is on their list of ideas to write about.
The biggest risk is, I believe, disability resulting in long-term income loss. My US-centric understanding is that private disability insurance that is both portable (not bound to a specific employer) and broad (e.g., covers any condition that causes a significant loss in earnings capacity) can be difficult to find if you’re not in particularly excellent health.
Basefund was working on the broader issue of donors who subsequently experience financial hardship, although I haven’t heard much about them recently. My assumption was that limitations imposed by the project’s non-profit status would preclude the Basefund model from working for people considering larger donations but worried about needing them back down the road if a crisis happens.
Meeting those needs for those unable to access general-purpose private disability insurance would probably require some sort of model under which the donor paid an insurance premium and reduced their would-be donation accordingly. If there were enough interest, I could see one of the big disability insurance shops underwriting a product like that. Probably wouldn’t be cheap, though. Of course, if someone were willing to financially guarantee claims payment, thus removing any financial risk from the policy administrator, that would make the program more attractive for a would-be administrator.
Yeah this was what I found too when I looked into private US long-term disability insurance a while back. My recollection was:
there are a surprising number of health exclusions, even for things that happened in your childhood or adolescence
it’s a lot more expensive in percentage terms if you’re at a lower income
many disability cases are ambiguous so the insurance company may have you jump through a lot of hoops and paperwork (a strange role-reversal in which the bureaucracy wants to affirm your agency)
I had the impression that it was a great product for some people, meaning those with high income, a clean medical history, and a support network to wrestle with the insurance company. But at the time I looked into it, it didn’t seem like a great option for me even given my risk-averse preferences.
Planning to look again soon, so I could change my mind.
I like the thought, but I would flag that I’d probably recommend they do some user interviews or the like to really dig into what, if anything, might actually convince these people.
I’d expect that strong marketing people would be good here.
Typically the first few reasons people give for why they aren’t more charitable are all BS, and these sorts of people aren’t the type willing to read many counter-arguments. It can still be good to provide just a bit more evidence on the other side, but you have to go in with the right (low) expectations.
That said, I do think that solutions (like insurance) are a pretty good thing to consider, even to those not making these excuses.
I worry that the pro-AI/slow-AI/stop-AI divide has the salient characteristics of a tribal dividing line that could tear EA apart:
“I want to accelerate AI” vs “I want to decelerate AI” is a big, clear line in the sand that allows for a lot clearer signaling of one’s tribal identity than something more universally agreeable like “malaria is bad”
Up to the point where AI either kills us or doesn’t, there’s basically in principle no way to verify that one side or the other is “right”, which means everyone can keep arguing about it forever
The discourse around it is more hostile/less-trust-presuming than the typical EA discussion, which tends to be collegial (to a fault, some might argue)
You might think it’s worth having this civil war to clarify what EA is about. I don’t. I would like for us to get on a different track.
The EA Forum moderation team is going to experiment a bit with how we categorize posts. Currently there is a low bar for a Forum post being categorized as “Frontpage” after it’s approved. In comparison, LessWrong is much more opinionated about the content they allow, especially from new users. We’re considering moving in that direction, in order to maintain a higher percentage of valuable content on our Frontpage.
To start, we’re going to allow moderators to move posts from new users from “Frontpage” to “Personal blog”[1], at their discretion, but starting conservatively. We’ll keep an eye on this and, depending on how this goes, we may consider taking further steps such as using the “rejected content” feature (we don’t currently have that on the EA Forum).
Feel free to reply here if you have any questions or feedback.
I would be a bit hesitant to follow Less Wrong’s lead on this too closely. I find the EA Forum, for lack of a better term, feels much friendlier than Less Wrong, and I wouldn’t want that sense of friendliness to go away.
I was hesitant on this one, but I looked at last month’s posts and saw a lot of them with few votes and little engagement, which made me more sympathetic to the concern about the frontpage. Maybe it’s a viable idea with some safeguards:
I think limiting application to “new users” mitigates some of the downside risk, as long as that definition is operationalized well. In particular, people use throwaways to post criticisms, and the newness of an account should not necessarily establish a “new user” for purposes of this policy. I think mods are capable of figuring out whether a throwaway post shows enough EA knowledge, but they should err on the side of letting throwaway criticism posts through to the frontpage. For certain critical posts, the decision to demote should be affirmed by someone independent of CEA.
The risk of being demoted to Personal Blog could be a significant demotivator for people investing the time to write posts.
You could mitigate this by being very clear and objective about what will trigger classification and then applying the stated criteria in a conservative fashion. But based on your stated goals, I think you may have a hard time defining the boundaries with enough objective precision.
You could also invite people to submit 1-2 paragraph pitches if they were concerned about demotion, and establish a safe harbor for anyone who got a thumbs-up on their pitch. But that approach risks being a little too censorious for my tastes, as the likely outcome of a decision not to pre-clear is that the author never completes their idea into a post.
If something is getting any meaningful number of upvotes or comments after being consigned to Personal Blog as lower-quality content, you probably made a mistake that should be reverted ASAP. (When thinking what the thresholds for reversal should be, the much lower visibility of Personal Blogs should carry significant weight.)
I would be hesitant to reject more content—people selecting to show Personal Blog posts presumably know what they are getting themselves into and have implicitly decided to opt out of your filtering efforts.
Thanks Jason!
1. Luckily, which posts get categorized as “Personal blog” is public information (I think it’s easiest to skim via the All posts page), so I would be happy for people to check our work and contact us if you think we’ve made a mistake. If you take a look now, you’ll see that very few posts have been moved there so far, and I don’t expect the rate to change very much going forward.
2. My guess is that the vast majority of new users don’t even know what “Personal blog” means, so I’m not sure how demotivating it will be to them. As I mentioned in another comment, my guess is that getting downvoted is more demotivating for new users.
3. I think that’s a good idea, and I’d be happy for users to flag these as mistakes to the moderators, or just DM me directly and I can return a post to the Frontpage if I agree (I have the final say as head moderator).
I would be nervous about discouraging new users. There’s a high bar for what gets upvoted here on the forum. Especially for VERY new users I’d be nervous about not giving the opportunity for their post to be on the frontpage—maybe it can depend on if you think the post is decent or not?
Ah yeah sorry I was unclear! I basically meant what you said when I said “at their discretion, but starting conservatively” — so we are starting to take “quality” into account when deciding what stays in the Frontpage, because our readers’ time is valuable. You can kind of think of it like: if the mod would have downvoted a post from a new user, the mod can instead decide to move it to “Personal blog”. I think it’s possible that this is actually less discouraging to new users than getting downvoted, since it’s like you’re being moved to a category with different standards. You can check our work by looking at what gets categorized as “Personal blog” via the All posts page. :)
I expect this will affect only a small proportion of new users.
Health Progress Hub is Looking for Contributors from Low- and Middle-Income Countries!
Health Progress Hub (HPH), an initiative by GPRG, aims to accelerate global health progress by building infrastructure that helps high-impact NGOs identify and deploy local talent more efficiently. We are looking for contributors from Low- and Middle-Income Countries who are motivated to accelerate global health progress using their local insights and networks.
You’d support both HPH and our partner organizations through research, recruitment assistance, stakeholder mapping, and program support. We’ll match tasks to your strengths and interests, and what HPH and our partners need.
You’ll gain practical experience working on real global health challenges, develop skills in areas such as research, operations and strategy, and connect with others working to tackle critical health challenges. With your permission, we can include you in our talent database, enabling global health organizations to consider you for relevant volunteer or paid positions.
You can find more information and apply through our form (~10 minutes): Application Form
Know someone who should apply? Please send them this or nominate them (~5-10 minutes): Nomination form
Questions? Want to volunteer or provide guidance from a high-income country? Please email us at ren@globalprg.org
If you’re an organization interested in partnering with us to access local talent and expertise, you can reach out to berke@globalprg.org
Productive conference meetup format for 5-15 people in 30-60 minutes
I ran an impromptu meetup at a conference this weekend, where 2 of the ~8 attendees told me that they found this an unusually useful/productive format and encouraged me to share it as an EA Forum shortform. So here I am, obliging them:
Intros… but actually useful
Name
Brief background or interest in the topic
1 thing you could possibly help others in this group with
1 thing you hope others in this group could help you with
NOTE: I will ask you to act on these imminently so you need to pay attention, take notes etc
[Facilitator starts and demonstrates by example]
Round of any quick wins: anything you heard where someone asked for some help and you think you can help quickly, e.g. a resource, idea, offer? Say so now!
Round of quick requests: Anything where anyone would like to arrange a 1:1 later with someone else here, or request anything else?
If 15+ minutes remaining:
Brainstorm whole-group discussion topics for the remaining time. Quickly gather 1-5 topic ideas in less than 5 minutes.
Show of hands voting for each of the proposed topics.
Discuss most popular topics for 8-15 minutes each. (It might just be one topic)
If less than 15 minutes remaining:
Quickly pick one topic for group discussion yourself.
Or just finish early? People can stay and chat if they like.
Note: the facilitator needs to actually facilitate, including cutting off lengthy intros or any discussions that get started during the ‘quick wins’ and ‘quick requests’ rounds. If you have a group over 10 you might need to divide into subgroups for the discussion part.
I think we had around 3 quick wins, 3 quick requests, and briefly discussed 2 topics in our 45 minute session.
An updated draft of a model of consciousness based on information and complexity theory
This paper proposes a formal, information-theoretic model of consciousness in which awareness is defined as the alignment between an observer’s beliefs and the objective description of an object. Consciousness is quantified as the ratio between the complexity of true beliefs and the complexity of the full inherent description of the object. The model introduces three distinct epistemic states: Consciousness (true beliefs), Schizo-Consciousness (false beliefs), and Unconsciousness (absence of belief). Object descriptions are expressed as structured sets of object–quality (O–Q) statements, and belief dynamics are governed by internal belief-updating functions (brain codes) and attentional codes that determine which beliefs are foregrounded at any given time. Crucially, the model treats internal states—such as emotions, memories, and thoughts—as objects with describable properties, allowing it to account for self-awareness, misbelief about oneself, and psychological distortion. This framework enables a unified treatment of external and internal contents of consciousness, supports the simulation of evolving belief structures, and provides a tool for comparative cognition, mental health modeling, and epistemic alignment in artificial agents.
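If it helps to see the central quantity written out, here is one minimal way to formalize the ratio the abstract describes. The notation (D, B_true, K) is my own guess at symbols, not taken from the paper:

```latex
% A minimal formalization of the ratio described in the abstract.
% Notation is assumed, not the paper's:
%   D(O)          -- the full inherent description of object O (a set of O-Q statements)
%   B_true(S, O)  -- the subset of observer S's beliefs about O that agree with D(O)
%   K(.)          -- whatever complexity measure the model uses (e.g., description length)
\[
  C(S, O) \;=\; \frac{K\bigl(B_{\mathrm{true}}(S, O)\bigr)}{K\bigl(D(O)\bigr)}
\]
```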
Any hints / info on what to look for in a mentor / how to find one? (Specifically for community building.)
I’m starting as a national group director in September, and among my focus topics for EAG London are group-focused things like “figuring out pointers / out of the box ideas / well-working ideas we haven’t tried yet for our future strategy”, but also trying to find a mentor.
These were some thoughts I came up with when thinking about this yesterday:
- I’m not looking for accountability or day-to-day support. I get that from inside our local group.
- I am looking for someone who can take a description of the higher-level situation and see different things than I can, either due to perspective differences or being more experienced and skilled.
- Also someone who can give me useful input on what skills to focus on building in the medium term.
- Someone whose skills and experience I trust, and when they say “plan looks good” it gives me confidence, when I’m trying to do something that feels to me like a long shot / weird / difficult plan and I specifically need validation that it makes sense.
On a concrete level I’m looking for someone to have ~monthly 1-1 calls with and some asynchronous communication. Not about common day to day stuff but larger calls.
There is going to be a Netflix series on SBF titled The Altruists, so EA will be back in the media. I don’t know how EA will be portrayed in the show, but regardless, now is a great time to improve EA communications. More specifically, being a lot more loud about historical and current EA wins — we just don’t talk about them enough!
Julia Garner (Ozark, The Fantastic Four: First Steps, Inventing Anna) and Anthony Boyle (House of Guinness, Say Nothing, Masters of the Air) are set to star in The Altruists, a new eight-episode limited series about Sam Bankman-Fried and Caroline Ellison.
Graham Moore (The Imitation Game, The Outfit) and Jacqueline Hoyt (The Underground Railroad, Dietland, Leftovers) will co-showrun and executive produce the series, which tells the story of Sam Bankman-Fried and Caroline Ellison, two hyper-smart, ambitious young idealists who tried to remake the global financial system in the blink of an eye — and then seduced, coaxed, and teased each other into stealing $8 billion.
The best one-stop summary I know of is still Scott Alexander’s In Continued Defense Of Effective Altruism from late 2023. I’m curious to see if anyone has an updated take, if not I’ll keep steering folks there:
Here’s a short, very incomplete list of things effective altruism has accomplished in its ~10 years of existence. I’m counting it as an EA accomplishment if EA either provided the funding or did the work, further explanations in the footnotes. I’m also slightly conflating EA, rationalism, and AI doomerism rather than doing the hard work of teasing them apart:
Global Health And Development
Saved about 200,000 lives total, mostly from malaria1
Treated 25 million cases of chronic parasite infection.2
Given 5 million people access to clean drinking water.3
Supported clinical trials for both the RTS.S malaria vaccine (currently approved!) and the R21/Matrix malaria vaccine (on track for approval)4
Supported additional research into vaccines for syphilis, malaria, helminths, and hepatitis C and E.5
Supported teams giving development economics advice in Ethiopia, India, Rwanda, and around the world.6
Animal Welfare:
Convinced farms to switch 400 million chickens from caged to cage-free.7
Freed 500,000 pigs from tiny crates where they weren’t able to move around8
Gotten 3,000 companies including Pepsi, Kelloggs, CVS, and Whole Foods to commit to selling low-cruelty meat.
AI:
Developed RLHF, a technique for controlling AI output widely considered the key breakthrough behind ChatGPT.9
…and other major AI safety advances, including RLAIF and the foundations of AI interpretability10.
Founded the field of AI safety, and incubated it from nothing up to the point where Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, Bill Gates, and hundreds of others have endorsed it and urged policymakers to take it seriously.11
Helped convince OpenAI to dedicate 20% of company resources to a team working on aligning future superintelligences.
Gotten major AI companies including OpenAI to work with ARC Evals and evaluate their models for dangerous behavior before releasing them.
Got two seats on the board of OpenAI, held majority control of OpenAI for one wild weekend, and still apparently might have some seats on the board of OpenAI, somehow?12
Helped found, and continue to have majority control of, competing AI startup Anthropic, a $30 billion company widely considered the only group with technology comparable to OpenAI’s.13
Helped (probably, I have no secret knowledge) the Biden administration pass what they called “the strongest set of actions any government in the world has ever taken on AI safety, security, and trust.”
Won the PR war: a recent poll shows that 70% of US voters believe that mitigating extinction risk from AI should be a “global priority”.
Other:
Helped organize the SecureDNA consortium, which helps DNA synthesis companies figure out what their customers are requesting and avoid accidentally selling bioweapons to terrorists14.
Provided a significant fraction of all funding for DC groups trying to lower the risk of nuclear war.15
Played a big part in creating the YIMBY movement—I’m as surprised by this one as you are, but see footnote for evidence17.
I think other people are probably thinking of this as par for the course—all of these seem like the sort of thing a big movement should be able to do. But I remember when EA was three philosophers and a few weird Bay Area nerds with a blog. It clawed its way up into the kind of movement that could do these sorts of things by having all the virtues it claims to have: dedication, rationality, and (I think) genuine desire to make the world a better place.
According to the Guardian there is also one movie, another series, and several documentaries potentially in the works
The series is one of several projects in the works on the high-profile financial saga. It was announced in November that Girls creator Lena Dunham will write a movie based on Michael Lewis’s 2023 bestseller Going Infinite: The Rise and Fall of a New Tycoon for Apple and A24. Amazon Prime Video has a limited series in the works from Marvel directors Joe and Anthony Russo and writer David Weil.
There are also multiple competing nonfiction projects: one from Vice Media and the Information on effective altruism, and another from studio XTR and director David Darg that promises “unprecedented access to key players at FTX and the cryptocurrency community” in Bankman-Fried’s home base of the Bahamas.
A third documentary from Fortune and Mark Wahlberg’s company Unrealistic Ideas will focus on the relationship between Bankman-Fried and one of his most vocal critics, Binance founder and CEO Changpeng “CZ” Zhao. Bloomberg has already aired a nonfiction special on the debacle, titled Ruin: Money, Ego & Deception at FTX.
and another from studio XTR and director David Darg that promises “unprecedented access to key players at FTX and the cryptocurrency community” in Bankman-Fried’s home base of the Bahamas.
I don’t think this is necessarily related, but it should be noted that XTR is also currently making a documentary about the Zizians.
I think this is very hard to predict, and I just feel uncertain. Public perception seems to be really fickle, and I could imagine each show being either:
Negative towards SBF/Caroline and negative towards EA (it’s all tech bros feeling superior, e.g. here)
Negative towards SBF/Caroline and positive towards EA (they used ethics as “mostly a front” and only cared about winning)
Positive towards SBF/Caroline, and negative towards EA (they started as idealistic altruists and got corrupted by the toxic EA ideology)
Positive towards SBF/Caroline, and positive towards EA (e.g. making John J. Ray III and Sullivan and Cromwell the villains)
And for each of these 4, it’s not clear what the impact on EA would be, e.g. I think “The Wolf of Wall Street” probably got many people excited about working in finance.
I predict the documentaries will be negative towards EA, as was the vast majority of media on EA in 2023 and 2024, and I think documentaries tend to be mostly negative about their subject, but I’m much more unsure about the fiction series
If it’s anything like the book Going Infinite by Michael Lewis, it’ll probably be a relatively sympathetic portrayal. My initial impression from the announcement post is that it at least sounds like the angle they’re going for is misguided haphazard idealists (which Lewis also did), rather than mere criminal masterminds.
Graham Moore is best known for The Imitation Game, the movie about Alan Turing, and his portrayal took a classic “misunderstood genius” angle. If he brings that kind of energy to a movie about SBF, we can hope he shows EA in a positive light as well.
Another possible comparison you could make would be with the movie The Social Network, which was inspired by real life but took a lot of liberties, and which interestingly made Dustin Moskovitz (who funds a lot of EA stuff through Open Philanthropy) a very sympathetic character. (Edit: I confused him with Eduardo Saverin.)
I also think there’s lots of precedent for Hollywood generally making dramas and movies that are sympathetic to apparent “villains” and “antiheroes”. Mindless caricatures are less interesting to watch than nuanced portrayals of complex characters with human motivations. Good fiction at least tries to have that kind of depth.
So, I’m cautiously optimistic. When you actually dive deeper into the story of SBF, you realize he’s more complex than yet another crypto grifter, and I think a nuanced portrayal could actually help EA recover a bit from the narrative that we’re just a TESCREAL techbro cult.
I do also agree in general that we should be louder about the good that EA has actually done in the world.
I want to clarify, for the record, that although I disagree with most members of the EA community on whether we should accelerate or slow down AI development, I still consider myself an effective altruist in the senses that matter. This is because I continue to value and support most EA principles, such as using evidence and reason to improve the world, prioritizing issues based on their scope, not discriminating against foreigners, and antispeciesism.
I think it’s unfortunate that disagreements about AI acceleration often trigger such strong backlash within the community. It appears that advocating for slowing AI development has become a “sacred” value that unites much of the community more strongly than other EA values do. Despite hinging on many uncertain and IMO questionable empirical assumptions, the idea that we should decelerate AI development is now sometimes treated as central to the EA identity in many (albeit not all) EA circles.
As a little bit of evidence for this, I have been publicly labeled a “sellout and traitor” on X by a prominent member of the EA community simply because I cofounded an AI startup. This is hardly an appropriate reaction to what I perceive as a measured, academic disagreement occurring within the context of mainstream cultural debates. Such reactions frankly resemble the behavior of a cult, rather than an evidence-based movement—something I personally did not observe nearly as much in the EA community ten years ago.
Thanks for writing on the forum here—I think it’s brave of you to comment where there will obviously be lots of pushback. I’ve got a question relating to the new company and EA assignment. You may well have answered this somewhere else; if that’s the case, please point me in that direction. I’m a Global Health guy mostly, so am not super deep in AI understanding, so this question may be naive.
Question: If we frame EA along the (great new website) lines of “Find the best ways to help others”, how are you, through your new startup, doing this? Is it for the purpose of earning to give money away? Or do you think the direct work the startup will do has a high EV for doing lots of good? Feel free to define EA along different lines if you like!
In the case at hand, Matthew would have had to at some point represent himself as supporting slowing down or stopping AI progress. For at least the past 2.5 years, he has been arguing against doing that in extreme depth on the public internet. So I don’t really see how you can interpret him starting a company that aims to speed up AI as inconsistent with his publicly stated views, which seems like a necessary condition for him to be a “traitor”. If Matthew had previously claimed to be a pause AI guy, then I think it would be more reasonable for other adherents of that view to call him a “traitor.” I don’t think that’s raising the definitional bar so high that no will ever meet it—it seems like a very basic standard.
I have no idea how to interpret “sellout” in this context, as I have mostly heard that term used for such situations as rappers making washing machine commercials. Insofar as I am familiar with that word, it seems obviously inapplicable.
I’m obviously not Matthew, but the OED defines them like so:
sell-out: “a betrayal of one’s principles for reasons of expedience”
traitor: “a person who betrays [be gravely disloyal to] someone or something, such as a friend, cause, or principle”
Unless he is lying about what he believes—which seems unlikely—Matthew is not a sell-out, because according to him Mechanize is good or at minimum not bad for the world on his worldview. Hence, he is not betraying his own principles.
As for being a traitor, I guess the first question is: a traitor to what? To EA principles? To the AI safety cause? To the EA or AI safety community? In order:
I don’t think Matthew is gravely disloyal to EA principles, as he explicitly says he endorses them and has explained how his decisions make sense on his worldview
I don’t think Matthew is gravely disloyal to the AI safety cause, as he’s been openly critical of many common AI doom arguments for some time, and you can’t be disloyal to a cause you never really bought into in the first place
Whether Matthew is gravely disloyal to the EA or AI safety communities feels less obvious to me. I’m guessing a bunch of people saw Epoch as an AI safety organisation, and by extension its employees as members of the AI safety community, even if the org and its employees did not necessarily see itself or themselves that way, and felt betrayed for that reason. But it still feels off to me to call Matthew a traitor to the EA or AI safety communities, especially given that he’s been critical of common AI doom arguments. This feels more like a difference over empirical beliefs than a difference over fundamental values, and it seems wrong to me to call someone gravely disloyal to a community for drawing unorthodox but reasonable empirical conclusions and acting on those, while broadly having similar values. Like, I think people should be allowed to draw conclusions (or even change their minds) based on evidence—and act on those conclusions—without it being betrayal, assuming they broadly share the core EA values, and assuming they’re being thoughtful about it.
(Of course, it’s still possible that Mechanize is a net-negative for the world, even if Matthew personally is not a sell-out or a traitor or any other such thing.)
Yes, I understand the arguments against it applying here. My question is whether the threshold is being set at a sufficiently high level that it basically never applies to anyone. Hence why I was looking for examples which would qualify.
Sellout (in the context of Epoch) would apply to someone e.g. concealing data or refraining from publishing a report in exchange for a proposed job in an existing AI company.
As for traitor, I think the only group here that can be betrayed is humanity as a whole, so as long as one believes they’re doing something good for humanity I don’t think it’d ever apply.
As for traitor, I think the only group here that can be betrayed is humanity as a whole, so as long as one believes they’re doing something good for humanity I don’t think it’d ever apply.
Hmm, that seems off to me? Unless you mean “severe disloyalty to some group isn’t Ultimately Bad, even though it can be instrumentally bad”. But to me it seems useful to have a concept of group betrayal, and to consider doing so to be generally bad, since I think group loyalty is often a useful norm that’s good for humanity as a whole.
Specifically, I think group-specific trust networks are instrumentally useful for cooperating to increase human welfare. For example, scientific research can’t be carried out effectively without some amount of trust among researchers, and between researchers and the public, etc. And you need some boundary for these groups that’s much smaller than all humanity to enable repeated interaction, mutual monitoring, and norm enforcement. When someone is severely disloyal to one of those groups they belong to, they undermine the mutual trust that enables future cooperation, which I’d guess is ultimately often bad for the world, since humanity as a whole depends for its welfare on countless such specialised (and overlapping) communities cooperating internally.
It’s not that I’m ignoring group loyalty, just that the word “traitor” seems so strong to me that I don’t think there’s any smaller group here that’s owed that much trust. I could imagine a close friend calling me that, but not a colleague. I could imagine a researcher saying I “betrayed” them if I steal and publish their results as my own after they consulted me, but that’s a much weaker word.
[Context: I come from a country where you’re labeled a traitor for having my anti-war political views, and I don’t feel such usage of this word has done much good for society here...]
I think Holly’s tweet was pretty unreasonable, and I judge her for that, not you. But I also disagree with a lot of other things she says and do not at all consider her to speak for the movement.
To the best of my ability to tell (both from your comments and private conversations with others), you and the other Mechanize founders are not getting undue benefit from Epoch funders apart from less tangible things like skills, reputation, etc. I totally agree with your comment below that this does not seem a betrayal of their trust. To me, it seems more a mutually beneficial trade between parties with different but somewhat overlapping values, and I am pro EA as a community being able to make such trades.
AI is a very complex, uncertain, and important space. This means reasonable people will disagree on the best actions AND that certain actions will look great under some worldviews and pretty harmful under others
As such, assuming you are sincere about the beliefs you’ve expressed re why to found Mechanize, I have no issue with calling yourself an Effective Altruist—it’s about evidence based ways to do the most good, not about doing good my way
Separately:
Under my model of the world, Mechanize seems pretty harmful in a variety of ways, in expectation
I think it’s reasonable for people who object to your work to push back against it and publicly criticise it (though agree that much of the actual criticism has been pretty unreasonable)
The EA community implicitly gives help and resources to other people in it. If most people in the community think that what you’re doing is net harmful even if you’re doing it with good intentions, I think it’s pretty reasonable to not want to give you any of that implicit support?
Can you be a bit more specific about what it means for the EA community to deny Matthew (and Mechanize) implicit support, and which ways of doing this you would find reasonable vs. unreasonable?
I was going to write a comment responding but Neel basically did it for me.
The only thing I would object to is calling Holly a “prominent member of the EA community”. The PauseAI/StopAI people are often treated as fringe in the EA community, and she frequently violates norms of discourse. EAs, due to their norms of discourse, usually just don’t respond to her in the way she responds to others.
Just off the top of my head: Holly was a community builder at Harvard EA, wrote what is arguably one of the most influential forum posts ever, and took sincere career and personal decisions based on EA principles (first, wild animal welfare, and now, “making AI go well”). Besides that, there are several EAGs and community events and conversations and activities that I don’t know about, but all in all, she has deeply engaged with EA and has been a thought leader of sorts for a while now. I think it is completely fair to call her a prominent member of the EA community.[1]
I am unsure if Holly would like the term “member” because she has stated that she is happy to burn bridges with EA / funders, so maybe “person who has historically been strongly influenced by and has been an active member of EA” would be the most accurate but verbose phrasing.
My impression is that Holly has intentionally sacrificed a significant amount of influence within EA because she feels that EA is too constraining in terms of what needs to be done to save humanity from AI.
So that term would have been much more accurate in the past.
Right, but most of this is her “pre-AI” stuff, and I am saying that I don’t think “Pause AI” is very mainstream by EA standards; the very inflammatory nature of the activism in particular, and the policy prescriptions, are definitely not majority positions. It is in that sense that I object to Matthew calling her prominent, since by the standard you are suggesting, Matthew is also prominent: he’s been in the movement for a decade, has written a lot of extremely influential posts, was a well-known part of Epoch for a long time, and also wrote one of the most prescient posts ever.
I don’t dispute that Holly has been an active and motivated member of the EA community for a while
I think there’s some speaking past each other due to differing word choices. Holly is prominent, evidenced by the fact that we are currently discussing her. She has been part of the EA community for a long time and appears to be trying to do the most good according to her own principles. So it’s reasonable to call her a member of the EA community. And therefore “prominent member” is accurate in some sense.
However, “prominent member” can also imply that she represents the movement, is endorsed by it, or that her actions should influence what EA as a whole is perceived to believe. I believe this is the sense that Marcus and Matthew are using it, and I disagree that she fits this definition. She does not speak for me in any way. While I believe she has good intentions, I’m uncertain about the impact of her work and strongly disagree with many of her online statements and the discourse norms she has chosen to adopt, and think these go against EA norms (and would guess they are also negative for her stated goals, but am less sure on this one).
Edit: I think that Neel’s comment is basically just a better version of the stuff I was trying to say. (On the object level I’m a little more sympathetic than him to ways in which Mechanize might be good, although I don’t really buy the story to that end that I’ve seen you present.)
Wanting to note that on my impressions, and setting aside who is correct on the object-level question of whether Mechanize’s work is good for the world:
My best read of the situation is that Matthew has acted very reasonably (according to his beliefs), and that Holly has let herself down a bit
I believe that Holly honestly feels that Matthew is a sellout and a traitor; however, I don’t think that this is substantiated by reasonable readings of the facts, and I think this is the kind of accusation which it is socially corrosive to make publicly based on feelings
On handling object-level disagreements about what’s crucial to do in the world …
I think that EA-writ-large should be endorsing methodology more than conclusions
Inevitably we will have cases where people have strong earnest beliefs about what’s good to do that point in conflicting directions
I think that we need to support people in assessing the state of evidence and then acting on their own beliefs (hegemony of majority opinion seems kinda terrible)
Of course people should be encouraged to beware unilateralism, but I don’t think that can extend to “never do things other people think are actively destructive”
It’s important to me that EA has space for earnest disagreements
I therefore think that we should have something like “civilized society” norms, which constrain actions
Especially (but not only!) those which would be harmful to the ability for the group to have high-quality discourse
cf. SBF’s actions, which I think were indefensible even if he earnestly believed them to be the best thing
Matthew’s comment was on −1 just now. I’d like to encourage people not to vote his post into the negative. Even though I don’t find his defense at all persuasive, I still think it deserves to be heard.
What I perceive as a measured, academic disagreement
This isn’t merely an “academic disagreement” anymore. You aren’t just writing posts, you’ve actually created a startup. You’re doing things in the space.
As an example, it’s neither incoherent nor hypocritical to let philosophers argue “Maybe existence is negative, all things considered” whilst still cracking down on serial killers. The former is necessary for academic freedom, the latter is not.
The point of academic freedom is to ensure that the actions we take in the world are as well-informed as possible. It is not to create a world without any norms at all.
It appears that advocating for slowing AI development has become a “sacred” value… Such reactions frankly resemble the behavior of a cult
Honestly, this is such a lazy critique. Whenever anyone disagrees with a group, they can always dismiss them as a “cult” or “cult-adjacent”, but this doesn’t make it true.
I think Ozzie’s framing of cooperativeness is much more accurate. The unilateralist’s curse very much applies to differential technology development, so if the community wants to have an impact here, it can’t ignore the issue of “cowboys” messing things up by rowing in the opposite direction, especially when their reasoning seems poor. Any viable community, especially one attempting to drive change, needs to have a solution to this problem.
Having norms isn’t equivalent to being a cult. When Fair Trade started taking off, I shared some of my doubts with some people who were very committed to it. This went poorly. They weren’t open-minded at all, but I wouldn’t run around calling Fair Trade a cult or even cult adjacent. They were just… a regular group.
And if I had run around accusing them of essentially being a “cult” that would have reflected poorly on me rather than on them.
I have been publicly labeled a “sellout and traitor”… simply because I cofounded an AI startup
This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such forecasting orgs without having to worry about them going off and doing something like this.
However, this only works if those less worried about AI risks who join such a collaboration don’t use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.
I concede that there wasn’t a previous well-defined norm against this, but norms have to get started somehow. And this is how it happens, someone does something, people are like wtf and then, sometimes, a consensus forms that a norm is required.
Quick thoughts:
1. I think I want to see more dialogue here. I don’t personally like the thought of the Mechanize team and EA splitting apart (at least, more than is already the case). I’d naively expect that there might still be a fair bit of wiggle room for the Mechanize team to do better or worse things in the world, and I’d of course hope for the better side of that. (I think the situation is still very early, for instance.)
2. I find it really difficult to adjudicate on the morality and specifics of the Mechanize spin-off. I don’t know as much about the details as others do. It really isn’t clear to me what the previous funders of Epoch believed or what the conditions of the donations were. I think those details matter in trying to judge the situation.
3. The person you mentioned, Holly Elmore, is really one of the first and one of the loudest to get upset about many things of this sort of shape. I think Holly disagrees with much of the EA scene, but in the opposite way than you/Matthew do. I personally think Holly goes a fair bit too far much of the time. That said, I know there were others who were upset about this who I think better represent the main EA crowd.
4. “The idea that we should decelerate AI development is now sometimes treated as central to the EA identity in many (albeit not all) EA circles.” The way I see it, it’s more a matter of cooperativeness between EA organizations. There are a bunch of smart people and organizations working hard to slow down generic AI development. Out of all the things one could do, there are many useful things to work on other than [directly speeding up AI development]. This is akin to how it would be pretty awkward if there were a group that calls itself EA and tries to fight global population growth by making advertisements attacking GiveWell—it might be the case that they feel they have good reasons for this, but it makes sense to me why some EAs might not be very thrilled. Relatedly, I’ve seen some arguments for longer timelines that make sense to me, but I don’t feel like I’ve seen many arguments in favor of speeding up AI timelines that make sense to me.
I have been publicly labeled a “sellout and traitor” on X by a prominent member of the EA community simply because I cofounded an AI startup.
This accusation was not because you cofounded an AI startup. It was specifically because you took funding to work on AI safety from people who want to ~~slow down AI development~~ use capability trends to better understand how to make AI safer*, and you are now (allegedly) using results developed from that funding to start a company dedicated to accelerating AI capabilities.
I don’t know exactly what results Mechanize is using, but if this is true, then it does indeed constitute a betrayal. Not because you’re accelerating capabilities, but because you took AI safety funding and used the results to do the opposite of what funders wanted.
*Corrected to give a more accurate characterization, see Chris Leong’s comment
“From people who want to slow down AI development”
The framing here could be tighter. It’s more about wanting to be able to understand AI capability trends better without accidentally causing capability externalities.
Yes I think that is better than what I said, both because it’s more accurate, and because it’s more clear that Matthew did in fact use his knowledge of capability trends to decide that he could profit from starting an AI company.
Like, I don’t know what exactly went into his decision, but I would be surprised if that knowledge didn’t play a role.
Arguably that’s less on Matthew and more on the founders of Epoch for either misrepresenting themselves or having a bad hiring filter. Probably the former—if I’m not mistaken, Tamay Besiroglu co-founded Epoch and is now co-founding Mechanize, so I would say Tamay behaved badly here but I’m not sure whether Matthew did.
If this line of reasoning is truly the basis for calling me a “sellout” and a “traitor”, then I think the accusation becomes even more unfounded and misguided. The claim is not only unreasonable: it is also factually incorrect by any straightforward or good-faith interpretation of the facts.
To be absolutely clear: I have never taken funds that were earmarked for slowing down AI development and redirected them toward accelerating AI capabilities. There has been no repurposing or misuse of philanthropic funding that I am aware of. The startup in question is an entirely new and independent entity. It was created from scratch, and it is funded separately—it is not backed by any of the philanthropic donations I received in the past. There is no financial or operational overlap.
Furthermore, we do not plan on meaningfully making use of benchmarks, datasets, or tools that were developed during my previous roles in any substantial capacity at the new startup. We are not relying on that prior work to advance our current mission. And as far as I can tell, we have never claimed or implied otherwise publicly.
It’s also important to address the deeper assumption here: that I am somehow morally or legally obligated to permanently align my actions with the preferences or ideological views of past philanthropic funders who supported an organization that employed me. That notion seems absurd. It has no basis in ordinary social norms, legal standards, or moral expectations. People routinely change roles, perspectives evolve, and institutions have limited scopes and timelines. Holding someone to an indefinite obligation based solely on past philanthropic support would be unreasonable.
Even if, for the sake of argument, such an obligation did exist, it would still not apply in this case—because, unless I am mistaken, the philanthropic grant that supported me as an employee never included any stipulation about slowing down AI in the first place. As far as I know, that goal was never made explicit in the grant terms, which renders the current accusations irrelevant and unfounded.
Ultimately, these criticisms appear unsupported by evidence, logic, or any widely accepted ethical standards. They seem more consistent with a kind of ideological or tribal backlash to the idea of accelerating AI than with genuine, thoughtful, and evidence-based concerns.
It’s also important to address the deeper assumption here: that I am somehow morally or legally obligated to permanently align my actions with the preferences or ideological views of past philanthropic funders who supported an organization that employed me. That notion seems absurd. It has no basis in ordinary social norms, legal standards, or moral expectations. People routinely change roles, perspectives evolve, and institutions have limited scopes and timelines. Holding someone to an indefinite obligation based solely on past philanthropic support would be unreasonable.
I don’t think a lifetime obligation is the steelmanned version of your critics’ narrative, though. A time-limited version will work just as well for them.
In many circumstances, I do think society recognizes a time-limited moral obligation and social norm not to work for the other side from those providing you significant resources,[1] although I am not convinced it would in the specific circumstances involving you and Epoch. So although I would probably acquit you of the alleged norm violation here, I would not want others drawing larger conclusions about the obligation / norm from that acquittal than warranted.[2]
There is something else here, though. At least in the government sector, time-limited post-employment restrictions are not uncommon. They are intended to avoid the appearance of impropriety as much as actual impropriety itself. In those cases, we don’t trust the departing employee not to use their prior public service for private gain in certain ways. Moreover, we recognize that even the appearance that they are doing so creates social costs. The AIS community generally can’t establish and enforce legally binding post-employment restrictions, but is of course free to criticize people whose post-employment conduct it finds inappropriate under community standards. (“Traitor” is rather poorly calibrated to those circumstances, but most of the on-Forum criticism has been somewhat more measured than that.)
Although I’d defer to people with subject-matter expertise on whether there is an appearance of impropriety here, [3] I would note that is a significant lower standard for your critics to satisfy than proving actual impropriety. If there’s a close enough fit between your prior employment and new enterprise, that could be enough to establish a rebuttable presumption of an appearance.
For instance, I would consider it shady for a new lawyer to accept a competitive job with Treehuggers (made up organization); gain skill, reputation, and career capital for several years through Treehuggers’ investment of money and mentorship resources; and then use said skill and reputation to jump directly to a position at Big Timber with a big financial upside. I would generally consider anyone who did that as something of . . . well, a traitor and a sellout to Treehuggers and the environmental movement.
This should also not be seen as endorsing your specific defense rationale. For instance, I don’t think an explicit “stipulation about slowing down AI” in grant language would be necessary to create an obligation.
My deference extends to deciding what impropriety means here, but “meaningfully making use of benchmarks, datasets, or tools that were developed during [your] previous roles” in a way that was substantially assisted by your previous roles sounds like a plausible first draft of at least one form of impropriety.
This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such forecasting orgs without having to worry about them going off and doing something like this.
However, this only works if those less worried about AI risks who join such a collaboration don’t use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.
I agree that Michael’s framing doesn’t quite work. It’s not even clear to me that OpenPhil, for example, is aiming to “slow down AI development” as opposed to “fund research into understanding AI capability trends better without accidentally causing capability externalities”.
I’ve previously written a critique here, but the TLDR is that Mechanise is a major burning of the commons that damages trust within the Effective Altruism community and creates a major challenge for funders who want to support ideological diversity in forecasting organisations without accidentally causing capability externalities.
Furthermore, we do not plan on meaningfully making use of benchmarks, datasets, or tools that were developed during my previous roles in any substantial capacity at the new startup. We are not relying on that prior work to advance our current mission. And as far as I can tell, we have never claimed or implied otherwise publicly.
This is a useful clarification. I had a weak impression that Mechanise might be doing so.
They seem more consistent with a kind of ideological or tribal backlash to the idea of accelerating AI than with genuine, thoughtful, and evidence-based concerns.
I agree that some of your critics may not have quite been able to hit the nail on the head when they tried to articulate their critiques (it took me substantial effort to figure out what I precisely thought was wrong, as opposed to just ‘this feels bad’), but I believe that the general thrust of their arguments more or less holds up.
In context, this comes across to me as an overly charitable characterization of what actually occurred: someone publicly labeled me a literal traitor and then made a baseless, false accusation against me. What’s even more concerning is that this unfounded claim is now apparently being repeated and upvoted by others.
When communities choose to excuse or downplay this kind of behavior—by interpreting it in the most charitable possible way, or by glossing over it as being “essentially correct”—they end up legitimizing what is, in fact, a low-effort personal attack without a factual basis. Brushing aside or downplaying such attacks as if they are somehow valid or acceptable doesn’t just misrepresent the situation; it actively undermines the conditions necessary for good faith engagement and genuine truth-seeking.
I urge you to recognize that tolerating or rationalizing this type of behavior has real social consequences. It fosters a hostile environment, discourages honest dialogue, and ultimately corrodes the integrity of any community that claims to value fairness and reasoned discussion.
I think Holly just said what a lot of people were feeling and I find that hard to condemn.
“Traitor” is a bit of a strong term, but it’s pretty natural for burning the commons to result in significantly less trust. To be honest, the main reason why I wouldn’t use that term myself is that it reifies individual actions into a permanent personal characteristic and I don’t have the context to make any such judgments. I’d be quite comfortable with saying that founding Mechanise was a betrayal of sorts, where the “of sorts” clarifies that I’m construing the term broadly.
Glossing over it as being “essentially correct”
This characterisation doesn’t quite match what happened. My comment wasn’t along the lines of “Oh, it’s essentially correct, close enough is good enough, details are unimportant”; I actually wrote down what I thought a more careful analysis would look like.
They end up legitimizing what is, in fact, a low-effort personal attack without a factual basis
Part of the reason why I’ve been commenting is to encourage folks to make more precise critiques. And indeed, Michael has updated his previous comment in response to what I wrote.
A baseless, false accusation
Is it baseless?
I noticed you wrote: “we do not plan on meaningfully making use”. That provides you with substantial wriggle room. So it’s unclear to me at this stage that your statements being true/defensible would necessitate her statements being false.
Yes, absolutely. With respect, unless you can provide some evidence indicating that I’ve acted improperly, I see no productive reason to continue engaging on this point.
What concerns me most here is that the accusation seems to be treated as credible despite no evidence being presented and a clear denial from me. That pattern—assuming accusations about individuals who criticize or act against core dogmas are true without evidence—is precisely the kind of cult-like behavior I referenced in my original comment.
Suggesting that I’ve left myself “substantial wiggle room” misinterprets what I intended, and given the lack of supporting evidence, it feels unfair and unnecessarily adversarial. Repeatedly implying that I’ve acted improperly without concrete substantiation does not reflect a good-faith approach to discussion.
If you don’t want to engage, that’s perfectly fine. I’ve written a lot of comments and responding to all of them would take substantial time. It wouldn’t be fair to expect that from you.
That said, labelling a request for clarification as “cult-like behaviour” is absurd. On the contrary, not naively taking claims at face value is a crucial defence against this. Furthermore, implying that someone asking questions is doing so in bad faith is precisely the technique that cult leaders use[1].
I said that the statement left you substantial wiggle room. This was purely a comment about how the statement could have a broad range of interpretations. I did not state, nor mean to imply, that this vagueness was intentional or in bad faith.
That said, people do ask questions in bad faith fairly often, so you can’t assume that something is a cult just because its members say that their critics are mostly acting in bad faith.
To be clear, I was not calling your request for clarification “cult-like”. My comment was directed at how the accusation against me was seemingly handled—as though it were credible until I could somehow prove otherwise. No evidence was offered to support the claim. Instead, assertions were made without substantiation. I directly and clearly denied the accusations, but despite that, the line of questioning continued in a way that strongly suggested the accusation might still be valid.
To illustrate the issue more clearly: imagine if I were to accuse you of something completely baseless, and even after your firm denials, I continued to press you with questions that implicitly treated the accusation as credible. You would likely find that approach deeply frustrating and unfair, and understandably so. You’d be entirely justified in pushing back against it.
That said, I acknowledge that describing the behavior as “cult-like” may have generated more heat than light. It likely escalated the tone unnecessarily, and I’ll be more careful to avoid that kind of rhetoric going forward.
I can see why you’d find this personally frustrating.
On the other hand, many people in the community, myself included, took certain claims from OpenAI and SBF at face value when it might have been more prudent to be less trusting. I understand that it must be unpleasant to face some degree of distrust due to the actions of others.
And I can see why you’d see your statements as a firm denial, whilst from my perspective, they were ambiguous. For example, I don’t know how to interpret your use of the word “meaningful”, so I don’t actually know what exactly you’ve denied. It may be clear to you because you know what you mean, but it isn’t clear to me.
(For what it’s worth, I neither upvoted nor downvoted the comment you made before this one, but I did disagree vote it.)
I’m a 36-year-old iOS engineer/software engineer who switched to working on image classification systems via TensorFlow a year ago. Last month I was made redundant, with a fairly generous severance package and a good buffer of savings to get me by while unemployed.
The risky step I had long considered of quitting my non-impactful job was taken for me. I’m hoping to capitalize on my free time by determining what career path to take that best fits my goals. I’m pretty excited about it.
I created a weighted factor model to figure out what projects or learning to take on first. I welcome feedback on it. There’s also a schedule tab for how I’m planning to spend my time this year and a template if anyone wishes to use this spreadsheet themselves.
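(If it helps to see the mechanics, a weighted factor model just multiplies each option’s factor ratings by the factor weights and sums them. Here’s a toy sketch in Python; the factors, weights, and options are invented for illustration and aren’t the ones in my spreadsheet.)

```python
# Toy weighted factor model: illustrative factors/weights/options only.
factors = {"impact": 0.4, "personal_fit": 0.3, "learning_value": 0.2, "fun": 0.1}

options = {
    "AI safety upskilling": {"impact": 8, "personal_fit": 6, "learning_value": 9, "fun": 7},
    "Open-source ML tooling": {"impact": 6, "personal_fit": 8, "learning_value": 7, "fun": 8},
    "iOS contracting": {"impact": 3, "personal_fit": 9, "learning_value": 4, "fun": 5},
}

def weighted_score(ratings: dict) -> float:
    # Weighted sum of 0-10 ratings for one option.
    return sum(factors[f] * ratings[f] for f in factors)

# Rank options from highest to lowest weighted score.
for name, ratings in sorted(options.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(ratings):.1f}")
```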
I got feedback from my 80,000 Hours advisor to get involved in EA communities more often. I also want to learn more publicly, be it via forums or by blogging. This somewhat unstructured dumping of my thoughts is a first step towards that.
Out of that list I’d guess that the fourth and fifth (depending on topics) bullets are most suitable for the Forum.
The basic way I’d differentiate content is that the Forum frontpage should all be content that is related to the project of effective altruism, the community section is about EA as a community (i.e. if you were into AI Safety but not EA, you wouldn’t be interested in the community section), and “personal blog” (i.e. not visible on frontpage) is the section for everything that isn’t in those categories. For example posts on “Miscellaneous topics such as productivity and ADD” would probably be moved to personal blog, unless they were strongly related to EA. This doesn’t mean the content isn’t good—lots of EAs read productivity content, but ideally, the Forum should be focused on EA priorities rather than what EAs find interesting.
Feel free to message me with specific ideas that I could help categorise for you! And if in doubt, quick-takes are much more loose and you can post stuff like the bi-weekly updates there to gauge interest.
In a galactic civilisation of thousands of independent and technologically advanced colonies, what is the probability that one of those colonies will create trillions of suffering digital sentient beings? (probably near 100% if digital sentience is possible… it only takes one)
Is it possible to create a governance structure that would prevent any person in a whole galactic civilisation from creating digital sentience capable of suffering? (sounds really hard especially given the huge distances and potential time delays in messaging… no idea)
What is the point of no-return where a domino is knocked over that inevitably leads to self-perpetuating human expansion and the creation of galactic civilisation? (somewhere around a self-sustaining civilisation on Mars I think).
If the answer to question 3 is “Mars colony”, then it’s possible that creating a colony on Mars is a huge s-risk if we don’t first answer question 2.
Looks like Mechanize is choosing to be even more irresponsible than we previously thought. They’re going straight for automating software engineering. Would love to hear their explanation for this.
Some useful context is that I think a software singularity is unlikely to occur; see this blog post for some arguments. Loosely speaking, under the view expressed in the linked blog post, there aren’t extremely large gains from automating software engineering tasks beyond the fact that these tasks represent a significant (and growing) fraction of white collar labor by wage bill.
Even if I thought a software singularity will likely happen in the future, I don’t think this type of work would be bad in expectation, as I continue to think that accelerating AI is likely good for the world. My main argument is that speeding up AI development will hasten large medical, technological, and economic benefits to people alive today, without predictably causing long-term harms large enough to outweigh these clear benefits. For anyone curious about my views, I’ve explained my perspective on this issue at length on this forum and elsewhere.
Note: Matthew’s comment was at negative karma just now. Please don’t vote it into the negative; use the disagree button instead. Even though I don’t think Matthew’s defense is persuasive, it deserves to be heard.
I wrote a critique of that article here. TLDR: “It has some strong analysis at points, but unfortunately, it’s undermined by some poor choices of framing/focus that mean most readers will probably leave more confused than when they came”.
“A software singularity is unlikely to occur”
Unlikely enough that you’re willing to bet the house on it? Feels like you’re picking up pennies in front of a steamroller.
I continue to think that accelerating AI is likely good for the world
AI is already going incredibly fast. Why would you want to throw more fuel on the fire?
Is it that you honestly think AI is moving too slow at the moment (no offense, but seems crazy to me) or is your worry that current trends are misleading and AI might slow in the future?
Regarding the latter, I agree that once timelines start to get sufficiently long, there might actually be an argument for accelerating them (but in order to reach AGI before biotech causes a catastrophe, rather than the more myopic reasons you’ve provided). But if your worry is stagnation, why not actually wait until things appear to have stalled and then perhaps consider doing something like this?
Or why didn’t you just stay at Epoch, which was a much more robust and less fragile theory of action? (Okay, I don’t actually think articles like this are high enough quality to be net-positive, but you were 90% of the way towards having written a really good article. The framing/argument just needed to be a bit tighter, which could have been achieved with another round of revisions).
The main reason not to wait is… missing the opportunity to cash in on the current AI boom.
I bet the strategic analysis for Mechanize being a good choice (net-positive and positive relative to alternatives) is paper-thin, even given his rough world view.
Might be true, doesn’t make that not a strawman. I’m sympathetic to thinking it’s implausible that mechanize would be the best thing to do on altruistic grounds even if you share views like those of the founders. (Because there is probably something more leveraged to do and some weight on cooperativeness considerations.)
Sometimes the dollar signs can blind someone and cause them not to consider obvious alternatives. And they will feel that they made the decision for reasons other than the money, but the money nonetheless caused the cognitive distortion that ultimately led to the decision.
I’m not claiming that this happened here. I don’t have any way of really knowing. But it’s certainly suspicious. And I don’t think anything is gained by pretending that it’s not.
As part of MATS’ compensation reevaluation project, I scraped the publicly declared employee compensations from ProPublica’s Nonprofit Explorer for many AI safety and EA organizations (data here) in 2019-2023. US nonprofits are required to disclose compensation information for certain highly paid employees and contractors on their annual Form 990 tax return, which becomes publicly available. This includes compensation for officers, directors, trustees, key employees, and highest compensated employees earning over $100k annually. Therefore, my data does not include many individuals earning under $100k, but this doesn’t seem to affect the yearly medians much, as the data seems to follow a lognormal distribution, with mode ~$178k in 2023, for example.
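For anyone who wants to reproduce the aggregation, here is a minimal sketch of the kind of computation involved; the filename and column names below are placeholders rather than the actual schema of the linked sheet.

```python
# Sketch of the aggregation step; "org", "year", "compensation" are assumed column names.
import numpy as np
import pandas as pd

df = pd.read_csv("ai_safety_ea_compensation.csv")  # placeholder filename
df = df[df["compensation"] >= 100_000]             # Form 990 reporting threshold

# Yearly medians, which should be fairly robust to the missing sub-$100k rows.
print(df.groupby("year")["compensation"].median())

# Rough lognormal check for one year: log-compensation should look roughly normal,
# and the implied mode is exp(mu - sigma^2).
log_comp = np.log(df.loc[df["year"] == 2023, "compensation"])
mode = np.exp(log_comp.mean() - log_comp.std() ** 2)
print(f"2023 implied mode: ~${mode:,.0f}")
```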
I generally found that AI safety and EA organization employees are highly compensated, albeit inconsistently between similar-sized organizations within equivalent roles (e.g., Redwood and FAR AI). I speculate that this is primarily due to differences in organization funding, but inconsistent compensation policies may also play a role.
I’m sharing this data to promote healthy and fair compensation policies across the ecosystem. I believe that MATS salaries are quite fair and reasonably competitive after our recent salary reevaluation, where we also used Payfactors HR market data for comparison. If anyone wants to do a more detailed study of the data, I highly encourage this!
I decided to exclude OpenAI’s nonprofit salaries as I didn’t think they counted as an “AI safety nonprofit” and their highest paid current employees are definitely employed by the LLC. I decided to include Open Philanthropy’s nonprofit employees, despite the fact that their most highly compensated employees are likely those under the Open Philanthropy LLC.
I guess orgs need to be more careful about who they hire as forecasting/evals researchers in light of a recently announced startup.
Sometimes things happen, but three people at the same org...
This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such forecasting orgs without having to worry about them going off and doing something like this.
However, this only works if those less worried about AI risks who join such a collaboration don’t use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.
Now let’s suppose you’re an x-risk funder considering whether to fund their previous org. This org does really high-quality work, but the argument for them being net-positive is now significantly weaker. This is quite likely to make finding future funding harder for them.
This is less about attacking those three folks and more just noting that we need to strive to avoid situations where things like this happen in the first place. This requires us to be more careful in terms of who gets hired.
There have been some discussions on the EA Forum along the lines of “why do we care about value alignment; shouldn’t we just hire whoever can best do the job?”. My answer to that is that it’s myopic to only consider what happens whilst they’re working for you. Hiring someone or offering them an opportunity empowers them; you need to consider whether they’re someone who you want to empower[1].
Admittedly, this isn’t quite the same as value alignment. Suppose someone were diligent, honest, wise and responsible. You might want to empower them even if their views were extremely different from yours. Stronger: even if their views were the opposite in many ways. But in the absence of this, value alignment matters.
If you only hire people who you believe are intellectually committed to short AGI timelines (and who won’t change their minds given exposure to new evidence and analysis) to work in AGI forecasting, how can you do good AGI forecasting?
One of the co-founders of Mechanize, who formerly worked at Epoch AI, says he thinks AGI is 30 to 40 years away. That was in this video from a few weeks ago on Epoch AI’s YouTube channel.
He and one of his co-founders at Mechanize were recently on Dwarkesh Patel’s podcast (note: Dwarkesh Patel is an investor in Mechanize) and I didn’t watch all of it but it seemed like they were both arguing for longer AGI timelines than Dwarkesh believes in.
I also disagree with the shortest AGI timelines and found it refreshing that within the bubble of people who are fixated on near-term AGI, at least a few people expressed a different view.
I think if you restrict who you hire to do AGI forecasting based on strong agreement with a predetermined set of views, such as short AGI timelines and views on AGI alignment and safety, then you will just produce forecasts that re-state the views you already decided were the correct ones while you were hiring.
I wasn’t suggesting only hiring people who believe in short-timelines. I believe that my original post adequately lays out my position, but if any points are ambiguous, feel free to request clarification.
I don’t know how Epoch AI can both “hire people with a diversity of viewpoints in order to counter bias” and ensure that your former employees won’t try to “cash in on the AI boom in an acceleratory way”. These seem like incompatible goals.
I think Epoch has to either:
Accept that people have different views and will have different ideas about what actions are ethical, e.g., they may view creating an AI startup focused on automating labour as helpful to the world and benign
or
Only hire people who believe in short AGI timelines and high AGI risk and, as a result, bias its forecasts towards those conclusions
Presumably there are at least some people who have long timelines, but also believe in high risk and don’t want to speed things up. Or people who are unsure about timelines, but think risk is high whenever it happens. Or people (like me) who think X-risk is low* and timelines very unclear, but even a very low X-risk is very bad. (By very low, I mean like at least 1 in 1000, not 1 in 1x10^17 or something. I agree it is probably bad to use expected value reasoning with probabilities as low as that.)
I think you are pointing at a real tension though. But maybe try to see it a bit from the point of view of people who think X-risk is real enough and raised enough by acceleration that acceleration is bad. It’s hardly going to escape their notice that projects at least somewhat framed as reducing X-risk often end up pushing capabilities forward. They don’t have to be raging dogmatists to worry about this happening again, and it’s reasonable for them to balance this risk against risks of echo chambers when hiring people or funding projects.
*I’m less sure that merely catastrophic biorisk from human misuse is low, sadly.
Why don’t we ask ChatGPT? (In case you’re wondering, I’ve read every word of this answer and I fully endorse it, though I think there are better analogies than the journalism example ChatGPT used).
Hopefully, this clarifies a possible third option (one that my original answer was pointing at).
I think there is a third option, though it’s messy and imperfect. The third option is to:
Maintain epistemic pluralism at the level of research methods and internal debate, while being selective about value alignment on key downstream behaviors.
In other words:
You hire researchers with a range of views on timelines, takeoff speeds, and economic impacts, so long as they are capable of good-faith engagement and epistemic humility.
But you also have clear social norms, incentives, and possibly contractual commitments around what counts as harmful conflict of interest — e.g., spinning out an acceleratory startup that would directly undermine the mission of your forecasting work.
This requires drawing a distinction between research belief diversity and behavioral alignment on high-stakes actions. That’s tricky! But it’s not obviously incoherent.
The key mechanism that makes this possible (if it is possible) is something like:
“We don’t need everyone to agree on the odds of doom or the value of AGI automation in theory. But we do need shared clarity on what types of action would constitute a betrayal of the mission or a dangerous misuse of privileged information.”
So you can imagine hiring someone who thinks timelines are long and AGI risk is overblown but who is fully on board with the idea that, given the stakes, forecasting institutions should err on the side of caution in their affiliations and activities.
This is analogous to how, say, journalists might disagree about political philosophy but still share norms about not taking bribes from the subjects they cover.
Caveats and Challenges:
Enforceability is hard. Noncompetes are legally dubious in many jurisdictions, and “cash in on the AI boom” is vague enough that edge cases will be messy. But social signaling and community reputation mechanisms can still do a lot of work here.
Self-selection pressure remains. Even if you say you’re open to diverse views, the perception that Epoch is “aligned with x-risk EAs” might still screen out applicants who don’t buy the core premises. So you risk de facto ideological clustering unless you actively fight against that.
Forecasting bias could still creep in via mission alignment filtering. Even if you welcome researchers with divergent beliefs, if the only people willing to comply with your behavioral norms are those who already lean toward the doomier end of the spectrum, your epistemic diversity might still collapse in practice.
Summary:
The third option is:
Hire for epistemic virtue, not belief conformity, while maintaining strict behavioral norms around acceleratory conflict of interest.
It’s not a magic solution — it requires constant maintenance, good hiring processes, and clarity about the boundaries between “intellectual disagreement” and “mission betrayal.” But I think it’s at least plausible as a way to square the circle.
So, you want to try to lock in AI forecasters to onerous and probably illegal contracts that forbid them from founding an AI startup after leaving the forecasting organization? Who would sign such a contract? This is even worse than only hiring people who are intellectually pre-committed to certain AI forecasts. Because it goes beyond a verbal affirmation of their beliefs to actually attempting to legally force them to comply with the (putative) ethical implications of certain AI forecasts.
If the suggestion is simply promoting “social norms” against starting AI startups, well, that social norm already exists to some extent in this community, as evidenced by the response on the EA Forum. But if the norm is too weak, it won’t prevent the undesired outcome (the creation of an AI startup), and if the norm is too strong, I don’t see how it doesn’t end up selecting forecasters for intellectual conformity. Because non-conformists would not want to go along with such a norm (just like they wouldn’t want to sign a contract telling them what they can and can’t do after they leave the forecasting company).
I agree that we need to be careful about who we are empowering.
“Value alignment” is one of those terms which has different meanings to different people. For example, the top hit I got on Google for “effective altruism value alignment” was a ConcernedEAs post which may not reflect what you mean by the term. Without knowing exactly what you mean, I’d hazard a guess that some facets of value alignment are pretty relevant to mitigating this kind of risk, and other facets are not so important. Moreover, I think some of the key factors are less cognitive or philosophical than emotional or motivational (e.g., a strong attraction toward money will increase the risk of defecting, a lack of self-awareness increases the risk of motivated reasoning toward goals one has in a sense repressed).
So, I think it would be helpful for orgs to consider what elements of “value alignment” are of particular importance here, as well as what other risk or protective factors might exist outside of value alignment, and focus on those specific things.
Also, it is worrying if the optimists easily find financial opportunities that depend on them not changing their minds. Even if they are honest and have the best of intentions, the disparity in returns to optimism is epistemically toxic.
I’d like to suggest a little bit more clarity here. The phrases you use refer to some knowledge that isn’t explicitly stated here. “in light of a recently announced startup” and “three people at the same org” make sense to someone who already knows the context of what you are writing about, but it is confusing to a reader who doesn’t have the same background knowledge that you do.
Once upon a time, some people were arguing that AI might kill everyone, and EA resources should address that problem instead of fighting Malaria.
So OpenPhil poured millions of dollars into orgs such as EpochAI (they got 9 million).
Now 3 people from EpochAI created a startup to provide training data to help AI replace human workers.
Some people are worried that this startup increases AI capabilities, and therefore increases the chance that AI will kill everyone.
100 percent agree. I don’t understand the entire post because I don’t know the context. I don’t think alluding to something helps; better to say it explicitly.
So, I have two possible projects for AI alignment work that I’m debating between focusing on. Am curious for input into how worthwhile they’d be to pursue or follow up on.
The first is a mechanistic interpretability project. I have previously explored things like truth probes by reproducing the Marks and Tegmark paper and extending it to test whether a cosine similarity based linear classifier works as well. It does, but not any better or worse than the difference of means method from that paper. Unlike difference of means, however, it can be extended to multi-class situations (though logistic regression can be as well). I was thinking of extending the idea to try to create an activation vector based “mind reader” that calculates the cosine similarity with various words embedded in the model’s activation space. This would, if it works, allow you to get a bag of words that the model is “thinking” about at any given time.
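For concreteness, here is a minimal sketch of the kind of “mind reader” I have in mind, assuming a Hugging Face causal LM. The model name, layer index, and prompt are placeholder assumptions, and a real version would presumably use probe-derived directions rather than raw token embeddings.

```python
# Sketch: compare a hidden-state activation against every token embedding by cosine
# similarity and read off the closest tokens as a crude "bag of words".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM with accessible hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

inputs = tok("The Eiffel Tower is located in", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

act = out.hidden_states[6][0, -1]          # mid-layer activation at the last position
emb = model.get_input_embeddings().weight  # (vocab_size, d_model)

# Cosine similarity between the activation and every token embedding, then top-k.
sims = torch.nn.functional.cosine_similarity(act.unsqueeze(0), emb, dim=-1)
top = torch.topk(sims, k=10)
print([tok.decode([i]) for i in top.indices.tolist()])
```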
The second project is a less common game-theoretic approach. Earlier, I created a variant of the Iterated Prisoner’s Dilemma as a simulation that includes death, asymmetric power, and aggressor reputation. I found, interestingly, that cooperative “nice” strategies banding together against aggressive “nasty” strategies produced an equilibrium where the cooperative strategies win out in the long run, generally outnumbering the aggressive ones considerably by the end. Although this simulation probably requires more analysis and testing in more complex environments, it seems to point to the idea that being consistently nice to weaker nice agents acts as a signal to more powerful nice agents and allows coordination that increases the chance of survival of all the nice agents, whereas being nasty leads to a winner-takes-all highlander situation. From an alignment perspective, this could be a kind of infoblessing, in that an AGI or ASI could be persuaded to spare humanity for these game-theoretic reasons.
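To give a flavour of the setup, here is a heavily stripped-down sketch of such a tournament. Every payoff, parameter, and strategy rule below is an illustrative assumption, not the actual simulation.

```python
# Toy iterated prisoner's dilemma with death, asymmetric power, and aggressor reputation.
import random

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class Agent:
    def __init__(self, name, nice, power):
        self.name, self.nice, self.power = name, nice, power
        self.score = 10.0     # starting "health"; the agent dies if this hits zero
        self.aggressions = 0  # public reputation: times it defected against a cooperator

    def move(self, opponent):
        if self.nice:
            # Nice strategy: cooperate, except against known aggressors.
            return "D" if opponent.aggressions > 2 else "C"
        # Nasty strategy: exploit weaker agents, cooperate with stronger ones.
        return "D" if self.power >= opponent.power else "C"

def play_round(a, b):
    ma, mb = a.move(b), b.move(a)
    pa, pb = PAYOFFS[(ma, mb)]
    # Asymmetric power scales both the gains and the damage a defection inflicts.
    a.score += pa * a.power - (b.power if mb == "D" else 0)
    b.score += pb * b.power - (a.power if ma == "D" else 0)
    if ma == "D" and mb == "C":
        a.aggressions += 1
    if mb == "D" and ma == "C":
        b.aggressions += 1

agents = ([Agent(f"nice{i}", True, random.uniform(0.5, 2.0)) for i in range(10)]
          + [Agent(f"nasty{i}", False, random.uniform(0.5, 2.0)) for i in range(10)])

for _ in range(200):
    random.shuffle(agents)
    for a, b in zip(agents[::2], agents[1::2]):
        play_round(a, b)
    agents = [ag for ag in agents if ag.score > 0]  # death removes agents permanently

print(sum(ag.nice for ag in agents), "nice survivors vs",
      sum(not ag.nice for ag in agents), "nasty survivors")
```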
I’m organizing an EA Summit in Vancouver, BC, for the fall and am looking for ways for our attendees to come away from the event with opportunities to look forward to. Most of our attendees will have Canadian but not US work authorization. Anyone willing to meet potential hires, mentees, research associates, funding applicants, etc., please get in touch!
epistemic status: i timeboxed the below to 30 minutes. it’s been bubbling for a while, but i haven’t spent that much time explicitly thinking about this. i figured it’d be a lot better to share half-baked thoughts than to keep it all in my head — but accordingly, i don’t expect to reflectively endorse all of these points later down the line. i think it’s probably most useful & accurate to view the below as a slice of my emotions, rather than a developed point of view. i’m not very keen on arguing about any of the points below, but if you think you could be useful toward my reflecting processes (or if you think i could be useful toward yours!), i’d prefer that you book a call to chat more over replying in the comments. i do not give you consent to quote my writing in this short-form without also including the entirety of this epistemic status.
1-3 years ago, i was decently involved with EA (helping organize my university EA program, attending EA events, contracting with EA orgs, reading EA content, thinking through EA frames, etc).
i am now a lot less involved in EA.
e.g. i currently attend uc berkeley, and am ~uninvolved in uc berkeley EA
e.g. i haven’t attended a casual EA social in a long time, and i notice myself ughing in response to invites to explicitly-EA socials
e.g. i think through impact-maximization frames with a lot more care & wariness, and have plenty of other frames in my toolbox that i use to a greater relative degree than the EA ones
e.g. the orgs i find myself interested in working for seem to do effectively altruistic things by my lights, but seem (at closest) to be EA-community-adjacent and (at furthest) actively antagonistic to the EA community
(to be clear, i still find myself wanting to be altruistic, and wanting to be effective in that process. but i think describing my shift as merely moving a bit away from the community would be underselling the extent to which i’ve also moved a bit away from EA’s frames of thinking.)
why?
a lot of EA seems fake
the stuff — the orientations — the orgs — i’m finding it hard to straightforwardly point at, but it feels kinda easy for me to notice ex-post
there’s been an odd mix of orientations toward [ aiming at a character of transparent/open/clear/etc ] alongside [ taking actions that are strategic/instrumentally useful/best at accomplishing narrow goals… that also happen to be mildly deceptive, or lying by omission, or otherwise somewhat slimy/untrustworthy/etc ]
the thing that really gets me is the combination of an implicit (and sometimes explicit!) request for deep trust alongside a level of trust that doesn’t live up to that expectation.
it’s fine to be in a low-trust environment, and also fine to be in a high-trust environment; it’s not fine to signal one and be the other. my experience of EA has been that people have generally behaved extremely well/with high integrity and with high trust… but not quite as well & as high as what was written on the tin.
for a concrete ex (& note that i totally might be screwing up some of the details here, please don’t index too hard on the specific people/orgs involved): when i was participating in — and then organizing for — brandeis EA, it seemed like our goal was (very roughly speaking) to increase awareness of EA ideas/principles, both via increasing depth & quantity of conversation and via increasing membership. i noticed a lack of action/doing-things-in-the-world, which felt kinda annoying to me… until i became aware that the action was “organizing the group,” and that some of the organizers (and higher up the chain, people at CEA/on the Groups team/at UGAP/etc) believed that most of the impact of university groups comes from recruiting/training organizers — that the “action” i felt was missing wasn’t missing at all, it was just happening to me, not from me. i doubt there was some point where anyone said “oh, and make sure not to tell the people in the club that their value is to be a training ground for the organizers!” — but that’s sorta how it felt, both on the object-level and on the deception-level.
this sort of orientation feels decently representative of the 25th percentile end of what i’m talking about.
also some confusion around ethics/how i should behave given my confusion/etc
importantly, some confusions around how i value things. it feels like looking at the world through an EA frame blinds myself to things that i actually do care about, and blinds myself to the fact that i’m blinding myself. i think it’s taken me awhile to know what that feels like, and i’ve grown to find that blinding & meta-blinding extremely distasteful, and a signal that something’s wrong.
some of this might merely be confusion about orientation, and not ethics — e.g. it might be that in some sense the right doxastic attitude is “EA,” but that the right conative attitude is somewhere closer to (e.g.) “embody your character — be kind, warm, clear-thinking, goofy, loving, wise, [insert more virtues i want to be here]. oh and do some EA on the side, timeboxed & contained, like when you’re donating your yearly pledge money.”
where now?
i’m not sure! i could imagine the pendulum swinging more in either direction, and want to avoid doing any further prediction about where it will swing for fear of that prediction interacting harmfully with a sincere process of reflection.
Thanks for sharing your experiences and reflections here — I really appreciate the thoughtfulness. I want to offer some context on the group organizer situation you described, as someone who was running the university groups program at the time.
On the strategy itself: At the time, our scalable programs were pretty focused on organizers, based on evidence we had seen that much of the impact came from the organizers themselves. We of course did want groups to go well more generally, but in deciding where to put our marginal resources we were focusing on group organizers. It was a fairly unintuitive strategy, and I get how that could feel misaligned or even misleading if it wasn’t clearly communicated.
On communication: We did try to be explicit about this strategy — it was featured at organizer retreats and in parts of our support programming. But we didn’t consistently communicate it across all our materials. That inconsistency was an oversight on our part. Definitely not an attempt to be deceptive — just something that didn’t land as clearly as we hoped.
Where we’re at now: We’ve since updated our approach. The current strategy is less focused narrowly on organizers and more on helping groups be great overall. That said, we still think a lot of the value often comes from a small, highly engaged core — which often includes organizers, but not exclusively.
In retrospect, I wish we’d communicated this more clearly across the board. When a strategy is unintuitive, a few clear statements in a few places often isn’t enough to make it legible. Sorry again if this felt off — I really appreciate you surfacing it.
You go over more details later and answer other questions like what caused some reactions to some EA-related things, but an interesting thing here is that you are looking for an EA-specific cause of something that may not be specific to EA at all.
> it feels like looking at the world through an EA frame blinds myself to things that i actually do care about, and blinds myself to the fact that i’m blinding myself.
I can strongly relate; I had the same experience. I think it’s due to a Christian upbringing or some kind of need for external validation. I think many people don’t experience that, so I wouldn’t say it’s an inherently EA thing; it’s more about the attitude.
Riffing out loud … I feel that there are different dynamics going on here (not necessarily in your case; more in general):
The tensions where people don’t act with as much integrity as is signalled
This is not a new issue for EA (it arises structurally despite a lot of good intentions, because of the encouragement to be strategic), and I think it just needs active cultural resistance
In terms of writing, I like Holden’s and Toby’s pushes on this; my own attempts here and here
But for this to go well, I think it’s not enough to have some essays on reading lists; instead I hope that people try to practice good orientation here at lots of different scales, and socially encourage others to
The meta-blinding
I feel like I haven’t read much on this, but it rings true as a dynamic to be wary of! Where I take the heart of the issue to be that EA presents a strong frame about what “good” means, and then encourages people to engage in ways that make aspects of their thinking subservient to that frame
As someone put it to me, “EA has lost the mandate of heaven”
I think EA used to be (in some circles) the obvious default place for the thoughtful people who cared a lot to gather and collaborate
I think that some good fraction of its value came from performing this role?
Partially as a result of 1 and 2, people are disassociating with EA; and this further reduces the pull to associate
I can’t speak to how strong this effect is overall, but I think the directionality is clear
I don’t know if it’s accessible (and I don’t think I’m well positioned to try), but I still feel a lot of love for the core of EA, and would be excited if people could navigate it to a place where it regained the mandate of heaven.
Most of the problems you mention seem to be about the specific current EA community, as opposed to the main values of “doing a lot of good” and “being smart about doing so.”
Personally, I’m excited for certain altruistic and smart people to leave the EA community, as it suits them, and do good work elsewhere. I’m sure that being part of the community is limiting to certain people, especially if they can find other great communities.
That said, I of course hope you can find ways for the key values of “doing good in the world” and similar to work for you.
I think it might be cool if an AI Safety research organization ran a copy of an open model or something and I could pay them a subscription to use it. That way I’d know my LLM subscription money is going to good AI stuff and not towards AI companies that, on net, I don’t like or want more of.
Idk, existing independent orgs might not be the best place to do this bc it might “damn them” or “corrupt them” over time. Like, this could lead them to “selling out” in a variety of ways you might conceive of.
Still, I guess I am saying that to the extent anyone is going to actually “make money” off of my LLM usage subscriptions, it would be awesome if it were just a cool independent AIS lab I personally liked or similar.
(I don’t really know the margins and unit economics which seems like an important part of this pitch lol).
Like, if “GoodGuy AIS Lab” sets up a little website and inference server (running Qwen or Llama or whatever) then I could pay them the $15-25 a month I may have otherwise paid to an AI company. The selling point would be that less “moral hazard” is better vibes, but probably only some people would care about this at all and it would be a small thing. But also, it’s hardly like a felt sense of moral hazard around AI is a terribly niche issue.
This isn’t the “final form” of this I have in mind necessarily; I enjoy picking at ideas in the space of “what would a good guy AGI project do” or “how can you do neglected AIS / ‘AI go well’ research in a for-profit way”.
I also like the idea of an explicitly fast-follower project for AI capabilities. Like, accelerate safety/security relevant stuff and stay comfortably middle of the pack on everything else. I think improving GUIs is probably fair game too, but not once it starts to shade into scaffolding? I wouldn’t know all of the right lines to draw here, but I really like this vibe.
This might not work well if you expect gaps to widen as RSI becomes a more important input. I would argue that seems too galaxy brained given that, as of writing, we do live in a world with a lot of mediocre AI companies that I believe can all provide products of ~comparable quality.
It is also just kind of a bet that in practice it is probably going to remain a lot less expensive to stay a little behind the frontier than to be at the frontier. And that, in practice, it may continue to not matter in a lot of cases.
fwiw I think you shouldn’t worry about paying $20/month to an evil company to improve your productivity, and if you want to offset it I think a $10/year donation to LTFF would more than suffice.
Can you say more on why you think a 1:24 ratio is the right one (as opposed to lower or higher ratios)? And how might this ratio differ for people who have different beliefs than you, for example about xrisk, LTFF, or the evilness of these companies?
I haven’t really thought about it and I’m not going to. If I wanted to be more precise, I’d assume that a $20 subscription is equivalent (to a company) to finding a $20 bill on the ground, assume that an ε% increase in spending on safety cancels out an ε% increase in spending on capabilities (or think about it and pick a different ratio), and look at money currently spent on safety vs capabilities. I don’t think P(doom) or company-evilness is a big crux.
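If you want to see the shape of that calculation, here is a sketch; both spending figures are made-up placeholders purely for illustration, not estimates I’d stand behind.

```python
# Sketch of the offset arithmetic; the spending figures are illustrative assumptions only.
subscription_per_year = 20 * 12  # $/year paid to the AI company
capabilities_spend = 200e9       # assumed global capabilities spending, $/year
safety_spend = 0.5e9             # assumed global safety spending, $/year

# If an x% boost to safety cancels an x% boost to capabilities, the offsetting
# donation is the subscription scaled by the safety:capabilities spending ratio.
offset_donation = subscription_per_year * (safety_spend / capabilities_spend)
print(f"${offset_donation:.2f}/year to safety offsets the subscription")
```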
Alternative idea: AI companies should have a little checkbox saying “Please use 100% of the revenue from my subscription to fund safety research only.” This avoids some of the problems with your idea and also introduces some new problems.
I think there is a non-infinitesimal chance that Anthropic would actually implement this.
Ya, maybe. This concern/way of thinking just seems kind of niche. Probably only a very small demographic who overlaps with me here. So I guess I wouldn’t expect it to be a consequential amount of money to eg. Anthropic or OpenAI.
That check box would be really cool though. It might ease friction / dissonance for people who buy into high p(doom) or relatively non-accelerationist perspectives. My views are not representative of anyone, but me, but a checkbox like that would be a killer feature for me and certainly win my $20/mo :) . And maybe, y’know, all 100 people or whatever who would care and see it that way.
Mini Forum update: Draft comments, and polls in comments
Draft comments
You can now save comments as permanent drafts:
After saving, the draft will appear for you to edit:
1. In-place if it’s a reply to another comment (as above)
2. In a “Draft comments” section under the comment box on the post
3. In the drafts section of your profile
The reasons we think this will be useful:
For writing long, substantive comments (and quick takes!). We think these are some of the most valuable comments on the Forum, and want to encourage more of them
For starting a comment on mobile and then later continuing on desktop
To lower the barrier to starting writing a comment, since you know you can always throw it in drafts and then never look at it again
Polls in comments
We recently added the ability to put polls in posts, and this was fairly well received, so we’re adding it to comments (… and quick takes!) as well.
You can add a poll from the toolbar; you just need to highlight a bit of text to make the toolbar appear.
A summary of my current views on moral theory and the value of AI
I am essentially a preference utilitarian and an illusionist regarding consciousness. This combination of views leads me to conclude that future AIs will very likely have moral value if they develop into complex agents capable of long-term planning, and are embedded within the real world. I think such AIs would have value even if their preferences look bizarre or meaningless to humans, as what matters to me is not the content of their preferences but rather the complexity and nature of their minds.
When deciding whether to attribute moral patienthood to something, my focus lies primarily on observable traits, cognitive sophistication, and most importantly, the presence of clear open-ended goal-directed behavior, rather than on speculative or less observable notions of AI welfare, about which I am more skeptical. As a rough approximation, my moral theory aligns fairly well with what is implicitly proposed by modern economists, who talk about revealed preferences and consumer welfare.
Like most preference utilitarians, I believe that value is ultimately subjective: loosely speaking, nothing has inherent value except insofar as it reflects a state of affairs that aligns with someone’s preferences. As a consequence, I am comfortable, at least in principle, with a wide variety of possible value systems and future outcomes. This means that I think a universe made of only paperclips could have value, but only if that’s what preference-having beings wanted the universe to be made out of.
To be clear, I also think existing people have value too, so this isn’t an argument for blind successionism. Also, it would be dishonest not to admit that I am also selfish to a significant degree (along with almost everyone else on Earth). What I have just described simply reflects my broad moral intuitions about what has value in our world from an impartial point of view, not a prescription that we should tile the universe with paperclips. Since humans and animals are currently the main preference-having beings in the world, at the moment I care most about fulfilling what they want the world to be like.
I’m relatively confident in these views, with the caveat that much of what I just expressed concerns morality, rather than epistemic beliefs about the world. I’m not a moral realist, so I am not quite sure how to parse my “confidence” in moral views.
From an antirealist perspective, at least on the ‘idealizing subjectivism’ form of antirealism, moral uncertainty can be understood as uncertainty about the result of an idealization process. Under this view, there exists some function that takes your current, naive values as input and produces idealized values as output—and your moral uncertainty is uncertainty about the output.
I agree that this sort of preference utilitarianism leads you to thinking that long run control by an AI which just wants paperclips could be some (substantial) amount good, but I think you’d still have strong preferences over different worlds.[1] The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities. (And if you agreed with this claim about large variation, then I don’t think you would focus on the fact that the paperclipper world is some small amount good as this wouldn’t be an important consideration—at least insofar as you don’t also expect that worlds where humans etc retain control are similarly a tiny amount good for similar reasons.)
The main reasons preference utilitarianism is more picky:
Preferences in the multiverse: Insofar as you put weight on the preferences of beings outside our lightcone (beings in the broader spatially infinite universe, Everett branches, the broader mathematical multiverse to the extent you put weight on this), the preferences of these beings will sometimes concern what happens in our lightcone, and this could easily dominate (as they are vastly more numerous and many might care about things independent of “distance”). In the world with the successful paperclipper, just as many preferences aren’t being fulfilled. You’d strongly prefer optimization to satisfy as many preferences as possible (weighted as you end up thinking is best).
Instrumentally constructed AIs with unsatisfied preferences: If future AIs don’t care at all about preference utilitarianism, they might instrumentally build other AIs whose preferences aren’t fulfilled. As an extreme example, it might be that the best strategy for a paperclipper is to construct AIs which have very different preferences and are enslaved. Even if you don’t care about ensuring beings come into existence whose preferences are satisfied, you might still be unhappy about creating huge numbers of beings whose preferences aren’t satisfied. You could even end up in a world where (nearly) all currently existing AIs are instrumental and have preferences which are either unfulfilled or only partially fulfilled (an earlier AI initiated a system that perpetuates this, but this earlier AI no longer exists as it doesn’t care terminally about self-preservation and the system it built is more efficient than it).
AI inequality: It might be the case that the vast majority of AIs have their preferences unsatisfied despite some AIs succeeding at achieving their preferences. E.g., suppose all AIs are replicators which want to spawn as many copies as possible. The vast majority of these replicator AIs are operating at subsistence and so can’t replicate, making their preferences totally unsatisfied. This could also happen as a result of any other preference that involves constructing minds that end up having preferences.
Weights over numbers of beings and how satisfied they are: It’s possible that in a paperclipper world, there are really a tiny number of intelligent beings because almost all self-replication and paperclip construction can be automated with very dumb/weak systems and you only occasionally need to consult something smarter than a honeybee. AIs could also vary in how much they are satisfied or how “big” their preferences are.
I think the only view which recovers indifference is something like “as long as stuff gets used and someone wanted this at some point, that’s just as good”. (This view also doesn’t actually care about stuff getting used, because there is someone existing who’d prefer the universe stays natural and/or you don’t mess with aliens.) I don’t think you buy this view?
To be clear, it’s not immediately obvious whether a preference utilitarian view like the one you’re talking about favors human control over AIs. It certainly favors control by that exact flavor of preference utilitarian view (so that you end up satisfying people across the (multi-/uni-)verse with the correct weighting). I’d guess it favors human control for broadly similar reasons to why I think more experience-focused utilitarian views also favor human control if that view is in a human.
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impacts current human actions could have are small given how much you weight long term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
(This comment is copied over from LW, responding to a copy of Matthew’s comment there.)
On my best guess moral views, I think there is goodness in the paper clipper universe but this goodness (which isn’t from (acausal) trade) is very small relative to how good the universe can plausibly get. So, this just isn’t an important consideration but I certainly agree there is some value here.
The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities.
I don’t think I agree with the strong version of the indifference view that you’re describing here. However, I probably do agree with a weaker version. In the weaker version that I largely agree with, our profound uncertainty about the long-term future means that, although different possible futures could indeed be extremely different in terms of their value, our limited ability to accurately predict or forecast outcomes so far ahead implies that, in practice, we shouldn’t overly emphasize these differences when making almost all ordinary decisions.
This doesn’t mean I think we should completely ignore the considerations you mentioned in your comment, but it does mean that I don’t tend to find those considerations particularly salient when deciding whether to work on certain types of AI research and development.
This reasoning is similar to why I try to be kind to people around me: while it’s theoretically possible that some galaxy-brained argument might exist showing that being extremely rude to people around me could ultimately lead to far better long-term outcomes that dramatically outweigh the short-term harm, in practice, it’s too difficult to reliably evaluate such abstract and distant possibilities. Therefore, I find it more practical to focus on immediate, clear, and direct considerations, like the straightforward fact that being kind is beneficial to the people I’m interacting with.
This puts me perhaps closest to the position you identified in the last paragraph:
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impacts current human actions could have are small given how much you weight long term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
Here’s an analogy that could help clarify my view: suppose we were talking about the risks of speeding up research into human genetic engineering or human cloning. In that case, I would still seriously consider speculative moral risks arising from the technology. For instance, I think it’s possible that genetically enhanced humans could coordinate to oppress or even eliminate natural unmodified humans, perhaps similar to the situation depicted in the movie GATTACA. Such scenarios could potentially have enormous long-term implications under my moral framework, even if it’s not immediately obvious what those implications might actually be.
However, even though these speculative risks are plausible and seem important to take into account, I’m hesitant to prioritize their (arguably very speculative) impacts above more practical and direct considerations when deciding whether to pursue such technologies. This is true even though it’s highly plausible that the long-run implications are, in some sense, more significant than the direct considerations that are easier to forecast.
Put more concretely, if someone argued that accelerating genetically engineering humans might negatively affect the long-term utilitarian moral value we derive from cosmic resources as a result of some indirect far-out consideration, I would likely find that argument far less compelling than if they informed me of more immediate, clear, and predictable effects of the research.
In general, I’m very cautious about relying heavily on indirect, abstract reasoning when deciding what actions we should take or what careers we should pursue. Instead, I prefer straightforward considerations that are harder to fool oneself about.
Gotcha, so if I understand correctly, you’re more so leaning on uncertainty for being mostly indifferent rather than on thinking you’d actually be indifferent if you understood exactly what would happen in the long run. This makes sense.
(I have a different perspective on high-stakes decision making under uncertainty, and I don’t personally feel sympathetic to this sort of cluelessness perspective, either as a heuristic in most cases or as a terminal moral view. See also the CLR work on cluelessness. Separately, my intuitions around cluelessness imply that, to the extent I put weight on this, when I’m clueless I get more worried about the unilateralist’s curse and downside risk, which you don’t seem to put much weight on, though just rounding all kinda-uncertain long-run effects to zero isn’t a crazy perspective.)
On the galaxy brained point: I’m sympathetic to arguments against being too galaxy brained, so I see where you’re coming from there, but from my perspective, I was already responding to an argument which is one galaxy brain level deep.
I think the broader argument about AI takeover being bad from a longtermist perspective is not galaxy brained, and the specialization of this argument to your flavor of preference utilitarianism also isn’t galaxy brained: you have some specific moral views (in this case about preference utilitarianism) and all else equal you’d expect humans to share these moral views more than AIs that end up taking over despite their developers not wanting the AI to take over. So (all else equal) this makes AI takeover look bad, because if beings share your preferences, then more good stuff will happen.
Then you made a somewhat galaxy brained response to this about how you don’t actually care about shared preferences due to preference utilitarianism (because after all, you’re fine with any preferences right?). But, I don’t think this objection holds because there are a number of (somewhat galaxy brained) reasons why specifically optimizing for preference utilitarianism and related things may greatly outperform control by beings with arbitrary preferences.
From my perspective the argument looks sort of like:
Non galaxy brained argument for AI takeover being bad
Somewhat galaxy brained rebuttal by you, about preference utilitarianism meaning you don’t actually care about this sort of preference-similarity case for avoiding nonconsensual AI takeover
My somewhat galaxy brained response, which is only galaxy brained largely because it’s responding to a galaxy brained perspective about details of the long run future.
I’m sympathetic to cutting off at an earlier point and rejecting all galaxy brained arguments. But, I think the preference utilitarian argument you’re giving is already quite galaxy brained and sensitive to details of the long run future.
I’m sympathetic to cutting off at an earlier point and rejecting all galaxy brained arguments.
As am I. At least when it comes to the important action-relevant question of whether to work on AI development, in the final analysis, I’d probably simplify my reasoning to something like, “Accelerating general-purpose technology seems good because it improves people’s lives.” This perspective roughly guides my moral views on not just AI, but also human genetic engineering, human cloning, and most other potentially transformative technologies.
I mention my views on preference utilitarianism mainly to explain why I don’t particularly value preserving humanity as a species beyond preserving the individual humans who are alive now. I’m not mentioning it to commit to any form of galaxy-brained argument that I think makes acceleration look great for the long-term. In practice, the key reason I support accelerating most technology, including AI, is simply the belief that doing so would be directly beneficial to people who exist or who will exist in the near-term.
And to be clear, we could separately discuss what effect this reasoning has on the more abstract question of whether AI takeover is bad or good in expectation, but here I’m focusing just on the most action-relevant point that seems salient to me, which is whether I should choose to work on AI development based on these considerations.
Having a savings target seems important. (Not financial advice.)
I sometimes hear people in/around EA rule out taking jobs due to low salaries (sometimes implicitly, sometimes a little embarrassedly). Of course, it’s perfectly understandable not to want to take a significant drop in your consumption. But in theory, people with high salaries could be saving up so they can take high-impact, low-paying jobs in the future; it just seems like, by default, this doesn’t happen. I think it’s worth thinking about how to set yourself up to be able to do it if you do find yourself in such a situation; you might find it harder than you expect.
(Personal digression: I also notice my own brain paying a lot more attention to my personal finances than I think is justified. Maybe some of this traces back to some kind of trauma response to being unemployed for a very stressful ~6 months after graduating: I just always could be a little more financially secure. A couple weeks ago, while meditating, it occurred to me that my brain is probably reacting to not knowing how I’m doing relative to my goal, because 1) I didn’t actually know what my goal is, and 2) I didn’t really have a sense of what I was spending each month. In IFS terms, I think the “social and physical security” part of my brain wasn’t trusting that the rest of my brain was competently handling the situation.)
So, I think people in general would benefit from having an explicit target: once I have X in savings, I can feel financially secure. This probably means explicitly tracking your expenses, both now and in a “making some reasonable, not-that-painful cuts” budget, and gaming out the most likely scenarios where you’d need to use a large amount of your savings, beyond the classic 3 or 6 months of expenses in an emergency fund. For people motivated by EA principles, the most likely scenarios might be for impact reasons: maybe you take a public-sector job that pays half your current salary for three years, or maybe you’d need to self-fund a new project for a year; how much would it cost to maintain your current level of spending, or a not-that-painful budget-cut version? Then you could target that amount (in addition to the emergency fund, so you’d still have that at the end of the period); once you have that, you could feel more secure/spend less brain space on money, donate more of your income, and be ready to jump on a high-impact, low-paying opportunity.
Of course, you can more easily hit that target if you can bring down your expenses—you both lower the required amount in savings and you save more each month. So, maybe some readers would also benefit from cutting back a bit, though I think most EAs are pretty thrifty already.
(This is hardly novel—Ben Todd was publishing related stuff on 80k in 2015. But I guess I had to rediscover it, so posting here in case anyone else could use the refresher.)
One dynamic worth considering here is that a person with near-typical longtermist views about the future also likely believes that there are a large number of salient risks in the future, including sub-extinction AI catastrophes, pandemics, war with China, authoritarian takeover, “white collar bloodbath” etc.
It can be very psychologically hard to spend all day thinking about these risks without also internalizing that these risks may very well affect oneself and one’s family, which in turn implies that typical financial advice and financial lifecycle planning are not well-tailored to the futures that longtermists think we might face. For example, the typical suggestion to save around 6 months in an emergency fund makes sense for the economy of the last hundred years, but if there is widespread white collar automation, what are the odds that there will be job disruption lasting longer than six months? If you think that your country may experience authoritarian takeover, might you want to save enough to buy residence elsewhere?
None of this excuses not making financial sacrifices. But I do think it’s hard to simultaneously think “the future is really risky” and “there is a very achievable (e.g., <<$1M) amount of savings that would make me very secure.”
That’s a fair point, but a lot of the scenarios you describe would mean rapid economic growth and equities going up like crazy. The expectation of my net worth in 40 years on my actual views is way, way higher than it would be if I thought AI was totally fake and the world would look basically the same in 2065. That doesn’t mean you shouldn’t save up though (higher yields are actually a reason to save, not a reason to refrain from saving).
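As a toy illustration of how much the growth assumption matters (the monthly contribution and both return rates below are made-up assumptions for illustration, not forecasts or advice):

```python
# Toy comparison: the same monthly savings compounded for 40 years under a
# "world looks basically the same" real return vs. a made-up "equities go up
# like crazy" real return. All numbers are illustrative assumptions only.

def future_value(monthly_savings: float, annual_return: float, years: int) -> float:
    """Future value of a stream of monthly contributions at a fixed annual return."""
    monthly_rate = (1 + annual_return) ** (1 / 12) - 1
    months = years * 12
    if monthly_rate == 0:
        return monthly_savings * months
    return monthly_savings * (((1 + monthly_rate) ** months - 1) / monthly_rate)

slow_world = future_value(monthly_savings=1_000, annual_return=0.04, years=40)
boom_world = future_value(monthly_savings=1_000, annual_return=0.10, years=40)
print(f"slow world: ${slow_world:,.0f}  boom world: ${boom_world:,.0f}")
# The boom-world figure comes out several times larger, which is the point:
# expecting higher growth raises the expected payoff of saving.
```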
Thanks for this, Trevor.
For what it’s worth: a lot of people think emergency fund means cash in a normal savings account, but this is not a good approach. Instead, buy bonds or money market funds with your emergency savings, or put them in a specialized high yield savings account (which to repeat is likely NOT a savings account that you get by default from your bank).
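For a rough sense of the stakes, here’s a small sketch comparing the two; the balance and both rates below are made-up assumptions for illustration, not quotes of any real product:

```python
# Rough illustration of parking a 6-month emergency fund in a default savings
# account vs. a higher-yielding option (money market fund / HYSA). The balance
# and both rates are assumptions chosen for the example.

balance = 30_000          # assumed ~6 months of expenses
default_rate = 0.005      # assumed ~0.5% APY on a default bank savings account
higher_yield_rate = 0.04  # assumed ~4% APY on a money market fund or HYSA

for years in (1, 5):
    low = balance * (1 + default_rate) ** years
    high = balance * (1 + higher_yield_rate) ** years
    print(f"after {years} year(s): default ${low:,.0f} vs higher-yield ${high:,.0f}")
# Over a few years the gap is on the order of several thousand dollars.
```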
Or just put the money in equities in a liquid brokerage account.
Relevant: I’ve been having some discussions with (non-EA) friends on why they don’t donate more.
Some argue that they want enough money to take care of themselves in the extreme cases of medical problems and political disasters, but still with decent bay area lifestyles. I think the implication is that they will wait until they have around $10 Million or so to begin thinking of donations. And if they have kids, maybe $30 Million.
I obviously find this very frustrating, but also interesting.
Of course, I’d expect that if they would make more money, their bar would increase. Perhaps if they made $1M/year they would get used to a different lifestyle, then assume that they needed $50M/$150M accordingly.
It feels like, “I don’t have an obligation or reason to be an altruistic person until I’m in the top 0.01% of wealthy individuals”
A friend recently shared this reason for not giving (fear of an expensive medical crisis). I think if a good resource existed with the base rates of events that can cause financial hardship and solutions for reducing their likelihood (e.g., long term care insurance), this might help some people feel more comfortable with giving.
I passed this along to someone at GWWC and they said this is on their list of ideas to write about.
The biggest risk is, I believe, disability resulting in long-term income loss. My US-centric understanding is that private disability insurance that is both portable (not bound to a specific employer) and broad (e.g., covers any condition that causes a significant loss in earnings capacity) can be difficult to find if you’re not in particularly excellent health.
Basefund was working on the broader issue of donors who subsequently experience financial hardship, although I haven’t heard much about them recently. My assumption was that limitations imposed by the project’s non-profit status would preclude the Basefund model from working for people considering larger donations but worried about needing them back down the road if a crisis happens.
Meeting those needs for those unable to access general-purpose private disability insurance would probably require some sort of model under which the donor paid an insurance premium and reduced their would-be donation accordingly. If there were enough interest, I could see one of the big disability insurance shops underwriting a product like that. Probably wouldn’t be cheap, though. Of course, if someone were willing to financially guarantee claims payment, thus removing any financial risk from the policy administrator, that would make the program more attractive for a would-be administrator.
Yeah this was what I found too when I looked into private US long-term disability insurance a while back. My recollection was:
there’s a surprising number of health exclusions, even for things that happened in your childhood or adolescence
it’s a lot more expensive in percentage terms if you’re at a lower income
many disability cases are ambiguous so the insurance company may have you jump through a lot of hoops and paperwork (a strange role-reversal in which the bureaucracy wants to affirm your agency)
I had the impression that it was a great product for some people, meaning those with high income, clean medical history, and a support network to wrestle with the insurance company. But at the time I looked into it, it didn’t seem like a great option for me even given my risk-averse preferences.
Planning to look again soon so could change my mind.
I like the thought, but would flag that I’d probably recommend them doing some user interviews or such to really dig at what, if anything, might actually convince these people.
I’d expect that strong marketing people would be good here.
Typically the first few reasons people give for why they aren’t more charitable are all BS, and these sorts of people aren’t the type willing to read many counter-arguments. It can still be good to provide just a bit more evidence on the other side, but you have to go in with the right (low) expectations.
That said, I do think that solutions (like insurance) are a pretty good thing to consider, even to those not making these excuses.
I worry that the pro-AI/slow-AI/stop-AI divide has the salient characteristics of a tribal dividing line that could tear EA apart:
“I want to accelerate AI” vs “I want to decelerate AI” is a big, clear line in the sand that allows for a lot clearer signaling of one’s tribal identity than something more universally agreeable like “malaria is bad”
Up to the point where AI either kills us or doesn’t, there’s basically in principle no way to verify that one side or the other is “right”, which means everyone can keep arguing about it forever
The discourse around it is more hostile/less-trust-presuming than the typical EA discussion, which tends to be collegial (to a fault, some might argue)
You might think it’s worth having this civil war to clarify what EA is about. I don’t. I would like for us to get on a different track.
This thought prompted by discussion around one of Matthew_Barnett’s quick takes.
For what it’s worth, I really don’t think many EAs are in the AI accelerationist camp, at least. Matthew Barnett seems fairly unusual to me here.
The EA Forum moderation team is going to experiment a bit with how we categorize posts. Currently there is a low bar for a Forum post being categorized as “Frontpage” after it’s approved. In comparison, LessWrong is much more opinionated about the content they allow, especially from new users. We’re considering moving in that direction, in order to maintain a higher percentage of valuable content on our Frontpage.
To start, we’re going to allow moderators to move posts from new users from “Frontpage” to “Personal blog”[1], at their discretion, but starting conservatively. We’ll keep an eye on this and, depending on how this goes, we may consider taking further steps such as using the “rejected content” feature (we don’t currently have that on the EA Forum).
Feel free to reply here if you have any questions or feedback.
If you’d like to make sure you see “Personal blog” posts in your Frontpage, you can customize your feed.
I would be a bit hesitant to follow Less Wrong’s lead on this too closely. I find the EA Forum, for lack of a better term, feels much friendlier than Less Wrong, and I wouldn’t want that sense of friendliness to go away.
I was hesitant on this one, but I looked at last month’s posts and saw a lot of them with few votes and little engagement, which made me more sympathetic to the concern about the frontpage. Maybe it’s a viable idea with some safeguards:
I think a limitation to application against “new users” mitigates some of the downside risk as long as that definition is operationalized well. In particular, people use throwaways to post criticisms, and the newness of an account should not necessarily establish a “new user” for purpose of this policy. I think mods are capable of figuring out if a throwaway post shows enough EA knowledge, but they should err on the side of letting throwaway criticism posts through to the frontpage. For certain critical posts, the decision to demote should be affirmed by someone independent of CEA.
The risk of being demoted to Personal Blog could be a significant demotivator for people investing the time to write posts.
You could mitigate this by being very clear and objective about what will trigger classification and then applying the stated criteria in a conservative fashion. But based on your stated goals, I think you may have a hard time defining the boundaries with enough objective precision.
You could also invite people to submit 1-2 paragraph pitches if they were concerned about demotion, and establish a safe harbor for anyone who got a thumbs-up on their pitch. But that approach risks being a little too censorious for my tastes, as the likely outcome of a decision not to pre-clear is that the author never completes their idea into a post.
If something is getting any meaningful number of upvotes or comments after being consigned to Personal Blog as lower-quality content, you probably made a mistake that should be reverted ASAP. (When thinking what the thresholds for reversal should be, the much lower visibility of Personal Blogs should carry significant weight.)
I would be hesitant to reject more content—people selecting to show Personal Blog posts presumably know what they are getting themselves into and have implicitly decided to opt out of your filtering efforts.
Thanks Jason! Luckily, which posts get categorized as “Personal blog” is public information (I think it’s easiest to skim via the All posts page), so I would be happy for people to check our work and contact us if you think we’ve made a mistake. If you take a look now, you’ll see that very few posts have been moved there so far, and I don’t expect the rate to change very much going forward.
2. My guess is that the vast majority of new users don’t even know what “Personal blog” means, so I’m not sure how demotivating it will be to them. As I mentioned in another comment, my guess is that getting downvoted is more demotivating for new users.
3. I think that’s a good idea, and I’d be happy for users to flag these as mistakes to the moderators, or just DM me directly and I can return a post to the Frontpage if I agree (I have the final say as head moderator).
I would be nervous about discouraging new users. There’s a high bar for what gets upvoted here on the forum. Especially for VERY new users I’d be nervous about not giving the opportunity for their post to be on the frontpage—maybe it can depend on if you think the post is decent or not?
Ah yeah sorry I was unclear! I basically meant what you said when I said “at their discretion, but starting conservatively” — so we are starting to take “quality” into account when deciding what stays in the Frontpage, because our readers’ time is valuable. You can kind of think of it like: if the mod would have downvoted a post from a new user, the mod can instead decide to move it to “Personal blog”. I think it’s possible that this is actually less discouraging to new users than getting downvoted, since it’s like you’re being moved to a category with different standards. You can check our work by looking at what gets categorized as “Personal blog” via the All posts page. :)
I expect this will affect only a small proportion of new users.
Health Progress Hub is Looking for Contributors from Low- and Middle-Income Countries!
Health Progress Hub (HPH), an initiative by GPRG, aims to accelerate global health progress by building infrastructure that helps high-impact NGOs identify and deploy local talent more efficiently. We are looking for contributors from Low- and Middle-Income Countries who are motivated to accelerate global health progress using their local insights and networks.
You’d support both HPH and our partner organizations through research, recruitment assistance, stakeholder mapping, and program support. We’ll match tasks to your strengths and interests, and what HPH and our partners need.
You’ll gain practical experience working on real global health challenges, develop skills in areas such as research, operations and strategy, and connect with others working to tackle critical health challenges. With your permission, we can include you in our talent database, enabling global health organizations to consider you for relevant volunteer or paid positions.
You can find more information and apply through our form (~10 minutes): Application Form
Know someone who should apply? Please send them this or nominate them (~5-10 minutes): Nomination form
Questions? Want to volunteer or provide guidance from a high-income country? Please email us at ren@globalprg.org
If you’re an organization interested in partnering with us to access local talent and expertise, you can reach out to berke@globalprg.org
Productive conference meetup format for 5-15 people in 30-60 minutes
I ran an impromptu meetup at a conference this weekend, where 2 of the ~8 attendees told me that they found this an unusually useful/productive format and encouraged me to share it as an EA Forum shortform. So here I am, obliging them:
Intros… but actually useful
Name
Brief background or interest in the topic
1 thing you could possibly help others in this group with
1 thing you hope others in this group could help you with
NOTE: I will ask you to act on these imminently so you need to pay attention, take notes etc
[Facilitator starts and demonstrates by example]
Round of any quick wins: anything you heard where someone asked for some help and you think you can help quickly, e.g. a resource, idea, offer? Say so now!
Round of quick requests: Anything where anyone would like to arrange a 1:1 later with someone else here, or request anything else?
If 15+ minutes remaining:
Brainstorm whole-group discussion topics for the remaining time. Quickly gather 1-5 topic ideas in less than 5 minutes.
Show of hands voting for each of the proposed topics.
Discuss most popular topics for 8-15 minutes each. (It might just be one topic)
If less than 15 minutes remaining:
Quickly pick one topic for group discussion yourself.
Or just finish early? People can stay and chat if they like.
Note: the facilitator needs to actually facilitate, including cutting off lengthy intros or any discussions that get started during the ‘quick wins’ and ‘quick requests’ rounds. If you have a group over 10 you might need to divide into subgroups for the discussion part.
I think we had around 3 quick wins, 3 quick requests, and briefly discussed 2 topics in our 45 minute session.
This is cool—thanks for sharing!
An updated draft of a model of consciousness based on information and complexity theory
This paper proposes a formal, information-theoretic model of consciousness in which awareness is defined as the alignment between an observer’s beliefs and the objective description of an object. Consciousness is quantified as the ratio between the complexity of true beliefs and the complexity of the full inherent description of the object. The model introduces three distinct epistemic states: Consciousness (true beliefs), Schizo-Consciousness (false beliefs), and Unconsciousness (absence of belief). Object descriptions are expressed as structured sets of object–quality (O–Q) statements, and belief dynamics are governed by internal belief-updating functions (brain codes) and attentional codes that determine which beliefs are foregrounded at any given time. Crucially, the model treats internal states—such as emotions, memories, and thoughts—as objects with describable properties, allowing it to account for self-awareness, misbelief about oneself, and psychological distortion. This framework enables a unified treatment of external and internal contents of consciousness, supports the simulation of evolving belief structures, and provides a tool for comparative cognition, mental health modeling, and epistemic alignment in artificial agents.
The link to the paper:
https://drive.google.com/file/d/12eZWsXgPIJhd9c7KrsfW9EmUk0gER4mu/view?usp=drivesdk
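To make the ratio definition a bit more concrete, here is a minimal sketch, assuming O–Q statements can be represented as strings and approximating “complexity” by raw description length (a deliberate simplification of the model’s complexity measure, purely for illustration):

```python
# Minimal illustrative sketch of the consciousness ratio described above.
# Assumptions for this sketch only: an object's inherent description is a set
# of object-quality (O-Q) statements encoded as strings, and "complexity" is
# approximated by total description length. The full model uses a richer
# complexity measure; this is a toy stand-in.

def complexity(statements: set[str]) -> int:
    """Toy complexity proxy: total characters across O-Q statements."""
    return sum(len(s) for s in statements)

def consciousness_ratio(object_description: set[str], beliefs: set[str]) -> float:
    """Ratio of the complexity of true beliefs to the complexity of the full description."""
    true_beliefs = beliefs & object_description  # only beliefs matching the object count as true
    total = complexity(object_description)
    return complexity(true_beliefs) / total if total else 0.0

# Example: the observer holds two true beliefs and one false belief
# (the false one would fall under "Schizo-Consciousness" in the model's terms).
obj = {"apple is red", "apple is round", "apple is sweet"}
beliefs = {"apple is red", "apple is round", "apple is blue"}
print(consciousness_ratio(obj, beliefs))  # 0.65: share of the description that is truly believed
```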
Any hints / info on what to look for in a mentor / how to find one? (Specifically for community building.)
I’m starting as a national group director in September, and among my focus topics for EAG London are group-focused things like “figuring out pointers / out of the box ideas / well-working ideas we haven’t tried yet for our future strategy”, but also trying to find a mentor.
These were some thoughts I came up with when thinking about this yesterday:
- I’m not looking for accountability or day to day support. I get that from inside our local group.
- I am looking for someone that can take a description of the higher level situation and see different things than I can. Either due to perspective differences or being more experienced and skilled.
- Also someone who can give me useful input on what skills to focus on building in the medium term.
- Someone whose skills and experience I trust, and when they say “plan looks good” it gives me confidence, when I’m trying to do something that feels to me like a long shot / weird / difficult plan and I specifically need validation that it makes sense.
On a concrete level I’m looking for someone to have ~monthly 1-1 calls with and some asynchronous communication. Not about common day to day stuff but larger calls.
There is going to be a Netflix series on SBF titled The Altruists, so EA will be back in the media. I don’t know how EA will be portrayed in the show, but regardless, now is a great time to improve EA communications. More specifically, being a lot more loud about historical and current EA wins — we just don’t talk about them enough!
A snippet from Netflix’s official announcement post:
The best one-stop summary I know of is still Scott Alexander’s In Continued Defense Of Effective Altruism from late 2023. I’m curious to see if anyone has an updated take, if not I’ll keep steering folks there:
According to the Guardian there is also one movie, another series, and several documentaries potentially in the works
Oooh, I’d better get to work on my SBF musical 😂
Eh eh eh.
https://suno.com/song/be4cc4e2-15b2-42e7-b87f-86e367c0673d
I don’t think this is necessarily related, but it should be noted that XTR is also currently making a documentary about the Zizians.
Oh man… this really makes it sound like It’s So Over
I think this is very hard to predict, and I just feel uncertain. Public perception seems to be really fickle, and I could imagine each show being either:
Negative towards SBF/Caroline and negative towards EA (it’s all tech bros feeling superior, e.g. here)
Negative towards SBF/Caroline and positive towards EA (they used ethics as “mostly a front” and only cared about winning)
Positive towards SBF/Caroline, and negative towards EA (they started as idealistic altruists and got corrupted by the toxic EA ideology)
Positive towards SBF/Caroline, and positive towards EA (e.g. making John J. Ray III and Sullivan and Cromwell the villains)
And for each of these 4, it’s not clear what the impact on EA would be, e.g. I think “The Wolf of Wall Street” probably got many people excited about working in finance.
I predict the documentaries will be negative towards EA, as was the vast majority of media on EA in 2023 and 2024, and I think documentaries tend to be mostly negative about their subject, but I’m much more unsure about the fiction series
If it’s anything like the book Going Infinite by Michael Lewis, it’ll probably be a relatively sympathetic portrayal. My initial impression from the announcement post is that it at least sounds like the angle they’re going for is misguided haphazard idealists (which Lewis also did), rather than mere criminal masterminds.
Graham Moore is best known for The Imitation Game, the movie about Alan Turing, and his portrayal was a classic “misunderstood genius” angle. If he brings that kind of energy to a movie about SBF, we can hope he shows EA in a positive light as well.
Another possible comparison you could make would be with the movie The Social Network, which was inspired by real life, but took a lot of liberties and interestingly made Dustin Moskovitz (who funds a lot of EA stuff through Open Philanthropy) a very sympathetic character. (Edit: Confused him and Eduardo Saverin.)
I also think there’s lots of precedent for Hollywood to generally make dramas and movies that are sympathetic to apparent “villains” and “antiheroes”. Mindless caricatures are less interesting to watch than nuanced portrayals of complex characters with human motivations. The good fiction at least tries to have that kind of depth.
So, I’m cautiously optimistic. When you actually dive deeper into the story of SBF, you realize he’s more complex than yet another crypto grifter, and I think a nuanced portrayal could actually help EA recover a bit from the narrative that we’re just a TESCREAL techbro cult.
I do also agree in general that we should be louder about the good that EA has actually done in the world.
Minor thing, but I don’t remember Dustin being portrayed much in The Social Network? Do you mean Eduardo Saverin?
Yeah, he is. He was played by Joseph Mazzello.
Oh, woops, I totally confused the two. My bad.
I want to clarify, for the record, that although I disagree with most members of the EA community on whether we should accelerate or slow down AI development, I still consider myself an effective altruist in the senses that matter. This is because I continue to value and support most EA principles, such as using evidence and reason to improve the world, prioritizing issues based on their scope, not discriminating against foreigners, and antispeciesism.
I think it’s unfortunate that disagreements about AI acceleration often trigger such strong backlash within the community. It appears that advocating for slowing AI development has become a “sacred” value that unites much of the community more strongly than other EA values do. Despite hinging on many uncertain and IMO questionable empirical assumptions, the idea that we should decelerate AI development is now sometimes treated as central to the EA identity in many (albeit not all) EA circles.
As a little bit of evidence for this, I have been publicly labeled a “sellout and traitor” on X by a prominent member of the EA community simply because I cofounded an AI startup. This is hardly an appropriate reaction to what I perceive as a measured, academic disagreement occurring within the context of mainstream cultural debates. Such reactions frankly resemble the behavior of a cult, rather than an evidence-based movement—something I personally did not observe nearly as much in the EA community ten years ago.
Thanks for writing on the forum here—I think it’s brave of you to comment where there will obviously be lots of pushback. I’ve got a question relating to the new company and EA alignment. You may well have answered this somewhere else; if that’s the case, please point me in that direction. I’m a Global Health guy mostly, so am not super deep in AI understanding, so this question may be naive.
Question: If we frame EA along the (great new website) lines of “Find the best ways to help others”, how are you, through your new startup, doing this? Is it for the purpose of earning to give money away? Or do you think the direct work the startup will do has a high EV for doing lots of good? Feel free to define EA along different lines if you like!
What do you think would constitute being a “sellout and traitor”?
In the case at hand, Matthew would have had to at some point represent himself as supporting slowing down or stopping AI progress. For at least the past 2.5 years, he has been arguing against doing that in extreme depth on the public internet. So I don’t really see how you can interpret him starting a company that aims to speed up AI as inconsistent with his publicly stated views, which seems like a necessary condition for him to be a “traitor”. If Matthew had previously claimed to be a pause AI guy, then I think it would be more reasonable for other adherents of that view to call him a “traitor.” I don’t think that’s raising the definitional bar so high that no one will ever meet it—it seems like a very basic standard.
I have no idea how to interpret “sellout” in this context, as I have mostly heard that term used for such situations as rappers making washing machine commercials. Insofar as I am familiar with that word, it seems obviously inapplicable.
I’m obviously not Matthew, but the OED defines them like so:
sell-out: “a betrayal of one’s principles for reasons of expedience”
traitor: “a person who betrays [be gravely disloyal to] someone or something, such as a friend, cause, or principle”
Unless he is lying about what he believes—which seems unlikely—Matthew is not a sell-out, because according to him Mechanize is good or at minimum not bad for the world on his worldview. Hence, he is not betraying his own principles.
As for being a traitor, I guess the first question is, traitor of what? To EA principles? To the AI safety cause? To the EA or AI safety community? In order:
I don’t think Matthew is gravely disloyal to EA principles, as he explicitly says he endorses them and has explained how his decisions make sense on his worldview
I don’t think Matthew is gravely disloyal to the AI safety cause, as he’s been openly critical of many common AI doom arguments for some time, and you can’t be disloyal to a cause you never really bought into in the first place
Whether Matthew is gravely disloyal to the EA or AI safety communities feels less obvious to me. I’m guessing a bunch of people saw Epoch as an AI safety organisation, and by extension its employees as members of the AI safety community, even if the org and its employees did not necessarily see itself or themselves that way, and felt betrayed for that reason. But it still feels off to me to call Matthew a traitor to the EA or AI safety communities, especially given that he’s been critical of common AI doom arguments. This feels more like a difference over empirical beliefs than a difference over fundamental values, and it seems wrong to me to call someone gravely disloyal to a community for drawing unorthodox but reasonable empirical conclusions and acting on those, while broadly having similar values. Like, I think people should be allowed to draw conclusions (or even change their minds) based on evidence—and act on those conclusions—without it being betrayal, assuming they broadly share the core EA values, and assuming they’re being thoughtful about it.
(Of course, it’s still possible that Mechanize is a net-negative for the world, even if Matthew personally is not a sell-out or a traitor or any other such thing.)
Yes, I understand the arguments against it applying here. My question is whether the threshold is being set at a sufficiently high level that it basically never applies to anyone. Hence why I was looking for examples which would qualify.
Sellout (in the context of Epoch) would apply to someone e.g. concealing data or refraining from publishing a report in exchange for a proposed job in an existing AI company.
As for traitor, I think the only group here that can be betrayed is humanity as a whole, so as long as one believes they’re doing something good for humanity I don’t think it’d ever apply.
Hmm, that seems off to me? Unless you mean “severe disloyalty to some group isn’t Ultimately Bad, even though it can be instrumentally bad”. But to me it seems useful to have a concept of group betrayal, and to consider doing so to be generally bad, since I think group loyalty is often a useful norm that’s good for humanity as a whole.
Specifically, I think group-specific trust networks are instrumentally useful for cooperating to increase human welfare. For example, scientific research can’t be carried out effectively without some amount of trust among researchers, and between researchers and the public, etc. And you need some boundary for these groups that’s much smaller than all humanity to enable repeated interaction, mutual monitoring, and norm enforcement. When someone is severely disloyal to one of those groups they belong to, they undermine the mutual trust that enables future cooperation, which I’d guess is ultimately often bad for the world, since humanity as a whole depends for its welfare on countless such specialised (and overlapping) communities cooperating internally.
It’s not that I’m ignoring group loyalty, just that the word “traitor” seems so strong to me that I don’t think there’s any smaller group here that’s owed that much trust. I could imagine a close friend calling me that, but not a colleague. I could imagine a researcher saying I “betrayed” them if I steal and publish their results as my own after they consulted me, but that’s a much weaker word.
[Context: I come from a country where you’re labeled a traitor for having my anti-war political views, and I don’t feel such usage of this word has done much good for society here...]
Some takes:
I think Holly’s tweet was pretty unreasonable and judge her for that not you. But I also disagree with a lot of other things she says and do not at all consider her to speak for the movement
To the best of my ability to tell (both from your comments and private conversations with others), you and the other Mechanize founders are not getting undue benefit from Epoch funders apart from less tangible things like skills, reputation, etc. I totally agree with your comment below that this does not seem a betrayal of their trust. To me, it seems more a mutually beneficial trade between parties with different but somewhat overlapping values, and I am pro EA as a community being able to make such trades.
AI is a very complex uncertain and important space. This means reasonable people will disagree on the best actions AND that certain actions will look great under some worldviews and pretty harmful under others
As such, assuming you are sincere about the beliefs you’ve expressed re why to found Mechanize, I have no issue with calling yourself an Effective Altruist—it’s about evidence based ways to do the most good, not about doing good my way
Separately:
Under my model of the world, Mechanize seems pretty harmful in a variety of ways, in expectation
I think it’s reasonable for people who object to your work to push back against it and publicly criticise it (though agree that much of the actual criticism has been pretty unreasonable)
The EA community implicitly gives help and resources to other people in it. If most people in the community think that what you’re doing is net harmful even if you’re doing it with good intentions, I think it’s pretty reasonable to not want to give you any of that implicit support?
Can you be a bit more specific about what it means for the EA community to deny Matthew (and Mechanize) implicit support, and which ways of doing this you would find reasonable vs. unreasonable?
I was going to write a comment responding but Neel basically did it for me.
The only thing I would object to is Holly being called a “prominent member of the EA community”. The PauseAI/StopAI people are often treated as fringe in the EA community, and she frequently violates norms of discourse. EAs, given their own norms of discourse, usually just don’t respond to her in the way she responds to others.
Not sure how relevant this is given I think she disapproves of them. (I agree they are so fringe as to be basically outside it).
Just off the top of my head: Holly was a community builder at Harvard EA, wrote what is arguably one of the most influential forum posts ever, and took sincere career and personal decisions based on EA principles (first, wild animal welfare, and now, “making AI go well”). Besides that, there are several EAGs and community events and conversations and activities that I don’t know about, but all in all, she has deeply engaged with EA and has been a thought leader of sorts for a while now. I think it is completely fair to call her a prominent member of the EA community.[1]
I am unsure if Holly would like the term “member” because she has stated that she is happy to burn bridges with EA / funders, so maybe “person who has historically been strongly influenced by and has been an active member of EA” would be the most accurate but verbose phrasing.
“Prominence” isn’t static.
My impression is that Holly has intentionally sacrificed a significant amount of influence within EA because she feels that EA is too constraining in terms of what needs to be done to save humanity from AI.
So that term would have been much more accurate in the past.
Right, but most of this is her “pre-AI” stuff, and I am saying that I don’t think “Pause AI” is very mainstream by EA standards; in particular, the very inflammatory style of activism and the policy prescriptions are definitely not majority positions. It is in that sense that I object to Matthew calling her prominent, since by the standard you are suggesting, Matthew is also prominent: he’s been in the movement for a decade, has written a lot of extremely influential posts, was a well-known part of Epoch for a long time, and also wrote one of the most prescient posts ever.
I don’t dispute that Holly has been an active and motivated member of the EA community for a while
I think there’s some speaking past each other due to differing word choices. Holly is prominent, evidenced by the fact that we are currently discussing her. She has been part of the EA community for a long time and appears to be trying to do the most good according to her own principles. So it’s reasonable to call her a member of the EA community. And therefore “prominent member” is accurate in some sense.
However, “prominent member” can also imply that she represents the movement, is endorsed by it, or that her actions should influence what EA as a whole is perceived to believe. I believe this is the sense that Marcus and Matthew are using it, and I disagree that she fits this definition. She does not speak for me in any way. While I believe she has good intentions, I’m uncertain about the impact of her work and strongly disagree with many of her online statements and the discourse norms she has chosen to adopt, and think these go against EA norms (and would guess they are also negative for her stated goals, but am less sure on this one).
Edit: I think that Neel’s comment is basically just a better version of the stuff I was trying to say. (On the object level I’m a little more sympathetic than him to ways in which Mechanize might be good, although I don’t really buy the story to that end that I’ve seen you present.)
Wanting to note that on my impressions, and setting aside who is correct on the object-level question of whether Mechanize’s work is good for the world:
My best read of the situation is that Matthew has acted very reasonably (according to his beliefs), and that Holly has let herself down a bit
I believe that Holly honestly feels that Matthew is a sellout and a traitor; however, I don’t think that this is substantiated by reasonable readings of the facts, and I think this is the kind of accusation which it is socially corrosive to make publicly based on feelings
On handling object-level disagreements about what’s crucial to do in the world …
I think that EA-writ-large should be endorsing methodology more than conclusions
Inevitably we will have cases where people have strong earnest beliefs about what’s good to do that point in conflicting directions
I think that we need to support people in assessing the state of evidence and then acting on their own beliefs (hegemony of majority opinion seems kinda terrible)
Of course people should be encouraged to beware unilateralism, but I don’t think that can extend to “never do things other people think are actively destructive”
It’s important to me that EA has space for earnest disagreements
I therefore think that we should have something like “civilized society” norms, which constrain actions
Especially (but not only!) those which would be harmful to the ability for the group to have high-quality discourse
cf. SBF’s actions, which I think were indefensible even if he earnestly believed them to be the best thing
(Some discussion on how norms help to contain naive utilitarianism)
I feel that Holly’s tweet was (somewhat) norm-violating; and kind of angry that Matthew is the main person defending himself here
Matthew’s comment was on −1 just now. I’d like to encourage people not to vote his post into the negative. Even though I don’t find his defense at all persuasive, I still think it deserves to be heard.
This isn’t merely an “academic disagreement” anymore. You aren’t just writing posts, you’ve actually created a startup. You’re doing things in the space.
As an example, it’s neither incoherent nor hypocritical to let philosophers argue “Maybe existence is negative, all things considered” whilst still cracking down on serial killers. The former is necessary for academic freedom, the latter is not.
The point of academic freedom is to ensure that the actions we take in the world are as well-informed as possible. It is not to create a world without any norms at all.
Honestly, this is such a lazy critique. Whenever anyone disagrees with a group, they can always dismiss them as a “cult” or “cult-adjacent”, but this doesn’t make it true.
I think Ozzie’s framing of cooperativeness is much more accurate. The unilateralist’s curse very much applies to differential technology development, so if the community wants to have an impact here, it can’t ignore the issue of “cowboys” messing things up by rowing in the opposite direction, especially when their reasoning seems poor. Any viable community, especially one attempting to drive change, needs to have a solution to this problem.
Having norms isn’t equivalent to being a cult. When Fair Trade started taking off, I shared some of my doubts with some people who were very committed to it. This went poorly. They weren’t open-minded at all, but I wouldn’t run around calling Fair Trade a cult or even cult adjacent. They were just… a regular group.
And if I had run around accusing them of essentially being a “cult” that would have reflected poorly on me rather than on them.
As I described in my previous comment, the issue is more subtle than this. It’s about the specific context:
I concede that there wasn’t a previous well-defined norm against this, but norms have to get started somehow. And this is how it happens, someone does something, people are like wtf and then, sometimes, a consensus forms that a norm is required.
Quick thoughts:
1. I think I want to see more dialogue here. I don’t personally like the thought of the Mechanize team and EA splitting apart (at least, more than is already the case). I’d naively expect that there might still be a fair bit of wiggle room for the Mechanize team to do better or worse things in the world, and I’d of course hope for the better side of that. (I think the situation is still very early, for instance.)
2. I find it really difficult to adjudicate on the morality and specifics of the Mechanize spin-off. I don’t know as much about the details as others do. It really isn’t clear to me what the previous funders of Epoch believed or what the conditions of the donations were. I think those details matter in trying to judge the situation.
3. The person you mentioned, Holly Elmore, is really the first, and one of the loudest, to get upset about many things of this shape. I think Holly disagrees with much of the EA scene, but in the opposite direction from you/Matthew. I personally think Holly goes a fair bit too far much of the time. That said, I know there were others who were upset about this who I think better represent the main EA crowd.
4. “the idea that we should decelerate AI development is now sometimes treated as central to the EA identity in many (albeit not all) EA circles.” The way I see it is more that it’s somewhat a matter of cooperativeness between EA organizations. There are a bunch of smart people and organizations working hard to slow down generic AI development. Out of all the things one could do, there are many useful things to work on other than [directly speeding up AI development]. This is akin to how it would be pretty awkward if there were a group that calls themselves EA that tries to fight global population growth by making advertisements attacking GiveWell—it might be the case that they feel like they have good reasons for this, but it makes sense to me why some EAs might not be very thrilled. Related, I’ve seen some arguments for longer timelines that makes sense to me, but I don’t feel like I’ve seen many arguments in favor of speeding up AI timelines that make sense to me.
This accusation was not because you cofounded an AI startup. It was specifically because you took funding to work on AI safety from people who want to use capability trends to better understand how to make AI safer* (I originally wrote “slow down AI development” here), and you are now (allegedly) using results developed from that funding to start a company dedicated to accelerating AI capabilities.
I don’t know exactly what results Mechanize is using, but if this is true, then it does indeed constitute a betrayal. Not because you’re accelerating capabilities, but because you took AI safety funding and used the results to do the opposite of what funders wanted.
*Corrected to give a more accurate characterization, see Chris Leong’s comment
“From people who want to slow down AI development”
The framing here could be tighter. It’s more about wanting to be able to understand AI capability trends better without accidentally causing capability externalities.
Yes I think that is better than what I said, both because it’s more accurate, and because it’s more clear that Matthew did in fact use his knowledge of capability trends to decide that he could profit from starting an AI company.
Like, I don’t know what exactly went into his decision, but I would be surprised if that knowledge didn’t play a role.
Arguably that’s less on Matthew and more on the founders of Epoch for either misrepresenting themselves or having a bad hiring filter. Probably the former—if I’m not mistaken, Tamay Besiroglu co-founded Epoch and is now co-founding Mechanize, so I would say Tamay behaved badly here but I’m not sure whether Matthew did.
If this line of reasoning is truly the basis for calling me a “sellout” and a “traitor”, then I think the accusation becomes even more unfounded and misguided. The claim is not only unreasonable: it is also factually incorrect by any straightforward or good-faith interpretation of the facts.
To be absolutely clear: I have never taken funds that were earmarked for slowing down AI development and redirected them toward accelerating AI capabilities. There has been no repurposing or misuse of philanthropic funding that I am aware of. The startup in question is an entirely new and independent entity. It was created from scratch, and it is funded separately—it is not backed by any of the philanthropic donations I received in the past. There is no financial or operational overlap.
Furthermore, we do not plan on meaningfully making use of benchmarks, datasets, or tools that were developed during my previous roles in any substantial capacity at the new startup. We are not relying on that prior work to advance our current mission. And as far as I can tell, we have never claimed or implied otherwise publicly.
It’s also important to address the deeper assumption here: that I am somehow morally or legally obligated to permanently align my actions with the preferences or ideological views of past philanthropic funders who supported an organization that employed me. That notion seems absurd. It has no basis in ordinary social norms, legal standards, or moral expectations. People routinely change roles, perspectives evolve, and institutions have limited scopes and timelines. Holding someone to an indefinite obligation based solely on past philanthropic support would be unreasonable.
Even if, for the sake of argument, such an obligation did exist, it would still not apply in this case—because, unless I am mistaken, the philanthropic grant that supported me as an employee never included any stipulation about slowing down AI in the first place. As far as I know, that goal was never made explicit in the grant terms, which renders the current accusations irrelevant and unfounded.
Ultimately, these criticisms appear unsupported by evidence, logic, or any widely accepted ethical standards. They seem more consistent with a kind of ideological or tribal backlash to the idea of accelerating AI than with genuine, thoughtful, and evidence-based concerns.
I don’t think a lifetime obligation is the steelmanned version of your critics’ narrative, though. A time-limited version will work just as well for them.
In many circumstances, I do think society does recognize a time-limited moral obligation and social norm not to work for the other side from those providing you significant resources,[1] although I am not convinced it would in the specific circumstances involving you and Epoch. So although I would probably acquit you of the alleged norm violation here, I would not want others drawing larger conclusions about the obligation / norm from that acquittal than warranted.[2]
There is something else here, though. At least in the government sector, time-limited post-employment restrictions are not uncommon. They are intended to avoid the appearance of impropriety as much as actual impropriety itself. In those cases, we don’t trust the departing employee not to use their prior public service for private gain in certain ways. Moreover, we recognize that even the appearance that they are doing so creates social costs. The AIS community generally can’t establish and enforce legally binding post-employment restrictions, but is of course free to criticize people whose post-employment conduct it finds inappropriate under community standards. (“Traitor” is rather poorly calibrated to those circumstances, but most of the on-Forum criticism has been somewhat more measured than that.)
Although I’d defer to people with subject-matter expertise on whether there is an appearance of impropriety here,[3] I would note that this is a significantly lower standard for your critics to satisfy than proving actual impropriety. If there’s a close enough fit between your prior employment and new enterprise, that could be enough to establish a rebuttable presumption of an appearance.
For instance, I would consider it shady for a new lawyer to accept a competitive job with Treehuggers (made up organization); gain skill, reputation, and career capital for several years through Treehuggers’ investment of money and mentorship resources; and then use said skill and reputation to jump directly to a position at Big Timber with a big financial upside. I would generally consider anyone who did that as something of . . . well, a traitor and a sellout to Treehuggers and the environmental movement.
This should also not be seen as endorsing your specific defense rationale. For instance, I don’t think an explicit “stipulation about slowing down AI” in grant language would be necessary to create an obligation.
My deference extends to deciding what impropriety means here, but “meaningfully making use of benchmarks, datasets, or tools that were developed during [your] previous roles” in a way that was substantially assisted by your previous roles sounds like a plausible first draft of at least one form of impropriety.
My argument for this being bad is quite similar to what you’ve written.
I agree that Michael’s framing doesn’t quite work. It’s not even clear to me that OpenPhil, for example, is aiming to “slow down AI development” as opposed to “fund research into understanding AI capability trends better without accidentally causing capability externalities”.
I’ve previously written a critique here, but the TLDR is that Mechanise is a major burning of the commons that damages trust within the Effective Altruism community and creates a major challenge for funders who want to support ideological diversity in forecasting organisations without accidentally causing capability externalities.
This is a useful clarification. I had a weak impression that Mechanise might be making use of that prior work.
I agree that some of your critics may not have quite been able to hit the nail on the head when they tried to articulate their critiques (it took me substantial effort to figure out what I precisely thought was wrong, as opposed to just ‘this feels bad’), but I believe that the general thrust of their arguments more or less holds up.
In context, this comes across to me as an overly charitable characterization of what actually occurred: someone publicly labeled me a literal traitor and then made a baseless, false accusation against me. What’s even more concerning is that this unfounded claim is now apparently being repeated and upvoted by others.
When communities choose to excuse or downplay this kind of behavior—by interpreting it in the most charitable possible way, or by glossing over it as being “essentially correct”—they end up legitimizing what is, in fact, a low-effort personal attack without a factual basis. Brushing aside or downplaying such attacks as if they are somehow valid or acceptable doesn’t just misrepresent the situation; it actively undermines the conditions necessary for good faith engagement and genuine truth-seeking.
I urge you to recognize that tolerating or rationalizing this type of behavior has real social consequences. It fosters a hostile environment, discourages honest dialogue, and ultimately corrodes the integrity of any community that claims to value fairness and reasoned discussion.
I think Holly just said what a lot of people were feeling and I find that hard to condemn.
“Traitor” is a bit of a strong term, but it’s pretty natural for burning the commons to result in significantly less trust. To be honest, the main reason why I wouldn’t use that term myself is that it reifies individual actions into a permanent personal characteristic and I don’t have the context to make any such judgments. I’d be quite comfortable with saying that founding Mechanise was a betrayal of sorts, where the “of sorts” clarifies that I’m construing the term broadly.
This characterisation doesn’t quite match what happened. My comment wasn’t along the lines of “Oh, it’s essentially correct, close enough is good enough, details are unimportant”; rather, I actually wrote down what I thought a more careful analysis would look like.
Part of the reason why I’ve been commenting is to encourage folks to make more precise critiques. And indeed, Michael has updated his previous comment in response to what I wrote.
Is it baseless?
I noticed you wrote: “we do not plan on meaningfully making use”. That provides you with substantial wriggle room. So it’s unclear to me at this stage that your statements being true/defensible would necessitate her statements being false.
Holly herself believes standards of criticism should be higher than what (judging by the comments here without being familiar with the overall situation) she seems to have employed here; see Criticism is sanctified in EA, but, like any intervention, criticism needs to pay rent.
Yes, absolutely. With respect, unless you can provide some evidence indicating that I’ve acted improperly, I see no productive reason to continue engaging on this point.
What concerns me most here is that the accusation seems to be treated as credible despite no evidence being presented and a clear denial from me. That pattern—assuming accusations about individuals who criticize or act against core dogmas are true without evidence—is precisely the kind of cult-like behavior I referenced in my original comment.
Suggesting that I’ve left myself “substantial wiggle room” misinterprets what I intended, and given the lack of supporting evidence, it feels unfair and unnecessarily adversarial. Repeatedly implying that I’ve acted improperly without concrete substantiation does not reflect a good-faith approach to discussion.
If you don’t want to engage, that’s perfectly fine. I’ve written a lot of comments and responding to all of them would take substantial time. It wouldn’t be fair to expect that from you.
That said, labelling a request for clarification “cult-like behaviour” is absurd. On the contrary, not naively taking claims at face value is a crucial defence against this. Furthermore, implying that someone asking questions is acting in bad faith is precisely the technique that cult leaders use[1].
I said that the statement left you substantial wiggle room. This was purely a comment about how the statement could have a broad range of interpretations. I did not state, nor mean to imply, that this vagueness was intentional or in bad faith.
That said, people asking questions in bad faith is actually pretty common, so you can’t assume that something is a cult just because it says its critics are mostly acting in bad faith.
To be clear, I was not calling your request for clarification “cult-like”. My comment was directed at how the accusation against me was seemingly handled—as though it were credible until I could somehow prove otherwise. No evidence was offered to support the claim. Instead, assertions were made without substantiation. I directly and clearly denied the accusations, but despite that, the line of questioning continued in a way that strongly suggested the accusation might still be valid.
To illustrate the issue more clearly: imagine if I were to accuse you of something completely baseless, and even after your firm denials, I continued to press you with questions that implicitly treated the accusation as credible. You would likely find that approach deeply frustrating and unfair, and understandably so. You’d be entirely justified in pushing back against it.
That said, I acknowledge that describing the behavior as “cult-like” may have generated more heat than light. It likely escalated the tone unnecessarily, and I’ll be more careful to avoid that kind of rhetoric going forward.
I can see why you’d find this personally frustrating.
On the other hand, many people in the community, myself included, took certain claims from OpenAI and SBF at face value when it might have been more prudent to be less trusting. I understand that it must be unpleasant to face some degree of distrust due to the actions of others.
And I can see why you’d see your statements as a firm denial, whilst from my perspective, they were ambiguous. For example, I don’t know how to interpret your use of the word “meaningful”, so I don’t actually know what exactly you’ve denied. It may be clear to you because you know what you mean, but it isn’t clear to me.
(For what it’s worth, I neither upvoted nor downvoted the comment you made before this one, but I did disagree vote it.)
I’m a 36 year old iOS Engineer/Software Engineer who switched to working on Image classification systems via Tensorflow a year ago. Last month I was made redundant with a fairly generous severance package and good buffer of savings to get me by while unemployed.
The risky step I had long considered of quitting my non-impactful job was taken for me. I’m hoping to capitalize on my free time by determining what career path to take that best fits my goals. I’m pretty excited about it.
I created a weighted factor model to figure out what projects or learning to take on first. I welcome feedback on it. There’s also a schedule tab for how I’m planning to spend my time this year and a template if anyone wishes to use this spreadsheet themselves.
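For anyone curious what the scoring mechanics of a model like this look like, here is a minimal sketch in Python. The factors, weights, and example projects below are invented placeholders for illustration, not values from the actual spreadsheet.

```python
# Minimal sketch of a weighted factor model for ranking projects.
# Factors, weights, and scores are invented placeholders, not the real spreadsheet values.
weights = {"impact": 0.4, "personal_fit": 0.3, "cost": 0.2, "enjoyment": 0.1}

projects = {
    "ML upskilling project": {"impact": 8, "personal_fit": 7, "cost": 5, "enjoyment": 6},
    "iOS contracting":       {"impact": 3, "personal_fit": 9, "cost": 8, "enjoyment": 5},
}

def weighted_score(scores: dict) -> float:
    """Sum each factor score multiplied by its weight."""
    return sum(weights[factor] * scores[factor] for factor in weights)

# Rank projects from highest to lowest weighted score.
for name in sorted(projects, key=lambda p: weighted_score(projects[p]), reverse=True):
    print(f"{name}: {weighted_score(projects[name]):.1f}")
```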
I got feedback from my 80,000 Hours advisor to get involved in EA communities more often. I also want to learn more publicly, be it via forums or by blogging. This somewhat unstructured dumping of my thoughts is a first step towards that.
I love the model—and I’m happy to give feedback on ideas for EA Forum posts if that would ever be helpful! (I’m the Content Strategist for the Forum).
That would be really useful!
Some of my ideas for forum or blog posts are:
Bi-weekly updates on what I’ve been working on.
Posting stuff I’ve worked on (mostly ML related).
Miscellaneous topics such as productivity and ADD.
Reviews of EA programmes I’ve taken part in or books I’ve read
Dumping my thoughts on a topic
I’m also interested in how you differentiate between content better suited for a blog and content better suited for the Forum.
Out of that list I’d guess that the fourth and fifth (depending on topics) bullets are most suitable for the Forum.
The basic way I’d differentiate content is that everything on the Forum frontpage should be content related to the project of effective altruism, the community section is about EA as a community (i.e. if you were into AI Safety but not EA, you wouldn’t be interested in the community section), and “personal blog” (i.e. not visible on the frontpage) is the section for everything that isn’t in those categories. For example, posts on “Miscellaneous topics such as productivity and ADD” would probably be moved to personal blog, unless they were strongly related to EA. This doesn’t mean the content isn’t good—lots of EAs read productivity content, but ideally, the Forum should be focused on EA priorities rather than what EAs find interesting.
Feel free to message me with specific ideas that I could help categorise for you! And if in doubt, quick-takes are much more loose and you can post stuff like the bi-weekly updates there to gauge interest.
Elon Musk recently presented SpaceX’s roadmap for establishing a self-sustaining civilisation on Mars (by 2033 lol). Aside from the timeline, I think there may be some important questions to consider with regards to space colonisation and s-risks:
In a galactic civilisation of thousands of independent and technologically advanced colonies, what is the probability that one of those colonies will create trillions of suffering digital sentient beings? (probably near 100% if digital sentience is possible… it only takes one)
Is it possible to create a governance structure that would prevent any person in a whole galactic civilisation from creating digital sentience capable of suffering? (sounds really hard especially given the huge distances and potential time delays in messaging… no idea)
What is the point of no return where a domino is knocked over that inevitably leads to self-perpetuating human expansion and the creation of a galactic civilisation? (somewhere around a self-sustaining civilisation on Mars, I think).
If the answer to question 3 is “Mars colony”, then it’s possible that creating a colony on Mars is a huge s-risk if we don’t first answer question 2.
Would appreciate some thoughts.
Stuart Armstrong and Anders Sandberg’s article on expanding throughout the galaxy rapidly, and Charlie Stross’ blog post about griefers influenced this quick take.
Looks like Mechanize is choosing to be even more irresponsible than we previously thought. They’re going straight for automating software engineering. Would love to hear their explanation for this.
“Software engineering automation isn’t going fast enough” [1] - oh really?
This seems even less defensible than their previous explanation of how their work would benefit the world.
Not an actual quote
Some useful context is that I think a software singularity is unlikely to occur; see this blog post for some arguments. Loosely speaking, under the view expressed in the linked blog post, there aren’t extremely large gains from automating software engineering tasks beyond the fact that these tasks represent a significant (and growing) fraction of white collar labor by wage bill.
Even if I thought a software singularity will likely happen in the future, I don’t think this type of work would be bad in expectation, as I continue to think that accelerating AI is likely good for the world. My main argument is that speeding up AI development will hasten large medical, technological, and economic benefits to people alive today, without predictably causing long-term harms large enough to outweigh these clear benefits. For anyone curious about my views, I’ve explained my perspective on this issue at length on this forum and elsewhere.
Note: Matthew’s comment was sitting at negative karma just now. Please don’t vote it into the negative; use the disagree button instead. Even though I don’t think Matthew’s defense is persuasive, it deserves to be heard.
I wrote a critique of that article here. TLDR: “It has some strong analysis at points, but unfortunately, it’s undermined by some poor choices of framing/focus that mean most readers will probably leave more confused than when they came”.
“A software singularity is unlikely to occur”—Unlikely enough that you’re willing to bet the house on it? Feels like you’re picking up pennies in front of a steamroller.
AI is already going incredibly fast. Why would you want to throw more fuel on the fire?
Is it that you honestly think AI is moving too slow at the moment (no offense, but seems crazy to me) or is your worry that current trends are misleading and AI might slow in the future?
Regarding the latter, I agree that once timelines start to get sufficiently long, there might actually be an argument for accelerating them (but in order to reach AGI before biotech causes a catastrophe, rather than the more myopic reasons you’ve provided). But if your worry is stagnation, why not actually wait until things appear to have stalled and then perhaps consider doing something like this?
Or why didn’t you just stay at Epoch, which was a much more robust and less fragile theory of action? (Okay, I don’t actually think articles like this are high enough quality to be net-positive, but you were 90% of the way towards having written a really good article. The framing/argument just needed to be a bit tighter, which could have been achieved with another round of revisions).
The main reason not to wait is… missing the opportunity to cash in on the current AI boom.
This is a clear strawman. Matthew has given reasons why he thinks acceleration is good which aren’t this.
I bet the strategic analysis for Mechanize being a good choice (net-positive and positive relative to alternatives) is paper-thin, even given his rough world view.
Might be true, doesn’t make that not a strawman. I’m sympathetic to thinking it’s implausible that mechanize would be the best thing to do on altruistic grounds even if you share views like those of the founders. (Because there is probably something more leveraged to do and some weight on cooperativeness considerations.)
Sometimes the dollar signs can blind someone and cause them not to consider obvious alternatives. And they will feel that they made the decision for reasons other than the money, but the money nonetheless caused the cognitive distortion that ultimately led to the decision.
I’m not claiming that this happened here. I don’t have any way of really knowing. But it’s certainly suspicious. And I don’t think anything is gained by pretending that it’s not.
As part of MATS’ compensation reevaluation project, I scraped the publicly declared employee compensations from ProPublica’s Nonprofit Explorer for many AI safety and EA organizations (data here) in 2019-2023. US nonprofits are required to disclose compensation information for certain highly paid employees and contractors on their annual Form 990 tax return, which becomes publicly available. This includes compensation for officers, directors, trustees, key employees, and highest compensated employees earning over $100k annually. Therefore, my data does not include many individuals earning under $100k, but this doesn’t seem to affect the yearly medians much, as the data seems to follow a lognormal distribution, with mode ~$178k in 2023, for example.
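As a rough illustration of the kind of analysis described above, here is a hedged sketch of computing the yearly medians and fitting a lognormal; the file name and column names are hypothetical stand-ins for the linked data, not the actual schema.

```python
# Sketch of the analysis described above: yearly medians plus a lognormal fit.
# The file and column names ("year", "compensation") are hypothetical; adjust to the real data.
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("ai_safety_ea_compensation.csv")

# Median declared compensation per year (only includes the >$100k filers on Form 990).
print(df.groupby("year")["compensation"].median())

# Fit a lognormal to one year and report its mode for comparison with the ~$178k figure.
comp_2023 = df.loc[df["year"] == 2023, "compensation"].to_numpy()
shape, loc, scale = stats.lognorm.fit(comp_2023, floc=0)
mode = scale * np.exp(-shape**2)  # mode of a lognormal: exp(mu - sigma^2)
print(f"Fitted lognormal mode for 2023: ${mode:,.0f}")
```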
I generally found that AI safety and EA organization employees are highly compensated, albeit inconsistently between similar-sized organizations within equivalent roles (e.g., Redwood and FAR AI). I speculate that this is primarily due to differences in organization funding, but inconsistent compensation policies may also play a role.
I’m sharing this data to promote healthy and fair compensation policies across the ecosystem. I believe that MATS salaries are quite fair and reasonably competitive after our recent salary reevaluation, where we also used Payfactors HR market data for comparison. If anyone wants to do a more detailed study of the data, I highly encourage this!
I decided to exclude OpenAI’s nonprofit salaries as I didn’t think they counted as an “AI safety nonprofit” and their highest paid current employees are definitely employed by the LLC. I decided to include Open Philanthropy’s nonprofit employees, despite the fact that their most highly compensated employees are likely those under the Open Philanthropy LLC.
I guess orgs need to be more careful about who they hire as forecasting/evals researchers in light of a recently announced startup.
Sometimes things happen, but three people at the same org...
This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such forecasting orgs without having to worry about them going off and doing something like this.
However, this only works if those less worried about AI risks who join such a collaboration don’t use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.
Now let’s suppose you’re an x-risk funder considering whether to fund their previous org. This org does really high-quality work, but the argument for them being net-positive is now significantly weaker. This is quite likely to make finding future funding harder for them.
This is less about attacking those three folks and more just noting that we need to strive to avoid situations where things like this happen in the first place. This requires us to be more careful in terms of who gets hired.
There have been some discussions on the EA forum along the lines of “why do we care about value alignment, shouldn’t we just hire who can best do the job”. My answer to that is that it’s myopic to only consider what happens whilst they’re working for you. Hiring someone or offering them an opportunity empowers them, so you need to consider whether they’re someone who you want to empower[1].
Admittedly, this isn’t quite the same as value alignment. Suppose someone were diligent, honest, wise and responsible. You might want to empower them even if their views were extremely different from yours. Stronger: even if their views were the opposite in many ways. But in the absence of this, value alignment matters.
Short update—TLDR—Mechanise is going straight for automating software engineering.
If you only hire people who you believe are intellectually committed to short AGI timelines (and who won’t change their minds given exposure to new evidence and analysis) to work in AGI forecasting, how can you do good AGI forecasting?
One of the co-founders of Mechanize, who formerly worked at Epoch AI, says he thinks AGI is 30 to 40 years away. That was in this video from a few weeks ago on Epoch AI’s YouTube channel.
He and one of his co-founders at Mechanize were recently on Dwarkesh Patel’s podcast (note: Dwarkesh Patel is an investor in Mechanize) and I didn’t watch all of it but it seemed like they were both arguing for longer AGI timelines than Dwarkesh believes in.
I also disagree with the shortest AGI timelines and found it refreshing that within the bubble of people who are fixated on near-term AGI, at least a few people expressed a different view.
I think if you restrict who you hire to do AGI forecasting based on strong agreement with a predetermined set of views, such as short AGI timelines and views on AGI alignment and safety, then you will just produce forecasts that re-state the views you already decided were the correct ones while you were hiring.
I wasn’t suggesting only hiring people who believe in short-timelines. I believe that my original post adequately lays out my position, but if any points are ambiguous, feel free to request clarification.
I don’t know how Epoch AI can both “hire people with a diversity of viewpoints in order to counter bias” and ensure that your former employees won’t try to “cash in on the AI boom in an acceleratory way”. These seem like incompatible goals.
I think Epoch has to either:
Accept that people have different views and will have different ideas about what actions are ethical, e.g., they may view creating an AI startup focused on automating labour as helpful to the world and benign
or
Only hire people who believe in short AGI timelines and high AGI risk and, as a result, bias its forecasts towards those conclusions
Is there a third option?
Presumably there are at least some people who have long timelines, but also believe in high risk and don’t want to speed things up. Or people who are unsure about timelines, but think risk is high whenever it happens. Or people (like me) who think X-risk is low* and timelines very unclear, but even a very low X-risk is very bad. (By very low, I mean like at least 1 in 1000, not 1 in 1x10^17 or something. I agree it is probably bad to use expected value reasoning with probabilities as low as that.)
I think you are pointing at a real tension though. But maybe try to see it a bit from the point of view of people who think X-risk is real enough and raised enough by acceleration that acceleration is bad. It’s hardly going to escape their notice that projects at least somewhat framed as reducing X-risk often end up pushing capabilities forward. They don’t have to be raging dogmatists to worry about this happening again, and it’s reasonable for them to balance this risk against risks of echo chambers when hiring people or funding projects.
*I’m less sure that merely catastrophic biorisk from human misuse is low, sadly.
Why don’t we ask ChatGPT? (In case you’re wondering, I’ve read every word of this answer and I fully endorse it, though I think there are better analogies than the journalism example ChatGPT used).
Hopefully, this clarifies a possible third option (one that my original answer was pointing at).
So, you want to try to lock in AI forecasters to onerous and probably illegal contracts that forbid them from founding an AI startup after leaving the forecasting organization? Who would sign such a contract? This is even worse than only hiring people who are intellectually pre-committed to certain AI forecasts. Because it goes beyond a verbal affirmation of their beliefs to actually attempting to legally force them to comply with the (putative) ethical implications of certain AI forecasts.
If the suggestion is simply promoting “social norms” against starting AI startups, well, that social norm already exists to some extent in this community, as evidenced by the response on the EA Forum. But if the norm is too weak, it won’t prevent the undesired outcome (the creation of an AI startup), and if the norm is too strong, I don’t see how it doesn’t end up selecting forecasters for intellectual conformity. Because non-conformists would not want to go along with such a norm (just like they wouldn’t want to sign a contract telling them what they can and can’t do after they leave the forecasting company).
I agree that we need to be careful about who we are empowering.
“Value alignment” is one of those terms which has different meanings to different people. For example, the top hit I got on Google for “effective altruism value alignment” was a ConcernedEAs post which may not reflect what you mean by the term. Without knowing exactly what you mean, I’d hazard a guess that some facets of value alignment are pretty relevant to mitigating this kind of risk, and other facets are not so important. Moreover, I think some of the key factors are less cognitive or philosophical than emotional or motivational (e.g., a strong attraction toward money will increase the risk of defecting, a lack of self-awareness increases the risk of motivated reasoning toward goals one has in a sense repressed).
So, I think it would be helpful for orgs to consider what elements of “value alignment” are of particular importance here, as well as what other risk or protective factors might exist outside of value alignment, and focus on those specific things.
Agreed. “Value alignment” is a simplified framing.
Why not attack them? They defected. They did a really bad thing.
Also, it is worrying if the optimists easily find financial opportunities that depend on them not changing their minds. Even if they are honest and have the best of intentions, the disparity in returns to optimism is epistemically toxic.
I’d like to suggest a little bit more clarity here. The phrases you use refer to some knowledge that isn’t explicitly stated here. “in light of a recently announced startup” and “three people at the same org” make sense to someone who already knows the context of what you are writing about, but it is confusing to a reader who doesn’t have the same background knowledge that you do.
Once upon a time, some people were arguing that AI might kill everyone, and EA resources should address that problem instead of fighting Malaria. So OpenPhil poured millions of dollars into orgs such as EpochAI (they got 9 million). Now 3 people from EpochAI created a startup to provide training data to help AI replace human workers. Some people are worried that this startup increases AI capabilities, and therefore increases the chance that AI will kill everyone.
100 percent agree. I don’t understand the entire post because I don’t know the context. I don’t think alluding to something helps; better to say it explicitly.
I tend to agree; better to be explicit especially as the information is public knowledge anyway.
It refers to this: https://forum.effectivealtruism.org/posts/HqKnreqC3EFF9YcEs/
So, I have two possible projects for AI alignment work that I’m debating between focusing on. Am curious for input into how worthwhile they’d be to pursue or follow up on.
The first is a mechanistic interpretability project. I have previously explored things like truth probes by reproducing the Marks and Tegmark paper and extending it to test whether a cosine similarity based linear classifier works as well. It does, but not any better or worse than the difference of means method from that paper. Unlike difference of means, however, it can be extended to multi-class situations (though logistic regression can be as well). I was thinking of extending the idea to try to create an activation vector based “mind reader” that calculates the cosine similarity with various words embedded in the model’s activation space. This would, if it works, allow you to get a bag of words that the model is “thinking” about at any given time.
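For concreteness, here is a minimal sketch of the cosine-similarity probe idea: fit one mean-difference direction per class from labelled activations, then classify by whichever direction has the highest cosine similarity. The shapes and random data are placeholders standing in for real residual-stream activations, not the original experiment.

```python
# Minimal sketch of a cosine-similarity probe over activation vectors.
# Random data stands in for real model activations; labels mark e.g. true/false statements.
import numpy as np

def fit_directions(acts: np.ndarray, labels: np.ndarray):
    """Return the dataset mean and one unit-norm class-mean direction per class."""
    mu = acts.mean(axis=0)
    classes = np.unique(labels)
    dirs = np.stack([(acts[labels == c] - mu).mean(axis=0) for c in classes])
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return mu, dirs

def predict(acts: np.ndarray, mu: np.ndarray, dirs: np.ndarray) -> np.ndarray:
    """Assign each activation to the class direction with the highest cosine similarity."""
    centred = acts - mu
    centred = centred / np.linalg.norm(centred, axis=1, keepdims=True)
    return (centred @ dirs.T).argmax(axis=1)

rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 512))      # placeholder activations
labels = rng.integers(0, 2, size=200)   # placeholder binary labels
mu, dirs = fit_directions(acts, labels)
print(predict(acts[:5], mu, dirs))
```

The “mind reader” extension would presumably swap the class-mean directions for the embedding vectors of a word list and report the top-scoring words at each position, though the details would depend on the model and layer chosen.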
The second project is a less common, game-theoretic approach. Earlier, I created a variant of the Iterated Prisoner’s Dilemma as a simulation that includes death, asymmetric power, and aggressor reputation. I found, interestingly, that cooperative “nice” strategies banding together against aggressive “nasty” strategies produced an equilibrium where the cooperative strategies win out in the long run, generally outnumbering the aggressive ones considerably by the end. Although this simulation probably requires more analysis and testing in more complex environments, it seems to point to the idea that being consistently nice to weaker nice agents acts as a signal to more powerful nice agents and allows coordination that increases the chance of survival of all the nice agents, whereas being nasty leads to a winner-takes-all highlander situation. From an alignment perspective, this could be a kind of infoblessing: an AGI or ASI might be persuaded to spare humanity for these game-theoretic reasons.
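A stripped-down skeleton of this kind of tournament might look like the sketch below. The payoff numbers, the flat living cost, and the two strategies are placeholders, and the asymmetric-power and reputation mechanics of the original simulation are omitted; it is only meant to show the structure, not reproduce the results.

```python
# Skeleton of an iterated prisoner's dilemma variant with death: agents lose
# "health" when exploited and drop out at zero. Payoffs, the living cost, and
# the strategies are placeholders; power/reputation mechanics are omitted.
import random

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):    # a "nice" strategy
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):  # a "nasty" strategy
    return "D"

agents = [{"strategy": s, "health": 10.0} for s in [tit_for_tat] * 10 + [always_defect] * 10]

for _ in range(200):                                  # tournament rounds
    alive = [a for a in agents if a["health"] > 0]
    random.shuffle(alive)
    for a, b in zip(alive[::2], alive[1::2]):         # random pairings
        hist_a, hist_b = [], []
        for _ in range(10):                           # interactions per pairing
            move_a, move_b = a["strategy"](hist_b), b["strategy"](hist_a)
            pay_a, pay_b = PAYOFFS[(move_a, move_b)]
            a["health"] += pay_a - 2                  # flat living cost each interaction
            b["health"] += pay_b - 2
            hist_a.append(move_a)
            hist_b.append(move_b)

survivors = [a["strategy"].__name__ for a in agents if a["health"] > 0]
print({name: survivors.count(name) for name in set(survivors)})
```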
I’m organizing an EA Summit in Vancouver, BC, for the fall and am looking for opportunities for our attendees to come away from the event with opportunities to look forward to. Most of our attendees will have Canadian but not US work authorization. Anyone willing to meet potential hires, mentees, research associates, funding applicants, etc., please get in touch!
why do i find myself less involved in EA?
epistemic status: i timeboxed the below to 30 minutes. it’s been bubbling for a while, but i haven’t spent that much time explicitly thinking about this. i figured it’d be a lot better to share half-baked thoughts than to keep it all in my head — but accordingly, i don’t expect to reflectively endorse all of these points later down the line. i think it’s probably most useful & accurate to view the below as a slice of my emotions, rather than a developed point of view. i’m not very keen on arguing about any of the points below, but if you think you could be useful toward my reflecting processes (or if you think i could be useful toward yours!), i’d prefer that you book a call to chat more over replying in the comments. i do not give you consent to quote my writing in this short-form without also including the entirety of this epistemic status.
1-3 years ago, i was decently involved with EA (helping organize my university EA program, attending EA events, contracting with EA orgs, reading EA content, thinking through EA frames, etc).
i am now a lot less involved in EA.
e.g. i currently attend uc berkeley, and am ~uninvolved in uc berkeley EA
e.g. i haven’t attended a casual EA social in a long time, and i notice myself ughing in response to invites to explicitly-EA socials
e.g. i think through impact-maximization frames with a lot more care & wariness, and have plenty of other frames in my toolbox that i use to a greater relative degree than the EA ones
e.g. the orgs i find myself interested in working for seem to do effectively altruistic things by my lights, but seem (at closest) to be EA-community-adjacent and (at furthest) actively antagonistic to the EA community
(to be clear, i still find myself wanting to be altruistic, and wanting to be effective in that process. but i think describing my shift as merely moving a bit away from the community would be underselling the extent to which i’ve also moved a bit away from EA’s frames of thinking.)
why?
a lot of EA seems fake
the stuff — the orientations — the orgs — i’m finding it hard to straightforwardly point at, but it feels kinda easy for me to notice ex-post
there’s been an odd mix of orientations toward [ aiming at a character of transparent/open/clear/etc ] alongside [ taking actions that are strategic/instrumentally useful/best at accomplishing narrow goals… that also happen to be mildly deceptive, or lying by omission, or otherwise somewhat slimy/untrustworthy/etc ]
the thing that really gets me is the combination of an implicit (and sometimes explicit!) request for deep trust alongside a level of trust that doesn’t live up to that expectation.
it’s fine to be in a low-trust environment, and also fine to be in a high-trust environment; it’s not fine to signal one and be the other. my experience of EA has been that people have generally behaved extremely well/with high integrity and with high trust… but not quite as well & as high as what was written on the tin.
for a concrete ex (& note that i totally might be screwing up some of the details here, please don’t index too hard on the specific people/orgs involved): when i was participating in — and then organizing for — brandeis EA, it seemed like our goal was (very roughly speaking) to increase awareness of EA ideas/principles, both via increasing depth & quantity of conversation and via increasing membership. i noticed a lack of action/doing-things-in-the-world, which felt kinda annoying to me… until i became aware that the action was “organizing the group,” and that some of the organizers (and higher up the chain, people at CEA/on the Groups team/at UGAP/etc) believed that most of the impact of university groups comes from recruiting/training organizers — that the “action” i felt was missing wasn’t missing at all, it was just happening to me, not from me. i doubt there was some point where anyone said “oh, and make sure not to tell the people in the club that their value is to be a training ground for the organizers!” — but that’s sorta how it felt, both on the object-level and on the deception-level.
this sort of orientation feels decently representative of the 25th percentile end of what i’m talking about.
also some confusion around ethics/how i should behave given my confusion/etc
importantly, some confusions around how i value things. it feels like looking at the world through an EA frame blinds myself to things that i actually do care about, and blinds myself to the fact that i’m blinding myself. i think it’s taken me a while to know what that feels like, and i’ve grown to find that blinding & meta-blinding extremely distasteful, and a signal that something’s wrong.
some of this might merely be confusion about orientation, and not ethics — e.g. it might be that in some sense the right doxastic attitude is “EA,” but that the right conative attitude is somewhere closer to (e.g.) “embody your character — be kind, warm, clear-thinking, goofy, loving, wise, [insert more virtues i want to be here]. oh and do some EA on the side, timeboxed & contained, like when you’re donating your yearly pledge money.”
where now?
i’m not sure! i could imagine the pendulum swinging more in either direction, and want to avoid doing any further prediction about where it will swing for fear of that prediction interacting harmfully with a sincere process of reflection.
i did find writing this out useful, though!
Thanks for sharing your experiences and reflections here — I really appreciate the thoughtfulness. I want to offer some context on the group organizer situation you described, as someone who was running the university groups program at the time.
On the strategy itself:
At the time, our scalable programs were pretty focused on organizers, based on evidence we had seen that much of the impact came from the organizers themselves. We of course did want groups to go well more generally, but in deciding where to put our marginal resources we were focusing on group organizers. It was a fairly unintuitive strategy — and I get how that could feel misaligned or even misleading if it wasn’t clearly communicated.
On communication:
We did try to be explicit about this strategy — it was featured at organizer retreats and in parts of our support programming. But we didn’t consistently communicate it across all our materials. That inconsistency was an oversight on our part. Definitely not an attempt to be deceptive — just something that didn’t land as clearly as we hoped.
Where we’re at now:
We’ve since updated our approach. The current strategy is less focused narrowly on organizers and more on helping groups be great overall. That said, we still think a lot of the value often comes from a small, highly engaged core — which often includes organizers, but not exclusively.
In retrospect, I wish we’d communicated this more clearly across the board. When a strategy is unintuitive, a few clear statements in a few places often isn’t enough to make it legible. Sorry again if this felt off — I really appreciate you surfacing it.
“why do i find myself less involved in EA?”
You go over more details later and answer other questions like what caused some reactions to some EA-related things, but an interesting thing here is that you are looking for the cause of something that is not there (an absence of involvement), rather than of something that is.
> it feels like looking at the world through an EA frame blinds myself to things that i actually do care about, and blinds myself to the fact that i’m blinding myself.
I can strongly relate, had the same experience. I think it’s due to Christian upbringing or some kind of need for external validation. I think many people don’t experience that, so I wouldn’t say that’s an inherently EA thing; it’s more about the attitude.
I appreciated you expressing this.
Riffing out loud … I feel that there are different dynamics going on here (not necessarily in your case; more in general):
The tensions where people don’t act with as much integrity as is signalled
This is not a new issue for EA (it arises structurally despite a lot of good intentions, because of the encouragement to be strategic), and I think it just needs active cultural resistance
In terms of writing, I like Holden’s and Toby’s pushes on this; my own attempts here and here
But for this to go well, I think it’s not enough to have some essays on reading lists; instead I hope that people try to practice good orientation here at lots of different scales, and socially encourage others to
The meta-blinding
I feel like I haven’t read much on this, but it rings true as a dynamic to be wary of! Where I take the heart of the issue to be that EA presents a strong frame about what “good” means, and then encourages people to engage in ways that make aspects of their thinking subservient to that frame
As someone put it to me, “EA has lost the mandate of heaven”
I think EA used to be (in some circles) the obvious default place for the thoughtful people who cared a lot to gather and collaborate
I think that some good fraction of its value came from performing this role?
Partially as a result of 1 and 2, people are disassociating with EA; and this further reduces the pull to associate
I can’t speak to how strong this effect is overall, but I think the directionality is clear
I don’t know if it’s accessible (and I don’t think I’m well positioned to try), but I still feel a lot of love for the core of EA, and would be excited if people could navigate it to a place where it regained the mandate of heaven.
Thanks for clarifying your take!
I’m sorry to hear about those experiences.
Most of the problems you mention seem to be about the specific current EA community, as opposed to the main values of “doing a lot of good” and “being smart about doing so.”
Personally, I’m excited for certain altruistic and smart people to leave the EA community, as it suits them, and do good work elsewhere. I’m sure that being part of the community is limiting to certain people, especially if they can find other great communities.
That said, I of course hope you can find ways for the key values of “doing good in the world” and similar to work for you.
I think it might be cool if an AI Safety research organization ran a copy of an open model or something and I could pay them a subscription to use it. That way I know my LLM subscription money is going to good AI stuff and not towards AI companies that I don’t like or don’t want more of, on net.
Idk, existing independent orgs might not be the best place to do this bc it might “damn them” or “corrupt them” over time. Like, this could lead them to “selling out” in a variety of ways you might conceive of.
Still, I guess I am saying that to the extent anyone is going to actually “make money” off of my LLM usage subscriptions, it would be awesome if it were just a cool independent AIS lab I personally liked or similar. (I don’t really know the margins and unit economics which seems like an important part of this pitch lol).
Like, if “GoodGuy AIS Lab” sets up a little website and inference server (running Qwen or Llama or whatever) then I could pay them the $15-25 a month I may have otherwise paid to an AI company. The selling point would be that less “moral hazard” is better vibes, but probably only some people would care about this at all and it would be a small thing. But also, it’s hardly like a felt sense of moral hazard around AI is a terribly niche issue.
This isn’t the “final form” of this I have in mind necessarily; I enjoy picking at ideas in the space of “what would a good guy AGI project do” or “how can you do neglected AIS / ‘AI go well’ research in a for-profit way”.
I also like the idea of an explicitly fast follower project for AI capabilities. Like, accelerate safety/security relevant stuff and stay comfortably middle of the pack on everything else. I think improving GUIs is probably fair game too, but not once it starts to shade into scaffolding I think? I wouldn’t know all of the right lines to draw here, but I really like this vibe.
This might not work well if you expect gaps to widen as RSI becomes a more important input. I would argue that seems too galaxy brained given that, as of writing, we do live in a world with a lot of mediocre AI companies that I believe can all provide products of ~comparable quality.
It is also just kind of a bet that in practice it is probably going to remain a lot less expensive to stay a little behind the frontier than to be at the frontier. And that, in practice, it may continue to not matter in a lot of cases.
fwiw I think you shouldn’t worry about paying $20/month to an evil company to improve your productivity, and if you want to offset it I think a $10/year donation to LTFF would more than suffice.
Can you say more on why you think a 1:24 ratio is the right one (as opposed to lower or higher ratios)? And how might this ratio differ for people who have different beliefs than you, for example about xrisk, LTFF, or the evilness of these companies?
I haven’t really thought about it and I’m not going to. If I wanted to be more precise, I’d assume that a $20 subscription is equivalent (to a company) to finding a $20 bill on the ground, assume that an ε% increase in spending on safety cancels out an ε% increase in spending on capabilities (or think about it and pick a different ratio), and look at money currently spent on safety vs capabilities. I don’t think P(doom) or company-evilness is a big crux.
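As a back-of-the-envelope illustration of that logic, here is a quick sketch; both spending totals are made-up placeholder figures for illustration, not real estimates of safety or capabilities budgets.

```python
# Back-of-the-envelope version of the offset logic above. Both spending totals
# are made-up placeholders, not real estimates of safety or capabilities budgets.
subscription_per_year = 20 * 12      # $20/month treated as pure found money for the company

capabilities_spending = 200e9        # assumed annual capabilities spending, $
safety_spending = 1e9                # assumed annual safety spending, $

# If an x% boost to safety offsets an x% boost to capabilities, the offsetting
# donation is the subscription scaled by the ratio of the two totals.
offset_donation = subscription_per_year * (safety_spending / capabilities_spending)
print(f"Offset donation: ${offset_donation:.2f} per year")  # $1.20/year under these assumptions
```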
Alternative idea: AI companies should have a little checkbox saying “Please use 100% of the revenue from my subscription to fund safety research only.” This avoids some of the problems with your idea and also introduces some new problems.
I think there is a non-infinitesimal chance that Anthropic would actually implement this.
Ya, maybe. This concern/way of thinking just seems kind of niche. Probably only a very small demographic who overlaps with me here. So I guess I wouldn’t expect it to be a consequential amount of money to eg. Anthropic or OpenAI.
That check box would be really cool though. It might ease friction / dissonance for people who buy into high p(doom) or relatively non-accelerationist perspectives. My views are not representative of anyone, but me, but a checkbox like that would be a killer feature for me and certainly win my $20/mo :) . And maybe, y’know, all 100 people or whatever who would care and see it that way.
Mini Forum update: Draft comments, and polls in comments
Draft comments
You can now save comments as permanent drafts:
After saving, the draft will appear for you to edit:
1. In-place if it’s a reply to another comment (as above)
2. In a “Draft comments” section under the comment box on the post
3. In the drafts section of your profile
The reasons we think this will be useful:
For writing long, substantive comments (and quick takes!). We think these are some of the most valuable comments on the forum, and want to encourage more of them
For starting a comment on mobile and then later continuing on desktop
To lower the barrier to starting writing a comment, since you know you can always throw it in drafts and then never look at it again
Polls in comments
We recently added the ability to put polls in posts, and this was fairly well received, so we’re adding it to comments (… and quick takes!) as well.
You can add a poll from the toolbar; you just need to highlight a bit of text to make the toolbar appear:
And the poll will look like this...
A summary of my current views on moral theory and the value of AI
I am essentially a preference utilitarian and an illusionist regarding consciousness. This combination of views leads me to conclude that future AIs will very likely have moral value if they develop into complex agents capable of long-term planning, and are embedded within the real world. I think such AIs would have value even if their preferences look bizarre or meaningless to humans, as what matters to me is not the content of their preferences but rather the complexity and nature of their minds.
When deciding whether to attribute moral patienthood to something, my focus lies primarily on observable traits, cognitive sophistication, and most importantly, the presence of clear open-ended goal-directed behavior, rather than on speculative or less observable notions of AI welfare, about which I am more skeptical. As a rough approximation, my moral theory aligns fairly well with what is implicitly proposed by modern economists, who talk about revealed preferences and consumer welfare.
Like most preference utilitarians, I believe that value is ultimately subjective: loosely speaking, nothing has inherent value except insofar as it reflects a state of affairs that aligns with someone’s preferences. As a consequence, I am comfortable, at least in principle, with a wide variety of possible value systems and future outcomes. This means that I think a universe made of only paperclips could have value, but only if that’s what preference-having beings wanted the universe to be made out of.
To be clear, I also think existing people have value too, so this isn’t an argument for blind successionism. Also, it would be dishonest not to admit that I am also selfish to a significant degree (along with almost everyone else on Earth). What I have just described simply reflects my broad moral intuitions about what has value in our world from an impartial point of view, not a prescription that we should tile the universe with paperclips. Since humans and animals are currently the main preference-having beings in the world, at the moment I care most about fulfilling what they want the world to be like.
How confident are you about these views?
I’m relatively confident in these views, with the caveat that much of what I just expressed concerns morality, rather than epistemic beliefs about the world. I’m not a moral realist, so I am not quite sure how to parse my “confidence” in moral views.
From an antirealist perspective, at least on the ‘idealizing subjectivism’ form of antirealism, moral uncertainty can be understood as uncertainty about the result of an idealization process. Under this view, there exists some function that takes your current, naive values as input and produces idealized values as output—and your moral uncertainty is uncertainty about the output.
I agree that this sort of preference utilitarianism leads you to thinking that long run control by an AI which just wants paperclips could be some (substantial) amount good, but I think you’d still have strong preferences over different worlds.[1] The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities. (And if you agreed with this claim about large variation, then I don’t think you would focus on the fact that the paperclipper world is some small amount good as this wouldn’t be an important consideration—at least insofar as you don’t also expect that worlds where humans etc retain control are similarly a tiny amount good for similar reasons.)
The main reasons preference utilitarianism is more picky:
Preferences in the multiverse: Insofar as you put weight on the preferences of beings outside our lightcone (beings in the broader spatially infinite universe, Everett branches, the broader mathematical multiverse to the extent you put weight on this), these beings will sometimes care about what happens in our lightcone and this could easily dominate (as they are vastly more numerous and many might care about things independent of “distance”). In the world with the successful paperclipper, just as many preferences aren’t being fulfilled. You’d strongly prefer optimization to satisfy as many preferences as possible (weighted as you end up thinking is best).
Instrumentally constructed AIs with unsatisfied preferences: If future AIs don’t care at all about preference utilitarianism, they might instrumentally build other AIs whose preferences aren’t fulfilled. As an extreme example, it might be that the best strategy for a paperclipper is to construct AIs which have very different preferences and are enslaved. Even if you don’t care about ensuring beings come into existence whose preferences are satisfied, you might still be unhappy about creating huge numbers of beings whose preferences aren’t satisfied. You could even end up in a world where (nearly) all currently existing AIs are instrumental and have preferences which are either unfulfilled or only partially fulfilled (an earlier AI initiated a system that perpetuates this, but this earlier AI no longer exists as it doesn’t care terminally about self-preservation and the system it built is more efficient than it).
AI inequality: It might be the case that the vast majority of AIs have their preferences unsatisfied despite some AIs succeeding at achieving their preferences. E.g., suppose all AIs are replicators which want to spawn as many copies as possible. The vast majority of these replicator AIs are operating at subsistence and so can’t replicate, making their preferences totally unsatisfied. This could also happen as a result of any other preference that involves constructing minds that end up having preferences.
Weights over numbers of beings and how satisfied they are: It’s possible that in a paperclipper world, there are really a tiny number of intelligent beings because almost all self-replication and paperclip construction can be automated with very dumb/weak systems and you only occasionally need to consult something smarter than a honeybee. AIs could also vary in how much they are satisfied or how “big” their preferences are.
I think the only view which recovers indifference is something like “as long as stuff gets used and someone wanted this at some point, that’s just as good”. (This view also doesn’t actually care about stuff getting used, because there is someone existing who’d prefer the universe stays natural and/or you don’t mess with aliens.) I don’t think you buy this view?
To be clear, it’s not immediately obvious whether a preference utilitarian view like the one you’re talking about favors human control over AIs. It certainly favors control by that exact flavor of preference utilitarian view (so that you end up satisfying people across the (multi-/uni-)verse with the correct weighting). I’d guess it favors human control for broadly similar reasons to why I think more experience-focused utilitarian views also favor human control if that view is in a human.
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impacts current human actions could have are small given how much you weight long term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
(This comment is copied over from LW responding to a copy of Matthew’s comment there.)
On my best guess moral views, I think there is goodness in the paper clipper universe but this goodness (which isn’t from (acausal) trade) is very small relative to how good the universe can plausibly get. So, this just isn’t an important consideration but I certainly agree there is some value here.
I don’t think I agree with the strong version of the indifference view that you’re describing here. However, I probably do agree with a weaker version. In the weaker version that I largely agree with, our profound uncertainty about the long-term future means that, although different possible futures could indeed be extremely different in terms of their value, our limited ability to accurately predict or forecast outcomes so far ahead implies that, in practice, we shouldn’t overly emphasize these differences when making almost all ordinary decisions.
This doesn’t mean I think we should completely ignore the considerations you mentioned in your comment, but it does mean that I don’t tend to find those considerations particularly salient when deciding whether to work on certain types of AI research and development.
This reasoning is similar to why I try to be kind to people around me: while it’s theoretically possible that some galaxy-brained argument might exist showing that being extremely rude to people around me could ultimately lead to far better long-term outcomes that dramatically outweigh the short-term harm, in practice, it’s too difficult to reliably evaluate such abstract and distant possibilities. Therefore, I find it more practical to focus on immediate, clear, and direct considerations, like the straightforward fact that being kind is beneficial to the people I’m interacting with.
This puts me perhaps closest to the position you identified in the last paragraph:
Here’s an analogy that could help clarify my view: suppose we were talking about the risks of speeding up research into human genetic engineering or human cloning. In that case, I would still seriously consider speculative moral risks arising from the technology. For instance, I think it’s possible that genetically enhanced humans could coordinate to oppress or even eliminate natural unmodified humans, perhaps similar to the situation depicted in the movie GATTACA. Such scenarios could potentially have enormous long-term implications under my moral framework, even if it’s not immediately obvious what those implications might actually be.
However, even though these speculative risks are plausible and seem important to take into account, I’m hesitant to prioritize their (arguably very speculative) impacts above more practical and direct considerations when deciding whether to pursue such technologies. This is true even though it’s highly plausible that the long-run implications are, in some sense, more significant than the direct considerations that are easier to forecast.
Put more concretely, if someone argued that accelerating genetically engineering humans might negatively affect the long-term utilitarian moral value we derive from cosmic resources as a result of some indirect far-out consideration, I would likely find that argument far less compelling than if they informed me of more immediate, clear, and predictable effects of the research.
In general, I’m very cautious about relying heavily on indirect, abstract reasoning when deciding what actions we should take or what careers we should pursue. Instead, I prefer straightforward considerations that are harder to fool oneself about.
Gotcha, so if I understand correctly, you’re more so leaning on uncertainty for being mostly indifferent rather than on thinking you’d actually be indifferent if you understood exactly what would happen in the long run. This makes sense.
(I have a different perspective on high-stakes decision making under uncertainty and I don’t personally feel sympathetic to this sort of cluelessness perspective as a heuristic in most cases or as a terminal moral view. See also the CLR work on cluelessness. Separately, my intuitions around cluelessness imply that, to the extent I put weight on this, when I’m clueless, I get more worried about the unilateralist’s curse and downside risk, which you don’t seem to put much weight on, though just rounding all kinda-uncertain long run effects to zero isn’t a crazy perspective.)
On the galaxy brained point: I’m sympathetic to arguments against being too galaxy brained, so I see where you’re coming from there, but from my perspective, I was already responding to an argument which is one galaxy brain level deep.
I think the broader argument about AI takeover being bad from a longtermist perspective is not galaxy brained and the specialization of this argument to your flavor of preference utilitarianism also isn’t galaxy brained: you have some specific moral views (in this case about preference utilitarianism) and all else equal you’d expect humans to share these moral views more than AIs that end up taking over despite their developers not wanting the AI to take over. So (all else equal) this makes AI takeover look bad, because if beings share your preferences, then more good stuff will happen.
Then you made a somewhat galaxy brained response to this about how you don’t actually care about shared preferences due to preference utilitarianism (because after all, you’re fine with any preferences right?). But, I don’t think this objection holds because there are a number of (somewhat galaxy brained) reasons why specifically optimizing for preference utilitarianism and related things may greatly outperform control by beings with arbitrary preferences.
From my perspective the argument looks sort of like:
Non galaxy brained argument for AI takeover being bad
Somewhat galaxy brained rebuttal by you about preference utilitarianism meaning you don’t actually care about this sort of preference-similarity argument for avoiding nonconsensual AI takeover
My somewhat galaxy brained response, which is only galaxy brained largely because it’s responding to a galaxy brained perspective about details of the long run future.
I’m sympathetic to cutting off at an earlier point and rejecting all galaxy brained arguments. But, I think the preference utilitarian argument you’re giving is already quite galaxy brained and sensitive to details of the long run future.
As am I. At least when it comes to the important action-relevant question of whether to work on AI development, in the final analysis, I’d probably simplify my reasoning to something like, “Accelerating general-purpose technology seems good because it improves people’s lives.” This perspective roughly guides my moral views on not just AI, but also human genetic engineering, human cloning, and most other potentially transformative technologies.
I mention my views on preference utilitarianism mainly to explain why I don’t particularly value preserving humanity as a species beyond preserving the individual humans who are alive now. I’m not mentioning it to commit to any form of galaxy-brained argument that I think makes acceleration look great for the long-term. In practice, the key reason I support accelerating most technology, including AI, is simply the belief that doing so would be directly beneficial to people who exist or who will exist in the near-term.
And to be clear, we could separately discuss what effect this reasoning has on the more abstract question of whether AI takeover is bad or good in expectation, but here I’m focusing just on the most action-relevant point that seems salient to me, which is whether I should choose to work on AI development based on these considerations.