AI safety researcher
Thomas Kwa
I'd love to sign up, but due to adverse selection concerns I'd prefer to be matched with an EA picked uniformly at random (whether they signed up or not). Is this possible?
what prompt did you use?
On a global scale I agree. My point is more that due to the salary standards in the industry, Eliezer isn't necessarily out of line in drawing $600k, and it's probably not much more than he could earn elsewhere; therefore the financial incentive is fairly weak compared to that of Mechanize or other AI capabilities companies.
Being really good at your job is a good way to achieve impact in general, because your "impact above replacement" is what counts. If a replacement-level employee who is barely worth hiring has productivity 100, and the average productivity is 150, the average employee gets 50 impact above replacement. If you are 1.67x as productive as the average employee (250 productivity), your impact above replacement is 150, triple the average.
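Spelling out the arithmetic from the example above (the productivity numbers are just the illustrative ones from the example, not estimates):

$$\text{impact above replacement} = \text{productivity} - \text{replacement-level productivity}$$

$$150 - 100 = 50 \;\;\text{(average employee)}, \qquad 250 - 100 = 150 = 3 \times 50 \;\;\text{(the 1.67x employee)}$$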
I strongly disagree with a couple of claims:
MIRI's business model relies on the opposite narrative. MIRI pays Eliezer Yudkowsky $600,000 a year. It pays Nate Soares $235,000 a year. If they suddenly said that the risk of human extinction from AGI or superintelligence is extremely low, in all likelihood that money would dry up and Yudkowsky and Soares would be out of a job.
[...] The kind of work MIRI is doing and the kind of experience Yudkowsky and Soares have isn't really transferable to anything else.
$235K is not very much money [edit: in the context of the AI industry]. I made close to Nate's salary as basically an unproductive intern at MIRI. $600K is also not much money. A Preparedness researcher at OpenAI has a starting salary of $310K–$460K plus probably another $500K in equity. As for nonprofit salaries, METR's salary range goes up to $450K just for a "senior" level RE/RS, and I think it's reasonable for nonprofits to pay someone with 20 years of experience, who might be more like a principal RS, $600K or more.
In contrast, if Mechanize succeeds, Matthew Barnett will probably be a billionaire.
If Yudkowsky said extinction risks were low and wanted to focus on some finer aspect of alignment, e.g. ensuring that AIs respect human rights a million years from now, donors who shared their worldview would probably keep donating. Indeed, this might increase donations to MIRI because it would be closer to mainstream beliefs.
MIRI's work seems very transferable to other risks from AI, which governments and companies both have an interest in preventing. Yudkowsky and Soares have a somewhat weird skillset and I disagree with some of their research style, but it's plausible to me they could still work productively in a mathy theoretical role in either capabilities or safety.
However, things I agree with:
If the Mechanize co-founders wanted to focus on safety rather than capabilities, they could.
the Mechanize co-founders decided to start the company after forming their views on AI safety.
The Yudkowsky/Soares/MIRI argument about AI alignment is specifically that an AGI's goals and motivations are highly likely to be completely alien to human goals and motivations in a way that's highly existentially dangerous.
Is there a formula for the pledge somewhere? I couldn't find one.
See the GPT-5 report. "Working lower bound" is maybe too strong; maybe it's more accurate to describe it as an initial guess at a warning threshold for rogue replication and 10x uplift (if we can even measure time horizons that long). I don't know what the exact reasoning behind 40 hours was, but one fact is that humans can't really start viable companies using plans that only take a ~week of work. IMO if AIs could do the equivalent with only a 40-human-hour time horizon and continuously evade detection, they'd need to be using their own advantages and to have made up for many of their current disadvantages relative to humans (like being bad at adversarial and multi-agent settings).
A sliding scale for donation percentage
What scale is the METR benchmark on? I see a line saying "Scores are normalized such that 100% represents a 50% success rate on tasks requiring 8 human-expert hours", but is the 0% point on the scale 0 hours?
METR does not think that 8 human hours is sufficient autonomy for takeover; in fact 40 hours is our working lower bound.
What if we decide that the Amazon rainforest has a negative WAW sign? Would you be in favor of completely replacing it with a parking lot, if doing so could be done without undue suffering of the animals that already exist there?
Definitely not completely replacing it, because biodiversity has diminishing returns to land. If we pave the whole Amazon we'll probably drive entire families extinct (not to mention we'd probably cause ecological crises elsewhere, disrupt ecosystem services, etc.), whereas on the margin we'll only drive extinct the species endemic to the deforested regions.
If the research on WAW comes out super negative, I could imagine it being OK to replace half the Amazon with higher-welfare ecosystems now, and work on replacing the rest when some crazy AI tech allows all changes to be fully reversible. But the moral parliament would probably still not be happy about this. E.g. killing is probably bad, and there is no feasible way to destroy half the Amazon in the near term without killing most of the animals in it.
It's plausible to me that biodiversity is valuable, but with AGI on the horizon it seems a lot cheaper in expectation to do more out-there interventions, like influencing AI companies to care about biodiversity (alongside wild animal welfare), recording the DNA of undiscovered rainforest species about to go extinct, and buying the cheapest land possible (middle of Siberia or Australian desert, not productive farmland). Then when the technology is available in a few decades and we're better at constructing stable ecosystems de novo, we can terraform the deserts into highly biodiverse nature preserves. Another advantage of this is that we'll know more about animal welfare; as it stands now, the sign of habitat preservation is pretty unclear.
Nukit ships to many countries.
Thanks for the reply.
Everyone has different emotional reactions so I would be wary about generalizing here. Of the vegetarians I know, certainly not all are disgusted by meat. Disgust is often more correlated with whether they use a purity frame of morality or experience disgust in general than with how much they empathize with animals [1]. Empathy is not an end; it's not required for virtuous action, and many people have utilitarian, justice-centered, or other frames that can prescribe actions with empathy taking a lesser role. As for me, I feel that after experiencing heightened empathy for those 40 days in 2021 and occasionally since, I understand its psychological effects on me well enough to know I'm not making a grave moral error.
I would only feel averse to eating human meat if the human were murdered just for people to eat, and wouldn't feel much disgust unless it still looked like human body parts, so maybe I'm an exception. But I'm not sure how this is relevant.
Agree that the social signaling purpose is different. I guess which one is better would depend on the social group. Around my friends, who are either omnivores or vegan, I feel OK just signaling that it's bad to eat the worst-treated animals. But if everyone else avoided chicken and seemed to think eating everything else was fine, I would give up something else for signaling purposes, and maybe at some point it's better to just go vegan.
[1] Or just whether they grew up vegetarian, like how people are often disgusted by any strange food.
Didn't realize my only post of the year was from April 1st. Longforms are just so scary to write other than on April Fool's Day!
Are you interested in betting on these beliefs? I couldn't find a bet with Vasco, but it seems more likely we could find one, since you seem more confident.
You're shooting the messenger. I'm not advocating for downvoting posts that smell of "the outgroup", just saying that this happens in most communities that are centered around an ideological or even methodological framework. It's a way you can be downvoted while still being correct, especially by the LEAST thoughtful 25% of EA Forum voters.
Please read the quote from Claude more carefully. MacAskill is not an "anti-utilitarian" who thinks consequentialism is "fundamentally misguided"; he's the moral uncertainty guy. The moral parliament usually recommends actions similar to consequentialism with side constraints in practice.
I probably won't engage more with this conversation.
Claude thinks possible outgroups include the following, which is similar to what I had in mind:
Based on the EA Forum's general orientation, here are five individuals/groups whose characteristic opinions would likely face downvotes:
Effective accelerationists (e/acc) - Advocates for rapid AI development with minimal safety precautions, viewing existential risk concerns as overblown or counterproductive
TESCREAL critics (like Emile Torres, as you mentioned) - Scholars who frame longtermism/EA as ideologically dangerous, often linking it to eugenics, colonialism, or techno-utopianism
Anti-utilitarian philosophers - Strong deontologists or virtue ethicists who reject consequentialist frameworks as fundamentally misguided, particularly on issues like population ethics or AI risk trade-offs
Degrowth/anti-progress advocates - Those who argue economic/technological growth is net-negative and should be reduced, contrary to EA's generally pro-progress orientation
Left-accelerationists and systemic change advocates - Critics who view EA as a "neoliberal" distraction from necessary revolutionary change, or who see philanthropic approaches as fundamentally illegitimate compared to state redistribution
My main concern is that the arrival of AGI completely changes the situation in some unexpected way.
e.g. in the recent 80k podcast on fertility, Rob Wiblin opines that the fertility crash would be a global priority if not for AI likely replacing human labor soon and obviating the need for countries to have large human populations. There could be other effects.
My guess is that due to advanced AI, both artificial wombs and immortality will be technically feasible in the next 40 years, as well as other crazy healthcare tech. This is not an uncommon view.
Before anything like a Delphi forecast it seems better to informally interview a couple of experts, and then write your own quick report on what the technical barriers are to artificial wombs. This way you can incorporate this into the structure of any forecasting exercise, e.g. by asking experts to forecast when each of hurdles X, Y, and Z will be solved, whereupon you can do things like identifying where the level of agreement is highest and lowest, as well as consistency checks against the overall forecast.
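For instance, here is a minimal sketch of the consistency check described above, assuming per-hurdle forecasts are elicited as rough distributions over calendar years; the hurdle names, dates, and spreads below are hypothetical placeholders, not actual forecasts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical per-hurdle forecasts: (median year solved, spread in years).
hurdle_forecasts = {
    "X": (2032, 4.0),
    "Y": (2036, 6.0),
    "Z": (2040, 8.0),
}

# If artificial wombs require all hurdles to be solved, the implied arrival
# year is the maximum of the per-hurdle completion years.
samples = np.column_stack([
    rng.normal(median, spread, n) for median, spread in hurdle_forecasts.values()
])
implied_overall = samples.max(axis=1)

# Consistency check: compare the implied distribution against the expert's
# direct overall forecast (also hypothetical here).
direct_overall_median = 2045
print(f"Implied median year: {np.median(implied_overall):.0f}")
print(f"Direct overall median: {direct_overall_median}")
print(f"Implied P(arrival before the direct median): "
      f"{(implied_overall < direct_overall_median).mean():.0%}")
```

Large gaps between the implied and directly stated forecasts would flag where to probe the experts' reasoning further.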
Most infant mortality still happens in the developing world, due to much more basic factors like tropical diseases. So if the goal is reducing infant mortality globally, you won't be addressing most of the problem, and for maternal mortality, the tech will need to be so mature that it's affordable for the average person in low-income countries, as well as culturally accepted.
Yeah, while I think truth-seeking is a real thing, I agree it's often hard to judge in practice and vulnerable to being a weasel word.
Basically I have two concerns with deferring to experts. First is that when the world lacks people with true subject matter expertise, whoever has the most prestige (maybe not CEOs, but certainly mainstream researchers on slightly related questions) will be seen as experts and we will need to worry about deferring to them.
Second, because EA topics are selected for being too weird/unpopular to attract mainstream attention/funding, I think a common pattern is that of the best interventions, some are already funded, some are recommended by mainstream experts and remain underfunded, and some are too weird for the mainstream. It's not really possible to find the "too weird" kind without forming an inside view. We can start out deferring to experts, but by the time we've spent enough resources investigating the question to be at all confident in what to do, the deferral to experts is partially replaced with understanding the research yourself, as well as the load-bearing assumptions and biases of the experts. The mainstream experts will always get some weight, but it diminishes as your views start to incorporate their models rather than their views (an example that comes to mind is economists on whether AGI will create explosive growth, and how recently good economic models have been developed by EA sources, now including some economists who vary assumptions and justify differences from the mainstream economists' assumptions).
Wish I could give more concrete examples but I'm a bit swamped at work right now.
My guess is something like: Many organizations have quarterly caps on the number of false claims published. Their employees often want to make false claims, but towards the end of the quarter theyāre at the cap, so they delay the post to the first day of the next quarter.
Okay, but why only April 1? Well, on Jan 1 everyone is on holiday, and on July 1 everyone is out enjoying the good weather. Oct 1 coincides with national holidays in populous countries like China and Nigeria, and in the US people are hung over from fiscal New Year's Eve. So we only really see the effect on April 1.
I would strongly predict that a false claims spike also happens in places with bad weather on July 1. Unfortunately, most places are in the Northern Hemisphere where it's warm, and Australia has good weather all year, so I think this is only testable when it snows in New Zealand.