AI safety researcher
Thomas Kwa
I'd love to sign up, but due to adverse selection concerns I'd prefer to be matched with an EA picked uniformly at random (whether they signed up or not). Is this possible?
what prompt did you use?
On a global scale I agree. My point is more that due to the salary standards in the industry, Eliezer isn't necessarily out of line in drawing $600k, and it's probably not much more than he could earn elsewhere; therefore the financial incentive is fairly weak compared to that of Mechanize or other AI capabilities companies.
Being really good at your job is a good way to achieve impact in general, because your "impact above replacement" is what counts. If a replacement-level employee who is barely worth hiring has productivity 100, and the average productivity is 150, the average employee gets 50 impact above replacement. If you are 1.67x as productive as the average employee (250 productivity), your impact above replacement is 150, triple the average.
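Spelling out the arithmetic from the example above (the productivity numbers are just the illustrative ones from the example, not estimates):

$$\text{impact above replacement} = \text{productivity} - \text{replacement-level productivity}$$

$$150 - 100 = 50 \;\;\text{(average employee)}, \qquad 250 - 100 = 150 = 3 \times 50 \;\;\text{(the 1.67x employee)}$$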
I strongly disagree with a couple of claims:
MIRI's business model relies on the opposite narrative. MIRI pays Eliezer Yudkowsky $600,000 a year. It pays Nate Soares $235,000 a year. If they suddenly said that the risk of human extinction from AGI or superintelligence is extremely low, in all likelihood that money would dry up and Yudkowsky and Soares would be out of a job.
[...] The kind of work MIRI is doing and the kind of experience Yudkowsky and Soares have isn't really transferable to anything else.
$235K is not very much money [edit: in the context of the AI industry]. I made close to Nate's salary as basically an unproductive intern at MIRI. $600K is also not much money. A Preparedness researcher at OpenAI has a starting salary of $310K–$460K plus probably another $500K in equity. As for nonprofit salaries, METR's salary range goes up to $450K just for a "senior" level RE/RS, and I think it's reasonable for nonprofits to pay someone with 20 years of experience, who might be more like a principal RS, $600K or more.
In contrast, if Mechanize succeeds, Matthew Barnett will probably be a billionaire.
If Yudkowsky said extinction risks were low and wanted to focus on some finer aspect of alignment, e.g. ensuring that AIs respect human rights a million years from now, donors who shared their worldview would probably keep donating. Indeed, this might increase donations to MIRI because it would be closer to mainstream beliefs.
MIRI's work seems very transferable to other risks from AI, which governments and companies both have an interest in preventing. Yudkowsky and Soares have a somewhat weird skillset and I disagree with some of their research style, but it's plausible to me they could still work productively in a mathy theoretical role in either capabilities or safety.
However, things I agree with:
If the Mechanize co-founders wanted to focus on safety rather than capabilities, they could.
the Mechanize co-founders decided to start the company after forming their views on AI safety.
The Yudkowsky/Soares/MIRI argument about AI alignment is specifically that an AGI's goals and motivations are highly likely to be completely alien to human goals and motivations in a way that's highly existentially dangerous.
Is there a formula for the pledge somewhere? I couldn't find one.
See the GPT-5 report. "Working lower bound" is maybe too strong; maybe it's more accurate to describe it as an initial guess at a warning threshold for rogue replication and 10x uplift (if we can even measure time horizons that long). I don't know what the exact reasoning behind 40 hours was, but one fact is that humans can't really start viable companies using plans that only take a ~week of work. IMO if AIs could do the equivalent with only a 40-human-hour time horizon and continuously evade detection, they'd need to be using their own advantages and to have made up for many of their current disadvantages relative to humans (like being bad at adversarial and multi-agent settings).
A sliding scale for donation percentage
What scale is the METR benchmark on? I see a line saying "Scores are normalized such that 100% represents a 50% success rate on tasks requiring 8 human-expert hours", but is the 0% point on the scale 0 hours?
METR does not think that 8 human hours is sufficient autonomy for takeover; in fact 40 hours is our working lower bound.
What if we decide that the Amazon rainforest has a negative WAW sign? Would you be in favor of completely replacing it with a parking lot, if doing so could be done without undue suffering of the animals that already exist there?
Definitely not completely replacing it, because biodiversity has diminishing returns to land. If we pave the whole Amazon we'll probably drive entire families extinct (not to mention we'd probably cause ecological crises elsewhere, disrupt ecosystem services, etc.), whereas on the margin we'll only drive extinct the species endemic to the deforested regions.
If the research on WAW comes out super negative, I could imagine it being OK to replace half the Amazon with higher-welfare ecosystems now, and work on replacing the rest when some crazy AI tech allows all changes to be fully reversible. But the moral parliament would probably still not be happy about this. E.g. killing is probably bad, and there is no feasible way to destroy half the Amazon in the near term without killing most of the animals in it.
It's plausible to me that biodiversity is valuable, but with AGI on the horizon it seems a lot cheaper in expectation to do more out-there interventions, like influencing AI companies to care about biodiversity (alongside wild animal welfare), recording the DNA of undiscovered rainforest species about to go extinct, and buying the cheapest land possible (middle of Siberia or Australian desert, not productive farmland). Then when the technology is available in a few decades and we're better at constructing stable ecosystems de novo, we can terraform the deserts into highly biodiverse nature preserves. Another advantage of this is that we'll know more about animal welfare; as it stands now, the sign of habitat preservation is pretty unclear.
Nukit ships to many countries.
Thanks for the reply.
Everyone has different emotional reactions so I would be wary about generalizing here. Of the vegetarians I know, certainly not all are disgusted by meat. Disgust is often more correlated with whether they use a purity frame of morality or experience disgust in general than with how much they empathize with animals [1]. Empathy is not an end; it's not required for virtuous action, and many people have utilitarian, justice-centered, or other frames that can prescribe actions with empathy taking a lesser role. As for me, I feel that after experiencing heightened empathy for those 40 days in 2021 and occasionally since, I understand its psychological effects on me well enough to know I'm not making a grave moral error.
I would only feel averse to eating human meat if the human were murdered just for people to eat, and wouldn't feel much disgust unless it still looked like human body parts, so maybe I'm an exception. But I'm not sure how this is relevant.
Agree that the social signaling purpose is different. I guess which one is better would depend on the social group. Around my friends, who are either omnivores or vegan, I feel OK just signaling that it's bad to eat the worst-treated animals. But if everyone else avoided chicken and seemed to think eating everything else was fine, I would give up something else for signaling purposes, and maybe at some point it's better to just go vegan.
[1] Or just whether they grew up vegetarian, like how people are often disgusted by any strange food.
Didn't realize my only post of the year was from April 1st. Longforms are just so scary to write other than on April Fool's Day!
Are you interested in betting on these beliefs? I couldn't find a bet with Vasco, but it seems more likely we could find one, since you seem more confident.
You're shooting the messenger. I'm not advocating for downvoting posts that smell of "the outgroup", just saying that this happens in most communities that are centered around an ideological or even methodological framework. It's a way you can be downvoted while still being correct, especially by the LEAST thoughtful 25% of EA Forum voters.
Please read the quote from Claude more carefully. MacAskill is not an "anti-utilitarian" who thinks consequentialism is "fundamentally misguided"; he's the moral uncertainty guy. The moral parliament usually recommends actions similar to consequentialism with side constraints in practice.
I probably won't engage more with this conversation.
Claude thinks possible outgroups include the following, which is similar to what I had in mind:
Based on the EA Forum's general orientation, here are five individuals/groups whose characteristic opinions would likely face downvotes:
Effective accelerationists (e/acc) - Advocates for rapid AI development with minimal safety precautions, viewing existential risk concerns as overblown or counterproductive
TESCREAL critics (like Emile Torres, as you mentioned) - Scholars who frame longtermism/EA as ideologically dangerous, often linking it to eugenics, colonialism, or techno-utopianism
Anti-utilitarian philosophers - Strong deontologists or virtue ethicists who reject consequentialist frameworks as fundamentally misguided, particularly on issues like population ethics or AI risk trade-offs
Degrowth/anti-progress advocates - Those who argue economic/technological growth is net-negative and should be reduced, contrary to EA's generally pro-progress orientation
Left-accelerationists and systemic change advocates - Critics who view EA as a "neoliberal" distraction from necessary revolutionary change, or who see philanthropic approaches as fundamentally illegitimate compared to state redistribution
My main concern is that the arrival of AGI completely changes the situation in some unexpected way.
e.g. in the recent 80k podcast on fertility, Rob Wiblin opines that the fertility crash would be a global priority if not for AI likely replacing human labor soon and obviating the need for countries to have large human populations. There could be other effects.
My guess is that due to advanced AI, both artificial wombs and immortality will be technically feasible in the next 40 years, as well as other crazy healthcare tech. This is not an uncommon view.
Before anything like a Delphi forecast it seems better to informally interview a couple of experts, and then write your own quick report on what the technical barriers are to artificial wombs. This way you can incorporate this into the structure of any forecasting exercise, e.g. by asking experts to forecast when each of hurdles X, Y, and Z will be solved, whereupon you can do things like identifying where the level of agreement is highest and lowest, as well as consistency checks against the overall forecast.
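For instance, here is a minimal sketch of the consistency check described above, assuming per-hurdle forecasts are elicited as rough distributions over calendar years; the hurdle names, dates, and spreads below are hypothetical placeholders, not actual forecasts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical per-hurdle forecasts: (median year solved, spread in years).
hurdle_forecasts = {
    "X": (2032, 4.0),
    "Y": (2036, 6.0),
    "Z": (2040, 8.0),
}

# If artificial wombs require all hurdles to be solved, the implied arrival
# year is the maximum of the per-hurdle completion years.
samples = np.column_stack([
    rng.normal(median, spread, n) for median, spread in hurdle_forecasts.values()
])
implied_overall = samples.max(axis=1)

# Consistency check: compare the implied distribution against the expert's
# direct overall forecast (also hypothetical here).
direct_overall_median = 2045
print(f"Implied median year: {np.median(implied_overall):.0f}")
print(f"Direct overall median: {direct_overall_median}")
print(f"Implied P(arrival before the direct median): "
      f"{(implied_overall < direct_overall_median).mean():.0%}")
```

Large gaps between the implied and directly stated forecasts would flag where to probe the experts' reasoning further.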
Most infant mortality still happens in the developing world, due to much more basic factors like tropical diseases. So if the goal is reducing infant mortality globally, you won't be addressing most of the problem, and for maternal mortality, the tech will need to be so mature that it's affordable for the average person in low-income countries, as well as culturally accepted.
Yeah, while I think truth-seeking is a real thing, I agree it's often hard to judge in practice and vulnerable to being a weasel word.
Basically I have two concerns with deferring to experts. First is that when the world lacks people with true subject matter expertise, whoever has the most prestige (maybe not CEOs, but certainly mainstream researchers on slightly related questions) will be seen as experts and we will need to worry about deferring to them.
Second, because EA topics are selected for being too weird/unpopular to attract mainstream attention/funding, I think a common pattern is that of the best interventions, some are already funded, some are recommended by mainstream experts and remain underfunded, and some are too weird for the mainstream. It's not really possible to find the "too weird" kind without forming an inside view. We can start out deferring to experts, but by the time we've spent enough resources investigating the question to be at all confident in what to do, the deferral to experts is partially replaced with understanding the research yourself, as well as the load-bearing assumptions and biases of the experts. The mainstream experts will always get some weight, but it diminishes as your views start to incorporate their models rather than their views (an example that comes to mind is economists on whether AGI will create explosive growth, and how recently good economic models have been developed by EA sources, now including some economists who vary assumptions and justify differences from the mainstream economists' assumptions).
Wish I could give more concrete examples but I'm a bit swamped at work right now.
My guess is something like: Many organizations have quarterly caps on the number of false claims published. Their employees often want to make false claims, but towards the end of the quarter theyāre at the cap, so they delay the post to the first day of the next quarter.
Okay, but why only April 1? Well, on Jan 1 everyone is on holiday, and on July 1 everyone is out enjoying the good weather. Oct 1 coincides with national holidays in populous countries like China and Nigeria, and in the US people are hung over from fiscal New Year's Eve. So we only really see the effect on April 1.
I would strongly predict that a false claims spike also happens in places with bad weather on July 1. Unfortunately, most places are in the Northern Hemisphere where it's warm, and Australia has good weather all year, so I think this is only testable when it snows in New Zealand.