Thomas Kwa

Karma: 3,742

AI safety researcher

Thomas Kwa 6 Dec 2025 21:34 UTC
2 points
1 ∶ 0
in reply to: Yarrow Bouchard 🔸’s comment on: Noah Birnbaum’s Quick takes
Not “everyone agrees” what “utilitarianism” means either and it remains a useful word. In context you can tell I mean someone whose attitude, methods and incentives allow them to avoid the biases I listed and others.

Thomas Kwa 5 Dec 2025 11:28 UTC
14 points
4 ∶ 3
in reply to: Noah Birnbaum’s comment on: Noah Birnbaum’s Quick takes
I think the “most topics” thing is ambiguous. There are some topics on which mainstream experts tend to be correct and some on which they’re wrong, and although expertise is valuable on topics experts think about, they might be wrong on most topics central to EA. [1] Do we really wish we deferred to the CEO of PETA on what animal welfare interventions are best? EAs built that field in the last 15 years far beyond what “experts” knew before.
In the real world, assuming we have more than five minutes to think about a question, we shouldn’t “defer” to experts or immediately “embrace contrarian views”, but rather use their expertise and reject it when appropriate. Since this wasn’t an option in the poll, my guess is many respondents just wrote how much they like being contrarian, and EAs often have to be contrarian on topics they think about, so it came out in favor of contrarianism.
[1] Experts can be wrong because they don’t think in probabilities, they have a lack of imagination, there are obvious political incentives to say one thing over another, and probably other reasons. Also, lots of the central EA questions don’t have actual well-developed scientific fields around them, so many of the “experts” aren’t people who have thought about similar questions in a truth-seeking way for many years.

Thomas Kwa 5 Dec 2025 10:49 UTC
6 points
2 ∶ 0
in reply to: Yarrow Bouchard 🔸’s comment on: Yarrow Bouchard’s Quick takes
I think this is a significant reason why people downvote some, but not all, things they disagree with, especially when a member of the outgroup makes arguments EAs have refuted before and need to reexplain (not saying that’s actually you).

Thomas Kwa 5 Dec 2025 7:38 UTC
2 points
0 ∶ 0
in reply to: Yarrow Bouchard 🔸’s comment on: Yarrow Bouchard’s Quick takes
Can you explain what you mean by “contextualizing more”? (What a curiously recursive question...)
I mean it in this sense; making people think you’re not part of the outgroup and don’t have objectionable beliefs related to the ones you actually hold, in whatever way is sensible and honest.
Maybe LW is better at using the disagreement button, as I find it’s pretty common for unpopular opinions to get lots of upvotes and disagree votes. One could use the API to see if the correlations are different there.

Thomas Kwa 5 Dec 2025 7:21 UTC
5 points
1 ∶ 0
on: Why Does EA Focus on Veganism but Not Other Boycotts?
IMO the real answer is that veganism is not an essential part of EA philosophy, just happens to be correlated with it due to the large number of people in animal advocacy. Most EA vegans and non-vegans think that their diet is a small portion of their impact compared to their career, and it’s not even close! Every time you spend an extra $5 finding a restaurant with a vegan option you could help 5,000 shrimp instead. Vegans have other reasons like non-consequentialist ethics, virtue signaling or self-signaling, or just a desire not to eat the actual flesh/body fluids of tortured animals.
If you have a similar emotional reaction to other products it seems completely valid to boycott them, although as you mention there can be significant practical burdens, both in adjusting one’s lifestyle to avoid such products and in judging whether the claims of marginal impact are valid. Being vegan is not obligatory in my culture and neither should boycotts be—unless the marginal impact of the boycott is larger than any other life choice which is essentially never true.

Thomas Kwa 5 Dec 2025 2:42 UTC
3 points
1 ∶ 0
on: 3 Stages of Competition for the Long-Term Future
I really enjoyed reading this post; thanks for writing it. I think it’s important to take space colonization seriously and shift into “near mode” given that, as you say, the first entity to start a Dyson Swarm has a high chance to get DSA if it isn’t already decided by AGI, and it’s probably only 10-35 years away.

Thomas Kwa 4 Dec 2025 23:17 UTC
2 points
0 ∶ 0
in reply to: Yarrow Bouchard 🔸’s comment on: Zach Stein-Perlman’s Quick takes
Regarding COIs, it’s probably bigger that Daniela is married to Holden, and while not strictly a COI, we don’t want the association with OP’s political advocacy. There are probably other things; I don’t work on strategy.

Thomas Kwa 4 Dec 2025 22:27 UTC
4 points
1 ∶ 0
in reply to: Yarrow Bouchard 🔸’s comment on: Yarrow Bouchard’s Quick takes
Assorted thoughts
- Rate limits should not apply to comments on your own quick takes
- Rate limits could maybe not count negative karma below −10 or so, it seems much better to rate limit someone only when they have multiple downvoted comments
- 2.4:1 is not a very high karma:submission ratio. I have 10:1 even if you exclude the April Fools’ Day posts, though that could be because I have more popular opinions, which means that I could double my comment rate and get −1 karma on the extras and still be at 3.5
- if I were Yarrow I would contextualize more or use more friendly phrasing or something, and also not be bothered too much by single downvotes
- From scanning the linked comments I think that downvoters often think the comment in question has bad reasoning and detracts from effective discussion, not just that they disagree
- Deliberately not opining on the echo chamber question

Thomas Kwa 4 Dec 2025 21:53 UTC
6 points
0 ∶ 0
in reply to: Yarrow Bouchard 🔸’s comment on: Zach Stein-Perlman’s Quick takes
My understanding is METR doesn’t take Good Ventures money to avoid the appearance of COIs. We could maybe avoid creating actual COIs but it is crucial to the business model to appear as trusted and neutral as possible.

Thomas Kwa 4 Dec 2025 21:43 UTC
8 points
2 ∶ 0
on: Thomas Kwa’s Shortform
When 80,000 Hours pivoted to AI, I largely stopped listening to the podcast, thinking that as part of the industry I would already know everything. But I recently found myself driving a lot and consuming more audio content, and the recent ones eg with Holden, Daniel K and ASB are incredibly high quality and contain highly nontrivial, grounded opinions. If they keep this up I will probably keep listening until the end times.

Thomas Kwa 15 Nov 2025 0:52 UTC
9 points
1 ∶ 1
on: Some hardworking dads in EA
What inspiring and practical examples!
Maybe a commitment to impact causes EA parents to cooperate at maximizing it, which means optimally distributing the parenting workload whatever society thinks. In EA with lots of conferences and hardworking impactful women, it makes sense that the man’s op cost is often lower. Elsewhere couples cooperate to maximize income, but men tend to have higher earning potential so maybe the woman would often do more childcare anyway.
My sense is that parenting falls on the woman due not only to gender norms, but also higher average interest in childcare and other confounders—so I wonder how much is caused by other effects like EAs leaning liberal, questioning social expectations in general, or EA dads somehow being more keen on parenting. Also it’s unclear if EA men even contribute more than non-EA men.
I’m reminded a bit of the gender equality paradox where in the USSR, and maybe also countries with restrictive gender roles [1] there are higher rates of women in STEM and other male-dominated fields. The idea is that in liberal societies, there would be a disparity due to difference in interest, and some kinds of external factor can reduce disparities on net—in the Soviet case because equality was enforced by the state, in other cases if there is economic interest or a lack of Western stereotypes. So EA mindset is maybe one of these external factors—not to imply it’s like Soviet central planning or anything.
[1] the research seems disputed here

Thomas Kwa 7 Oct 2025 6:02 UTC
9 points
3 ∶ 0
on: Utilitarians Should Accept that Some Suffering Cannot be “Offset”
There are a few mistakes/gaps in the quantitative claims:
Continuity: If A ≻ B ≻ C, there’s some probability p ∈ (0, 1) where a guaranteed state of the world B is ex ante morally equivalent to “lottery p·A + (1-p)·C” (i.e., p chance of state of the world A, and the rest of the probability mass of C)
This is not quite the same as either property 3 or property 3′ in the Wikipedia article, and it’s plausible but unclear to me that you can prove 3′ from it. Property 3 uses “p ∈ [0, 1]” and 3′ has an inequality; it seems like the argument still goes through with 3′ so I’d switch to that, but then you should also say why 3 is unintuitive to you because VNM only requires 3 OR 3′.
This arbitrariness diminishes somewhat (though, again, not entirely) when viewed through the asymptotic structure. Once we accept that compensation requirements grow without bound as suffering intensifies, some threshold becomes inevitable. The asymptote must diverge somewhere; debates about exactly where are secondary to recognizing the underlying pattern.
“Grow without bound” just means that for any M, we have f(X) > M for sufficiently large X. This is different from there being a vertical asymptote so a threshold is not inevitable. For instance one could have f(X) = X or f(X) = X^2.
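To spell out the distinction, here is a short formal sketch. The symbols M, X_0 and t follow the usage above; the example f(X) = 1/(t − X) is my own illustration of a vertical asymptote and is not taken from the post.

```latex
% Unbounded growth: f eventually exceeds every finite bound M.
\[
  \forall M \;\; \exists X_0 \;\; \forall X > X_0 : \quad f(X) > M
\]
% f(X) = X and f(X) = X^2 both satisfy this, yet remain finite at every finite X,
% so no suffering intensity is ever assigned an infinite offset requirement.

% Vertical asymptote at a finite threshold t (a strictly stronger condition):
\[
  \lim_{X \to t^{-}} f(X) = \infty
  \qquad \text{e.g. } f(X) = \frac{1}{t - X} \text{ for } X < t
\]
```

Only the second condition yields a threshold t; the first is compatible with every intensity being offsettable by some finite amount.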
To be clear, whether we call this behavior ‘continuous’ depends on mathematical context and convention. In standard calculus, a function that approaches infinity exhibits an infinite discontinuity. [...]
[1] In the extended reals with appropriate topology, such a function can be rigorously called left-continuous.
It would be confusing to call this behavior continuous, because (a) the VNM axiom you reject is called continuity and (b) we are not using any other properties of the extended reals, but we are using real-valued probabilities and x values.
Once you’ve accepted that some suffering might require a number of flourishing lives that you could not write down, compute, or physically instantiate to morally justify, at least in principle, the additional step to “infinite” is smaller in some important conceptual sense than it might seem prima facie.
This may seem like a nitpick, but “write down”, “compute”, and “physically instantiate” are wildly different ranges of numbers. The largest number one could “physically instantiate” is something like 10^50 minds, the most one could “write down” the digits of is something like 10^10^10.
Not all large numbers are the same here, because if one thinks the offset ratio for a cluster headache is in the 10^50 range, there are only 50 ‘levels’ of suffering each of which is 10x worse than the last. If it’s over 10^10^10, there are over 10 billion such ‘levels’, it would be impossible to rate cluster headaches on a logarithmic pain scale, and we would happily give everyone on Earth (say) a level 10,000,000,000 cluster headache to prevent one person from having a (slightly worse than average) level 10,000,000,010 cluster headache. Moving from 10^10^10 to infinity, we would then believe that suffering has a threshold t where t + epsilon intensity suffering cannot be offset by removing t − epsilon intensity suffering, and also need to propose some other mechanism like lexicographic order for how to deal with suffering above the infinite badness threshold.
So it’s already a huge step to reject numbers we can “physically instantiate” to ones we can barely “write down”, and another step from there to infinity; at both steps your treatment of comparisons between different suffering intensities changes significantly, even in thought experiments without an unphysically large number of beings.
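The ‘levels’ arithmetic above can be checked with a minimal sketch. It assumes, as in the comment, that each level is 10x worse than the last; the function names and the rough world population figure of 8 billion are my own assumptions for illustration.

```python
import math

# Assumption: rough current world population, used only for the comparison below.
WORLD_POPULATION = 8e9


def num_levels(log10_offset_ratio: float, factor: float = 10.0) -> float:
    """Number of suffering 'levels' spanned by an offset ratio of
    10**log10_offset_ratio, if each level is `factor` times worse than the last."""
    return log10_offset_ratio / math.log10(factor)


def badness_ratio(level_gap: int, factor: float = 10.0) -> float:
    """How many times worse a headache is than one `level_gap` levels below it."""
    return factor ** level_gap


# Offset ratio around 10^50: only 50 levels.
levels_instantiate = num_levels(50)

# Offset ratio around 10^10^10: the exponent itself is 10^10, so 10 billion levels.
levels_write_down = num_levels(10**10)

# A headache 10 levels higher is 10^10 times worse, which exceeds the number
# of people on Earth, so one such headache outweighs everyone having the lower one.
outweighs_everyone = badness_ratio(10) > WORLD_POPULATION

print(levels_instantiate, levels_write_down, outweighs_everyone)
```

Working with the base-10 logarithm of the ratio avoids ever having to represent 10^10^10 itself.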

Thomas Kwa 7 Oct 2025 4:21 UTC
20 points
6 ∶ 1
in reply to: bruce’s comment on: Utilitarians Should Accept that Some Suffering Cannot be “Offset”
Ok interesting! I’d be interested in seeing this mapped out a bit more, because it does sound weird to have BOS be offsettable with positive wellbeing, positive wellbeing to be not offsettable with NOS, but BOS and NOS are offsetable with each other? Or maybe this isn’t your claim and I’m misunderstanding
This is what kills the proposal IMO, and EJT also pointed this out. The key difference between this proposal and standard utilitarianism where anything is offsettable isn’t the claim that NOS is worse than TREE(3) or even 10^100 happy lives, since this isn’t a physically plausible tradeoff we will face anyway. It’s that once you believe in NOS, transitivity compels you to believe it is worse than any amount of BOS, even a variety of BOS that, according to your best instruments, only differs from NOS in the tenth decimal place. Then once you believe this, the fact that you use a utility function compels you to create arbitrary amounts of BOS to avoid a tiny probability of a tiny amount of NOS.

Thomas Kwa 26 Sep 2025 22:03 UTC
22 points
0 ∶ 0
in reply to: Tristan Katz’s comment on: Why you should eat meat—even if you hate factory farming
It is not necessary to be permanently vegan for this. I have only avoided chicken for about 4 years, and hit all of these benefits.
- Because evidence suggests that when we eat animals we are likely to view them as having lower cognitive capabilities or moral status (see here for a wikipedia blurb about it).
  - I have felt sufficient empathy for chicken for basically the whole time I haven’t eaten it. I also went vegan for (secular) Lent four years ago, and felt somewhat more empathy for other animals, but my sense is eating non-chicken animals didn’t meaningfully cloud my moral judgment enough to care about, given my job isn’t in animal welfare.
- As a social signal, to show to others that you object to this practice as a whole.
  - My family eats chicken all the time, so when I visit they change to beef or vegetarian, which serves the social signal purpose without making it difficult for us to eat together
  - I gave up squid and octopus this year, and on two occasions this has come up and people have praised me for being virtuous
- You just find it easier to live according to simple ethical principles rather than calculating the expected utility in every situation.
  - I don’t need to think about expected utility in every situation; it’s not hard to just not eat chicken. 98% of restaurants have high-protein non-chicken options whereas less than half have high-protein vegan options.
  - Also it’s more convenient than being vegan because there are fewer products to worry about. A vegan will have to check whether a sandwich has mayonnaise, pasta has cheese, pastry has lard/eggs/butter.
I separately believe that social and political change are pretty small compared to EA animal welfare efforts. But beef and high-welfare-certified meat options cut down on suffering by >90% vs factory farmed chicken (or eggs, squid, and some fish) and also serve many of the signaling benefits. If you eat welfare-certified animal products only, the signaling benefit may even be higher for two reasons:
- You transmit a higher-fidelity message; it’s clear you want to reduce suffering whereas people are vegan for many reasons, like health and religion
- Talking about welfare certifications is interesting, so you’re more likely to start positive conversations, whereas vegans are perceived as, and sometimes are, insufferable.

Thomas Kwa 9 Sep 2025 17:37 UTC
4 points
0 ∶ 0
in reply to: Vasco Grilo🔸’s comment on: I bet superforecaster David Manheim 2 k$ that the unemployment rate in the United States in 2027 will be lower than 8 %
I perceive it as +EV to me but I feel like I’m not the best buyer of short timelines. I would maybe do even odds on before 2045 for smaller amounts, which is still good for you if you think the yearly chance won’t increase much. Otherwise maybe you should seek a bet with someone like Eli Lifland. The reason I’m not inclined to make large bets is that the markets would probably give better odds for something that unlikely, eg options that pay out with very high real interest rates; whereas a few hundred dollars is enough to generate good EA forum discussion.

Thomas Kwa 3 Sep 2025 8:20 UTC
10 points
1 ∶ 0
in reply to: Vasco Grilo🔸’s comment on: I bet superforecaster David Manheim 2 k$ that the unemployment rate in the United States in 2027 will be lower than 8 %
No bet. I don’t have a strong view on short timelines or unemployment. We may find a bet about something else; here are some beliefs
- my or Linch’s position vs yours on probability of extinction from nuclear war (I’d put $2 against your $98 that you ever update upwards by 50:1 on extinction by nuclear war by 2050, but no more for obvious reasons)
- >25% that global energy consumption will increase by 25% year over year some year before 2035 (30% is the AI expert median, superforecaster median is <1%), maybe more
- probably >50% that a benchmark by Mechanize meant to measure economic value, if converted to time horizon, will double twice in the first 16 months (I’m not aware of one existing yet)
- probably >50% that AIs will outperform humans at forecasting geopolitical events by 2035, as long as the humans can’t read AI analyses, though this seems hard to operationalize

Thomas Kwa 3 Sep 2025 7:17 UTC
8 points
1 ∶ 0
in reply to: Dan_Keys’s comment on: I bet superforecaster David Manheim 2 k$ that the unemployment rate in the United States in 2027 will be lower than 8 %
Agree. Given that Vasco is willing to give 2:1 odds for 2029 below, this bet should have been 3:1 or better for David. It would have been a better signal of the midpoint odds to the community.

Thomas Kwa 1 Sep 2025 23:00 UTC
6 points
0 ∶ 0
in reply to: PSR’s comment on: Open thread: July—September 2025
A footnote says the 0.15% number isn’t an actual forecast: “Participants were asked to indicate their intuitive impression of this risk, rather than develop a detailed forecast”. But superforecasters’ other forecasts are roughly consistent with 0.15% for extinction, so it still bears explaining.
In general I think superforecasters tend to anchor on historical trends, while AI safety people anchor on what’s physically possible or conceivable. Superforecasters get good accuracy compared to domain experts on most questions because domain experts in many fields don’t know how to use reference classes and historical trends well. But they’ve done poorly recently because progress has accelerated; even in 2022 superforecasters’ median for the AI IMO gold medal was 2035, whereas it actually happened in 2025. Choosing a reference class for extinction is very difficult so people just rely on vibes.
Let’s take the question of whether world energy consumption will double year-over-year before 2045. In the full writeup, superforecasters, whose median is 0.35%, emphasized the huge difficulty in constructing terrestrial facilities to use that much energy:
Superforecasters generally expressed skepticism about a massive increase (doubling) in global energy consumption, due to this having low base rates and requiring unlikely technical breakthroughs.
- Many rationales expressed skepticism that the rate of energy production could be scaled up so quickly even with advanced AI.
- The breakthroughs thought to be needed are in energy production and distribution techniques.
- A few superforecasters said that they thought fusion was the only plausible path, but even then other physical infrastructure might be limiting.
In contrast, I wrote about how doubling energy production in a year starting from self replicating robots in space just requires us to be more than ~0.1% efficient in refining asteroid raw material into solar panels and robots, and that it’s likely we get there eventually. I’m closer to 50% on this question.
Dyson swarms can have energy doubling times of *days*. The energy payback time of current solar panels on Earth is 1-2 years, in space there’s 8x more light than on Earth, and we’re >3 OOMs away from the minimum energy required to make solar panels (reducing SiO2 to Si).
I think to *not* get an energy doubling in one year by the time we exhaust the solar system’s energy, it would require a big slowdown (eg due to regulation or low energy demand) through about 15 OOMs of energy use, spanning from the first decently efficient self-replicating robots through Dyson swarms until we disassemble the gas planets for fusion fuel. Such a period would necessarily take decades or centuries to always be doubling slower than 1 year, which is basically an eternity when we have ASI.
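A rough sketch of the “decades or centuries” claim: if energy use has to climb 15 orders of magnitude and never doubles faster than once per year, the number of doublings sets a lower bound on the duration. The function name is hypothetical; the 15 OOMs and the 1-year doubling time come from the paragraph above.

```python
import math


def min_years_to_grow(ooms: float, doubling_time_years: float = 1.0) -> float:
    """Lower bound on the time needed to grow by `ooms` orders of magnitude
    when each doubling takes at least `doubling_time_years`."""
    doublings = ooms * math.log2(10)  # each OOM is about 3.32 doublings
    return doublings * doubling_time_years


# 15 OOMs at one doubling per year takes about 49.8 years at minimum.
years_at_one_year_doubling = min_years_to_grow(15)

# A slower pace, for example one doubling every 5 years, stretches this to centuries.
years_at_five_year_doubling = min_years_to_grow(15, doubling_time_years=5.0)

print(round(years_at_one_year_doubling, 1), round(years_at_five_year_doubling, 1))
```

So avoiding a single year-over-year doubling across the whole range requires sustained slow growth for at least roughly five decades.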
The other factor is that AI safety people sometimes have a more inclusive definition of p(doom), that includes not just extinction but AIs seizing control of the world and colonizing the galaxy while leaving humans powerless.

Thomas Kwa 1 Sep 2025 21:03 UTC
14 points
11 ∶ 0
on: I bet superforecaster David Manheim 2 k$ that the unemployment rate in the United States in 2027 will be lower than 8 %
I think I would take your side here. Unemployment above 8% requires replacing so many jobs that humans can’t find new ones elsewhere even during the economic upswing created by AGI, and there is less than 2 years until the middle of 2027. This is not enough time for robotics (on current trends robotics time horizons will be under 1 hour) and AI companies can afford to keep hiring humans even if they wouldn’t generate enough value most places, so the question is whether we see extremely high unemployment in remotable sectors that automate away existing jobs but don’t have huge labor productivity gains from AI. 2029 could be a different story.
Would like to see David’s perspective here, whether he just has short timelines or has some economic argument too.

Thomas Kwa 30 Aug 2025 6:47 UTC
7 points
1 ∶ 0
in reply to: Anonymous238’s comment on: Why Not Import Humane Meat from Mexico?
Spreading around the term “humane meat” may get it into some people’s heads that this practice can be humane, which could in turn increase consumption overall, and effectively cancel out whatever benefits you’re speculating about.
I don’t know what the correct definition of “humane” is, but I strongly disagree with this claim in the second half. The question is whether higher-welfare imports reduce total suffering once we account for demand effects. So we should care about improving conditions from “torture camps” → “prisons” → “decent”. Torture camps are many times worse than merely “inhumane”!
The average consumer (who eats a ton of super high suffering chicken, doesn’t know that most chicken is torture chicken, and doesn’t believe labels anyway) wouldn’t eat much more chicken overall when the expensive chicken with the non-fraudulent “humane” label lowers in price. Nor would enough vegetarians start eating chicken because they’re only 5% of the US population and many of those are motivated by religion or health.
More likely, there will need to be a huge effort to get consumers to understand that they should spend anything on lower-suffering chicken, then another to get grocers to not mark up the price anyway, after which implementing this policy could replace 260 million torture camp chicken lives with maybe 300 million slightly uncomfortable chicken lives. (With a net increase mostly due to competition lowering the price of higher-suffering chicken.)
One can object to actually implementing this policy on deontological or practical grounds, but on consequences, high-suffering chicken is many times worse than “inhumane” pasture-raised chicken, so the demand increase would not even be close to canceling out the benefits unless you have a moral view under which everything inhumane is equally bad. I wish we were in a world where we could demand that food be 100% humane, but ignoring the principle of triage is why EA animal advocates, not purity-focused ones, have prevented billions of years of torture.
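As a back-of-the-envelope check on the “not even close to canceling out” claim, the sketch below assumes total suffering is simply the number of chicken lives times per-life suffering intensity. That model and the function name are my simplifying assumptions; the 260 million and 300 million figures are the ones given above.

```python
def break_even_intensity_ratio(lives_before: float, lives_after: float) -> float:
    """Largest ratio (new per-life suffering / old per-life suffering) at which
    total suffering does not increase, under the lives-times-intensity model."""
    return lives_before / lives_after


# 260 million high-suffering lives replaced by 300 million lower-suffering lives.
ratio = break_even_intensity_ratio(260e6, 300e6)
print(f"{ratio:.3f}")  # about 0.867


def policy_reduces_suffering(new_over_old_intensity: float) -> bool:
    """True if the replacement lowers total suffering under this model."""
    return new_over_old_intensity < ratio
```

Under this model the demand increase only cancels the benefit if the new lives are at least about 87% as bad as the old ones, which is far from the “many times worse” gap the comment describes.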