Just wanted to flag that the group is heavily selected for belief alignment with something like “EA/Constellation/Trajan House” views, and “AI-enabled human takeovers” was promoted as an agenda to prioritize in multiple widely read memos by high-status people in the community (which the organisers prioritized in the reading list).
I dislike the “echo chamber” effect, where the steps are:
- invite people partially based on alignment with the idea cluster
- tell them to read memos advocating the idea, written by some of the most central people in the cluster
- poll the attendees
- frame the results as “leaders and key thinkers in the x-risk and AI safety communities agree”
This is useful in some sense, but in my view the cluster of people invited represents maybe ~30% of the thinking about x-risk and AI safety, and the result is mostly an amplification of existing voices.
> The slight lean against misaligned AI takeover resources is perhaps the most surprising result for this audience, and merits closer examination.
This is unsurprising given the marginal and somewhat confusing nature of the question. My wild guess is:
- some attendees voted for everything; it is unclear what that means on the margin, probably “grow everything, and prioritize the more neglected topics”
- some attendees understood the marginal question as “assuming a fixed pie, how should the allocation change?”; under this reading you have to assign something a negative weight for consistency, which by itself can produce the slight negative lean