It's a great question. I see Safety Cases more as a meta-framework in which you can use different kinds of evidence. Other risk management techniques can be used as evidence within a Safety Case (e.g. this paper uses a Delphi method).
Also I think Safety Cases are attractive to people in AI Safety because:
1) They offer flexibility in the kinds of evidence and reasoning that are allowed. From skimming, it seems to me that many of the other risk management practices you linked are stricter about the kinds of arguments or evidence that can be brought.
2) They strive to comprehensively prove that overall risk is low. I think most of the other techniques don't let you make claims such as "overall risk from a system is <x%" (which AI Safety people want).
3) (I might be wrong here), but it seems to me that many other risk management techniques require you to understand the system and its environment decently well, whereas this is very difficult for AI Safety.
Overall, you might well be right that other risk management techniques have been overlooked and we shouldn't just focus on Safety Cases.
Thanks for the input!
On Scheming: I actually don't think scheming risk is the most important factor. Even removing it completely doesn't change my final conclusion. I agree that a bimodal distribution with scheming/non-scheming would be appropriate for a more sophisticated model. I just ended up lowering the weight I assign to the scheming factor (by half) to take into account that I am not sure whether scheming will/won't be an issue.
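To make the halving concrete, here is a rough sketch of how a bimodal scheming/non-scheming mixture plays out (all numbers are purely illustrative, not from my actual analysis):

```python
# Illustrative only: a two-mode mixture over whether scheming is an issue.
# The specific numbers below are made up for the example.
p_scheming = 0.5          # credence that scheming turns out to be an issue
risk_if_scheming = 0.30   # risk estimate in the "scheming" mode
risk_if_not = 0.05        # risk estimate in the "non-scheming" mode

# Expected risk under the mixture: halving the credence on scheming
# roughly halves the contribution of the scheming mode.
expected_risk = p_scheming * risk_if_scheming + (1 - p_scheming) * risk_if_not
print(expected_risk)  # 0.175
```

The point is just that discounting the scheming mode's weight, rather than dropping it, keeps some of its contribution in the overall estimate while reflecting the uncertainty.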
In my analysis, the ability to get good feedback signals/success criteria is the factor that moves me the most toward thinking that capabilities get sped up before safety.
On Task length: You have more visibility into this, so I'm happy to defer. But I'd love to hear more about why you think capabilities research involves longer task lengths. Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?