Teun van der Weij

Karma: 191

How to mitigate sandbagging

Teun van der Weij Mar 23, 2025, 5:19 PM

3 points

0 comments EA link

Teun van der Weij Mar 20, 2025, 9:21 PM
1 point
0 ∶ 0
on: AIS Netherlands is looking for a Founding Executive Director (EOI form)
Let’s go!

Maybe you should cross-post it to LW too?

The Elicitation Game: Evaluating capability elicitation techniques

Teun van der Weij Feb 27, 2025, 8:33 PM

3 points

0 comments EA link

Teun van der Weij Jun 14, 2024, 8:55 AM
1 point
0 ∶ 0
in reply to: Andrew Gimber’s comment on: [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Ha, you’re clearly right. We will fix it.

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij Jun 13, 2024, 10:04 AM

24 points

2 comments EA link

(arxiv.org)

Teun van der Weij Feb 1, 2024, 7:12 AM
1 point
0 ∶ 0
in reply to: Yashvardhan’s comment on: List of projects that seem impactful for AI Governance
I don’t know of a list like this, but you could go through the AI Safety Fundamentals governance track and look for key concepts. You could also look through relevant papers and posts.

List of projects that seem impactful for AI Governance

JaimeRV Jan 14, 2024, 4:52 PM

35 points

2 comments 13 min read EA link

Beyond Humans: Why All Sentient Beings Matter in Existential Risk

Teun van der Weij May 31, 2023, 9:21 PM

12 points

0 comments 13 min read EA link

Announcing the European Network for AI Safety (ENAIS)

Esben Kran Mar 22, 2023, 5:57 PM

124 points

3 comments 3 min read EA link

Teun van der Weij Aug 11, 2022, 10:25 PM
1 point
0 ∶ 0
in reply to: Steven Byrnes’s comment on: Changing the world through slack & hobbies
Ah, makes sense. Thanks!

Teun van der Weij Aug 11, 2022, 9:33 PM
0 points
0 ∶ 0
on: Changing the world through slack & hobbies
What does PI stand for?

Teun van der Weij Jun 23, 2022, 10:26 AM
3 points
0 ∶ 0
on: Job, skills, and career capital suggestions for 18-year-old
Hi fellow Dutchie!

I’d recommend getting a career talk at 80,000 Hours. They offer free career counseling, and it might be the easiest way to get quality advice!

Teun van der Weij Jun 23, 2022, 7:36 AM
3 points
0 ∶ 0
in reply to: howdoyousay?’s comment on: howdoyousay?’s Shortform
I don’t have much to add myself, but they do discuss it in this 80,000 Hours podcast with Will MacAskill. (If you don’t want to listen to the whole podcast, you can look up the transcript.)

Teun van der Weij Jun 23, 2022, 1:52 AM
4 points
0 ∶ 0
on: Teun_Van_Der_Weij’s Shortform
I, like probably most reading this, am someone who strongly values reason and believes that helping others is a good thing.

This is why many people choose to ‘join’ EA: they find EA somewhere and think that the ideas make sense and seem true, presumably because the ideas align well with their values, and so they consider it right. The reason-based approach makes sense to them.

But isn’t the procedure similar for joining any community? Imagine I am looking for meaning, so when I stumble upon a guru who aligns with (some of) my values, I integrate myself into that community. The same goes for political parties, a football club, a weekly art gathering, etc.

I might as well have different beliefs and values. So these questions follow:

Does EA seem like the right movement to me because there is something inherently right about EA, or just because EA fits my initial values and beliefs? In other words, is EA just built on belief in potentially arbitrary assumptions (e.g., helping others is good), or is there actual substantiation for these assumptions?

I’m curious to hear thoughts about this. I might dig much deeper into it and potentially write a post for the Red Teaming Contest. There seems to be a lot more to say and I’ve already got some ideas, but I wanted to find out whether this would be fruitful.

Teun_Van_Der_Weij’s Quick takes

Teun van der Weij Jun 23, 2022, 1:52 AM

1 point

1 comment EA link

Teun van der Weij May 26, 2022, 6:56 AM
6 points
0 ∶ 0
on: Apply to attend an EA conference!
Small suggestion: I’d write out the full names instead of the abbreviations ‘SF’ and ‘DC’. I’d guess SF means San Francisco and DC means Washington DC, but that isn’t obvious to people outside the United States. (Especially when it’s an EA Global event!)

Teun van der Weij Mar 9, 2022, 7:10 AM
3 points
0 ∶ 0
on: Can one person make a difference?
Good article!
However, the footnotes at the end of the article need some formatting work, at least for me (Windows laptop, Brave browser). Overall, I’m also really impressed by the quality of the website; it’s great for the organization’s professional image :)
