evhub

Karma: 1,730

Evan Hubinger (he/​him/​his) (evanjhub@gmail.com)

I am a research scientist at Anthropic where I lead the Alignment Stress-Testing team. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

We must be very clear: fraud in the service of effective altruism is unacceptable

evhub · 10 Nov 2022 23:31 UTC
709 points
85 comments · 3 min read · EA link

You can talk to EA Funds before applying

evhub · 28 Sep 2021 20:39 UTC
104 points
7 comments · 1 min read · EA link

RSPs are pauses done right

evhub · 14 Oct 2023 4:06 UTC
97 points
7 comments · 1 min read · EA link

Introducing Alignment Stress-Testing at Anthropic

evhub · 12 Jan 2024 23:51 UTC
80 points
0 comments · 1 min read · EA link

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

evhub · 12 Jan 2024 19:51 UTC
65 points
0 comments · 1 min read · EA link
(arxiv.org)

The Hubinger lectures on AGI safety: an introductory lecture series

evhub · 22 Jun 2023 0:59 UTC
44 points
0 comments · 1 min read · EA link

Discovering Language Model Behaviors with Model-Written Evaluations

evhub · 20 Dec 2022 20:09 UTC
25 points
0 comments · 1 min read · EA link

FLI AI Alignment podcast: Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI

evhub · 1 Jul 2020 20:59 UTC
13 points
2 comments · 1 min read · EA link
(futureoflife.org)