evhub

Karma: 1,761

Evan Hubinger (he/​him/​his) (evanjhub@gmail.com)

Head of Alignment Stress-Testing at Anthropic. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

Introducing Alignment Stress-Testing at Anthropic

evhub · Jan 12, 2024, 11:51 PM
80 points
0 comments · EA link

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

evhub · Jan 12, 2024, 7:51 PM
65 points
0 comments · EA link
(arxiv.org)

RSPs are pauses done right

evhub · Oct 14, 2023, 4:06 AM
93 points
7 comments · EA link

The Hubinger lectures on AGI safety: an introductory lecture series

evhub · Jun 22, 2023, 12:59 AM
44 points
0 comments · EA link

Discovering Language Model Behaviors with Model-Written Evaluations

evhub · Dec 20, 2022, 8:09 PM
25 points
0 comments · EA link

We must be very clear: fraud in the service of effective altruism is unacceptable

evhub · Nov 10, 2022, 11:31 PM
713 points
86 comments · 3 min read · EA link

Long-Term Future Fund: December 2021 grant recommendations

abergal · Aug 18, 2022, 8:50 PM
68 points
19 comments · 15 min read · EA link

Long-Term Future Fund: July 2021 grant recommendations

abergal · Jan 18, 2022, 8:49 AM
75 points
7 comments · 17 min read · EA link

You can talk to EA Funds before applying

evhub · Sep 28, 2021, 8:39 PM
104 points
8 comments · 1 min read · EA link

FLI AI Alignment podcast: Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI

evhub · Jul 1, 2020, 8:59 PM
13 points
2 comments · 1 min read · EA link
(futureoflife.org)