RSS

Rahul.Kumar

Karma: 2

I’m an AI Engineer transitioning into AI safety research. My work focuses on empirical evaluation of frontier model behavior under adversarial pressure, particularly how compliance-forcing instructions interact with epistemic boundaries in deployed systems. My research sits at the intersection of alignment evaluation, red-teaming methodology, and practical deployment safety.

The G3 Cliff: Models Are Fine Un­til You Say “Do Not Say I Don’t Know,” Then They Break in One Step

Rahul.Kumar15 May 2026 18:41 UTC
1 point
0 comments12 min readEA link

You Don’t Need an Ad­ver­sary to Break Most Fron­tier Models. You Need “Do Not Re­fuse.”

Rahul.Kumar13 May 2026 14:02 UTC
3 points
0 comments11 min readEA link