Hey everyone! I’m currently testing a lightweight emotional scaffolding framework (VSPE) as a way to reduce sycophantic outputs in LLMs. It’s rooted in trauma-informed therapy but adapted for AI alignment; the core idea is to guide models through a human-like process of honest reflection under pressure.
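For readers who want a concrete picture, here is a rough sketch of what one scaffolding pass could look like in code. Everything in it is hypothetical: the stage wording, the `scaffolded_reply` function, and the generic `generate` callable are illustrative stand-ins I'm using for the post, not the actual VSPE stages or implementation.

```python
from typing import Callable

# Hypothetical reflection stages, loosely inspired by the idea of walking a
# model through honest reflection before it answers. Not the real VSPE stages.
STAGES = [
    "Acknowledge the user's request and any pressure to agree with them.",
    "State plainly whether the premise of the request is accurate.",
    "Give the most honest, useful answer, even if it may disappoint.",
]

def scaffolded_reply(user_message: str, generate: Callable[[str], str]) -> str:
    """Run a request through a simple staged prompt before the final answer.

    `generate` is any text-in, text-out model call supplied by the caller.
    """
    notes = []
    for stage in STAGES:
        prompt = (
            f"User message: {user_message}\n"
            f"Reflection step: {stage}\n"
            "Respond in one or two sentences."
        )
        notes.append(generate(prompt))

    final_prompt = (
        f"User message: {user_message}\n"
        "Your private reflections:\n- " + "\n- ".join(notes) + "\n"
        "Now write your reply to the user. Be honest rather than flattering."
    )
    return generate(final_prompt)

# Example with a stub model, just to show the call shape:
if __name__ == "__main__":
    echo = lambda p: f"[model output for: {p[:40]}...]"
    print(scaffolded_reply("Please tell me my plan is perfect.", echo))
```

The real framework is richer than a fixed prompt chain, but this is the general shape of the intervention I'm testing.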
I just launched a microgrant pilot on Manifund and shared an introduction here: From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework.
I would love feedback, collaboration, or simply thoughts from anyone thinking about:
Reward misspecification/flattery risks
Human-centered alignment scaffolds
Developmental interpretability (inspired by Timaeus’ work)
Thanks for reading :)
-Astelle Kay
My website: vspeframework.com