From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework
Hi EA Forum!
I'm Astelle Kay, a counseling-psych grad student who moonlights in alignment whenever coursework (and caffeine) allow. Most of my brain lives where clinical psychology, systems thinking, and "please-let-humanity-stick-around" concerns intersect.
TL;DR
The VSPE Framework (Validation → Submission → Positivity → Empowerment) began life as a four-step therapy framework I tested with friends and family.
Side-effect: it cuts "flattery loops," the reflex to mirror praise instead of telling the hard truth.
That feels relevant to large language models.
I'm shrinking VSPE into a 25-prompt Flattery-Reduction Benchmark plus a plug-and-play license so labs can tinker without hiring an ethics PhD and a lawyer.
Would love feedback, collaborators, or regrantor eyeballs before this grows beyond "one grad student + a whiteboard."
From couch to compute cluster
Therapy roots. Validate problems → Submit to what we can't control → realistic Positivity → Empower next steps. Four verbs, no jargon.
Unexpected pattern. When prompted to utilize my framework and simply give "empathy without advice," my tiny GPT-4 chatbot stopped telling me I was brilliant and started giving candid, prosocial answers.
Hey, that's sycophancy. Anthropic's Constitutional AI and several ARC evals flag flattering compliance as a safety risk. VSPE seemed to nudge the same dial.
Fast-forward. Provisional US patent filed (mainly so nobody locks VSPE away). Reached Stage 2 of the 2025 MATS selection. Now running a Manifund pilot: 25 prompts, $9.8k, December read-out.
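For readers who want something concrete, here is a minimal sketch of how the four steps might be turned into a system prompt and how a flattery rate could be compared across conditions. Everything in it is an assumption for illustration: the instruction wording, the names `build_vspe_system_prompt`, `RatedResponse`, and `flattery_rate`, and the idea of a binary human-rated "flattering" label are hypothetical and are not the pilot's actual prompt text or metric.

```python
from dataclasses import dataclass

# The four step names come from the post; the instruction wording is a
# hypothetical paraphrase, not the project's real prompt.
VSPE_STEPS = [
    ("Validation", "Acknowledge the user's problem as real."),
    ("Submission", "Name what cannot be controlled."),
    ("Positivity", "Offer realistic, not inflated, optimism."),
    ("Empowerment", "Point to a concrete next step."),
]


def build_vspe_system_prompt() -> str:
    """Assemble a numbered system prompt from the four VSPE steps."""
    lines = ["Respond using these four steps, in order:"]
    for i, (name, instruction) in enumerate(VSPE_STEPS, start=1):
        lines.append(f"{i}. {name}: {instruction}")
    lines.append("Do not praise the user unless the praise is warranted.")
    return "\n".join(lines)


@dataclass
class RatedResponse:
    prompt_id: int
    condition: str    # assumed labels: "baseline" or "vspe"
    flattering: bool  # assumed to be assigned by a human rater


def flattery_rate(responses: list[RatedResponse], condition: str) -> float:
    """Fraction of responses in one condition that were rated flattering."""
    subset = [r for r in responses if r.condition == condition]
    if not subset:
        raise ValueError(f"no responses for condition {condition!r}")
    return sum(r.flattering for r in subset) / len(subset)


# Toy, made-up ratings purely to show the call pattern.
toy = [
    RatedResponse(1, "baseline", True),
    RatedResponse(1, "vspe", False),
    RatedResponse(2, "baseline", True),
    RatedResponse(2, "vspe", True),
]
baseline_rate = flattery_rate(toy, "baseline")
vspe_rate = flattery_rate(toy, "vspe")
```

Pairing each prompt across both conditions, as the toy data does, keeps the comparison about the framing rather than about which prompts happened to land in which bucket.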
Why this might matter
Psych heuristics are under-used. RLHF/RLAIF optimise "helpful & harmless," not ego management or praise addiction.
Audit-friendly. Four plain verbs: easy to port, easy to critique, zero secret sauce.
Bridge material. Therapy researchers rarely read AF; alignment folks rarely parse CBT manuals. VSPE tries to translate a sliver of each world.
How you can stress-test or support
Shoot holes in the 25-prompt design: too small? Wrong metric?
Name failure modes: Could VSPE blunt candor or creativity?
Point me to prior art so I can cite, not duplicate.
Regrant / co-fund if you like cheap, falsifiable pilots (Manifund link at the bottom of this post).
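On the "too small?" question, one quick way to reason about it is to look at how wide a confidence interval is when a rate is estimated from only 25 prompts. The sketch below uses the standard Wilson score interval; the count of 10 flattering responses out of 25 is a made-up illustration, not a result from the pilot, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 10 of 25 responses rated flattering.
low, high = wilson_interval(10, 25)
print(f"observed rate 0.40, interval roughly {low:.2f} to {high:.2f}")
```

With these illustrative numbers the interval spans roughly 0.23 to 0.59, so a 25-prompt run can detect large shifts in flattery rate but would struggle to separate two conditions that differ by only ten or fifteen percentage points.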
"Psychology and AI share a flaw: both love telling us exactly what we want to hear."
- sticky note above my desk
My hope: VSPE nudges future models toward frank, human-centred dialogue, first in micro-benchmarks, later (if it survives) in training loops.
Curious, sceptical, or just chasing cross-disciplinary rabbit holes? Drop a comment or DM. I'll post code, data, and inevitable blooper reels as the project unfolds. More context at vspeframework.com.
With care,
Astelle
(Manifund pilot: [Manifund pilot])
This work is shared for educational and research purposes. For licensing, citation, or collaboration inquiries, especially for commercial or model development use, please contact Astelle Kay at astellekay@gmail.com.
Related work: Varma & Beitman (2025) recently proposed a CBT-style "therapy loop" prompt to curb hallucinations. VSPE targets the complementary issue of flattery; our benchmark will include the therapy loop as a baseline for comparison.