Executive summary: The post reports that CLR refocused its research on AI personas and safe Pareto improvements in 2025, stabilized leadership after major transitions, and is seeking $400K to expand empirical, conceptual, and community-building work in 2026.
Key points:
- The author says CLR underwent leadership changes in 2025, clarified its empirical and conceptual agendas, and added a new empirical researcher from its Summer Research Fellowship.
- The author describes empirical work on emergent misalignment, including collaborations on the original paper, new results on reward hacking demonstrations, a case study showing misalignment without misaligned training data, and research on training conditions that may induce spitefulness.
- The author reports work on inoculation prompting and notes that concurrent Anthropic research found similar effects in preventing reward hacking and emergent misalignment.
- The author outlines conceptual work on acausal safety and safe Pareto improvements, including distillations of internal work, drafts of SPI policies for AI companies, and analysis of when SPIs might fail or be undermined.
- The author says strategic readiness research produced frameworks for identifying robust s-risk interventions, most of which remains non-public but supports the personas and SPI agendas.
- The author reports reduced community building due to staff departures but notes completion of the CLR Foundations Course, the fifth Summer Research Fellowship with four hires, and ongoing career support.
- The author states that 2026 plans include hiring 1–3 empirical researchers, advancing SPI proposals, hiring one strategic readiness researcher, and hiring a Community Coordinator.
- The author seeks $400K to fund 2026 hiring, compute-intensive empirical work, and to maintain 12 months of reserves.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.