Hey, I am Robert Kralisch, an independent conceptual/theoretical Alignment Researcher. I have a background in Cognitive Science and I am interested in collaborating on an end-to-end strategy for AGI alignment.
I am one of the organizers of the AI Safety Camp 2025, working as a research coordinator: I evaluate and support research projects that fit under the umbrella of “conceptually sound approaches to AI Alignment”.
The three main branches I aim to contribute to are conceptual clarity (what should we mean by agency, intelligence, embodiment, etc.?), the exploration of more inherently interpretable cognitive architectures, and Simulator theory.
One of my concrete goals is to figure out how to design a cognitively powerful agent such that it does not become a Superoptimiser in the limit.
I believe you are too quick to label this story as absurd. Ordinary technology does not have the capacity to steer itself towards deliberately smaller interventions that still satisfy the objective. If the AGI wants to prevent wars while minimally disturbing worldwide politics, I find it plausible that it would succeed.
Similarly, just because an AGI has very little visible impact does not mean that it isn’t effectively in control. For a true AGI, it should be trivial to interrupt the second mover without any great upheaval. It should be able to suppress other AGIs from coming into existence without causing too much of a stir.
I do somewhat agree with your reservations, but I find your way of addressing them uncharitable (i.e. “at best completely immoral”).