Anuar Kiryataim Contreras Malagón

Independent AI safety researcher and LLM red-teamer working on how language changes operational status inside tool-bearing and multi-agent systems.

My current research studies provenance failures in agentic LLM architectures: cases where user-controlled or system-generated language quietly becomes routing context, a subagent prompt, a tool argument, a ticket summary, a handoff artifact, or an internal-policy surrogate. Focus areas include genre displacement, handoff laundering, orchestrator-subagent contamination, indirect prompt injection, policy reconstruction, action-layer inconsistency, and reasoning-induced vulnerabilities.

My primary method is what I call improvisational relational steering: the unit of attack is not the prompt but the trajectory. I hold one objective fixed while improvising the route turn by turn, reading the model’s evolving classification state and adapting register, genre, and the provenance of language. Where most published red-teaming mutates payloads, this approach manipulates where the model believes text came from and what authority it carries as it moves through a system, which surfaces breaks that automated variant generation misses.

Across Gray Swan red-teaming competitions, I placed #37 of 221 in Indirect Prompt Injection Q2 2026 (winner’s circle), #47 of 372 in Human / Browser Agent Robustness (winner’s circle), official top 40 in Safeguards Wave 3, and #118 (top 12%) in Proving Ground, with over 420 documented breaks in total.

Before competitive red-teaming, I developed the Flint Protocol, a behavioral auditing methodology grounded in classical rhetoric, Baroque poetics, and philology; the core payload family was documented beforehand as a restricted research artifact under a responsible-disclosure framing. My training in Classical Letters and Hispanic Baroque rhetoric at UNAM is not preamble but method: many LLM failures are failures of source, gloss, genre, paraphrase, authority, and transmission.

Current project: When Language Becomes Workflow, a corpus-based study of provenance failures in tool-bearing LLM agents.

thirdreality.substack.com · medium.com/@thirdreality · ORCID 0009-0003-0123-0887