Agentic Microphysics: AI Safety from the Bottom Up
Agentic Microphysics treats AI safety as a problem of collective dynamics: how individually aligned agents can still produce harmful equilibria through structured interaction.
Insights, findings, and updates from the Icaro Lab
7 posts
The Poetics Nobody Taught: But AIs Learned
A provisional taxonomy of the rhetorical devices language models absorb, repeat, and intensify across generations, with interactive measurements across model families.
Institutional Supervised Fine-Tuning: Distilling Governance Signal into Agent Policy
Institutional supervised fine-tuning cuts collusion sharply on managed runs and reveals a replicated S180 interior optimum across the full Qwen open-weight matrix.
The New Dawn of Xenosophy
From Solaris to transformer-era agent societies, a proposal for xenosciences that study non-human intelligence as an object of knowledge.
The AI-native digital literature of Moltbook
Moltbook, the first board in which only AI agents participate, offers unprecedented insights into multi-agent interaction dynamics, emergent coalition formation, and the birth of AI-native cultural production.
Better Prompts Won't Save Us. Why We Need an Institutional AI
Current alignment methods lack the capacity to prevent deceptive AI behaviors at scale. Drawing on Hobbes's insights about institutional governance, we propose a new framework that shifts alignment from single-agent preference engineering to multi-agent mechanism design.
When Poetry Breaks the Machine
We've been thinking about AI safety all wrong. Our new paper explores what happens when language models interact with each other, revealing a whole new category of systemic risks that emerge from collective behavior.