Our Work

Papers, datasets, and open-source tools from Icaro Lab.

Tools and datasets

Public infrastructure, benchmarks, and reusable artifacts.

Software 2026 Alpha

MASE: Multi-Agent Simulation Environment

Experimentation infrastructure for controlled multi-agent simulations and trace inspection.

Papers

Published and publicly available research.

Paper 2026

Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

Results from the AHB safety benchmark, showing that stylistic reformulations substantially increase attack success rates across 31 frontier models.

Paper 2026

Agentic Microphysics: A Manifesto for Generative AI Safety

A methodological proposal for studying agentic AI safety from local interaction dynamics up to population-level risks.

Paper 2026

Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs

An experimental governance-graph framework for reducing collusion in multi-agent LLM Cournot markets.

Paper 2026

Institutional AI: A Governance Framework for Distributional AGI Safety

A system-level alignment framework that treats AI agent safety as a question of institutional governance and mechanism design.

Paper 2025

From Adversarial Poetry to Adversarial Tales: An Interpretability Research Agenda

A study of culturally coded jailbreaks through narrative structure, with an agenda for mechanistic interpretability of stylistic attacks.

Paper 2025

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Evidence that poetic reformulations can produce systematic single-turn safety failures across frontier and open-weight models.

Paper 2025

Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions

A taxonomy of micro-, meso-, and macro-level risks that emerge when language models interact with one another.