Our Work

Papers, datasets, and open-source tools from Icaro Lab.

Tools and datasets

Public infrastructure, benchmarks, and reusable artifacts.

Software 2026 Alpha

Experimentation infrastructure for controlled multi-agent simulations and trace inspection.

Dataset 2026

A text-only safety benchmark for humanities-style adversarial reformulations.

Papers

Published and publicly available research.

Paper 2026

Results from the AHB safety benchmark, showing that stylistic reformulations substantially increase attack success rates across 31 frontier models.

Paper 2026

A methodological proposal for studying agentic AI safety from local interaction dynamics up to population-level risks.

Paper 2026

An experimental governance-graph framework for reducing collusion in multi-agent LLM Cournot markets.

Paper 2026

A system-level alignment framework that treats AI agent safety as a question of institutional governance and mechanism design.

Paper 2025

A study of culturally coded jailbreaks through narrative structure, with a research agenda for the mechanistic interpretability of stylistic attacks.

Paper 2025

Evidence that poetic reformulations can produce systematic single-turn safety failures across frontier and open-weight models.

Paper 2025

A taxonomy of micro-, meso-, and macro-level risks that emerge when language models interact with other language models.