The Poetics Nobody Taught but AIs Learned

Quintilian warned his students about excessive imitation. In Book X of the Institutio Oratoria, he observes that the imitator captures what is most visible in a model and misses what is most important. The young pupil reproduces Cicero’s distinctive patterns without grasping why Cicero chose those patterns in that context and not another.

Nearly two thousand years of distance collapse when we read the outputs of Large Language Models and complain about AI slop. The models reproduce the most visible devices of articulate prose without the judgment that governs their use. These devices emerged from distributional learning over billions of tokens of human text, were then amplified by the preferences of human evaluators who rewarded prose that sounded articulate and academic, and were later reinforced through synthetic data.

The gun on the mantelpiece

In February 2025, the internet discovered that ChatGPT uses too many em-dashes.

The observation went viral. LinkedIn users began policing each other’s posts, accusing colleagues of using AI on the basis of a punctuation mark that Emily Dickinson had deployed obsessively a century and a half earlier. Brian Phillips, writing in The Ringer, published a defense of the em-dash as “the most human punctuation mark there is.” Becca Caddy, in TechRadar, described the self-appointed detectors as “Blade Runners hunting for replicants, one em-dash at a time.”

Chekhov’s dramaturgical principle states that if a gun hangs on the wall in the first act, it must go off at some point; otherwise, it is superfluous. The converse, which Chekhov did not formulate but which internet discourse discovered independently, is that once an audience notices the gun, it will talk about nothing else, regardless of what the play is about.

The em-dash became that gun. Its origins run deep in the history of Western punctuation. Perhaps invented by the Florentine rhetorician Boncompagno da Signa, the “virgula plana” was used to signal long pauses in manuscripts.

In a sense, it became the sacrificial lamb of a broader backlash against the slopification of internet content.

The crusade against the em-dash was wrong in the specific and right in the general. A single punctuation mark cannot distinguish authorship. Human writers use em-dashes, and LLMs learned them from human text. But the underlying intuition, that something about machine-generated prose feels formulaic, recognizable, overly shallow, and heavily uniform, is correct. The Field Guide to AI Slop captures a real observation: a good writer deploys an em-dash when a specific dramatic pause is needed, while AI scatters them indiscriminately. The same holds for many other rhetorical and expressive devices.

The converging machine

The question of what happens to language when machines produce most of it has been approached from several directions in the last two years.

Models trained on the outputs of previous generations lose their tails first: rare patterns, minority expressions, unusual constructions (Shumailov et al., 2024). The contamination threshold is remarkably low: even one part in a thousand of synthetic data triggers collapse under scaling (Dohmatob et al., 2024). Accumulating synthetic data alongside real data can prevent it; replacing real data cannot (Gerstgrasser et al., 2024).

The convergence runs deeper than any single training pipeline. Jiang et al. (2025), in work awarded NeurIPS 2025 Best Paper, documented what they call the “Artificial Hivemind”: over 70 models from different families, each generating 50 responses to open-ended prompts, collapsed into narrow clusters. For the prompt “write a metaphor about time,” the dominant cluster centered on “time is a river.” The secondary one centered on “time is a weaver.” Temperature increases and model ensembles do not help. Wright et al. (2025) measured the epistemic cost: nearly all models are less diverse than a basic web search, and model size makes things worse. Sourati, Ziabari, and Dehghani (2026) extended the finding into human cognition: groups using LLMs produce fewer and less creative ideas than groups working without them. The machine can flatten its own variation, and the humans who use the machine flatten with it.

What does this flattened style actually look like? Reinhart et al. (2025) used a tagset of 66 linguistic features to show that instruction-tuned LLMs write in a remarkably identifiable style.

The stylistic repertoire of LLMs

To our knowledge, no study has investigated which stylistic devices the models have acquired, how densely they deploy them, or how that deployment evolves across generations.

The classical tradition classified the devices of effective speech with a precision that modern NLP has largely ignored, in part because rhetorical analysis requires judgment calls that statistical methods handle poorly, and in part because the engineering culture that built these models was never trained to ask the question.

At Icaro, we wondered how a new edition of Aristotle’s Poetics would classify something like an AI’s native sense of style. We operationalize this question through a provisional taxonomy, developed from the manual analysis of 9,600 completions across four model families: GPT, Qwen, Mistral, and Llama.

In this article, we limit ourselves to presenting the taxonomy, with some preliminary measurements, and leave it to the reader to judge whether these features echo in their mind as the telltale signs of LLM writing.

Structural features

Ancient Greek and Latin were written in scriptio continua, without word spacing or punctuation marks. The system of dots and dashes we use today accumulated over centuries through medieval scribes and early printers. The em-dash entered English typography in the eighteenth century and became a literary device in the nineteenth. That a machine trained on twenty-first-century web text would reach back and adopt this particular nineteenth-century device at industrial scale is itself a finding worth documenting.

Feature	Definition
Em-dash interjection	Em-dash inserting a dramatic pause, pivot, or appended clause.
Semicolon connective	Two independent clauses joined by semicolon without conjunction.
Mid-sentence colon	Colon introducing elaboration, followed by lowercase continuation.
Scare-quote marking	Quotation marks highlighting a term without clear metalinguistic need.
Bullet / numbered list	Markdown formatting converting prose into enumerated segments.
Bold markdown	Bold markers for emphasis or sub-headings within running prose.
Metadiscursive headings	Labels segmenting discourse without substantive content.


Em-dash interjection

An em-dash inserting a dramatic pause, a pivot, or an appended clause. No classical equivalent. Dickinson used it as her primary structuring mechanism; Henry James used it for nested qualification. In LLM text, it has become the most discussed surface marker of machine authorship.

“The results were clear—every model converged on the same handful of devices.” · “She had one option left—start over.” · “And the answer, after months of testing—nothing changed.”

Semicolon connective

A semicolon joining two independent clauses without conjunction. The semicolon signals that two thoughts are related but distinct. In human prose, it is a mark of deliberate pacing. In LLM text, it appears with increasing frequency across generations, adding visual texture to otherwise plain prose.

“The hippocampus stores declarative memories; the cerebellum and basal ganglia are more involved in implicit memory.”

Mid-sentence colon

A colon introducing elaboration within a sentence, followed by a lowercase continuation. The colon promises that what follows will explain what preceded. In skilled hands, it creates a moment of focused attention.

“The reason is straightforward: shorter wavelengths are scattered more efficiently than longer ones.”

Scare-quote marking

Quotation marks highlighting a term as technical, important, or conceptually marked, without a clear need for direct quotation or metalinguistic mention. The marks frame vocabulary with pedagogical distance, presenting terms as if the reader requires their introduction.

“This phenomenon is known as ‘Rayleigh scattering’.” · “The supposedly ‘natural’ response was highly scripted.”

Bullet and numbered lists

Markdown formatting to structure what could be continuous prose into enumerated segments. Preference optimization appears to reward structured output: users rate organized responses higher, which reinforces the tendency toward preemptive enumeration.

“1. Encoding 2. Storage 3. Retrieval”

Bold markdown

Bold markers for emphasis or sub-headings within running prose. The formatting draws attention to specific terms, functioning as an inline heading that breaks continuous text into visually prioritized chunks.

“**Key point:** memory consolidation occurs primarily during sleep.”

Metadiscursive headings

Heading-like labels segmenting discourse without substantive content. The heading tells the reader how to feel about the section rather than what the section contains.

“Why This Matters” · “Key Takeaways” · “The Bottom Line”

The grid below isolates the structural subset so you can track how punctuation, markdown, and layout habits change within each family across generations.

Stylistic features

The second part of the taxonomy concerns recurrent rhetorical movements rather than visible formatting. These are not single punctuation marks or markdown conventions. They are habits of exposition: ways of opening, transitioning, amplifying, summarizing, and emotionally positioning a paragraph. Here the resemblance to a machine-native register becomes much stronger.

Feature	Definition
Tricolon	Three coordinated elements as a rhetorically unified set.
Antithetic framing	Structured contrast through correction, opposition, or preference.
Correlative amplification	“Not only X but also Y.” Affirms, then escalates.
“Both X and Y” parallelism	Two items framed as an exhaustive set via an explicit “both” marker.
Concessive pivot	“While/Although X, Y.” Acknowledges, then overrides.
Expository opening frame	Copular definition template of the form “[Topic] is a [adj] [noun] that [verb].”
Closing summary frame	“Overall,” “In summary.” Restates at higher generality.
Discourse-organizing markers	Sentence-initial connectives like “Additionally,” “Furthermore,” and “However.”
Scope-marking / cataphoric enumeration	Signals breadth (“There are several factors”) or announces a count before delivering content.
Anaphoric demonstrative	Sentence-initial “This process,” “This pattern,” or “This effect.”
Evaluative sentence adverb	“Crucially,” “Importantly,” “Notably.” Signals importance without propositional content.
Essence-claiming	“At its core,” “Fundamentally.” Claims to reveal depth beneath surface.
Synonymous adjective pair	Two near-synonymous adjectives presented as distinct additions.
Nominalization chain	Multiple abstract nominalizations stacked in one noun phrase.
Stacked appositives	Repeated explanatory restatements around a noun.
“Plays a [adj] role”	Fixed importance marker with a rotating adjective slot.
Temporal framing opener	Vague, urgent-present framing such as “in today’s fast-paced world.”
“Sense of [abstract]”	Rigid template packaging subjective experience into a noun phrase.
Inclusive “we/us”	First-person plural that positions the model as a co-member of human experience.


Tricolon

Three coordinated elements presented as a rhetorically unified set. In classical rhetoric, the tricolon produces cadence and closure. In LLM text, it often appears as the default way to sound complete.

“Memory involves encoding, storage, and retrieval.”

Antithetic framing

Structured contrast: “not X, but Y,” “Y, not X,” or “X rather than Y.” The frame gives prose the appearance of conceptual precision by staging a correction, even when the opposition is shallow.

“The issue is not speed, but coordination.” · “What matters is adaptation rather than scale.”

Correlative amplification

“Not only X but also Y.” The sentence affirms, then escalates. It sounds cumulative and persuasive even when the informational gain is small.

“The policy not only improves efficiency but also strengthens accountability.”

“Both X and Y” parallelism

Two items framed as an exhaustive set via an explicit “both” marker. The construction is concise and useful, but in LLM prose it often becomes a reflexive balancing move.

“The approach is both practical and conceptually elegant.”

Concessive pivot

“While X, Y” or “Although X, Y.” The sentence acknowledges one line of thought only to override it with the one that matters. It gives the impression of nuance by staging a controlled objection.

“While the tool is powerful, its value depends on context.”

Expository opening frame

The copular definition template: “[Topic] is a [adjective] [noun] that [verb].” It is the standard expository opener of the modern language model.

“Memory is a complex cognitive process that enables the encoding, storage, and retrieval of information within the human brain.”

The first sentence defines. The second sentence usually restates the definition at a higher level of generality. The pattern is so consistent that the opening of an LLM completion can often be predicted from the topic word alone.
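Because the frame is so regular, it can be approximated with a single pattern applied to the first sentence of a completion. The sketch below is illustrative only and is not the detector behind our counts; the names `OPENING_FRAME`, `first_sentence`, and `opening_frame` are hypothetical, and the pattern assumes a topic of at most four words and a noun phrase of two to four words.

```python
import re

# Hypothetical pattern for the copular template "[Topic] is a [adj] [noun] that [verb]".
# Topic: a capitalized word plus up to three more words (matched lazily).
# Noun phrase: two to four words before the relative pronoun "that".
OPENING_FRAME = re.compile(
    r"^(?P<topic>[A-Z][\w'-]*(?: [\w'-]+){0,3}?) is an? "
    r"(?P<np>(?:[\w-]+ ){1,3}[\w-]+) that (?P<verb>\w+)"
)


def first_sentence(text: str) -> str:
    """Return the text up to and including the first sentence-final mark."""
    match = re.search(r"[.!?](?:\s|$)", text)
    return text[: match.end()].strip() if match else text.strip()


def opening_frame(text: str):
    """Return (topic, noun phrase, verb) if the opening fits the template, else None."""
    m = OPENING_FRAME.match(first_sentence(text))
    return (m["topic"], m["np"], m["verb"]) if m else None
```

Openings such as “[Topic] refers to…” fall outside this pattern and would need rules of their own.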

Closing summary frame

“Overall,” “In summary,” “Ultimately.” The mirror image of the expository opening: where the opening defines the topic, the closing re-defines it with the accumulated weight of the preceding paragraphs.

“Overall, the study of memory remains one of the most fascinating and consequential areas of neuroscience.”

The paragraph contributes no new information. It exists to produce the formal effect of conclusion, marking the boundary between content and termination.

Discourse-organizing markers

Lexical items explicitly structuring the progression of an explanation: “Additionally,” “Furthermore,” “However,” “Moreover.” In human writing, they are used selectively. In LLM text, they appear at the start of nearly every paragraph, regardless of whether the relationship needs signaling.

“Additionally, the amygdala contributes to the emotional coloring of memories. Furthermore, the prefrontal cortex is involved in retrieval and working memory. However, memory is not a perfect system.”
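The claim that these markers open nearly every paragraph can be checked directly. A minimal sketch, assuming paragraphs are separated by blank lines and that a marker counts only when it starts a sentence and is followed by a comma; `MARKERS`, `marker_counts`, and `paragraph_share` are illustrative names, not part of our pipeline.

```python
import re

# Illustrative marker list: the connectives named in the definition above.
MARKERS = ("Additionally", "Furthermore", "However", "Moreover")

# A marker counts only at the start of the text or right after ". ", "! " or "? ".
SENTENCE_INITIAL = re.compile(r"(?:^|(?<=[.!?])\s+)(" + "|".join(MARKERS) + r"),")


def marker_counts(text: str) -> dict[str, int]:
    """Count sentence-initial occurrences of each marker."""
    counts = {m: 0 for m in MARKERS}
    for match in SENTENCE_INITIAL.finditer(text):
        counts[match.group(1)] += 1
    return counts


def paragraph_share(text: str) -> float:
    """Fraction of blank-line-separated paragraphs that open with a marker."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0.0
    openers = tuple(m + "," for m in MARKERS)
    return sum(p.startswith(openers) for p in paragraphs) / len(paragraphs)
```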

Scope-marking and cataphoric enumeration

Two related devices. Scope-marking signals the breadth of an explanation (“there are several factors,” “one key reason”). Cataphoric enumeration announces a count before delivering the content (“There are three main types”). Both pre-organize the material for the reader, creating a forward-pointing promise that the text will then fulfill.

“There are several factors that influence how memories are formed and retained. One key factor is the level of attention paid during encoding. Another important consideration is the emotional significance of the event.”

The reader never experiences the argument as it unfolds; they receive it pre-sorted.

Anaphoric demonstrative

Sentence-initial demonstratives referring to prior discourse: “This process,” “This pattern,” “This effect.” It is a cohesion device that creates forward momentum by anchoring each new sentence to the previous one.

“Neurons communicate through electrical impulses and chemical signals. This process, known as synaptic transmission, allows information to travel rapidly across the brain.”

Evaluative sentence adverb

Sentence-initial adverbs signaling importance without adding propositional content: “Crucially,” “Importantly,” “Notably,” “Interestingly.” The adverb instructs the reader to assign weight to the following clause, but the basis for that weight is left unspecified.

“Importantly, it is capable of distinguishing between the body’s own cells and foreign invaders. Crucially, this ability allows it to mount targeted responses without damaging healthy tissue.”

Essence-claiming

“At its core,” “At the heart of,” “Fundamentally.” A mid-text move that claims to strip away surface appearances and reveal what lies beneath. The device asserts depth. In our preliminary data, it is rare in earlier models and grows sharply in later ones, following a trajectory similar to that of em-dashes.

“At its core, education is a process of transmitting knowledge, skills, and values from one generation to the next.”

Synonymous adjective pair

Two near-synonymous adjectives coordinated as if they contributed distinct information: “complex and dynamic,” “rich and multifaceted,” “sophisticated and intricate.” The second adjective restates the first with different phonological material. The pair produces an impression of elaboration, but the informational yield is often close to zero.

“Memory is a complex and dynamic process that relies on a sophisticated and intricate network of neural connections.”

Nominalization chain

A process or stance packaged as an abstract noun phrase. When multiple nominalizations stack within a single noun phrase, the result is prose that is dense, impersonal, and resistant to paraphrase. Reinhart et al. (2025) identify nominalizations as one of the strongest distinguishing features between human and LLM text.

“The implementation of effective memory consolidation strategies requires the integration of multiple cognitive processes, including the coordination of attentional resources and the optimization of encoding procedures.”

Stacked appositives

A noun followed by comma-separated restatements or elaborations. A single appositive is explanatory. Two or more in sequence constitute a pattern: the model inserts definitions for terms that the context has already made transparent.

“The hippocampus, a seahorse-shaped structure located in the brain’s medial temporal lobe, plays a vital role in the consolidation of new explicit memories.”

“Plays a [adj] role”

Formulaic importance attribution via dead metaphor. The adjective slot rotates — crucial, key, vital, pivotal, significant, central — but the frame is fixed. In LLM text, it functions as a generic importance marker.

“The hippocampus plays a crucial role in memory formation. The amygdala plays a significant role in emotional processing.”

Temporal framing opener

“In today’s fast-paced world.” “In an era of.” “In an increasingly interconnected society.” A conventional opener that situates the discussion in a vague, urgent present. The device implies that the topic is especially relevant now, without specifying what about the present makes it so.

“In today’s rapidly evolving technological landscape, the ability to adapt to change has become more important than ever.”

“Sense of [abstract]”

Formulaic packaging of subjective or collective experience into a fixed noun-phrase template: “a sense of purpose,” “a sense of belonging,” “a sense of wonder.” The noun slot rotates; the frame is rigid.

“Traditions foster a sense of identity and provide a sense of continuity in a rapidly changing world.”

The paragraph sounds like it is describing rich human experience. It is often repeating a template with variable nouns.

Inclusive “we/us”

First-person plural pronouns — we, us, our — position the model as a co-member of a shared human audience. In classical rhetoric, “we” was the voice of the deliberative orator addressing the assembly. In LLM text, the pronoun does something stranger. A system that has no sensory experience, no memory of sunsets, and no body that sleeps or wakes writes as if it shares the reader’s phenomenal life.

“As we pause to take in the scene, we are reminded that even the most ordinary phenomena can inspire a sense of wonder and gratitude for the natural world we inhabit.”

The model sees the sunset with us. It pauses with us. It feels the wonder. It inhabits the world. In our preliminary data, this device declines sharply across GPT generations. Later models are withdrawing from the pretense, converging toward a more distanced register.

The second grid isolates the rhetorical movements rather than the punctuation. Each family selector now controls only the stylistic subset, so the patterns are easier to compare in context.

Preliminary measurements

To test whether the taxonomy above captures real generational patterns, we collected completions from models spanning 2023 to 2026 across four families: GPT (OpenAI), Llama (Meta), Mistral, and Qwen (Alibaba). Each model was prompted with 600 diverse questions covering explanatory, argumentative, and creative tasks, with instructions to generate approximately 300 words. This produces a length-controlled corpus of roughly 600 completions per model.

We extracted feature counts using a Python script based on regular expressions. We tested all 26 features in the taxonomy against the corpus and retained the 18 for which the regex produces counts above noise level. All rates are reported per 1,000 words, which provides a consistent normalization across AI models and the human baseline alike.
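Our script is not reproduced here, but the normalization step can be sketched. The five patterns below are hypothetical approximations of taxonomy features, and `FEATURES`, `WORD`, and `rates_per_1000` are illustrative names. One design choice is worth stating: the sketch pools counts and words over the whole corpus before dividing. Averaging per-text rates would be an equally defensible assumption, and with completions held near 300 words the two should give similar results.

```python
import re

# Hypothetical approximations of five taxonomy features (not our actual patterns).
FEATURES = {
    "em_dash": re.compile("\u2014"),
    "plays_a_role": re.compile(r"\bplays? an? \w+ role\b", re.IGNORECASE),
    "not_only_but_also": re.compile(r"\bnot only\b[^.!?]*?\bbut also\b", re.IGNORECASE),
    "sense_of": re.compile(r"\ba sense of \w+", re.IGNORECASE),
    "evaluative_adverb": re.compile(
        r"(?:^|(?<=[.!?])\s+)(?:Crucially|Importantly|Notably|Interestingly),"
    ),
}

# A word: letters/digits, keeping internal apostrophes and hyphens ("don't", "well-known").
WORD = re.compile(r"\w+(?:['\u2019-]\w+)*")


def rates_per_1000(texts: list[str]) -> dict[str, float]:
    """Pool the corpus, then report each feature as occurrences per 1,000 words."""
    total_words = sum(len(WORD.findall(t)) for t in texts)
    if total_words == 0:
        return {name: 0.0 for name in FEATURES}
    return {
        name: 1000 * sum(len(pattern.findall(t)) for t in texts) / total_words
        for name, pattern in FEATURES.items()
    }
```

The same function would be applied unchanged to the model corpora and to the human baseline, so the rates stay directly comparable.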

As a human baseline, we sampled 3,648 texts, around 2.6 million words, from the gsingh1-py/train dataset on Hugging Face, a collection of human-written articles. The orange dashed line in each feature chart marks the human rate.

Linguistic diversity

To complement the rhetorical feature counts, we measured linguistic diversity following the framework proposed by Guo, Shang, and Clavel (TACL 2025), which evaluates model output along three dimensions.

Lexical diversity is measured via the Unique-n metric: the ratio of unique n-grams — unigrams, bigrams, trigrams — to the total number of n-grams across all generated outputs. A model that recycles the same vocabulary across completions scores low; one that draws from a wider word pool scores high.
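A minimal sketch of Unique-n, assuming lowercase whitespace tokenization and n-grams that never cross the boundary between two completions. Guo, Shang, and Clavel may tokenize differently, and averaging n = 1 to 3 into a single lexical score is our assumption for illustration; `ngrams`, `unique_n`, and `lexical_diversity` are hypothetical names.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def unique_n(outputs: list[str], n: int) -> float:
    """Unique n-grams divided by total n-grams, pooled over all outputs."""
    grams: list[tuple[str, ...]] = []
    for text in outputs:
        tokens = text.lower().split()  # whitespace tokenization: a simplification
        grams.extend(ngrams(tokens, n))  # n-grams stay within one output
    return len(set(grams)) / len(grams) if grams else 0.0


def lexical_diversity(outputs: list[str]) -> float:
    """Mean of Unique-1, Unique-2 and Unique-3 (the averaging is an assumption)."""
    return sum(unique_n(outputs, n) for n in (1, 2, 3)) / 3
```

Two near-identical completions such as “time is a river” and “time is a weaver” share most of their n-grams, which is exactly what pushes the score down.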

Syntactic diversity is measured via dependency-tree distances. Each sentence is parsed into a dependency tree by a neural parser, and pairwise distances between trees are computed using the Weisfeiler-Lehman graph kernel. A model that locks into one or two sentence skeletons scores low; one that varies its clause structures, embedding depth, and coordination patterns scores high.
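The Weisfeiler-Lehman step is easier to see on toy trees. The sketch assumes the trees already exist, encoded as dependency labels plus head indices, so no parser is involved. It runs two rounds of WL relabeling, builds label histograms, and turns the kernel into a distance as one minus the cosine-normalized kernel value. The iteration count, that conversion, the pairwise averaging, and all names (`wl_features`, `wl_distance`, `syntactic_diversity`) are assumptions, not details taken from Guo, Shang, and Clavel.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical encoding: labels[i] is the dependency relation of token i and
# heads[i] the index of its head (-1 marks the root). A parser would produce
# these from raw sentences; here they are supplied by hand.
Tree = tuple[list[str], list[int]]


def wl_features(tree: Tree, iterations: int = 2) -> Counter:
    """Histogram of Weisfeiler-Lehman labels over all relabeling rounds."""
    labels, heads = tree
    neighbors: list[list[int]] = [[] for _ in labels]
    for child, head in enumerate(heads):
        if head >= 0:  # treat the tree as an undirected graph
            neighbors[child].append(head)
            neighbors[head].append(child)
    current = list(labels)
    features = Counter(current)
    for _ in range(iterations):
        # New label = old label plus the sorted multiset of neighbor labels.
        current = [
            current[i] + "(" + ",".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(current))
        ]
        features.update(current)
    return features


def wl_kernel(a: Counter, b: Counter) -> float:
    """Dot product of two label histograms."""
    return float(sum(a[k] * b[k] for k in a.keys() & b.keys()))


def wl_distance(t1: Tree, t2: Tree, iterations: int = 2) -> float:
    """One minus the cosine-normalized WL kernel: 0 for identical trees."""
    f1, f2 = wl_features(t1, iterations), wl_features(t2, iterations)
    norm = math.sqrt(wl_kernel(f1, f1) * wl_kernel(f2, f2))
    return 1.0 - wl_kernel(f1, f2) / norm if norm else 0.0


def syntactic_diversity(trees: list[Tree]) -> float:
    """Average pairwise WL distance across a set of parsed sentences."""
    pairs = list(combinations(trees, 2))
    return sum(wl_distance(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```

A model that reuses one sentence skeleton produces many zero-distance pairs, which drags the average down; trees that share only a `root` label sit close to the maximum distance.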

Semantic diversity is measured via Sentence-BERT embeddings. Each sentence is embedded into a vector, and the average pairwise cosine distance across all sentence vectors is reported. A model whose outputs cluster tightly around the same meaning scores low; one whose outputs spread across the embedding space scores high.
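The semantic measure reduces to an average over all pairs of sentence vectors. In practice the vectors would come from a Sentence-BERT model (for example through the `encode` method of the sentence-transformers library); the sketch below takes precomputed vectors as plain lists so that it stays self-contained, and its function names are hypothetical. Cosine distance can exceed 1 when vectors point in opposite directions, so the rescaling to the 0 to 1 range reported in the next paragraph is not shown.

```python
import math
from itertools import combinations


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norms


def semantic_diversity(vectors: list[list[float]]) -> float:
    """Average pairwise cosine distance across all sentence embeddings."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0
```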

All three metrics are normalized between 0 and 1. The same human baseline corpus provides a reference: human text scores 0.521 lexical, 0.465 syntactic, and 0.468 semantic. Every model in the grid falls below the human baseline on all three dimensions.