The Ancestor's Error
Shumailov's Nature paper proved the mechanism in a closed loop. Ahrefs found 74% of new web pages contained AI text. The thought experiment is no longer hypothetical.
Atul Gawande wrote in 2018 about what got lost when his hospital migrated to Epic. AI summarization runs the same migration in seconds, on documents whose structure does work the words do not.
In computer science there is a doctrine so foundational it is rarely stated. Substrate independence. Information is information regardless of the medium that carries it. A novel is the same novel printed on paper or displayed on a screen. A mathematical proof is the same proof in chalk or in LaTeX. The content is separable from the container. The message is independent of the medium.
This principle has been productive. It underlies the theory of computation, the architecture of the internet, the entire digitization project of the last forty years. It is also, in an important class of cases, wrong. The class of cases is precisely where AI summarization does the most damage.
In November 2018 The New Yorker published an essay by Atul Gawande called "Why Doctors Hate Their Computers." Gawande spent the piece tracking what was lost when his hospital migrated from paper records and old systems to Epic. The clinical content survived. Diagnoses and lab values were preserved. What did not survive, Gawande argued, was the layer of artifacts clinicians had built around the content over decades. Forcing functions, spatial conventions, unit-specific shorthand. A sticky note on the front of a chart, for instance, guarantees that the next person to open it encounters the critical fact first. The EHR's equivalent is an alert banner, which works only if it is noticed, and the alert-fatigue literature is the documentation of how often it is not.
These are not quaint anachronisms. They were information-carrying structures that evolved to support cognitive work. The substrate had never been independent. The medium was, in McLuhan's old formulation, the message. In the migration, the message was lost in translation.
I think about Gawande's piece often when I read about AI summarization being deployed in document-heavy domains.
Consider an informed consent form for a clinical trial. Its structure is mandated by federal regulation, 21 CFR Part 50. There is a description of the study. A description of the procedures. A risks section. A benefits section. A statement of alternatives. A confidentiality provision. A voluntary participation clause. The structure is not decorative. It is regulatory. The fact that risks appear in their own section, separated from benefits, under specified headings, is a legal requirement. The structure embodies a theory of how a prospective participant will encounter the information, and what kind of consent that encounter is supposed to produce.
When an AI system summarizes such a document, it treats the information as substrate-independent. The facts about risks and benefits are extracted, reorganized according to the model's internal coherence logic, presented in a new shape. If the summary is checked for accuracy it passes. Every fact is present. Every statement is true.
The structure is gone. The information lived as much in the structure as in the words. The separation of risks from benefits is a regulatory requirement embodying a theory of human cognition. The way risk is presented changes how a person processes it. Decades of behavioral research support the requirement. The AI cannot know any of this from the text alone, because the meaning carried by structure is not in the text. The meaning lives in the relationship between the text and the regulatory, cognitive, and institutional framework that gives the structure its force.
There is a useful distinction between three layers in any structured document.
The propositional layer is the facts. "The study drug may cause nausea." Easy to extract, easy to verify.
The categorical layer is the assignment of facts to categories. Nausea listed under "Risks of the Study Drug," not under "Risks of Standard Treatment." The same fact in a different category becomes a different fact for legal purposes.
The architectural layer is the meaning carried by the document's shape. That risks and benefits are in separate sections. That the voluntary participation clause comes last. That the alternatives section exists at all, reminding the reader they have choices. This layer makes no factual claims. It is invisible to any system that processes documents as bags of facts. It is also, in the documents that have been carefully designed, the layer that does the most work.
Substrate independence holds at the propositional layer. It fails at the categorical layer. It does not have vocabulary for the architectural layer.
Every digital transformation initiative that treats legacy systems as containers of extractable information is making an implicit substrate-independence claim. That the information can be extracted, transformed, and loaded into a new system without loss. The new system may organize, present, and query the information differently, but the information itself is preserved. This claim is often false, and the falsity is often invisible. AI summarization is the most aggressive form of the claim, because it does in seconds what an EHR migration takes years to do, and it does it across domains the EHR migration never touched.
There is a discipline that follows from these three layers. Summarize the propositional layer freely. Treat the categorical layer as a guide that points the reader back to the source rather than a replacement for it. Read the architectural layer directly, in the documents that have one, by engaging with the original. In a clinical trial consent form, the procedure description can be summarized. The categorical assignment of an event to study-specific risk versus background risk needs to be checked against the source. The architectural separation of risks from benefits, the sequence in which the participant encounters the information, the bare existence of the alternatives section, is something only the source can show you.
That discipline is unfashionable. It does not scale. It costs money to enforce. It will lose, at every quarterly review, to a competitor who skips it. Whether it loses on the longer horizon depends on whether the firms that skipped it ever encounter a case where the architectural layer was what mattered, and on whether they recognize what they lost when they realize they lost it.
The architectural layer is invisible until it is what was needed.
Shumailov's Nature paper proved the mechanism in a closed loop. Ahrefs found 74% of new web pages contained AI text. The thought experiment is no longer hypothetical.
Epic's sepsis prediction model missed 67% of sepsis cases at Michigan Medicine. The audit method we built for AI cannot catch what the system never said.
Air France 447 and a 2025 Polish endoscopy trial point at the same trap. The more reliable the system, the more thoroughly its absence becomes catastrophic.
Tell us about the decision you're trying to improve. We'll schedule a briefing with our principals to understand your environment and explore a potential fit.
Schedule a Briefing