The Correctness Gradient

In April 2024, +972 Magazine and Local Call published a report based on interviews with six Israeli intelligence officers who had served in Gaza. The reporting described a software system the IDF had built and named Lavender. Lavender was a classifier. It had been trained on the profiles of known Hamas operatives and was used to identify, from cellular and movement data, individuals statistically resembling the training set. Officers told the reporters the system marked roughly 37,000 Palestinians. They told the reporters that Lavender's misidentification rate was, by internal estimate, around ten percent. They told the reporters that the human review of a Lavender match averaged twenty seconds.
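Taken at face value, the reported figures compose into a scale worth stating plainly. A back-of-the-envelope sketch, using only the numbers quoted in the reporting; they are the officers' internal estimates, not independently verified counts:

```python
# Back-of-the-envelope arithmetic from the figures quoted in the +972
# reporting. All inputs are the officers' internal estimates, not
# independently verified numbers.

marked = 37_000        # individuals reportedly marked by Lavender
error_rate = 0.10      # reported internal misidentification estimate
review_seconds = 20    # reported average human review per match

misidentified = marked * error_rate             # ~3,700 people
review_hours = marked * review_seconds / 3600   # ~206 hours in total

print(f"Implied misidentifications: {misidentified:,.0f}")
print(f"Implied total human review time: {review_hours:,.0f} hours")
```

At that cadence, the human share of all 37,000 determinations fits inside about eight and a half days of continuous clock time.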

I have read the report several times. The number that lodges itself is not 37,000 or ten percent. It is twenty.

In American military doctrine there is a phrase, positive identification, that does the heaviest work. Before a weapon is used, an operator must establish that the target meets the criteria for engagement under the applicable rules of engagement. Positive identification is not a binary. It is a judgment. It is the act of moving the determination from "this match looks correct" to "I will be responsible for what happens next." That movement is the entire point of having a human in the loop. Twenty seconds is what the movement looks like when the loop has been closed and there is nothing left of the movement.

I have used the phrase "categorical collapse" in my own writing for the way AI systems compress qualitatively distinct judgments into a continuous probability score. The Lavender reporting is what categorical collapse looks like when the categories are combatant and civilian.

Here is the architectural part worth dwelling on.

The criteria for engagement under the law of armed conflict are not features. They are not "carrying a weapon" plus "located in a known threat area" plus "associated with a flagged phone." They are, irreducibly, contextual judgments. A man with a rifle in his courtyard is, depending on context the sensor cannot capture, a Hamas operative, a member of the Palestinian civil police, a relative protecting his family from looters, or a guest at a wedding firing celebratory shots. The features the sensor can capture are the same in all four cases. The judgment is a different kind of object than the features.

A model that fuses features into a probability score is not preserving the distinction. The score is the dissolved version of the categorical judgment. Whatever number Lavender prints next to a name does not, structurally cannot, contain the difference between "high probability of hostile combatant" and "confirmed hostile combatant." The first is a statistical estimate. The second is a determination, made under a legal standard, by an actor who can be held responsible. Twenty seconds is not enough time for the actor to convert one into the other. Twenty seconds is enough time to glance at the screen.
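The structural point can be made concrete with a type sketch. What follows is a toy illustration, not a reconstruction of Lavender or of any real system; every name in it is hypothetical. Its only claim is the one the last two paragraphs make: the four courtyard cases present identical features, so they receive identical scores, and nothing you can do to a score turns it into a determination.

```python
from dataclasses import dataclass

# Toy sketch of the categorical-collapse argument. Every name here is
# hypothetical; nothing below describes any real system's implementation.

@dataclass(frozen=True)
class FeatureVector:
    carrying_weapon: bool
    in_flagged_area: bool
    linked_to_flagged_phone: bool

def fused_score(f: FeatureVector) -> float:
    """A statistical estimate: features dissolved into one number."""
    return (0.3 * f.carrying_weapon
            + 0.3 * f.in_flagged_area
            + 0.4 * f.linked_to_flagged_phone)

# All four courtyard cases present identical features to the sensor.
operative = FeatureVector(True, True, True)   # Hamas operative
policeman = FeatureVector(True, True, True)   # Palestinian civil police
relative  = FeatureVector(True, True, True)   # guarding against looters
guest     = FeatureVector(True, True, True)   # celebratory fire

assert fused_score(operative) == fused_score(guest)  # the score carries no distinction

@dataclass(frozen=True)
class PositiveIdentification:
    """A determination, not an estimate. It exists only as the act of a
    named, accountable human, resting on context the features omit."""
    responsible_officer: str
    context_reviewed: str

# Deliberately absent: any function from float to PositiveIdentification.
# `fused_score(x) >= 0.9` is a comparison between numbers, not a
# determination under a legal standard.
```

The design choice worth noticing is the absence. The sketch contains no converter from score to identification, because that conversion is precisely the human act the twenty seconds were supposed to hold.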

I do not write this as a comment on the war. I write it because the architecture is the architecture, and it is being assembled in every domain where high-tempo decisions and AI-mediated targeting recommendations coexist.

The American military is building its version. Project Maven, which began in 2017 as a computer-vision project for full-motion video, has expanded across what the Department of Defense now calls the JADC2 stack, for Joint All-Domain Command and Control. The doctrinal frame is "human on the loop" rather than "human in the loop," and the on-the-loop human will, under the same time pressures that produced the twenty-second cadence in Gaza, do approximately what those officers did. The pressure to compress the OODA loop, the observe-orient-decide-act cycle, is not a moral failing. It is a structural feature of any operational tempo where the threat moves faster than deliberation. Categorical boundaries are the first casualty of acceleration, because categorical boundaries require thought, and thought is what acceleration is in tension with.

The hardest part of writing about this honestly is that I do not have a recommendation that survives contact with the operational reality. "Insist on more than twenty seconds" is correct and impotent. "Do not deploy the system" is a different argument from a different person, and it is not the argument the targeting cell is having. The argument they are having is whether the next sortie launches in six minutes or eight, and whether the recommendation in front of them is good enough to act on.

The fact that the question is being asked in those terms is the answer to the question of whether the categorical boundary survived. It did not.

What I want from this piece is not a recommendation. I want the architecture to be visible, named, and on the table when the next conversation begins. Twenty seconds is not human review. Whatever it is, we should call it by its right name before we deploy more of it.
