Deepfield: a modular assessment platform

The situation

Strategic assessment usually ends in a slide deck. The deck is stale within a quarter. The world has moved by then, and the analyst's reasoning is locked in a PDF that nobody can ask another question of. The program office's choices are to live with the stale answer or pay for another that will be stale in three months.

We built Deepfield because we wanted to stop selling memos.

What we built

We built Deepfield with the Andrew W. Marshall Foundation. The first version was for defense-oriented strategic assessment, where Marshall's analytic disciplines from the Office of Net Assessment are encoded directly as runnable pipeline stages. The platform has since been generalized to support strategic assessment ontologies beyond defense.

Deepfield is an eleven-stage assessment pipeline. You feed it a strategic question, and the system moves that question through the stages in order. Each stage is modular, with a defined input and output.

Charter. The system generates an assessment charter that captures decision context, time horizons, deliverable expectations, and the evidence that would change the answer. The charter requires human approval before any downstream stage runs, so the analyst and the platform agree on scope before compute commits.
Intake. A scoping pass works through an eight-step Marshall flow, names the actors and centers of gravity, builds out the key intelligence questions, and runs an adversarial review of the framing itself.
Research. A research orchestrator dispatches parallel sub-agents, reranks the passages they retrieve, verifies citations, looks for under-represented hypotheses, and gates on coverage of the intelligence questions.
Extract. Entity and relationship extraction across the retrieved sources, with two-pass coreference resolution, evidence weighting, deduplication, and per-claim citation verification.
Graph. The extracted material becomes a knowledge graph with hierarchical community structure at coarse, medium, and fine resolutions, so the reasoner can ask questions of the corpus at different levels of zoom.
Reason. A multi-stage pass runs hypothesis generation, hypothesis tree search, an Analysis of Competing Hypotheses evidence matrix where contradicting evidence dominates each verdict (after Heuer's disconfirmation principle), iterative retrieval to fill diagnostic gaps, multi-agent debate, adversary rescoring, causal cascade analysis, synthesis, and a diagnostic trace. Between passes, the system identifies gaps between competing hypotheses, runs targeted search, folds new evidence into the graph, and re-runs the reasoner. RAND's directed-collection principle, encoded as a stage.
Analyze. Thirteen-type asymmetry detection, Mason and Mitroff's Strategic Assumption Surfacing and Testing, STEEP framing, signpost generation, twelve-persona red team challenges, self-consistency voting, an LLM-as-judge quality gate, self-refine, and a calibration audit.
Decision. A course-of-action pipeline runs synthesis, effect chains, resource estimation, multi-dimensional risk analysis, Pareto scoring, Roger Martin's "what would have to be true" framing, real options, Taleb-style resilience classification, reversibility, and decision briefs.
Wargame. Matrix-game solving over cognitive agents with operational-code analysis and perception masks, agent-based Monte Carlo simulation, twelve-persona red-team analysis, and a ranked vote.
Reconciliation. A post-wargame revision pass updates assumption robustness, course-of-action scores, decision-brief linchpins, and analytical confidence based on what the wargame revealed. Marshall's iterative review, encoded as a stage.
Output. The bundle: reports in multiple formats, plus knowledge-graph projections, victory cards, course-of-action comparisons, sensitivity heatmaps, and equilibrium diagrams.
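
To make the Reason stage's evidence matrix concrete, here is a minimal Python sketch of Heuer-style scoring, where only contradicting evidence counts against a hypothesis. It is an illustration under stated assumptions, not Deepfield's code: the `Evidence` class, the three-value rating scale, the credibility weights, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rating scale: how one piece of evidence bears on one hypothesis.
CONSISTENT, NEUTRAL, INCONSISTENT = "C", "N", "I"


@dataclass
class Evidence:
    label: str
    weight: float            # assumed credibility weight in [0, 1]
    ratings: dict[str, str]  # hypothesis name -> rating


def inconsistency_scores(hypotheses: list[str], evidence: list[Evidence]) -> dict[str, float]:
    """Sum the weight of evidence that contradicts each hypothesis.

    Consistent and neutral evidence adds nothing, so a hypothesis cannot
    be rescued by piling up support that fits every rival equally well.
    """
    scores = {h: 0.0 for h in hypotheses}
    for item in evidence:
        for h in hypotheses:
            # Unrated pairs are treated as neutral (an assumption of this sketch).
            if item.ratings.get(h, NEUTRAL) == INCONSISTENT:
                scores[h] += item.weight
    return scores


def rank_hypotheses(hypotheses: list[str], evidence: list[Evidence]) -> list[str]:
    """Order hypotheses from least to most contradicted."""
    scores = inconsistency_scores(hypotheses, evidence)
    return sorted(hypotheses, key=lambda h: scores[h])
```

In this sketch, a piece of evidence rated consistent with every hypothesis changes no score. That is the disconfirmation idea: evidence is diagnostic only where it separates one hypothesis from another.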

The whole thing runs end-to-end from a single command.
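
As a rough picture of what "modular stages, each with a defined input and output" can mean in practice, the sketch below chains stages over a shared artifact and blocks the run when a gated stage, such as the charter, is not approved. Everything in it is assumed for illustration: `Stage`, `run_pipeline`, the `approve` callback, and the two exception types are hypothetical names, not Deepfield's interface.

```python
from dataclasses import dataclass
from typing import Callable


class StageError(RuntimeError):
    """A stage raised; the run stops instead of substituting a default."""


class ApprovalRequired(RuntimeError):
    """A gated stage's output was not approved by a human reviewer."""


@dataclass
class Stage:
    name: str
    run: Callable[[dict], dict]   # takes the artifact so far, returns the next one
    needs_approval: bool = False  # e.g. the charter stage


def run_pipeline(
    stages: list[Stage],
    question: str,
    approve: Callable[[str, dict], bool],
) -> tuple[dict, list[str]]:
    """Run stages in order and return the final artifact plus completed stage names."""
    artifact: dict = {"question": question}
    completed: list[str] = []
    for stage in stages:
        try:
            artifact = stage.run(artifact)
        except Exception as exc:
            # Surface the failure with the stage name attached.
            raise StageError(f"{stage.name} failed: {exc}") from exc
        if stage.needs_approval and not approve(stage.name, artifact):
            # Nothing downstream runs until the gated output is approved.
            raise ApprovalRequired(f"{stage.name} output was not approved")
        completed.append(stage.name)
    return artifact, completed
```

Passing the approval decision in as a callback keeps the human gate out of the stage code itself, so the same stage list could be driven by an interactive prompt or a review queue.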

How a recommendation traces back

Pick any recommendation Deepfield produces. You can trace it backward.

The recommendation came from a ranked course of action. That course came from a decision pipeline that scored it across multiple dimensions and risk categories. Those scores came from a reasoner that ranked competing hypotheses against a Heuer-style evidence matrix where contradicting evidence dominates. The hypotheses were grounded in a knowledge graph. Each graph node came from extracted entities with a confidence score. Each entity came from specific source passages the extraction agents evaluated. Each source was retrieved by a research iteration that ran a specific query because a prior iteration flagged a coverage gap. After the wargame, every score in that chain was revised by a reconciliation pass that closed the loop.

That chain is carried as auditable provenance through every stage. Provenance is how the system is built. It exists whether or not anyone asks to see it. If a stage fails, it fails out loud. The pipeline doesn't fall back to templates or cached defaults to hide a broken call.
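
One way to picture that chain is as records that each point at the records they were derived from, so a trace is just a walk backward from the recommendation. The sketch below assumes a simple in-memory store; `Record`, its fields, and `trace` are hypothetical names for illustration, not Deepfield's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    id: str
    kind: str                      # e.g. "recommendation", "hypothesis", "passage"
    summary: str
    parents: tuple[str, ...] = ()  # ids of the records this one was derived from


def trace(records: dict[str, Record], start_id: str) -> list[Record]:
    """Walk from one record back to its sources, depth-first, visiting each once.

    A parent id that is missing from the store raises KeyError, so a
    broken link in the chain is reported rather than silently skipped.
    """
    seen: set[str] = set()
    stack = [start_id]
    chain: list[Record] = []
    while stack:
        record_id = stack.pop()
        if record_id in seen:
            continue
        seen.add(record_id)
        record = records[record_id]
        chain.append(record)
        # Reverse so the first-listed parent is explored first.
        stack.extend(reversed(record.parents))
    return chain
```

Calling `trace(records, "rec-1")` on a store shaped like the chain above would return the recommendation first and the originating research query last, with every intermediate record in between.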

What it replaced

The default alternative is a consultancy engagement that delivers a PDF. Deepfield is the infrastructure underneath that engagement, kept running. The same analysis runs again when your inputs change, without re-paying for the analyst's learning curve. Any conclusion can be interrogated down to the document that supports it.

What a similar engagement looks like

Deepfield is a platform we embed. A typical engagement runs 12 to 16 weeks. We configure the pipeline for your domain, ingest your sources and credibility priors, tune the reasoning weights, and run the first real assessment together with your team. At the end you keep the running system, the knowledge graph, the code, and your data.

It's a fit when you have a recurring assessment need, your own sources, and a requirement that every conclusion be traceable back to the evidence that produced it. If your decision is one-shot and never repeats, Deepfield is too much infrastructure. If it's ongoing and the stakes are high, a platform you run yourself costs less over the first year than two consulting engagements, and the knowledge graph it builds gets sharper every quarter.
