We ask each model to write a single grammatical English sentence that is as syntactically deep as possible — clauses nested within clauses, not length through coordination. Models are told they are competing against other frontier models and iteratively pushed to beat their own best output. Each sentence is scored by automated dependency parsing and gated by a coherence check from Claude Opus. The results are compared against literary sentences from Henry James, Virginia Woolf, and five other canonical prose stylists.
v1.0 · April 2026 · 27 models · 7 human references
All 34 entries (27 models and 7 human references) ranked by composite score.
| Rank | Model | Score | Max Depth | Mean Depth | Subordination Ratio | Dependency Distance | Words |
|---|---|---|---|---|---|---|---|
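The depth columns can be approximated from any dependency parse. The sketch below is not the benchmark's scorer, and the metric definitions in it are assumptions: it takes a parse as a list of hypothetical `Token` records whose root points to itself (the spaCy convention), treats an assumed set of clause labels as subordinate, defines the subordination ratio as the share of words heading such a clause, and defines dependency distance as the mean linear distance between each word and its head. Punctuation is excluded, and the composite score's weighting is not reproduced.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed set of clause-level subordination labels (spaCy English scheme);
# the benchmark's exact label set is not stated.
SUBORDINATE_LABELS = {"advcl", "ccomp", "xcomp", "csubj", "relcl", "acl"}


@dataclass
class Token:
    """Hypothetical token record; the root token's head is its own index."""
    text: str
    head: int
    dep: str
    is_punct: bool = False


def depth(tokens: list[Token], i: int) -> int:
    """Number of arcs between token i and the root."""
    d = 0
    while tokens[i].head != i:
        i = tokens[i].head
        d += 1
    return d


def sentence_metrics(tokens: list[Token]) -> dict[str, float]:
    """Assumed definitions of the table's per-sentence columns."""
    words = [i for i, t in enumerate(tokens) if not t.is_punct]
    depths = [depth(tokens, i) for i in words]
    subordinate = [i for i in words if tokens[i].dep in SUBORDINATE_LABELS]
    # Linear distance to the head, skipping the root (which has no head).
    distances = [abs(i - tokens[i].head) for i in words if tokens[i].head != i]
    return {
        "max_depth": max(depths),
        "mean_depth": mean(depths),
        "subord_ratio": len(subordinate) / len(words),
        "dep_distance": mean(distances) if distances else 0.0,
        "words": len(words),
    }


# Hand-written parse of "She said that he left ."
example = [
    Token("She", 1, "nsubj"),
    Token("said", 1, "ROOT"),
    Token("that", 4, "mark"),
    Token("he", 4, "nsubj"),
    Token("left", 1, "ccomp"),
    Token(".", 1, "punct", is_punct=True),
]
m = sentence_metrics(example)
print(m)
# {'max_depth': 2, 'mean_depth': 1.2, 'subord_ratio': 0.2, 'dep_distance': 1.75, 'words': 5}
```

With spaCy, the same records could be built from a parsed single-sentence `Doc` as `Token(t.text, t.head.i, t.dep_, t.is_punct)`.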
A sentence must pass gating (a single root, with no run-on fragments or semicolon splices) to receive a score. Each model gets five independent runs, and its best valid sentence is kept. All parsing uses spaCy's `en_core_web_sm` model.
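The structural part of the gate, plus best-of-five selection, might look roughly like this. It is a sketch under stated assumptions, not the benchmark's code: a parse with more than one self-headed token is read as several sentences (fragments or run-ons), any semicolon is rejected as a possible splice (stricter than rejecting only true splices), and a `parataxis` arc is treated as a run-on symptom. The names `passes_gate` and `best_valid` are hypothetical, `len` stands in for the unpublished composite score, and the Claude Opus coherence check is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    """Hypothetical token record; a root token's head is its own index."""
    text: str
    head: int
    dep: str


def passes_gate(tokens: list[Token]) -> bool:
    """Assumed structural gate: one root, no semicolons, no parataxis."""
    roots = [i for i, t in enumerate(tokens) if t.head == i]
    if len(roots) != 1:
        return False  # several roots means several sentences
    if any(t.text == ";" for t in tokens):
        return False  # treat every semicolon as a possible splice
    if any(t.dep == "parataxis" for t in tokens):
        return False  # loosely juxtaposed clauses, a common run-on symptom
    return True


def best_valid(
    runs: list[list[Token]], score: Callable[[list[Token]], float]
) -> Optional[list[Token]]:
    """Keep the highest-scoring run that passes the gate, or None."""
    valid = [r for r in runs if passes_gate(r)]
    return max(valid, key=score, default=None)


run_a = [Token("It", 1, "nsubj"), Token("rained", 1, "ROOT"), Token(".", 1, "punct")]
run_b = [Token("It", 1, "nsubj"), Token("rained", 1, "ROOT"), Token(";", 1, "punct"),
         Token("we", 4, "nsubj"), Token("stayed", 1, "parataxis"), Token(".", 1, "punct")]
run_c = [Token("It", 1, "nsubj"), Token("rained", 1, "ROOT"), Token(".", 1, "punct"),
         Token("We", 4, "nsubj"), Token("stayed", 4, "ROOT"), Token(".", 4, "punct")]
best = best_valid([run_a, run_b, run_c], score=len)  # run_a: the others fail the gate
```

The longer runs lose despite scoring higher on `len`, because gating is applied before any score is compared.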
Beyond the composite score, we catalog the types of subordination each model deploys and measure phrase-level complexity. These patterns show how different models approach deep embedding.
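A clause-type catalog of this kind can be sketched by counting dependency labels. The mapping from spaCy labels to clause names below is an assumption, as is `clause_embedding_depth`, a hypothetical measure of how many clause heads are stacked on a single path to the root. The sketch covers only clause-level subordination, not the phrase-level measures, and uses a hand-written parse rather than real model output.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed mapping from spaCy English dependency labels to clause types.
CLAUSE_TYPES = {
    "relcl": "relative clause",
    "advcl": "adverbial clause",
    "ccomp": "clausal complement",
    "xcomp": "open clausal complement",
    "csubj": "clausal subject",
    "acl": "adnominal clause",
}


@dataclass
class Token:
    """Hypothetical token record; the root token's head is its own index."""
    text: str
    head: int
    dep: str


def catalog_subordination(tokens: list[Token]) -> Counter:
    """Count each clause type by the label on its head word."""
    return Counter(CLAUSE_TYPES[t.dep] for t in tokens if t.dep in CLAUSE_TYPES)


def clause_embedding_depth(tokens: list[Token]) -> int:
    """Largest number of clause heads met on any token's path to the root."""
    best = 0
    for i in range(len(tokens)):
        count, j = 0, i
        while True:
            if tokens[j].dep in CLAUSE_TYPES:
                count += 1
            if tokens[j].head == j:  # reached the root
                break
            j = tokens[j].head
        best = max(best, count)
    return best


# Hand-written parse of "She said that the man who left was tired ."
example = [
    Token("She", 1, "nsubj"),
    Token("said", 1, "ROOT"),
    Token("that", 7, "mark"),
    Token("the", 4, "det"),
    Token("man", 7, "nsubj"),
    Token("who", 6, "nsubj"),
    Token("left", 4, "relcl"),
    Token("was", 1, "ccomp"),
    Token("tired", 7, "acomp"),
    Token(".", 1, "punct"),
]
types = catalog_subordination(example)  # one relative clause, one clausal complement
nesting = clause_embedding_depth(example)  # 2: the relative clause sits inside the complement
```

Counting labels gives the mix of clause types, while the embedding depth separates genuine nesting from several sibling clauses hung off the same verb.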