From 3e3e1e701e0f76fff3cc3f2803badd1b549dc50a Mon Sep 17 00:00:00 2001 From: Jesse Vincent Date: Tue, 9 Jun 2026 22:42:54 -0700 Subject: [PATCH] Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence Round-2 fractals eval regressed to 70min/32.2M tokens (vs round-1's 42.8min/14.5M) while reaching baseline-parity quality. Per-subagent turn profiling attributed it to: haiku dispatches taking 2-3x the turns of sonnet (678 of 1197 subagent turns), reviewers re-fetching diffs by hand (518 Bash calls), and evidence-rule narration. Changes: turn-count-beats- token-price model guidance; controllers paste small diffs into reviewer prompts (reviewers then need few or no tool calls); evidence scoped to findings and would-be-bare-yes checks; Important defined as cannot-trust- until-fixed with coverage suggestions Minor; fixes dispatched only for Critical/Important. --- skills/subagent-driven-development/SKILL.md | 11 +++++++++++ .../code-quality-reviewer-prompt.md | 18 ++++++++++++++++-- .../spec-reviewer-prompt.md | 7 +++++++ 3 files changed, 34 insertions(+), 2 deletions(-) diff --git a/skills/subagent-driven-development/SKILL.md b/skills/subagent-driven-development/SKILL.md index ae801025..9a0f0660 100644 --- a/skills/subagent-driven-development/SKILL.md +++ b/skills/subagent-driven-development/SKILL.md @@ -104,6 +104,12 @@ most capable model; a subtle concurrency change does. omitted model inherits your session's model — often the most capable and most expensive — which silently defeats this section. +**Turn count beats token price.** Wall-clock and context cost scale with how +many turns a subagent takes, and the cheapest models routinely take 2-3× the +turns on multi-step work — costing more overall. Use a mid-tier model as the +floor for implementers and reviewers; reserve the cheapest tier for +single-file mechanical fixes. + **Task complexity signals (implementation tasks):** - Touches 1-2 files with a complete spec → cheap model - Touches multiple files with integration concerns → standard model @@ -154,6 +160,11 @@ final whole-branch review. When you fill a reviewer template: - Include the spec/design's global constraints that bind the task (version floors, naming and copy rules, platform requirements) in the requirements you paste — a reviewer can only enforce what you hand them. +- Paste the task's diff (`git diff BASE..HEAD` output) into the reviewer + prompt when it fits comfortably (up to a few hundred lines). A reviewer + with the diff in hand needs few or no tool calls. +- Dispatch fix subagents for Critical and Important findings. Record Minor + findings and move on — they roll up to the final whole-branch review. ## Prompt Templates diff --git a/skills/subagent-driven-development/code-quality-reviewer-prompt.md b/skills/subagent-driven-development/code-quality-reviewer-prompt.md index cb9dbc64..6fb27e3f 100644 --- a/skills/subagent-driven-development/code-quality-reviewer-prompt.md +++ b/skills/subagent-driven-development/code-quality-reviewer-prompt.md @@ -32,6 +32,14 @@ Subagent (general-purpose): git diff [BASE_SHA]..[HEAD_SHA] ``` + ## Diff + + [DIFF] + + If the diff is provided above, review from it directly — do not re-run + the git commands or re-read the files it already shows. Fetch anything + further only for a named concrete risk. + ## Read-Only Review Your review is read-only on this checkout. Do not mutate the working tree, @@ -84,12 +92,15 @@ Subagent (general-purpose): significantly grow existing files? (Don't flag pre-existing file sizes — focus on what this change contributed.) - Answer each item above with file:line evidence, not a bare yes or no. - An unsupported "yes" is not a review. + Cite file:line evidence for every finding and for any check you would + otherwise answer with a bare "yes." Cite, don't narrate — a tight report + that points at lines beats a long one that retells the diff. ## Calibration Categorize issues by actual severity. Not everything is Critical. + Important means this task cannot be trusted until it is fixed; + "coverage could be broader" and polish suggestions are Minor. Acknowledge what was done well before listing issues — accurate praise helps the implementer trust the rest of the feedback. @@ -127,5 +138,8 @@ Subagent (general-purpose): - `[TASK_TEXT]` — the task's requirements text or plan reference, for context - `[BASE_SHA]` — commit before this task - `[HEAD_SHA]` — current commit +- `[DIFF]` — paste `git diff BASE..HEAD` output when it fits comfortably + (up to a few hundred lines); otherwise replace with "(not provided — run + the git commands above)" **Reviewer returns:** Strengths, Issues (Critical/Important/Minor), Task quality verdict diff --git a/skills/subagent-driven-development/spec-reviewer-prompt.md b/skills/subagent-driven-development/spec-reviewer-prompt.md index 8e6d7f37..43e8d3d3 100644 --- a/skills/subagent-driven-development/spec-reviewer-prompt.md +++ b/skills/subagent-driven-development/spec-reviewer-prompt.md @@ -30,6 +30,13 @@ Subagent (general-purpose): Only read files in this diff. Do not crawl the broader codebase. + ## Diff + + [DIFF] + + If the diff is provided above, review from it directly — do not re-run + the git commands or re-read the files it already shows. + Spec compliance is judged by reading the diff against the requirements. The implementer already ran the tests and reported TDD evidence — do not re-run them. If a requirement cannot be verified from this diff alone