Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence

Round-2 fractals eval regressed to 70min/32.2M tokens (vs round-1's 42.8min/14.5M) while reaching baseline-parity quality. Per-subagent turn profiling attributed it to: haiku dispatches taking 2-3x the turns of sonnet (678 of 1197 subagent turns), reviewers re-fetching diffs by hand (518 Bash calls), and evidence-rule narration. Changes: turn-count-beats-token-price model guidance; controllers paste small diffs into reviewer prompts (reviewers then need few or no tool calls); evidence scoped to findings and would-be-bare-yes checks; Important defined as cannot-trust-until-fixed, with coverage suggestions classed as Minor; fixes dispatched only for Critical/Important.
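A minimal sketch of the "controllers paste small diffs into reviewer prompts" step, assuming the controller already holds the diff text and the prompt template as strings. Only the `[DIFF]` placeholder comes from the template changed in this commit. The function name, the line threshold, and the fallback note are hypothetical, since the commit says "small" without giving a number.

```python
# Hypothetical threshold: the commit only says "small diffs", not a number.
MAX_INLINE_DIFF_LINES = 400

# Placeholder as it appears in spec-reviewer-prompt.md.
DIFF_PLACEHOLDER = "[DIFF]"

# Hypothetical wording used when the diff is not pasted inline.
FALLBACK_NOTE = "(diff not inlined; fetch it with git)"


def fill_reviewer_prompt(template: str, diff_text: str,
                         max_lines: int = MAX_INLINE_DIFF_LINES) -> str:
    """Paste the diff into the template when it is small enough.

    A non-empty diff of at most `max_lines` lines replaces the placeholder,
    so the reviewer can work from the prompt alone. Anything else gets the
    fallback note, leaving the reviewer to fetch the diff itself.
    """
    is_small = 0 < len(diff_text.splitlines()) <= max_lines
    body = diff_text.rstrip("\n") if is_small else FALLBACK_NOTE
    # Replace only the first occurrence, in case the diff itself contains "[DIFF]".
    return template.replace(DIFF_PLACEHOLDER, body, 1)
```

Replacing only the first occurrence matters here because the commit under review edits the prompt template itself, so a pasted diff can legitimately contain the literal `[DIFF]` text.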
2026-07-06 01:39:04 +08:00 · 2026-06-09 22:42:54 -07:00
parent b42846401f
commit da0a11f6d4
3 changed files with 34 additions and 2 deletions
--- a/skills/subagent-driven-development/spec-reviewer-prompt.md
+++ b/skills/subagent-driven-development/spec-reviewer-prompt.md
@@ -30,6 +30,13 @@ Subagent (general-purpose):

    Only read files in this diff. Do not crawl the broader codebase.

+    ## Diff
+
+    [DIFF]
+
+    If the diff is provided above, review from it directly — do not re-run
+    the git commands or re-read the files it already shows.
+
    Spec compliance is judged by reading the diff against the requirements.
    The implementer already ran the tests and reported TDD evidence — do not
    re-run them. If a requirement cannot be verified from this diff alone