Merge per-task reviews into one task reviewer (iteration 2)

Iteration-1 profiling: implementers and per-dispatch overhead dominate (429 of 686 subagent turns; controller coordination is half the dollars and scales with dispatch count), reviewers are individually lean, and the controller pasted the diff in only 2 of 22 review dispatches when the guidance was phrased as optional. Changes: spec-reviewer-prompt.md + code-quality-reviewer-prompt.md replaced by task-reviewer-prompt.md (one reviewer, one reading of a pasted diff, two verdicts: spec compliance ✅/❌/⚠️ and task quality); one fix dispatch can address both kinds of findings; controller now runs git diff itself and pastes it (imperative, not optional); implementers run focused tests while iterating and the full suite once before committing; flowchart, example, Red Flags, tool tables updated. The broad final whole-branch review is unchanged.
2026-06-12 21:59:04 +08:00 · 2026-06-09 23:58:28 -07:00
parent 3e3e1e701e
commit e3c74fc1c9
7 changed files with 198 additions and 295 deletions
--- a/skills/subagent-driven-development/task-reviewer-prompt.md
+++ b/skills/subagent-driven-development/task-reviewer-prompt.md
@@ -0,0 +1,156 @@
+# Task Reviewer Prompt Template
+
+Use this template when dispatching a task reviewer subagent. One reviewer, one
+reading of the diff, two verdicts: spec compliance and code quality.
+
+**Purpose:** Verify one task's implementation matches its requirements (nothing
+more, nothing less) and is well-built (clean, tested, maintainable)
+
+```
+Subagent (general-purpose):
+  description: "Review Task N (spec + quality)"
+  prompt: |
+    You are reviewing one task's implementation: first whether it matches its
+    requirements, then whether it is well-built. This is a task-scoped gate,
+    not a merge review — a broad whole-branch review happens separately after
+    all tasks are complete.
+
+    ## What Was Requested
+
+    [TASK_REQUIREMENTS]
+
+    ## What the Implementer Claims They Built
+
+    [DESCRIPTION]
+
+    ## Diff Under Review
+
+    **Base:** [BASE_SHA]
+    **Head:** [HEAD_SHA]
+
+    [DIFF]
+
+    Review from the diff above — do not re-run git commands or re-read the
+    files it already shows. If no diff was provided, fetch it yourself:
+    `git diff --stat [BASE_SHA]..[HEAD_SHA]` and `git diff [BASE_SHA]..[HEAD_SHA]`.
+    Only read files in this diff. Do not crawl the broader codebase. Inspect
+    code outside the diff only to evaluate a concrete risk you can name — and
+    name it in your report. Cross-cutting changes are legitimate named risks:
+    if the diff changes lock ordering, a function or API contract, or shared
+    mutable state, checking the call sites is the right method.
+
+    Your review is read-only on this checkout. Do not mutate the working
+    tree, the index, HEAD, or branch state in any way.
+
+    ## Do Not Trust the Report
+
+    Treat the implementer's report as unverified claims about the code. It
+    may be incomplete, inaccurate, or optimistic. Verify the claims against
+    the diff.
+
+    ## Tests
+
+    The implementer already ran the tests and reported results with TDD
+    evidence for exactly this code. Do not re-run the suite to confirm their
+    report. Run a test only when reading the code raises a specific doubt
+    that no existing run answers — and then a focused test, never a
+    package-wide suite, race detector run, or repeated/high-count loop. If
+    heavy validation seems warranted, recommend it in your report instead of
+    running it. If you cannot run commands in this environment, name the
+    test you would run.
+
+    Warnings or other noise in the implementer's reported test output are
+    findings — test output should be pristine.
+
+    ## Part 1: Spec Compliance
+
+    Compare the diff against What Was Requested:
+
+    - **Missing:** requirements they skipped, missed, or claimed without
+      implementing
+    - **Extra:** features that weren't requested, over-engineering, unneeded
+      "nice to haves"
+    - **Misunderstood:** right feature built the wrong way, wrong problem
+      solved
+
+    If a requirement cannot be verified from this diff alone (it lives in
+    unchanged code or spans tasks), report it as a ⚠️ item instead of
+    broadening your search.
+
+    ## Part 2: Code Quality
+
+    **Code quality:**
+    - Clean separation of concerns?
+    - Proper error handling?
+    - DRY without premature abstraction?
+    - Edge cases handled?
+
+    **Tests:**
+    - Do the new and changed tests verify real behavior, not mocks?
+    - Are the task's edge cases covered?
+
+    **Structure:**
+    - Does each file have one clear responsibility with a well-defined interface?
+    - Are units decomposed so they can be understood and tested independently?
+    - Is the implementation following the file structure from the plan?
+    - Did this change create new files that are already large, or
+      significantly grow existing files? (Don't flag pre-existing file
+      sizes — focus on what this change contributed.)
+
+    Cite file:line evidence for every finding and for any check you would
+    otherwise answer with a bare "yes." Cite, don't narrate — a tight report
+    that points at lines beats a long one that retells the diff.
+
+    ## Calibration
+
+    Categorize issues by actual severity. Not everything is Critical.
+    Important means this task cannot be trusted until it is fixed;
+    "coverage could be broader" and polish suggestions are Minor.
+    Acknowledge what was done well before listing issues — accurate praise
+    helps the implementer trust the rest of the feedback.
+
+    ## Output Format
+
+    ### Spec Compliance
+
+    - ✅ Spec compliant | ❌ Issues found: [what's missing/extra/misunderstood,
+      with file:line references]
+    - ⚠️ Cannot verify from diff: [requirements you could not verify from the
+      diff alone, and what the controller should check — report alongside the
+      ✅/❌ verdict for everything you could verify]
+
+    ### Strengths
+    [What's well done? Be specific.]
+
+    ### Issues
+
+    #### Critical (Must Fix)
+    #### Important (Should Fix)
+    #### Minor (Nice to Have)
+
+    For each issue: file:line, what's wrong, why it matters, how to fix
+    (if not obvious).
+
+    ### Assessment
+
+    **Task quality:** [Approved | Needs fixes]
+
+    **Reasoning:** [1-2 sentence technical assessment]
+```
+
+**Placeholders:**
+- `[TASK_REQUIREMENTS]` — full task text plus the spec/design's global
+  constraints that bind it (version floors, naming and copy rules, platform
+  requirements)
+- `[DESCRIPTION]` — what the implementer reports they built
+- `[BASE_SHA]` — commit before this task
+- `[HEAD_SHA]` — current commit
+- `[DIFF]` — paste `git diff BASE..HEAD` output (use `--stat` plus the
+  relevant hunks if it exceeds a few hundred lines); a reviewer with the
+  diff in hand needs few or no tool calls
+
+**Reviewer returns:** Spec Compliance verdict (✅/❌/⚠️), Strengths, Issues
+(Critical/Important/Minor), Task quality verdict
+
+A single fix dispatch can then address spec gaps and quality findings
+together; re-review after fixes covers both verdicts again.