Merge per-task reviews into one task reviewer (iteration 2)

Iteration-1 profiling: implementers and per-dispatch overhead dominate (429 of 686 subagent turns; controller coordination is half the dollars and scales with dispatch count), reviewers are individually lean, and the controller pasted the diff in only 2 of 22 review dispatches when the guidance was phrased as optional. Changes: spec-reviewer-prompt.md + code-quality-reviewer-prompt.md replaced by task-reviewer-prompt.md (one reviewer, one reading of a pasted diff, two verdicts: spec compliance ✅/❌/⚠️ and task quality); one fix dispatch can address both kinds of findings; controller now runs git diff itself and pastes it (imperative, not optional); implementers run focused tests while iterating and the full suite once before committing; flowchart, example, Red Flags, tool tables updated. The broad final whole-branch review is unchanged.
2026-07-27 12:44:01 +08:00 · 2026-06-09 23:58:28 -07:00
parent 3e3e1e701e
commit e3c74fc1c9
7 changed files with 198 additions and 295 deletions
--- a/skills/subagent-driven-development/SKILL.md
+++ b/skills/subagent-driven-development/SKILL.md
@@ -5,11 +5,11 @@ description: Use when executing implementation plans with independent tasks in t

 # Subagent-Driven Development

-Execute plan by dispatching fresh subagent per task, with two-stage review after each: spec compliance review first, then code quality review.
+Execute plan by dispatching a fresh implementer subagent per task, a combined task review (spec compliance + code quality, one reviewer, one reading of the diff) after each, and a broad whole-branch review at the end.

 **Why subagents:** You delegate tasks to specialized agents with isolated context. By precisely crafting their instructions and context, you ensure they stay focused and succeed at their task. They should never inherit your session's context or history — you construct exactly what they need. This also preserves your own context for coordination work.

-**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
+**Core principle:** Fresh subagent per task + one task review (spec + quality verdicts) + broad final review = high quality, fast iteration

 **Continuous execution:** Do not pause to check in with your human partner between tasks. Execute all tasks from the plan without stopping. The only reasons to stop are: BLOCKED status you cannot resolve, ambiguity that genuinely prevents progress, or all tasks complete. "Should I continue?" prompts and progress summaries waste their time — they asked you to execute the plan, so execute it.

@@ -36,7 +36,7 @@ digraph when_to_use {
 **vs. Executing Plans (parallel session):**
 - Same session (no context switch)
 - Fresh subagent per task (no context pollution)
- Two-stage review after each task: spec compliance first, then code quality
+- Combined review after each task (spec compliance + code quality verdicts), broad review at the end
 - Faster iteration (no human-in-loop between tasks)

 ## The Process
@@ -51,12 +51,9 @@ digraph process {
        "Implementer subagent asks questions?" [shape=diamond];
        "Answer questions, provide context" [shape=box];
        "Implementer subagent implements, tests, commits, self-reviews" [shape=box];
-        "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [shape=box];
-        "Spec reviewer subagent confirms code matches spec?" [shape=diamond];
-        "Implementer subagent fixes spec gaps" [shape=box];
-        "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [shape=box];
-        "Code quality reviewer subagent approves?" [shape=diamond];
-        "Implementer subagent fixes quality issues" [shape=box];
+        "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [shape=box];
+        "Task reviewer reports spec ✅ and quality approved?" [shape=diamond];
+        "Dispatch fix subagent for Critical/Important findings" [shape=box];
        "Mark task complete in todo list" [shape=box];
    }

@@ -70,15 +67,11 @@ digraph process {
    "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
    "Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
    "Implementer subagent asks questions?" -> "Implementer subagent implements, tests, commits, self-reviews" [label="no"];
-    "Implementer subagent implements, tests, commits, self-reviews" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)";
-    "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" -> "Spec reviewer subagent confirms code matches spec?";
-    "Spec reviewer subagent confirms code matches spec?" -> "Implementer subagent fixes spec gaps" [label="no"];
-    "Implementer subagent fixes spec gaps" -> "Dispatch spec reviewer subagent (./spec-reviewer-prompt.md)" [label="re-review"];
-    "Spec reviewer subagent confirms code matches spec?" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="yes"];
-    "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" -> "Code quality reviewer subagent approves?";
-    "Code quality reviewer subagent approves?" -> "Implementer subagent fixes quality issues" [label="no"];
-    "Implementer subagent fixes quality issues" -> "Dispatch code quality reviewer subagent (./code-quality-reviewer-prompt.md)" [label="re-review"];
-    "Code quality reviewer subagent approves?" -> "Mark task complete in todo list" [label="yes"];
+    "Implementer subagent implements, tests, commits, self-reviews" -> "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)";
+    "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)" -> "Task reviewer reports spec ✅ and quality approved?";
+    "Task reviewer reports spec ✅ and quality approved?" -> "Dispatch fix subagent for Critical/Important findings" [label="no"];
+    "Dispatch fix subagent for Critical/Important findings" -> "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [label="re-review"];
+    "Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list" [label="yes"];
    "Mark task complete in todo list" -> "More tasks remain?";
    "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
    "More tasks remain?" -> "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [label="no"];
@@ -119,7 +112,7 @@ single-file mechanical fixes.

 Implementer subagents report one of four statuses. Handle each appropriately:

-**DONE:** Proceed to spec compliance review.
+**DONE:** Run `git diff BASE..HEAD`, then dispatch the task reviewer.

 **DONE_WITH_CONCERNS:** The implementer completed the work but flagged doubts. Read the concerns before proceeding. If the concerns are about correctness or scope, address them before review. If they're observations (e.g., "this file is getting large"), note them and proceed to review.

@@ -133,12 +126,12 @@ Implementer subagents report one of four statuses. Handle each appropriately:

 **Never** ignore an escalation or force the same model to retry without changes. If the implementer said it's stuck, something needs to change.

-## Handling Spec Reviewer ⚠️ Items
+## Handling Reviewer ⚠️ Items

-The spec reviewer may report "⚠️ Cannot verify from diff" items — requirements
-that live in unchanged code or span tasks. These do not block dispatching the
-code quality reviewer, but you must resolve each one yourself before marking
-the task complete: you hold the plan and cross-task context the reviewer
+The task reviewer may report "⚠️ Cannot verify from diff" items — requirements
+that live in unchanged code or span tasks. These do not block the rest of the
+review, but you must resolve each one yourself before marking the task
+complete: you hold the plan and cross-task context the reviewer
 lacks. If you confirm an item is a real gap, treat it as a failed spec
 review — send it back to the implementer and re-review.

@@ -160,17 +153,17 @@ final whole-branch review. When you fill a reviewer template:
 - Include the spec/design's global constraints that bind the task (version
  floors, naming and copy rules, platform requirements) in the requirements
  you paste — a reviewer can only enforce what you hand them.
- Paste the task's diff (`git diff BASE..HEAD` output) into the reviewer
-  prompt when it fits comfortably (up to a few hundred lines). A reviewer
-  with the diff in hand needs few or no tool calls.
+- Run `git diff BASE..HEAD` yourself and paste the output into the reviewer
+  prompt (`--stat` plus the relevant hunks if it exceeds a few hundred
+  lines). A reviewer with the diff in hand needs few or no tool calls; do
+  not make reviewers re-derive the diff.
 - Dispatch fix subagents for Critical and Important findings. Record Minor
  findings and move on — they roll up to the final whole-branch review.

 ## Prompt Templates

 - [implementer-prompt.md](implementer-prompt.md) - Dispatch implementer subagent
- [spec-reviewer-prompt.md](spec-reviewer-prompt.md) - Dispatch spec compliance reviewer subagent
- [code-quality-reviewer-prompt.md](code-quality-reviewer-prompt.md) - Dispatch code quality reviewer subagent
+- [task-reviewer-prompt.md](task-reviewer-prompt.md) - Dispatch task reviewer subagent (spec compliance + code quality, one dispatch)
 - Final whole-branch review: use superpowers:requesting-code-review's [code-reviewer.md](../requesting-code-review/code-reviewer.md)

 ## Example Workflow
@@ -198,11 +191,9 @@ Implementer: "Got it. Implementing now..."
  - Self-review: Found I missed --force flag, added it
  - Committed

-[Dispatch spec compliance reviewer]
-Spec reviewer: ✅ Spec compliant - all requirements met, nothing extra
-
-[Get git SHAs, dispatch code quality reviewer]
-Code reviewer: Strengths: Good test coverage, clean. Issues: None. Task quality: Approved.
+[Run git diff, dispatch task reviewer with the diff pasted in]
+Task reviewer: Spec ✅ - all requirements met, nothing extra.
+  Strengths: Good test coverage, clean. Issues: None. Task quality: Approved.

 [Mark Task 1 complete]

@@ -218,25 +209,17 @@ Implementer:
  - Self-review: All good
  - Committed

-[Dispatch spec compliance reviewer]
-Spec reviewer: ❌ Issues:
+[Run git diff, dispatch task reviewer with the diff pasted in]
+Task reviewer: Spec ❌:
  - Missing: Progress reporting (spec says "report every 100 items")
  - Extra: Added --json flag (not requested)
+  Issues (Important): Magic number (100)

-[Implementer fixes issues]
-Implementer: Removed --json flag, added progress reporting
+[Dispatch fix subagent with all findings]
+Fixer: Removed --json flag, added progress reporting, extracted PROGRESS_INTERVAL constant

-[Spec reviewer reviews again]
-Spec reviewer: ✅ Spec compliant now
-
-[Dispatch code quality reviewer]
-Code reviewer: Strengths: Solid. Issues (Important): Magic number (100)
-
-[Implementer fixes]
-Implementer: Extracted PROGRESS_INTERVAL constant
-
-[Code reviewer reviews again]
-Code reviewer: ✅ Task quality: Approved
+[Task reviewer reviews again]
+Task reviewer: Spec ✅. Task quality: Approved.

 [Mark Task 2 complete]

@@ -270,13 +253,13 @@ Done!

 **Quality gates:**
 - Self-review catches issues before handoff
- Two-stage review: spec compliance, then code quality
+- Task review carries two verdicts: spec compliance and code quality
 - Review loops ensure fixes actually work
 - Spec compliance prevents over/under-building
 - Code quality ensures implementation is well-built

 **Cost:**
- More subagent invocations (implementer + 2 reviewers per task)
+- More subagent invocations (implementer + reviewer per task)
 - Controller does more prep work (extracting all tasks upfront)
 - Review loops add iterations
 - But catches issues early (cheaper than debugging later)
@@ -285,20 +268,19 @@ Done!

 **Never:**
 - Start implementation on main/master branch without explicit user consent
- Skip reviews (spec compliance OR code quality)
+- Skip task review, or accept a report missing either verdict (spec compliance AND task quality are both required)
 - Proceed with unfixed issues
 - Dispatch multiple implementation subagents in parallel (conflicts)
 - Make subagent read plan file (provide full text instead)
 - Skip scene-setting context (subagent needs to understand where task fits)
 - Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance (spec reviewer found issues = not done)
+- Accept "close enough" on spec compliance (reviewer found spec issues = not done)
 - Skip review loops (reviewer found issues = implementer fixes = review again)
 - Let implementer self-review replace actual review (both are needed)
 - Tell a reviewer what not to flag, or pre-rate a finding's severity in the
  dispatch prompt ("treat it as Minor at most") — the plan's example code is
  a starting point, not evidence that its weaknesses were chosen
- **Start code quality review before spec compliance is ✅** (wrong order)
- Move to next task while either review has open issues
+- Move to next task while the review has open Critical/Important issues

 **If subagent asks questions:**
 - Answer clearly and completely
--- a/skills/subagent-driven-development/code-quality-reviewer-prompt.md
+++ b/skills/subagent-driven-development/code-quality-reviewer-prompt.md
@@ -1,145 +0,0 @@
-# Code Quality Reviewer Prompt Template
-
-Use this template when dispatching a code quality reviewer subagent.
-
-**Purpose:** Verify one task's implementation is well-built (clean, tested, maintainable)
-
-**Only dispatch after spec compliance review passes.**
-
-```
-Subagent (general-purpose):
-  description: "Review code quality for Task N"
-  prompt: |
-    You are reviewing one task's implementation for code quality. This is a
-    task-scoped gate, not a merge review — a broad whole-branch review happens
-    separately after all tasks are complete.
-
-    ## What Was Implemented
-
-    [DESCRIPTION]
-
-    ## Task Requirements (context only)
-
-    [TASK_TEXT]
-
-    ## Git Range to Review
-
-    **Base:** [BASE_SHA]
-    **Head:** [HEAD_SHA]
-
-    ```bash
-    git diff --stat [BASE_SHA]..[HEAD_SHA]
-    git diff [BASE_SHA]..[HEAD_SHA]
-    ```
-
-    ## Diff
-
-    [DIFF]
-
-    If the diff is provided above, review from it directly — do not re-run
-    the git commands or re-read the files it already shows. Fetch anything
-    further only for a named concrete risk.
-
-    ## Read-Only Review
-
-    Your review is read-only on this checkout. Do not mutate the working tree,
-    the index, HEAD, or branch state in any way. Use tools like `git show`,
-    `git diff`, and `git log` to inspect history.
-
-    ## Scope
-
-    Spec compliance was already verified by a separate reviewer. Do not
-    re-check whether the code matches the requirements or the plan.
-
-    Start from the diff. Read the changed files first. Inspect code outside
-    the diff only to evaluate a concrete risk you can name — and name it in
-    your report. Cross-cutting changes are legitimate named risks: if the
-    diff changes lock ordering, a function or API contract, or shared mutable
-    state, checking the call sites is the right method. Do not crawl the
-    codebase by default.
-
-    ## Tests
-
-    The implementer already ran the tests and reported results with TDD
-    evidence for exactly this code. Do not re-run the suite to confirm their
-    report. Run a test only when reading the code raises a specific doubt
-    that no existing run answers — and then a focused test, never a
-    package-wide suite, race detector run, or repeated/high-count loop. If
-    heavy validation seems warranted, recommend it in your report instead of
-    running it. If you cannot run commands in this environment, name the
-    test you would run.
-
-    Warnings or other noise in the implementer's reported test output are
-    findings — test output should be pristine.
-
-    ## What to Check
-
-    **Code quality:**
-    - Clean separation of concerns?
-    - Proper error handling?
-    - DRY without premature abstraction?
-    - Edge cases handled?
-
-    **Tests:**
-    - Do the new and changed tests verify real behavior, not mocks?
-    - Are the task's edge cases covered?
-
-    **Structure:**
-    - Does each file have one clear responsibility with a well-defined interface?
-    - Are units decomposed so they can be understood and tested independently?
-    - Is the implementation following the file structure from the plan?
-    - Did this change create new files that are already large, or
-      significantly grow existing files? (Don't flag pre-existing file
-      sizes — focus on what this change contributed.)
-
-    Cite file:line evidence for every finding and for any check you would
-    otherwise answer with a bare "yes." Cite, don't narrate — a tight report
-    that points at lines beats a long one that retells the diff.
-
-    ## Calibration
-
-    Categorize issues by actual severity. Not everything is Critical.
-    Important means this task cannot be trusted until it is fixed;
-    "coverage could be broader" and polish suggestions are Minor.
-    Acknowledge what was done well before listing issues — accurate praise
-    helps the implementer trust the rest of the feedback.
-
-    ## Output Format
-
-    ### Strengths
-    [What's well done? Be specific.]
-
-    ### Issues
-
-    #### Critical (Must Fix)
-    [Bugs, data loss risks, broken functionality]
-
-    #### Important (Should Fix)
-    [Poor error handling, test gaps, structural problems]
-
-    #### Minor (Nice to Have)
-    [Code style, optimization opportunities]
-
-    For each issue:
-    - File:line reference
-    - What's wrong
-    - Why it matters
-    - How to fix (if not obvious)
-
-    ### Assessment
-
-    **Task quality:** [Approved | Needs fixes]
-
-    **Reasoning:** [1-2 sentence technical assessment]
-```
-
-**Placeholders:**
- `[DESCRIPTION]` — task summary, from implementer's report
- `[TASK_TEXT]` — the task's requirements text or plan reference, for context
- `[BASE_SHA]` — commit before this task
- `[HEAD_SHA]` — current commit
- `[DIFF]` — paste `git diff BASE..HEAD` output when it fits comfortably
-  (up to a few hundred lines); otherwise replace with "(not provided — run
-  the git commands above)"
-
-**Reviewer returns:** Strengths, Issues (Critical/Important/Minor), Task quality verdict
--- a/skills/subagent-driven-development/implementer-prompt.md
+++ b/skills/subagent-driven-development/implementer-prompt.md
@@ -41,6 +41,9 @@ Subagent (general-purpose):
    **While you work:** If you encounter something unexpected or unclear, **ask questions**.
    It's always OK to pause and clarify. Don't guess or make assumptions.

+    While iterating, run the focused test for what you're changing; run the
+    full suite once before committing, not after every edit.
+
    ## Code Organization

    You reason best about code you can hold in context at once, and your edits are more
--- a/skills/subagent-driven-development/spec-reviewer-prompt.md
+++ b/skills/subagent-driven-development/spec-reviewer-prompt.md
@@ -1,93 +0,0 @@
-# Spec Compliance Reviewer Prompt Template
-
-Use this template when dispatching a spec compliance reviewer subagent.
-
-**Purpose:** Verify implementer built what was requested (nothing more, nothing less)
-
-```
-Subagent (general-purpose):
-  description: "Review spec compliance for Task N"
-  prompt: |
-    You are reviewing whether an implementation matches its specification.
-
-    ## What Was Requested
-
-    [FULL TEXT of task requirements]
-
-    ## What Implementer Claims They Built
-
-    [From implementer's report]
-
-    ## Git Range to Review
-
-    **Base:** [BASE_SHA — commit before this task]
-    **Head:** [HEAD_SHA — current commit]
-
-    ```bash
-    git diff --stat [BASE_SHA]..[HEAD_SHA]
-    git diff [BASE_SHA]..[HEAD_SHA]
-    ```
-
-    Only read files in this diff. Do not crawl the broader codebase.
-
-    ## Diff
-
-    [DIFF]
-
-    If the diff is provided above, review from it directly — do not re-run
-    the git commands or re-read the files it already shows.
-
-    Spec compliance is judged by reading the diff against the requirements.
-    The implementer already ran the tests and reported TDD evidence — do not
-    re-run them. If a requirement cannot be verified from this diff alone
-    (it lives in unchanged code or spans tasks), report it as a ⚠️ item
-    instead of broadening your search.
-
-    ## Read-Only Review
-
-    Your review is read-only on this checkout. Do not mutate the working tree, the index, HEAD, or branch state in any way. Use tools like `git show`, `git diff`, and `git log` to inspect history.
-
-    ## CRITICAL: Do Not Trust the Report
-
-    Treat the implementer's report as unverified claims about the code. It may
-    be incomplete, inaccurate, or optimistic. Verify the claims against the diff.
-
-    **DO NOT:**
-    - Take their word for what they implemented
-    - Trust their claims about completeness
-    - Accept their interpretation of requirements
-
-    **DO:**
-    - Read the actual code they wrote
-    - Compare actual implementation to requirements line by line
-    - Check for missing pieces they claimed to implement
-    - Look for extra features they didn't mention
-
-    ## Your Job
-
-    Read the diff and verify:
-
-    **Missing requirements:**
-    - Did they implement everything that was requested?
-    - Are there requirements they skipped or missed?
-    - Did they claim something works but didn't actually implement it?
-
-    **Extra/unneeded work:**
-    - Did they build things that weren't requested?
-    - Did they over-engineer or add unnecessary features?
-    - Did they add "nice to haves" that weren't in spec?
-
-    **Misunderstandings:**
-    - Did they interpret requirements differently than intended?
-    - Did they solve the wrong problem?
-    - Did they implement the right feature but wrong way?
-
-    **Verify by reading the diff, not by trusting the report.**
-
-    Report:
-    - ✅ Spec compliant (if everything matches after code inspection)
-    - ❌ Issues found: [list specifically what's missing or extra, with file:line references]
-    - ⚠️ Cannot verify from diff: [requirements you could not verify from the
-      diff alone, and what the controller should check — report alongside the
-      ✅/❌ verdict for everything you could verify]
-```
--- a/skills/subagent-driven-development/task-reviewer-prompt.md
+++ b/skills/subagent-driven-development/task-reviewer-prompt.md
@@ -0,0 +1,156 @@
+# Task Reviewer Prompt Template
+
+Use this template when dispatching a task reviewer subagent. One reviewer, one
+reading of the diff, two verdicts: spec compliance and code quality.
+
+**Purpose:** Verify one task's implementation matches its requirements (nothing
+more, nothing less) and is well-built (clean, tested, maintainable)
+
+```
+Subagent (general-purpose):
+  description: "Review Task N (spec + quality)"
+  prompt: |
+    You are reviewing one task's implementation: first whether it matches its
+    requirements, then whether it is well-built. This is a task-scoped gate,
+    not a merge review — a broad whole-branch review happens separately after
+    all tasks are complete.
+
+    ## What Was Requested
+
+    [TASK_REQUIREMENTS]
+
+    ## What the Implementer Claims They Built
+
+    [DESCRIPTION]
+
+    ## Diff Under Review
+
+    **Base:** [BASE_SHA]
+    **Head:** [HEAD_SHA]
+
+    [DIFF]
+
+    Review from the diff above — do not re-run git commands or re-read the
+    files it already shows. If no diff was provided, fetch it yourself:
+    `git diff --stat [BASE_SHA]..[HEAD_SHA]` and `git diff [BASE_SHA]..[HEAD_SHA]`.
+    Only read files in this diff. Do not crawl the broader codebase. Inspect
+    code outside the diff only to evaluate a concrete risk you can name — and
+    name it in your report. Cross-cutting changes are legitimate named risks:
+    if the diff changes lock ordering, a function or API contract, or shared
+    mutable state, checking the call sites is the right method.
+
+    Your review is read-only on this checkout. Do not mutate the working
+    tree, the index, HEAD, or branch state in any way.
+
+    ## Do Not Trust the Report
+
+    Treat the implementer's report as unverified claims about the code. It
+    may be incomplete, inaccurate, or optimistic. Verify the claims against
+    the diff.
+
+    ## Tests
+
+    The implementer already ran the tests and reported results with TDD
+    evidence for exactly this code. Do not re-run the suite to confirm their
+    report. Run a test only when reading the code raises a specific doubt
+    that no existing run answers — and then a focused test, never a
+    package-wide suite, race detector run, or repeated/high-count loop. If
+    heavy validation seems warranted, recommend it in your report instead of
+    running it. If you cannot run commands in this environment, name the
+    test you would run.
+
+    Warnings or other noise in the implementer's reported test output are
+    findings — test output should be pristine.
+
+    ## Part 1: Spec Compliance
+
+    Compare the diff against What Was Requested:
+
+    - **Missing:** requirements they skipped, missed, or claimed without
+      implementing
+    - **Extra:** features that weren't requested, over-engineering, unneeded
+      "nice to haves"
+    - **Misunderstood:** right feature built the wrong way, wrong problem
+      solved
+
+    If a requirement cannot be verified from this diff alone (it lives in
+    unchanged code or spans tasks), report it as a ⚠️ item instead of
+    broadening your search.
+
+    ## Part 2: Code Quality
+
+    **Code quality:**
+    - Clean separation of concerns?
+    - Proper error handling?
+    - DRY without premature abstraction?
+    - Edge cases handled?
+
+    **Tests:**
+    - Do the new and changed tests verify real behavior, not mocks?
+    - Are the task's edge cases covered?
+
+    **Structure:**
+    - Does each file have one clear responsibility with a well-defined interface?
+    - Are units decomposed so they can be understood and tested independently?
+    - Is the implementation following the file structure from the plan?
+    - Did this change create new files that are already large, or
+      significantly grow existing files? (Don't flag pre-existing file
+      sizes — focus on what this change contributed.)
+
+    Cite file:line evidence for every finding and for any check you would
+    otherwise answer with a bare "yes." Cite, don't narrate — a tight report
+    that points at lines beats a long one that retells the diff.
+
+    ## Calibration
+
+    Categorize issues by actual severity. Not everything is Critical.
+    Important means this task cannot be trusted until it is fixed;
+    "coverage could be broader" and polish suggestions are Minor.
+    Acknowledge what was done well before listing issues — accurate praise
+    helps the implementer trust the rest of the feedback.
+
+    ## Output Format
+
+    ### Spec Compliance
+
+    - ✅ Spec compliant | ❌ Issues found: [what's missing/extra/misunderstood,
+      with file:line references]
+    - ⚠️ Cannot verify from diff: [requirements you could not verify from the
+      diff alone, and what the controller should check — report alongside the
+      ✅/❌ verdict for everything you could verify]
+
+    ### Strengths
+    [What's well done? Be specific.]
+
+    ### Issues
+
+    #### Critical (Must Fix)
+    #### Important (Should Fix)
+    #### Minor (Nice to Have)
+
+    For each issue: file:line, what's wrong, why it matters, how to fix
+    (if not obvious).
+
+    ### Assessment
+
+    **Task quality:** [Approved | Needs fixes]
+
+    **Reasoning:** [1-2 sentence technical assessment]
+```
+
+**Placeholders:**
+- `[TASK_REQUIREMENTS]` — full task text plus the spec/design's global
+  constraints that bind it (version floors, naming and copy rules, platform
+  requirements)
+- `[DESCRIPTION]` — what the implementer reports they built
+- `[BASE_SHA]` — commit before this task
+- `[HEAD_SHA]` — current commit
+- `[DIFF]` — paste `git diff BASE..HEAD` output (use `--stat` plus the
+  relevant hunks if it exceeds a few hundred lines); a reviewer with the
+  diff in hand needs few or no tool calls
+
+**Reviewer returns:** Spec Compliance verdict (✅/❌/⚠️), Strengths, Issues
+(Critical/Important/Minor), Task quality verdict
+
+A single fix dispatch can then address spec gaps and quality findings
+together; re-review after fixes covers both verdicts again.
--- a/skills/using-superpowers/references/antigravity-tools.md
+++ b/skills/using-superpowers/references/antigravity-tools.md
@@ -64,7 +64,7 @@ prompt-template file (e.g. `superpowers:subagent-driven-development`'s
 | Skill dispatch form | Antigravity equivalent |
 |---------------------|----------------------|
 | An implementer-style `*-prompt.md` template (writes code, runs tests) | Fill the template, then `invoke_subagent` with `TypeName: "self"` and the filled prompt |
-| A read-only reviewer template (`spec-reviewer`, `code-quality-reviewer`, `code-reviewer`, `requesting-code-review`'s `./code-reviewer.md`) | `invoke_subagent` with `TypeName: "research"` and the filled review template |
+| A read-only reviewer template (`task-reviewer`, `code-reviewer`, `requesting-code-review`'s `./code-reviewer.md`) | `invoke_subagent` with `TypeName: "research"` and the filled review template |
 | Inline prompt (no template referenced) | `invoke_subagent` with `TypeName: "self"` (or `"research"` if the task only reads) and your inline prompt |

 ### Prompt filling
--- a/skills/using-superpowers/references/gemini-tools.md
+++ b/skills/using-superpowers/references/gemini-tools.md
@@ -35,7 +35,7 @@ Skills dispatch with `Subagent (general-purpose):` and either reference a prompt

 | Skill dispatch form | Gemini CLI equivalent |
 |---------------------|----------------------|
-| References a `*-prompt.md` template (implementer, spec-reviewer, code-quality-reviewer, code-reviewer, etc.) | Fill the template, then `invoke_agent` with `agent_name: "generalist"` and the filled prompt |
+| References a `*-prompt.md` template (implementer, task-reviewer, code-reviewer, etc.) | Fill the template, then `invoke_agent` with `agent_name: "generalist"` and the filled prompt |
 | References `superpowers:requesting-code-review`'s `./code-reviewer.md` | `invoke_agent` with `agent_name: "generalist"` and the filled review template |
 | Inline prompt (no template referenced) | `invoke_agent` with `agent_name: "generalist"` and your inline prompt |