From 5da15d7eba60e69efb6171306841bffb3fb68653 Mon Sep 17 00:00:00 2001
From: Jesse Vincent <jesse@primeradiant.com>
Date: Tue, 9 Jun 2026 16:26:00 -0700
Subject: [PATCH] Add design spec: task-scoped review dispatch for SDD

---
 ...-sdd-task-scoped-review-dispatch-design.md | 75 +++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-09-sdd-task-scoped-review-dispatch-design.md

diff --git a/docs/superpowers/specs/2026-06-09-sdd-task-scoped-review-dispatch-design.md b/docs/superpowers/specs/2026-06-09-sdd-task-scoped-review-dispatch-design.md
new file mode 100644
index 00000000..ab64961f
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-09-sdd-task-scoped-review-dispatch-design.md
@@ -0,0 +1,75 @@
+# SDD Task-Scoped Review Dispatch
+
+Make subagent-driven-development's per-task reviews cheaper and faster without weakening them, by scoping per-task review prompts to the task and stopping redundant work — while final branch review stays broad.
+
+## Problem
+
+Per-task code quality reviewers in SDD routinely do branch-review-scale work on single-task diffs. Evidence from real sessions on this machine:
+
+- In one SDD session, 7/8 quality reviewers ran repo-wide greps; the most expensive ran 50+ Bash commands over ~200 seconds. Quality reviewers cost 4-8× what spec reviewers cost on the same tasks.
+- Spec reviewers, whose prompt contains "Only read files in this diff. Do not crawl the broader codebase," stayed tight: 6-16 tool calls, 14-65 seconds.
+- No reviewer ran heavy tests autonomously. Every package-wide or repeated test run observed was explicitly requested by a controller-written prompt ("check all uses," "run tests if useful, especially race-focused ones," "does anything else read `Meta()`?").
+
+Root causes, in order of impact:
+
+1. **The per-task quality prompt inherits a merge-readiness review.** `code-quality-reviewer-prompt.md` delegates to `requesting-code-review/code-reviewer.md`, which asks about architecture, scalability, security, production readiness, and ends with "Ready to merge?" That frame licenses branch-level breadth on a one-task diff. The spec prompt's diff-scope guard was never carried over.
+2. **The controller gets no guidance on writing reviewer prompts**, so it invents open-ended directives ("check all uses") that reviewers interpret literally.
+3. **Duplicated work across the pipeline.** The quality template's "Plan alignment" dimension re-checks what the spec reviewer just verified. Reviewers re-run test suites the implementer already ran (and reported, with TDD evidence) on identical code.
+4. **Per-task and final review share one template**, so there is no representation of "per-task narrow, final broad" anywhere.
+
+A field report (`~/2026-06-09-code-quality-reviewer-scope-budget-issue.md`) first flagged this. Its cited session and headline numbers could not be verified, but its qualitative diagnosis was confirmed against two real local sessions. One correction to it: cross-cutting audits (lock ordering, changed contracts) are sometimes the *correct* review method — the fix must gate breadth behind a stated concrete risk, not forbid it.
+
+## Goals
+
+- Per-task reviews scoped to the task: diff-first reading, justified broadening, no redundant test runs.
+- Final whole-branch review keeps its current breadth.
+- No reduction in what reviews catch.
+
+## Non-goals / explicitly preserved
+
+- **Full re-reviews stay.** When a reviewer re-reviews after a fix, it still reviews the whole task at full reading breadth. (It does not re-run tests the implementer just ran on the amended code.)
+- **The two review stages stay separate.** Spec compliance and code quality remain independent subagents, serially gated. No merging.
+- **The coordinator keeps model judgment.** No forced model tier for reviews, in either direction.
+- **`requesting-code-review/` is untouched.** It remains the broad template for final branch review and ad-hoc review.
+- Review ordering (spec before quality), the fix-and-re-review loops, and the requirement to fix Critical/Important findings are unchanged.
+
+## Design
+
+### Shared principle: don't re-run tests on code that hasn't changed
+
+The implementer's report includes test results and TDD RED/GREEN evidence for exactly the code under review. Reviewers verify by reading. A reviewer runs a test only when reading raises a specific doubt that no existing run answers — and then a focused test, not a suite. After a fix, the implementer re-runs tests on the amended code; the re-reviewer does not repeat that run. This principle appears in both reviewer prompts and in the controller guidance.
+
+### 1. New file: `skills/subagent-driven-development/code-quality-reviewer-prompt.md` becomes self-contained
+
+Stop delegating to `requesting-code-review/code-reviewer.md`. The per-task quality reviewer gets its own scoped prompt template:
+
+- **Framing:** "You are reviewing one task's implementation for code quality." A task-scoped gate, not a merge review.
+- **Spec compliance is settled:** spec review already passed; do not re-litigate requirements or plan alignment.
+- **Review dimensions kept:** code quality (clarity, duplication, error handling), test quality (real behavior, not mocks), maintainability, and the existing SDD-specific checks (single responsibility, independent testability, file structure from plan, file growth contributed by this change). Dropped: plan alignment, security/scalability/production-readiness dimensions, merge verdict.
+- **Scope budget:** start from `git diff BASE..HEAD`; read changed files first; inspect adjacent code only to evaluate a concrete risk you can name. Cross-cutting changes — lock ordering, changed function/API contracts, shared mutable state — are legitimate named risks that justify checking call sites. Do not crawl the codebase by default.
+- **Test budget:** the shared principle above, plus: no package-wide suites, race detectors, or repeated/high-count runs unless you have first named a specific suspected flake or race. Otherwise, recommend heavy validation in the report instead of running it.
+- **Read-only rule** kept (no mutating the working tree, index, HEAD, or branch state).
+- **Verdict:** Strengths / Issues (Critical/Important/Minor) / "Task quality: Approved | Needs fixes."
+
+### 2. `skills/subagent-driven-development/spec-reviewer-prompt.md` cleanups
+
+- Remove the `git worktree add` how-to sentence. The read-only rule stays; a diff-scoped spec review never needs a checkout of another revision.
+- Resolve the tension between the diff-only guard and "verify everything independently": spec compliance is judged by reading the diff against the requirements. The implementer's TDD evidence covers "it runs" — apply the shared test principle.
+- New third verdict channel: requirements that cannot be verified from the diff (live in unchanged code, span tasks) are reported as explicit "⚠️ Cannot verify from diff — controller should check X" items, instead of either crawling or silently passing.
+- Replace the fabricated premise "The implementer finished suspiciously quickly" with grounded skepticism: treat the implementer's report as unverified claims about the code. Same distrust, no invented fact.
+
+### 3. `skills/subagent-driven-development/SKILL.md` controller changes
+
+- **Model Selection:** replace "Architecture, design, and review tasks: use the most capable available model" with judgment guidance — pick reviewer models the way implementer models are picked, scaled to the diff's size, complexity, and risk.
+- **Reviewer prompt construction** (new guidance near Red Flags): when dispatching reviewers, do not write open-ended directives ("check all uses," "run race tests if useful") without a concrete task-specific reason; do not ask reviewers to re-run tests the implementer already ran on the same code; per-task reviews are task-scoped gates — the broad review happens once, at the final whole-branch review.
+- **Final review stays broad, explicitly:** the final whole-branch reviewer dispatch uses `requesting-code-review/code-reviewer.md`. The Integration section's note that `superpowers:requesting-code-review` provides "the code review template for reviewer subagents" is corrected to apply to the final review only.
+- Flowchart topology is unchanged.
+
+## What this does not fix (known, deferred)
+
+The spec reviewer judges against task text the controller pasted; it cannot catch requirements dropped during the controller's extraction from the plan. That is an architectural property of "controller provides full text," not a prompt problem, and is out of scope here.
+
+## Verification
+
+- Plugin infrastructure tests (`tests/`) still pass.
+- Run the SDD skill-behavior evals in the `evals/` submodule before and after the change. Reviews must still catch seeded issues; per-task reviewers should show less repo-wide exploration and fewer redundant test runs.