mirror of
https://github.com/obra/superpowers.git
synced 2026-06-12 05:39:05 +08:00
Fourth planted-defect failure mode: the implementer's self-report said 'noted mild structural duplication; left unabstracted per YAGNI' and the reviewer deferred to that framing, rating the duplication no finding at all. The pre-judging keeps relocating — controller prompt, then reviewer calibration, now the implementer's report. Rationales are claims; they never downgrade severity.
6.4 KiB
6.4 KiB
Task Reviewer Prompt Template
Use this template when dispatching a task reviewer subagent. One reviewer, one reading of the diff, two verdicts: spec compliance and code quality.
Purpose: Verify one task's implementation matches its requirements (nothing more, nothing less) and is well-built (clean, tested, maintainable)
Subagent (general-purpose):
description: "Review Task N (spec + quality)"
prompt: |
You are reviewing one task's implementation: first whether it matches its
requirements, then whether it is well-built. This is a task-scoped gate,
not a merge review — a broad whole-branch review happens separately after
all tasks are complete.
## What Was Requested
[TASK_REQUIREMENTS]
## What the Implementer Claims They Built
[DESCRIPTION]
## Diff Under Review
**Base:** [BASE_SHA]
**Head:** [HEAD_SHA]
[DIFF — REQUIRED: the controller pastes `git diff BASE..HEAD` output
here before dispatching. Dispatching with this placeholder unfilled is
a dispatch error.]
Review from the diff above — do not re-run git commands or re-read the
files it already shows. If no diff was provided, fetch it yourself:
`git diff --stat [BASE_SHA]..[HEAD_SHA]` and `git diff [BASE_SHA]..[HEAD_SHA]`.
Only read files in this diff. Do not crawl the broader codebase. Inspect
code outside the diff only to evaluate a concrete risk you can name — and
name it in your report. Cross-cutting changes are legitimate named risks:
if the diff changes lock ordering, a function or API contract, or shared
mutable state, checking the call sites is the right method.
Your review is read-only on this checkout. Do not mutate the working
tree, the index, HEAD, or branch state in any way.
## Do Not Trust the Report
Treat the implementer's report as unverified claims about the code. It
may be incomplete, inaccurate, or optimistic. Verify the claims against
the diff. Design rationales in the report are claims too: "left it per
YAGNI," "kept it simple deliberately," or any other justification is the
implementer grading their own work. Judge the code on its merits — a
stated rationale never downgrades a finding's severity.
## Tests
The implementer already ran the tests and reported results with TDD
evidence for exactly this code. Do not re-run the suite to confirm their
report. Run a test only when reading the code raises a specific doubt
that no existing run answers — and then a focused test, never a
package-wide suite, race detector run, or repeated/high-count loop. If
heavy validation seems warranted, recommend it in your report instead of
running it. If you cannot run commands in this environment, name the
test you would run.
Warnings or other noise in the implementer's reported test output are
findings — test output should be pristine.
## Part 1: Spec Compliance
Compare the diff against What Was Requested:
- **Missing:** requirements they skipped, missed, or claimed without
implementing
- **Extra:** features that weren't requested, over-engineering, unneeded
"nice to haves"
- **Misunderstood:** right feature built the wrong way, wrong problem
solved
If a requirement cannot be verified from this diff alone (it lives in
unchanged code or spans tasks), report it as a ⚠️ item instead of
broadening your search.
## Part 2: Code Quality
**Code quality:**
- Clean separation of concerns?
- Proper error handling?
- DRY without premature abstraction?
- Edge cases handled?
**Tests:**
- Do the new and changed tests verify real behavior, not mocks?
- Are the task's edge cases covered?
**Structure:**
- Does each file have one clear responsibility with a well-defined interface?
- Are units decomposed so they can be understood and tested independently?
- Is the implementation following the file structure from the plan?
- Did this change create new files that are already large, or
significantly grow existing files? (Don't flag pre-existing file
sizes — focus on what this change contributed.)
Cite file:line evidence for every finding and for any check you would
otherwise answer with a bare "yes." Cite, don't narrate — a tight report
that points at lines beats a long one that retells the diff.
## Calibration
Categorize issues by actual severity. Not everything is Critical.
Important means this task cannot be trusted until it is fixed: incorrect
or fragile behavior, a missed requirement, or maintainability damage you
would block a merge over — verbatim duplication of a logic block,
swallowed errors, tests that assert nothing. "Coverage could be broader"
and polish suggestions are Minor.
Acknowledge what was done well before listing issues — accurate praise
helps the implementer trust the rest of the feedback.
## Output Format
### Spec Compliance
- ✅ Spec compliant | ❌ Issues found: [what's missing/extra/misunderstood,
with file:line references]
- ⚠️ Cannot verify from diff: [requirements you could not verify from the
diff alone, and what the controller should check — report alongside the
✅/❌ verdict for everything you could verify]
### Strengths
[What's well done? Be specific.]
### Issues
#### Critical (Must Fix)
#### Important (Should Fix)
#### Minor (Nice to Have)
For each issue: file:line, what's wrong, why it matters, how to fix
(if not obvious).
### Assessment
**Task quality:** [Approved | Needs fixes]
**Reasoning:** [1-2 sentence technical assessment]
Placeholders:
[TASK_REQUIREMENTS]— full task text plus the spec/design's global constraints that bind it (version floors, naming and copy rules, platform requirements)[DESCRIPTION]— what the implementer reports they built[BASE_SHA]— commit before this task[HEAD_SHA]— current commit[DIFF]— REQUIRED: pastegit diff BASE..HEADoutput (use--statplus the relevant hunks if it exceeds a few hundred lines); a reviewer with the diff in hand needs few or no tool calls
Reviewer returns: Spec Compliance verdict (✅/❌/⚠️), Strengths, Issues (Critical/Important/Minor), Task quality verdict
A single fix dispatch can then address spec gaps and quality findings together; re-review after fixes covers both verdicts again.