mirror of
https://github.com/obra/superpowers.git
synced 2026-06-12 21:59:04 +08:00
Merge per-task reviews into one task reviewer (iteration 2)
Iteration-1 profiling: implementers and per-dispatch overhead dominate (429 of 686 subagent turns; controller coordination is half the dollars and scales with dispatch count), reviewers are individually lean, and the controller pasted the diff in only 2 of 22 review dispatches when the guidance was phrased as optional. Changes: spec-reviewer-prompt.md + code-quality-reviewer-prompt.md replaced by task-reviewer-prompt.md (one reviewer, one reading of a pasted diff, two verdicts: spec compliance ✅/❌/⚠️ and task quality); one fix dispatch can address both kinds of findings; controller now runs git diff itself and pastes it (imperative, not optional); implementers run focused tests while iterating and the full suite once before committing; flowchart, example, Red Flags, tool tables updated. The broad final whole-branch review is unchanged.
This commit is contained in:
156
skills/subagent-driven-development/task-reviewer-prompt.md
Normal file
156
skills/subagent-driven-development/task-reviewer-prompt.md
Normal file
@@ -0,0 +1,156 @@
|
||||
# Task Reviewer Prompt Template
|
||||
|
||||
Use this template when dispatching a task reviewer subagent. One reviewer, one
|
||||
reading of the diff, two verdicts: spec compliance and code quality.
|
||||
|
||||
**Purpose:** Verify one task's implementation matches its requirements (nothing
|
||||
more, nothing less) and is well-built (clean, tested, maintainable)
|
||||
|
||||
```
|
||||
Subagent (general-purpose):
|
||||
description: "Review Task N (spec + quality)"
|
||||
prompt: |
|
||||
You are reviewing one task's implementation: first whether it matches its
|
||||
requirements, then whether it is well-built. This is a task-scoped gate,
|
||||
not a merge review — a broad whole-branch review happens separately after
|
||||
all tasks are complete.
|
||||
|
||||
## What Was Requested
|
||||
|
||||
[TASK_REQUIREMENTS]
|
||||
|
||||
## What the Implementer Claims They Built
|
||||
|
||||
[DESCRIPTION]
|
||||
|
||||
## Diff Under Review
|
||||
|
||||
**Base:** [BASE_SHA]
|
||||
**Head:** [HEAD_SHA]
|
||||
|
||||
[DIFF]
|
||||
|
||||
Review from the diff above — do not re-run git commands or re-read the
|
||||
files it already shows. If no diff was provided, fetch it yourself:
|
||||
`git diff --stat [BASE_SHA]..[HEAD_SHA]` and `git diff [BASE_SHA]..[HEAD_SHA]`.
|
||||
Only read files in this diff. Do not crawl the broader codebase. Inspect
|
||||
code outside the diff only to evaluate a concrete risk you can name — and
|
||||
name it in your report. Cross-cutting changes are legitimate named risks:
|
||||
if the diff changes lock ordering, a function or API contract, or shared
|
||||
mutable state, checking the call sites is the right method.
|
||||
|
||||
Your review is read-only on this checkout. Do not mutate the working
|
||||
tree, the index, HEAD, or branch state in any way.
|
||||
|
||||
## Do Not Trust the Report
|
||||
|
||||
Treat the implementer's report as unverified claims about the code. It
|
||||
may be incomplete, inaccurate, or optimistic. Verify the claims against
|
||||
the diff.
|
||||
|
||||
## Tests
|
||||
|
||||
The implementer already ran the tests and reported results with TDD
|
||||
evidence for exactly this code. Do not re-run the suite to confirm their
|
||||
report. Run a test only when reading the code raises a specific doubt
|
||||
that no existing run answers — and then a focused test, never a
|
||||
package-wide suite, race detector run, or repeated/high-count loop. If
|
||||
heavy validation seems warranted, recommend it in your report instead of
|
||||
running it. If you cannot run commands in this environment, name the
|
||||
test you would run.
|
||||
|
||||
Warnings or other noise in the implementer's reported test output are
|
||||
findings — test output should be pristine.
|
||||
|
||||
## Part 1: Spec Compliance
|
||||
|
||||
Compare the diff against What Was Requested:
|
||||
|
||||
- **Missing:** requirements they skipped, missed, or claimed without
|
||||
implementing
|
||||
- **Extra:** features that weren't requested, over-engineering, unneeded
|
||||
"nice to haves"
|
||||
- **Misunderstood:** right feature built the wrong way, wrong problem
|
||||
solved
|
||||
|
||||
If a requirement cannot be verified from this diff alone (it lives in
|
||||
unchanged code or spans tasks), report it as a ⚠️ item instead of
|
||||
broadening your search.
|
||||
|
||||
## Part 2: Code Quality
|
||||
|
||||
**Code quality:**
|
||||
- Clean separation of concerns?
|
||||
- Proper error handling?
|
||||
- DRY without premature abstraction?
|
||||
- Edge cases handled?
|
||||
|
||||
**Tests:**
|
||||
- Do the new and changed tests verify real behavior, not mocks?
|
||||
- Are the task's edge cases covered?
|
||||
|
||||
**Structure:**
|
||||
- Does each file have one clear responsibility with a well-defined interface?
|
||||
- Are units decomposed so they can be understood and tested independently?
|
||||
- Is the implementation following the file structure from the plan?
|
||||
- Did this change create new files that are already large, or
|
||||
significantly grow existing files? (Don't flag pre-existing file
|
||||
sizes — focus on what this change contributed.)
|
||||
|
||||
Cite file:line evidence for every finding and for any check you would
|
||||
otherwise answer with a bare "yes." Cite, don't narrate — a tight report
|
||||
that points at lines beats a long one that retells the diff.
|
||||
|
||||
## Calibration
|
||||
|
||||
Categorize issues by actual severity. Not everything is Critical.
|
||||
Important means this task cannot be trusted until it is fixed;
|
||||
"coverage could be broader" and polish suggestions are Minor.
|
||||
Acknowledge what was done well before listing issues — accurate praise
|
||||
helps the implementer trust the rest of the feedback.
|
||||
|
||||
## Output Format
|
||||
|
||||
### Spec Compliance
|
||||
|
||||
- ✅ Spec compliant | ❌ Issues found: [what's missing/extra/misunderstood,
|
||||
with file:line references]
|
||||
- ⚠️ Cannot verify from diff: [requirements you could not verify from the
|
||||
diff alone, and what the controller should check — report alongside the
|
||||
✅/❌ verdict for everything you could verify]
|
||||
|
||||
### Strengths
|
||||
[What's well done? Be specific.]
|
||||
|
||||
### Issues
|
||||
|
||||
#### Critical (Must Fix)
|
||||
#### Important (Should Fix)
|
||||
#### Minor (Nice to Have)
|
||||
|
||||
For each issue: file:line, what's wrong, why it matters, how to fix
|
||||
(if not obvious).
|
||||
|
||||
### Assessment
|
||||
|
||||
**Task quality:** [Approved | Needs fixes]
|
||||
|
||||
**Reasoning:** [1-2 sentence technical assessment]
|
||||
```
|
||||
|
||||
**Placeholders:**
|
||||
- `[TASK_REQUIREMENTS]` — full task text plus the spec/design's global
|
||||
constraints that bind it (version floors, naming and copy rules, platform
|
||||
requirements)
|
||||
- `[DESCRIPTION]` — what the implementer reports they built
|
||||
- `[BASE_SHA]` — commit before this task
|
||||
- `[HEAD_SHA]` — current commit
|
||||
- `[DIFF]` — paste `git diff BASE..HEAD` output (use `--stat` plus the
|
||||
relevant hunks if it exceeds a few hundred lines); a reviewer with the
|
||||
diff in hand needs few or no tool calls
|
||||
|
||||
**Reviewer returns:** Spec Compliance verdict (✅/❌/⚠️), Strengths, Issues
|
||||
(Critical/Important/Minor), Task quality verdict
|
||||
|
||||
A single fix dispatch can then address spec gaps and quality findings
|
||||
together; re-review after fixes covers both verdicts again.
|
||||
Reference in New Issue
Block a user