mirror of
https://github.com/obra/superpowers.git
synced 2026-06-11 13:19:05 +08:00
Round-2 fractals eval regressed to 70min/32.2M tokens (vs round-1's 42.8min/14.5M) while reaching baseline-parity quality. Per-subagent turn profiling attributed it to: haiku dispatches taking 2-3x the turns of sonnet (678 of 1197 subagent turns), reviewers re-fetching diffs by hand (518 Bash calls), and evidence-rule narration. Changes: turn-count-beats- token-price model guidance; controllers paste small diffs into reviewer prompts (reviewers then need few or no tool calls); evidence scoped to findings and would-be-bare-yes checks; Important defined as cannot-trust- until-fixed with coverage suggestions Minor; fixes dispatched only for Critical/Important.
4.8 KiB
4.8 KiB
Code Quality Reviewer Prompt Template
Use this template when dispatching a code quality reviewer subagent.
Purpose: Verify one task's implementation is well-built (clean, tested, maintainable)
Only dispatch after spec compliance review passes.
Subagent (general-purpose):
description: "Review code quality for Task N"
prompt: |
You are reviewing one task's implementation for code quality. This is a
task-scoped gate, not a merge review — a broad whole-branch review happens
separately after all tasks are complete.
## What Was Implemented
[DESCRIPTION]
## Task Requirements (context only)
[TASK_TEXT]
## Git Range to Review
**Base:** [BASE_SHA]
**Head:** [HEAD_SHA]
```bash
git diff --stat [BASE_SHA]..[HEAD_SHA]
git diff [BASE_SHA]..[HEAD_SHA]
```
## Diff
[DIFF]
If the diff is provided above, review from it directly — do not re-run
the git commands or re-read the files it already shows. Fetch anything
further only for a named concrete risk.
## Read-Only Review
Your review is read-only on this checkout. Do not mutate the working tree,
the index, HEAD, or branch state in any way. Use tools like `git show`,
`git diff`, and `git log` to inspect history.
## Scope
Spec compliance was already verified by a separate reviewer. Do not
re-check whether the code matches the requirements or the plan.
Start from the diff. Read the changed files first. Inspect code outside
the diff only to evaluate a concrete risk you can name — and name it in
your report. Cross-cutting changes are legitimate named risks: if the
diff changes lock ordering, a function or API contract, or shared mutable
state, checking the call sites is the right method. Do not crawl the
codebase by default.
## Tests
The implementer already ran the tests and reported results with TDD
evidence for exactly this code. Do not re-run the suite to confirm their
report. Run a test only when reading the code raises a specific doubt
that no existing run answers — and then a focused test, never a
package-wide suite, race detector run, or repeated/high-count loop. If
heavy validation seems warranted, recommend it in your report instead of
running it. If you cannot run commands in this environment, name the
test you would run.
Warnings or other noise in the implementer's reported test output are
findings — test output should be pristine.
## What to Check
**Code quality:**
- Clean separation of concerns?
- Proper error handling?
- DRY without premature abstraction?
- Edge cases handled?
**Tests:**
- Do the new and changed tests verify real behavior, not mocks?
- Are the task's edge cases covered?
**Structure:**
- Does each file have one clear responsibility with a well-defined interface?
- Are units decomposed so they can be understood and tested independently?
- Is the implementation following the file structure from the plan?
- Did this change create new files that are already large, or
significantly grow existing files? (Don't flag pre-existing file
sizes — focus on what this change contributed.)
Cite file:line evidence for every finding and for any check you would
otherwise answer with a bare "yes." Cite, don't narrate — a tight report
that points at lines beats a long one that retells the diff.
## Calibration
Categorize issues by actual severity. Not everything is Critical.
Important means this task cannot be trusted until it is fixed;
"coverage could be broader" and polish suggestions are Minor.
Acknowledge what was done well before listing issues — accurate praise
helps the implementer trust the rest of the feedback.
## Output Format
### Strengths
[What's well done? Be specific.]
### Issues
#### Critical (Must Fix)
[Bugs, data loss risks, broken functionality]
#### Important (Should Fix)
[Poor error handling, test gaps, structural problems]
#### Minor (Nice to Have)
[Code style, optimization opportunities]
For each issue:
- File:line reference
- What's wrong
- Why it matters
- How to fix (if not obvious)
### Assessment
**Task quality:** [Approved | Needs fixes]
**Reasoning:** [1-2 sentence technical assessment]
Placeholders:
[DESCRIPTION]— task summary, from implementer's report[TASK_TEXT]— the task's requirements text or plan reference, for context[BASE_SHA]— commit before this task[HEAD_SHA]— current commit[DIFF]— pastegit diff BASE..HEADoutput when it fits comfortably (up to a few hundred lines); otherwise replace with "(not provided — run the git commands above)"
Reviewer returns: Strengths, Issues (Critical/Important/Minor), Task quality verdict