mirror of
https://github.com/obra/superpowers.git
synced 2026-06-10 20:59:05 +08:00
Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence
Round-2 fractals eval regressed to 70min/32.2M tokens (vs round-1's 42.8min/14.5M) while reaching baseline-parity quality. Per-subagent turn profiling attributed the regression to three drivers: haiku dispatches taking 2-3x the turns of sonnet (678 of 1197 subagent turns), reviewers re-fetching diffs by hand (518 Bash calls), and evidence-rule narration.

Changes: turn-count-beats-token-price model guidance; controllers paste small diffs into reviewer prompts (reviewers then need few or no tool calls); evidence scoped to findings and would-be-bare-yes checks; Important defined as cannot-trust-until-fixed, with coverage suggestions classed as Minor; fixes dispatched only for Critical/Important.
@@ -104,6 +104,12 @@ most capable model; a subtle concurrency change does.
 omitted model inherits your session's model — often the most capable and
 most expensive — which silently defeats this section.

+**Turn count beats token price.** Wall-clock and context cost scale with how
+many turns a subagent takes, and the cheapest models routinely take 2-3× the
+turns on multi-step work — costing more overall. Use a mid-tier model as the
+floor for implementers and reviewers; reserve the cheapest tier for
+single-file mechanical fixes.
+
 **Task complexity signals (implementation tasks):**
 - Touches 1-2 files with a complete spec → cheap model
 - Touches multiple files with integration concerns → standard model
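The "turn count beats token price" guidance in this hunk can be illustrated with a rough cost model. This is a minimal sketch, not the project's own accounting: it assumes every turn re-reads the whole conversation, that context grows by a fixed amount per turn, and it ignores output tokens. The function names, context sizes, and per-token prices are all hypothetical; only the 2-3× turn multiplier comes from the text above.

```python
def cumulative_input_tokens(turns: int, base_context: int, growth_per_turn: int) -> int:
    """Total input tokens when every turn re-reads the whole, growing context."""
    # Turn i (0-based) reads base_context + i * growth_per_turn tokens.
    return sum(base_context + i * growth_per_turn for i in range(turns))


def run_cost(turns: int, price_per_mtok: float,
             base_context: int = 20_000, growth_per_turn: int = 2_000) -> float:
    """Cost of one subagent run under the simplifying assumptions above."""
    tokens = cumulative_input_tokens(turns, base_context, growth_per_turn)
    return tokens / 1_000_000 * price_per_mtok


# Hypothetical numbers: the cheap tier costs a third as much per token
# but needs three times the turns for the same multi-step task.
mid_tier = run_cost(turns=10, price_per_mtok=3.0)
cheap_tier = run_cost(turns=30, price_per_mtok=1.0)

print(f"mid-tier: {mid_tier:.2f}  cheap tier: {cheap_tier:.2f}")
```

Because the re-read context grows with each turn, total tokens grow faster than linearly in the turn count, so under these assumptions a 3× cheaper price does not offset 3× the turns.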
@@ -154,6 +160,11 @@ final whole-branch review. When you fill a reviewer template:
 - Include the spec/design's global constraints that bind the task (version
 floors, naming and copy rules, platform requirements) in the requirements
 you paste — a reviewer can only enforce what you hand them.
+- Paste the task's diff (`git diff BASE..HEAD` output) into the reviewer
+prompt when it fits comfortably (up to a few hundred lines). A reviewer
+with the diff in hand needs few or no tool calls.
+- Dispatch fix subagents for Critical and Important findings. Record Minor
+findings and move on — they roll up to the final whole-branch review.

 ## Prompt Templates
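The two controller rules added in this hunk (paste small diffs, dispatch fixes only for Critical and Important findings) could be sketched as follows. This is an illustration under stated assumptions, not code from the repository: the helper names, the 300-line threshold (one reading of "a few hundred lines"), and the dictionary shape of a finding are hypothetical. The fallback string is the wording this commit gives for the `[DIFF]` placeholder.

```python
import subprocess

MAX_INLINE_DIFF_LINES = 300  # assumption: one reading of "a few hundred lines"
DIFF_FALLBACK = "(not provided — run the git commands above)"  # wording from the commit


def fetch_diff(base_sha: str, head_sha: str) -> str:
    """Run `git diff BASE..HEAD` in the current repository and return its output."""
    result = subprocess.run(
        ["git", "diff", f"{base_sha}..{head_sha}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def diff_for_prompt(diff_text: str, max_lines: int = MAX_INLINE_DIFF_LINES) -> str:
    """Return the diff when it is small enough to paste, else the fallback text."""
    # An empty diff is treated like an oversized one: nothing useful to paste.
    if diff_text.strip() and len(diff_text.splitlines()) <= max_lines:
        return diff_text
    return DIFF_FALLBACK


def findings_needing_fix(findings: list[dict]) -> list[dict]:
    """Keep Critical and Important findings; Minor ones are only recorded."""
    return [f for f in findings if f["severity"] in {"Critical", "Important"}]
```

A controller would then substitute `diff_for_prompt(fetch_diff(base, head))` for `[DIFF]` in the reviewer template and pass only `findings_needing_fix(...)` to fix subagents.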
@@ -32,6 +32,14 @@ Subagent (general-purpose):
 git diff [BASE_SHA]..[HEAD_SHA]
 ```

+## Diff
+
+[DIFF]
+
+If the diff is provided above, review from it directly — do not re-run
+the git commands or re-read the files it already shows. Fetch anything
+further only for a named concrete risk.
+
 ## Read-Only Review

 Your review is read-only on this checkout. Do not mutate the working tree,
@@ -84,12 +92,15 @@ Subagent (general-purpose):
 significantly grow existing files? (Don't flag pre-existing file
 sizes — focus on what this change contributed.)

-Answer each item above with file:line evidence, not a bare yes or no.
-An unsupported "yes" is not a review.
+Cite file:line evidence for every finding and for any check you would
+otherwise answer with a bare "yes." Cite, don't narrate — a tight report
+that points at lines beats a long one that retells the diff.

 ## Calibration

 Categorize issues by actual severity. Not everything is Critical.
+Important means this task cannot be trusted until it is fixed;
+"coverage could be broader" and polish suggestions are Minor.
 Acknowledge what was done well before listing issues — accurate praise
 helps the implementer trust the rest of the feedback.
@@ -127,5 +138,8 @@ Subagent (general-purpose):
 - `[TASK_TEXT]` — the task's requirements text or plan reference, for context
 - `[BASE_SHA]` — commit before this task
 - `[HEAD_SHA]` — current commit
+- `[DIFF]` — paste `git diff BASE..HEAD` output when it fits comfortably
+(up to a few hundred lines); otherwise replace with "(not provided — run
+the git commands above)"

 **Reviewer returns:** Strengths, Issues (Critical/Important/Minor), Task quality verdict
@@ -30,6 +30,13 @@ Subagent (general-purpose):
 Only read files in this diff. Do not crawl the broader codebase.

+## Diff
+
+[DIFF]
+
+If the diff is provided above, review from it directly — do not re-run
+the git commands or re-read the files it already shows.
+
 Spec compliance is judged by reading the diff against the requirements.
 The implementer already ran the tests and reported TDD evidence — do not
 re-run them. If a requirement cannot be verified from this diff alone