mirror of
https://github.com/obra/superpowers.git
synced 2026-06-12 05:39:05 +08:00
Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget
Validated 2026-06-10 (all gates pass): go-fractals 54.1-54.7 min / $12.81-14.31 (baseline 64.9 / $16.07); svelte-todo 55.0 min / 19.3M / $14.99 (baseline 79.7 / 27.3M / $20.98); planted-defect pass $2.77. Dispatch-model discipline 3/3 runs after moving model: into the templates as a REQUIRED line. Full experiment log: evals docs/experiments/2026-06-10-sdd-cost-experiments.md
This commit is contained in:
@@ -54,15 +54,15 @@ digraph process {
|
||||
"Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [shape=box];
|
||||
"Task reviewer reports spec ✅ and quality approved?" [shape=diamond];
|
||||
"Dispatch fix subagent for Critical/Important findings" [shape=box];
|
||||
"Mark task complete in todo list" [shape=box];
|
||||
"Mark task complete in todo list and progress ledger" [shape=box];
|
||||
}
|
||||
|
||||
"Read plan, extract all tasks with full text, note context, create todos" [shape=box];
|
||||
"Read plan, note context and global constraints, create todos" [shape=box];
|
||||
"More tasks remain?" [shape=diamond];
|
||||
"Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [shape=box];
|
||||
"Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
|
||||
|
||||
"Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
||||
"Read plan, note context and global constraints, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
||||
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
|
||||
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
|
||||
"Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
||||
@@ -71,8 +71,8 @@ digraph process {
|
||||
"Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" -> "Task reviewer reports spec ✅ and quality approved?";
|
||||
"Task reviewer reports spec ✅ and quality approved?" -> "Dispatch fix subagent for Critical/Important findings" [label="no"];
|
||||
"Dispatch fix subagent for Critical/Important findings" -> "Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [label="re-review"];
|
||||
"Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list" [label="yes"];
|
||||
"Mark task complete in todo list" -> "More tasks remain?";
|
||||
"Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list and progress ledger" [label="yes"];
|
||||
"Mark task complete in todo list and progress ledger" -> "More tasks remain?";
|
||||
"More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
|
||||
"More tasks remain?" -> "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [label="no"];
|
||||
"Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" -> "Use superpowers:finishing-a-development-branch";
|
||||
@@ -161,14 +161,74 @@ final whole-branch review. When you fill a reviewer template:
|
||||
the commit list, stat summary, and full diff with context in one Read
|
||||
call. Use the BASE you recorded before dispatching the implementer —
|
||||
never `HEAD~1`, which silently truncates multi-commit tasks.
|
||||
- A dispatch prompt describes one task, not the session's history. Do not
|
||||
paste accumulated prior-task summaries ("state after Tasks 1-3") into
|
||||
later dispatches — a real session's dispatch hit 42k chars of which 99%
|
||||
was pasted history. A fresh subagent needs its task, the interfaces it
|
||||
touches, and the global constraints. Nothing else.
|
||||
- Dispatch fix subagents for Critical and Important findings. Record Minor
|
||||
findings and move on — then paste the accumulated Minor findings into the
|
||||
final whole-branch review dispatch so it can triage which must be fixed
|
||||
findings in the progress ledger as you go, and point the final
|
||||
whole-branch review at that list so it can triage which must be fixed
|
||||
before merge. A roll-up nobody reads is a silent discard.
|
||||
- The final whole-branch review gets a package too: run
|
||||
`scripts/review-package MERGE_BASE HEAD` (MERGE_BASE = the commit the
|
||||
branch started from, e.g. `git merge-base main HEAD`) and include the
|
||||
printed path in the final review dispatch, so the final reviewer reads
|
||||
one file instead of re-deriving the branch diff with git commands.
|
||||
- Every fix dispatch carries the implementer contract: the fix subagent
|
||||
re-runs the tests covering its change and reports the results. A fix
|
||||
report without test evidence is incomplete — do not re-review on top of
|
||||
it.
|
||||
re-runs the tests covering its change and reports the results. Name the
|
||||
covering test files in the dispatch — a one-line fix does not need the
|
||||
whole suite. A fix report without test evidence is incomplete — do not
|
||||
re-review on top of it.
|
||||
- If the final whole-branch review returns findings, dispatch ONE fix
|
||||
subagent with the complete findings list — not one fixer per finding.
|
||||
Per-finding fixers each rebuild context and re-run suites; a real
|
||||
session's final-review fix wave cost more than all its tasks combined.
|
||||
|
||||
## File Handoffs
|
||||
|
||||
Everything you paste into a dispatch prompt — and everything a subagent
|
||||
prints back — stays resident in your context for the rest of the session
|
||||
and is re-read on every later turn. Hand artifacts over as files:
|
||||
|
||||
- **Task brief:** before dispatching an implementer, run this skill's
|
||||
`scripts/task-brief PLAN_FILE N` — it extracts the task's full text to a
|
||||
uniquely named file and prints the path. Compose the dispatch so the
|
||||
brief stays the single source of requirements. Your dispatch should
|
||||
contain: (1) one line on where this task fits in the project; (2) the
|
||||
brief path, introduced as "read this first — it is your requirements,
|
||||
with the exact values to use verbatim"; (3) interfaces and decisions
|
||||
from earlier tasks that the brief cannot know; (4) your resolution of
|
||||
any ambiguity you noticed in the brief; (5) the report-file path and
|
||||
report contract. Exact values (numbers, magic strings, signatures, test
|
||||
cases) appear only in the brief.
|
||||
- **Report file:** name the implementer's report file after the brief
|
||||
(brief `…/task-N-brief.md` → report `…/task-N-report.md`) and put it in
|
||||
the dispatch prompt. The implementer writes the full report there and
|
||||
returns only status, commits, a one-line test summary, and concerns.
|
||||
- **Reviewer inputs:** the task reviewer gets three paths — the same brief
|
||||
file, the report file, and the review package — plus the global
|
||||
constraints that bind the task.
|
||||
- Fix dispatches append their fix report (with test results) to the same
|
||||
report file and return a short summary; re-reviews read the updated file.
|
||||
|
||||
## Durable Progress
|
||||
|
||||
Conversation memory does not survive compaction. In real sessions,
|
||||
controllers that lost their place have re-dispatched entire completed task
|
||||
sequences — the single most expensive failure observed. Track progress in
|
||||
a ledger file, not only in todos.
|
||||
|
||||
- At skill start, check for a ledger:
|
||||
`cat "$(git rev-parse --git-path sdd)/progress.md"`. Tasks listed there
|
||||
as complete are DONE — do not re-dispatch them; resume at the first task
|
||||
not marked complete.
|
||||
- When a task's review comes back clean, append one line to the ledger in
|
||||
the same message as your other bookkeeping:
|
||||
`Task N: complete (commits <base7>..<head7>, review clean)`.
|
||||
- The ledger is your recovery map: the commits it names exist in git even
|
||||
when your context no longer remembers creating them. After compaction,
|
||||
trust the ledger and `git log` over your own recollection.
|
||||
|
||||
## Prompt Templates
|
||||
|
||||
@@ -182,13 +242,11 @@ final whole-branch review. When you fill a reviewer template:
|
||||
You: I'm using Subagent-Driven Development to execute this plan.
|
||||
|
||||
[Read plan file once: docs/superpowers/plans/feature-plan.md]
|
||||
[Extract all 5 tasks with full text and context]
|
||||
[Create todos for all tasks]
|
||||
|
||||
Task 1: Hook installation script
|
||||
|
||||
[Get Task 1 text and context (already extracted)]
|
||||
[Dispatch implementation subagent with full task text + context]
|
||||
[Run task-brief for Task 1; dispatch implementer with brief + report paths + context]
|
||||
|
||||
Implementer: "Before I begin - should the hook be installed at user or system level?"
|
||||
|
||||
@@ -209,8 +267,7 @@ Task reviewer: Spec ✅ - all requirements met, nothing extra.
|
||||
|
||||
Task 2: Recovery modes
|
||||
|
||||
[Get Task 2 text and context (already extracted)]
|
||||
[Dispatch implementation subagent with full task text + context]
|
||||
[Run task-brief for Task 2; dispatch implementer with brief + report paths + context]
|
||||
|
||||
Implementer: [No questions, proceeds]
|
||||
Implementer:
|
||||
@@ -256,8 +313,8 @@ Done!
|
||||
- Review checkpoints automatic
|
||||
|
||||
**Efficiency gains:**
|
||||
- No file reading overhead (controller provides full text)
|
||||
- Controller curates exactly what context is needed
|
||||
- Controller curates exactly what context is needed; bulk artifacts move
|
||||
as files, not pasted text
|
||||
- Subagent gets complete information upfront
|
||||
- Questions surfaced before work begins (not after)
|
||||
|
||||
@@ -281,7 +338,8 @@ Done!
|
||||
- Skip task review, or accept a report missing either verdict (spec compliance AND task quality are both required)
|
||||
- Proceed with unfixed issues
|
||||
- Dispatch multiple implementation subagents in parallel (conflicts)
|
||||
- Make subagent read plan file (provide full text instead)
|
||||
- Make a subagent read the whole plan file (hand it its task brief —
|
||||
`scripts/task-brief` — instead)
|
||||
- Skip scene-setting context (subagent needs to understand where task fits)
|
||||
- Ignore subagent questions (answer before letting them proceed)
|
||||
- Accept "close enough" on spec compliance (reviewer found spec issues = not done)
|
||||
@@ -294,6 +352,8 @@ Done!
|
||||
(`scripts/review-package BASE HEAD`) and name the printed path in the
|
||||
prompt
|
||||
- Move to next task while the review has open Critical/Important issues
|
||||
- Re-dispatch a task the progress ledger already marks complete — check
|
||||
the ledger (and `git log`) after any compaction or resume
|
||||
|
||||
**If subagent asks questions:**
|
||||
- Answer clearly and completely
|
||||
|
||||
Reference in New Issue
Block a user