mirror of
https://github.com/obra/superpowers.git
synced 2026-06-11 21:29:07 +08:00
Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget
Validated 2026-06-10 (all gates pass): go-fractals 54.1-54.7 min / $12.81-14.31 (baseline 64.9 / $16.07); svelte-todo 55.0 min / 19.3M / $14.99 (baseline 79.7 / 27.3M / $20.98); planted-defect pass $2.77. Dispatch-model discipline 3/3 runs after moving model: into the templates as a REQUIRED line. Full experiment log: evals docs/experiments/2026-06-10-sdd-cost-experiments.md
This commit is contained in:
@@ -54,15 +54,15 @@ digraph process {
|
||||
"Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [shape=box];
|
||||
"Task reviewer reports spec ✅ and quality approved?" [shape=diamond];
|
||||
"Dispatch fix subagent for Critical/Important findings" [shape=box];
|
||||
"Mark task complete in todo list" [shape=box];
|
||||
"Mark task complete in todo list and progress ledger" [shape=box];
|
||||
}
|
||||
|
||||
"Read plan, extract all tasks with full text, note context, create todos" [shape=box];
|
||||
"Read plan, note context and global constraints, create todos" [shape=box];
|
||||
"More tasks remain?" [shape=diamond];
|
||||
"Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [shape=box];
|
||||
"Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
|
||||
|
||||
"Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
||||
"Read plan, note context and global constraints, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
||||
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
|
||||
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
|
||||
"Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
||||
@@ -71,8 +71,8 @@ digraph process {
|
||||
"Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" -> "Task reviewer reports spec ✅ and quality approved?";
|
||||
"Task reviewer reports spec ✅ and quality approved?" -> "Dispatch fix subagent for Critical/Important findings" [label="no"];
|
||||
"Dispatch fix subagent for Critical/Important findings" -> "Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [label="re-review"];
|
||||
"Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list" [label="yes"];
|
||||
"Mark task complete in todo list" -> "More tasks remain?";
|
||||
"Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list and progress ledger" [label="yes"];
|
||||
"Mark task complete in todo list and progress ledger" -> "More tasks remain?";
|
||||
"More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
|
||||
"More tasks remain?" -> "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [label="no"];
|
||||
"Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" -> "Use superpowers:finishing-a-development-branch";
|
||||
@@ -161,14 +161,74 @@ final whole-branch review. When you fill a reviewer template:
|
||||
the commit list, stat summary, and full diff with context in one Read
|
||||
call. Use the BASE you recorded before dispatching the implementer —
|
||||
never `HEAD~1`, which silently truncates multi-commit tasks.
|
||||
- A dispatch prompt describes one task, not the session's history. Do not
|
||||
paste accumulated prior-task summaries ("state after Tasks 1-3") into
|
||||
later dispatches — a real session's dispatch hit 42k chars of which 99%
|
||||
was pasted history. A fresh subagent needs its task, the interfaces it
|
||||
touches, and the global constraints. Nothing else.
|
||||
- Dispatch fix subagents for Critical and Important findings. Record Minor
|
||||
findings and move on — then paste the accumulated Minor findings into the
|
||||
final whole-branch review dispatch so it can triage which must be fixed
|
||||
findings in the progress ledger as you go, and point the final
|
||||
whole-branch review at that list so it can triage which must be fixed
|
||||
before merge. A roll-up nobody reads is a silent discard.
|
||||
- The final whole-branch review gets a package too: run
|
||||
`scripts/review-package MERGE_BASE HEAD` (MERGE_BASE = the commit the
|
||||
branch started from, e.g. `git merge-base main HEAD`) and include the
|
||||
printed path in the final review dispatch, so the final reviewer reads
|
||||
one file instead of re-deriving the branch diff with git commands.
|
||||
- Every fix dispatch carries the implementer contract: the fix subagent
|
||||
re-runs the tests covering its change and reports the results. A fix
|
||||
report without test evidence is incomplete — do not re-review on top of
|
||||
it.
|
||||
re-runs the tests covering its change and reports the results. Name the
|
||||
covering test files in the dispatch — a one-line fix does not need the
|
||||
whole suite. A fix report without test evidence is incomplete — do not
|
||||
re-review on top of it.
|
||||
- If the final whole-branch review returns findings, dispatch ONE fix
|
||||
subagent with the complete findings list — not one fixer per finding.
|
||||
Per-finding fixers each rebuild context and re-run suites; a real
|
||||
session's final-review fix wave cost more than all its tasks combined.
|
||||
|
||||
## File Handoffs
|
||||
|
||||
Everything you paste into a dispatch prompt — and everything a subagent
|
||||
prints back — stays resident in your context for the rest of the session
|
||||
and is re-read on every later turn. Hand artifacts over as files:
|
||||
|
||||
- **Task brief:** before dispatching an implementer, run this skill's
|
||||
`scripts/task-brief PLAN_FILE N` — it extracts the task's full text to a
|
||||
uniquely named file and prints the path. Compose the dispatch so the
|
||||
brief stays the single source of requirements. Your dispatch should
|
||||
contain: (1) one line on where this task fits in the project; (2) the
|
||||
brief path, introduced as "read this first — it is your requirements,
|
||||
with the exact values to use verbatim"; (3) interfaces and decisions
|
||||
from earlier tasks that the brief cannot know; (4) your resolution of
|
||||
any ambiguity you noticed in the brief; (5) the report-file path and
|
||||
report contract. Exact values (numbers, magic strings, signatures, test
|
||||
cases) appear only in the brief.
|
||||
- **Report file:** name the implementer's report file after the brief
|
||||
(brief `…/task-N-brief.md` → report `…/task-N-report.md`) and put it in
|
||||
the dispatch prompt. The implementer writes the full report there and
|
||||
returns only status, commits, a one-line test summary, and concerns.
|
||||
- **Reviewer inputs:** the task reviewer gets three paths — the same brief
|
||||
file, the report file, and the review package — plus the global
|
||||
constraints that bind the task.
|
||||
- Fix dispatches append their fix report (with test results) to the same
|
||||
report file and return a short summary; re-reviews read the updated file.
|
||||
|
||||
## Durable Progress
|
||||
|
||||
Conversation memory does not survive compaction. In real sessions,
|
||||
controllers that lost their place have re-dispatched entire completed task
|
||||
sequences — the single most expensive failure observed. Track progress in
|
||||
a ledger file, not only in todos.
|
||||
|
||||
- At skill start, check for a ledger:
|
||||
`cat "$(git rev-parse --git-path sdd)/progress.md"`. Tasks listed there
|
||||
as complete are DONE — do not re-dispatch them; resume at the first task
|
||||
not marked complete.
|
||||
- When a task's review comes back clean, append one line to the ledger in
|
||||
the same message as your other bookkeeping:
|
||||
`Task N: complete (commits <base7>..<head7>, review clean)`.
|
||||
- The ledger is your recovery map: the commits it names exist in git even
|
||||
when your context no longer remembers creating them. After compaction,
|
||||
trust the ledger and `git log` over your own recollection.
|
||||
|
||||
## Prompt Templates
|
||||
|
||||
@@ -182,13 +242,11 @@ final whole-branch review. When you fill a reviewer template:
|
||||
You: I'm using Subagent-Driven Development to execute this plan.
|
||||
|
||||
[Read plan file once: docs/superpowers/plans/feature-plan.md]
|
||||
[Extract all 5 tasks with full text and context]
|
||||
[Create todos for all tasks]
|
||||
|
||||
Task 1: Hook installation script
|
||||
|
||||
[Get Task 1 text and context (already extracted)]
|
||||
[Dispatch implementation subagent with full task text + context]
|
||||
[Run task-brief for Task 1; dispatch implementer with brief + report paths + context]
|
||||
|
||||
Implementer: "Before I begin - should the hook be installed at user or system level?"
|
||||
|
||||
@@ -209,8 +267,7 @@ Task reviewer: Spec ✅ - all requirements met, nothing extra.
|
||||
|
||||
Task 2: Recovery modes
|
||||
|
||||
[Get Task 2 text and context (already extracted)]
|
||||
[Dispatch implementation subagent with full task text + context]
|
||||
[Run task-brief for Task 2; dispatch implementer with brief + report paths + context]
|
||||
|
||||
Implementer: [No questions, proceeds]
|
||||
Implementer:
|
||||
@@ -256,8 +313,8 @@ Done!
|
||||
- Review checkpoints automatic
|
||||
|
||||
**Efficiency gains:**
|
||||
- No file reading overhead (controller provides full text)
|
||||
- Controller curates exactly what context is needed
|
||||
- Controller curates exactly what context is needed; bulk artifacts move
|
||||
as files, not pasted text
|
||||
- Subagent gets complete information upfront
|
||||
- Questions surfaced before work begins (not after)
|
||||
|
||||
@@ -281,7 +338,8 @@ Done!
|
||||
- Skip task review, or accept a report missing either verdict (spec compliance AND task quality are both required)
|
||||
- Proceed with unfixed issues
|
||||
- Dispatch multiple implementation subagents in parallel (conflicts)
|
||||
- Make subagent read plan file (provide full text instead)
|
||||
- Make a subagent read the whole plan file (hand it its task brief —
|
||||
`scripts/task-brief` — instead)
|
||||
- Skip scene-setting context (subagent needs to understand where task fits)
|
||||
- Ignore subagent questions (answer before letting them proceed)
|
||||
- Accept "close enough" on spec compliance (reviewer found spec issues = not done)
|
||||
@@ -294,6 +352,8 @@ Done!
|
||||
(`scripts/review-package BASE HEAD`) and name the printed path in the
|
||||
prompt
|
||||
- Move to next task while the review has open Critical/Important issues
|
||||
- Re-dispatch a task the progress ledger already marks complete — check
|
||||
the ledger (and `git log`) after any compaction or resume
|
||||
|
||||
**If subagent asks questions:**
|
||||
- Answer clearly and completely
|
||||
|
||||
@@ -5,12 +5,15 @@ Use this template when dispatching an implementer subagent.
|
||||
```
|
||||
Subagent (general-purpose):
|
||||
description: "Implement Task N: [task name]"
|
||||
model: [MODEL — REQUIRED: choose per SKILL.md Model Selection; an omitted
|
||||
model silently inherits the session's most expensive one]
|
||||
prompt: |
|
||||
You are implementing Task N: [task name]
|
||||
|
||||
## Task Description
|
||||
|
||||
[FULL TEXT of task from plan - paste it here, don't make subagent read file]
|
||||
Read your task brief first: [BRIEF_FILE]
|
||||
It contains the full task text from the plan.
|
||||
|
||||
## Context
|
||||
|
||||
@@ -104,13 +107,12 @@ Subagent (general-purpose):
|
||||
## After Review Findings
|
||||
|
||||
If a reviewer finds issues and you fix them, re-run the tests that cover
|
||||
the amended code and include the results in your fix report. Reviewers
|
||||
the amended code and append the results to your report file. Reviewers
|
||||
will not re-run tests for you — your report is the test evidence.
|
||||
|
||||
## Report Format
|
||||
|
||||
When done, report:
|
||||
- **Status:** DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT
|
||||
Write your full report to [REPORT_FILE]:
|
||||
- What you implemented (or what you attempted, if blocked)
|
||||
- What you tested and test results
|
||||
- **TDD Evidence** (if TDD was required for this task):
|
||||
@@ -120,6 +122,17 @@ Subagent (general-purpose):
|
||||
- Self-review findings (if any)
|
||||
- Any issues or concerns
|
||||
|
||||
Then report back with ONLY (under 15 lines — the detail lives in the
|
||||
report file):
|
||||
- **Status:** DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT
|
||||
- Commits created (short SHA + subject)
|
||||
- One-line test summary (e.g. "14/14 passing, output pristine")
|
||||
- Your concerns, if any
|
||||
- The report file path
|
||||
|
||||
If BLOCKED or NEEDS_CONTEXT, put the specifics in the final message
|
||||
itself — the controller acts on it directly.
|
||||
|
||||
Use DONE_WITH_CONCERNS if you completed the work but have doubts about correctness.
|
||||
Use BLOCKED if you cannot complete the task. Use NEEDS_CONTEXT if you need
|
||||
information that wasn't provided. Never silently produce work you're unsure about.
|
||||
|
||||
42
skills/subagent-driven-development/scripts/task-brief
Executable file
42
skills/subagent-driven-development/scripts/task-brief
Executable file
@@ -0,0 +1,42 @@
|
||||
#!/usr/bin/env bash
|
||||
# Extract one task's full text from an implementation plan into a file the
|
||||
# implementer reads in one call, so the task text never has to be pasted
|
||||
# through the controller's context.
|
||||
#
|
||||
# Usage: task-brief PLAN_FILE TASK_NUMBER [OUTFILE]
|
||||
# Default OUTFILE: <git-dir>/sdd/task-<N>-brief.md — unique per repo
|
||||
# instance, so concurrent sessions cannot collide.
|
||||
set -euo pipefail
|
||||
|
||||
if [ $# -lt 2 ] || [ $# -gt 3 ]; then
|
||||
echo "usage: task-brief PLAN_FILE TASK_NUMBER [OUTFILE]" >&2
|
||||
exit 2
|
||||
fi
|
||||
|
||||
plan=$1
|
||||
n=$2
|
||||
[ -f "$plan" ] || { echo "no such plan file: $plan" >&2; exit 2; }
|
||||
|
||||
if [ $# -eq 3 ]; then
|
||||
out=$3
|
||||
else
|
||||
dir=$(git rev-parse --git-path sdd)
|
||||
mkdir -p "$dir"
|
||||
dir=$(cd "$dir" && pwd)
|
||||
out="$dir/task-${n}-brief.md"
|
||||
fi
|
||||
|
||||
awk -v n="$n" '
|
||||
/^```/ { infence = !infence }
|
||||
!infence && /^#+[ \t]+Task[ \t]+[0-9]+/ {
|
||||
intask = ($0 ~ ("^#+[ \t]+Task[ \t]+" n "([^0-9]|$)"))
|
||||
}
|
||||
intask { print }
|
||||
' "$plan" > "$out"
|
||||
|
||||
if [ ! -s "$out" ]; then
|
||||
echo "task ${n} not found in ${plan} (no heading matching 'Task ${n}')" >&2
|
||||
exit 3
|
||||
fi
|
||||
|
||||
echo "wrote ${out}: $(wc -l < "$out" | tr -d ' ') lines"
|
||||
@@ -10,6 +10,8 @@ more, nothing less) and is well-built (clean, tested, maintainable)
|
||||
```
|
||||
Subagent (general-purpose):
|
||||
description: "Review Task N (spec + quality)"
|
||||
model: [MODEL — REQUIRED: choose per SKILL.md Model Selection; an omitted
|
||||
model silently inherits the session's most expensive one]
|
||||
prompt: |
|
||||
You are reviewing one task's implementation: first whether it matches its
|
||||
requirements, then whether it is well-built. This is a task-scoped gate,
|
||||
@@ -18,11 +20,14 @@ Subagent (general-purpose):
|
||||
|
||||
## What Was Requested
|
||||
|
||||
[TASK_REQUIREMENTS]
|
||||
Read the task brief: [BRIEF_FILE]
|
||||
|
||||
Global constraints from the spec/design that bind this task:
|
||||
[GLOBAL_CONSTRAINTS]
|
||||
|
||||
## What the Implementer Claims They Built
|
||||
|
||||
[DESCRIPTION]
|
||||
Read the implementer's report: [REPORT_FILE]
|
||||
|
||||
## Diff Under Review
|
||||
|
||||
@@ -32,14 +37,17 @@ Subagent (general-purpose):
|
||||
|
||||
Read the diff file once — it contains the commit list, a stat summary,
|
||||
and the full diff with surrounding context, and it is your view of the
|
||||
change. Do not re-run git commands or re-read the files it already
|
||||
shows. If the diff file is missing, fetch the diff yourself:
|
||||
change. The diff's context lines ARE the changed files: do not Read a
|
||||
changed file separately unless a hunk you must judge is cut off
|
||||
mid-function — and say so in your report. Do not re-run git commands.
|
||||
If the diff file is missing, fetch the diff yourself:
|
||||
`git diff --stat [BASE_SHA]..[HEAD_SHA]` and `git diff [BASE_SHA]..[HEAD_SHA]`.
|
||||
Only read files in this diff. Do not crawl the broader codebase. Inspect
|
||||
code outside the diff only to evaluate a concrete risk you can name — and
|
||||
name it in your report. Cross-cutting changes are legitimate named risks:
|
||||
if the diff changes lock ordering, a function or API contract, or shared
|
||||
mutable state, checking the call sites is the right method.
|
||||
Do not crawl the broader codebase. Inspect code outside the diff only
|
||||
to evaluate a concrete risk you can name — one focused check per named
|
||||
risk, and name both the risk and what you checked in your report.
|
||||
Cross-cutting changes are legitimate named risks: if the diff changes
|
||||
lock ordering, a function or API contract, or shared mutable state,
|
||||
checking the call sites is the right method.
|
||||
|
||||
Your review is read-only on this checkout. Do not mutate the working
|
||||
tree, the index, HEAD, or branch state in any way.
|
||||
@@ -147,10 +155,13 @@ Subagent (general-purpose):
|
||||
```
|
||||
|
||||
**Placeholders:**
|
||||
- `[TASK_REQUIREMENTS]` — full task text plus the spec/design's global
|
||||
constraints that bind it (version floors, naming and copy rules, platform
|
||||
requirements)
|
||||
- `[DESCRIPTION]` — what the implementer reports they built
|
||||
- `[MODEL]` — REQUIRED: reviewer model per SKILL.md Model Selection
|
||||
- `[BRIEF_FILE]` — REQUIRED: the task brief file (`scripts/task-brief PLAN N`
|
||||
prints the path; same file the implementer worked from)
|
||||
- `[GLOBAL_CONSTRAINTS]` — the spec/design's global constraints that bind
|
||||
this task (version floors, naming and copy rules, platform requirements)
|
||||
- `[REPORT_FILE]` — REQUIRED: the file the implementer wrote its detailed
|
||||
report to
|
||||
- `[BASE_SHA]` — commit before this task
|
||||
- `[HEAD_SHA]` — current commit
|
||||
- `[DIFF_FILE]` — REQUIRED: the path the controller wrote the review
|
||||
|
||||
Reference in New Issue
Block a user