Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget

Validated 2026-06-10 (all gates pass): go-fractals 54.1-54.7 min / $12.81-14.31
(baseline 64.9 / $16.07); svelte-todo 55.0 min / 19.3M / $14.99 (baseline
79.7 / 27.3M / $20.98); planted-defect pass $2.77. Dispatch-model discipline
3/3 runs after moving model: into the templates as a REQUIRED line.
Full experiment log: evals docs/experiments/2026-06-10-sdd-cost-experiments.md
This commit is contained in:
Jesse Vincent
2026-06-10 13:08:06 -07:00
parent 926096a1d7
commit b81f35bb1e
4 changed files with 161 additions and 35 deletions

View File

@@ -54,15 +54,15 @@ digraph process {
"Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [shape=box];
"Task reviewer reports spec ✅ and quality approved?" [shape=diamond];
"Dispatch fix subagent for Critical/Important findings" [shape=box];
"Mark task complete in todo list" [shape=box];
"Mark task complete in todo list and progress ledger" [shape=box];
}
"Read plan, extract all tasks with full text, note context, create todos" [shape=box];
"Read plan, note context and global constraints, create todos" [shape=box];
"More tasks remain?" [shape=diamond];
"Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [shape=box];
"Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
"Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
"Read plan, note context and global constraints, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
"Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
@@ -71,8 +71,8 @@ digraph process {
"Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" -> "Task reviewer reports spec ✅ and quality approved?";
"Task reviewer reports spec ✅ and quality approved?" -> "Dispatch fix subagent for Critical/Important findings" [label="no"];
"Dispatch fix subagent for Critical/Important findings" -> "Write diff file, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [label="re-review"];
"Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list" [label="yes"];
"Mark task complete in todo list" -> "More tasks remain?";
"Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list and progress ledger" [label="yes"];
"Mark task complete in todo list and progress ledger" -> "More tasks remain?";
"More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
"More tasks remain?" -> "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [label="no"];
"Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" -> "Use superpowers:finishing-a-development-branch";
@@ -161,14 +161,74 @@ final whole-branch review. When you fill a reviewer template:
the commit list, stat summary, and full diff with context in one Read
call. Use the BASE you recorded before dispatching the implementer —
never `HEAD~1`, which silently truncates multi-commit tasks.
- A dispatch prompt describes one task, not the session's history. Do not
paste accumulated prior-task summaries ("state after Tasks 1-3") into
later dispatches — a real session's dispatch hit 42k chars of which 99%
was pasted history. A fresh subagent needs its task, the interfaces it
touches, and the global constraints. Nothing else.
- Dispatch fix subagents for Critical and Important findings. Record Minor
findings and move on — then paste the accumulated Minor findings into the
final whole-branch review dispatch so it can triage which must be fixed
findings in the progress ledger as you go, and point the final
whole-branch review at that list so it can triage which must be fixed
before merge. A roll-up nobody reads is a silent discard.
- The final whole-branch review gets a package too: run
`scripts/review-package MERGE_BASE HEAD` (MERGE_BASE = the commit the
branch started from, e.g. `git merge-base main HEAD`) and include the
printed path in the final review dispatch, so the final reviewer reads
one file instead of re-deriving the branch diff with git commands.
- Every fix dispatch carries the implementer contract: the fix subagent
re-runs the tests covering its change and reports the results. A fix
report without test evidence is incomplete — do not re-review on top of
it.
re-runs the tests covering its change and reports the results. Name the
covering test files in the dispatch — a one-line fix does not need the
whole suite. A fix report without test evidence is incomplete — do not
re-review on top of it.
- If the final whole-branch review returns findings, dispatch ONE fix
subagent with the complete findings list — not one fixer per finding.
Per-finding fixers each rebuild context and re-run suites; a real
session's final-review fix wave cost more than all its tasks combined.
## File Handoffs
Everything you paste into a dispatch prompt — and everything a subagent
prints back — stays resident in your context for the rest of the session
and is re-read on every later turn. Hand artifacts over as files:
- **Task brief:** before dispatching an implementer, run this skill's
`scripts/task-brief PLAN_FILE N` — it extracts the task's full text to a
uniquely named file and prints the path. Compose the dispatch so the
brief stays the single source of requirements. Your dispatch should
contain: (1) one line on where this task fits in the project; (2) the
brief path, introduced as "read this first — it is your requirements,
with the exact values to use verbatim"; (3) interfaces and decisions
from earlier tasks that the brief cannot know; (4) your resolution of
any ambiguity you noticed in the brief; (5) the report-file path and
report contract. Exact values (numbers, magic strings, signatures, test
cases) appear only in the brief.
- **Report file:** name the implementer's report file after the brief
(brief `…/task-N-brief.md` → report `…/task-N-report.md`) and put it in
the dispatch prompt. The implementer writes the full report there and
returns only status, commits, a one-line test summary, and concerns.
- **Reviewer inputs:** the task reviewer gets three paths — the same brief
file, the report file, and the review package — plus the global
constraints that bind the task.
- Fix dispatches append their fix report (with test results) to the same
report file and return a short summary; re-reviews read the updated file.
## Durable Progress
Conversation memory does not survive compaction. In real sessions,
controllers that lost their place have re-dispatched entire completed task
sequences — the single most expensive failure observed. Track progress in
a ledger file, not only in todos.
- At skill start, check for a ledger:
`cat "$(git rev-parse --git-path sdd)/progress.md"`. Tasks listed there
as complete are DONE — do not re-dispatch them; resume at the first task
not marked complete.
- When a task's review comes back clean, append one line to the ledger in
the same message as your other bookkeeping:
`Task N: complete (commits <base7>..<head7>, review clean)`.
- The ledger is your recovery map: the commits it names exist in git even
when your context no longer remembers creating them. After compaction,
trust the ledger and `git log` over your own recollection.
## Prompt Templates
@@ -182,13 +242,11 @@ final whole-branch review. When you fill a reviewer template:
You: I'm using Subagent-Driven Development to execute this plan.
[Read plan file once: docs/superpowers/plans/feature-plan.md]
[Extract all 5 tasks with full text and context]
[Create todos for all tasks]
Task 1: Hook installation script
[Get Task 1 text and context (already extracted)]
[Dispatch implementation subagent with full task text + context]
[Run task-brief for Task 1; dispatch implementer with brief + report paths + context]
Implementer: "Before I begin - should the hook be installed at user or system level?"
@@ -209,8 +267,7 @@ Task reviewer: Spec ✅ - all requirements met, nothing extra.
Task 2: Recovery modes
[Get Task 2 text and context (already extracted)]
[Dispatch implementation subagent with full task text + context]
[Run task-brief for Task 2; dispatch implementer with brief + report paths + context]
Implementer: [No questions, proceeds]
Implementer:
@@ -256,8 +313,8 @@ Done!
- Review checkpoints automatic
**Efficiency gains:**
- No file reading overhead (controller provides full text)
- Controller curates exactly what context is needed
- Controller curates exactly what context is needed; bulk artifacts move
as files, not pasted text
- Subagent gets complete information upfront
- Questions surfaced before work begins (not after)
@@ -281,7 +338,8 @@ Done!
- Skip task review, or accept a report missing either verdict (spec compliance AND task quality are both required)
- Proceed with unfixed issues
- Dispatch multiple implementation subagents in parallel (conflicts)
- Make subagent read plan file (provide full text instead)
- Make a subagent read the whole plan file (hand it its task brief —
`scripts/task-brief` — instead)
- Skip scene-setting context (subagent needs to understand where task fits)
- Ignore subagent questions (answer before letting them proceed)
- Accept "close enough" on spec compliance (reviewer found spec issues = not done)
@@ -294,6 +352,8 @@ Done!
(`scripts/review-package BASE HEAD`) and name the printed path in the
prompt
- Move to next task while the review has open Critical/Important issues
- Re-dispatch a task the progress ledger already marks complete — check
the ledger (and `git log`) after any compaction or resume
**If subagent asks questions:**
- Answer clearly and completely

View File

@@ -5,12 +5,15 @@ Use this template when dispatching an implementer subagent.
```
Subagent (general-purpose):
description: "Implement Task N: [task name]"
model: [MODEL — REQUIRED: choose per SKILL.md Model Selection; an omitted
model silently inherits the session's most expensive one]
prompt: |
You are implementing Task N: [task name]
## Task Description
[FULL TEXT of task from plan - paste it here, don't make subagent read file]
Read your task brief first: [BRIEF_FILE]
It contains the full task text from the plan.
## Context
@@ -104,13 +107,12 @@ Subagent (general-purpose):
## After Review Findings
If a reviewer finds issues and you fix them, re-run the tests that cover
the amended code and include the results in your fix report. Reviewers
the amended code and append the results to your report file. Reviewers
will not re-run tests for you — your report is the test evidence.
## Report Format
When done, report:
- **Status:** DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT
Write your full report to [REPORT_FILE]:
- What you implemented (or what you attempted, if blocked)
- What you tested and test results
- **TDD Evidence** (if TDD was required for this task):
@@ -120,6 +122,17 @@ Subagent (general-purpose):
- Self-review findings (if any)
- Any issues or concerns
Then report back with ONLY (under 15 lines — the detail lives in the
report file):
- **Status:** DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT
- Commits created (short SHA + subject)
- One-line test summary (e.g. "14/14 passing, output pristine")
- Your concerns, if any
- The report file path
If BLOCKED or NEEDS_CONTEXT, put the specifics in the final message
itself — the controller acts on it directly.
Use DONE_WITH_CONCERNS if you completed the work but have doubts about correctness.
Use BLOCKED if you cannot complete the task. Use NEEDS_CONTEXT if you need
information that wasn't provided. Never silently produce work you're unsure about.

View File

@@ -0,0 +1,42 @@
#!/usr/bin/env bash
# Extract one task's full text from an implementation plan into a file the
# implementer reads in one call, so the task text never has to be pasted
# through the controller's context.
#
# Usage: task-brief PLAN_FILE TASK_NUMBER [OUTFILE]
# Default OUTFILE: <git-dir>/sdd/task-<N>-brief.md — unique per repo
# instance, so concurrent sessions cannot collide.
set -euo pipefail
if [ $# -lt 2 ] || [ $# -gt 3 ]; then
echo "usage: task-brief PLAN_FILE TASK_NUMBER [OUTFILE]" >&2
exit 2
fi
plan=$1
n=$2
[ -f "$plan" ] || { echo "no such plan file: $plan" >&2; exit 2; }
if [ $# -eq 3 ]; then
out=$3
else
dir=$(git rev-parse --git-path sdd)
mkdir -p "$dir"
dir=$(cd "$dir" && pwd)
out="$dir/task-${n}-brief.md"
fi
awk -v n="$n" '
/^```/ { infence = !infence }
!infence && /^#+[ \t]+Task[ \t]+[0-9]+/ {
intask = ($0 ~ ("^#+[ \t]+Task[ \t]+" n "([^0-9]|$)"))
}
intask { print }
' "$plan" > "$out"
if [ ! -s "$out" ]; then
echo "task ${n} not found in ${plan} (no heading matching 'Task ${n}')" >&2
exit 3
fi
echo "wrote ${out}: $(wc -l < "$out" | tr -d ' ') lines"

View File

@@ -10,6 +10,8 @@ more, nothing less) and is well-built (clean, tested, maintainable)
```
Subagent (general-purpose):
description: "Review Task N (spec + quality)"
model: [MODEL — REQUIRED: choose per SKILL.md Model Selection; an omitted
model silently inherits the session's most expensive one]
prompt: |
You are reviewing one task's implementation: first whether it matches its
requirements, then whether it is well-built. This is a task-scoped gate,
@@ -18,11 +20,14 @@ Subagent (general-purpose):
## What Was Requested
[TASK_REQUIREMENTS]
Read the task brief: [BRIEF_FILE]
Global constraints from the spec/design that bind this task:
[GLOBAL_CONSTRAINTS]
## What the Implementer Claims They Built
[DESCRIPTION]
Read the implementer's report: [REPORT_FILE]
## Diff Under Review
@@ -32,14 +37,17 @@ Subagent (general-purpose):
Read the diff file once — it contains the commit list, a stat summary,
and the full diff with surrounding context, and it is your view of the
change. Do not re-run git commands or re-read the files it already
shows. If the diff file is missing, fetch the diff yourself:
change. The diff's context lines ARE the changed files: do not Read a
changed file separately unless a hunk you must judge is cut off
mid-function — and say so in your report. Do not re-run git commands.
If the diff file is missing, fetch the diff yourself:
`git diff --stat [BASE_SHA]..[HEAD_SHA]` and `git diff [BASE_SHA]..[HEAD_SHA]`.
Only read files in this diff. Do not crawl the broader codebase. Inspect
code outside the diff only to evaluate a concrete risk you can name — and
name it in your report. Cross-cutting changes are legitimate named risks:
if the diff changes lock ordering, a function or API contract, or shared
mutable state, checking the call sites is the right method.
Do not crawl the broader codebase. Inspect code outside the diff only
to evaluate a concrete risk you can name — one focused check per named
risk, and name both the risk and what you checked in your report.
Cross-cutting changes are legitimate named risks: if the diff changes
lock ordering, a function or API contract, or shared mutable state,
checking the call sites is the right method.
Your review is read-only on this checkout. Do not mutate the working
tree, the index, HEAD, or branch state in any way.
@@ -147,10 +155,13 @@ Subagent (general-purpose):
```
**Placeholders:**
- `[TASK_REQUIREMENTS]` — full task text plus the spec/design's global
constraints that bind it (version floors, naming and copy rules, platform
requirements)
- `[DESCRIPTION]` — what the implementer reports they built
- `[MODEL]` — REQUIRED: reviewer model per SKILL.md Model Selection
- `[BRIEF_FILE]` — REQUIRED: the task brief file (`scripts/task-brief PLAN N`
prints the path; same file the implementer worked from)
- `[GLOBAL_CONSTRAINTS]` — the spec/design's global constraints that bind
this task (version floors, naming and copy rules, platform requirements)
- `[REPORT_FILE]` — REQUIRED: the file the implementer wrote its detailed
report to
- `[BASE_SHA]` — commit before this task
- `[HEAD_SHA]` — current commit
- `[DIFF_FILE]` — REQUIRED: the path the controller wrote the review