mirror of https://github.com/obra/superpowers.git synced 2026-06-10 20:59:05 +08:00

Jesse Vincent 5e2907fc4f Close the Minor-severity escape hatch

With merged review, a planted verbatim-duplication defect shipped: the
reviewer rated it Minor (YAGNI) under the strict cannot-be-trusted
definition of Important, and the Minor-rolls-up rule meant no fix was
ever dispatched and the final review never saw the finding. Calibration
now names merge-blocking maintainability damage (verbatim duplication,
swallowed errors, assertion-free tests) as Important, and controllers
must paste accumulated Minor findings into the final review dispatch.

2026-06-10 02:09:10 -07:00

name	description
subagent-driven-development	Use when executing implementation plans with independent tasks in the current session

Subagent-Driven Development

Execute a plan by dispatching a fresh implementer subagent per task, a combined task review (spec compliance + code quality, one reviewer, one reading of the diff) after each, and a broad whole-branch review at the end.

Why subagents: You delegate tasks to specialized agents with isolated context. By precisely crafting their instructions and context, you ensure they stay focused and succeed at their task. They should never inherit your session's context or history — you construct exactly what they need. This also preserves your own context for coordination work.

Core principle: Fresh subagent per task + one task review (spec + quality verdicts) + broad final review = high quality, fast iteration

Continuous execution: Do not pause to check in with your human partner between tasks. Execute all tasks from the plan without stopping. The only reasons to stop are: BLOCKED status you cannot resolve, ambiguity that genuinely prevents progress, or all tasks complete. "Should I continue?" prompts and progress summaries waste their time — they asked you to execute the plan, so execute it.

When to Use

digraph when_to_use {
    "Have implementation plan?" [shape=diamond];
    "Tasks mostly independent?" [shape=diamond];
    "Stay in this session?" [shape=diamond];
    "subagent-driven-development" [shape=box];
    "executing-plans" [shape=box];
    "Manual execution or brainstorm first" [shape=box];

    "Have implementation plan?" -> "Tasks mostly independent?" [label="yes"];
    "Have implementation plan?" -> "Manual execution or brainstorm first" [label="no"];
    "Tasks mostly independent?" -> "Stay in this session?" [label="yes"];
    "Tasks mostly independent?" -> "Manual execution or brainstorm first" [label="no - tightly coupled"];
    "Stay in this session?" -> "subagent-driven-development" [label="yes"];
    "Stay in this session?" -> "executing-plans" [label="no - parallel session"];
}
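
The decision graph above can also be read as three checks made in order. The following is a minimal sketch of that reading, not part of the skill itself; the function name `choose_workflow` and its parameter names are hypothetical, while the returned strings are the box labels from the graph.

```python
def choose_workflow(has_plan: bool, tasks_mostly_independent: bool,
                    stay_in_session: bool) -> str:
    """Walk the three decision nodes in order and return the box that is reached."""
    # No plan, or tightly coupled tasks: both edges lead to the same box.
    if not has_plan or not tasks_mostly_independent:
        return "Manual execution or brainstorm first"
    # Plan exists and tasks are mostly independent: the session choice decides.
    if stay_in_session:
        return "subagent-driven-development"
    return "executing-plans"


if __name__ == "__main__":
    print(choose_workflow(True, True, True))
    print(choose_workflow(True, True, False))
```

The order of the checks matters only for readability: both "no" branches from the first two diamonds end at the same box.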

vs. Executing Plans (parallel session):

Same session (no context switch)
Fresh subagent per task (no context pollution)
Combined review after each task (spec compliance + code quality verdicts), broad review at the end
Faster iteration (no human-in-loop between tasks)

The Process

digraph process {
    rankdir=TB;

    subgraph cluster_per_task {
        label="Per Task";
        "Dispatch implementer subagent (./implementer-prompt.md)" [shape=box];
        "Implementer subagent asks questions?" [shape=diamond];
        "Answer questions, provide context" [shape=box];
        "Implementer subagent implements, tests, commits, self-reviews" [shape=box];
        "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [shape=box];
        "Task reviewer reports spec ✅ and quality approved?" [shape=diamond];
        "Dispatch fix subagent for Critical/Important findings" [shape=box];
        "Mark task complete in todo list" [shape=box];
    }

    "Read plan, extract all tasks with full text, note context, create todos" [shape=box];
    "More tasks remain?" [shape=diamond];
    "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [shape=box];
    "Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];

    "Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
    "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
    "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
    "Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
    "Implementer subagent asks questions?" -> "Implementer subagent implements, tests, commits, self-reviews" [label="no"];
    "Implementer subagent implements, tests, commits, self-reviews" -> "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)";
    "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)" -> "Task reviewer reports spec ✅ and quality approved?";
    "Task reviewer reports spec ✅ and quality approved?" -> "Dispatch fix subagent for Critical/Important findings" [label="no"];
    "Dispatch fix subagent for Critical/Important findings" -> "Run git diff, dispatch task reviewer subagent (./task-reviewer-prompt.md)" [label="re-review"];
    "Task reviewer reports spec ✅ and quality approved?" -> "Mark task complete in todo list" [label="yes"];
    "Mark task complete in todo list" -> "More tasks remain?";
    "More tasks remain?" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="yes"];
    "More tasks remain?" -> "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" [label="no"];
    "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" -> "Use superpowers:finishing-a-development-branch";
}
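
As a rough illustration of the process graph above, the per-task loop could be sketched like this. Everything here is an assumption made for the example: the `Review` record, the `run_plan` function, and the four callables stand in for whatever dispatch mechanism the controller actually uses, and the question-answering step before implementation is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Hypothetical shape of a task reviewer's report."""
    spec_ok: bool                                        # spec compliance verdict
    quality_approved: bool                               # code quality verdict
    blocking: List[str] = field(default_factory=list)    # Critical/Important findings
    minor: List[str] = field(default_factory=list)       # recorded, not fixed per task


def run_plan(tasks: List[str],
             implement: Callable[[str], None],
             review: Callable[[str], Review],
             fix: Callable[[str, List[str]], None],
             final_review: Callable[[List[str]], object]) -> object:
    """Implement, review, and fix each task in turn, then run the final review."""
    accumulated_minor: List[str] = []

    def record_minor(report: Review) -> None:
        # Keep every Minor finding once, in the order it was first reported.
        for finding in report.minor:
            if finding not in accumulated_minor:
                accumulated_minor.append(finding)

    for task in tasks:                      # one task at a time, never in parallel
        implement(task)                     # fresh implementer subagent per task
        report = review(task)               # controller pastes the diff into this dispatch
        record_minor(report)
        while not (report.spec_ok and report.quality_approved):
            fix(task, report.blocking)      # fix subagent for Critical/Important findings
            report = review(task)           # re-review after every fix
            record_minor(report)
    # The final whole-branch review receives the accumulated Minor findings.
    return final_review(accumulated_minor)
```

A task only leaves the inner loop when both verdicts pass, which matches the "re-review" edge in the graph.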

Model Selection

Use the least powerful model that can handle each role to conserve cost and increase speed.

Mechanical implementation tasks (isolated functions, clear specs, 1-2 files): use a fast, cheap model. Most implementation tasks are mechanical when the plan is well-specified.

Integration and judgment tasks (multi-file coordination, pattern matching, debugging): use a standard model.

Architecture and design tasks: use the most capable available model.

Review tasks: choose the model with the same judgment, scaled to the diff's size, complexity, and risk. A small mechanical diff does not need the most capable model; a subtle concurrency change does.

Always specify the model explicitly when dispatching a subagent. An omitted model inherits your session's model — often the most capable and most expensive — which silently defeats this section.

Turn count beats token price. Wall-clock and context cost scale with how many turns a subagent takes, and the cheapest models routinely take 2-3× the turns on multi-step work — costing more overall. Use a mid-tier model as the floor for implementers and reviewers; reserve the cheapest tier for single-file mechanical fixes.

Task complexity signals (implementation tasks):

Touches 1-2 files with a complete spec → cheap model
Touches multiple files with integration concerns → standard model
Requires design judgment or broad codebase understanding → most capable model
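
The three complexity signals above could be encoded as a small lookup, shown here as a sketch under stated assumptions. The function name, parameter names, and tier labels are hypothetical. Treating a 1-2 file task without a complete spec as "standard" is also an assumption, since the list does not cover that case. The mid-tier floor from the turn-count paragraph is not modeled.

```python
def pick_model_tier(files_touched: int, spec_complete: bool,
                    needs_design_judgment: bool) -> str:
    """Map the task complexity signals to a model tier label."""
    # Design judgment or broad codebase understanding outranks the other signals.
    if needs_design_judgment:
        return "most-capable"
    # 1-2 files with a complete spec: mechanical work.
    if files_touched <= 2 and spec_complete:
        return "cheap"
    # Multiple files, or an incomplete spec (assumption): standard tier.
    return "standard"


if __name__ == "__main__":
    print(pick_model_tier(files_touched=1, spec_complete=True,
                          needs_design_judgment=False))
```

Whatever tier comes back, the controller would still pass it explicitly when dispatching, since an omitted model inherits the session's model.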

Handling Implementer Status

Implementer subagents report one of four statuses. Handle each appropriately:

DONE: Run git diff BASE..HEAD, then dispatch the task reviewer.

DONE_WITH_CONCERNS: The implementer completed the work but flagged doubts. Read the concerns before proceeding. If the concerns are about correctness or scope, address them before review. If they're observations (e.g., "this file is getting large"), note them and proceed to review.

NEEDS_CONTEXT: The implementer needs information that wasn't provided. Provide the missing context and re-dispatch.

BLOCKED: The implementer cannot complete the task. Assess the blocker:

If it's a context problem, provide more context and re-dispatch with the same model
If the task requires more reasoning, re-dispatch with a more capable model
If the task is too large, break it into smaller pieces
If the plan itself is wrong, escalate to the human

Never ignore an escalation or force the same model to retry without changes. If the implementer said it's stuck, something needs to change.
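
The status handling above amounts to a dispatch table. A minimal sketch follows, assuming the controller has already classified a BLOCKED report into one of the four blocker kinds; the enum names, the `next_action` function, and the returned action strings are illustrative, not an API the skill defines.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    DONE = "DONE"
    DONE_WITH_CONCERNS = "DONE_WITH_CONCERNS"
    NEEDS_CONTEXT = "NEEDS_CONTEXT"
    BLOCKED = "BLOCKED"


class Blocker(Enum):
    CONTEXT = "context problem"
    REASONING = "needs more reasoning"
    TOO_LARGE = "task too large"
    PLAN_WRONG = "plan itself is wrong"


# Every BLOCKED path changes something; none retries the same setup unchanged.
BLOCKED_ACTIONS = {
    Blocker.CONTEXT: "provide more context and re-dispatch with the same model",
    Blocker.REASONING: "re-dispatch with a more capable model",
    Blocker.TOO_LARGE: "break the task into smaller pieces",
    Blocker.PLAN_WRONG: "escalate to the human",
}


def next_action(status: Status,
                concerns_affect_correctness_or_scope: bool = False,
                blocker: Optional[Blocker] = None) -> str:
    """Return the controller's next step for an implementer status report."""
    if status is Status.DONE:
        return "run git diff BASE..HEAD, then dispatch the task reviewer"
    if status is Status.DONE_WITH_CONCERNS:
        if concerns_affect_correctness_or_scope:
            return "address the concerns before review"
        return "note the concerns, then proceed to review"
    if status is Status.NEEDS_CONTEXT:
        return "provide the missing context and re-dispatch"
    # BLOCKED: the blocker must be assessed before anything is re-dispatched.
    if blocker is None:
        raise ValueError("assess the blocker before acting on BLOCKED")
    return BLOCKED_ACTIONS[blocker]
```

Raising on an unassessed blocker is a design choice of the sketch: it makes "retry without changes" impossible to express.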

Handling Reviewer ⚠️ Items

The task reviewer may report "⚠️ Cannot verify from diff" items — requirements that live in unchanged code or span tasks. These do not block the rest of the review, but you must resolve each one yourself before marking the task complete: you hold the plan and cross-task context the reviewer lacks. If you confirm an item is a real gap, treat it as a failed spec review — send it back to the implementer and re-review.

Constructing Reviewer Prompts

Per-task reviews are task-scoped gates. The broad review happens once, at the final whole-branch review. When you fill a reviewer template:

Do not add open-ended directives like "check all uses" or "run race tests if useful" without a concrete, task-specific reason
Do not ask a reviewer to re-run tests the implementer already ran on the same code — the implementer's report carries the test evidence
Do not pre-judge findings for the reviewer — never instruct a reviewer to ignore or not flag a specific issue. If you believe a finding would be a false positive, let the reviewer raise it and adjudicate it in the review loop. If the prompt you are writing contains "do not flag," "don't treat X as a defect," "at most Minor," or "the plan chose" — stop: you are pre-judging, usually to spare yourself a review loop.
Include the spec/design's global constraints that bind the task (version floors, naming and copy rules, platform requirements) in the requirements you paste — a reviewer can only enforce what you hand them.
Run git diff BASE..HEAD yourself and paste the output into the reviewer prompt (--stat plus the relevant hunks if it exceeds a few hundred lines). A reviewer with the diff in hand needs few or no tool calls; do not make reviewers re-derive the diff.
Dispatch fix subagents for Critical and Important findings. Record Minor findings and move on — then paste the accumulated Minor findings into the final whole-branch review dispatch so it can triage which must be fixed before merge. A roll-up nobody reads is a silent discard.
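
One way to catch pre-judging before a reviewer prompt is dispatched is a simple phrase check. The sketch below looks only for the four phrases quoted in the list above; the function name `find_prejudging` and the regular expressions are assumptions made for illustration, and a clean result does not prove a prompt is free of pre-judging.

```python
import re
from typing import List

# The four warning phrases quoted in the guidance, as case-insensitive patterns.
# "X" in "don't treat X as a defect" is matched as any non-empty text.
PREJUDGING_PATTERNS = [
    r"do not flag",
    r"don't treat .+? as a defect",
    r"at most minor",
    r"the plan chose",
]


def find_prejudging(prompt: str) -> List[str]:
    """Return the first matching text for each warning pattern found in the prompt."""
    hits = []
    for pattern in PREJUDGING_PATTERNS:
        match = re.search(pattern, prompt, flags=re.IGNORECASE)
        if match:
            hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    draft = "Review the diff. Do not flag the duplicated helper."
    print(find_prejudging(draft))
```

A non-empty result is a prompt to stop and rewrite the dispatch, not something to suppress; the finding should reach the reviewer and be adjudicated in the review loop.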

Prompt Templates

implementer-prompt.md - Dispatch implementer subagent
task-reviewer-prompt.md - Dispatch task reviewer subagent (spec compliance + code quality, one dispatch)
Final whole-branch review: use superpowers:requesting-code-review's code-reviewer.md

Example Workflow

You: I'm using Subagent-Driven Development to execute this plan.

[Read plan file once: docs/superpowers/plans/feature-plan.md]
[Extract all 5 tasks with full text and context]
[Create todos for all tasks]

Task 1: Hook installation script

[Get Task 1 text and context (already extracted)]
[Dispatch implementation subagent with full task text + context]

Implementer: "Before I begin - should the hook be installed at user or system level?"

You: "User level (~/.config/superpowers/hooks/)"

Implementer: "Got it. Implementing now..."
[Later] Implementer:
  - Implemented install-hook command
  - Added tests, 5/5 passing
  - Self-review: Found I missed --force flag, added it
  - Committed

[Run git diff, dispatch task reviewer with the diff pasted in]
Task reviewer: Spec ✅ - all requirements met, nothing extra.
  Strengths: Good test coverage, clean. Issues: None. Task quality: Approved.

[Mark Task 1 complete]

Task 2: Recovery modes

[Get Task 2 text and context (already extracted)]
[Dispatch implementation subagent with full task text + context]

Implementer: [No questions, proceeds]
Implementer:
  - Added verify/repair modes
  - 8/8 tests passing
  - Self-review: All good
  - Committed

[Run git diff, dispatch task reviewer with the diff pasted in]
Task reviewer: Spec ❌:
  - Missing: Progress reporting (spec says "report every 100 items")
  - Extra: Added --json flag (not requested)
  Issues (Important): Magic number (100)

[Dispatch fix subagent with all findings]
Fixer: Removed --json flag, added progress reporting, extracted PROGRESS_INTERVAL constant

[Task reviewer reviews again]
Task reviewer: Spec ✅. Task quality: Approved.

[Mark Task 2 complete]

...

[After all tasks]
[Dispatch final code-reviewer]
Final reviewer: All requirements met, ready to merge

Done!

Advantages

vs. Manual execution:

Subagents follow TDD naturally
Fresh context per task (no confusion)
Parallel-safe (subagents don't interfere)
Subagent can ask questions (before AND during work)

vs. Executing Plans:

Same session (no handoff)
Continuous progress (no waiting)
Automatic review checkpoints

Efficiency gains:

No file reading overhead (controller provides full text)
Controller curates exactly what context is needed
Subagent gets complete information upfront
Questions surfaced before work begins (not after)

Quality gates:

Self-review catches issues before handoff
Task review carries two verdicts: spec compliance and code quality
Review loops ensure fixes actually work
Spec compliance prevents over/under-building
Code quality ensures implementation is well-built

Cost:

More subagent invocations (implementer + reviewer per task)
Controller does more prep work (extracting all tasks upfront)
Review loops add iterations
But catches issues early (cheaper than debugging later)

Red Flags

Never:

Start implementation on main/master branch without explicit user consent
Skip task review, or accept a report missing either verdict (spec compliance AND task quality are both required)
Proceed with unfixed issues
Dispatch multiple implementation subagents in parallel (conflicts)
Make subagent read plan file (provide full text instead)
Skip scene-setting context (subagent needs to understand where task fits)
Ignore subagent questions (answer before letting them proceed)
Accept "close enough" on spec compliance (reviewer found spec issues = not done)
Skip review loops (reviewer found issues = implementer fixes = review again)
Let implementer self-review replace actual review (both are needed)
Tell a reviewer what not to flag, or pre-rate a finding's severity in the dispatch prompt ("treat it as Minor at most") — the plan's example code is a starting point, not evidence that its weaknesses were chosen
Move to next task while the review has open Critical/Important issues

If subagent asks questions:

Answer clearly and completely
Provide additional context if needed
Don't rush them into implementation

If reviewer finds issues:

Implementer (same subagent) fixes them
Reviewer reviews again
Repeat until approved
Don't skip the re-review

If subagent fails task:

Dispatch fix subagent with specific instructions
Don't try to fix manually (context pollution)

Integration

Required workflow skills:

superpowers:using-git-worktrees - Ensures isolated workspace (creates one or verifies existing)
superpowers:writing-plans - Creates the plan this skill executes
superpowers:requesting-code-review - Code review template for the final whole-branch review
superpowers:finishing-a-development-branch - Complete development after all tasks

Subagents should use:

superpowers:test-driven-development - Subagents follow TDD for each task

Alternative workflow:

superpowers:executing-plans - Use for parallel session instead of same-session execution
