Compare commits

..

3 Commits

Author SHA1 Message Date
Jesse Vincent
9d2b0e971d writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks
Builds on #1715's reference discipline: the two structures are framed as
the narrow exceptions for spec content subagents must see (they get the
plan or one task of it, never the spec). Exception wording micro-validated
as PRI-2173 arm B: adopts 6/6 with zero restatement leak (lowest spec-copy
rate of all arms, plans -15% bytes vs dev control); Global Constraints
header elicited 0/5->5/5 with verbatim values, Interfaces 0->100% signature
availability (L1 micros). Value = lens determinism + mechanical extraction
into task briefs and reviewer constraints blocks, plus fix-wave reduction
(1 vs 2-4 in L1 full runs); the values themselves survive reference
discipline by riding code blocks.
2026-06-11 15:48:18 -07:00
Drew Ritter
e5f337b89e fix(skills): plans reference the spec instead of restating it — end to end (SUP-333 A)
Consolidates the reference-discipline change with every consumer of it,
so this PR is independently mergeable (previously split across two
stacked PRs whose intermediate state left the SDD spec reviewer blind).

writing-plans: plans reference the spec — never restate, paraphrase,
or summarize it; spec owns WHAT/WHY, plan owns HOW; cite by path in
the header (**Spec:** template line) and by section where a task needs
context; No Placeholders repetition stays intra-plan; no-spec branch
scoped to conversational-requirements-only (eval-caught: an agent used
an unscoped no-spec branch to skip writing the spec entirely).

brainstorming: spec path loophole closed (claude shortened
docs/superpowers/specs/ to docs/specs/, documented run); an existing
differently-named docs dir is not a "user preference".

subagent-driven-development: Spec Context section — the controller
reads the plan-cited spec and pastes cited sections into implementer
and spec-reviewer prompts; the spec reviewer's diff-only rule gets a
spec-document exception. Without this, reference discipline starves
the pipeline of requirements.

executing-plans: Step 1 reads the spec the plan cites (the
non-subagent path; plans are no longer self-contained).

Eval evidence (quorum, full-stack text): cost-spec-plan-duplication
claude 3/3 pass (RED: 5/5 agents failed), codex pass, pi pass (the
683-line duplication RED agent); sdd-spec-context-consumed functional
pass with deterministic dispatch-prompt check;
writing-plans-no-spec-conversational 2/2 pass;
triggering-writing-plans canary 3/3.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 00:20:49 -07:00
Drew Ritter
0cb1960068 chore(evals): bump submodule for Claude Haiku target 2026-06-10 16:31:16 -07:00
3 changed files with 28 additions and 11 deletions

2
evals

Submodule evals updated: ff3ee83f94...f8e5a9949f

View File

@@ -11,8 +11,6 @@ Execute plan by dispatching fresh subagent per task, with two-stage review after
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
**Proportionality:** Review fanout scales with the change. When the entire plan is one trivial, fully-specified mechanical change — a log statement, a typo fix, a constant bump with no security or behavioral consequences — implement it directly (or with a single implementer subagent), verify per superpowers:verification-before-completion (run the relevant command, confirm output), commit, and skip all review subagents, including the final reviewer: three review dispatches cost more than a one-line diff. Trivial is a property of the diff — it changes no logic, no control flow, and nothing security-relevant — not of the plan's self-description. Any doubt means not trivial: run the full pipeline. Within a multi-task plan, never skip reviews, regardless of task size.
**Continuous execution:** Do not pause to check in with your human partner between tasks. Execute all tasks from the plan without stopping. The only reasons to stop are: BLOCKED status you cannot resolve, ambiguity that genuinely prevents progress, or all tasks complete. "Should I continue?" prompts and progress summaries waste their time — they asked you to execute the plan, so execute it.
## When to Use
@@ -63,16 +61,11 @@ digraph process {
}
"Read plan, extract all tasks with full text, note context, create todos" [shape=box];
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" [shape=diamond];
"Implement directly, verify, commit (no review fanout)" [shape=box];
"More tasks remain?" [shape=diamond];
"Dispatch final code reviewer subagent for entire implementation" [shape=box];
"Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
"Read plan, extract all tasks with full text, note context, create todos" -> "Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)";
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Implement directly, verify, commit (no review fanout)" [label="yes — see Proportionality"];
"Implement directly, verify, commit (no review fanout)" -> "Use superpowers:finishing-a-development-branch";
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="no"];
"Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
"Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
@@ -248,7 +241,7 @@ Done!
**Never:**
- Start implementation on main/master branch without explicit user consent
- Skip reviews — sole exception: a plan that is entirely one trivial change (see Proportionality)
- Skip reviews (spec compliance OR code quality)
- Proceed with unfixed issues
- Dispatch multiple implementation subagents in parallel (conflicts)
- Make subagent read plan file (provide full text instead)

View File

@@ -13,6 +13,8 @@ Assume they are a skilled developer, but know almost nothing about our toolset o
**Plans reference the spec; they never restate, paraphrase, or summarize it.** The spec owns the WHAT and WHY — requirements, acceptance criteria, design decisions; the plan owns the HOW — tasks, files, code, commands. Cite it by path in the header and by section where a task needs context. Reference discipline never means skipping the spec: if brainstorming produced one, it exists and the plan cites it. No Placeholders still requires repeating code and commands WITHIN the plan; copying FROM the spec is different: a step that needs a requirement's prose is under-specified — turn it into a concrete action. Snapshotting spec text into the plan hides drift, not prevents it. "Zero context" means each step is mechanically executable, not that the plan repeats the spec.
**Two narrow exceptions to reference discipline** — subagents executing the plan see the plan (or a single task of it), never the spec, so two kinds of spec content travel in the plan itself: the `## Global Constraints` section (the spec's project-wide requirements, exact values copied verbatim) and each task's `**Interfaces:**` block (exact signatures). Copy those values exactly; everything else stays referenced, never restated.
**Announce at start:** "I'm using the writing-plans skill to create the implementation plan."
**Context:** If working in an isolated worktree, it should have been created via the `superpowers:using-git-worktrees` skill at execution time.
@@ -35,6 +37,15 @@ Before defining tasks, map out which files will be created or modified and what
This structure informs the task decomposition. Each task should produce self-contained changes that make sense independently.
## Task Right-Sizing
A task is the smallest unit that carries its own test cycle and is worth a
fresh reviewer's gate. When drawing task boundaries: fold setup,
configuration, scaffolding, and documentation steps into the task whose
deliverable needs them; split only where a reviewer could meaningfully
reject one task while approving its neighbor. Each task ends with an
independently testable deliverable.
## Bite-Sized Task Granularity
**Each step is one action (2-5 minutes):**
@@ -61,6 +72,13 @@ This structure informs the task decomposition. Each task should produce self-con
**Tech Stack:** [Key technologies/libraries]
## Global Constraints
[The spec's project-wide requirements — version floors, dependency limits,
naming and copy rules, platform requirements — one line each, with exact
values copied verbatim from the spec. Every task's requirements implicitly
include this section.]
---
```
@@ -74,6 +92,12 @@ This structure informs the task decomposition. Each task should produce self-con
- Modify: `exact/path/to/existing.py:123-145`
- Test: `tests/exact/path/to/test.py`
**Interfaces:**
- Consumes: [what this task uses from earlier tasks — exact signatures]
- Produces: [what later tasks rely on — exact function names, parameter
and return types. A task's implementer sees only their own task; this
block is how they learn the names and types neighboring tasks use.]
- [ ] **Step 1: Write the failing test**
```python
@@ -149,7 +173,7 @@ After saving the plan, offer execution choice:
**If Subagent-Driven chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
- Fresh subagent per task + two-stage review (review fanout scales with the change — see that skill's Proportionality rule)
- Fresh subagent per task + two-stage review
**If Inline Execution chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:executing-plans