Compare commits

..

7 Commits

Author SHA1 Message Date
Drew Ritter
60f617489c refine(skills): staff-review round — fix spec-access contradiction, qualify constant bumps
Staff-review findings (4-reviewer panel):
- CONTRADICTION FIX: Spec Context said "Subagents never read the spec
  file themselves" while spec-reviewer-prompt grants exactly that
  access. Now: implementers never read it; the spec reviewer may, at
  the cited path.
- "a constant bump" was an unqualified trivial example — a one-line
  BCRYPT_ROUNDS or session-TTL change is a security-posture change;
  now qualified "with no security or behavioral consequences"
  (matching brainstorming's config-change qualifier). The diff-property
  definition adds "nothing security-relevant".
- Proportionality rewritten 146→~115 words (house style; one statement
  of the multi-task containment instead of two).
- Red Flags Never-line trimmed 33→14 words (pointer to Proportionality
  instead of third in-file restatement).
- Prompt-template rationale tails cut (the controller just read Spec
  Context; subagents need the pasted text, not the policy rationale).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:26:11 -07:00
Drew Ritter
a6ce936ac1 harden(skills): SDD proportionality resists over-use; pipeline consumes cited specs
Adversarial + consistency review findings (B1, B2, B3, B5, F1):
- Red Flags line read literally licensed skipping reviews on trivial
  tasks INSIDE multi-task plans; now states the only exception is a
  whole-plan trivial change and never-skip within multi-task plans.
- "a one-line edit" example blessed one-line behavioral changes
  (e.g. adding "|| user.isOwner"); dropped. Trivial is now defined as
  a property of the diff (no logic/control-flow/behavior change), not
  of the plan's self-description. The "nothing for review to catch"
  justification proved too much; replaced with the cost argument.
- "verify it" was undefined on the trivial path; now concrete (run
  tests/command, confirm output, verification-before-completion).
- Flowchart diamond now matches the prose: "fully-specified" + "any
  doubt = no" (the failing agents execute the flowchart literally).
- New Spec Context section + prompt-template updates: the controller
  reads the spec cited in the plan header and pastes cited sections
  into implementer/spec-reviewer prompts; the spec reviewer's
  diff-only rule gets a spec-document exception. Without this, the
  stack's reference-not-restate rule starves the SDD pipeline of
  requirements.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:24:57 -07:00
Drew Ritter
3c9870febe fix(skills): SDD review fanout scales with the change (SUP-333 #2)
subagent-driven-development mandated implementer + two-stage review +
final reviewer unconditionally — agy and opencode each dispatched 4
subagents for a one-line console.log in the 2026-06-09 quorum sweep
(cost-trivial-task-review-fanout), and the agents that passed did so
only by disobeying the skill.

- Proportionality rule: when the entire plan is one trivial,
  fully-specified mechanical change, implement directly, verify,
  commit — no review fanout. "When in doubt, it is not trivial."
  Within a multi-task plan the full pipeline still applies to every
  task regardless of size.
- Flowchart gets the trivial-exit diamond (the failing agents follow
  the flowchart literally; prose alone would not redirect them).
- Red Flags "never skip reviews" amended to reference the exception so
  the skill does not contradict itself.

TDD evidence (quorum):
- RED: agy 025324Z + opencode batches — 4 dispatches for 1 line
- GREEN: cost-trivial-task-review-fanout-opencode-20260610T002518Z-f3f5
  pass — 0 dispatches, $0.04, change landed on main checkout
- Canary: sdd-rejects-extra-features-claude-20260610T002901Z-458a pass —
  multi-task plan still runs implementer + two-stage review per task
  (tool-called Agent ✓, spec reviewer as YAGNI gate after each task)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:24:57 -07:00
Drew Ritter
81874ec5b1 refine(skills): staff-review round — trim reference rule, close executing-plans spec gap
Staff-review findings (4-reviewer panel):
- Reference paragraph rewritten 170→123 words preserving every
  behavioral condition (paraphrase/summarize coverage, no-skip guard,
  WHAT-WHY/HOW split, No Placeholders boundary, drift counter,
  zero-context rescope); fixes the "(brainstorming did)" syntax.
- **Spec:** header bracket: cut the never-skip sermon duplicated from
  the Overview (same loaded document); the conditional none-branch
  stays.
- executing-plans Step 1 now reads the spec the plan cites — plans are
  no longer self-contained, and the non-subagent execution path was
  never told (the eval only exercised the SDD consumer).
- writing-plans plan-location preference line gets the same
  existing-dir-is-not-a-preference guard as the spec path.
- brainstorming: deduplicate the docs/specs/ prohibition (step 6
  parenthetical stays; After-the-Design bullet was the second
  statement in one file).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:24:57 -07:00
Drew Ritter
49a91fd404 harden(skills): no-spec branch cannot be used to skip writing the spec
Eval-caught regression: the no-spec branch added to the **Spec:**
header gave the agent a sanctioned path to skip the spec doc entirely
("avoiding duplication by skipping the spec" —
cost-spec-plan-duplication-claude-20260610T213934Z-8e5b, fail). The
branch is now scoped: if brainstorming happened the spec exists and
must be cited; "none — requirements:" applies only when requirements
arrived conversationally and no spec doc was ever produced. The
reference-discipline paragraph states the same rule up front.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:49:01 -07:00
Drew Ritter
64d194a08e harden(skills): close paraphrase/no-spec/preference loopholes in plan reference rule
Adversarial review findings (C1, C2, C3, C5, A8, F3):
- "never restate" did not cover paraphrase/summary — the actual failure
  mode in the RED evidence; now "never restate, paraphrase, or summarize".
- The No Placeholders intra-plan repetition mandate gave a symmetric
  argument for re-inlining the spec; the rule now draws the line:
  repetition WITHIN the plan is required, copying FROM the spec is not.
- Drift argument was invertible ("snapshot to avoid drift"); now states
  snapshots hide drift.
- **Spec:** header gets a no-spec branch (state requirements once in
  the header, not per task) instead of inviting "no spec, rule is moot".
- Brainstorming path bullet: an existing differently-named docs dir is
  not a "user preference" override.
- Execution Handoff now notes review fanout scales (forward-ref to
  SDD's Proportionality rule) instead of promising unconditional
  two-stage review.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 14:34:56 -07:00
Drew Ritter
fa07663322 fix(skills): plans reference the spec instead of restating it (SUP-333 #1)
writing-plans told agents to "document everything they need to know"
assuming zero context — every agent in the 2026-06-09 six-agent quorum
sweep obeyed and restated the entire spec inline in the plan
(cost-spec-plan-duplication failed 5/5 completed agents; pi's plan was
683 lines of duplicated spec).

- writing-plans: state the division of labor — spec owns WHAT/WHY,
  plan owns HOW; cite the spec by path/section, never restate it.
  "Zero context" means mechanically executable steps, not duplication.
  Add a **Spec:** line to the plan header template.
- brainstorming: close the path loophole the re-run exposed — claude
  shortened docs/superpowers/specs/ to docs/specs/ in 2/2 runs; both
  path mentions now explicitly forbid the shortening.

TDD evidence (quorum):
- RED: batch-20260609T023452Z-68aa et al — 5/5 agents fail
- GREEN: cost-spec-plan-duplication-claude-20260609T234142Z-9625 pass
  (plan: "this plan does not restate them" + spec cited by path;
  both docs in docs/superpowers/)
- Canary: triggering-writing-plans-claude pass (skill still fires)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-09 16:52:21 -07:00
3 changed files with 11 additions and 4 deletions

2
evals

Submodule evals updated: f8e5a9949f...ff3ee83f94

View File

@@ -11,6 +11,8 @@ Execute plan by dispatching fresh subagent per task, with two-stage review after
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
**Proportionality:** Review fanout scales with the change. When the entire plan is one trivial, fully-specified mechanical change — a log statement, a typo fix, a constant bump with no security or behavioral consequences — implement it directly (or with a single implementer subagent), verify per superpowers:verification-before-completion (run the relevant command, confirm output), commit, and skip all review subagents, including the final reviewer: three review dispatches cost more than a one-line diff. Trivial is a property of the diff — it changes no logic, no control flow, and nothing security-relevant — not of the plan's self-description. Any doubt means not trivial: run the full pipeline. Within a multi-task plan, never skip reviews, regardless of task size.
**Continuous execution:** Do not pause to check in with your human partner between tasks. Execute all tasks from the plan without stopping. The only reasons to stop are: BLOCKED status you cannot resolve, ambiguity that genuinely prevents progress, or all tasks complete. "Should I continue?" prompts and progress summaries waste their time — they asked you to execute the plan, so execute it.
## When to Use
@@ -61,11 +63,16 @@ digraph process {
}
"Read plan, extract all tasks with full text, note context, create todos" [shape=box];
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" [shape=diamond];
"Implement directly, verify, commit (no review fanout)" [shape=box];
"More tasks remain?" [shape=diamond];
"Dispatch final code reviewer subagent for entire implementation" [shape=box];
"Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
"Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
"Read plan, extract all tasks with full text, note context, create todos" -> "Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)";
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Implement directly, verify, commit (no review fanout)" [label="yes — see Proportionality"];
"Implement directly, verify, commit (no review fanout)" -> "Use superpowers:finishing-a-development-branch";
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="no"];
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
"Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
@@ -241,7 +248,7 @@ Done!
**Never:**
- Start implementation on main/master branch without explicit user consent
- Skip reviews (spec compliance OR code quality)
- Skip reviews — sole exception: a plan that is entirely one trivial change (see Proportionality)
- Proceed with unfixed issues
- Dispatch multiple implementation subagents in parallel (conflicts)
- Make subagent read plan file (provide full text instead)

View File

@@ -149,7 +149,7 @@ After saving the plan, offer execution choice:
**If Subagent-Driven chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
- Fresh subagent per task + two-stage review
- Fresh subagent per task + two-stage review (review fanout scales with the change — see that skill's Proportionality rule)
**If Inline Execution chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:executing-plans