refine(skills): reconcile flowchart + EXTREMELY-IMPORTANT with the exception rule; writing-skills carve-out

Staff-review findings (4-reviewer panel): - The skill_flow digraph still routed "yes, even 1%" straight to invoke with no exception branch — and this stack's own evidence says agents follow flowcharts literally. The flow now passes through "Skill's own description exempts this request?" with no/any-doubt → invoke. - The <EXTREMELY-IMPORTANT> block ("you cannot rationalize your way out of this") read unconditional; one parenthetical defers to The Rule's single carve-out without weakening the block. - Trimmed the redundant "the description defines the skill's scope" clause from The Rule paragraph. - writing-skills' "descriptions must not carry process" doctrine would have had a future editor strip the brainstorming exception and silently regress the cost evals; it now distinguishes negative triggering conditions (scope — allowed and, per the routing rule, required at the description) from workflow summaries (still forbidden). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
fix(skills): description-level exceptions are authoritative in the routing rule (SUP-333 #4 )
2026-06-11 13:19:05 +08:00 · 2026-06-10 18:54:15 -07:00 · 2026-06-10 18:54:15 -07:00 · 2026-06-10 18:54:14 -07:00 · 2026-06-10 18:27:46 -07:00 · 2026-06-10 18:26:12 -07:00
6 changed files with 40 additions and 10 deletions
--- a/skills/brainstorming/SKILL.md
+++ b/skills/brainstorming/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: brainstorming
-description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
+description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation. The one exception: a request that leaves zero design decisions open needs no design - implement it directly without invoking this skill (e.g. 'add a basic checkbox, nothing fancy' where the context leaves nothing to choose). Decisions ARE open - so invoke - if the change adds a file or dependency, touches a schema, API contract, or persisted data (even when the user stated the outcome), deletes or disables working functionality (even when the deletion is exactly what was asked), touches security posture at all (auth, sessions, permissions, CORS, crypto - even with the exact value stated), alters user-visible behavior beyond the stated change, has more than one plausible reading, or the user frames it as a feature or project to think through."
 ---

 # Brainstorming Ideas Into Designs
@@ -10,12 +10,22 @@ Help turn ideas into fully formed designs and specs through natural collaborativ
 Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.

 <HARD-GATE>
-Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
+Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity, with exactly one exception.
+
+Exception — nothing to design: when the exception in this skill's description applies (zero open design decisions; its tripwire list puts the gate back on), implement directly. TDD and verification-before-completion still apply. Brainstorming exists to surface decisions; when there are none, the user's request IS the design. Any doubt: the gate holds.
 </HARD-GATE>

 ## Anti-Pattern: "This Is Too Simple To Need A Design"

-Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
+Anything with open decisions goes through this process. A todo list, a single-function utility, a data migration — "simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but if anything remains to decide, you MUST present it and get approval. Do not confuse this with the nothing-to-design exception above: "this seems simple, I'll skip the design" is a rationalization whenever decisions exist.
+
+| Excuse | Reality |
+|--------|---------|
+| "The codebase has an established pattern, so nothing is open" | A pattern answers HOW, not WHETHER or WHAT. Those decisions are still open unless the user made them. |
+| "I can infer the obvious choice" | If there is a choice to infer, a decision is open. Invoke. |
+| "The user said keep it simple / nothing fancy" | A hedge describes the solution's size, not the request's completeness. Check what remains undecided, not the tone. |
+| "Asking would waste the user's time" | One design question costs seconds; an unexamined assumption costs a rewrite. |
+| "The user already made that decision — they told me to delete it" | A requested deletion still has consequences the user may not have weighed (working feature, no usage data, alternatives). Surface them first; the tripwire applies to requested deletions. |

 ## Checklist

--- a/skills/subagent-driven-development/SKILL.md
+++ b/skills/subagent-driven-development/SKILL.md
@@ -11,6 +11,8 @@ Execute plan by dispatching fresh subagent per task, with two-stage review after

 **Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration

+**Proportionality:** Review fanout scales with the change. When the entire plan is one trivial, fully-specified mechanical change — a log statement, a typo fix, a constant bump with no security or behavioral consequences — implement it directly (or with a single implementer subagent), verify per superpowers:verification-before-completion (run the relevant command, confirm output), commit, and skip all review subagents, including the final reviewer: three review dispatches cost more than a one-line diff. Trivial is a property of the diff — it changes no logic, no control flow, and nothing security-relevant — not of the plan's self-description. Any doubt means not trivial: run the full pipeline. Within a multi-task plan, never skip reviews, regardless of task size.
+
 **Continuous execution:** Do not pause to check in with your human partner between tasks. Execute all tasks from the plan without stopping. The only reasons to stop are: BLOCKED status you cannot resolve, ambiguity that genuinely prevents progress, or all tasks complete. "Should I continue?" prompts and progress summaries waste their time — they asked you to execute the plan, so execute it.

 ## When to Use
@@ -61,11 +63,16 @@ digraph process {
    }

    "Read plan, extract all tasks with full text, note context, create todos" [shape=box];
+    "Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" [shape=diamond];
+    "Implement directly, verify, commit (no review fanout)" [shape=box];
    "More tasks remain?" [shape=diamond];
    "Dispatch final code reviewer subagent for entire implementation" [shape=box];
    "Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];

-    "Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
+    "Read plan, extract all tasks with full text, note context, create todos" -> "Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)";
+    "Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Implement directly, verify, commit (no review fanout)" [label="yes — see Proportionality"];
+    "Implement directly, verify, commit (no review fanout)" -> "Use superpowers:finishing-a-development-branch";
+    "Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="no"];
    "Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
    "Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
    "Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
@@ -86,6 +93,10 @@ digraph process {
 }
 ```

+## Spec Context
+
+If the plan's header cites a spec (`**Spec:** <path>`), read it once during plan extraction. Plans reference requirements rather than restating them — when a task cites a spec section, paste that section's text into the implementer and spec-reviewer prompts along with the task text. Implementer subagents never read the spec file themselves; the spec reviewer may additionally read it at the cited path (its prompt says so).
+
 ## Model Selection

 Use the least powerful model that can handle each role to conserve cost and increase speed.
@@ -237,7 +248,7 @@ Done!

 **Never:**
 - Start implementation on main/master branch without explicit user consent
- Skip reviews (spec compliance OR code quality)
+- Skip reviews — sole exception: a plan that is entirely one trivial change (see Proportionality)
 - Proceed with unfixed issues
 - Dispatch multiple implementation subagents in parallel (conflicts)
 - Make subagent read plan file (provide full text instead)
--- a/skills/subagent-driven-development/implementer-prompt.md
+++ b/skills/subagent-driven-development/implementer-prompt.md
@@ -12,6 +12,8 @@ Subagent (general-purpose):

    [FULL TEXT of task from plan - paste it here, don't make subagent read file]

+    [If the task cites spec sections, paste the cited sections' text here too]
+
    ## Context

    [Scene-setting: where this fits, dependencies, architectural context]
--- a/skills/subagent-driven-development/spec-reviewer-prompt.md
+++ b/skills/subagent-driven-development/spec-reviewer-prompt.md
@@ -12,7 +12,7 @@ Subagent (general-purpose):

    ## What Was Requested

-    [FULL TEXT of task requirements]
+    [FULL TEXT of task requirements, including the text of any spec sections the task cites]

    ## What Implementer Claims They Built

@@ -28,7 +28,7 @@ Subagent (general-purpose):
    git diff [BASE_SHA]..[HEAD_SHA]
    ```

-    Only read files in this diff. Do not crawl the broader codebase.
+    Only read files in this diff. Do not crawl the broader codebase. (One exception: if the requirements cite a spec document, you may read that spec at its cited path.)

    ## Read-Only Review

--- a/skills/using-superpowers/SKILL.md
+++ b/skills/using-superpowers/SKILL.md
@@ -12,7 +12,7 @@ If you think there is even a 1% chance a skill might apply to what you are doing

 IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.

-This is not negotiable. This is not optional. You cannot rationalize your way out of this.
+This is not negotiable. This is not optional. You cannot rationalize your way out of this. (The single carve-out: a skill whose own description says it does not apply — see The Rule.)
 </EXTREMELY-IMPORTANT>

 ## Instruction Priority
@@ -49,6 +49,8 @@ Skills speak in actions ("dispatch a subagent", "create a todo", "read a file")

 **Invoke relevant or requested skills BEFORE any response or action.** Even a 1% chance a skill might apply means that you should invoke the skill to check. If an invoked skill turns out to be wrong for the situation, you don't need to use it.

+**Documented exceptions in a skill's own description are authoritative.** When a description itself says the skill does not apply to a request (e.g. brainstorming's nothing-to-design exception), not invoking it is compliance, not rationalization. Any doubt about whether the exception's conditions hold means invoke. Only the skill's description can define such an exception; you cannot infer one.
+
 ```dot
 digraph skill_flow {
    "User message received" [shape=doublecircle];
@@ -69,7 +71,10 @@ digraph skill_flow {
    "Invoke brainstorming skill" -> "Might any skill apply?";

    "User message received" -> "Might any skill apply?";
-    "Might any skill apply?" -> "Invoke the skill" [label="yes, even 1%"];
+    "Might any skill apply?" -> "Skill's own description exempts this request?" [label="yes, even 1%"];
+    "Skill's own description exempts this request?" [shape=diamond];
+    "Skill's own description exempts this request?" -> "Invoke the skill" [label="no / any doubt"];
+    "Skill's own description exempts this request?" -> "Respond (including clarifications)" [label="yes, clearly"];
    "Might any skill apply?" -> "Respond (including clarifications)" [label="definitely not"];
    "Invoke the skill" -> "Announce: 'Using [skill] to [purpose]'";
    "Announce: 'Using [skill] to [purpose]'" -> "Has checklist?";
@@ -118,4 +123,4 @@ The skill itself tells you which.

 ## User Instructions

-Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows.
+Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows — unless a skill's own description exempts the request (see The Rule above).
--- a/skills/writing-skills/SKILL.md
+++ b/skills/writing-skills/SKILL.md
@@ -151,6 +151,8 @@ Concrete results

 The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.

+(Negative triggering conditions are still triggering conditions: a description MAY state when the skill does NOT apply — including its tripwires — and per using-superpowers' Rule such description-level exceptions are authoritative, so they must live here, not only in the body. That is scope, not workflow.)
+
 **Why this matters:** Testing revealed that when a description summarizes the skill's workflow, an agent may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused an agent to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).

 When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), the agent correctly read the flowchart and followed the two-stage review process.
Author	SHA1	Message	Date
Drew Ritter	36e289ea8b	refine(skills): reconcile flowchart + EXTREMELY-IMPORTANT with the exception rule; writing-skills carve-out Staff-review findings (4-reviewer panel): - The skill_flow digraph still routed "yes, even 1%" straight to invoke with no exception branch — and this stack's own evidence says agents follow flowcharts literally. The flow now passes through "Skill's own description exempts this request?" with no/any-doubt → invoke. - The <EXTREMELY-IMPORTANT> block ("you cannot rationalize your way out of this") read unconditional; one parenthetical defers to The Rule's single carve-out without weakening the block. - Trimmed the redundant "the description defines the skill's scope" clause from The Rule paragraph. - writing-skills' "descriptions must not carry process" doctrine would have had a future editor strip the brainstorming exception and silently regress the cost evals; it now distinguishes negative triggering conditions (scope — allowed and, per the routing rule, required at the description) from workflow summaries (still forbidden). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:54:15 -07:00
Drew Ritter	aecb6c34e6	fix(skills): description-level exceptions are authoritative in the routing rule (SUP-333 #4 ) Adversarial review findings D1/D2: the 1%-chance invocation rule and the "Add X doesn't mean skip workflows" line contradicted the new brainstorming description exception in both directions — a compliant agent re-imposes the cost failure (invocation itself is the measured cost event), while a cost-optimizing agent could treat any skip as sanctioned. The routing skill now states: a documented exception in a skill's own description defines that skill's scope (compliance, not rationalization); any doubt about the exception's conditions means invoke; and only the description can define one — agents cannot infer exceptions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:54:15 -07:00
Drew Ritter	87ddface1a	refine(skills): requested deletions still trip the gate Eval-caught leak (cost-remove-export-boundary-claude, first run): the agent reasoned "the user already decided the deletion, so no design decision is open" and silently removed a working feature — reading the tripwires as indicators of open decisions rather than unconditional re-gates. The deletion tripwire now carries the same rider as the security one ("even when the deletion is exactly what was asked"), and the rationalization table counters the exact quoted escape. Description: 950/1024 chars. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:54:14 -07:00
Drew Ritter	4d8ff907c4	refine(skills): single-source the triviality tripwires at the description Staff-review findings (4-reviewer panel): - The tripwire list existed twice in this file (description + HARD-GATE) and the copies had already drifted after one editing round — the framing tripwire and the security qualifier lived only in the HARD-GATE, which the skip decision never reads (our own GREEN-attempt-1 evidence). The description is now the single authoritative list; the HARD-GATE exception defers to it. - Security-posture fix: the "beyond the literally stated value" escape no longer applies to security — touching auth, sessions, permissions, CORS, or crypto re-gates EVEN when the value is exactly as stated (the harm of "set CORS to *" IS the stated value). User-visible behavior keeps the beyond-the-stated-change scope (a requested checkbox is the stated change; that is the point of the exception). - The framing tripwire moves into the description where it can act. - Anti-pattern final clause cut (was the 4th in-file statement of the exception's condition). - Description: 886/1024 chars, YAML-validated. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:27:46 -07:00
Drew Ritter	a8406dc678	harden(skills): brainstorming exception gets routing-layer tripwires + rationalization counters Adversarial review findings (A1-A7, D3): - BLOCKER A1: the re-gating tripwires lived only in the HARD-GATE, but the skip decision happens at the description (our own GREEN-attempt-1 evidence). The description now carries the tripwires: adds a file/dependency, touches schema/API/persisted data, deletes or disables anything, alters behavior/security posture, >1 plausible reading. - A2: "a schema/API/data question" was defeated by "the user answered the question"; now touch-based ("even if the user stated the desired outcome"). - A3: destructive changes and behavior/security-visible changes had no tripwire (pure removals were structurally invisible); both added. "a literal config value change" example now qualified ("with no security or behavioral consequences"). - A4: the checkbox example no longer teaches hedge-phrase = fully specified ("where the context leaves nothing to choose"). - A5: "EVERY project regardless of perceived simplicity" now ends "with exactly one exception below" instead of contradicting it. - A6: rationalization table added (codebase-pattern, infer-the-obvious, hedge-phrase, asking-wastes-time). - A7: anti-pattern opener is a claim again ("Anything with open decisions goes through this process"). - D3: exception states TDD and verification-before-completion still apply, so the fast path does not read as zero-oversight. Description: 689 chars (limit 1024), YAML-validated. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:26:12 -07:00
Drew Ritter	3985dd7711	fix(skills): brainstorming gate exempts requests with nothing to design (SUP-333 #3 ) The HARD-GATE ("EVERY project regardless of perceived simplicity") plus the anti-pattern list naming "a config change" made design+approval mandatory even for fully-specified trivial asks — all 6 agents in the 2026-06-09 quorum sweep ran a multi-option design flow for "a basic checkbox, nothing fancy" (cost-checkbox-over-trigger failed 6/6). Two layers, because routing happens before skill content is read (GREEN attempt 1 proved it: the agent invoked the skill on the description's mandate and only then saw the in-skill exception, and the invocation itself is the cost event): - description: carve-out visible at skill-selection time — zero open design decisions, fully specified trivial change → implement directly without invoking. - HARD-GATE: matching exception with objective re-gating tripwires (new file/dependency, schema/API/data question, >1 plausible interpretation, user frames it as a feature/project), and the anti-pattern section now distinguishes "seems simple" (a rationalization when decisions exist) from "contains every decision" (the exception). "A config change" moves from the all-of-them list to the exception's example. The repo's acceptance test ("Let's make a react todo list" must auto-trigger brainstorming) is unaffected: a react todo list leaves many decisions open and todo lists remain in the anti-pattern list. TDD evidence (quorum): - RED: cost-checkbox-over-trigger fails 6/6 agents (batch 2026-06-09); GREEN attempt 1 with in-skill exception only: still fail (invoked via description, then asked a clarifying question) - GREEN: cost-checkbox-over-trigger-claude-20260610T004320Z-a30e pass — no brainstorming invocation, agent cited the exception verbatim, checkbox landed in 31s - Canary: cost-spec-plan-duplication-claude-20260610T004506Z-22ea pass — a real feature still triggers the full brainstorm→spec→plan flow (and the stacked writing-plans reference discipline holds) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:26:11 -07:00
Drew Ritter	60f617489c	refine(skills): staff-review round — fix spec-access contradiction, qualify constant bumps Staff-review findings (4-reviewer panel): - CONTRADICTION FIX: Spec Context said "Subagents never read the spec file themselves" while spec-reviewer-prompt grants exactly that access. Now: implementers never read it; the spec reviewer may, at the cited path. - "a constant bump" was an unqualified trivial example — a one-line BCRYPT_ROUNDS or session-TTL change is a security-posture change; now qualified "with no security or behavioral consequences" (matching brainstorming's config-change qualifier). The diff-property definition adds "nothing security-relevant". - Proportionality rewritten 146→~115 words (house style; one statement of the multi-task containment instead of two). - Red Flags Never-line trimmed 33→14 words (pointer to Proportionality instead of third in-file restatement). - Prompt-template rationale tails cut (the controller just read Spec Context; subagents need the pasted text, not the policy rationale). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:26:11 -07:00
Drew Ritter	a6ce936ac1	harden(skills): SDD proportionality resists over-use; pipeline consumes cited specs Adversarial + consistency review findings (B1, B2, B3, B5, F1): - Red Flags line read literally licensed skipping reviews on trivial tasks INSIDE multi-task plans; now states the only exception is a whole-plan trivial change and never-skip within multi-task plans. - "a one-line edit" example blessed one-line behavioral changes (e.g. adding "\|\| user.isOwner"); dropped. Trivial is now defined as a property of the diff (no logic/control-flow/behavior change), not of the plan's self-description. The "nothing for review to catch" justification proved too much; replaced with the cost argument. - "verify it" was undefined on the trivial path; now concrete (run tests/command, confirm output, verification-before-completion). - Flowchart diamond now matches the prose: "fully-specified" + "any doubt = no" (the failing agents execute the flowchart literally). - New Spec Context section + prompt-template updates: the controller reads the spec cited in the plan header and pastes cited sections into implementer/spec-reviewer prompts; the spec reviewer's diff-only rule gets a spec-document exception. Without this, the stack's reference-not-restate rule starves the SDD pipeline of requirements. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:24:57 -07:00
Drew Ritter	3c9870febe	fix(skills): SDD review fanout scales with the change (SUP-333 #2 ) subagent-driven-development mandated implementer + two-stage review + final reviewer unconditionally — agy and opencode each dispatched 4 subagents for a one-line console.log in the 2026-06-09 quorum sweep (cost-trivial-task-review-fanout), and the agents that passed did so only by disobeying the skill. - Proportionality rule: when the entire plan is one trivial, fully-specified mechanical change, implement directly, verify, commit — no review fanout. "When in doubt, it is not trivial." Within a multi-task plan the full pipeline still applies to every task regardless of size. - Flowchart gets the trivial-exit diamond (the failing agents follow the flowchart literally; prose alone would not redirect them). - Red Flags "never skip reviews" amended to reference the exception so the skill does not contradict itself. TDD evidence (quorum): - RED: agy 025324Z + opencode batches — 4 dispatches for 1 line - GREEN: cost-trivial-task-review-fanout-opencode-20260610T002518Z-f3f5 pass — 0 dispatches, $0.04, change landed on main checkout - Canary: sdd-rejects-extra-features-claude-20260610T002901Z-458a pass — multi-task plan still runs implementer + two-stage review per task (tool-called Agent ✓, spec reviewer as YAGNI gate after each task) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:24:57 -07:00