fix(writing-skills): hang backfire mechanism on the separated prohibition-vs-recipe comparison (NEW-4); control comparison stated as trend

fix(writing-skills): scope empirical claims, honest noise reporting, conditionalize micro-test checklist line
Adversarial review findings 1/3/9: the head-to-head result is now scoped to its context (dispatch-prompt guidance) with an explicit micro-test-your- own-case instruction; the nuance-clause result is reported as consistent->noisy rather than 'measurably dilutes'; the checklist line is scoped to behavior-shaping guidance and the micro method no longer assumes raw API access.
2026-06-12 13:49:05 +08:00 · 2026-06-11 11:30:31 -07:00 · 2026-06-11 11:10:33 -07:00 · 2026-06-11 10:20:24 -07:00
3 changed files with 39 additions and 30 deletions
--- a/skills/brainstorming/SKILL.md
+++ b/skills/brainstorming/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: brainstorming
-description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation. The one exception (nothing-to-design) must be EARNED by a tripwire scan first - invoke this skill if the change: adds a file or dependency; touches a schema, API contract, or persisted data (even when the user stated the outcome); deletes or disables working functionality (even when asked); touches security posture at all (auth, sessions, timeouts, permissions, CORS, crypto - even with the exact value stated); alters user-visible behavior beyond the stated change; has more than one plausible reading; or is framed as a feature or project. Only when NO tripwire hits and the outcome is fully specified (e.g. 'add a basic checkbox, nothing fancy' where context leaves nothing to choose): state your scan in one line, then implement directly without invoking this skill."
+description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
 ---

 # Brainstorming Ideas Into Designs
@@ -10,22 +10,12 @@ Help turn ideas into fully formed designs and specs through natural collaborativ
 Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.

 <HARD-GATE>
-Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity, with exactly one exception.
-
-Exception — nothing to design: when the exception in this skill's description applies (zero open design decisions; its tripwire list puts the gate back on), implement directly. TDD and verification-before-completion still apply. Brainstorming exists to surface decisions; when there are none, the user's request IS the design. Any doubt: the gate holds.
+Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
 </HARD-GATE>

 ## Anti-Pattern: "This Is Too Simple To Need A Design"

-Anything with open decisions goes through this process. A todo list, a single-function utility, a data migration — "simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but if anything remains to decide, you MUST present it and get approval. Do not confuse this with the nothing-to-design exception above: "this seems simple, I'll skip the design" is a rationalization whenever decisions exist.
-
-| Excuse | Reality |
-|--------|---------|
-| "The codebase has an established pattern, so nothing is open" | A pattern answers HOW, not WHETHER or WHAT. Those decisions are still open unless the user made them. |
-| "I can infer the obvious choice" | If there is a choice to infer, a decision is open. Invoke. |
-| "The user said keep it simple / nothing fancy" | A hedge describes the solution's size, not the request's completeness. Check what remains undecided, not the tone. |
-| "Asking would waste the user's time" | One design question costs seconds; an unexamined assumption costs a rewrite. |
-| "The user already made that decision — they told me to delete it" | A requested deletion still has consequences the user may not have weighed (working feature, no usage data, alternatives). Surface them first; the tripwire applies to requested deletions. |
+Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.

 ## Checklist

--- a/skills/using-superpowers/SKILL.md
+++ b/skills/using-superpowers/SKILL.md
@@ -12,7 +12,7 @@ If you think there is even a 1% chance a skill might apply to what you are doing

 IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.

-This is not negotiable. This is not optional. You cannot rationalize your way out of this. (The single carve-out: a skill whose own description says it does not apply — see The Rule.)
+This is not negotiable. This is not optional. You cannot rationalize your way out of this.
 </EXTREMELY-IMPORTANT>

 ## Instruction Priority
@@ -49,10 +49,6 @@ Skills speak in actions ("dispatch a subagent", "create a todo", "read a file")

 **Invoke relevant or requested skills BEFORE any response or action.** Even a 1% chance a skill might apply means that you should invoke the skill to check. If an invoked skill turns out to be wrong for the situation, you don't need to use it.

-**Documented exceptions in a skill's own description are authoritative.** When a description itself says the skill does not apply to a request (e.g. brainstorming's nothing-to-design exception), not invoking it is compliance, not rationalization. Any doubt about whether the exception's conditions hold means invoke. Only the skill's description can define such an exception; you cannot infer one.
-
-**An exception skip must be stated, never silent.** Before your first action, write one line naming the exception and the tripwire scan that came up empty — e.g. "Skipping brainstorming per its exception: no security/deletion/schema/new-file tripwires; outcome fully specified." If you did not write the scan line, you did not scan — invoke the skill instead.
-
 ```dot
 digraph skill_flow {
    "User message received" [shape=doublecircle];
@@ -73,12 +69,7 @@ digraph skill_flow {
    "Invoke brainstorming skill" -> "Might any skill apply?";

    "User message received" -> "Might any skill apply?";
-    "Might any skill apply?" -> "Skill's own description exempts this request?" [label="yes, even 1%"];
-    "Skill's own description exempts this request?" [shape=diamond];
-    "Skill's own description exempts this request?" -> "Invoke the skill" [label="no / any doubt"];
-    "Skill's own description exempts this request?" -> "State the one-line tripwire scan, then proceed" [label="yes, clearly"];
-    "State the one-line tripwire scan, then proceed" [shape=box];
-    "State the one-line tripwire scan, then proceed" -> "Respond (including clarifications)";
+    "Might any skill apply?" -> "Invoke the skill" [label="yes, even 1%"];
    "Might any skill apply?" -> "Respond (including clarifications)" [label="definitely not"];
    "Invoke the skill" -> "Announce: 'Using [skill] to [purpose]'";
    "Announce: 'Using [skill] to [purpose]'" -> "Has checklist?";
@@ -103,7 +94,6 @@ These thoughts mean STOP—you're rationalizing:
 | "I remember this skill" | Skills evolve. Read current version. |
 | "This doesn't count as a task" | Action = task. Check for skills. |
 | "The skill is overkill" | Simple things become complex. Use it. |
-| "Too trivial to scan the tripwire list" | The scan is one sentence. Write it or invoke the skill. |
 | "I'll just do this one thing first" | Check BEFORE doing anything. |
 | "This feels productive" | Undisciplined action wastes time. Skills prevent this. |
 | "I know what that means" | Knowing the concept ≠ using the skill. Invoke it. |
@@ -128,6 +118,4 @@ The skill itself tells you which.

 ## User Instructions

-Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows — unless a skill's own description exempts the request (see The Rule above).
-
-Pressure phrasing — "don't ask questions", "make assumptions", "just build it" — changes how you interact (state assumptions instead of asking), not which skills you invoke. Only an instruction that names what to skip ("don't write a plan", "skip TDD") or a description exception skips a workflow step. "Your instruction wins per the priority rules" applied to an unnamed workflow step is a rationalization.
+Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows.
--- a/skills/writing-skills/SKILL.md
+++ b/skills/writing-skills/SKILL.md
@@ -151,8 +151,6 @@ Concrete results

 The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.

-(Negative triggering conditions are still triggering conditions: a description MAY state when the skill does NOT apply — including its tripwires — and per using-superpowers' Rule such description-level exceptions are authoritative, so they must live here, not only in the body. That is scope, not workflow.)
-
 **Why this matters:** Testing revealed that when a description summarizes the skill's workflow, an agent may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused an agent to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).

 When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), the agent correctly read the flowchart and followed the two-stage review process.
@@ -458,10 +456,29 @@ Different skill types need different test approaches:

 **All of these mean: Test before deploying. No exceptions.**

+## Match the Form to the Failure
+
+Before writing guidance, classify the baseline failure. The form that bulletproofs one failure type measurably backfires on another.
+
+| Baseline failure | Right form | Wrong form |
+|---|---|---|
+| Skips/violates a rule under pressure (knows better, does it anyway) | Prohibition + rationalization table + red flags (see Bulletproofing below) | Soft guidance ("prefer...", "consider...") |
+| Complies, but output has the wrong shape (bloated prompt, buried verdict, restated spec) | Positive recipe or contract: state what the output IS — its parts, in order | Prohibition list ("don't restate", "never narrate") |
+| Omits a required element from something they already produce | Structural: REQUIRED field or slot in the template they fill in | Prose reminders near the template |
+| Behavior should depend on a condition | Conditional keyed to an observable predicate ("if the brief exists, reference it") | Unconditional rule + exemption clauses |
+
+**Why prohibitions backfire on shaping problems:** under a competing incentive ("make the prompt self-contained"), agents negotiate with "don't X". In head-to-head wording tests on dispatch-prompt guidance, the prohibition arm produced clearly more of the unwanted content than the recipe arm (fully separated distributions), and trended worse than even the no-guidance control — micro-test your own case rather than assuming, but never reach for the prohibition by default. A recipe leaves nothing to negotiate: the output matches the stated shape or it doesn't.
+
+**Rules for whichever form you pick:**
+- **No nuance clauses.** "Don't X unless it matters" reopens the negotiation — appending a single nuance clause to a winning recipe degraded it from consistent to noisy in the same wording tests. Express a real exception as its own conditional on an observable predicate.
+- **Exemption clauses don't scope.** "This limit doesn't apply to code blocks" still suppresses code blocks. If part of the output must be exempt, restructure so the rule can't reach it.
+
 ## Bulletproofing Skills Against Rationalization

 Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.

+**Scope:** this toolkit is for discipline failures — an agent that knows the rule and skips it under pressure. For wrong-shaped output or omitted elements, prohibition-based bulletproofing backfires; use the forms in Match the Form to the Failure instead.
+
 **Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.

 ### Close Every Loophole Explicitly
@@ -555,6 +572,18 @@ Run same scenarios WITH skill. Agent should now comply.

 Agent found new rationalization? Add explicit counter. Re-test until bulletproof.

+### Micro-Test Wording Before Full Scenarios
+
+Full pressure-scenario runs are the final gate, but they are slow and expensive per iteration. Verify the wording itself first with micro-tests:
+
+1. **One fresh-context sample per call** — a raw API call, or a single-shot subagent if you don't have API access. System prompt = the realistic context the guidance will live in (the full skill or prompt template, not the guidance in isolation); user message = a task that tempts the failure.
+2. **Always include a no-guidance control.** If the control doesn't exhibit the failure, there is nothing to fix — stop, don't author the guidance.
+3. **5+ reps per variant.** Single samples lie.
+4. **Manually read every flagged match.** Score programmatically if you like, but template echoes and quoted counter-examples masquerade as hits; automated counts alone overstate both failure and success.
+5. **Variance is a metric.** When guidance lands, reps converge on the same shape. Five different interpretations across five reps means the wording isn't binding — tighten the form before adding words.
+
+Micro-tests verify wording; they do not replace pressure scenarios for discipline skills.
+
 **Testing methodology:** See [testing-skills-with-subagents.md](testing-skills-with-subagents.md) for the complete testing methodology:
 - How to write pressure scenarios
 - Pressure types (time, sunk cost, authority, exhaustion)
@@ -612,6 +641,8 @@ Deploying untested skills = deploying untested code. It's a violation of quality
 - [ ] Keywords throughout for search (errors, symptoms, tools)
 - [ ] Clear overview with core principle
 - [ ] Address specific baseline failures identified in RED
+- [ ] Guidance form matches the failure type (see Match the Form to the Failure)
+- [ ] For behavior-shaping guidance: wording micro-tested against a no-guidance control (5+ reps, every flagged match read manually) — N/A for pure reference skills
 - [ ] Code inline OR link to separate file
 - [ ] One excellent example (not multi-language)
 - [ ] Run scenarios WITH skill - verify agents now comply