mirror of
https://github.com/obra/superpowers.git
synced 2026-06-13 06:09:04 +08:00
Compare commits
1 Commits
writing-sk
...
drew/sup-3
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
f9d11b3c2f |
@@ -11,6 +11,8 @@ Execute plan by dispatching fresh subagent per task, with two-stage review after
|
|||||||
|
|
||||||
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
|
**Core principle:** Fresh subagent per task + two-stage review (spec then quality) = high quality, fast iteration
|
||||||
|
|
||||||
|
**Proportionality:** Review fanout scales with the change. When the entire plan is one trivial, fully-specified mechanical change — a log statement, a typo fix, a constant bump with no security or behavioral consequences — implement it directly (or with a single implementer subagent), verify per superpowers:verification-before-completion (run the relevant command, confirm output), commit, and skip all review subagents, including the final reviewer: three review dispatches cost more than a one-line diff. Trivial is a property of the diff — it changes no logic, no control flow, and nothing security-relevant — not of the plan's self-description. Any doubt means not trivial: run the full pipeline. Within a multi-task plan, never skip reviews, regardless of task size.
|
||||||
|
|
||||||
**Continuous execution:** Do not pause to check in with your human partner between tasks. Execute all tasks from the plan without stopping. The only reasons to stop are: BLOCKED status you cannot resolve, ambiguity that genuinely prevents progress, or all tasks complete. "Should I continue?" prompts and progress summaries waste their time — they asked you to execute the plan, so execute it.
|
**Continuous execution:** Do not pause to check in with your human partner between tasks. Execute all tasks from the plan without stopping. The only reasons to stop are: BLOCKED status you cannot resolve, ambiguity that genuinely prevents progress, or all tasks complete. "Should I continue?" prompts and progress summaries waste their time — they asked you to execute the plan, so execute it.
|
||||||
|
|
||||||
## When to Use
|
## When to Use
|
||||||
@@ -61,11 +63,16 @@ digraph process {
|
|||||||
}
|
}
|
||||||
|
|
||||||
"Read plan, extract all tasks with full text, note context, create todos" [shape=box];
|
"Read plan, extract all tasks with full text, note context, create todos" [shape=box];
|
||||||
|
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" [shape=diamond];
|
||||||
|
"Implement directly, verify, commit (no review fanout)" [shape=box];
|
||||||
"More tasks remain?" [shape=diamond];
|
"More tasks remain?" [shape=diamond];
|
||||||
"Dispatch final code reviewer subagent for entire implementation" [shape=box];
|
"Dispatch final code reviewer subagent for entire implementation" [shape=box];
|
||||||
"Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
|
"Use superpowers:finishing-a-development-branch" [shape=box style=filled fillcolor=lightgreen];
|
||||||
|
|
||||||
"Read plan, extract all tasks with full text, note context, create todos" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
"Read plan, extract all tasks with full text, note context, create todos" -> "Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)";
|
||||||
|
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Implement directly, verify, commit (no review fanout)" [label="yes — see Proportionality"];
|
||||||
|
"Implement directly, verify, commit (no review fanout)" -> "Use superpowers:finishing-a-development-branch";
|
||||||
|
"Entire plan = one trivial, fully-specified mechanical change? (any doubt = no)" -> "Dispatch implementer subagent (./implementer-prompt.md)" [label="no"];
|
||||||
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
|
"Dispatch implementer subagent (./implementer-prompt.md)" -> "Implementer subagent asks questions?";
|
||||||
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
|
"Implementer subagent asks questions?" -> "Answer questions, provide context" [label="yes"];
|
||||||
"Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
"Answer questions, provide context" -> "Dispatch implementer subagent (./implementer-prompt.md)";
|
||||||
@@ -237,7 +244,7 @@ Done!
|
|||||||
|
|
||||||
**Never:**
|
**Never:**
|
||||||
- Start implementation on main/master branch without explicit user consent
|
- Start implementation on main/master branch without explicit user consent
|
||||||
- Skip reviews (spec compliance OR code quality)
|
- Skip reviews — sole exception: a plan that is entirely one trivial change (see Proportionality)
|
||||||
- Proceed with unfixed issues
|
- Proceed with unfixed issues
|
||||||
- Dispatch multiple implementation subagents in parallel (conflicts)
|
- Dispatch multiple implementation subagents in parallel (conflicts)
|
||||||
- Make subagent read plan file (provide full text instead)
|
- Make subagent read plan file (provide full text instead)
|
||||||
|
|||||||
@@ -145,7 +145,7 @@ After saving the plan, offer execution choice:
|
|||||||
|
|
||||||
**If Subagent-Driven chosen:**
|
**If Subagent-Driven chosen:**
|
||||||
- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
|
- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
|
||||||
- Fresh subagent per task + two-stage review
|
- Fresh subagent per task + two-stage review (review fanout scales with the change — see that skill's Proportionality rule)
|
||||||
|
|
||||||
**If Inline Execution chosen:**
|
**If Inline Execution chosen:**
|
||||||
- **REQUIRED SUB-SKILL:** Use superpowers:executing-plans
|
- **REQUIRED SUB-SKILL:** Use superpowers:executing-plans
|
||||||
|
|||||||
@@ -456,29 +456,10 @@ Different skill types need different test approaches:
|
|||||||
|
|
||||||
**All of these mean: Test before deploying. No exceptions.**
|
**All of these mean: Test before deploying. No exceptions.**
|
||||||
|
|
||||||
## Match the Form to the Failure
|
|
||||||
|
|
||||||
Before writing guidance, classify the baseline failure. The form that bulletproofs one failure type measurably backfires on another.
|
|
||||||
|
|
||||||
| Baseline failure | Right form | Wrong form |
|
|
||||||
|---|---|---|
|
|
||||||
| Skips/violates a rule under pressure (knows better, does it anyway) | Prohibition + rationalization table + red flags (see Bulletproofing below) | Soft guidance ("prefer...", "consider...") |
|
|
||||||
| Complies, but output has the wrong shape (bloated prompt, buried verdict, restated spec) | Positive recipe or contract: state what the output IS — its parts, in order | Prohibition list ("don't restate", "never narrate") |
|
|
||||||
| Omits a required element from something they already produce | Structural: REQUIRED field or slot in the template they fill in | Prose reminders near the template |
|
|
||||||
| Behavior should depend on a condition | Conditional keyed to an observable predicate ("if the brief exists, reference it") | Unconditional rule + exemption clauses |
|
|
||||||
|
|
||||||
**Why prohibitions backfire on shaping problems:** under a competing incentive ("make the prompt self-contained"), agents negotiate with "don't X". In head-to-head wording tests on dispatch-prompt guidance, the prohibition arm produced clearly more of the unwanted content than the recipe arm (fully separated distributions), and trended worse than even the no-guidance control — micro-test your own case rather than assuming, but never reach for the prohibition by default. A recipe leaves nothing to negotiate: the output matches the stated shape or it doesn't.
|
|
||||||
|
|
||||||
**Rules for whichever form you pick:**
|
|
||||||
- **No nuance clauses.** "Don't X unless it matters" reopens the negotiation — appending a single nuance clause to a winning recipe degraded it from consistent to noisy in the same wording tests. Express a real exception as its own conditional on an observable predicate.
|
|
||||||
- **Exemption clauses don't scope.** "This limit doesn't apply to code blocks" still suppresses code blocks. If part of the output must be exempt, restructure so the rule can't reach it.
|
|
||||||
|
|
||||||
## Bulletproofing Skills Against Rationalization
|
## Bulletproofing Skills Against Rationalization
|
||||||
|
|
||||||
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
|
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
|
||||||
|
|
||||||
**Scope:** this toolkit is for discipline failures — an agent that knows the rule and skips it under pressure. For wrong-shaped output or omitted elements, prohibition-based bulletproofing backfires; use the forms in Match the Form to the Failure instead.
|
|
||||||
|
|
||||||
**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
|
**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
|
||||||
|
|
||||||
### Close Every Loophole Explicitly
|
### Close Every Loophole Explicitly
|
||||||
@@ -572,18 +553,6 @@ Run same scenarios WITH skill. Agent should now comply.
|
|||||||
|
|
||||||
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
|
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
|
||||||
|
|
||||||
### Micro-Test Wording Before Full Scenarios
|
|
||||||
|
|
||||||
Full pressure-scenario runs are the final gate, but they are slow and expensive per iteration. Verify the wording itself first with micro-tests:
|
|
||||||
|
|
||||||
1. **One fresh-context sample per call** — a raw API call, or a single-shot subagent if you don't have API access. System prompt = the realistic context the guidance will live in (the full skill or prompt template, not the guidance in isolation); user message = a task that tempts the failure.
|
|
||||||
2. **Always include a no-guidance control.** If the control doesn't exhibit the failure, there is nothing to fix — stop, don't author the guidance.
|
|
||||||
3. **5+ reps per variant.** Single samples lie.
|
|
||||||
4. **Manually read every flagged match.** Score programmatically if you like, but template echoes and quoted counter-examples masquerade as hits; automated counts alone overstate both failure and success.
|
|
||||||
5. **Variance is a metric.** When guidance lands, reps converge on the same shape. Five different interpretations across five reps means the wording isn't binding — tighten the form before adding words.
|
|
||||||
|
|
||||||
Micro-tests verify wording; they do not replace pressure scenarios for discipline skills.
|
|
||||||
|
|
||||||
**Testing methodology:** See [testing-skills-with-subagents.md](testing-skills-with-subagents.md) for the complete testing methodology:
|
**Testing methodology:** See [testing-skills-with-subagents.md](testing-skills-with-subagents.md) for the complete testing methodology:
|
||||||
- How to write pressure scenarios
|
- How to write pressure scenarios
|
||||||
- Pressure types (time, sunk cost, authority, exhaustion)
|
- Pressure types (time, sunk cost, authority, exhaustion)
|
||||||
@@ -641,8 +610,6 @@ Deploying untested skills = deploying untested code. It's a violation of quality
|
|||||||
- [ ] Keywords throughout for search (errors, symptoms, tools)
|
- [ ] Keywords throughout for search (errors, symptoms, tools)
|
||||||
- [ ] Clear overview with core principle
|
- [ ] Clear overview with core principle
|
||||||
- [ ] Address specific baseline failures identified in RED
|
- [ ] Address specific baseline failures identified in RED
|
||||||
- [ ] Guidance form matches the failure type (see Match the Form to the Failure)
|
|
||||||
- [ ] For behavior-shaping guidance: wording micro-tested against a no-guidance control (5+ reps, every flagged match read manually) — N/A for pure reference skills
|
|
||||||
- [ ] Code inline OR link to separate file
|
- [ ] Code inline OR link to separate file
|
||||||
- [ ] One excellent example (not multi-language)
|
- [ ] One excellent example (not multi-language)
|
||||||
- [ ] Run scenarios WITH skill - verify agents now comply
|
- [ ] Run scenarios WITH skill - verify agents now comply
|
||||||
|
|||||||
Reference in New Issue
Block a user