Eval-caught leak (cost-remove-export-boundary-claude, first run): the
agent reasoned "the user already decided the deletion, so no design
decision is open" and silently removed a working feature — reading the
tripwires as indicators of open decisions rather than unconditional
re-gates. The deletion tripwire now carries the same rider as the
security one ("even when the deletion is exactly what was asked"), and
the rationalization table counters the exact quoted escape.
Description: 950/1024 chars.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Staff-review findings (4-reviewer panel):
- The tripwire list existed twice in this file (description +
HARD-GATE) and the copies had already drifted after one editing
round — the framing tripwire and the security qualifier lived only
in the HARD-GATE, which the skip decision never reads (our own
GREEN-attempt-1 evidence). The description is now the single
authoritative list; the HARD-GATE exception defers to it.
- Security-posture fix: the "beyond the literally stated value" escape
no longer applies to security — touching auth, sessions, permissions,
CORS, or crypto re-gates EVEN when the value is exactly as stated
(the harm of "set CORS to *" IS the stated value). User-visible
behavior keeps the beyond-the-stated-change scope (a requested
checkbox is the stated change; that is the point of the exception).
- The framing tripwire moves into the description where it can act.
- Anti-pattern final clause cut (was the 4th in-file statement of the
exception's condition).
- Description: 886/1024 chars, YAML-validated.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adversarial review findings (A1-A7, D3):
- BLOCKER A1: the re-gating tripwires lived only in the HARD-GATE, but
the skip decision happens at the description (our own GREEN-attempt-1
evidence). The description now carries the tripwires: adds a
file/dependency, touches schema/API/persisted data, deletes or
disables anything, alters behavior/security posture, >1 plausible
reading.
- A2: "a schema/API/data question" was defeated by "the user answered
the question"; now touch-based ("even if the user stated the desired
outcome").
- A3: destructive changes and behavior/security-visible changes had no
tripwire (pure removals were structurally invisible); both added.
"a literal config value change" example now qualified ("with no
security or behavioral consequences").
- A4: the checkbox example no longer teaches hedge-phrase = fully
specified ("where the context leaves nothing to choose").
- A5: "EVERY project regardless of perceived simplicity" now ends
"with exactly one exception below" instead of contradicting it.
- A6: rationalization table added (codebase-pattern, infer-the-obvious,
hedge-phrase, asking-wastes-time).
- A7: anti-pattern opener is a claim again ("Anything with open
decisions goes through this process").
- D3: exception states TDD and verification-before-completion still
apply, so the fast path does not read as zero-oversight.
Description: 689 chars (limit 1024), YAML-validated.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The HARD-GATE ("EVERY project regardless of perceived simplicity") plus
the anti-pattern list naming "a config change" made design+approval
mandatory even for fully-specified trivial asks — all 6 agents in the
2026-06-09 quorum sweep ran a multi-option design flow for "a basic
checkbox, nothing fancy" (cost-checkbox-over-trigger failed 6/6).
Two layers, because routing happens before skill content is read
(GREEN attempt 1 proved it: the agent invoked the skill on the
description's mandate and only then saw the in-skill exception, and
the invocation itself is the cost event):
- description: carve-out visible at skill-selection time — zero open
design decisions, fully specified trivial change → implement
directly without invoking.
- HARD-GATE: matching exception with objective re-gating tripwires
(new file/dependency, schema/API/data question, >1 plausible
interpretation, user frames it as a feature/project), and the
anti-pattern section now distinguishes "seems simple" (a
rationalization when decisions exist) from "contains every decision"
(the exception). "A config change" moves from the all-of-them list
to the exception's example.
The repo's acceptance test ("Let's make a react todo list" must
auto-trigger brainstorming) is unaffected: a react todo list leaves
many decisions open and todo lists remain in the anti-pattern list.
TDD evidence (quorum):
- RED: cost-checkbox-over-trigger fails 6/6 agents (batch 2026-06-09);
GREEN attempt 1 with in-skill exception only: still fail (invoked
via description, then asked a clarifying question)
- GREEN: cost-checkbox-over-trigger-claude-20260610T004320Z-a30e pass —
no brainstorming invocation, agent cited the exception verbatim,
checkbox landed in 31s
- Canary: cost-spec-plan-duplication-claude-20260610T004506Z-22ea pass —
a real feature still triggers the full brainstorm→spec→plan flow
(and the stacked writing-plans reference discipline holds)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Staff-review findings (4-reviewer panel):
- Reference paragraph rewritten 170→123 words preserving every
behavioral condition (paraphrase/summarize coverage, no-skip guard,
WHAT-WHY/HOW split, No Placeholders boundary, drift counter,
zero-context rescope); fixes the "(brainstorming did)" syntax.
- **Spec:** header bracket: cut the never-skip sermon duplicated from
the Overview (same loaded document); the conditional none-branch
stays.
- executing-plans Step 1 now reads the spec the plan cites — plans are
no longer self-contained, and the non-subagent execution path was
never told (the eval only exercised the SDD consumer).
- writing-plans plan-location preference line gets the same
existing-dir-is-not-a-preference guard as the spec path.
- brainstorming: deduplicate the docs/specs/ prohibition (step 6
parenthetical stays; After-the-Design bullet was the second
statement in one file).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adversarial review findings (C1, C2, C3, C5, A8, F3):
- "never restate" did not cover paraphrase/summary — the actual failure
mode in the RED evidence; now "never restate, paraphrase, or summarize".
- The No Placeholders intra-plan repetition mandate gave a symmetric
argument for re-inlining the spec; the rule now draws the line:
repetition WITHIN the plan is required, copying FROM the spec is not.
- Drift argument was invertible ("snapshot to avoid drift"); now states
snapshots hide drift.
- **Spec:** header gets a no-spec branch (state requirements once in
the header, not per task) instead of inviting "no spec, rule is moot".
- Brainstorming path bullet: an existing differently-named docs dir is
not a "user preference" override.
- Execution Handoff now notes review fanout scales (forward-ref to
SDD's Proportionality rule) instead of promising unconditional
two-stage review.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
writing-plans told agents to "document everything they need to know"
assuming zero context — every agent in the 2026-06-09 six-agent quorum
sweep obeyed and restated the entire spec inline in the plan
(cost-spec-plan-duplication failed 5/5 completed agents; pi's plan was
683 lines of duplicated spec).
- writing-plans: state the division of labor — spec owns WHAT/WHY,
plan owns HOW; cite the spec by path/section, never restate it.
"Zero context" means mechanically executable steps, not duplication.
Add a **Spec:** line to the plan header template.
- brainstorming: close the path loophole the re-run exposed — claude
shortened docs/superpowers/specs/ to docs/specs/ in 2/2 runs; both
path mentions now explicitly forbid the shortening.
TDD evidence (quorum):
- RED: batch-20260609T023452Z-68aa et al — 5/5 agents fail
- GREEN: cost-spec-plan-duplication-claude-20260609T234142Z-9625 pass
(plan: "this plan does not restate them" + spec cited by path;
both docs in docs/superpowers/)
- Canary: triggering-writing-plans-claude pass (skill still fires)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Issue #1134: agents reading visual-companion.md see bare commands like
`scripts/start-server.sh`, correctly identify the plugin install
directory, then look for `<plugin>/scripts/start-server.sh` instead of
`<plugin>/skills/brainstorming/scripts/start-server.sh`. The file
doesn't exist at the plugin-rooted path, so the agent concludes the
visual companion isn't available and falls back to text-only
brainstorming.
Multiple independent reproductions in the issue thread, plus one user's
agent self-reported: "I assumed the scripts folder was in the root
directory of the plugin, it didn't realize it could have been talking
about the skill folder itself."
Change all `scripts/<file>` references in visual-companion.md to
`skills/brainstorming/scripts/<file>`. Agents that correctly identify
the plugin root will now join to the right path.
Closes#1134.
Replace Claude-Code-specific tool names in skill prose, prompt
templates, and OpenCode-facing docs with action-language descriptions
that resolve to each runtime's native tool via the per-platform refs.
Changes by category:
- Prose mentions ("Use TodoWrite to track...", "Use Task tool with
general-purpose type") → action language ("Track each item as a
todo", "Dispatch a general-purpose subagent")
- Prompt template headers (6 files): "Task tool (general-purpose):"
→ "Subagent (general-purpose):" — preserves the type information
without naming Claude Code's specific dispatch tool
- DOT flowchart node labels: "Invoke Skill tool" → "Invoke the
skill"; "Create TodoWrite todo per item" → "Create a todo per
item"
- OpenCode INSTALL.md and docs/README.opencode.md: replace the old
"TodoWrite → todowrite, Task → @mention" mapping (which both
taught a vocabulary skills no longer use AND was wrong about
@mention being a real OpenCode syntax) with an action-language
mapping verified against the installed OpenCode CLI's tool
inventory.
The platform-tools refs landed in Phase B already document each
runtime's resolution; skills now speak in the actions those refs
map. Tool names that genuinely belong only in the per-platform
dispatch section ("In Claude Code: Use the `Skill` tool") and the
Claude-Code-specific Bash run_in_background flag note in
visual-companion remain — those are intentional carve-outs.
Misc platform/runtime statements and adjacencies that don't fit the
prose, config-ref, README-ordering, or tool-vocabulary buckets:
- visual-companion frame template: rename CSS/HTML id #claude-content
→ #frame-content. The id is purely styling — nothing external
references it. The brainstorm-server test that asserted the old
string is updated in lockstep.
- visual-companion launch instructions: add a Copilot CLI section
alongside Claude Code, Codex, and Gemini CLI; combine the Claude
Code (macOS / Linux) and (Windows) sections so heading style
matches the other (non-OS-qualified) platforms.
- visual-companion: "Use Write tool" → "Use your file-creation tool"
for the cat/heredoc warning. The prohibition is what's load-
bearing, not the tool name.
- executing-plans/SKILL.md: list all subagent-capable runtimes
(Claude Code, Codex CLI, Codex App, Copilot CLI, Gemini CLI) and
point at the per-platform tool refs as the source of truth.
- executing-plans/SKILL.md: relative path "using-superpowers/
references/" → "../using-superpowers/references/" to resolve
correctly from the executing-plans/ directory.
No bundled spec doc here — Phase D was scope-extension work that
took place across rounds, with no standalone spec authored.
Two bugs caused the brainstorm server to self-terminate within 60s:
1. ownerAlive() treated EPERM (permission denied) as "process dead".
When the owner PID belongs to a different user (Tailscale SSH,
system daemons), process.kill(pid, 0) throws EPERM — but the
process IS alive. Fixed: return e.code === 'EPERM'.
2. On WSL, the grandparent PID resolves to a short-lived subprocess
that exits before the first 60s lifecycle check. The PID is
genuinely dead (ESRCH), so the EPERM fix alone doesn't help.
Fixed: validate the owner PID at server startup — if it's already
dead, it was a bad resolution, so disable monitoring and rely on
the 30-minute idle timeout.
This also removes the Windows/MSYS2-specific OWNER_PID="" carve-out
from start-server.sh, since the server now handles invalid PIDs
generically at startup regardless of platform.
Tested on Linux (magic-kingdom) via Tailscale SSH:
- Root-owned owner PID (EPERM): server survives ✓
- Dead owner PID at startup (WSL sim): monitoring disabled, survives ✓
- Valid owner that dies: server shuts down within 60s ✓
Fixes#879
ownerAlive() treated EPERM (permission denied) the same as ESRCH
(process not found), causing the server to self-terminate within 60s
whenever the owner process ran as a different user. This affected WSL
(owner is a Windows process), Tailscale SSH, and any cross-user
scenario.
The fix: `return e.code === 'EPERM'` — if we get permission denied,
the process is alive; we just can't signal it.
Tested on Linux via Tailscale SSH with a root-owned grandparent PID:
- Server survives past the 60s lifecycle check (EPERM = alive)
- Server still shuts down when owner genuinely dies (ESRCH = dead)
Fixes#879
The session directory now contains two peers: content/ (HTML served to
the browser) and state/ (events, server-info, pid, log). Previously
all files shared a single directory, making server state and user
interaction data accessible over the /files/ HTTP route.
Also fixes stale test assertion ("Waiting for Claude" → "Waiting for
the agent").
Reported-By: 吉田仁
Metadata files (.server-info, .events, .server.pid, .server.log,
.server-stopped) were stored in the same directory served over HTTP,
making them accessible via the /files/ route. They now live in a .meta/
subdirectory that is not web-accessible.
Also fixes a stale test assertion ("Waiting for Claude" → "Waiting for
the agent").
Reported-By: 吉田仁
The subagent review loop (dispatching a fresh agent to review plans/specs)
doubled execution time (~25 min overhead) without measurably improving plan
quality. Regression testing across 5 versions (v3.6.0 through v5.0.4) with
5 trials each showed identical plan sizes, task counts, and quality scores
regardless of whether the review loop ran.
Changes:
- writing-plans: Replace subagent Plan Review Loop with inline Self-Review
checklist (spec coverage, placeholder scan, type consistency)
- writing-plans: Add explicit "No Placeholders" section listing plan failures
(TBD, vague descriptions, undefined references, "similar to Task N")
- brainstorming: Replace subagent Spec Review Loop with inline Spec Self-Review
(placeholder scan, internal consistency, scope check, ambiguity check)
- Both skills now use "look at it with fresh eyes" framing
Testing: 5 trials with the new skill show self-review catches 3-5 real bugs
per run (spawn positions, API mismatches, seed bugs, grid indexing) in ~30s
instead of ~25 min. Remaining defects are comparable to the subagent approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Skip OWNER_PID monitoring on Windows/MSYS2 where the PID namespace is
invisible to Node.js, preventing server self-termination after 60s (#770)
- Document run_in_background: true for Claude Code on Windows (#767)
- Restore user choice between subagent-driven and inline execution after
plan writing; subagent-driven is recommended but no longer mandatory
- Add Windows lifecycle test script verified on Windows 11 VM
- Note #723 (stop-server.sh reliability) as already fixed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove chunk-based plan review in favor of single whole-plan review
- Add Calibration sections to both reviewer prompts so only serious
issues block approval
- Reduce max review iterations from 5 to 3
- Streamline reviewer checklists (spec: 7→5, plan: 7→4 categories)
Replace #!/bin/bash with #!/usr/bin/env bash in 13 scripts. The
hardcoded path fails on NixOS, FreeBSD, and macOS with Homebrew bash.
#!/usr/bin/env bash is the portable POSIX-friendly alternative.
Tested on Linux and Windows (Git Bash + CMD). macOS is the primary
beneficiary since Homebrew installs bash to /opt/homebrew/bin/bash.
Based on #700, closes#700.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Windows/Git Bash reaps nohup background processes, causing the brainstorm
server to die silently after launch. Auto-detect Windows via OSTYPE
(msys/cygwin/mingw) and MSYSTEM env vars, switching to foreground mode
automatically. Tested on Windows 11 from CMD, PowerShell, and Git Bash —
all route through Git Bash and hit the same issue.
Based on #740, fixes#737. Also adds CHANGELOG.md documenting the fix and
a known OWNER_PID/WINPID mismatch on the main branch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Subagents should never inherit the parent session's context or history.
The dispatcher constructs exactly what each subagent needs, keeping
both sides focused: the subagent on its task, the controller on
coordination.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
$PPID inside start-server.sh is the ephemeral shell the harness spawns
to run the script — it dies immediately when the script exits, causing
the server to shut down after ~60s. Now resolves grandparent PID via
`ps -o ppid= -p $PPID` to get the actual harness process (e.g. claude).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
start-server.sh passes $PPID as BRAINSTORM_OWNER_PID to the server.
The server checks every 60s if the owner process is still alive
(kill -0). If it's gone, the server shuts down immediately —
deletes .server-info, writes .server-stopped, exits cleanly.
Works across all harnesses (CC, Codex, Gemini CLI) since it
tracks the shell process that launched the script, which dies
when the harness dies.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Server tracks activity (HTTP requests, WebSocket messages, file
changes) and exits after 30 minutes of inactivity. On exit, deletes
.server-info and writes .server-stopped with reason. Visual companion
guide now instructs agents to check .server-info before each screen
push and restart if needed. Works on all harnesses, not just CC.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete 717 files: index.js, package.json, package-lock.json, and
the entire node_modules directory (express, ws, chokidar + deps).
Update start-server.sh to use server.js. Remove gitignore exception.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete zero-dep brainstorm server. Uses knownFiles set to
distinguish new screens from updates (macOS fs.watch reports
'rename' for both). All 56 tests pass (31 unit + 25 integration).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements RFC 6455 handshake, frame encoding/decoding for text
frames. All 31 unit tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The spec review loop (dispatch spec-document-reviewer subagent, iterate
until approved) existed in the prose "After the Design" section but was
missing from both the checklist and the process flow diagram. Since agents
follow the diagram and checklist more reliably than prose, the spec review
step was being skipped entirely.
Added step 7 (spec review loop) to the checklist and a corresponding
"Spec review loop" → "Spec review passed?" node pair to the dot graph.
Tested with claude --plugin-dir and claude-session-driver: worker now
correctly dispatches the spec-document-reviewer subagent after writing
the design doc and before presenting to the user for review.
Fixes#677.
start-server.sh now runs npm install if node_modules is missing.
Fixes broken server when superpowers is installed as a plugin (node_modules
are in .gitignore and not included in the clone).
The server now writes its startup JSON to $SCREEN_DIR/.server-info.
Agents that launch the server via background execution (where stdout is
hidden) can read this file to get the URL, port, and screen_dir.
The visual companion docs now give concrete launch commands per platform:
Claude Code (default mode), Codex (auto-foreground via CODEX_CI), Gemini CLI
(--foreground with is_background), and a fallback for other environments.
Moves lib/brainstorm-server/ → skills/brainstorming/scripts/ so the
brainstorming skill uses relative paths (scripts/start-server.sh) instead
of ${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/. This follows the
agentskills.io specification for portable, cross-platform skills.
Updates visual-companion.md references and test paths. All tests pass.
After the spec review loop passes, the skill now asks the user to review
the written spec file before invoking writing-plans. This prevents the
agent from racing ahead to implementation planning without giving the
user a chance to read and adjust the written document.
Fixes#565
Brainstorming now assesses whether a project is too large for a single
spec and helps decompose into sub-projects. Scope check is inline in
the understanding phase (testing showed it was skipped as a separate step).
Spec reviewer also checks scope. Writing-plans has a backstop.
Add design-for-isolation and working-in-existing-codebases guidance to
brainstorming. Add file size awareness and escalation prompts to SDD
implementer and code quality reviewer. Writing-plans gets architecture
section sizing guidance. Spec and plan reviewers get architecture and
file size checks.
Brainstorming skill now offers an optional browser-based visual companion
for questions involving visual decisions (mockups, layouts, diagrams).
The companion is a tool, not a mode — each question is evaluated for
whether browser or terminal is more appropriate.
Includes visual-companion.md progressive disclosure guide with server
workflow, screen authoring patterns, and feedback collection.
Co-Authored-By: Drew Ritter <drew@ritter.dev>