A second adversarial review of the merged branch found that combining the
session-key auth with the feature work created real bugs the (vacuous) tests
missed:
- [Critical] GET /files/ (empty name) resolved to CONTENT_DIR and crashed the
process with uncaught EISDIR — newly reachable because the query-stripping
refactor turns /files/?key=... into /files/. Reject non-regular-file names.
- [High] --open opened a KEYLESS url, which the auth gate 403s — the headline
feature landed on the error page. Open the keyed url.
- [High] Same-port restart regenerated the token (port persisted, token not), so
the open tab's old cookie 403'd and never reconnected — contradicting the
documented promise. Persist the token (BRAINSTORM_TOKEN_FILE / .last-token)
alongside the port.
- [Medium] Token sat in world-readable server-info/server.log (0644 in /tmp).
umask 077 in start-server.sh + mode 0600 on server-info/.last-token.
- [Medium] touchActivity() ran before the auth check, so unauthenticated requests
defeated the idle timeout. Count activity only after authorization.
- [Low] COOKIE_NAME embedded the pre-fallback port; derive it from the actual
bound port (also prevents a cross-server cookie-jar collision on fallback).
Tests added/strengthened (previously passed vacuously): /files/ no-crash; the
auto-open url carries the key and is reachable (200); restart reuses the same key
not just the port; unauthenticated requests don't reset the idle clock.
Full suite green (ws-protocol 32, helper 12, auth 13, server 29, lifecycle 8,
stop-server 4); restart smoke confirms same port+key and old URL -> 200.
The companion server is reachable by any local browser tab (default loopback
bind) and by any host that can route to it (remote --host bind). It served
screens, files, and accepted event-injecting WebSocket connections with no
authentication, so a malicious browser tab or a direct remote client could read
brainstorm content or inject events that the agent reads as the user's input
(prompt injection into a live session).
Generate a per-session secret token, carry it in the served URL as ?key=, and
mirror it into an HttpOnly SameSite=Strict per-port cookie on first load so
same-origin subresources and the WebSocket handshake authenticate automatically.
Every HTTP request and WebSocket upgrade now requires a valid key (query or
cookie, constant-time compared); unauthenticated requests get a friendly 403
explaining they need the full URL. A secret authenticates the client uniformly
across loopback, tunnel, and remote binds and defeats DNS rebinding, which a
Host/Origin allowlist cannot.
Also guard handleMessage against a null JSON payload that crashed the process.
Tests: new auth.test.js (13 cases) covering the key on /, /files/*, and WS plus
cookie bootstrap and the null-payload guard; server.test.js threads the key;
ws-protocol.test.js + auth.test.js wired into npm test.
Closes#1014
Refs #1110, #1553, #1504
The node+server.cjs command match (from the adversarial review) still matched any
unrelated node process running a file named server.cjs. When we recorded the
bound port (state/server-info) and lsof is available, additionally require the
PID to be the process actually LISTENING on this session's port — which rules out
a different project's server.cjs / editor task runner that recycled the stale
PID. Falls back to the command match when the port or lsof isn't available.
Test: a 'node server.cjs' process not listening on the recorded port is spared.
Refs #1703
From a two-reviewer adversarial pass:
- [High] EADDRINUSE fallback clobbered the shared .last-port: onListen wrote the
bound port unconditionally, so a fallback to a random port overwrote the
preferred port another live session still owns — stranding that session's open
tab forever. Now persist only when we bound the preferred port (not on
fallback). The fallback test now asserts .last-port integrity (teeth-verified).
- [Medium] maybeOpenBrowser ran the URL through a shell (exec + JSON.stringify),
which does NOT neutralize $(...) in a url-host. Platform launchers now use
execFile with the URL as an argv element (no shell). The operator-set
BRAINSTORM_OPEN_CMD path stays shell-based (trusted input).
- [Medium] --open was a silent no-op on native Windows (no win32 branch). Added.
- [Medium] helper.js reconnect/status/tombstone had only substring-grep tests.
Added behavioral tests driving the state machine against a mocked browser:
Reconnecting+backoff (500->1000->2000), tombstone after the grace period, and
reload-on-recovery.
- [Low] status pill showed a false 'Connected' before the socket opened; now
starts 'Connecting…' until onopen.
Not changed (flagged): stop-server.sh's PID-ownership check still matches any
'node ... server.cjs' (narrow residual — a recycled PID onto an unrelated node
server.cjs); robust fix needs fragile cross-platform process introspection.
Move the companion consent from an upfront, anticipatory offer to the first
moment a question would genuinely be clearer shown than told. If no visual
question ever arises, it's never offered. On approval the agent starts the
server with --open, so the user's browser opens to the first screen — the pop is
tied to that approval, never unsolicited.
Also hardens visual-companion.md: confirming the server is alive (server-info
present, server-stopped absent) before referring to the URL is now a required
step; restart with the same --project-dir reuses the port so the open tab
reconnects on its own (paused overlay while down); idle default corrected to 4h.
NOTE: SKILL.md is behavior-shaping content — this flow change should be
eval-tested (writing-skills adversarial pressure test) before merge.
Refs #1237, #1037
When the user approves the visual companion, open their browser automatically the
first time a screen is actually ready to show — rather than at startup (just the
waiting page) or making them open the URL by hand.
Opt-in and gated on approval: off unless BRAINSTORM_OPEN is set (start-server.sh
--open, which the agent passes only after the user agrees to use the companion).
Even then it fires once, and is skipped if a browser is already connected, on a
non-loopback/remote bind, or when headless. Launcher is the platform default
(open / xdg-open / WSL cmd.exe) or BRAINSTORM_OPEN_CMD; best-effort, never fatal.
lifecycle.test.js: opens once on the first screen when approved; does NOT open
without approval.
Closes#755
Refs #759
When the companion idle-shuts-down and the agent restarts it, a fresh random
port meant the user's open browser tab pointed at a dead URL. Persist the bound
port per project and prefer it on the next start, so the restarted server comes
up on the same port and the open tab's reconnect just works.
- start-server.sh exports BRAINSTORM_PORT_FILE=<project>/.superpowers/brainstorm/
.last-port for project sessions (not /tmp).
- server.cjs prefers an explicit BRAINSTORM_PORT, else the recorded port, else
random; writes the actually-bound port back; and on EADDRINUSE (preferred port
still in use) falls back to a random port once instead of crashing.
lifecycle.test.js: restart reuses the recorded port; a taken preferred port
falls back to a random one without crashing.
Refs #1237
The injected client reconnected on a fixed 1s timer with no feedback: if the
laptop slept or the server restarted, the page showed 'Connected' over a dead
socket and silently queued events. And when the server stopped, the user got a
bare connection-refused with no explanation.
helper.js now:
- reconnects with exponential backoff (500ms, doubling, capped at 30s; reset on
open), with an onerror->close handler, nulls the socket on close, and clears a
pending timer before scheduling another;
- drives the frame status pill Connected/Reconnecting/Disconnected via a
--status-color custom property (frame-template.html);
- after ~15s disconnected, shows a self-styled 'Companion paused' overlay
(tombstone) explaining the companion stopped and will reconnect automatically;
- on recovery from a tombstoned outage (e.g. server restarted on the same port)
reloads to pick up the restarted server's current screen.
The reconnect-backoff is an exported pure function; helper.test.js unit-tests it
(doubling + cap progression) and asserts the status/tombstone/reconnect wiring.
DOM behaviour is verified live.
Refs #856, #1237
The companion shut down after only 30 minutes idle — too short for real
brainstorming, where a single question can sit far longer. And shutdown() never
closed upgraded WebSocket sockets, so an open browser connection could keep the
Node process alive after it was supposed to exit.
- Default idle timeout raised to 4 hours, configurable via BRAINSTORM_IDLE_TIMEOUT_MS
and start-server.sh --idle-timeout-minutes (validated positive integer).
- Reported as idle_timeout_ms in the server-started JSON / server-info.
- shutdown() now destroys all client sockets so the process exits even with an
open WebSocket.
- Watchdog check interval is configurable (BRAINSTORM_LIFECYCLE_CHECK_MS, default
60s) so the lifecycle can be tested without minute-long waits.
Adds lifecycle.test.js (configured timeout reported; idle shutdown exits despite
an open WS — teeth-verified; the start-server flag). Wires ws-protocol,
lifecycle, and stop-server suites into npm test.
Closes#1237
Refs #1689
stop-server.sh read server.pid and SIGKILL'd that PID with no checks. After a
reboot or PID wraparound the pid file can point at an unrelated, live process —
which we would then kill.
Verify the PID is actually our server (a running 'node ... server.cjs') before
signalling it. If ownership can't be proven, fail closed: remove the stale pid
file and report {status: stale_pid} without killing anything. Real servers still
stop ({status: stopped}); a missing pid file still reports not_running.
Adds stop-server.test.sh covering: an unrelated reused PID is left alone, a real
server is stopped, and a missing pid file.
Refs #1703
On macOS (and ExFAT/SMB volumes) the OS writes ._<name>.html sidecar files
holding binary resource-fork metadata. These end with .html, so they passed the
content filter and could be picked as the newest screen — serving binary garbage
to the browser instead of the mockup — or fetched via /files/.
Skip dotfiles (leading '.') at all four sites that list or serve content:
getNewestScreen, the /files/ endpoint, the known-files seed, and the fs.watch
handler. Tests cover serving (/ and /files/) and the watch path (a ._ file must
not trigger a reload).
Refs #950
Adversarial review findings 1/3/9: the head-to-head result is now scoped
to its context (dispatch-prompt guidance) with an explicit micro-test-your-
own-case instruction; the nuance-clause result is reported as
consistent->noisy rather than 'measurably dilutes'; the checklist line is
scoped to behavior-shaping guidance and the micro method no longer assumes
raw API access.
RED battery (35 opus authoring samples against the current skill) showed
authors default to prohibition+rationalization-table for composition-
shaping problems (T1: 5/5), where that form measurably backfires
(prohibition 4.4 vs 3.6 no-guidance control vs 3.0 recipe restatement
errors), and design only full-subagent verification with no wording
micro-tests, no mandatory no-guidance control, no manual inspection of
automated matches, no variance signal (T7: 5/5).
Adds: Match the Form to the Failure (failure-type -> form table, nuance/
exemption rules), scope note on Bulletproofing, Micro-Test Wording
subsection, two checklist lines. Deliberately narrow: T3/T4/T5/T6 RED
samples showed Iron Law / elicit-first behavior already strong.
Add a conditional TDD Evidence field to the implementer report format so controllers can verify RED and GREEN output when TDD was required.
The field asks for the command run, relevant RED/GREEN output, and the expected RED failure reason rather than raw full logs.
Fixes#994.
Per obra's guidance on #1609: remove the github-specific instruction rather
than replacing it with a platform-detection table. Agents already know their
forge tooling; the skill only needs to cover the push step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces hardcoded `gh pr create` in Option 2 with a platform-neutral
note: check `git remote get-url origin` first, then use gh (GitHub),
glab (GitLab), or fall back to the compare URL for unknown platforms.
Adds matching Red Flag entry so agents don't skip the detection step.
Fixes#1609
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Antigravity (Google's `agy` CLI) installs the existing Superpowers plugin
directly:
agy plugin install https://github.com/obra/superpowers
agy imports the bundled skills and runs the plugin's SessionStart hook, so
using-superpowers bootstraps from the first message — verified on agy 1.0.3:
a fresh session given "Let's make a react todo list" auto-triggers the
brainstorming skill instead of writing code. agy discovers skills natively
and, having no Skill tool, loads them by reading SKILL.md with view_file.
No scaffold, installer, or generated context file is needed. This adds only:
- README.md: an Antigravity install section + Quickstart link
- skills/using-superpowers/SKILL.md: reference to the agy tool mapping
- skills/using-superpowers/references/antigravity-tools.md: action->tool
mapping for agy (view_file, write_to_file, invoke_subagent, manage_task,
and skill loading via view_file on SKILL.md)
- tests/antigravity/: structural test for the tool mapping, mirroring
tests/pi/
The "Signals You're Doing It Wrong" bullet in systematic-debugging/SKILL.md
contains the literal token Claude Code's runtime scans for in tool result
bodies. Every Skill-tool invocation of this skill caused the harness to
inject a spurious system-reminder claiming the user requested deeper
reasoning, silently bumping every session into extended thinking.
Replace the bullet's spelling so the contiguous letter sequence the scanner
matches is broken with a hyphen. The signal text remains recognizable to
the agent and the documented action ("Question fundamentals, not just
symptoms") is unchanged.
Fixesobra/superpowers#1283
Issue #1134: agents reading visual-companion.md see bare commands like
`scripts/start-server.sh`, correctly identify the plugin install
directory, then look for `<plugin>/scripts/start-server.sh` instead of
`<plugin>/skills/brainstorming/scripts/start-server.sh`. The file
doesn't exist at the plugin-rooted path, so the agent concludes the
visual companion isn't available and falls back to text-only
brainstorming.
Multiple independent reproductions in the issue thread, plus one user's
agent self-reported: "I assumed the scripts folder was in the root
directory of the plugin, it didn't realize it could have been talking
about the skill folder itself."
Change all `scripts/<file>` references in visual-companion.md to
`skills/brainstorming/scripts/<file>`. Agents that correctly identify
the plugin root will now join to the right path.
Closes#1134.
Matches the style used by the spec-reviewer-prompt.md and
code-quality-reviewer-prompt.md call sites, which already use square
brackets ([VAR] or [VAR — description]). No semantic change — these
placeholders are filled in by the controller; nothing programmatic
substitutes them.
Two problems with the SDD reviewer prompts on dev:
- spec-reviewer-prompt.md never received a git range, so the
general-purpose subagent had to crawl the entire codebase to find what
changed. Reporter measured 20-33 minute spec reviews on simple tasks
(#1538).
- Neither reviewer prompt told the subagent that review is read-only.
A spec reviewer running `git checkout <parent-sha>` for historical
comparison silently detached HEAD on the controller's branch, then
subsequent task commits accumulated on the detached HEAD and were
effectively orphaned (#1543, reproduced independently in #1543's
thread).
Add a Git Range to Review section to spec-reviewer-prompt.md that
mirrors the one code-reviewer.md already has, plus a Read-Only Review
section in both reviewer prompt templates stating the principle: do
not mutate the working tree, the index, HEAD, or branch state. Allow
inspecting other revisions via a separate temporary worktree, so the
read-only rule does not block legitimate historical comparison.
Closes#1538.
Closes#1543.
Replace Claude-Code-specific tool names in skill prose, prompt
templates, and OpenCode-facing docs with action-language descriptions
that resolve to each runtime's native tool via the per-platform refs.
Changes by category:
- Prose mentions ("Use TodoWrite to track...", "Use Task tool with
general-purpose type") → action language ("Track each item as a
todo", "Dispatch a general-purpose subagent")
- Prompt template headers (6 files): "Task tool (general-purpose):"
→ "Subagent (general-purpose):" — preserves the type information
without naming Claude Code's specific dispatch tool
- DOT flowchart node labels: "Invoke Skill tool" → "Invoke the
skill"; "Create TodoWrite todo per item" → "Create a todo per
item"
- OpenCode INSTALL.md and docs/README.opencode.md: replace the old
"TodoWrite → todowrite, Task → @mention" mapping (which both
taught a vocabulary skills no longer use AND was wrong about
@mention being a real OpenCode syntax) with an action-language
mapping verified against the installed OpenCode CLI's tool
inventory.
The platform-tools refs landed in Phase B already document each
runtime's resolution; skills now speak in the actions those refs
map. Tool names that genuinely belong only in the per-platform
dispatch section ("In Claude Code: Use the `Skill` tool") and the
Claude-Code-specific Bash run_in_background flag note in
visual-companion remain — those are intentional carve-outs.
Misc platform/runtime statements and adjacencies that don't fit the
prose, config-ref, README-ordering, or tool-vocabulary buckets:
- visual-companion frame template: rename CSS/HTML id #claude-content
→ #frame-content. The id is purely styling — nothing external
references it. The brainstorm-server test that asserted the old
string is updated in lockstep.
- visual-companion launch instructions: add a Copilot CLI section
alongside Claude Code, Codex, and Gemini CLI; combine the Claude
Code (macOS / Linux) and (Windows) sections so heading style
matches the other (non-OS-qualified) platforms.
- visual-companion: "Use Write tool" → "Use your file-creation tool"
for the cat/heredoc warning. The prohibition is what's load-
bearing, not the tool name.
- executing-plans/SKILL.md: list all subagent-capable runtimes
(Claude Code, Codex CLI, Codex App, Copilot CLI, Gemini CLI) and
point at the per-platform tool refs as the source of truth.
- executing-plans/SKILL.md: relative path "using-superpowers/
references/" → "../using-superpowers/references/" to resolve
correctly from the executing-plans/ directory.
No bundled spec doc here — Phase D was scope-extension work that
took place across rounds, with no standalone spec authored.
Two structural changes:
1. Generalize CLAUDE.md-specific guidance:
- "Project-specific conventions (put in CLAUDE.md)" → "(put in
your instructions file)" in writing-skills/SKILL.md
- "(explicit CLAUDE.md violation)" → "(explicit instruction-file
violation)" in receiving-code-review/SKILL.md
- The instruction-priority list in using-superpowers/SKILL.md
stays inclusive (CLAUDE.md, GEMINI.md, AGENTS.md) — that's
load-bearing, not a substitution opportunity.
2. Per-platform tool reference files at skills/using-superpowers/
references/{claude-code,codex,copilot,gemini}-tools.md. Each ref
documents:
- The runtime's preferred instructions file (CLAUDE.md, AGENTS.md,
GEMINI.md, etc.) and how it loads
- The runtime's personal-skills directory + cross-runtime
~/.agents/skills/ path where applicable
- Action-language → tool-name mapping table
Tool names and table content reflect the source-verified state from
direct inspection of openai/codex, google-gemini/gemini-cli,
sst/opencode, and the installed @github/copilot package. Filenames
and behaviors are sourced from each runtime's official docs.
Files in this commit also pick up later-phase changes that
accumulated on the same files (using-superpowers/SKILL.md "How to
Access Skills" overhaul, action-language flowchart, refs' final
table content). The bundled spec records original scope.
Replace generic third-person "Claude" with "agents" / "your agent"
forms across active skill prose, the README intro, and the vendored
anthropic-best-practices.md reference. Carve-outs preserved:
historical attribution paths, the "Variant C: Claude.AI Emphatic
Style" example label, model identifiers (Haiku/Sonnet/Opus), and the
"In Claude Code:" per-platform skill-dispatch list.
Coined-term rename: "Claude Search Optimization (CSO)" → "Skill
Discovery Optimization (SDO)" in writing-skills/SKILL.md.
Files in this commit also pick up later-phase changes that
accumulated on the same files (dispatching-parallel-agents code-
example transformation, writing-skills numbering and path fixes).
The bundled spec at docs/superpowers/specs/ records the original
scope and the carve-outs.
README.md gets only its prose change here; the alphabetization
lands in Phase C's commit.