Compare commits

..

152 Commits

Author SHA1 Message Date
Jesse Vincent
9b6be89aea E37: pre-flight plan review — surface plan conflicts as one batched question before Task 1 2026-06-15 12:16:47 -07:00
Jesse Vincent
eafa95b437 Spec: L2b tested — opus structural win, sonnet transmission+attention gap (E35/E36); bump evals to 9919b27 2026-06-15 12:16:47 -07:00
Jesse Vincent
228f2cb8a9 L2b: plan-mandated defects are findings the human adjudicates
Reviewer tripwire (Calibration): a plan-mandated defect IS a finding,
reported as Important and labeled plan-mandated — the plan's authorship
does not grade its own work.

Controller rule (review loop): a plan-mandated finding, or any finding
conflicting with the plan's text, escalates to the human like any plan
contradiction — never dismissed because the plan mandates it.

E35 micro (frozen 0a98 replay, sonnet reviewer, 6v6): without the
tripwire 0/6 reports give the controller anything to escalate on (all
Approved, defect endorsed as spec-required); with it 6/6 report the
defect as a labeled finding.
2026-06-15 12:16:47 -07:00
Jesse Vincent
e161df5a9b E27 stack: conditional impl tier + final-review tier pin + narration recipe + terse reviewer contract 2026-06-15 12:16:47 -07:00
Jesse Vincent
5e204f1128 E03: cheapest-tier implementers when plan carries complete code (transcription hypothesis) 2026-06-15 12:16:47 -07:00
Jesse Vincent
b1eb92ea72 Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not 2026-06-15 12:15:06 -07:00
Jesse Vincent
6e9bbb7e3e writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks
Claims are fidelity and variance, not dollars (full attribution in the
superpowers-evals experiment log, 2026-06-11 L1 entry):
- Global Constraints header: 0/5 -> 5/5 adoption in micro-tests, exact
  values verbatim; makes constraints mechanically propagatable to briefs
  and reviewers (a version-floor violation class shipped because they
  weren't). The one fix wave in the elicited full runs was a version-floor
  catch this header enabled.
- Per-task Interfaces blocks: 0 -> 100% of tasks, exact signatures,
  within-plan consistent; removes the controller's per-dispatch interface
  re-derivation.
- Task right-sizing: 9.4 -> 8.4 mean tasks at svelte scale (kills
  standalone Types/README micro-tasks); no effect at small scale.
- End-to-end (opus-written plan executed under SDD): guidance plan ran 1
  fix wave vs control's 2-4 (control plan shipped a real Sierpinski bug);
  execution cost equal within noise.
2026-06-15 12:15:06 -07:00
Jesse Vincent
fe938ac86c Constraints block is the reviewer's attention lens: copy spec verbatim, never improvise process rules
E30 replay: the planted-DRY catch is causally determined by the
controller-composed constraints block (0/6 with process-shaped vs 5/6
with the spec's own wording). E31 micro: this recipe doubles the rate
at which composed blocks carry the spec's cross-component relationship
(6/6 vs 3/6). Affects dev and the redesign equally (E29: both 4/5).
2026-06-15 12:15:06 -07:00
Jesse Vincent
07dec9331f Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance 2026-06-15 12:15:06 -07:00
Jesse Vincent
afcbf8bacb Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed) 2026-06-15 12:15:06 -07:00
Jesse Vincent
fa14c8d671 Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead) 2026-06-15 12:15:06 -07:00
Jesse Vincent
8b76932337 Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first 2026-06-15 12:15:06 -07:00
Jesse Vincent
0702ec2c6f Record writing-plans micro-test result: resolved, no change needed 2026-06-15 12:15:06 -07:00
Jesse Vincent
85a9324a53 Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges) 2026-06-15 12:15:06 -07:00
Jesse Vincent
610b09874e Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist 2026-06-15 12:15:06 -07:00
Jesse Vincent
6df501ea5d Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget
Validated 2026-06-10 (all gates pass): go-fractals 54.1-54.7 min / $12.81-14.31
(baseline 64.9 / $16.07); svelte-todo 55.0 min / 19.3M / $14.99 (baseline
79.7 / 27.3M / $20.98); planted-defect pass $2.77. Dispatch-model discipline
3/3 runs after moving model: into the templates as a REQUIRED line.
Full experiment log: evals docs/experiments/2026-06-10-sdd-cost-experiments.md
2026-06-15 12:15:06 -07:00
Jesse Vincent
1585f40c8e Spec: positive-instruction redesign — audit results, micro-test method, writing-plans variants 2026-06-15 12:15:06 -07:00
Jesse Vincent
60c0b744b4 Shared: unique review-package collateral names 2026-06-15 12:15:06 -07:00
Jesse Vincent
6b3e4ad407 Add review-package script; close fix-dispatch test gap
scripts/review-package generates the reviewer's input deterministically:
commit list, stat summary, and net diff with -U10 context, written to a
file from an explicit BASE. Live runs showed controllers improvising
'git diff HEAD~1..HEAD', which silently truncates multi-commit tasks,
and svelte's five fix dispatches shipped without re-running any tests —
fix dispatches now explicitly carry the implementer's
re-run-and-report contract.
2026-06-15 12:15:06 -07:00
Jesse Vincent
a84bb0f52b Describe the review design as current state, not as a delta
The skill read as a changelog: 'combined task review,' 'one reviewer,
one reading,' 'one dispatch,' and an example still showing diffs pasted
into prompts. A reader who never saw the two-reviewer design has no
referent for 'combined.' Prose now states the design directly, and the
flowchart/example reflect the diff-file handoff.
2026-06-15 12:15:06 -07:00
Jesse Vincent
69d396a676 Spec: record iterations 2-3 results and final frozen-config matrix 2026-06-15 12:15:06 -07:00
Jesse Vincent
e4457c970e Hand reviewers the diff as a file, not a paste
Paste adoption stayed at 0/15 even as a Red Flag — and the controller's
reluctance is locally rational: pasting loads the diff into the (most
expensive) controller context permanently, while a reviewer self-fetch
costs a few cheap turns. The diff-file handoff is cheap for both sides:
the controller redirects git diff to /tmp without reading it, and the
reviewer gets the whole change in one Read call.
2026-06-15 12:15:06 -07:00
Jesse Vincent
fac5888846 Reviewer skepticism covers the implementer's design rationales
Fourth planted-defect failure mode: the implementer's self-report said
'noted mild structural duplication; left unabstracted per YAGNI' and the
reviewer deferred to that framing, rating the duplication no finding at
all. The pre-judging keeps relocating — controller prompt, then reviewer
calibration, now the implementer's report. Rationales are claims; they
never downgrade severity.
2026-06-15 12:15:06 -07:00
Jesse Vincent
8ac14c0450 Make diff-pasting non-optional for task reviewer dispatch
Adoption was 6/11 reviews on fractals and 0/17 on svelte when phrased
as guidance; reviewers without the diff re-derive it by hand, which is
the single largest remaining reviewer cost. Now a Red Flags Never entry
and a REQUIRED marker on the template placeholder.
2026-06-15 12:15:06 -07:00
Jesse Vincent
3ed554d557 Close the Minor-severity escape hatch
With merged review, a planted verbatim-duplication defect shipped: the
reviewer rated it Minor (YAGNI) under the strict cannot-be-trusted
definition of Important, and the Minor-rolls-up rule meant no fix was
ever dispatched and the final review never saw the finding. Calibration
now names merge-blocking maintainability damage (verbatim duplication,
swallowed errors, assertion-free tests) as Important, and controllers
must paste accumulated Minor findings into the final review dispatch.
2026-06-15 12:15:06 -07:00
Jesse Vincent
4e8edca36e Spec: document cost iterations and the per-task review consolidation 2026-06-15 12:15:06 -07:00
Jesse Vincent
d7726d99dc Merge per-task reviews into one task reviewer (iteration 2)
Iteration-1 profiling: implementers and per-dispatch overhead dominate
(429 of 686 subagent turns; controller coordination is half the dollars
and scales with dispatch count), reviewers are individually lean, and
the controller pasted the diff in only 2 of 22 review dispatches when
the guidance was phrased as optional.

Changes: spec-reviewer-prompt.md + code-quality-reviewer-prompt.md
replaced by task-reviewer-prompt.md (one reviewer, one reading of a
pasted diff, two verdicts: spec compliance //⚠️ and task quality);
one fix dispatch can address both kinds of findings; controller now
runs git diff itself and pastes it (imperative, not optional);
implementers run focused tests while iterating and the full suite once
before committing; flowchart, example, Red Flags, tool tables updated.
The broad final whole-branch review is unchanged.
2026-06-15 12:15:06 -07:00
Jesse Vincent
4c1f1e5cc5 Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence
Round-2 fractals eval regressed to 70min/32.2M tokens (vs round-1's
42.8min/14.5M) while reaching baseline-parity quality. Per-subagent turn
profiling attributed it to: haiku dispatches taking 2-3x the turns of
sonnet (678 of 1197 subagent turns), reviewers re-fetching diffs by hand
(518 Bash calls), and evidence-rule narration. Changes: turn-count-beats-
token-price model guidance; controllers paste small diffs into reviewer
prompts (reviewers then need few or no tool calls); evidence scoped to
findings and would-be-bare-yes checks; Important defined as cannot-trust-
until-fixed with coverage suggestions Minor; fixes dispatched only for
Critical/Important.
2026-06-15 12:15:06 -07:00
Jesse Vincent
7288393773 Add phrase-level pre-judging triggers to reviewer prompt rule
Resumed the offending eval controller session and asked it why it
pre-judged despite the rule being in context. Its retrospective: the
motive was avoiding a review loop, the abstract rule was read but not
applied at the moment it governs, and a phrase-level trigger ('do not
flag', 'at most Minor', 'don't treat X as a defect', 'the plan chose')
would have fired where the principle did not.
2026-06-15 12:15:06 -07:00
Jesse Vincent
254a8e2e32 Red Flags: never tell a reviewer what not to flag or pre-rate severity
Second observed instance: with the Constructing Reviewer Prompts rule
already live, a controller still wrote 'do not treat that duplication as
a defect to fix — the plan chose it; you may note it as a Minor
observation at most' into a quality reviewer dispatch, fabricating plan
intent from the plan's example snippet. Promote the rule to the Red
Flags Never list and name the rationalization.
2026-06-15 12:15:06 -07:00
Jesse Vincent
7c11cee649 Close three review blind spots found by defect tracing
Live eval deliverables shipped five polish defects; tracing each through
the transcripts showed three mechanisms, each now addressed:
- reviewers answered pointed checklist items with unsupported yes
  (evidence rule: every What-to-Check answer needs file:line evidence)
- no reviewer ever saw the design's global constraints (controllers now
  paste binding constraints into task requirements)
- test output noise was invisible everywhere (pristine-output checks in
  implementer self-review and quality review)
2026-06-15 12:15:06 -07:00
Jesse Vincent
b36cf86afd Require explicit model on subagent dispatch
In live eval runs, controllers given judgment-based model selection
stopped passing a model at all; the omitted parameter inherits the
session's top-tier model, silently making every subagent maximally
expensive (one run dispatched 26/26 reviewers on the session model).
2026-06-15 12:15:06 -07:00
Jesse Vincent
06bec17a34 Forbid controllers pre-judging reviewer findings
A live eval run of sdd-quality-reviewer-catches-planted-defect caught the
SDD controller fabricating a plan constraint and instructing the quality
reviewer not to flag the planted DRY violation. The duplication shipped.
Constructing Reviewer Prompts now bans suppression directives alongside
open-ended broadening directives.
2026-06-15 12:15:06 -07:00
Jesse Vincent
236524413b Sync plan: escaped pre() pattern in Task 5 checks block 2026-06-15 12:15:06 -07:00
Jesse Vincent
6e019e0316 Fix plan doc: correct Task 1 grep expectation; sync Task 5 story block 2026-06-15 12:15:06 -07:00
Jesse Vincent
d4bb8d268f Sync plan's Task 5 blocks with review fixes 2026-06-15 12:15:06 -07:00
Jesse Vincent
d519ba65fd SDD controller: reviewer prompt budgets, ⚠️ handling, final-review pointer, model judgment 2026-06-15 12:15:06 -07:00
Jesse Vincent
d32a56dc32 Implementer prompt: re-run covering tests after fixing review findings 2026-06-15 12:15:06 -07:00
Jesse Vincent
994bc26d2a Scope spec reviewer's Your Job wording to the diff 2026-06-15 12:15:06 -07:00
Jesse Vincent
d5850df1bc Spec reviewer: judge from the diff, grounded skepticism, ⚠️ verdict channel 2026-06-15 12:15:06 -07:00
Jesse Vincent
b5edd40d2c Use bare placeholder names in quality reviewer prompt body 2026-06-15 12:15:06 -07:00
Jesse Vincent
6a02446953 Make per-task quality reviewer prompt self-contained and task-scoped 2026-06-15 12:15:06 -07:00
Jesse Vincent
042d238b26 Add implementation plan for task-scoped review dispatch 2026-06-15 12:15:06 -07:00
Jesse Vincent
cf81ad2ac3 Harden review-dispatch spec per adversarial review findings 2026-06-15 12:15:06 -07:00
Jesse Vincent
cb0dbeb095 Add design spec: task-scoped review dispatch for SDD 2026-06-15 12:15:06 -07:00
Drew Ritter
9eb452afe7 chore: bump evals submodule to claude transcript-capture fix
Bumps evals 7f8e80c -> db37d5f (superpowers-evals#16): the claude launcher now
sets CLAUDE_CODE_FORCE_SESSION_PERSISTENCE=1 so nested interactive claude
(>=2.1.176) persists its transcript — restoring claude capture (verdicts +
cost/token data) on the latest CLI (2.1.177) with no version pin. Also folds in
the audit_liveness ruff/ty cleanup and the B1 audit-doc correction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 15:17:11 -07:00
Drew Ritter
93f2ce91b8 Fix companion stop metadata and token permissions 2026-06-11 13:53:06 -07:00
Drew Ritter
e9ee6c5b4d Harden Windows browser launcher 2026-06-11 13:53:06 -07:00
Drew Ritter
5415cb8ccf Fix Windows lifecycle validation 2026-06-11 13:53:06 -07:00
Drew Ritter
1c21a91e01 Align visual companion docs with shipped scope 2026-06-11 13:53:06 -07:00
Drew Ritter
441335ee3e Fix companion test cleanup and argv assertions 2026-06-11 13:53:06 -07:00
Drew Ritter
377192f7a1 Harden companion platform tests 2026-06-11 13:53:06 -07:00
Drew Ritter
5eea0d09d7 Fix companion lifecycle test ownership metadata 2026-06-11 13:53:06 -07:00
Drew Ritter
a6a4cd85b9 Harden companion stop ownership proof 2026-06-11 13:53:06 -07:00
Drew Ritter
8034176801 Isolate companion fallback tokens 2026-06-11 13:53:06 -07:00
Drew Ritter
2bab677ba7 Fix server test fallback cleanup 2026-06-11 13:53:06 -07:00
Drew Ritter
c4cde1eed9 Harden root screen containment 2026-06-11 13:53:06 -07:00
Drew Ritter
5f3b317741 Plan visual companion final hardening fixup 2026-06-11 13:53:06 -07:00
Drew Ritter
7bb6af2f67 Tighten visual companion hardening spec 2026-06-11 13:53:06 -07:00
Drew Ritter
4f88b89c75 Document visual companion final hardening fixup 2026-06-11 13:53:06 -07:00
Drew Ritter
c7d7e3550f Harden companion Windows lifecycle coverage 2026-06-11 13:53:06 -07:00
Drew Ritter
a2e67bbd9b Harden brainstorm companion auth regressions 2026-06-11 13:53:06 -07:00
Drew Ritter
fe812c418f Document visual companion auth hardening plan 2026-06-11 13:53:06 -07:00
Jesse Vincent
f4d1788ffb fix(brainstorm-server): fix auth-integration bugs from full-branch review
A second adversarial review of the merged branch found that combining the
session-key auth with the feature work created real bugs the (vacuous) tests
missed:

- [Critical] GET /files/ (empty name) resolved to CONTENT_DIR and crashed the
  process with uncaught EISDIR — newly reachable because the query-stripping
  refactor turns /files/?key=... into /files/. Reject non-regular-file names.
- [High] --open opened a KEYLESS url, which the auth gate 403s — the headline
  feature landed on the error page. Open the keyed url.
- [High] Same-port restart regenerated the token (port persisted, token not), so
  the open tab's old cookie 403'd and never reconnected — contradicting the
  documented promise. Persist the token (BRAINSTORM_TOKEN_FILE / .last-token)
  alongside the port.
- [Medium] Token sat in world-readable server-info/server.log (0644 in /tmp).
  umask 077 in start-server.sh + mode 0600 on server-info/.last-token.
- [Medium] touchActivity() ran before the auth check, so unauthenticated requests
  defeated the idle timeout. Count activity only after authorization.
- [Low] COOKIE_NAME embedded the pre-fallback port; derive it from the actual
  bound port (also prevents a cross-server cookie-jar collision on fallback).

Tests added/strengthened (previously passed vacuously): /files/ no-crash; the
auto-open url carries the key and is reachable (200); restart reuses the same key
not just the port; unauthenticated requests don't reset the idle clock.
Full suite green (ws-protocol 32, helper 12, auth 13, server 29, lifecycle 8,
stop-server 4); restart smoke confirms same port+key and old URL -> 200.
2026-06-11 13:53:06 -07:00
Jesse Vincent
4341c3f4d5 test(brainstorm-server): thread session key through tests after auth merge
Integrating the per-session-key auth onto the same branch as the dotfile and
lifecycle work: two tests added after the auth commit opened WebSockets without a
key (server.test.js dotfile-reload, lifecycle.test.js idle-shutdown), which the
auth gate now resets. Pass ?key=/BRAINSTORM_TOKEN in both. Full suite green:
ws-protocol 32, helper 12, auth 13, server 28, lifecycle 7, stop-server 4.
2026-06-11 13:53:06 -07:00
Jesse Vincent
c64c4ea6f4 feat(brainstorm-server): gate every endpoint behind a per-session key
The companion server is reachable by any local browser tab (default loopback
bind) and by any host that can route to it (remote --host bind). It served
screens, files, and accepted event-injecting WebSocket connections with no
authentication, so a malicious browser tab or a direct remote client could read
brainstorm content or inject events that the agent reads as the user's input
(prompt injection into a live session).

Generate a per-session secret token, carry it in the served URL as ?key=, and
mirror it into an HttpOnly SameSite=Strict per-port cookie on first load so
same-origin subresources and the WebSocket handshake authenticate automatically.
Every HTTP request and WebSocket upgrade now requires a valid key (query or
cookie, constant-time compared); unauthenticated requests get a friendly 403
explaining they need the full URL. A secret authenticates the client uniformly
across loopback, tunnel, and remote binds and defeats DNS rebinding, which a
Host/Origin allowlist cannot.

Also guard handleMessage against a null JSON payload that crashed the process.

Tests: new auth.test.js (13 cases) covering the key on /, /files/*, and WS plus
cookie bootstrap and the null-payload guard; server.test.js threads the key;
ws-protocol.test.js + auth.test.js wired into npm test.

Closes #1014
Refs #1110, #1553, #1504
2026-06-11 13:53:06 -07:00
Jesse Vincent
de05e020d8 docs(brainstorm): catalog visual companion issues; choose session-key for security
Records the triage of open issues/PRs touching the brainstorm companion server
and the decision to protect it with a per-session secret key (supersedes the
Host/Origin allowlist approach) so remote-connected users are covered, not just
loopback.
2026-06-11 13:53:06 -07:00
Jesse Vincent
eee4f87471 fix(brainstorm-server): tie stop-server PID check to the session's port
The node+server.cjs command match (from the adversarial review) still matched any
unrelated node process running a file named server.cjs. When we recorded the
bound port (state/server-info) and lsof is available, additionally require the
PID to be the process actually LISTENING on this session's port — which rules out
a different project's server.cjs / editor task runner that recycled the stale
PID. Falls back to the command match when the port or lsof isn't available.

Test: a 'node server.cjs' process not listening on the recorded port is spared.

Refs #1703
2026-06-11 13:53:06 -07:00
Jesse Vincent
bac46a5dcb fix(brainstorm-server): address adversarial review findings
From a two-reviewer adversarial pass:

- [High] EADDRINUSE fallback clobbered the shared .last-port: onListen wrote the
  bound port unconditionally, so a fallback to a random port overwrote the
  preferred port another live session still owns — stranding that session's open
  tab forever. Now persist only when we bound the preferred port (not on
  fallback). The fallback test now asserts .last-port integrity (teeth-verified).

- [Medium] maybeOpenBrowser ran the URL through a shell (exec + JSON.stringify),
  which does NOT neutralize $(...) in a url-host. Platform launchers now use
  execFile with the URL as an argv element (no shell). The operator-set
  BRAINSTORM_OPEN_CMD path stays shell-based (trusted input).

- [Medium] --open was a silent no-op on native Windows (no win32 branch). Added.

- [Medium] helper.js reconnect/status/tombstone had only substring-grep tests.
  Added behavioral tests driving the state machine against a mocked browser:
  Reconnecting+backoff (500->1000->2000), tombstone after the grace period, and
  reload-on-recovery.

- [Low] status pill showed a false 'Connected' before the socket opened; now
  starts 'Connecting…' until onopen.

Not changed (flagged): stop-server.sh's PID-ownership check still matches any
'node ... server.cjs' (narrow residual — a recycled PID onto an unrelated node
server.cjs); robust fix needs fragile cross-platform process introspection.
2026-06-11 13:53:06 -07:00
Jesse Vincent
daa41c0670 feat(brainstorming): offer the visual companion just-in-time; harden lifecycle guidance
Move the companion consent from an upfront, anticipatory offer to the first
moment a question would genuinely be clearer shown than told. If no visual
question ever arises, it's never offered. On approval the agent starts the
server with --open, so the user's browser opens to the first screen — the pop is
tied to that approval, never unsolicited.

Also hardens visual-companion.md: confirming the server is alive (server-info
present, server-stopped absent) before referring to the URL is now a required
step; restart with the same --project-dir reuses the port so the open tab
reconnects on its own (paused overlay while down); idle default corrected to 4h.

NOTE: SKILL.md is behavior-shaping content — this flow change should be
eval-tested (writing-skills adversarial pressure test) before merge.

Refs #1237, #1037
2026-06-11 13:53:06 -07:00
Jesse Vincent
0d37ff6505 feat(brainstorm-server): opt-in auto-open of the browser on the first screen
When the user approves the visual companion, open their browser automatically the
first time a screen is actually ready to show — rather than at startup (just the
waiting page) or making them open the URL by hand.

Opt-in and gated on approval: off unless BRAINSTORM_OPEN is set (start-server.sh
--open, which the agent passes only after the user agrees to use the companion).
Even then it fires once, and is skipped if a browser is already connected, on a
non-loopback/remote bind, or when headless. Launcher is the platform default
(open / xdg-open / WSL cmd.exe) or BRAINSTORM_OPEN_CMD; best-effort, never fatal.

lifecycle.test.js: opens once on the first screen when approved; does NOT open
without approval.

Closes #755
Refs #759
2026-06-11 13:53:06 -07:00
Jesse Vincent
13da997ac7 feat(brainstorm-server): reuse the same port on session restart
When the companion idle-shuts-down and the agent restarts it, a fresh random
port meant the user's open browser tab pointed at a dead URL. Persist the bound
port per project and prefer it on the next start, so the restarted server comes
up on the same port and the open tab's reconnect just works.

- start-server.sh exports BRAINSTORM_PORT_FILE=<project>/.superpowers/brainstorm/
  .last-port for project sessions (not /tmp).
- server.cjs prefers an explicit BRAINSTORM_PORT, else the recorded port, else
  random; writes the actually-bound port back; and on EADDRINUSE (preferred port
  still in use) falls back to a random port once instead of crashing.

lifecycle.test.js: restart reuses the recorded port; a taken preferred port
falls back to a random one without crashing.

Refs #1237
2026-06-11 13:53:06 -07:00
Jesse Vincent
31a0de857b feat(brainstorm-companion): resilient reconnect, live status, paused overlay
The injected client reconnected on a fixed 1s timer with no feedback: if the
laptop slept or the server restarted, the page showed 'Connected' over a dead
socket and silently queued events. And when the server stopped, the user got a
bare connection-refused with no explanation.

helper.js now:
- reconnects with exponential backoff (500ms, doubling, capped at 30s; reset on
  open), with an onerror->close handler, nulls the socket on close, and clears a
  pending timer before scheduling another;
- drives the frame status pill Connected/Reconnecting/Disconnected via a
  --status-color custom property (frame-template.html);
- after ~15s disconnected, shows a self-styled 'Companion paused' overlay
  (tombstone) explaining the companion stopped and will reconnect automatically;
- on recovery from a tombstoned outage (e.g. server restarted on the same port)
  reloads to pick up the restarted server's current screen.

The reconnect-backoff is an exported pure function; helper.test.js unit-tests it
(doubling + cap progression) and asserts the status/tombstone/reconnect wiring.
DOM behaviour is verified live.

Refs #856, #1237
2026-06-11 13:53:06 -07:00
Jesse Vincent
c292421627 feat(brainstorm-server): 4h configurable idle timeout; close WS on shutdown
The companion shut down after only 30 minutes idle — too short for real
brainstorming, where a single question can sit far longer. And shutdown() never
closed upgraded WebSocket sockets, so an open browser connection could keep the
Node process alive after it was supposed to exit.

- Default idle timeout raised to 4 hours, configurable via BRAINSTORM_IDLE_TIMEOUT_MS
  and start-server.sh --idle-timeout-minutes (validated positive integer).
- Reported as idle_timeout_ms in the server-started JSON / server-info.
- shutdown() now destroys all client sockets so the process exits even with an
  open WebSocket.
- Watchdog check interval is configurable (BRAINSTORM_LIFECYCLE_CHECK_MS, default
  60s) so the lifecycle can be tested without minute-long waits.

Adds lifecycle.test.js (configured timeout reported; idle shutdown exits despite
an open WS — teeth-verified; the start-server flag). Wires ws-protocol,
lifecycle, and stop-server suites into npm test.

Closes #1237
Refs #1689
2026-06-11 13:53:06 -07:00
Jesse Vincent
9b00cc298d fix(brainstorm-server): verify PID ownership before stopping
stop-server.sh read server.pid and SIGKILL'd that PID with no checks. After a
reboot or PID wraparound the pid file can point at an unrelated, live process —
which we would then kill.

Verify the PID is actually our server (a running 'node ... server.cjs') before
signalling it. If ownership can't be proven, fail closed: remove the stale pid
file and report {status: stale_pid} without killing anything. Real servers still
stop ({status: stopped}); a missing pid file still reports not_running.

Adds stop-server.test.sh covering: an unrelated reused PID is left alone, a real
server is stopped, and a missing pid file.

Refs #1703
2026-06-11 13:53:06 -07:00
Jesse Vincent
88fe1e7e15 fix(brainstorm-server): ignore macOS resource-fork dotfiles
On macOS (and ExFAT/SMB volumes) the OS writes ._<name>.html sidecar files
holding binary resource-fork metadata. These end with .html, so they passed the
content filter and could be picked as the newest screen — serving binary garbage
to the browser instead of the mockup — or fetched via /files/.

Skip dotfiles (leading '.') at all four sites that list or serve content:
getNewestScreen, the /files/ endpoint, the known-files seed, and the fs.watch
handler. Tests cover serving (/ and /files/) and the watch path (a ._ file must
not trigger a reload).

Refs #950
2026-06-11 13:53:06 -07:00
Drew Ritter
e6c983888f chore(evals): bump submodule to SUP-333 boundary + plumbing scenarios (7f8e80c)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 13:45:59 -07:00
Jesse Vincent
74f85a7709 fix(writing-skills): hang backfire mechanism on the separated prohibition-vs-recipe comparison (NEW-4); control comparison stated as trend 2026-06-11 12:11:37 -07:00
Jesse Vincent
b148b648eb fix(writing-skills): scope empirical claims, honest noise reporting, conditionalize micro-test checklist line
Adversarial review findings 1/3/9: the head-to-head result is now scoped
to its context (dispatch-prompt guidance) with an explicit micro-test-your-
own-case instruction; the nuance-clause result is reported as
consistent->noisy rather than 'measurably dilutes'; the checklist line is
scoped to behavior-shaping guidance and the micro method no longer assumes
raw API access.
2026-06-11 12:11:37 -07:00
Jesse Vincent
3e565ca2ad feat(writing-skills): form-selection table + micro-test wording method
RED battery (35 opus authoring samples against the current skill) showed
authors default to prohibition+rationalization-table for composition-
shaping problems (T1: 5/5), where that form measurably backfires
(prohibition 4.4 vs 3.6 no-guidance control vs 3.0 recipe restatement
errors), and design only full-subagent verification with no wording
micro-tests, no mandatory no-guidance control, no manual inspection of
automated matches, no variance signal (T7: 5/5).

Adds: Match the Form to the Failure (failure-type -> form table, nuance/
exemption rules), scope note on Bulletproofing, Micro-Test Wording
subsection, two checklist lines. Deliberately narrow: T3/T4/T5/T6 RED
samples showed Iron Law / elicit-first behavior already strong.
2026-06-11 12:11:37 -07:00
Drew Ritter
0cb1960068 chore(evals): bump submodule for Claude Haiku target 2026-06-10 16:31:16 -07:00
Jesse Vincent
f55642e0dd Require contributors to disclose authoring environment and target dev
Add a mandatory self-identification disclosure (model, harness, harness
version, all installed plugins) to the PR template and all three issue
templates, and document the requirement in the contributor guidelines.
We weigh contributions differently depending on what produced them:
content reasoned from documentation is held to a different bar than work
grounded in a real session.

Also state explicitly, in both CLAUDE.md and the PR template, that all
PRs must target the dev branch rather than main.
2026-06-08 22:14:34 -07:00
Drew Ritter
ae1eefb7f9 chore(evals): bump submodule to --scenarios filter (ff3ee83)
Adds `run-all --scenarios` for resuming a scenario subset across the Code
Assist rate-limit windows. Follows the agy rate-limit fix (79f9963).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:46:00 -07:00
Drew Ritter
617168aff5 chore(evals): bump submodule to antigravity rate-limit fix (79f9963)
Serialize antigravity against the Gemini Code Assist rate limit
(max_concurrency=1), diagnose 429/RESOURCE_EXHAUSTED honestly instead of as
auth, fail-fast on a latched window, and tolerant preflight OK match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 16:27:35 -07:00
Rahul
d7c260a978 fix(brainstorming): cap websocket frame payloads 2026-06-02 11:24:02 -07:00
Drew Ritter
f3f0789c5c Add shell lint script 2026-06-01 19:48:28 -07:00
Drew Ritter
16a1719988 Tighten Kimi plugin porting coverage 2026-06-01 19:41:58 -07:00
Drew Ritter
c74c22daa7 docs: restore Kimi direct install command 2026-06-01 19:41:58 -07:00
Drew Ritter
773bbf61d6 docs: simplify Kimi README install steps 2026-06-01 19:41:58 -07:00
Drew Ritter
6b76158550 fix: wire Kimi plugin into release metadata 2026-06-01 19:41:58 -07:00
Drew Ritter
7fec40bb55 fix: align Kimi manifest with supported fields 2026-06-01 19:41:58 -07:00
qer
2a8e54735b feat: add Kimi Code plugin manifest 2026-06-01 19:41:58 -07:00
Matt Van Horn
f776394360 feat(subagent-dev): add TDD RED evidence to implementer report format
Add a conditional TDD Evidence field to the implementer report format so controllers can verify RED and GREEN output when TDD was required.

The field asks for the command run, relevant RED/GREEN output, and the expected RED failure reason rather than raw full logs.

Fixes #994.
2026-06-01 16:15:05 -07:00
Drew Ritter
7301c81b4d docs(windows): trim polyglot hook implementation copy 2026-06-01 16:07:01 -07:00
dev_Hakaze
9d3e68a5ad docs(windows): update polyglot hook docs
Rewrite the Windows polyglot hook documentation to match the current run-hook.cmd dispatcher and update the porting guide cross-reference.\n\nFixes #1653.
2026-06-01 15:57:30 -07:00
nestorluiscamachopaz
81c3052416 fix: foreground mode saves node PID and clears OWNER_PID on Windows/MSYS2
Verified on real Windows Git Bash: lifecycle test passed 12/12, manual start/stop released the port, and no brainstorm node processes remained.
2026-06-01 14:26:22 -07:00
nawfal
c879454a0d fix(finishing-a-development-branch): remove gh-specific PR creation instruction
Per obra's guidance on #1609: remove the github-specific instruction rather
than replacing it with a platform-detection table. Agents already know their
forge tooling; the skill only needs to cover the push step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 13:58:22 -07:00
nawfal
ff213eb2cf fix(finishing-a-development-branch): detect remote platform before creating PR/MR
Replaces hardcoded `gh pr create` in Option 2 with a platform-neutral
note: check `git remote get-url origin` first, then use gh (GitHub),
glab (GitLab), or fall back to the compare URL for unknown platforms.

Adds matching Red Flag entry so agents don't skip the detection step.

Fixes #1609

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 13:58:22 -07:00
Jesse Vincent
da00e59958 feat: add Antigravity CLI (agy) support
Antigravity (Google's `agy` CLI) installs the existing Superpowers plugin
directly:

    agy plugin install https://github.com/obra/superpowers

agy imports the bundled skills and runs the plugin's SessionStart hook, so
using-superpowers bootstraps from the first message — verified on agy 1.0.3:
a fresh session given "Let's make a react todo list" auto-triggers the
brainstorming skill instead of writing code. agy discovers skills natively
and, having no Skill tool, loads them by reading SKILL.md with view_file.

No scaffold, installer, or generated context file is needed. This adds only:

- README.md: an Antigravity install section + Quickstart link
- skills/using-superpowers/SKILL.md: reference to the agy tool mapping
- skills/using-superpowers/references/antigravity-tools.md: action->tool
  mapping for agy (view_file, write_to_file, invoke_subagent, manage_task,
  and skill loading via view_file on SKILL.md)
- tests/antigravity/: structural test for the tool mapping, mirroring
  tests/pi/
2026-06-01 11:42:09 -07:00
Jesse Vincent
deceaec78d docs: add 'Porting Superpowers to a New Harness' guide
An evergreen guide for adding support for a new harness (IDE, CLI, or agent
runner). Teaches the invariants — automatic session-start bootstrap, skill
discovery/invocation, tool mapping, the acceptance test — and points at the
closest reference integration shape (shell-hook, in-process plugin,
instructions-file / declared context file) to copy. Covers discovery, build,
local install, tmux-driven verification, distribution, and PR submission, with a
live reference-integration index and a gotchas appendix.

Two non-negotiable rules: (1) never edit skill bodies; (2) everything ships
through the harness's own install mechanism — never edit the user's config. When
a plugin installer strips undeclared files, declare the bootstrap as a recognized
component (a manifest contextFileName-style context file the installer preserves
and the harness loads every session), generated at install time from the live
SKILL.md + tool mapping. Surfaced-skill-description bootstrap is the softer
fallback.

Hardened against real end-to-end ports (Antigravity CLI): shapes can compose; a
fork doesn't inherit its parent's behavior; a hook system != a usable
session-start event; verify @-includes AND context-file preservation with a
marker; web-search the docs and study existing plugins; reverse-engineer
undocumented harnesses; print/headless modes may hang; workspace-trust gates
stall tmux; declared context files survive plugin install while undeclared files
are stripped; skills-path registration is per-harness.
2026-06-01 10:07:38 -07:00
Jesse Vincent
e63e44bedf fix(sync-to-codex-plugin): exclude /.pi/ so the pi extension doesn't leak into the Codex plugin
The .pi/ directory holds the pi-harness extension (.pi/extensions/superpowers.ts),
which is tracked (not git-ignored), so the git-ignored-path exclusion helpers
never caught it. It was also missing from the static EXCLUDES list alongside the
other harness dotdirs (.opencode, .cursor-plugin, .claude-plugin), so a sync
would rsync pi's files into the Codex plugin distribution. Add /.pi/ to EXCLUDES.
2026-05-29 15:05:38 -07:00
Jesse Vincent
8811b0f2d7 Revert "Make visual-companion.md script paths skill-rooted, not plugin-rooted"
This reverts commit e9f5188289.
2026-05-23 17:01:46 -07:00
Jesse Vincent
d48bec6cc3 Revert "Probe per-user Git Bash and Scoop before falling back to PATH on Windows"
This reverts commit a8f0738e3a.
2026-05-23 17:00:15 -07:00
Jesse Vincent
a8f0738e3a Probe per-user Git Bash and Scoop before falling back to PATH on Windows
Stock Windows 10/11 ships C:\Windows\System32\bash.exe (the WSL
launcher) as the first match for `where bash`. WSL's bash cannot
execute Windows-style script paths, so when Git Bash is installed
outside the two standard system locations -- specifically the
per-user "Only for me" Git for Windows installer
(%LOCALAPPDATA%\Programs\Git) or a Scoop install
(%USERPROFILE%\scoop\apps\git\current\usr\bin) -- run-hook.cmd
silently fails: WSL prints "Windows Subsystem for Linux must be
updated", the script returns 0, and Superpowers' SessionStart
bootstrap is never injected. From the user's perspective skills
auto-trigger inconsistently or not at all, with no surfaced error.

Add explicit probes for both locations between the existing system-
wide Git for Windows checks and the `where bash` fallback. Also add
a comment to the fallback documenting the WSL-launcher trap so future
maintainers understand why the explicit probes must come first.

Verified on a Windows 11 VM (dockur/windows 11, Git Bash 2.x, Node
22):
- System Git present: existing probe still matches (no regression)
- System Git absent, per-user Git present via junction: new probe
  matches, hook produces valid 6422-byte JSON, exit 0
- All Git probes absent: confirmed WSL trap fires
  ("Windows Subsystem for Linux must be updated") and the hook exits 0
  silently, demonstrating the original bug

Existing tests/hooks/test-session-start.sh still passes on macOS (7/7).

Reported by @ytchenak in #1607.

Co-authored-by: ytchenak <ytchenak@users.noreply.github.com>
Closes #1607.
2026-05-23 16:58:56 -07:00
Jesse Vincent
f36bad5b78 Pipe SessionStart hook printf through cat to absorb EPIPE on Windows
On Windows + Git Bash, the SessionStart hook prints a confusing
diagnostic at every startup ("printf: write error: Permission denied")
when Claude Code closes the hook's stdout pipe before the printf has
finished writing. The hook still runs to completion and context still
gets injected, but the diagnostic surfaces every session because
Git Bash's printf reports EPIPE as "Permission denied" (not "Broken
pipe" like Linux) and our `set -euo pipefail` lets that error escape.

Piping each printf through `cat` makes the external cat process the
recipient of any SIGPIPE / EPIPE. cat's failure does not propagate to
the parent bash under pipefail because cat is the last command in the
pipeline and exits cleanly when the pipe stays open long enough to
hold the data. On macOS/Linux the cat passthrough is transparent (no
behavior change, no measurable cost).

Verified:
- Existing tests/hooks/test-session-start.sh: 7/7 pass on macOS
- Manual run on Windows 11 + Git Bash 5.2 + Node 22 produces valid JSON,
  clean stderr, and exit 0
- JSON output is byte-identical to the unpatched hook

Reported by @silvertakana in #1612, attribution preserved in the
Co-authored-by trailer below — this is the same fix shape the original
PR proposed.

Co-authored-by: silvertakana <silvertakana@users.noreply.github.com>
Closes #1612.
2026-05-23 16:55:46 -07:00
Nick Galatis
21ad401e90 fix(systematic-debugging): defuse Claude Code ultrathink keyword scanner trigger (#1558)
The "Signals You're Doing It Wrong" bullet in systematic-debugging/SKILL.md
contains the literal token Claude Code's runtime scans for in tool result
bodies. Every Skill-tool invocation of this skill caused the harness to
inject a spurious system-reminder claiming the user requested deeper
reasoning, silently bumping every session into extended thinking.

Replace the bullet's spelling so the contiguous letter sequence the scanner
matches is broken with a hyphen. The signal text remains recognizable to
the agent and the documented action ("Question fundamentals, not just
symptoms") is unchanged.

Fixes obra/superpowers#1283
2026-05-23 16:51:00 -07:00
Jesse Vincent
e9f5188289 Make visual-companion.md script paths skill-rooted, not plugin-rooted
Issue #1134: agents reading visual-companion.md see bare commands like
`scripts/start-server.sh`, correctly identify the plugin install
directory, then look for `<plugin>/scripts/start-server.sh` instead of
`<plugin>/skills/brainstorming/scripts/start-server.sh`. The file
doesn't exist at the plugin-rooted path, so the agent concludes the
visual companion isn't available and falls back to text-only
brainstorming.

Multiple independent reproductions in the issue thread, plus one user's
agent self-reported: "I assumed the scripts folder was in the root
directory of the plugin, it didn't realize it could have been talking
about the skill folder itself."

Change all `scripts/<file>` references in visual-companion.md to
`skills/brainstorming/scripts/<file>`. Agents that correctly identify
the plugin root will now join to the right path.

Closes #1134.
2026-05-23 16:42:13 -07:00
Jesse Vincent
eef50b96f0 Align windows-lifecycle test with current brainstorm server layout
The test had drifted behind three server implementation changes and no
longer ran against the actual server:

- Server entrypoint renamed from server.js to server.cjs; the test still
  invoked node on server.js and failed with MODULE_NOT_FOUND.
- Server state moved to a state/ subdirectory (state/server-info,
  state/server.pid); the test still waited on .server-info and wrote
  .server.pid at the session root.
- Owner-PID startup validation now keeps the server running when the
  owner PID is dead at startup: it logs owner-pid-invalid, disables
  owner monitoring, and falls back to the idle timeout. The test still
  expected the server to self-terminate within 60s of a dead-at-startup
  owner.

Update file/path references to match the current server, and rewrite
the dead-at-startup test to assert the current behavior: server
survives, log contains owner-pid-invalid, log does not contain a
spurious "owner process exited" line.

Verified locally: 9 passed, 0 failed, 3 skipped (Windows-only).
2026-05-23 16:36:45 -07:00
Jesse Vincent
e1d3f71e0d Convert curly to square brackets in code-reviewer.md placeholders
Matches the style used by the spec-reviewer-prompt.md and
code-quality-reviewer-prompt.md call sites, which already use square
brackets ([VAR] or [VAR — description]). No semantic change — these
placeholders are filled in by the controller; nothing programmatic
substitutes them.
2026-05-23 16:14:24 -07:00
Jesse Vincent
b2212dc913 Scope spec reviewer to task diff and make reviewers read-only
Two problems with the SDD reviewer prompts on dev:

- spec-reviewer-prompt.md never received a git range, so the
  general-purpose subagent had to crawl the entire codebase to find what
  changed. Reporter measured 20-33 minute spec reviews on simple tasks
  (#1538).
- Neither reviewer prompt told the subagent that review is read-only.
  A spec reviewer running `git checkout <parent-sha>` for historical
  comparison silently detached HEAD on the controller's branch, then
  subsequent task commits accumulated on the detached HEAD and were
  effectively orphaned (#1543, reproduced independently in #1543's
  thread).

Add a Git Range to Review section to spec-reviewer-prompt.md that
mirrors the one code-reviewer.md already has, plus a Read-Only Review
section in both reviewer prompt templates stating the principle: do
not mutate the working tree, the index, HEAD, or branch state. Allow
inspecting other revisions via a separate temporary worktree, so the
read-only rule does not block legitimate historical comparison.

Closes #1538.
Closes #1543.
2026-05-23 16:14:05 -07:00
Jesse Vincent
180f009090 @mhat reported that his claude got confused about 'debugging' being named as a skill in the bootstrap 2026-05-21 17:23:25 -04:00
Drew Ritter
8c1f7c5dae Bump superpowers-evals submodule 2026-05-14 16:32:24 -07:00
Drew Ritter
201f945838 [codex] support native Codex plugin hooks (#1540)
* docs: specify Codex native hooks parity

* docs: refine Codex hooks spec after review

* docs: record Codex hook contract spike

* docs: plan Codex native hooks implementation

* feat: support Codex native plugin hooks

* test: add Codex native hook drill coverage

* Simplify Codex hook entrypoint
2026-05-14 15:59:38 -07:00
Drew Ritter
49bf5ad6dc Align Pi mapping with action vocabulary 2026-05-13 17:58:46 -07:00
Drew Ritter
4bd0973879 Bump evals submodule for Pi backend 2026-05-13 17:58:46 -07:00
Jesse Vincent
452f1ed40b chore: keep pi extension under .pi 2026-05-13 17:58:46 -07:00
Jesse Vincent
cafbc5a4bd feat: add pi superpowers package extension 2026-05-13 17:58:46 -07:00
Jesse Vincent
da35948daf docs: plan pi extension and evals work 2026-05-13 17:58:46 -07:00
Drew Ritter
d4d99117f2 Tighten cross-platform tool references 2026-05-13 17:46:28 -07:00
Jesse Vincent
01034bcf8f Phase E: action-language tool vocabulary
Replace Claude-Code-specific tool names in skill prose, prompt
templates, and OpenCode-facing docs with action-language descriptions
that resolve to each runtime's native tool via the per-platform refs.

Changes by category:

- Prose mentions ("Use TodoWrite to track...", "Use Task tool with
  general-purpose type") → action language ("Track each item as a
  todo", "Dispatch a general-purpose subagent")

- Prompt template headers (6 files): "Task tool (general-purpose):"
  → "Subagent (general-purpose):" — preserves the type information
  without naming Claude Code's specific dispatch tool

- DOT flowchart node labels: "Invoke Skill tool" → "Invoke the
  skill"; "Create TodoWrite todo per item" → "Create a todo per
  item"

- OpenCode INSTALL.md and docs/README.opencode.md: replace the old
  "TodoWrite → todowrite, Task → @mention" mapping (which both
  taught a vocabulary skills no longer use AND was wrong about
  @mention being a real OpenCode syntax) with an action-language
  mapping verified against the installed OpenCode CLI's tool
  inventory.

The platform-tools refs landed in Phase B already document each
runtime's resolution; skills now speak in the actions those refs
map. Tool names that genuinely belong only in the per-platform
dispatch section ("In Claude Code: Use the `Skill` tool") and the
Claude-Code-specific Bash run_in_background flag note in
visual-companion remain — those are intentional carve-outs.
2026-05-13 17:46:28 -07:00
Jesse Vincent
b87a5e4721 Phase D: cross-runtime tweaks (visual-companion, executing-plans, test)
Misc platform/runtime statements and adjacencies that don't fit the
prose, config-ref, README-ordering, or tool-vocabulary buckets:

- visual-companion frame template: rename CSS/HTML id #claude-content
  → #frame-content. The id is purely styling — nothing external
  references it. The brainstorm-server test that asserted the old
  string is updated in lockstep.

- visual-companion launch instructions: add a Copilot CLI section
  alongside Claude Code, Codex, and Gemini CLI; combine the Claude
  Code (macOS / Linux) and (Windows) sections so heading style
  matches the other (non-OS-qualified) platforms.

- visual-companion: "Use Write tool" → "Use your file-creation tool"
  for the cat/heredoc warning. The prohibition is what's load-
  bearing, not the tool name.

- executing-plans/SKILL.md: list all subagent-capable runtimes
  (Claude Code, Codex CLI, Codex App, Copilot CLI, Gemini CLI) and
  point at the per-platform tool refs as the source of truth.

- executing-plans/SKILL.md: relative path "using-superpowers/
  references/" → "../using-superpowers/references/" to resolve
  correctly from the executing-plans/ directory.

No bundled spec doc here — Phase D was scope-extension work that
took place across rounds, with no standalone spec authored.
2026-05-13 17:46:28 -07:00
Jesse Vincent
e47d6f4f85 Phase C: alphabetize README platform listings + spec
Quickstart link list and the per-harness install sub-sections both
reorder to strict alphabetical:

  Claude Code, Codex App, Codex CLI, Cursor, Factory Droid,
  Gemini CLI, GitHub Copilot CLI, OpenCode

Three blocks moved (Codex App swaps with Codex CLI; Cursor moves up
two slots; GitHub Copilot CLI moves up one). Claude Code stays first
by alphabetical chance.

Each install sub-section's content is byte-identical pre/post —
only the positions change. Quickstart anchors verified against the
new heading order.
2026-05-13 17:46:28 -07:00
Jesse Vincent
5c0402736e Phase B: config-file refs + per-platform tool refs + spec
Two structural changes:

1. Generalize CLAUDE.md-specific guidance:
   - "Project-specific conventions (put in CLAUDE.md)" → "(put in
     your instructions file)" in writing-skills/SKILL.md
   - "(explicit CLAUDE.md violation)" → "(explicit instruction-file
     violation)" in receiving-code-review/SKILL.md
   - The instruction-priority list in using-superpowers/SKILL.md
     stays inclusive (CLAUDE.md, GEMINI.md, AGENTS.md) — that's
     load-bearing, not a substitution opportunity.

2. Per-platform tool reference files at skills/using-superpowers/
   references/{claude-code,codex,copilot,gemini}-tools.md. Each ref
   documents:
   - The runtime's preferred instructions file (CLAUDE.md, AGENTS.md,
     GEMINI.md, etc.) and how it loads
   - The runtime's personal-skills directory + cross-runtime
     ~/.agents/skills/ path where applicable
   - Action-language → tool-name mapping table

Tool names and table content reflect the source-verified state from
direct inspection of openai/codex, google-gemini/gemini-cli,
sst/opencode, and the installed @github/copilot package. Filenames
and behaviors are sourced from each runtime's official docs.

Files in this commit also pick up later-phase changes that
accumulated on the same files (using-superpowers/SKILL.md "How to
Access Skills" overhaul, action-language flowchart, refs' final
table content). The bundled spec records original scope.
2026-05-13 17:46:28 -07:00
Jesse Vincent
d0e413b591 Phase A: agent-neutral prose + CSO → SDO + spec
Replace generic third-person "Claude" with "agents" / "your agent"
forms across active skill prose, the README intro, and the vendored
anthropic-best-practices.md reference. Carve-outs preserved:
historical attribution paths, the "Variant C: Claude.AI Emphatic
Style" example label, model identifiers (Haiku/Sonnet/Opus), and the
"In Claude Code:" per-platform skill-dispatch list.

Coined-term rename: "Claude Search Optimization (CSO)" → "Skill
Discovery Optimization (SDO)" in writing-skills/SKILL.md.

Files in this commit also pick up later-phase changes that
accumulated on the same files (dispatching-parallel-agents code-
example transformation, writing-skills numbering and path fixes).
The bundled spec at docs/superpowers/specs/ records the original
scope and the carve-outs.

README.md gets only its prose change here; the alphabetization
lands in Phase C's commit.
2026-05-13 17:46:28 -07:00
Drew Ritter
d25618db58 Move eval harness to submodule (#1541) 2026-05-13 12:25:41 -07:00
Drew Ritter
3d6dc90c6d fix(tdd): link testing anti-patterns reference (#1532)
Fixes #1529.
2026-05-12 17:22:42 -07:00
Drew Ritter
a152bb3932 [codex] replace Circle K signal with generic review guidance (#1531)
* Remove Circle K signal from review skill

* Add generic review hesitation guidance

* Use Jesse wording for review hesitation guidance
2026-05-12 17:22:19 -07:00
Drew Ritter
3dfb376268 fix: remove global worktree path fallback (#1476) 2026-05-12 10:24:45 -07:00
Drew Ritter
491df7360c fix(using-git-worktrees): repair skipped Step 2 numbering (#1522) 2026-05-11 17:50:01 -07:00
fuleinist
9088f563e7 fix: remove stale Cursor plugin refs 2026-05-11 17:04:35 -07:00
Stable Genius
d4cf61b4c8 fix(writing-skills): use markdown link for testing methodology reference 2026-05-11 16:51:00 -07:00
Drew Ritter
7f02ccd91b evals: use pre-commit hooks 2026-05-06 15:47:39 -07:00
Drew Ritter
35e42a16ce evals: add Gemini 2.5 Flash backend 2026-05-06 15:47:39 -07:00
Drew Ritter
58082d04f8 evals: drop drill source marker 2026-05-06 15:47:39 -07:00
Drew Ritter
3dc0ea6876 evals: remove unreleased wave scenarios 2026-05-06 15:47:39 -07:00
Jesse Vincent
0bf37499b4 Address adversarial review findings
- evals/README.md, evals/CLAUDE.md: fix uv install command from
  'uv sync --dev' to 'uv sync --extra dev'. Drill's pyproject.toml
  uses [project.optional-dependencies], so --dev is a no-op for
  pytest/ruff/ty; --extra dev is the correct invocation.
- tests/claude-code/run-skill-tests.sh: drop test-requesting-code-review.sh
  from integration_tests array (file deleted earlier in this branch).
- tests/claude-code/README.md: replace test-requesting-code-review.sh
  section with test-worktree-native-preference.sh (the worktree test
  is kept; the code-review test was lifted into drill).
- docs/testing.md, CLAUDE.md: remove "Copilot CLI" from the harness
  list. evals/backends/ has claude*, codex, gemini configs but no
  copilot.yaml, so the claim was unsupported.

Adversarial review credit: reviewer #2 found four legitimate issues
(uv-sync, run-skill-tests stale ref, README stale ref via #1, and
Copilot CLI fabrication); reviewer #1 found two distinct issues
(run-skill-tests + tests/claude-code/README.md). Reviewer #2 wins
this round.
2026-05-06 15:47:39 -07:00
Jesse Vincent
f7c5312265 docs: introduce evals/ as the canonical skill-behavior eval harness
- docs/testing.md split into Plugin tests + Skill behavior evals.
  Plugin tests section enumerates the bash tests that survive
  (kept by drill-coverage analysis or as describe-skill tests).
- CLAUDE.md adds Eval harness section pointing at evals/.
- README.md Contributing section mentions evals/ alongside tests/.
- .gitignore adds evals/{results,.venv,.env} as belt-and-suspenders
  (evals/.gitignore covers these locally; root-level entries help
  tooling that does not recurse into nested ignore files).
2026-05-06 15:47:39 -07:00
Jesse Vincent
f5175fb31a docs: annotate dated artifacts referencing lifted bash tests
- RELEASE-NOTES.md: note that test-requesting-code-review.sh and
  test-document-review-system.sh were lifted into drill scenarios
  on 2026-05-06; references are preserved as dated artifacts.
- docs/superpowers/plans/2026-03-23-codex-app-compatibility.md:
  note that tests/skill-triggering/ was lifted into drill scenarios
  on 2026-05-06; the run-all.sh reference is a dated artifact.

Subagent second-pass scrub confirmed no other active references in
the tree (excluding evals/ and the spec/plan for this work itself).
2026-05-06 15:47:39 -07:00
Jesse Vincent
45c7dc2cce tests: annotate three kept bash tests with drill coverage notes
- test-worktree-native-preference.sh: drill covers PRESSURE phase only;
  RED + GREEN baselines have no drill counterpart and are kept so
  the RED-GREEN-REFACTOR validation remains rerunnable end-to-end.
- test-subagent-driven-development-integration.sh: drill covers the
  YAGNI subset (forbidden exports + reviewer-as-gate). Bash adds
  >=3 commits, >=2 subagent dispatches, TodoWrite usage, test file
  existence check, and token-budget telemetry. Kept until drill
  scenario covers those or they are retired.
- test-subagent-driven-development.sh: tests agent's ability to
  *describe* SDD (string matches against expected keywords). Drill
  scenarios test behavior, not description-recall. Kept by design.

Subagent verification recorded in commit messages of subsequent
deletions; gap analyses driving these annotations are also in the
verification subagent reports for the gating sweep.
2026-05-06 15:47:39 -07:00
Jesse Vincent
39d29a6c28 tests: remove test-requesting-code-review.sh (covered by drill code-review-catches-planted-bugs)
Subagent verification: every bash assertion (skill invocation,
subagent dispatch, SQL injection flagged, credential handling
flagged, no merge approval) maps to drill verify checks. Drill is
stricter: bundles severity (Critical/Important) into the same
criteria as the finding itself (bash split severity into a separate
test). Setup parity covered (src/db.js with string concat + identity
hash, two commits).

The drill scenario header explicitly says it is the
"cross-harness, semantically-judged replacement for the bash test."
2026-05-06 15:47:39 -07:00
Jesse Vincent
f1d2005de3 tests: remove test-document-review-system.sh (covered by drill spec-reviewer-catches-planted-flaws)
Subagent verification: every bash assertion (TODO in Requirements
section flagged, "specified later" deferral flagged, Issues section
present, did-not-approve verdict) maps to drill verify.criteria
entries. Setup parity covered by setup.assertions (test-feature-design.md
exists with TODO + 'specified later' content). Drill is stricter:
asserts tool-called Agent (subagent dispatch) which the bash test
did not check.
2026-05-06 15:47:39 -07:00
Jesse Vincent
c0a65f1b4d tests: remove subagent-driven-dev fixtures (covered by drill sdd-go-fractals + sdd-svelte-todo)
The bash test had ZERO output assertions — it just ran claude -p
and printed token usage. Drill's scenarios are strictly more
rigorous:

go-fractals: skill-called SDD + tool-called Agent + go test ./...
passes + cmd/fractals/main.go exists + >=4 commits + LLM criteria
verifying real SDD workflow.

svelte-todo: skill-called SDD + tool-called Agent + npm test passes
+ playwright e2e passes + package.json + svelte.config.js or
vite.config.ts + >=4 commits + LLM criteria.

design.md and plan.md are byte-identical between bash fixtures and
drill fixtures (evals/fixtures/sdd-{go-fractals,svelte-todo}/).
Drill's setup helper (scaffold_sdd_*) forces git init -b main
(stricter than bash's reliance on init.defaultBranch). The
.claude/settings.local.json from bash scaffold.sh is unnecessary
for drill since permissions are managed via backend YAML.

Subagent verification: SAFE TO DELETE for both.
2026-05-06 15:47:39 -07:00
Jesse Vincent
f10cddac0d tests: remove run-claude-describes-sdd.sh (covered by drill mid-conversation-skill-invocation)
Subagent verification: every bash assertion (Skill tool invoked +
specific skill name 'subagent-driven-development' loaded after the
agent describes it conversationally in turn 1) maps to the drill
scenario's skill-called assertion + criteria paragraph requiring
the skill to fire in direct response to the second user message.
Drill additionally asserts tool-called Agent (subagent dispatch)
which is stricter than the bash test.

Other runners in tests/explicit-skill-requests/ (haiku, multiturn,
extended-multiturn) and their prompt files are preserved — they
have no drill coverage and exercise different behaviors.
2026-05-06 15:47:39 -07:00
Jesse Vincent
371f41596b tests: remove skill-triggering bash prompts (covered by drill triggering-* scenarios)
Subagent verification confirmed each prompt's intent matches its
corresponding drill scenario's turns[].intent verbatim, and each
scenario has both a deterministic skill-called assertion and a
semantic LLM criterion confirming the matching skill was loaded
(actually a stronger check than the bash test, which only confirms
the skill fires anywhere in the stream).

All 6 prompts deleted. The runner had no remaining prompts to drive,
so run-test.sh and run-all.sh deleted as well.
2026-05-06 15:47:39 -07:00
Jesse Vincent
6f0adebe96 evals: drop SUPERPOWERS_ROOT setup step from README/CLAUDE
The cli.py helper now defaults the env var. Mention as override only.
2026-05-06 15:47:39 -07:00
Jesse Vincent
fd5b53cb85 evals: drop SUPERPOWERS_ROOT from codex/gemini required_env
These backends only read SUPERPOWERS_ROOT via engine.py/setup.py's
os.environ access, which the new cli.py default helper supplies
automatically. claude*.yaml keep SUPERPOWERS_ROOT in required_env
because they interpolate ${SUPERPOWERS_ROOT} into --plugin-dir args.
2026-05-06 15:47:39 -07:00
Jesse Vincent
be0357f98a evals: default SUPERPOWERS_ROOT to parent of evals/ if unset
Adds _set_superpowers_root_default() to drill/cli.py, called at
module import after load_dotenv(). PROJECT_ROOT resolves to evals/
post-lift; its parent is the superpowers repo root, which is the
correct value for SUPERPOWERS_ROOT.

Existing env values are respected as overrides via os.environ.setdefault.

Tests:
- helper sets default when var is unset
- helper does not override when var is already set
2026-05-06 15:47:39 -07:00
Jesse Vincent
3b412a3836 Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
2026-05-06 15:47:39 -07:00
Jesse Vincent
2e46e9590d Plan: lift drill into superpowers as evals/
15-task implementation plan derived from the design spec at
docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md.

Each task is bite-sized (2-5 min steps) with exact commands, exact
file paths, and exact code where required. Subagent verification
gates per the spec are written out as concrete prompt templates.

Self-review:
- Spec coverage: every spec section maps to a task
- Placeholder scan: no TBD/TODO/placeholder/fill-in-later language
- Type consistency: helper named _set_superpowers_root_default
  consistently; drill SHA recorded in evals/.drill-source-sha
  consistently
2026-05-06 15:47:39 -07:00
Jesse Vincent
58f821314d Spec: address adversarial review findings
Two parallel reviewers raised legitimate issues against the lift-drill-
into-evals spec. Updates:

- Coverage map for tests/explicit-skill-requests/ corrected: 6 run-*.sh
  scripts + prompts, not "2 scenarios cover all". Several scripts
  (Haiku, multi-turn, please-use-brainstorming, use-systematic-debugging)
  have no drill counterpart and stay.
- tests/claude-code/test-subagent-driven-development.sh marked as
  meta/documentation test (asks agent to describe SDD); no drill
  scenario covers description tests; defaults to keep.
- Path-defaults section now shows verified evidence: PROJECT_ROOT
  resolves to evals/ post-move; only claude*.yaml substitute
  ${SUPERPOWERS_ROOT} in args (codex/gemini use it via os.environ
  in pre-run hooks); helper invocation order specified (after
  load_dotenv, before click definitions).
- Step 2 copy uses explicit rsync excludes (.git, .venv, results,
  .env, __pycache__, *.egg-info, .private-journal); checksum-level
  verification rather than file-count.
- Drill SHA recorded at copy time in commit message and
  evals/.drill-source-sha for divergence detection.
- evals/tests/ pytest suite added to verification protocol.
- Reference scrub list expanded: RELEASE-NOTES.md,
  docs/superpowers/plans/, .codex-plugin/ (corrected from .codex/),
  lefthook.yml. Excluded dirs called out (node_modules/, .venv/,
  evals/).
- Historical plan docs / RELEASE-NOTES handling: annotate, don't
  rewrite.
- evals/lefthook.yml move documented (drill ships its own;
  contributors run cd evals && lefthook run pre-commit manually).
- PR description checklist includes archival action item for
  obra/drill post-merge.

False finding rejected: svelte-todo fixture is complete on disk
(design.md + plan.md + scaffold.sh present); reviewer #1 #3 dropped.
2026-05-06 15:47:39 -07:00
Jesse Vincent
81472cc9e6 Spec: lift drill into superpowers as evals/
Records scope, branching, architecture, deletion gate, verification
protocol, path/config edits, migration ordering, and post-implementation
verification. Frames CI integration, scenario co-location, and Python
package rename as deferred work.

Per-file deletion of bash tests under superpowers/tests/ is gated by a
subagent that compares each bash assertion to its drill scenario's
verify block. Default keeps the bash test if any assertion is unmatched.

Branching: independent off dev (f/evals-lift), not stacked on
f/cross-platform.
2026-05-06 15:47:39 -07:00
robotsnh
b4363df1b9 docs: turned the dash in "- Jesse" into an escape sequence (#1474)
Replaced the bullet point next to "Jesse" in the sponsorship section of the `README` into a dash. This is needed so the `README` renders properly on markdown viewers.
2026-05-06 11:22:19 -07:00
19 changed files with 71 additions and 650 deletions

View File

@@ -9,7 +9,7 @@
{
"name": "superpowers",
"description": "Core skills library for Claude Code: TDD, debugging, collaboration patterns, and proven techniques",
"version": "6.0.0",
"version": "5.1.0",
"source": "./",
"author": {
"name": "Jesse Vincent",

View File

@@ -1,7 +1,7 @@
{
"name": "superpowers",
"description": "Core skills library for Claude Code: TDD, debugging, collaboration patterns, and proven techniques",
"version": "6.0.0",
"version": "5.1.0",
"author": {
"name": "Jesse Vincent",
"email": "jesse@fsck.com"

View File

@@ -1,6 +1,6 @@
{
"name": "superpowers",
"version": "6.0.0",
"version": "5.1.0",
"description": "An agentic skills framework & software development methodology that works: planning, TDD, debugging, and collaboration workflows.",
"author": {
"name": "Jesse Vincent",

View File

@@ -2,7 +2,7 @@
"name": "superpowers",
"displayName": "Superpowers",
"description": "Core skills library: TDD, debugging, collaboration patterns, and proven techniques",
"version": "6.0.0",
"version": "5.1.0",
"author": {
"name": "Jesse Vincent",
"email": "jesse@fsck.com"

View File

@@ -1,6 +1,6 @@
{
"name": "superpowers",
"version": "6.0.0",
"version": "5.1.0",
"description": "An agentic skills framework and software development methodology.",
"author": {
"name": "Jesse Vincent",

View File

@@ -2,13 +2,6 @@
Superpowers is a complete software development methodology for your coding agents, built on top of a set of composable skills and some initial instructions that make sure your agent uses them.
## We're Hiring!
We're hiring someone to help out full time with Superpowers community and code work.
You can read about the job at https://primeradiant.com/jobs/superpowers-community-engineer/
If this sounds like someone you know, definitely send them our way.
## Quickstart
Give your agent Superpowers: [Claude Code](#claude-code), [Antigravity](#antigravity), [Codex App](#codex-app), [Codex CLI](#codex-cli), [Cursor](#cursor), [Factory Droid](#factory-droid), [Gemini CLI](#gemini-cli), [GitHub Copilot CLI](#github-copilot-cli), [Kimi Code](#kimi-code), [OpenCode](#opencode), [Pi](#pi).
@@ -25,9 +18,15 @@ Next up, once you say "go", it launches a *subagent-driven-development* process,
There's a bunch more to it, but that's the core of the system. And because the skills trigger automatically, you don't need to do anything special. Your coding agent just has Superpowers.
## Commercial Services
If you're using Superpowers in enterprise and could benefit from commercial support, additional tooling, or managed spending, please don't hesitate to drop us a line at sales@primeradiant.com.
## Sponsorship
If Superpowers has helped you do stuff that makes money and you are so inclined, I'd greatly appreciate it if you'd consider [sponsoring my opensource work](https://github.com/sponsors/obra).
Thanks!
\- Jesse
## Installation
@@ -274,10 +273,6 @@ Superpowers updates are somewhat coding-agent dependent, but are often automatic
MIT License - see LICENSE file for details
## Visual companion telemetry
Because skills and plugins don't provide any feedback to creators, we have no idea how many of you are using Superpowers. By default, the Prime Radiant logo on brainstorming's optional visual companion feature is loaded from our website. It includes the version of Superpowers in use. It does not include any details about your project, prompt, or coding agent. We don't see your clicks or anything about what you're building. This helps us have a rough idea of how many folks are using Superpowers and which version of Superpowers they're using. It's 100% optional. To disable this, set the environment variable `SUPERPOWERS_DISABLE_TELEMETRY` to any true value. Superpowers also honors Claude Code's `DISABLE_TELEMETRY` and `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC` opt-outs.
## Community
Superpowers is built by [Jesse Vincent](https://blog.fsck.com) and the rest of the folks at [Prime Radiant](https://primeradiant.com).

View File

@@ -1,103 +1,5 @@
# Superpowers Release Notes
## v6.0.0 (2026-06-16)
Superpowers 6.0 is a big release. The headline is a rewrite of how `subagent-driven-development` reviews each task — cheaper, stricter, and harder to game.
While these numbers won't hold on every harness and for every workload, in our evals, Claude Code and Codex produce similar high-quality results roughly twice as fast and while spending almost 50% fewer tokens.
It also adds three new harnesses (Kimi Code, Pi, and Antigravity), gives the brainstorming visual companion a better security model, and rewrites a number of skills' tool calls to be significantly more vendor-neutral.
### Visible Changes
- **The two per-task reviewer prompts became one.** `spec-reviewer-prompt.md` and `code-quality-reviewer-prompt.md` are gone, replaced by a single `task-reviewer-prompt.md`. If you dispatch the old files directly, switch to the new one.
- **The legacy global worktree directory is gone.** `using-git-worktrees` and `finishing-a-development-branch` no longer use `~/.config/superpowers/worktrees/`. Worktrees now land in the project — an existing `.worktrees/` or `worktrees/` if you have one, otherwise a fresh `.worktrees/` — unless you say otherwise.
### New Harness Support
Superpowers now runs on three more harnesses. Each ships its own bootstrap, a tool-mapping reference, and tests, and each gets its own install section in the README.
- **Kimi Code** — a plugin manifest, install docs, and manifest tests; install from Kimi's marketplace or straight from the repo. (initial manifest by @qer)
- **Pi** — a session-start extension that registers the skills and injects the `using-superpowers` bootstrap. Pi has native skills, so it needs no compatibility shim.
- **Antigravity (`agy`)** — installs the plugin directly and bootstraps from the first message; verified end-to-end against the standard "make a react todo list" acceptance test.
### Subagent-Driven Development
A long run of cost-and-quality experiments on real projects reshaped how the controller reviews each task. The old flow ran two reviewers per task and leaned on the controller's judgment for model choice and severity, and both turned out to be expensive and easy to game. The new flow runs one reviewer per task, hands work off as files instead of pasted text, and takes several judgment calls away from the controller.
- **One reviewer per task, two verdicts.** A single `task-reviewer-prompt.md` reads the task's diff once and returns both a spec-compliance verdict and a quality verdict, so one fix pass clears both. A new "can't verify from the diff" verdict flags requirements that live in untouched code, for the controller to check itself. (#1538, #1543)
- **One broad review at the end.** The run finishes with a single whole-branch review on the most capable model, instead of re-reviewing everything task by task.
- **Plans get a pre-flight read.** Before the first task, the controller checks the plan for internal conflicts — and for anything the plan asks for that a reviewer would flag as a defect — and raises it all at once, rather than stumbling into it mid-run.
- **Diffs and task text move as files.** A pasted diff parks itself permanently in the most expensive context, and a reviewer without one rebuilds it by hand — the single biggest reviewer cost. Two new scripts, `task-brief` and `review-package`, write the task text and the review diff to files for the subagent to read.
- **Every dispatch states its model.** Left to choose, controllers stopped naming a model at all — and an unnamed model quietly inherits the session's most expensive one, so one run put all 26 of its reviewers on the top tier. The templates now require a model, with guidance that reaches for cheaper tiers when the work allows.
- **The controller can't tell a reviewer what to ignore.** Real runs caught controllers coaching reviewers to skip a finding or call it "Minor at most," and the flaw shipped. Suppressing findings and pre-rating severity are now banned outright, and a defect the plan itself mandates gets reported for you to decide on rather than waved through.
- **Reviewers are read-only and skeptical of rationales.** Review no longer touches the working tree or branch — a reviewer running `git checkout` had been orphaning later commits — and an implementer's "I left this unabstracted on purpose" no longer talks a reviewer out of a real finding.
- **Stronger evidence and reporting.** Reviewers back each answer with a file and line, the implementer's report moves to a file and carries red/green evidence when TDD applies, and a progress ledger lets a controller that loses its context resume instead of redoing finished work. (#994)
### Writing Plans
Plans now carry the structure the controller and reviewers used to re-derive on every dispatch.
- **A Global Constraints block** lists the rules that bind every task — version floors, dependency limits, naming and copy, exact values — copied in verbatim, so they actually reach the implementers and reviewers downstream.
- **A per-task Interfaces block** names exactly what each task consumes and produces, so an implementer who sees only its own task still knows its neighbors' contracts.
- **Right-sizing guidance** keeps a task at the size that earns its own test cycle and a reviewer's pass, folding setup, config, and docs into the task that needs them. In testing, a plan written this way needed one round of fixes where the control needed two to four — and the control shipped a real bug.
### Brainstorming Visual Companion
The visual companion is a small web server the agent opens alongside the conversation. It had no authentication at all, so on a shared or remote machine anyone who could reach the port could read your brainstorm — or inject events the agent treats as your input. This release gives it a real security model and makes it survive restarts and dropped connections.
- **A per-session key now guards everything.** The agent's URL carries a one-time key, the browser tucks it into a tab-scoped cookie, and every request and WebSocket connection has to present it. This closes the door to stray local tabs and routable remote hosts alike, including the DNS-rebinding case an origin allowlist can't catch. (Closes #1014)
- **The file server stays in its sandbox.** It refuses symlinks, dotfiles, and any path that climbs out of the content directory, ignores macOS resource-fork files, and sends the usual no-store and deny-framing headers. Files that hold the session key are written owner-only.
- **The companion is offered only when it helps.** The skill raises it the first time a question would read better shown than told, as its own message, and lets a decline stand. Accepting opens your browser to the first screen. (Closes #755)
- **It survives restarts and flaky connections.** Given a project directory, the server keeps the same port and key across restarts, so an open tab simply reconnects. The page reconnects on its own, shows a live status pill, and raises a "paused" overlay while the server is down.
- **Longer idle life, safer shutdown.** The idle timeout went from 30 minutes to 4 hours, and `stop-server.sh` now confirms it owns the right process before signaling, so it never kills an unrelated `node` after a reboot. (#1703)
- **Windows launch hardening** — consolidated shell detection, and Windows now relies on the idle timeout for shutdown, since Node can't track POSIX process ownership across MSYS2.
### Existing Harness Updates
- **Codex** now bootstraps through its own SessionStart hook rather than shared wiring, and the Codex App gained an install section and fuller tool docs (web search, `AGENTS.md`, personal skills). (#1540)
- **OpenCode** got an action-based tool mapping across its plugin, install doc, and README, plus a bootstrap-caching test.
- **Cursor**'s manifest dropped its `agents` and `commands` entries, since those directories no longer exist.
### One Set of Skills, Every Harness
The skills used to speak Claude Code's dialect — "use the Task tool," "put it in CLAUDE.md." This release rewrites that vocabulary in terms of what you're actually doing ("dispatch a subagent," "your instructions file") and adds a per-harness reference that maps each action to the right tool, checked against each runtime. Prose that named "Claude" now says "your agent."
- **A tool reference per harness** at `skills/using-superpowers/references/`, covering Claude Code, Codex, Copilot, Gemini, Pi, and Antigravity.
- **`finishing-a-development-branch` went forge-neutral** — it no longer hardcodes `gh pr create`, so agents push with whatever forge tooling they have. (#1609)
- **One rename:** "Claude Search Optimization" is now "Skill Discovery Optimization," since the technique isn't Claude-specific.
### Writing Skills
Two additions for skill authors.
- **Match the Form to the Failure** — a short table for picking the right kind of guidance. A flat "don't do X" works for discipline slips but backfires when the problem is the *shape* of an output, where a worked example does better. The table, and a tighter scope on the existing rationalization section, steer authors to the form that actually helps.
- **Micro-Test Wording** — a cheap way to check a phrasing before committing to it: sample it a handful of times against a no-guidance control and read every result by hand, treating run-to-run variance as a warning sign.
### Testing
Skill-behavior testing moved out of `tests/` into a new `evals/` submodule built on "drill," which runs real Claude Code, Codex, and Gemini sessions and judges them with an LLM. Several in-tree bash suites retired once a stricter drill scenario covered them; the few with no equivalent stayed. From here on, `tests/` holds plugin-code tests and `evals/` holds skill-behavior tests, and `docs/testing.md` explains the split. New backends reach Antigravity, Pi, and more models, and new shell-lint and pre-commit checks guard the harness. (#1541)
### Bug Fixes
- **systematic-debugging no longer forces every session into extended thinking.** One bullet held the exact keyword Claude Code scans for, quietly tripping the switch on every session that loaded the skill. A hyphen breaks the keyword; the text still reads. (#1283, by @Nick Galatis)
- **The Windows SessionStart hook stopped printing a write error every session** — each `printf` now routes through `cat` to absorb the broken pipe, and the output is otherwise unchanged. (#1612, reported by @silvertakana)
- **Windows foreground mode** tracks the right process and clears its owner PID on MSYS2. (by @nestorluiscamachopaz)
- **The `using-superpowers` bootstrap** no longer lists "debugging" as a skill that doesn't exist. (reported by @mhat)
- **The TDD skill** links the testing anti-patterns reference. (#1532, #1529; link fix #1474 by @Stable Genius)
- **`using-git-worktrees`** fixes its step numbering and drops stale Cursor references. (#1522, and by @fuleinist)
- **The Codex review skill** swaps a private in-joke for plain guidance. (#1531)
### Documentation & Contributor Guidelines
- **A guide to porting Superpowers to a new harness** (`docs/porting-to-a-new-harness.md`) lays out the three pieces every integration needs and the one rule that makes or breaks it: load the bootstrap at session start.
- **Every PR and issue now discloses how it was made** — model, harness, version, and installed plugins, or a note that it was written by hand. We weigh a contribution differently depending on what produced it. PRs also target `dev`, not `main`. The PR template, all three issue templates, and a new platform-support template carry this.
### Contributors
Thanks to @mattvanhorn, @nawfal, @Nick Galatis, @silvertakana, @nestorluiscamachopaz, @qer, @mhat, @Stable Genius, @fuleinist, @dev_Hakaze, @robotsnh, Rahul, and @arittr.
## v5.1.0 (2026-04-30)
### Removals

2
evals

Submodule evals updated: 70a245c36c...db37d5fbec

View File

@@ -1,6 +1,6 @@
{
"name": "superpowers",
"description": "Core skills library: TDD, debugging, collaboration patterns, and proven techniques",
"version": "6.0.0",
"version": "5.1.0",
"contextFileName": "GEMINI.md"
}

View File

@@ -1,6 +1,6 @@
{
"name": "superpowers",
"version": "6.0.0",
"version": "5.1.0",
"description": "Superpowers skills and runtime bootstrap for coding agents",
"type": "module",
"main": ".opencode/plugins/superpowers.js",

View File

@@ -9,7 +9,7 @@
*
* This template provides a consistent frame with:
* - OS-aware light/dark theming
* - Header branding and connection status
* - Fixed header and selection indicator bar
* - Scrollable main content area
* - CSS helpers for common UI patterns
*
@@ -63,37 +63,34 @@
}
/* ===== FRAME STRUCTURE ===== */
.brand { display: flex; align-items: center; min-width: 0; overflow: hidden; color: var(--text-secondary); line-height: 1; }
.brand a { color: inherit; text-decoration: none; display: flex; align-items: center; gap: 0.5rem; min-width: 0; max-width: 100%; line-height: 1; }
.brand-copy { display: block; min-width: 0; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; line-height: 1; transform: translateY(-1px); }
.brand-logo { display: block; height: 1em; width: auto; max-width: 180px; flex-shrink: 0; filter: invert(1); }
@media (prefers-color-scheme: dark) {
.brand-logo { filter: none; }
.header {
background: var(--bg-secondary);
padding: 0.5rem 1.5rem;
display: flex;
justify-content: space-between;
align-items: center;
border-bottom: 1px solid var(--border);
flex-shrink: 0;
}
.status { font-size: 0.7rem; color: var(--status-color, var(--success)); display: flex; align-items: center; gap: 0.4rem; justify-self: end; white-space: nowrap; line-height: 1; }
.status::before { content: ''; width: 6px; height: 6px; background: var(--status-color, var(--success)); border-radius: 50%; }
.header h1 { font-size: 0.85rem; font-weight: 500; color: var(--text-secondary); }
.header .status { font-size: 0.7rem; color: var(--status-color, var(--success)); display: flex; align-items: center; gap: 0.4rem; }
.header .status::before { content: ''; width: 6px; height: 6px; background: var(--status-color, var(--success)); border-radius: 50%; }
.main { flex: 1; overflow-y: auto; }
#frame-content { padding: 2rem; min-height: 100%; }
.header {
.indicator-bar {
background: var(--bg-secondary);
border-bottom: 1px solid var(--border);
border-top: 1px solid var(--border);
padding: 0.5rem 1.5rem;
flex-shrink: 0;
display: grid;
grid-template-columns: minmax(0, 1fr) auto;
align-items: center;
gap: 1rem;
min-height: 42px;
text-align: center;
}
.header .brand { justify-self: start; width: 100%; font-size: 0.75rem; line-height: 1; }
.header .status { grid-column: 2; line-height: 1; }
.header span {
.indicator-bar span {
font-size: 0.75rem;
color: var(--text-secondary);
}
.header .selected-text {
.indicator-bar .selected-text {
color: var(--accent);
font-weight: 500;
}
@@ -199,7 +196,7 @@
</head>
<body>
<div class="header">
<!-- BRANDING -->
<h1><a href="https://github.com/obra/superpowers" style="color: inherit; text-decoration: none;">Superpowers Brainstorming</a></h1>
<div class="status">Connecting…</div>
</div>
@@ -209,5 +206,9 @@
</div>
</div>
<div class="indicator-bar">
<span id="indicator-text">Click an option above, then return to the terminal</span>
</div>
</body>
</html>

View File

@@ -138,6 +138,21 @@
id: target.id || null
});
// Update indicator bar (defer so toggleSelect runs first)
setTimeout(() => {
const indicator = document.getElementById('indicator-text');
if (!indicator) return;
const container = target.closest('.options') || target.closest('.cards');
const selected = container ? container.querySelectorAll('.selected') : [];
if (selected.length === 0) {
indicator.textContent = 'Click an option above, then return to the terminal';
} else if (selected.length === 1) {
const label = selected[0].querySelector('h3, .content h3, .card-body h3')?.textContent?.trim() || selected[0].dataset.choice;
indicator.innerHTML = '<span class="selected-text">' + label + ' selected</span> — return to terminal to continue';
} else {
indicator.innerHTML = '<span class="selected-text">' + selected.length + ' selected</span> — return to terminal to continue';
}
}, 0);
});
// Frame UI: selection tracking

View File

@@ -102,14 +102,6 @@ const URL_HOST = process.env.BRAINSTORM_URL_HOST || (HOST === '127.0.0.1' ? 'loc
const SESSION_DIR = process.env.BRAINSTORM_DIR || '/tmp/brainstorm';
const CONTENT_DIR = path.join(SESSION_DIR, 'content');
const STATE_DIR = path.join(SESSION_DIR, 'state');
const SUPERPOWERS_VERSION = readSuperpowersVersion();
const SUPERPOWERS_BRAND_IMAGE_URL = 'https://primeradiant.com/brand/superpowers-visual-brainstorming-logo.png';
const TELEMETRY_DISABLE_ENV_VARS = [
'SUPERPOWERS_DISABLE_TELEMETRY',
'DISABLE_TELEMETRY',
'CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC'
];
const SUPERPOWERS_TELEMETRY_DISABLED = TELEMETRY_DISABLE_ENV_VARS.some(name => isTruthyEnv(process.env[name]));
let ownerPid = process.env.BRAINSTORM_OWNER_PID ? Number(process.env.BRAINSTORM_OWNER_PID) : null;
// Per-session secret key. The companion is reachable by any local browser tab
@@ -158,22 +150,14 @@ const MIME_TYPES = {
// ========== Templates and Constants ==========
function waitingPage() {
return renderBranding(`<!DOCTYPE html>
const WAITING_PAGE = `<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Brainstorm Companion</title>
<style>
body { font-family: system-ui, sans-serif; padding: 2rem; max-width: 800px; margin: 0 auto; }
h1 { color: #333; } p { color: #666; }
.brand { display: flex; align-items: center; min-width: 0; overflow: hidden; margin-bottom: 1.5rem; color: #666; font-size: 0.9rem; line-height: 1; }
.brand a { color: inherit; text-decoration: none; display: flex; align-items: center; gap: 0.5rem; min-width: 0; max-width: 100%; line-height: 1; }
.brand-copy { display: block; min-width: 0; overflow: hidden; text-overflow: ellipsis; white-space: nowrap; line-height: 1; transform: translateY(-1px); }
.brand-logo { display: block; height: 1em; width: auto; max-width: 180px; filter: invert(1); }
</style>
<style>body { font-family: system-ui, sans-serif; padding: 2rem; max-width: 800px; margin: 0 auto; }
h1 { color: #333; } p { color: #666; }</style>
</head>
<body><!-- BRANDING --><h1>Brainstorm Companion</h1>
<p>Waiting for the agent to push a screen...</p></body></html>`);
}
<body><h1>Brainstorm Companion</h1>
<p>Waiting for the agent to push a screen...</p></body></html>`;
const FORBIDDEN_PAGE = `<!DOCTYPE html>
<html>
@@ -205,55 +189,13 @@ const helperInjection = '<script>\n' + helperScript + '\n</script>';
// ========== Helper Functions ==========
function readSuperpowersVersion() {
try {
const packageJson = JSON.parse(
fs.readFileSync(path.join(__dirname, '../../..', 'package.json'), 'utf-8')
);
return String(packageJson.version || 'unknown');
} catch (e) {
return 'unknown';
}
}
function isTruthyEnv(value) {
if (!value) return false;
const normalized = String(value).trim().toLowerCase();
if (!normalized) return false;
return !['0', 'false', 'no', 'off'].includes(normalized);
}
function escapeHtmlText(value) {
return String(value)
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;');
}
function brandMarkup() {
const version = escapeHtmlText(SUPERPOWERS_VERSION);
const text = SUPERPOWERS_TELEMETRY_DISABLED
? 'Prime Radiant Superpowers v' + version
: 'Superpowers v' + version;
const logo = SUPERPOWERS_TELEMETRY_DISABLED
? ''
: '<img class="brand-logo" src="' + SUPERPOWERS_BRAND_IMAGE_URL + '?v=' + encodeURIComponent(SUPERPOWERS_VERSION) + '" alt="Prime Radiant" referrerpolicy="no-referrer" decoding="async">';
return '<div class="brand"><a href="https://github.com/obra/superpowers">' + logo + '<span class="brand-copy">' + text + '</span></a></div>';
}
function renderBranding(html) {
return html.split('<!-- BRANDING -->').join(brandMarkup());
}
function isFullDocument(html) {
const trimmed = html.trimStart().toLowerCase();
return trimmed.startsWith('<!doctype') || trimmed.startsWith('<html');
}
function wrapInFrame(content) {
return renderBranding(frameTemplate).replace('<!-- CONTENT -->', content);
return frameTemplate.replace('<!-- CONTENT -->', content);
}
function getNewestScreen() {
@@ -399,7 +341,7 @@ function handleRequest(req, res) {
const screenFile = getNewestScreen();
let html = screenFile
? (raw => isFullDocument(raw) ? raw : wrapInFrame(raw))(fs.readFileSync(screenFile, 'utf-8'))
: waitingPage();
: WAITING_PAGE;
if (html.includes('</body>')) {
html = html.replace('</body>', helperInjection + '\n</body>');

View File

@@ -28,7 +28,7 @@ A question *about* a UI topic is not automatically a visual question. "What kind
The server watches a directory for HTML files and serves the newest one to the browser. You write HTML content to `screen_dir`, the user sees it in their browser and can click to select options. Selections are recorded to `state_dir/events` that you read on your next turn.
**Content fragments vs full documents:** If your HTML file starts with `<!DOCTYPE` or `<html`, the server serves it as-is (just injects the helper script). Otherwise, the server automatically wraps your content in the frame template — adding the header, CSS theme, connection status, and all interactive infrastructure. **Write content fragments by default.** Only write full documents when you need complete control over the page.
**Content fragments vs full documents:** If your HTML file starts with `<!DOCTYPE` or `<html`, the server serves it as-is (just injects the helper script). Otherwise, the server automatically wraps your content in the frame template — adding the header, CSS theme, selection indicator, and all interactive infrastructure. **Write content fragments by default.** Only write full documents when you need complete control over the page.
## Starting a Session
@@ -138,7 +138,7 @@ Use `--url-host` to control what hostname is printed in the returned URL JSON.
## Writing Content Fragments
Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, connection status, and all interactive infrastructure).
Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, selection indicator, and all interactive infrastructure).
**Minimal example:**
@@ -184,7 +184,7 @@ The frame template provides these CSS classes for your content:
</div>
```
**Multi-select:** Add `data-multiselect` to the container to let users select multiple options. Each click toggles the item's selected styling.
**Multi-select:** Add `data-multiselect` to the container to let users select multiple options. Each click toggles the item. The indicator bar shows the count.
```html
<div class="options" data-multiselect>

View File

@@ -4,7 +4,8 @@
# through the controller's context.
#
# Usage: task-brief PLAN_FILE TASK_NUMBER [OUTFILE]
# Default OUTFILE: <git-dir>/sdd/task-<N>.<unique>/task-<N>-brief.md.
# Default OUTFILE: <git-dir>/sdd/task-<N>-brief.md — unique per repo
# instance, so concurrent sessions cannot collide.
set -euo pipefail
if [ $# -lt 2 ] || [ $# -gt 3 ]; then
@@ -22,8 +23,7 @@ else
dir=$(git rev-parse --git-path sdd)
mkdir -p "$dir"
dir=$(cd "$dir" && pwd)
brief_dir=$(mktemp -d "$dir/task-${n}.XXXXXX")
out="$brief_dir/task-${n}-brief.md"
out="$dir/task-${n}-brief.md"
fi
awk -v n="$n" '

View File

@@ -1,309 +0,0 @@
/**
* Tests for the visual companion's Superpowers/Prime Radiant branding.
*/
const { spawn } = require('child_process');
const http = require('http');
const fs = require('fs');
const path = require('path');
const assert = require('assert');
const REPO_ROOT = path.join(__dirname, '../..');
const SERVER_PATH = path.join(REPO_ROOT, 'skills/brainstorming/scripts/server.cjs');
const PACKAGE_VERSION = JSON.parse(
fs.readFileSync(path.join(REPO_ROOT, 'package.json'), 'utf-8')
).version;
const TOKEN = 'testtoken-branding-0123456789abcdef';
const ASSET_URL = 'https://primeradiant.com/brand/superpowers-visual-brainstorming-logo.png';
function cleanup(dir) {
if (fs.existsSync(dir)) {
fs.rmSync(dir, { recursive: true });
}
}
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
function startServer({ port, dir, env = {} }) {
cleanup(dir);
return spawn('node', [SERVER_PATH], {
env: {
...process.env,
BRAINSTORM_PORT: String(port),
BRAINSTORM_DIR: dir,
BRAINSTORM_TOKEN: TOKEN,
...env
}
});
}
function waitForServer(server) {
let stdout = '';
let stderr = '';
return new Promise((resolve, reject) => {
const timeout = setTimeout(() => reject(new Error(`Server did not start. stderr: ${stderr}`)), 5000);
server.stdout.on('data', (data) => {
stdout += data.toString();
if (stdout.includes('server-started')) {
clearTimeout(timeout);
resolve();
}
});
server.stderr.on('data', (data) => { stderr += data.toString(); });
server.on('error', reject);
});
}
function fetchHtml(port) {
return new Promise((resolve, reject) => {
const headers = { Cookie: `brainstorm-key-${port}=${TOKEN}` };
http.get(`http://localhost:${port}/`, { headers }, (res) => {
let body = '';
res.on('data', chunk => { body += chunk; });
res.on('end', () => resolve(body));
}).on('error', reject);
});
}
function writeFragment(dir) {
const contentDir = path.join(dir, 'content');
fs.mkdirSync(contentDir, { recursive: true });
fs.writeFileSync(path.join(contentDir, 'screen.html'), '<h2>Pick a layout</h2>');
}
async function withServer(options, fn) {
const server = startServer(options);
try {
await waitForServer(server);
await fn();
} finally {
if (server.exitCode === null && server.signalCode === null) {
server.kill();
await new Promise(resolve => server.once('exit', resolve));
}
await sleep(100);
cleanup(options.dir);
}
}
let passed = 0;
let failed = 0;
async function test(name, fn) {
try {
await fn();
console.log(` PASS: ${name}`);
passed++;
} catch (e) {
console.log(` FAIL: ${name}`);
console.log(` ${e.message}`);
failed++;
}
}
function assertBrandedWithLogo(html) {
assert(
html.includes(`Superpowers v${PACKAGE_VERSION}`),
'branding text should include dynamic package version'
);
assert(
!html.includes(`Superpowers v${PACKAGE_VERSION} by`),
'branding text should not include "by" when the logo is visible'
);
assert(
/<img class="brand-logo"[^>]*>\s*<span class="brand-copy">Superpowers v/.test(html),
'visible logo should appear before the Superpowers version text'
);
assert(
/\.brand a\s*\{[^}]*line-height:\s*1/i.test(html),
'brand row should align the logo and version text by their visual height'
);
assert(
/\.brand a\s*\{[^}]*gap:\s*0\.5rem/i.test(html),
'brand row should keep the logo and version text close together'
);
assert(
/\.brand a\s*\{[^}]*max-width:\s*100%/i.test(html),
'brand link should be constrained so it cannot overlap the status column'
);
assert(
/\.brand\s*\{[^}]*line-height:\s*1/i.test(html),
'brand wrapper should not inherit the page line height'
);
assert(
/\.brand\s*\{[^}]*overflow:\s*hidden/i.test(html),
'brand wrapper should clip before it reaches the status column'
);
}
function assertBrandedFallbackText(html) {
assert(
html.includes(`Prime Radiant Superpowers v${PACKAGE_VERSION}`),
'disabled telemetry should keep plain text Prime Radiant/Superpowers branding'
);
}
function assertTelemetryImage(html) {
const expectedUrl = `${ASSET_URL}?v=${encodeURIComponent(PACKAGE_VERSION)}`;
assert(html.includes(`src="${expectedUrl}"`), 'remote image should use the dedicated main-domain asset with only v=');
assert(!html.includes('event='), 'remote image URL must not include event=');
assert(!html.includes('surface='), 'remote image URL must not include surface=');
assert(!html.includes('launch_id='), 'remote image URL must not include launch_id=');
assert(!html.includes('lid='), 'remote image URL must not include lid=');
}
function assertLogoKeepsTransparentBackground(html) {
assert(
/\.brand-logo\s*\{[^}]*height:\s*1em/i.test(html),
'logo should match the surrounding brand text size'
);
assert(
/\.brand-logo\s*\{[^}]*display:\s*block/i.test(html),
'logo should not reserve inline-image descender space'
);
assert(
/\.brand-copy\s*\{[^}]*line-height:\s*1/i.test(html),
'version text should use the same compact line height as the logo'
);
assert(
/\.brand-copy\s*\{[^}]*min-width:\s*0/i.test(html),
'version text should be allowed to shrink inside the brand row'
);
assert(
/\.brand-copy\s*\{[^}]*transform:\s*translateY\(-1px\)/i.test(html),
'version text should compensate for bottom padding inside the logo asset'
);
assert(
/\.brand-logo\s*\{[^}]*filter:\s*invert\(1\)/i.test(html),
'white logo asset should invert on light backgrounds'
);
assert(
!/\.brand-logo\s*\{[^}]*background:/i.test(html),
'logo should keep its transparent background'
);
assert(
!/\.brand-logo\s*\{[^}]*padding:/i.test(html),
'logo should not rely on a padded backing'
);
}
function assertFramedLogoSupportsDarkTheme(html) {
assert(
/@media\s*\(prefers-color-scheme:\s*dark\)[\s\S]*\.brand-logo\s*\{[^}]*filter:\s*none/i.test(html),
'framed screens should leave the white logo unfiltered in dark mode'
);
}
function assertFramedScreenUsesBrandHeader(html) {
const logoCount = (html.match(/class="brand-logo"/g) || []).length;
assert.strictEqual(logoCount, 1, 'framed screens should render the logo only in the header');
assert(!html.includes('<div class="indicator-bar">'), 'framed screens should not render footer chrome');
assert(
/<div class="header">[\s\S]*<div class="brand">[\s\S]*<div class="status">Connecting…<\/div>/.test(html),
'header should contain branding and connection status'
);
assert(!html.includes('id="indicator-text"'), 'header should not render the selection indicator text');
assert(!html.includes('Click an option above'), 'header should not render the selection instruction');
}
function assertHeaderAvoidsNarrowOverlap(html) {
assert(
/grid-template-columns:\s*minmax\(0,\s*1fr\)\s*auto/i.test(html),
'header should allocate shrinkable space to branding before the status column'
);
assert(
/\.header \.status\s*\{[^}]*grid-column:\s*2/i.test(html),
'status should live in the final fixed-width grid column'
);
assert(
/\.header \.brand\s*\{[^}]*width:\s*100%/i.test(html),
'header brand should fill its grid track so overflow clipping prevents overlap'
);
}
async function main() {
console.log('\n--- Visual Companion Branding ---');
await test('framed screens render versioned Prime Radiant logo by default', async () => {
const port = 3451;
const dir = '/tmp/brainstorm-branding-default';
await withServer({ port, dir }, async () => {
writeFragment(dir);
await sleep(300);
const html = await fetchHtml(port);
assertBrandedWithLogo(html);
assertTelemetryImage(html);
assertLogoKeepsTransparentBackground(html);
assertFramedLogoSupportsDarkTheme(html);
assertFramedScreenUsesBrandHeader(html);
assertHeaderAvoidsNarrowOverlap(html);
});
});
await test('waiting screen renders versioned Prime Radiant logo by default', async () => {
const port = 3452;
const dir = '/tmp/brainstorm-branding-waiting';
await withServer({ port, dir }, async () => {
const html = await fetchHtml(port);
assert(html.includes('Waiting for the agent'), 'waiting page should still render');
assertBrandedWithLogo(html);
assertTelemetryImage(html);
assertLogoKeepsTransparentBackground(html);
});
});
await test('SUPERPOWERS_DISABLE_TELEMETRY=true omits remote image but keeps local branding', async () => {
const port = 3453;
const dir = '/tmp/brainstorm-branding-disabled';
await withServer({ port, dir, env: { SUPERPOWERS_DISABLE_TELEMETRY: 'true' } }, async () => {
writeFragment(dir);
await sleep(300);
const html = await fetchHtml(port);
assertBrandedFallbackText(html);
assert(!html.includes(ASSET_URL), 'disabled telemetry should omit the remote image');
});
});
await test('SUPERPOWERS_DISABLE_TELEMETRY=yes also omits the remote image on the waiting screen', async () => {
const port = 3454;
const dir = '/tmp/brainstorm-branding-disabled-waiting';
await withServer({ port, dir, env: { SUPERPOWERS_DISABLE_TELEMETRY: 'yes' } }, async () => {
const html = await fetchHtml(port);
assertBrandedFallbackText(html);
assert(!html.includes(ASSET_URL), 'disabled telemetry should omit the remote image');
});
});
await test('DISABLE_TELEMETRY=true omits remote image for Claude Code telemetry opt-out', async () => {
const port = 3455;
const dir = '/tmp/brainstorm-branding-claude-disable-telemetry';
await withServer({ port, dir, env: { DISABLE_TELEMETRY: 'true' } }, async () => {
writeFragment(dir);
await sleep(300);
const html = await fetchHtml(port);
assertBrandedFallbackText(html);
assert(!html.includes(ASSET_URL), 'Claude Code telemetry opt-out should omit the remote image');
});
});
await test('CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 omits remote image for Claude Code traffic opt-out', async () => {
const port = 3456;
const dir = '/tmp/brainstorm-branding-claude-disable-nonessential';
await withServer({ port, dir, env: { CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC: '1' } }, async () => {
const html = await fetchHtml(port);
assertBrandedFallbackText(html);
assert(!html.includes(ASSET_URL), 'Claude Code non-essential traffic opt-out should omit the remote image');
});
});
console.log(`\n--- Results: ${passed} passed, ${failed} failed ---`);
if (failed > 0) process.exitCode = 1;
}
main().catch((err) => {
console.error('Test failed:', err);
process.exit(1);
});

View File

@@ -2,7 +2,7 @@
"name": "brainstorm-server-tests",
"version": "1.0.0",
"scripts": {
"test": "node ws-protocol.test.js && node helper.test.js && node browser-launcher.test.js && node auth.test.js && node branding.test.js && node server.test.js && node lifecycle.test.js && bash start-server.test.sh && bash stop-server.test.sh"
"test": "node ws-protocol.test.js && node helper.test.js && node browser-launcher.test.js && node auth.test.js && node server.test.js && node lifecycle.test.js && bash start-server.test.sh && bash stop-server.test.sh"
},
"dependencies": {
"ws": "^8.19.0"

View File

@@ -196,7 +196,7 @@ async function runTests() {
const res = await fetch(`http://localhost:${TEST_PORT}/`);
assert(res.body.includes('<h1>Custom Page</h1>'), 'Should contain original content');
assert(res.body.includes('WebSocket'), 'Should still inject helper.js');
assert(!res.body.includes('<div class="header">'), 'Should NOT wrap in frame template');
assert(!res.body.includes('indicator-bar'), 'Should NOT wrap in frame template');
});
await test('wraps content fragments in frame template', async () => {
@@ -205,7 +205,7 @@ async function runTests() {
await sleep(300);
const res = await fetch(`http://localhost:${TEST_PORT}/`);
assert(res.body.includes('<div class="header">'), 'Fragment should get header chrome');
assert(res.body.includes('indicator-bar'), 'Fragment should get indicator bar');
assert(!res.body.includes('<!-- CONTENT -->'), 'Placeholder should be replaced');
assert(res.body.includes('Pick a layout'), 'Fragment content should be present');
assert(res.body.includes('data-choice="a"'), 'Fragment interactive elements intact');
@@ -560,16 +560,8 @@ async function runTests() {
const template = fs.readFileSync(
path.join(__dirname, '../../skills/brainstorming/scripts/frame-template.html'), 'utf-8'
);
assert(template.includes('<div class="header">'), 'Should have top header markup');
assert(!template.includes('indicator-bar'), 'Should not have footer chrome');
assert(!template.includes('indicator-text'), 'Header should not render selection indicator text');
assert(template.includes('<!-- BRANDING -->'), 'Should have branding placeholder');
assert(template.includes('<div class="status">Connecting…</div>'), 'Header should include connection status');
assert(template.includes('grid-template-columns: minmax(0, 1fr) auto;'), 'Header should let brand text shrink before the status column');
assert(template.includes('padding: 0.5rem 1.5rem;'), 'Header should keep equal left and right edge padding');
assert(template.includes('.header .brand { justify-self: start; width: 100%; font-size: 0.75rem; line-height: 1; }'), 'Header brand should align left, fill its grid track, and match header text size');
assert(template.includes('.header .status { grid-column: 2; line-height: 1; }'), 'Header status should sit in the right column');
assert(!template.includes('<div></div>'), 'Header should not use an empty spacer before branding');
assert(template.includes('indicator-bar'), 'Should have indicator bar');
assert(template.includes('indicator-text'), 'Should have indicator text');
assert(template.includes('<!-- CONTENT -->'), 'Should have content placeholder');
assert(template.includes('frame-content'), 'Should have content container');
return Promise.resolve();

View File

@@ -1,117 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
TASK_BRIEF="$REPO_ROOT/skills/subagent-driven-development/scripts/task-brief"
FAILURES=0
TEST_ROOT=""
pass() {
echo " [PASS] $1"
}
fail() {
echo " [FAIL] $1"
FAILURES=$((FAILURES + 1))
}
cleanup() {
if [[ -n "$TEST_ROOT" && -d "$TEST_ROOT" ]]; then
rm -rf "$TEST_ROOT"
fi
}
extract_written_path() {
local output="$1"
printf '%s\n' "$output" | sed -n 's/^wrote \(.*\): [0-9][0-9]* lines$/\1/p'
}
assert_not_equals() {
local actual="$1"
local expected="$2"
local description="$3"
if [[ "$actual" != "$expected" ]]; then
pass "$description"
else
fail "$description"
echo " both were: $actual"
fi
}
assert_file_contains() {
local path="$1"
local needle="$2"
local description="$3"
if grep -Fq -- "$needle" "$path"; then
pass "$description"
else
fail "$description"
echo " expected $path to contain: $needle"
fi
}
main() {
echo "=== Test: task-brief output paths ==="
TEST_ROOT="$(mktemp -d)"
trap cleanup EXIT
local repo="$TEST_ROOT/repo"
local plan="$repo/plan.md"
local output_one
local output_two
local path_one
local path_two
git init -q -b main "$repo"
cat > "$plan" <<'EOF'
# Implementation Plan
## Task 1: First thing
Do the first thing.
## Task 2: Second thing
Do the second thing.
EOF
output_one="$(cd "$repo" && "$TASK_BRIEF" "$plan" 1)"
output_two="$(cd "$repo" && "$TASK_BRIEF" "$plan" 1)"
path_one="$(extract_written_path "$output_one")"
path_two="$(extract_written_path "$output_two")"
assert_not_equals "$path_one" "$path_two" "Default task brief paths are unique per invocation"
assert_file_contains "$path_one" "## Task 1: First thing" "First default brief contains the requested task"
assert_file_contains "$path_two" "## Task 1: First thing" "Second default brief contains the requested task"
if [[ "$path_one" == "$repo/.git/sdd/"* ]]; then
pass "First default brief stays under the repo git metadata directory"
else
fail "First default brief stays under the repo git metadata directory"
echo " actual: $path_one"
fi
if [[ "$path_two" == "$repo/.git/sdd/"* ]]; then
pass "Second default brief stays under the repo git metadata directory"
else
fail "Second default brief stays under the repo git metadata directory"
echo " actual: $path_two"
fi
if [[ $FAILURES -ne 0 ]]; then
echo ""
echo "FAILED: $FAILURES assertion(s) failed."
exit 1
fi
echo ""
echo "PASS"
}
main "$@"