Commit Graph

525 Commits

Author SHA1 Message Date
Jesse Vincent
ff50f01ab2 chore(evals): bump submodule to companion just-in-time scenario
Bump the evals submodule (ff3ee83 -> f1ac859) to include the brainstorming
visual-companion just-in-time eval scenario that validates the SKILL.md
consent-move in this PR (GREEN pass / RED fail on Quorum and drill).

Scope: dev's recorded pointer predates the drill->Quorum migration, so this
bump also carries that migration.
2026-06-09 19:55:22 -07:00
Jesse Vincent
b0fa0f2e36 fix(brainstorm-server): fix auth-integration bugs from full-branch review
A second adversarial review of the merged branch found that combining the
session-key auth with the feature work created real bugs the (vacuous) tests
missed:

- [Critical] GET /files/ (empty name) resolved to CONTENT_DIR and crashed the
  process with uncaught EISDIR — newly reachable because the query-stripping
  refactor turns /files/?key=... into /files/. Reject non-regular-file names.
- [High] --open opened a KEYLESS url, which the auth gate 403s — the headline
  feature landed on the error page. Open the keyed url.
- [High] Same-port restart regenerated the token (port persisted, token not), so
  the open tab's old cookie 403'd and never reconnected — contradicting the
  documented promise. Persist the token (BRAINSTORM_TOKEN_FILE / .last-token)
  alongside the port.
- [Medium] Token sat in world-readable server-info/server.log (0644 in /tmp).
  umask 077 in start-server.sh + mode 0600 on server-info/.last-token.
- [Medium] touchActivity() ran before the auth check, so unauthenticated requests
  defeated the idle timeout. Count activity only after authorization.
- [Low] COOKIE_NAME embedded the pre-fallback port; derive it from the actual
  bound port (also prevents a cross-server cookie-jar collision on fallback).

Tests added/strengthened (previously passed vacuously): /files/ no-crash; the
auto-open url carries the key and is reachable (200); restart reuses the same key
not just the port; unauthenticated requests don't reset the idle clock.
Full suite green (ws-protocol 32, helper 12, auth 13, server 29, lifecycle 8,
stop-server 4); restart smoke confirms same port+key and old URL -> 200.
2026-06-09 19:13:52 -07:00
Jesse Vincent
610e4d39f0 test(brainstorm-server): thread session key through tests after auth merge
Integrating the per-session-key auth onto the same branch as the dotfile and
lifecycle work: two tests added after the auth commit opened WebSockets without a
key (server.test.js dotfile-reload, lifecycle.test.js idle-shutdown), which the
auth gate now resets. Pass ?key=/BRAINSTORM_TOKEN in both. Full suite green:
ws-protocol 32, helper 12, auth 13, server 28, lifecycle 7, stop-server 4.
2026-06-09 18:33:00 -07:00
Jesse Vincent
e3fe480b29 feat(brainstorm-server): gate every endpoint behind a per-session key
The companion server is reachable by any local browser tab (default loopback
bind) and by any host that can route to it (remote --host bind). It served
screens, files, and accepted event-injecting WebSocket connections with no
authentication, so a malicious browser tab or a direct remote client could read
brainstorm content or inject events that the agent reads as the user's input
(prompt injection into a live session).

Generate a per-session secret token, carry it in the served URL as ?key=, and
mirror it into an HttpOnly SameSite=Strict per-port cookie on first load so
same-origin subresources and the WebSocket handshake authenticate automatically.
Every HTTP request and WebSocket upgrade now requires a valid key (query or
cookie, constant-time compared); unauthenticated requests get a friendly 403
explaining they need the full URL. A secret authenticates the client uniformly
across loopback, tunnel, and remote binds and defeats DNS rebinding, which a
Host/Origin allowlist cannot.

Also guard handleMessage against a null JSON payload that crashed the process.

Tests: new auth.test.js (13 cases) covering the key on /, /files/*, and WS plus
cookie bootstrap and the null-payload guard; server.test.js threads the key;
ws-protocol.test.js + auth.test.js wired into npm test.

Closes #1014
Refs #1110, #1553, #1504
2026-06-09 18:29:49 -07:00
Jesse Vincent
3e3c10e671 docs(brainstorm): catalog visual companion issues; choose session-key for security
Records the triage of open issues/PRs touching the brainstorm companion server
and the decision to protect it with a per-session secret key (supersedes the
Host/Origin allowlist approach) so remote-connected users are covered, not just
loopback.
2026-06-09 18:27:43 -07:00
Jesse Vincent
843c473382 fix(brainstorm-server): tie stop-server PID check to the session's port
The node+server.cjs command match (from the adversarial review) still matched any
unrelated node process running a file named server.cjs. When we recorded the
bound port (state/server-info) and lsof is available, additionally require the
PID to be the process actually LISTENING on this session's port — which rules out
a different project's server.cjs / editor task runner that recycled the stale
PID. Falls back to the command match when the port or lsof isn't available.

Test: a 'node server.cjs' process not listening on the recorded port is spared.

Refs #1703
2026-06-09 17:27:30 -07:00
Jesse Vincent
f8f87ff43a fix(brainstorm-server): address adversarial review findings
From a two-reviewer adversarial pass:

- [High] EADDRINUSE fallback clobbered the shared .last-port: onListen wrote the
  bound port unconditionally, so a fallback to a random port overwrote the
  preferred port another live session still owns — stranding that session's open
  tab forever. Now persist only when we bound the preferred port (not on
  fallback). The fallback test now asserts .last-port integrity (teeth-verified).

- [Medium] maybeOpenBrowser ran the URL through a shell (exec + JSON.stringify),
  which does NOT neutralize $(...) in a url-host. Platform launchers now use
  execFile with the URL as an argv element (no shell). The operator-set
  BRAINSTORM_OPEN_CMD path stays shell-based (trusted input).

- [Medium] --open was a silent no-op on native Windows (no win32 branch). Added.

- [Medium] helper.js reconnect/status/tombstone had only substring-grep tests.
  Added behavioral tests driving the state machine against a mocked browser:
  Reconnecting+backoff (500->1000->2000), tombstone after the grace period, and
  reload-on-recovery.

- [Low] status pill showed a false 'Connected' before the socket opened; now
  starts 'Connecting…' until onopen.

Not changed (flagged): stop-server.sh's PID-ownership check still matches any
'node ... server.cjs' (narrow residual — a recycled PID onto an unrelated node
server.cjs); robust fix needs fragile cross-platform process introspection.
2026-06-09 15:59:59 -07:00
Jesse Vincent
7b815ed8c8 feat(brainstorming): offer the visual companion just-in-time; harden lifecycle guidance
Move the companion consent from an upfront, anticipatory offer to the first
moment a question would genuinely be clearer shown than told. If no visual
question ever arises, it's never offered. On approval the agent starts the
server with --open, so the user's browser opens to the first screen — the pop is
tied to that approval, never unsolicited.

Also hardens visual-companion.md: confirming the server is alive (server-info
present, server-stopped absent) before referring to the URL is now a required
step; restart with the same --project-dir reuses the port so the open tab
reconnects on its own (paused overlay while down); idle default corrected to 4h.

NOTE: SKILL.md is behavior-shaping content — this flow change should be
eval-tested (writing-skills adversarial pressure test) before merge.

Refs #1237, #1037
2026-06-09 15:32:58 -07:00
Jesse Vincent
bccc41dffe feat(brainstorm-server): opt-in auto-open of the browser on the first screen
When the user approves the visual companion, open their browser automatically the
first time a screen is actually ready to show — rather than at startup (just the
waiting page) or making them open the URL by hand.

Opt-in and gated on approval: off unless BRAINSTORM_OPEN is set (start-server.sh
--open, which the agent passes only after the user agrees to use the companion).
Even then it fires once, and is skipped if a browser is already connected, on a
non-loopback/remote bind, or when headless. Launcher is the platform default
(open / xdg-open / WSL cmd.exe) or BRAINSTORM_OPEN_CMD; best-effort, never fatal.

lifecycle.test.js: opens once on the first screen when approved; does NOT open
without approval.

Closes #755
Refs #759
2026-06-09 15:30:25 -07:00
Jesse Vincent
b53c62eba8 feat(brainstorm-server): reuse the same port on session restart
When the companion idle-shuts-down and the agent restarts it, a fresh random
port meant the user's open browser tab pointed at a dead URL. Persist the bound
port per project and prefer it on the next start, so the restarted server comes
up on the same port and the open tab's reconnect just works.

- start-server.sh exports BRAINSTORM_PORT_FILE=<project>/.superpowers/brainstorm/
  .last-port for project sessions (not /tmp).
- server.cjs prefers an explicit BRAINSTORM_PORT, else the recorded port, else
  random; writes the actually-bound port back; and on EADDRINUSE (preferred port
  still in use) falls back to a random port once instead of crashing.

lifecycle.test.js: restart reuses the recorded port; a taken preferred port
falls back to a random one without crashing.

Refs #1237
2026-06-09 15:22:23 -07:00
Jesse Vincent
e6cf11f68c feat(brainstorm-companion): resilient reconnect, live status, paused overlay
The injected client reconnected on a fixed 1s timer with no feedback: if the
laptop slept or the server restarted, the page showed 'Connected' over a dead
socket and silently queued events. And when the server stopped, the user got a
bare connection-refused with no explanation.

helper.js now:
- reconnects with exponential backoff (500ms, doubling, capped at 30s; reset on
  open), with an onerror->close handler, nulls the socket on close, and clears a
  pending timer before scheduling another;
- drives the frame status pill Connected/Reconnecting/Disconnected via a
  --status-color custom property (frame-template.html);
- after ~15s disconnected, shows a self-styled 'Companion paused' overlay
  (tombstone) explaining the companion stopped and will reconnect automatically;
- on recovery from a tombstoned outage (e.g. server restarted on the same port)
  reloads to pick up the restarted server's current screen.

The reconnect-backoff is an exported pure function; helper.test.js unit-tests it
(doubling + cap progression) and asserts the status/tombstone/reconnect wiring.
DOM behaviour is verified live.

Refs #856, #1237
2026-06-09 15:18:19 -07:00
Jesse Vincent
f057b4a30b feat(brainstorm-server): 4h configurable idle timeout; close WS on shutdown
The companion shut down after only 30 minutes idle — too short for real
brainstorming, where a single question can sit far longer. And shutdown() never
closed upgraded WebSocket sockets, so an open browser connection could keep the
Node process alive after it was supposed to exit.

- Default idle timeout raised to 4 hours, configurable via BRAINSTORM_IDLE_TIMEOUT_MS
  and start-server.sh --idle-timeout-minutes (validated positive integer).
- Reported as idle_timeout_ms in the server-started JSON / server-info.
- shutdown() now destroys all client sockets so the process exits even with an
  open WebSocket.
- Watchdog check interval is configurable (BRAINSTORM_LIFECYCLE_CHECK_MS, default
  60s) so the lifecycle can be tested without minute-long waits.

Adds lifecycle.test.js (configured timeout reported; idle shutdown exits despite
an open WS — teeth-verified; the start-server flag). Wires ws-protocol,
lifecycle, and stop-server suites into npm test.

Closes #1237
Refs #1689
2026-06-09 15:08:09 -07:00
Jesse Vincent
ddcb56c16e fix(brainstorm-server): verify PID ownership before stopping
stop-server.sh read server.pid and SIGKILL'd that PID with no checks. After a
reboot or PID wraparound the pid file can point at an unrelated, live process —
which we would then kill.

Verify the PID is actually our server (a running 'node ... server.cjs') before
signalling it. If ownership can't be proven, fail closed: remove the stale pid
file and report {status: stale_pid} without killing anything. Real servers still
stop ({status: stopped}); a missing pid file still reports not_running.

Adds stop-server.test.sh covering: an unrelated reused PID is left alone, a real
server is stopped, and a missing pid file.

Refs #1703
2026-06-09 15:02:25 -07:00
Jesse Vincent
e0442fba00 fix(brainstorm-server): ignore macOS resource-fork dotfiles
On macOS (and ExFAT/SMB volumes) the OS writes ._<name>.html sidecar files
holding binary resource-fork metadata. These end with .html, so they passed the
content filter and could be picked as the newest screen — serving binary garbage
to the browser instead of the mockup — or fetched via /files/.

Skip dotfiles (leading '.') at all four sites that list or serve content:
getNewestScreen, the /files/ endpoint, the known-files seed, and the fs.watch
handler. Tests cover serving (/ and /files/) and the watch path (a ._ file must
not trigger a reload).

Refs #950
2026-06-09 15:02:25 -07:00
Jesse Vincent
f55642e0dd Require contributors to disclose authoring environment and target dev
Add a mandatory self-identification disclosure (model, harness, harness
version, all installed plugins) to the PR template and all three issue
templates, and document the requirement in the contributor guidelines.
We weigh contributions differently depending on what produced them:
content reasoned from documentation is held to a different bar than work
grounded in a real session.

Also state explicitly, in both CLAUDE.md and the PR template, that all
PRs must target the dev branch rather than main.
2026-06-08 22:14:34 -07:00
Drew Ritter
ae1eefb7f9 chore(evals): bump submodule to --scenarios filter (ff3ee83)
Adds `run-all --scenarios` for resuming a scenario subset across the Code
Assist rate-limit windows. Follows the agy rate-limit fix (79f9963).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 22:46:00 -07:00
Drew Ritter
617168aff5 chore(evals): bump submodule to antigravity rate-limit fix (79f9963)
Serialize antigravity against the Gemini Code Assist rate limit
(max_concurrency=1), diagnose 429/RESOURCE_EXHAUSTED honestly instead of as
auth, fail-fast on a latched window, and tolerant preflight OK match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 16:27:35 -07:00
Rahul
d7c260a978 fix(brainstorming): cap websocket frame payloads 2026-06-02 11:24:02 -07:00
Drew Ritter
f3f0789c5c Add shell lint script 2026-06-01 19:48:28 -07:00
Drew Ritter
16a1719988 Tighten Kimi plugin porting coverage 2026-06-01 19:41:58 -07:00
Drew Ritter
c74c22daa7 docs: restore Kimi direct install command 2026-06-01 19:41:58 -07:00
Drew Ritter
773bbf61d6 docs: simplify Kimi README install steps 2026-06-01 19:41:58 -07:00
Drew Ritter
6b76158550 fix: wire Kimi plugin into release metadata 2026-06-01 19:41:58 -07:00
Drew Ritter
7fec40bb55 fix: align Kimi manifest with supported fields 2026-06-01 19:41:58 -07:00
qer
2a8e54735b feat: add Kimi Code plugin manifest 2026-06-01 19:41:58 -07:00
Matt Van Horn
f776394360 feat(subagent-dev): add TDD RED evidence to implementer report format
Add a conditional TDD Evidence field to the implementer report format so controllers can verify RED and GREEN output when TDD was required.

The field asks for the command run, relevant RED/GREEN output, and the expected RED failure reason rather than raw full logs.

Fixes #994.
2026-06-01 16:15:05 -07:00
Drew Ritter
7301c81b4d docs(windows): trim polyglot hook implementation copy 2026-06-01 16:07:01 -07:00
dev_Hakaze
9d3e68a5ad docs(windows): update polyglot hook docs
Rewrite the Windows polyglot hook documentation to match the current run-hook.cmd dispatcher and update the porting guide cross-reference.\n\nFixes #1653.
2026-06-01 15:57:30 -07:00
nestorluiscamachopaz
81c3052416 fix: foreground mode saves node PID and clears OWNER_PID on Windows/MSYS2
Verified on real Windows Git Bash: lifecycle test passed 12/12, manual start/stop released the port, and no brainstorm node processes remained.
2026-06-01 14:26:22 -07:00
nawfal
c879454a0d fix(finishing-a-development-branch): remove gh-specific PR creation instruction
Per obra's guidance on #1609: remove the github-specific instruction rather
than replacing it with a platform-detection table. Agents already know their
forge tooling; the skill only needs to cover the push step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 13:58:22 -07:00
nawfal
ff213eb2cf fix(finishing-a-development-branch): detect remote platform before creating PR/MR
Replaces hardcoded `gh pr create` in Option 2 with a platform-neutral
note: check `git remote get-url origin` first, then use gh (GitHub),
glab (GitLab), or fall back to the compare URL for unknown platforms.

Adds matching Red Flag entry so agents don't skip the detection step.

Fixes #1609

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 13:58:22 -07:00
Jesse Vincent
da00e59958 feat: add Antigravity CLI (agy) support
Antigravity (Google's `agy` CLI) installs the existing Superpowers plugin
directly:

    agy plugin install https://github.com/obra/superpowers

agy imports the bundled skills and runs the plugin's SessionStart hook, so
using-superpowers bootstraps from the first message — verified on agy 1.0.3:
a fresh session given "Let's make a react todo list" auto-triggers the
brainstorming skill instead of writing code. agy discovers skills natively
and, having no Skill tool, loads them by reading SKILL.md with view_file.

No scaffold, installer, or generated context file is needed. This adds only:

- README.md: an Antigravity install section + Quickstart link
- skills/using-superpowers/SKILL.md: reference to the agy tool mapping
- skills/using-superpowers/references/antigravity-tools.md: action->tool
  mapping for agy (view_file, write_to_file, invoke_subagent, manage_task,
  and skill loading via view_file on SKILL.md)
- tests/antigravity/: structural test for the tool mapping, mirroring
  tests/pi/
2026-06-01 11:42:09 -07:00
Jesse Vincent
deceaec78d docs: add 'Porting Superpowers to a New Harness' guide
An evergreen guide for adding support for a new harness (IDE, CLI, or agent
runner). Teaches the invariants — automatic session-start bootstrap, skill
discovery/invocation, tool mapping, the acceptance test — and points at the
closest reference integration shape (shell-hook, in-process plugin,
instructions-file / declared context file) to copy. Covers discovery, build,
local install, tmux-driven verification, distribution, and PR submission, with a
live reference-integration index and a gotchas appendix.

Two non-negotiable rules: (1) never edit skill bodies; (2) everything ships
through the harness's own install mechanism — never edit the user's config. When
a plugin installer strips undeclared files, declare the bootstrap as a recognized
component (a manifest contextFileName-style context file the installer preserves
and the harness loads every session), generated at install time from the live
SKILL.md + tool mapping. Surfaced-skill-description bootstrap is the softer
fallback.

Hardened against real end-to-end ports (Antigravity CLI): shapes can compose; a
fork doesn't inherit its parent's behavior; a hook system != a usable
session-start event; verify @-includes AND context-file preservation with a
marker; web-search the docs and study existing plugins; reverse-engineer
undocumented harnesses; print/headless modes may hang; workspace-trust gates
stall tmux; declared context files survive plugin install while undeclared files
are stripped; skills-path registration is per-harness.
2026-06-01 10:07:38 -07:00
Jesse Vincent
e63e44bedf fix(sync-to-codex-plugin): exclude /.pi/ so the pi extension doesn't leak into the Codex plugin
The .pi/ directory holds the pi-harness extension (.pi/extensions/superpowers.ts),
which is tracked (not git-ignored), so the git-ignored-path exclusion helpers
never caught it. It was also missing from the static EXCLUDES list alongside the
other harness dotdirs (.opencode, .cursor-plugin, .claude-plugin), so a sync
would rsync pi's files into the Codex plugin distribution. Add /.pi/ to EXCLUDES.
2026-05-29 15:05:38 -07:00
Jesse Vincent
8811b0f2d7 Revert "Make visual-companion.md script paths skill-rooted, not plugin-rooted"
This reverts commit e9f5188289.
2026-05-23 17:01:46 -07:00
Jesse Vincent
d48bec6cc3 Revert "Probe per-user Git Bash and Scoop before falling back to PATH on Windows"
This reverts commit a8f0738e3a.
2026-05-23 17:00:15 -07:00
Jesse Vincent
a8f0738e3a Probe per-user Git Bash and Scoop before falling back to PATH on Windows
Stock Windows 10/11 ships C:\Windows\System32\bash.exe (the WSL
launcher) as the first match for `where bash`. WSL's bash cannot
execute Windows-style script paths, so when Git Bash is installed
outside the two standard system locations -- specifically the
per-user "Only for me" Git for Windows installer
(%LOCALAPPDATA%\Programs\Git) or a Scoop install
(%USERPROFILE%\scoop\apps\git\current\usr\bin) -- run-hook.cmd
silently fails: WSL prints "Windows Subsystem for Linux must be
updated", the script returns 0, and Superpowers' SessionStart
bootstrap is never injected. From the user's perspective skills
auto-trigger inconsistently or not at all, with no surfaced error.

Add explicit probes for both locations between the existing system-
wide Git for Windows checks and the `where bash` fallback. Also add
a comment to the fallback documenting the WSL-launcher trap so future
maintainers understand why the explicit probes must come first.

Verified on a Windows 11 VM (dockur/windows 11, Git Bash 2.x, Node
22):
- System Git present: existing probe still matches (no regression)
- System Git absent, per-user Git present via junction: new probe
  matches, hook produces valid 6422-byte JSON, exit 0
- All Git probes absent: confirmed WSL trap fires
  ("Windows Subsystem for Linux must be updated") and the hook exits 0
  silently, demonstrating the original bug

Existing tests/hooks/test-session-start.sh still passes on macOS (7/7).

Reported by @ytchenak in #1607.

Co-authored-by: ytchenak <ytchenak@users.noreply.github.com>
Closes #1607.
2026-05-23 16:58:56 -07:00
Jesse Vincent
f36bad5b78 Pipe SessionStart hook printf through cat to absorb EPIPE on Windows
On Windows + Git Bash, the SessionStart hook prints a confusing
diagnostic at every startup ("printf: write error: Permission denied")
when Claude Code closes the hook's stdout pipe before the printf has
finished writing. The hook still runs to completion and context still
gets injected, but the diagnostic surfaces every session because
Git Bash's printf reports EPIPE as "Permission denied" (not "Broken
pipe" like Linux) and our `set -euo pipefail` lets that error escape.

Piping each printf through `cat` makes the external cat process the
recipient of any SIGPIPE / EPIPE. cat's failure does not propagate to
the parent bash under pipefail because cat is the last command in the
pipeline and exits cleanly when the pipe stays open long enough to
hold the data. On macOS/Linux the cat passthrough is transparent (no
behavior change, no measurable cost).

Verified:
- Existing tests/hooks/test-session-start.sh: 7/7 pass on macOS
- Manual run on Windows 11 + Git Bash 5.2 + Node 22 produces valid JSON,
  clean stderr, and exit 0
- JSON output is byte-identical to the unpatched hook

Reported by @silvertakana in #1612, attribution preserved in the
Co-authored-by trailer below — this is the same fix shape the original
PR proposed.

Co-authored-by: silvertakana <silvertakana@users.noreply.github.com>
Closes #1612.
2026-05-23 16:55:46 -07:00
Nick Galatis
21ad401e90 fix(systematic-debugging): defuse Claude Code ultrathink keyword scanner trigger (#1558)
The "Signals You're Doing It Wrong" bullet in systematic-debugging/SKILL.md
contains the literal token Claude Code's runtime scans for in tool result
bodies. Every Skill-tool invocation of this skill caused the harness to
inject a spurious system-reminder claiming the user requested deeper
reasoning, silently bumping every session into extended thinking.

Replace the bullet's spelling so the contiguous letter sequence the scanner
matches is broken with a hyphen. The signal text remains recognizable to
the agent and the documented action ("Question fundamentals, not just
symptoms") is unchanged.

Fixes obra/superpowers#1283
2026-05-23 16:51:00 -07:00
Jesse Vincent
e9f5188289 Make visual-companion.md script paths skill-rooted, not plugin-rooted
Issue #1134: agents reading visual-companion.md see bare commands like
`scripts/start-server.sh`, correctly identify the plugin install
directory, then look for `<plugin>/scripts/start-server.sh` instead of
`<plugin>/skills/brainstorming/scripts/start-server.sh`. The file
doesn't exist at the plugin-rooted path, so the agent concludes the
visual companion isn't available and falls back to text-only
brainstorming.

Multiple independent reproductions in the issue thread, plus one user's
agent self-reported: "I assumed the scripts folder was in the root
directory of the plugin, it didn't realize it could have been talking
about the skill folder itself."

Change all `scripts/<file>` references in visual-companion.md to
`skills/brainstorming/scripts/<file>`. Agents that correctly identify
the plugin root will now join to the right path.

Closes #1134.
2026-05-23 16:42:13 -07:00
Jesse Vincent
eef50b96f0 Align windows-lifecycle test with current brainstorm server layout
The test had drifted behind three server implementation changes and no
longer ran against the actual server:

- Server entrypoint renamed from server.js to server.cjs; the test still
  invoked node on server.js and failed with MODULE_NOT_FOUND.
- Server state moved to a state/ subdirectory (state/server-info,
  state/server.pid); the test still waited on .server-info and wrote
  .server.pid at the session root.
- Owner-PID startup validation now keeps the server running when the
  owner PID is dead at startup: it logs owner-pid-invalid, disables
  owner monitoring, and falls back to the idle timeout. The test still
  expected the server to self-terminate within 60s of a dead-at-startup
  owner.

Update file/path references to match the current server, and rewrite
the dead-at-startup test to assert the current behavior: server
survives, log contains owner-pid-invalid, log does not contain a
spurious "owner process exited" line.

Verified locally: 9 passed, 0 failed, 3 skipped (Windows-only).
2026-05-23 16:36:45 -07:00
Jesse Vincent
e1d3f71e0d Convert curly to square brackets in code-reviewer.md placeholders
Matches the style used by the spec-reviewer-prompt.md and
code-quality-reviewer-prompt.md call sites, which already use square
brackets ([VAR] or [VAR — description]). No semantic change — these
placeholders are filled in by the controller; nothing programmatic
substitutes them.
2026-05-23 16:14:24 -07:00
Jesse Vincent
b2212dc913 Scope spec reviewer to task diff and make reviewers read-only
Two problems with the SDD reviewer prompts on dev:

- spec-reviewer-prompt.md never received a git range, so the
  general-purpose subagent had to crawl the entire codebase to find what
  changed. Reporter measured 20-33 minute spec reviews on simple tasks
  (#1538).
- Neither reviewer prompt told the subagent that review is read-only.
  A spec reviewer running `git checkout <parent-sha>` for historical
  comparison silently detached HEAD on the controller's branch, then
  subsequent task commits accumulated on the detached HEAD and were
  effectively orphaned (#1543, reproduced independently in #1543's
  thread).

Add a Git Range to Review section to spec-reviewer-prompt.md that
mirrors the one code-reviewer.md already has, plus a Read-Only Review
section in both reviewer prompt templates stating the principle: do
not mutate the working tree, the index, HEAD, or branch state. Allow
inspecting other revisions via a separate temporary worktree, so the
read-only rule does not block legitimate historical comparison.

Closes #1538.
Closes #1543.
2026-05-23 16:14:05 -07:00
Jesse Vincent
180f009090 @mhat reported that his claude got confused about 'debugging' being named as a skill in the bootstrap 2026-05-21 17:23:25 -04:00
Drew Ritter
8c1f7c5dae Bump superpowers-evals submodule 2026-05-14 16:32:24 -07:00
Drew Ritter
201f945838 [codex] support native Codex plugin hooks (#1540)
* docs: specify Codex native hooks parity

* docs: refine Codex hooks spec after review

* docs: record Codex hook contract spike

* docs: plan Codex native hooks implementation

* feat: support Codex native plugin hooks

* test: add Codex native hook drill coverage

* Simplify Codex hook entrypoint
2026-05-14 15:59:38 -07:00
Drew Ritter
49bf5ad6dc Align Pi mapping with action vocabulary 2026-05-13 17:58:46 -07:00
Drew Ritter
4bd0973879 Bump evals submodule for Pi backend 2026-05-13 17:58:46 -07:00
Jesse Vincent
452f1ed40b chore: keep pi extension under .pi 2026-05-13 17:58:46 -07:00
Jesse Vincent
cafbc5a4bd feat: add pi superpowers package extension 2026-05-13 17:58:46 -07:00