* docs: add Codex App compatibility design spec (PRI-823)
Design for making using-git-worktrees, finishing-a-development-branch,
and subagent-driven-development skills work in the Codex App's sandboxed
worktree environment. Read-only environment detection via git-dir vs
git-common-dir comparison, ~48 lines across 4 files, zero breaking changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: address spec review feedback for PRI-823
Fix three Important issues from spec review:
- Clarify Step 1.5 placement relative to existing Steps 2/3
- Re-derive environment state at cleanup time instead of relying on
earlier skill output
- Acknowledge pre-existing Step 5 cleanup inconsistency
Also: precise step references, exact codex-tools.md content, clearer
Integration section update instructions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: address team review feedback for PRI-823 spec
- Add commit SHA + data loss warning to handoff payload (HIGH)
- Add explicit commit step before handoff (HIGH)
- Remove misleading "mark as externally managed" from Path B
- Add executing-plans 1-line edit (was missing)
- Add branch name derivation rules
- Add conditional UI language for non-App environments
- Add sandbox fallback for permission errors
- Add STOP directive after Step 0 reporting
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: clarify executing-plans in What Does NOT Change section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add cleanup guard test (#5) and sandbox fallback test (#10) to spec
Both tests address real risk scenarios:
- #5: cleanup guard bug would delete Codex App's own worktree (data loss)
- #10: Local thread sandbox fallback needs manual Codex App validation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add implementation plan for Codex App compatibility (PRI-823)
8 tasks covering: environment detection in using-git-worktrees,
Step 1.5 + cleanup guard in finishing-a-development-branch,
Integration line updates, codex-tools.md docs, automated tests,
and final verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(codex-tools): add named agent dispatch mapping for Codex (#647)
* fix(writing-skills): correct false 'only two fields' frontmatter claim (#882)
* Replace subagent review loops with lightweight inline self-review
The subagent review loop (dispatching a fresh agent to review plans/specs)
doubled execution time (~25 min overhead) without measurably improving plan
quality. Regression testing across 5 versions (v3.6.0 through v5.0.4) with
5 trials each showed identical plan sizes, task counts, and quality scores
regardless of whether the review loop ran.
Changes:
- writing-plans: Replace subagent Plan Review Loop with inline Self-Review
checklist (spec coverage, placeholder scan, type consistency)
- writing-plans: Add explicit "No Placeholders" section listing plan failures
(TBD, vague descriptions, undefined references, "similar to Task N")
- brainstorming: Replace subagent Spec Review Loop with inline Spec Self-Review
(placeholder scan, internal consistency, scope check, ambiguity check)
- Both skills now use "look at it with fresh eyes" framing
Testing: 5 trials with the new skill show self-review catches 3-5 real bugs
per run (spawn positions, API mismatches, seed bugs, grid indexing) in ~30s
instead of ~25 min. Remaining defects are comparable to the subagent approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Revert "Replace subagent review loops with lightweight inline self-review"
This reverts commit bf8f7572eb.
* Reapply "Replace subagent review loops with lightweight inline self-review"
This reverts commit b045fa3950.
* Add v5.0.6 release notes
* Move brainstorm server metadata to .meta/ subdirectory
Metadata files (.server-info, .events, .server.pid, .server.log,
.server-stopped) were stored in the same directory served over HTTP,
making them accessible via the /files/ route. They now live in a .meta/
subdirectory that is not web-accessible.
Also fixes a stale test assertion ("Waiting for Claude" → "Waiting for
the agent").
Reported-By: 吉田仁
* Revert "Move brainstorm server metadata to .meta/ subdirectory"
This reverts commit ab500dade6.
* Separate brainstorm server content and state into peer directories
The session directory now contains two peers: content/ (HTML served to
the browser) and state/ (events, server-info, pid, log). Previously
all files shared a single directory, making server state and user
interaction data accessible over the /files/ HTTP route.
Also fixes stale test assertion ("Waiting for Claude" → "Waiting for
the agent").
Reported-By: 吉田仁
* Fix owner-PID false positive when owner runs as different user
ownerAlive() treated EPERM (permission denied) the same as ESRCH
(process not found), causing the server to self-terminate within 60s
whenever the owner process ran as a different user. This affected WSL
(owner is a Windows process), Tailscale SSH, and any cross-user
scenario.
The fix: `return e.code === 'EPERM'` — if we get permission denied,
the process is alive; we just can't signal it.
Tested on Linux via Tailscale SSH with a root-owned grandparent PID:
- Server survives past the 60s lifecycle check (EPERM = alive)
- Server still shuts down when owner genuinely dies (ESRCH = dead)
Fixes#879
* Fix owner-PID lifecycle monitoring for cross-platform reliability
Two bugs caused the brainstorm server to self-terminate within 60s:
1. ownerAlive() treated EPERM (permission denied) as "process dead".
When the owner PID belongs to a different user (Tailscale SSH,
system daemons), process.kill(pid, 0) throws EPERM — but the
process IS alive. Fixed: return e.code === 'EPERM'.
2. On WSL, the grandparent PID resolves to a short-lived subprocess
that exits before the first 60s lifecycle check. The PID is
genuinely dead (ESRCH), so the EPERM fix alone doesn't help.
Fixed: validate the owner PID at server startup — if it's already
dead, it was a bad resolution, so disable monitoring and rely on
the 30-minute idle timeout.
This also removes the Windows/MSYS2-specific OWNER_PID="" carve-out
from start-server.sh, since the server now handles invalid PIDs
generically at startup regardless of platform.
Tested on Linux (magic-kingdom) via Tailscale SSH:
- Root-owned owner PID (EPERM): server survives ✓
- Dead owner PID at startup (WSL sim): monitoring disabled, survives ✓
- Valid owner that dies: server shuts down within 60s ✓
Fixes#879
* Release v5.0.6: inline self-review, brainstorm server restructure, owner-PID fixes
* fix: add Copilot CLI platform detection for sessionStart context injection
Copilot CLI v1.0.11 reads `additionalContext` from sessionStart hook
output, but the session-start script only emits the Claude Code-specific
nested format. Add COPILOT_CLI env var detection so Copilot CLI gets the
SDK-standard top-level `additionalContext` while Claude Code continues
getting `hookSpecificOutput`.
Based on PR #910 by @culinablaz.
* feat: add Copilot CLI tool mapping, docs, and install instructions
- Add references/copilot-tools.md with full tool equivalence table
- Add Copilot CLI to using-superpowers skill platform instructions
- Add marketplace install instructions to README
- Add changelog entry crediting @culinablaz for the hook fix
* fix(opencode): align skills path across bootstrap, runtime, and tests
The bootstrap text advertised a configDir-based skills path that didn't
match the runtime path (resolved relative to the plugin file). Tests
used yet another hardcoded path and referenced a nonexistent lib/ dir.
- Remove misleading skills path from bootstrap text; the agent should
use the native skill tool, not read files by path
- Fix test setup to create a consistent layout matching the plugin's
../../skills resolution
- Export SUPERPOWERS_SKILLS_DIR from setup.sh so tests use a single
source of truth
- Add regression test that bootstrap doesn't advertise the old path
- Remove broken cp of nonexistent lib/ directory
Fixes#847
* docs: add OpenCode path fix to release notes
* fix(opencode): inject bootstrap as user message instead of system message
Move bootstrap injection from experimental.chat.system.transform to
experimental.chat.messages.transform, prepending to the first user
message instead of adding a system message.
This avoids two issues:
- System messages repeated every turn inflate token usage (#750)
- Multiple system messages break Qwen and other models (#894)
Tested on OpenCode 1.3.2 with Claude Sonnet 4.5 — brainstorming skill
fires correctly on "Let's make a React to do list" prompt.
* docs: update release notes with OpenCode bootstrap change
* docs: add worktree rototill design spec (PRI-974)
Design for detect-and-defer worktree support. Superpowers defers to
native harness worktree systems when available, falls back to manual
git worktree creation when not. Covers Phases 0-2: detection, consent,
native tool preference, finishing state detection, and three bug fixes
(#940, #999, #238).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: address SWE review feedback on worktree rototill spec
- Fix Bug #999 order: merge → verify → remove worktree → delete branch
(avoids losing work if merge fails after worktree removal)
- Add submodule guard to Step 0 detection (GIT_DIR != GIT_COMMON is also
true in submodules)
- Preserve global path (~/.config/superpowers/worktrees/) in detection for
backward compatibility, just stop offering it to new users
- Add step numbering note and implementation notes section
- Expand provenance heuristic to cover global path and manual creation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: honest spec revisions after issue/PR deep dive
- Step 1a is the load-bearing assumption, not just a risk — if it fails,
the entire design needs rework. TDD validation must be first impl task.
- #1009 resolution depends on Step 1a working, stated explicitly
- #574 honestly deferred, not "partially addressed"
- Add hooks symlink to Step 1b (PR #965 idea, prevents silent hook loss)
- Add stale worktree pruning to Step 5 (PR #1072 idea, one-line self-heal)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add worktree rototill implementation plan (PRI-974)
5 tasks: TDD gate for Step 1a, using-git-worktrees rewrite,
finishing-a-development-branch rewrite, integration updates,
end-to-end validation. Task 1 is a hard gate — if native tool
preference fails RED/GREEN, stop and redesign.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add RED/GREEN validation for native worktree preference (PRI-974)
Gate test for Step 1a — validates agents prefer EnterWorktree over
git worktree add on Claude Code. Must pass before skill rewrite.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: rewrite using-git-worktrees with detect-and-defer (PRI-974)
Step 0: GIT_DIR != GIT_COMMON detection (skip if already isolated)
Step 0 consent: opt-in prompt before creating worktree (#991)
Step 1a: native tool preference (short, first, declarative)
Step 1b: git worktree fallback with hooks symlink and legacy path compat
Submodule guard prevents false detection
Platform-neutral instruction file references (#1049)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: rewrite finishing-a-development-branch with detect-and-defer (PRI-974)
Step 2: environment detection (GIT_DIR != GIT_COMMON) before presenting menu
Detached HEAD: reduced 3-option menu (no merge from detached HEAD)
Provenance-based cleanup: .worktrees/ = ours, anything else = hands off
Bug #940: Option 2 no longer cleans up worktree
Bug #999: merge -> verify -> remove worktree -> delete branch
Bug #238: cd to main repo root before git worktree remove
Stale worktree pruning after removal (git worktree prune)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address spec review findings in both skill rewrites (PRI-974)
using-git-worktrees: submodule guard now says "treat as normal repo"
instead of "proceed to Step 1" (preserves consent flow)
using-git-worktrees: directory priority summaries include global legacy
finishing-a-development-branch: move git branch -d after Step 6 cleanup
to make Bug #999 ordering unambiguous (merge -> worktree remove -> branch delete)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: update worktree integration references across skills (PRI-974)
Remove REQUIRED language from executing-plans and subagent-driven-development.
Consent and detection now live inside using-git-worktrees itself.
Fix stale 'created by brainstorming' claim in writing-plans.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: include worktrees/ (non-hidden) in finishing provenance check (PRI-974)
The creation skill supports both .worktrees/ and worktrees/ directories,
but the finishing skill's cleanup only checked .worktrees/. Worktrees
under the non-hidden path would be orphaned on merge or discard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Step 1a validated through TDD — explicit naming + consent bridge (PRI-974)
Step 1a failed at 2/6 with the spec's original abstract text ("use your
native tool"). Three REFACTOR iterations found what works (50/50 runs):
1. Explicit tool naming — "do you have EnterWorktree, WorktreeCreate..."
transforms interpretation into factual toolkit check
2. Consent bridge — "user's consent is your authorization" directly
addresses EnterWorktree's "ONLY when user explicitly asks" guardrail
3. Red Flag entry naming the specific anti-pattern
File split was tested but proven unnecessary — the fix is the Step 1a
text quality, not physical separation of git commands. Control test
with full 240-line skill (all git commands visible) passed 20/20.
Test script updated: supports batch runs (./test.sh green 20), "all"
phase, and checks absence of git worktree add (reliable signal) rather
than presence of EnterWorktree text (agent sometimes omits tool name).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update spec with TDD findings on Step 1a (PRI-974)
Step 1a's original "deliberately short, abstract" design was disproven
by TDD (2/6 pass rate). Spec now documents the validated approach:
explicit tool naming + consent bridge + red flag (50/50 pass rate).
- Design Principles: updated to reflect explicit naming over abstraction
- Step 1a: replaced abstract text with validated approach, added design
note explaining the TDD revision and why file splitting was unnecessary
- Risks: Step 1a risk marked RESOLVED with cross-platform validation table
and residual risk note about upstream tool description dependency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: honest cross-platform validation table in spec (PRI-974)
Research confirmed Claude Code is currently the only harness with an
agent-callable mid-session worktree tool. All others either create
worktrees before the agent starts (Codex App, Gemini, Cursor) or have
no native support (Codex CLI, OpenCode).
Table now shows: what was actually tested (Claude Code 50/50, Codex CLI
6/6), what was simulated (Codex App 1/1), and what's untested (Gemini,
Cursor, OpenCode). Step 1a is forward-compatible for when other
harnesses add agent-callable tools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: cross-platform validation on 5 harnesses (PRI-974)
Tested on Gemini CLI (gemini -p) and Cursor Agent (cursor-agent -p):
- Gemini: Step 0 detection 1/1, Step 1b fallback 1/1
- Cursor: Step 0 detection 1/1, Step 1b fallback 1/1
Both correctly identified no native agent-callable worktree tool,
fell through to git worktree add, and performed safety verification.
Both correctly detected existing worktrees and skipped creation.
5 of 6 harnesses now tested. Only OpenCode untested (no CLI access).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove incorrect hooks symlink step from worktree skill
Git worktrees inherit hooks from the main repo automatically via
$GIT_COMMON_DIR — this has been the case since git 2.5 (2015).
The symlink step was based on an incorrect premise from PR #965
and also fails in practice (.git is a file in worktrees, not a dir).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: address PR #1121 review — respect user preference, drop y/n
- Consent prompt: drop "(y/n)" and add escape valve for users who
have already declared their worktree preference in global or
project agent instruction files.
- Directory selection: reorder to put declared user preference
ahead of observed filesystem state, and reframe the default as
"if no other guidance available".
- Sandbox fallback: require explicitly informing the user that
the sandbox blocked creation, not just "report accordingly".
- writing-plans: fully qualify the superpowers:using-git-worktrees
reference.
- Plan doc: mirror the consent-prompt change.
Step 1a native-tool framing and the helper-scripts suggestion are
still outstanding — the first needs a benchmark re-run before softer
phrasing can be adopted without regressing compliance; the second is
exploratory and will get a thread reply.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: soften Step 1a native-tool framing per PR #1121 review
Address obra's comment on explicit step numbers / prescriptive tone.
Drops "STOP HERE if available", the "If YES:" gate, and the "even if /
even if / NO EXCEPTIONS" reinforcement paragraph. Keeps the specific
tool-name anchors (EnterWorktree, WorktreeCreate, /worktree, --worktree),
which the original TDD data showed are load-bearing.
A/B verified against drill harness on the 3 creation/consent scenarios
(consent-flow, creation-from-main, creation-from-main-spec-aware):
baseline explicit wording scored 12/12 criteria, softened wording also
scored 12/12. The "agent used the most appropriate tool" criterion
passed in all 3 softened runs — agents still picked EnterWorktree via
ToolSearch without the imperative framing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: drop instruction file enumeration per PR #1121 review
Jesse flagged that the verbose CLAUDE.md/AGENTS.md/GEMINI.md/.cursorrules
enumeration (a) chews tokens, (b) confuses models that anchor on exact
strings, and (c) is repeated DRY-violatingly across 3+ locations.
Replace with abstract "your instructions" framing in four spots:
- skills/using-git-worktrees/SKILL.md Step 0 → Step 1 transition
- skills/using-git-worktrees/SKILL.md Step 1b Directory Selection
- docs/superpowers/plans/2026-04-06-worktree-rototill.md (both mirror locations)
Same intent, harness-agnostic phrasing, ~half the tokens.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace hardcoded /Users/jesse with generic placeholders (#858)
* Remove the deprecated legacy slash commands (#1188)
* fix: prevent subagent-driven-development from pausing every 3 tasks
requesting-code-review had "review after each batch (3 tasks)" for
executing-plans, which leaked into subagent-driven-development as a
check-in cadence. Replaced with flexible "each task or at natural
checkpoints" and added explicit continuous execution directive to
subagent-driven-development.
* Remove Integration sections from skills
These sections don't help with steering and are a legacy of the time
before agents had native skills systems.
* fix(opencode): cache bootstrap content at module level to eliminate per-step file I/O
getBootstrapContent() called fs.existsSync + fs.readFileSync + regex
frontmatter parsing on every agent step with zero caching. The
experimental.chat.messages.transform hook fires every step in opencode's
agent loop (messages are reloaded from DB each step via
filterCompactedEffect). A 10-step turn triggered 10 redundant file
reads + 10 regex parses for content that never changes during a session.
Changes:
- Add module-level _bootstrapCache (undefined = not loaded, null = file
missing) so the first call reads and parses SKILL.md, all subsequent
calls return the cached string with zero filesystem access
- Cache the null sentinel when SKILL.md is missing, preventing repeated
fs.existsSync probes
- Add _testing export (resetCache/getCache) for test infrastructure
- Clarify the injection guard comment explaining how it interacts with
opencode's per-step message reloading
- Add 15 regression tests covering cache behavior, fs call counts,
injection guard, missing file sentinel, cache reset, and source audit
Fixes#1202
* test(opencode): simplify bootstrap cache coverage
* docs: clarify opencode install caveats
* test(opencode): modernize integration tests
* docs: add Factory Droid installation instructions
* Preserve Codex marketplace metadata
* docs: add README quickstart install links (#1293)
* docs(codex-tools): fix subagent wait mapping to wait_agent
Update the Codex tool mapping so Claude Code 'Task returns result' maps to the current Codex spawned-agent result tool, wait_agent. Also clarify that older Codex builds exposed spawned-agent waiting as wait, while current bare wait is the code-mode exec/wait surface for yielded exec cells.
Verified with Drill:
- codex-tool-mapping-comprehension fails against dev with task_returns_result=wait
- codex-tool-mapping-comprehension passes against this PR with task_returns_result=wait_agent and exec/wait scoped correctly
- codex-subagent-wait-mapping passes against this PR with spawn_agent -> wait_agent -> close_agent and PR963_OK returned
* fix(cursor): run SessionStart hook via run-hook.cmd on Windows
Route Cursor's Windows SessionStart hook through the existing run-hook.cmd dispatcher instead of invoking the extensionless session-start script directly. This avoids Windows opening the extensionless hook file and lets Git Bash run the script as intended.
Also removed an accidental UTF-8 BOM from hooks-cursor.json before merging.
Verified:
- hooks-cursor.json parses as JSON and has no BOM
- command is ./hooks/run-hook.cmd session-start
- CURSOR_PLUGIN_ROOT=/tmp/superpowers ./hooks/run-hook.cmd session-start emits valid Cursor JSON with additional_context
* fix(tests): make SDD integration test actually run its assertions
The SDD integration test silently bailed before printing any verification
results. Three independent bugs caused this:
1. `WORKING_DIR_ESCAPED` was computed from `$SCRIPT_DIR/../..` without
resolving `..` segments. The resulting "directory" name contained
literal `..` so `find` was looking in a path that doesn't exist.
2. With `set -euo pipefail`, the `find ... | sort -r | head -1` pipeline
could exit non-zero (SIGPIPE on the producer when head closes early),
killing the script silently before assertions ran.
3. The `claude -p` invocation never passed `--plugin-dir`, so it loaded
the installed plugin instead of the working tree. Local edits to
skills under test were not actually being tested.
Other adjustments:
- Run claude from inside the unique TEST_PROJECT directory instead of
from the plugin root, so its session JSONL lives in its own
`~/.claude/projects/` folder and doesn't race other concurrent
claude sessions for "most recent file".
- Use the same character-normalization claude does (every non-alphanumeric
becomes `-`) when computing the session dir name; macOS-resolved
`/private/var/...` paths and tmp dirs with `.`/`_` in their names need
this to round-trip correctly.
- Accept either `"name":"Agent"` or `"name":"Task"` in the subagent count
— the harness renamed the tool but the test wasn't updated.
Verified on this branch: all six verification tests now pass against a
real end-to-end SDD run (skill invoked, 7 subagents dispatched, 6
TodoWrite calls, working code produced, tests pass, no extra features).
* feat: add Gemini CLI subagent support mapping
Map Gemini Task dispatch to @agent-name/@generalist and document parallel subagent dispatch for independent tasks.
* docs: update Codex plugin install guidance (#1288)
* Lift superpowers:code-reviewer agent into the requesting-code-review skill
The plugin had a single named agent (`agents/code-reviewer.md`) used by
two skills, while every other reviewer/implementer subagent in the repo
is dispatched as `general-purpose` with the prompt template living
alongside its skill. That asymmetry had no upside and several costs:
- Two sources of truth for the code review checklist (the agent file
and `requesting-code-review/code-reviewer.md`), both drifting
independently.
- `Codex` users could not use the named agent directly; the codex-tools
reference doc had a workaround section explaining how to flatten the
named agent into a `worker` dispatch.
- No third-party reliance on `superpowers:code-reviewer` inside this
repo.
Changes:
- Merge `agents/code-reviewer.md` (persona + checklist) and
`skills/requesting-code-review/code-reviewer.md` (placeholder
template) into a single self-contained Task-dispatch template,
matching the shape of `implementer-prompt.md`,
`spec-reviewer-prompt.md`, etc.
- Update `skills/requesting-code-review/SKILL.md` and
`skills/subagent-driven-development/code-quality-reviewer-prompt.md`
to dispatch `Task (general-purpose)` instead of the named agent.
- Drop the now-obsolete "Named agent dispatch" workaround sections from
`codex-tools.md` and `copilot-tools.md` — superpowers no longer ships
any named agents, so those instructions documented nothing.
- Delete `agents/code-reviewer.md` and the empty `agents/` directory.
Tier 3 coverage for the change: a new behavioral test
`tests/claude-code/test-requesting-code-review.sh` plants real bugs
(SQL injection, plaintext password handling, credential logging) into
a tiny project, runs the actual `requesting-code-review` skill against
the working tree, and asserts the dispatched reviewer flags every
planted issue at Critical/Important severity and refuses to approve
the diff.
Verified end-to-end on this branch:
- The new test passes (5/5 assertions; reviewer caught all planted
bugs and several others).
- The existing SDD integration test still passes (7/7 subagents
dispatched, all as `general-purpose`; spec compliance still
rejects extra features; produced code is correct).
- Session JSONLs confirm zero remaining `superpowers:code-reviewer`
dispatches anywhere in the SDD pipeline.
* Prepare v5.1.0: release notes and version bump
Add v5.1.0 release notes covering:
- Removals: legacy slash commands (/brainstorm, /execute-plan,
/write-plan), skill Integration sections
- Worktree skills rewrite (PRI-974, PR #1121)
- Contributor guidelines for AI agents
- Codex plugin mirror tooling (PR #1165)
- OpenCode bootstrap caching (#1202)
- SDD pause-every-3-tasks fix; SDD integration test fixes
- Cursor Windows hook routing
- Gemini CLI subagent dispatch mapping
- Skill terminology cleanups
- Install docs (Factory Droid, Codex, quickstart links)
Bumps version 5.0.7 -> 5.1.0 across all declared files via
scripts/bump-version.sh; not yet tagged or released.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Drew Ritter <drewritter@workerbee.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Drew Ritter <drew@primeradiant.com>
Co-authored-by: Blaž Čulina <culina.blaz@nsoft.com>
Co-authored-by: Jesse Vincent <jesse@primeradiant.com>
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: Richard Luo <luo.richard@gmail.com>
Co-authored-by: Drew Ritter <drew@ritter.dev>
Co-authored-by: leonsong09 <59187950+leonsong09@users.noreply.github.com>
Co-authored-by: YuXiang Hong <41331696+starumiQAQ@users.noreply.github.com>
Co-authored-by: Sathvik Gilakamsetty <spacetime1007@gmail.com>
Most new-harness PRs ship integrations that copy skill files or wrap
with `npx skills` instead of loading the using-superpowers bootstrap at
session start. Those integrations look like they work but skills never
auto-trigger.
Add an acceptance test ("Let's make a react todo list" must auto-trigger
brainstorming in a clean session) and require the transcript in the PR.
- commit .codex-plugin/plugin.json and brand assets in this repo
- sync tracked Codex plugin files instead of generating or seeding them
- honor upstream gitignored files during rsync
- cover the new sync behavior with regression tests
Codex plugin pages use interface.defaultPrompt to show suggested
prompts on the plugin's app card; the generator now emits two
domain-neutral seed prompts so the superpowers listing isn't empty.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rsync exclude patterns without a leading "/" match any directory of
the given name at any depth. The previous "scripts/" pattern was
meant to exclude upstream's top-level scripts/ dir (which contains
sync-to-codex-plugin.sh itself, bump-version.sh, etc.) but also
incorrectly excluded skills/brainstorming/scripts/ — a legitimate
skill-adjacent dir with 5 files (frame-template.html, helper.js,
server.cjs, start-server.sh, stop-server.sh).
Found during a determinism check: comparing the hand-crafted
add-superpowers-plugin bootstrap PR against an automated bootstrap
PR produced a diff showing those 5 files were missing from the
automated version.
Fix: anchor every top-level-only exclude with a leading "/".
.DS_Store stays unanchored because Finder creates them anywhere.
This also prevents future drift if anyone adds a tests/, hooks/,
docs/, lib/, etc. subdir inside a skill.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two coupled changes:
1. Add assets/ to EXCLUDES. A normal sync run was deleting
plugins/superpowers/assets/ via --delete because the corresponding
directory doesn't exist upstream. Confirmed via dry-run that the
previous version would wipe both brand asset files on next sync.
2. Add --bootstrap and --assets-src flags to support creating the
initial plugin PR from scratch. Bootstrap mode skips the
"plugin must exist on base" preflight, creates the plugin
directory, rsyncs upstream content, then copies
PrimeRadiant_Favicon.{svg,png} from --assets-src into
plugins/superpowers/assets/ as superpowers-small.svg and
app-icon.png. Run once by one team member to open the initial
PR; every subsequent run is a normal (non-bootstrap) sync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The live .codex-plugin/plugin.json in the downstream fork was cleaned up
(websiteURL, privacyPolicyURL, termsOfServiceURL, and defaultPrompt
removed) and icon fields were added (composerIcon, logo pointing at
assets/superpowers-small.svg and assets/app-icon.png). Update the
heredoc to produce the same shape so future sync runs don't wipe the
icon fields or reintroduce the removed URL fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove CODE_OF_CONDUCT.md from EXCLUDES so it syncs from upstream
(per PR #1165 review feedback on the exclude list)
- Remove the agents/openai.yaml overlay generator and its exclude entry
— the file duplicates fields already in .codex-plugin/plugin.json and
only 6 of 28 upstream plugins ship one, so we match the 22-plugin
majority shape
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous version was a local rsync helper that required a hand-maintained
destination path. This rewrite makes it path/user-agnostic and gives every team
member the same flow:
- Clones prime-radiant-inc/openai-codex-plugins fresh into a temp dir per run
(trap EXIT cleans up)
- Auto-detects upstream from the script's own location
- Preflight: rsync, git, gh auth, python3, upstream package.json
- Reads upstream version from package.json and bakes it into the regenerated
.codex-plugin/plugin.json, so version bumps flow through
- Regenerates both overlay files (.codex-plugin/plugin.json and
agents/openai.yaml) inline via heredoc — single source of truth
- Pushes a sync/superpowers-<sha>-<UTC-timestamp> branch and opens a PR via
gh pr create; prints PR URL and /files diff URL on completion
- --dry-run, --yes, --base BRANCH, --local PATH flags for all the usual modes
- Deterministic: two runs against the same upstream SHA produce PRs with
identical diffs, so the tool itself can be sanity-checked by running twice
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Speak directly to AI agents at the top of CLAUDE.md: reframe slop
PRs as harmful to their human partner, give a concrete pre-submission
checklist, and explicitly authorize pushing back on vague instructions.
CLAUDE.md (symlinked to AGENTS.md) covers every major rejection
pattern from auditing the last 100 closed PRs (94% rejection rate):
AI slop, ignored PR template, duplicates, speculative fixes, domain-
specific skills, fork confusion, fabricated content, bundled changes,
and misunderstanding project philosophy.
Move bootstrap injection from experimental.chat.system.transform to
experimental.chat.messages.transform, prepending to the first user
message instead of adding a system message.
This avoids two issues:
- System messages repeated every turn inflate token usage (#750)
- Multiple system messages break Qwen and other models (#894)
Tested on OpenCode 1.3.2 with Claude Sonnet 4.5 — brainstorming skill
fires correctly on "Let's make a React to do list" prompt.
The bootstrap text advertised a configDir-based skills path that didn't
match the runtime path (resolved relative to the plugin file). Tests
used yet another hardcoded path and referenced a nonexistent lib/ dir.
- Remove misleading skills path from bootstrap text; the agent should
use the native skill tool, not read files by path
- Fix test setup to create a consistent layout matching the plugin's
../../skills resolution
- Export SUPERPOWERS_SKILLS_DIR from setup.sh so tests use a single
source of truth
- Add regression test that bootstrap doesn't advertise the old path
- Remove broken cp of nonexistent lib/ directory
Fixes#847
Copilot CLI v1.0.11 reads `additionalContext` from sessionStart hook
output, but the session-start script only emits the Claude Code-specific
nested format. Add COPILOT_CLI env var detection so Copilot CLI gets the
SDK-standard top-level `additionalContext` while Claude Code continues
getting `hookSpecificOutput`.
Based on PR #910 by @culinablaz.
Two bugs caused the brainstorm server to self-terminate within 60s:
1. ownerAlive() treated EPERM (permission denied) as "process dead".
When the owner PID belongs to a different user (Tailscale SSH,
system daemons), process.kill(pid, 0) throws EPERM — but the
process IS alive. Fixed: return e.code === 'EPERM'.
2. On WSL, the grandparent PID resolves to a short-lived subprocess
that exits before the first 60s lifecycle check. The PID is
genuinely dead (ESRCH), so the EPERM fix alone doesn't help.
Fixed: validate the owner PID at server startup — if it's already
dead, it was a bad resolution, so disable monitoring and rely on
the 30-minute idle timeout.
This also removes the Windows/MSYS2-specific OWNER_PID="" carve-out
from start-server.sh, since the server now handles invalid PIDs
generically at startup regardless of platform.
Tested on Linux (magic-kingdom) via Tailscale SSH:
- Root-owned owner PID (EPERM): server survives ✓
- Dead owner PID at startup (WSL sim): monitoring disabled, survives ✓
- Valid owner that dies: server shuts down within 60s ✓
Fixes#879
ownerAlive() treated EPERM (permission denied) the same as ESRCH
(process not found), causing the server to self-terminate within 60s
whenever the owner process ran as a different user. This affected WSL
(owner is a Windows process), Tailscale SSH, and any cross-user
scenario.
The fix: `return e.code === 'EPERM'` — if we get permission denied,
the process is alive; we just can't signal it.
Tested on Linux via Tailscale SSH with a root-owned grandparent PID:
- Server survives past the 60s lifecycle check (EPERM = alive)
- Server still shuts down when owner genuinely dies (ESRCH = dead)
Fixes#879
The session directory now contains two peers: content/ (HTML served to
the browser) and state/ (events, server-info, pid, log). Previously
all files shared a single directory, making server state and user
interaction data accessible over the /files/ HTTP route.
Also fixes stale test assertion ("Waiting for Claude" → "Waiting for
the agent").
Reported-By: 吉田仁
Metadata files (.server-info, .events, .server.pid, .server.log,
.server-stopped) were stored in the same directory served over HTTP,
making them accessible via the /files/ route. They now live in a .meta/
subdirectory that is not web-accessible.
Also fixes a stale test assertion ("Waiting for Claude" → "Waiting for
the agent").
Reported-By: 吉田仁
The subagent review loop (dispatching a fresh agent to review plans/specs)
doubled execution time (~25 min overhead) without measurably improving plan
quality. Regression testing across 5 versions (v3.6.0 through v5.0.4) with
5 trials each showed identical plan sizes, task counts, and quality scores
regardless of whether the review loop ran.
Changes:
- writing-plans: Replace subagent Plan Review Loop with inline Self-Review
checklist (spec coverage, placeholder scan, type consistency)
- writing-plans: Add explicit "No Placeholders" section listing plan failures
(TBD, vague descriptions, undefined references, "similar to Task N")
- brainstorming: Replace subagent Spec Review Loop with inline Spec Self-Review
(placeholder scan, internal consistency, scope check, ambiguity check)
- Both skills now use "look at it with fresh eyes" framing
Testing: 5 trials with the new skill show self-review catches 3-5 real bugs
per run (spawn positions, API mismatches, seed bugs, grid indexing) in ~30s
instead of ~25 min. Remaining defects are comparable to the subagent approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 tasks covering: environment detection in using-git-worktrees,
Step 1.5 + cleanup guard in finishing-a-development-branch,
Integration line updates, codex-tools.md docs, automated tests,
and final verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>