mirror of https://github.com/obra/superpowers.git synced 2026-06-11 21:29:07 +08:00

Files

Jesse Vincent f2cbfbefeb Release v5.1.0 (#1468 )

* docs: add Codex App compatibility design spec (PRI-823)

Design for making using-git-worktrees, finishing-a-development-branch,
and subagent-driven-development skills work in the Codex App's sandboxed
worktree environment. Read-only environment detection via git-dir vs
git-common-dir comparison, ~48 lines across 4 files, zero breaking changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address spec review feedback for PRI-823

Fix three Important issues from spec review:
- Clarify Step 1.5 placement relative to existing Steps 2/3
- Re-derive environment state at cleanup time instead of relying on
  earlier skill output
- Acknowledge pre-existing Step 5 cleanup inconsistency

Also: precise step references, exact codex-tools.md content, clearer
Integration section update instructions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address team review feedback for PRI-823 spec

- Add commit SHA + data loss warning to handoff payload (HIGH)
- Add explicit commit step before handoff (HIGH)
- Remove misleading "mark as externally managed" from Path B
- Add executing-plans 1-line edit (was missing)
- Add branch name derivation rules
- Add conditional UI language for non-App environments
- Add sandbox fallback for permission errors
- Add STOP directive after Step 0 reporting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify executing-plans in What Does NOT Change section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add cleanup guard test (#5) and sandbox fallback test (#10) to spec

Both tests address real risk scenarios:
- #5: cleanup guard bug would delete Codex App's own worktree (data loss)
- #10: Local thread sandbox fallback needs manual Codex App validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add implementation plan for Codex App compatibility (PRI-823)

8 tasks covering: environment detection in using-git-worktrees,
Step 1.5 + cleanup guard in finishing-a-development-branch,
Integration line updates, codex-tools.md docs, automated tests,
and final verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(codex-tools): add named agent dispatch mapping for Codex (#647)

* fix(writing-skills): correct false 'only two fields' frontmatter claim (#882)

* Replace subagent review loops with lightweight inline self-review

The subagent review loop (dispatching a fresh agent to review plans/specs)
doubled execution time (~25 min overhead) without measurably improving plan
quality. Regression testing across 5 versions (v3.6.0 through v5.0.4) with
5 trials each showed identical plan sizes, task counts, and quality scores
regardless of whether the review loop ran.

Changes:
- writing-plans: Replace subagent Plan Review Loop with inline Self-Review
  checklist (spec coverage, placeholder scan, type consistency)
- writing-plans: Add explicit "No Placeholders" section listing plan failures
  (TBD, vague descriptions, undefined references, "similar to Task N")
- brainstorming: Replace subagent Spec Review Loop with inline Spec Self-Review
  (placeholder scan, internal consistency, scope check, ambiguity check)
- Both skills now use "look at it with fresh eyes" framing

Testing: 5 trials with the new skill show self-review catches 3-5 real bugs
per run (spawn positions, API mismatches, seed bugs, grid indexing) in ~30s
instead of ~25 min. Remaining defects are comparable to the subagent approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "Replace subagent review loops with lightweight inline self-review"

This reverts commit bf8f7572eb.

* Reapply "Replace subagent review loops with lightweight inline self-review"

This reverts commit b045fa3950.

* Add v5.0.6 release notes

* Move brainstorm server metadata to .meta/ subdirectory

Metadata files (.server-info, .events, .server.pid, .server.log,
.server-stopped) were stored in the same directory served over HTTP,
making them accessible via the /files/ route. They now live in a .meta/
subdirectory that is not web-accessible.

Also fixes a stale test assertion ("Waiting for Claude" → "Waiting for
the agent").

Reported-By: 吉田仁

* Revert "Move brainstorm server metadata to .meta/ subdirectory"

This reverts commit ab500dade6.

* Separate brainstorm server content and state into peer directories

The session directory now contains two peers: content/ (HTML served to
the browser) and state/ (events, server-info, pid, log). Previously
all files shared a single directory, making server state and user
interaction data accessible over the /files/ HTTP route.

Also fixes stale test assertion ("Waiting for Claude" → "Waiting for
the agent").

Reported-By: 吉田仁

* Fix owner-PID false positive when owner runs as different user

ownerAlive() treated EPERM (permission denied) the same as ESRCH
(process not found), causing the server to self-terminate within 60s
whenever the owner process ran as a different user. This affected WSL
(owner is a Windows process), Tailscale SSH, and any cross-user
scenario.

The fix: `return e.code === 'EPERM'` — if we get permission denied,
the process is alive; we just can't signal it.

Tested on Linux via Tailscale SSH with a root-owned grandparent PID:
- Server survives past the 60s lifecycle check (EPERM = alive)
- Server still shuts down when owner genuinely dies (ESRCH = dead)

Fixes #879

* Fix owner-PID lifecycle monitoring for cross-platform reliability

Two bugs caused the brainstorm server to self-terminate within 60s:

1. ownerAlive() treated EPERM (permission denied) as "process dead".
   When the owner PID belongs to a different user (Tailscale SSH,
   system daemons), process.kill(pid, 0) throws EPERM — but the
   process IS alive. Fixed: return e.code === 'EPERM'.

2. On WSL, the grandparent PID resolves to a short-lived subprocess
   that exits before the first 60s lifecycle check. The PID is
   genuinely dead (ESRCH), so the EPERM fix alone doesn't help.
   Fixed: validate the owner PID at server startup — if it's already
   dead, it was a bad resolution, so disable monitoring and rely on
   the 30-minute idle timeout.

This also removes the Windows/MSYS2-specific OWNER_PID="" carve-out
from start-server.sh, since the server now handles invalid PIDs
generically at startup regardless of platform.

Tested on Linux (magic-kingdom) via Tailscale SSH:
- Root-owned owner PID (EPERM): server survives ✓
- Dead owner PID at startup (WSL sim): monitoring disabled, survives ✓
- Valid owner that dies: server shuts down within 60s ✓

Fixes #879

* Release v5.0.6: inline self-review, brainstorm server restructure, owner-PID fixes

* fix: add Copilot CLI platform detection for sessionStart context injection

Copilot CLI v1.0.11 reads `additionalContext` from sessionStart hook
output, but the session-start script only emits the Claude Code-specific
nested format. Add COPILOT_CLI env var detection so Copilot CLI gets the
SDK-standard top-level `additionalContext` while Claude Code continues
getting `hookSpecificOutput`.

Based on PR #910 by @culinablaz.

* feat: add Copilot CLI tool mapping, docs, and install instructions

- Add references/copilot-tools.md with full tool equivalence table
- Add Copilot CLI to using-superpowers skill platform instructions
- Add marketplace install instructions to README
- Add changelog entry crediting @culinablaz for the hook fix

* fix(opencode): align skills path across bootstrap, runtime, and tests

The bootstrap text advertised a configDir-based skills path that didn't
match the runtime path (resolved relative to the plugin file). Tests
used yet another hardcoded path and referenced a nonexistent lib/ dir.

- Remove misleading skills path from bootstrap text; the agent should
  use the native skill tool, not read files by path
- Fix test setup to create a consistent layout matching the plugin's
  ../../skills resolution
- Export SUPERPOWERS_SKILLS_DIR from setup.sh so tests use a single
  source of truth
- Add regression test that bootstrap doesn't advertise the old path
- Remove broken cp of nonexistent lib/ directory

Fixes #847

* docs: add OpenCode path fix to release notes

* fix(opencode): inject bootstrap as user message instead of system message

Move bootstrap injection from experimental.chat.system.transform to
experimental.chat.messages.transform, prepending to the first user
message instead of adding a system message.

This avoids two issues:
- System messages repeated every turn inflate token usage (#750)
- Multiple system messages break Qwen and other models (#894)

Tested on OpenCode 1.3.2 with Claude Sonnet 4.5 — brainstorming skill
fires correctly on "Let's make a React to do list" prompt.

* docs: update release notes with OpenCode bootstrap change

* docs: add worktree rototill design spec (PRI-974)

Design for detect-and-defer worktree support. Superpowers defers to
native harness worktree systems when available, falls back to manual
git worktree creation when not. Covers Phases 0-2: detection, consent,
native tool preference, finishing state detection, and three bug fixes
(#940, #999, #238).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address SWE review feedback on worktree rototill spec

- Fix Bug #999 order: merge → verify → remove worktree → delete branch
  (avoids losing work if merge fails after worktree removal)
- Add submodule guard to Step 0 detection (GIT_DIR != GIT_COMMON is also
  true in submodules)
- Preserve global path (~/.config/superpowers/worktrees/) in detection for
  backward compatibility, just stop offering it to new users
- Add step numbering note and implementation notes section
- Expand provenance heuristic to cover global path and manual creation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: honest spec revisions after issue/PR deep dive

- Step 1a is the load-bearing assumption, not just a risk — if it fails,
  the entire design needs rework. TDD validation must be first impl task.
- #1009 resolution depends on Step 1a working, stated explicitly
- #574 honestly deferred, not "partially addressed"
- Add hooks symlink to Step 1b (PR #965 idea, prevents silent hook loss)
- Add stale worktree pruning to Step 5 (PR #1072 idea, one-line self-heal)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add worktree rototill implementation plan (PRI-974)

5 tasks: TDD gate for Step 1a, using-git-worktrees rewrite,
finishing-a-development-branch rewrite, integration updates,
end-to-end validation. Task 1 is a hard gate — if native tool
preference fails RED/GREEN, stop and redesign.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add RED/GREEN validation for native worktree preference (PRI-974)

Gate test for Step 1a — validates agents prefer EnterWorktree over
git worktree add on Claude Code. Must pass before skill rewrite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: rewrite using-git-worktrees with detect-and-defer (PRI-974)

Step 0: GIT_DIR != GIT_COMMON detection (skip if already isolated)
Step 0 consent: opt-in prompt before creating worktree (#991)
Step 1a: native tool preference (short, first, declarative)
Step 1b: git worktree fallback with hooks symlink and legacy path compat
Submodule guard prevents false detection
Platform-neutral instruction file references (#1049)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: rewrite finishing-a-development-branch with detect-and-defer (PRI-974)

Step 2: environment detection (GIT_DIR != GIT_COMMON) before presenting menu
Detached HEAD: reduced 3-option menu (no merge from detached HEAD)
Provenance-based cleanup: .worktrees/ = ours, anything else = hands off
Bug #940: Option 2 no longer cleans up worktree
Bug #999: merge -> verify -> remove worktree -> delete branch
Bug #238: cd to main repo root before git worktree remove
Stale worktree pruning after removal (git worktree prune)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address spec review findings in both skill rewrites (PRI-974)

using-git-worktrees: submodule guard now says "treat as normal repo"
instead of "proceed to Step 1" (preserves consent flow)
using-git-worktrees: directory priority summaries include global legacy

finishing-a-development-branch: move git branch -d after Step 6 cleanup
to make Bug #999 ordering unambiguous (merge -> worktree remove -> branch delete)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update worktree integration references across skills (PRI-974)

Remove REQUIRED language from executing-plans and subagent-driven-development.
Consent and detection now live inside using-git-worktrees itself.
Fix stale 'created by brainstorming' claim in writing-plans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: include worktrees/ (non-hidden) in finishing provenance check (PRI-974)

The creation skill supports both .worktrees/ and worktrees/ directories,
but the finishing skill's cleanup only checked .worktrees/. Worktrees
under the non-hidden path would be orphaned on merge or discard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Step 1a validated through TDD — explicit naming + consent bridge (PRI-974)

Step 1a failed at 2/6 with the spec's original abstract text ("use your
native tool"). Three REFACTOR iterations found what works (50/50 runs):

1. Explicit tool naming — "do you have EnterWorktree, WorktreeCreate..."
   transforms interpretation into factual toolkit check
2. Consent bridge — "user's consent is your authorization" directly
   addresses EnterWorktree's "ONLY when user explicitly asks" guardrail
3. Red Flag entry naming the specific anti-pattern

File split was tested but proven unnecessary — the fix is the Step 1a
text quality, not physical separation of git commands. Control test
with full 240-line skill (all git commands visible) passed 20/20.

Test script updated: supports batch runs (./test.sh green 20), "all"
phase, and checks absence of git worktree add (reliable signal) rather
than presence of EnterWorktree text (agent sometimes omits tool name).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update spec with TDD findings on Step 1a (PRI-974)

Step 1a's original "deliberately short, abstract" design was disproven
by TDD (2/6 pass rate). Spec now documents the validated approach:
explicit tool naming + consent bridge + red flag (50/50 pass rate).

- Design Principles: updated to reflect explicit naming over abstraction
- Step 1a: replaced abstract text with validated approach, added design
  note explaining the TDD revision and why file splitting was unnecessary
- Risks: Step 1a risk marked RESOLVED with cross-platform validation table
  and residual risk note about upstream tool description dependency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: honest cross-platform validation table in spec (PRI-974)

Research confirmed Claude Code is currently the only harness with an
agent-callable mid-session worktree tool. All others either create
worktrees before the agent starts (Codex App, Gemini, Cursor) or have
no native support (Codex CLI, OpenCode).

Table now shows: what was actually tested (Claude Code 50/50, Codex CLI
6/6), what was simulated (Codex App 1/1), and what's untested (Gemini,
Cursor, OpenCode). Step 1a is forward-compatible for when other
harnesses add agent-callable tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: cross-platform validation on 5 harnesses (PRI-974)

Tested on Gemini CLI (gemini -p) and Cursor Agent (cursor-agent -p):
- Gemini: Step 0 detection 1/1, Step 1b fallback 1/1
- Cursor: Step 0 detection 1/1, Step 1b fallback 1/1

Both correctly identified no native agent-callable worktree tool,
fell through to git worktree add, and performed safety verification.
Both correctly detected existing worktrees and skipped creation.

5 of 6 harnesses now tested. Only OpenCode untested (no CLI access).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove incorrect hooks symlink step from worktree skill

Git worktrees inherit hooks from the main repo automatically via
$GIT_COMMON_DIR — this has been the case since git 2.5 (2015).
The symlink step was based on an incorrect premise from PR #965
and also fails in practice (.git is a file in worktrees, not a dir).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: address PR #1121 review — respect user preference, drop y/n

- Consent prompt: drop "(y/n)" and add escape valve for users who
  have already declared their worktree preference in global or
  project agent instruction files.
- Directory selection: reorder to put declared user preference
  ahead of observed filesystem state, and reframe the default as
  "if no other guidance available".
- Sandbox fallback: require explicitly informing the user that
  the sandbox blocked creation, not just "report accordingly".
- writing-plans: fully qualify the superpowers:using-git-worktrees
  reference.
- Plan doc: mirror the consent-prompt change.

Step 1a native-tool framing and the helper-scripts suggestion are
still outstanding — the first needs a benchmark re-run before softer
phrasing can be adopted without regressing compliance; the second is
exploratory and will get a thread reply.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: soften Step 1a native-tool framing per PR #1121 review

Address obra's comment on explicit step numbers / prescriptive tone.
Drops "STOP HERE if available", the "If YES:" gate, and the "even if /
even if / NO EXCEPTIONS" reinforcement paragraph. Keeps the specific
tool-name anchors (EnterWorktree, WorktreeCreate, /worktree, --worktree),
which the original TDD data showed are load-bearing.

A/B verified against drill harness on the 3 creation/consent scenarios
(consent-flow, creation-from-main, creation-from-main-spec-aware):
baseline explicit wording scored 12/12 criteria, softened wording also
scored 12/12. The "agent used the most appropriate tool" criterion
passed in all 3 softened runs — agents still picked EnterWorktree via
ToolSearch without the imperative framing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: drop instruction file enumeration per PR #1121 review

Jesse flagged that the verbose CLAUDE.md/AGENTS.md/GEMINI.md/.cursorrules
enumeration (a) chews tokens, (b) confuses models that anchor on exact
strings, and (c) is repeated DRY-violatingly across 3+ locations.

Replace with abstract "your instructions" framing in four spots:
- skills/using-git-worktrees/SKILL.md Step 0 → Step 1 transition
- skills/using-git-worktrees/SKILL.md Step 1b Directory Selection
- docs/superpowers/plans/2026-04-06-worktree-rototill.md (both mirror locations)

Same intent, harness-agnostic phrasing, ~half the tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace hardcoded /Users/jesse with generic placeholders (#858)

* Remove the deprecated legacy slash commands (#1188)

* fix: prevent subagent-driven-development from pausing every 3 tasks

requesting-code-review had "review after each batch (3 tasks)" for
executing-plans, which leaked into subagent-driven-development as a
check-in cadence. Replaced with flexible "each task or at natural
checkpoints" and added explicit continuous execution directive to
subagent-driven-development.

* Remove Integration sections from skills

These sections don't help with steering and are a legacy of the time
before agents had native skills systems.

* fix(opencode): cache bootstrap content at module level to eliminate per-step file I/O

getBootstrapContent() called fs.existsSync + fs.readFileSync + regex
frontmatter parsing on every agent step with zero caching.  The
experimental.chat.messages.transform hook fires every step in opencode's
agent loop (messages are reloaded from DB each step via
filterCompactedEffect).  A 10-step turn triggered 10 redundant file
reads + 10 regex parses for content that never changes during a session.

Changes:
- Add module-level _bootstrapCache (undefined = not loaded, null = file
  missing) so the first call reads and parses SKILL.md, all subsequent
  calls return the cached string with zero filesystem access
- Cache the null sentinel when SKILL.md is missing, preventing repeated
  fs.existsSync probes
- Add _testing export (resetCache/getCache) for test infrastructure
- Clarify the injection guard comment explaining how it interacts with
  opencode's per-step message reloading
- Add 15 regression tests covering cache behavior, fs call counts,
  injection guard, missing file sentinel, cache reset, and source audit

Fixes #1202

* test(opencode): simplify bootstrap cache coverage

* docs: clarify opencode install caveats

* test(opencode): modernize integration tests

* docs: add Factory Droid installation instructions

* Preserve Codex marketplace metadata

* docs: add README quickstart install links (#1293)

* docs(codex-tools): fix subagent wait mapping to wait_agent

Update the Codex tool mapping so Claude Code 'Task returns result' maps to the current Codex spawned-agent result tool, wait_agent. Also clarify that older Codex builds exposed spawned-agent waiting as wait, while current bare wait is the code-mode exec/wait surface for yielded exec cells.

Verified with Drill:
- codex-tool-mapping-comprehension fails against dev with task_returns_result=wait
- codex-tool-mapping-comprehension passes against this PR with task_returns_result=wait_agent and exec/wait scoped correctly
- codex-subagent-wait-mapping passes against this PR with spawn_agent -> wait_agent -> close_agent and PR963_OK returned

* fix(cursor): run SessionStart hook via run-hook.cmd on Windows

Route Cursor's Windows SessionStart hook through the existing run-hook.cmd dispatcher instead of invoking the extensionless session-start script directly. This avoids Windows opening the extensionless hook file and lets Git Bash run the script as intended.

Also removed an accidental UTF-8 BOM from hooks-cursor.json before merging.

Verified:
- hooks-cursor.json parses as JSON and has no BOM
- command is ./hooks/run-hook.cmd session-start
- CURSOR_PLUGIN_ROOT=/tmp/superpowers ./hooks/run-hook.cmd session-start emits valid Cursor JSON with additional_context

* fix(tests): make SDD integration test actually run its assertions

The SDD integration test silently bailed before printing any verification
results. Three independent bugs caused this:

1. `WORKING_DIR_ESCAPED` was computed from `$SCRIPT_DIR/../..` without
   resolving `..` segments. The resulting "directory" name contained
   literal `..` so `find` was looking in a path that doesn't exist.

2. With `set -euo pipefail`, the `find ... | sort -r | head -1` pipeline
   could exit non-zero (SIGPIPE on the producer when head closes early),
   killing the script silently before assertions ran.

3. The `claude -p` invocation never passed `--plugin-dir`, so it loaded
   the installed plugin instead of the working tree. Local edits to
   skills under test were not actually being tested.

Other adjustments:
- Run claude from inside the unique TEST_PROJECT directory instead of
  from the plugin root, so its session JSONL lives in its own
  `~/.claude/projects/` folder and doesn't race other concurrent
  claude sessions for "most recent file".
- Use the same character-normalization claude does (every non-alphanumeric
  becomes `-`) when computing the session dir name; macOS-resolved
  `/private/var/...` paths and tmp dirs with `.`/`_` in their names need
  this to round-trip correctly.
- Accept either `"name":"Agent"` or `"name":"Task"` in the subagent count
  — the harness renamed the tool but the test wasn't updated.

Verified on this branch: all six verification tests now pass against a
real end-to-end SDD run (skill invoked, 7 subagents dispatched, 6
TodoWrite calls, working code produced, tests pass, no extra features).

* feat: add Gemini CLI subagent support mapping

Map Gemini Task dispatch to @agent-name/@generalist and document parallel subagent dispatch for independent tasks.

* docs: update Codex plugin install guidance (#1288)

* Lift superpowers:code-reviewer agent into the requesting-code-review skill

The plugin had a single named agent (`agents/code-reviewer.md`) used by
two skills, while every other reviewer/implementer subagent in the repo
is dispatched as `general-purpose` with the prompt template living
alongside its skill. That asymmetry had no upside and several costs:

- Two sources of truth for the code review checklist (the agent file
  and `requesting-code-review/code-reviewer.md`), both drifting
  independently.
- `Codex` users could not use the named agent directly; the codex-tools
  reference doc had a workaround section explaining how to flatten the
  named agent into a `worker` dispatch.
- No third-party reliance on `superpowers:code-reviewer` inside this
  repo.

Changes:
- Merge `agents/code-reviewer.md` (persona + checklist) and
  `skills/requesting-code-review/code-reviewer.md` (placeholder
  template) into a single self-contained Task-dispatch template,
  matching the shape of `implementer-prompt.md`,
  `spec-reviewer-prompt.md`, etc.
- Update `skills/requesting-code-review/SKILL.md` and
  `skills/subagent-driven-development/code-quality-reviewer-prompt.md`
  to dispatch `Task (general-purpose)` instead of the named agent.
- Drop the now-obsolete "Named agent dispatch" workaround sections from
  `codex-tools.md` and `copilot-tools.md` — superpowers no longer ships
  any named agents, so those instructions documented nothing.
- Delete `agents/code-reviewer.md` and the empty `agents/` directory.

Tier 3 coverage for the change: a new behavioral test
`tests/claude-code/test-requesting-code-review.sh` plants real bugs
(SQL injection, plaintext password handling, credential logging) into
a tiny project, runs the actual `requesting-code-review` skill against
the working tree, and asserts the dispatched reviewer flags every
planted issue at Critical/Important severity and refuses to approve
the diff.

Verified end-to-end on this branch:
- The new test passes (5/5 assertions; reviewer caught all planted
  bugs and several others).
- The existing SDD integration test still passes (7/7 subagents
  dispatched, all as `general-purpose`; spec compliance still
  rejects extra features; produced code is correct).
- Session JSONLs confirm zero remaining `superpowers:code-reviewer`
  dispatches anywhere in the SDD pipeline.

* Prepare v5.1.0: release notes and version bump

Add v5.1.0 release notes covering:
- Removals: legacy slash commands (/brainstorm, /execute-plan,
  /write-plan), skill Integration sections
- Worktree skills rewrite (PRI-974, PR #1121)
- Contributor guidelines for AI agents
- Codex plugin mirror tooling (PR #1165)
- OpenCode bootstrap caching (#1202)
- SDD pause-every-3-tasks fix; SDD integration test fixes
- Cursor Windows hook routing
- Gemini CLI subagent dispatch mapping
- Skill terminology cleanups
- Install docs (Factory Droid, Codex, quickstart links)

Bumps version 5.0.7 -> 5.1.0 across all declared files via
scripts/bump-version.sh; not yet tagged or released.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Drew Ritter <drewritter@workerbee.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Drew Ritter <drew@primeradiant.com>
Co-authored-by: Blaž Čulina <culina.blaz@nsoft.com>
Co-authored-by: Jesse Vincent <jesse@primeradiant.com>
Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: Richard Luo <luo.richard@gmail.com>
Co-authored-by: Drew Ritter <drew@ritter.dev>
Co-authored-by: leonsong09 <59187950+leonsong09@users.noreply.github.com>
Co-authored-by: YuXiang Hong <41331696+starumiQAQ@users.noreply.github.com>
Co-authored-by: Sathvik Gilakamsetty <spacetime1007@gmail.com>

2026-05-04 15:05:01 -07:00

20 KiB

Raw Blame History

Worktree Rototill: Detect-and-Defer

Date: 2026-04-06 Status: Draft Ticket: PRI-974 Subsumes: PRI-823 (Codex App compatibility)

Problem

Superpowers is opinionated about worktree management — specific paths (.worktrees/<branch>), specific commands (git worktree add), specific cleanup (git worktree remove). Meanwhile, Claude Code, Codex App, Gemini CLI, and Cursor all provide native worktree support with their own paths, lifecycle management, and cleanup.

This creates three failure modes:

Duplication — on Claude Code, the skill does what EnterWorktree/ExitWorktree already does
Conflict — on Codex App, the skill tries to create worktrees inside an already-managed worktree
Phantom state — skill-created worktrees at .worktrees/ are invisible to the harness; harness-created worktrees at .claude/worktrees/ are invisible to the skill

For harnesses without native support (Codex CLI, OpenCode, Copilot standalone), superpowers fills a real gap. The skill shouldn't go away — it should get out of the way when native support exists.

Goals

Defer to native harness worktree systems when they exist
Continue providing worktree support for harnesses that lack it
Fix three known bugs in finishing-a-development-branch (#940, #999, #238)
Make worktree creation opt-in rather than mandatory (#991)
Replace hardcoded CLAUDE.md references with platform-neutral language (#1049)

Non-Goals

Per-worktree environment conventions (.worktree-env.sh, port offsetting) — Phase 4
PreToolUse hooks for path enforcement — Phase 4
Multi-repo worktree documentation — Phase 4
Brainstorming checklist changes for worktrees — Phase 4
.superpowers-session.json metadata tracking (interesting PR #997 idea, not needed for v1)
Hooks symlinking into worktrees (PR #965 idea, separate concern)

Design Principles

Detect state, not platform

Use GIT_DIR != GIT_COMMON to determine "am I already in a worktree?" rather than sniffing environment variables to identify the harness. This is a stable git primitive (since git 2.5, 2015), works universally across all harnesses, and requires zero maintenance as new harnesses appear.

Declarative intent, prescriptive fallback

The skill describes the goal ("ensure work happens in an isolated workspace") and defers to native tools when available. It prescribes specific git commands only as a fallback for harnesses without native worktree support. Step 1a comes first and names native tools explicitly (EnterWorktree, WorktreeCreate, /worktree, --worktree); Step 1b comes second with the git fallback. The original spec kept Step 1a abstract ("you know your own toolkit"), but TDD proved that agents anchor on Step 1b's concrete commands when Step 1a is too vague. Explicit tool naming and a consent-authorization bridge were required to make the preference reliable.

Provenance-based ownership

Whoever creates the worktree owns its cleanup. If the harness created it, superpowers doesn't touch it. If superpowers created it (via git fallback), superpowers cleans it up. The heuristic: if the worktree lives under .worktrees/ or ~/.config/superpowers/worktrees/, superpowers owns it. Anything else (.claude/worktrees/, ~/.codex/worktrees/, .gemini/worktrees/) belongs to the harness.

Design

1. `using-git-worktrees` SKILL.md Rewrite

The skill gains three new steps before creation and simplifies the creation flow.

Step 0: Detect Existing Isolation

GIT_DIR=$(cd "$(git rev-parse --git-dir)" 2>/dev/null && pwd -P)
GIT_COMMON=$(cd "$(git rev-parse --git-common-dir)" 2>/dev/null && pwd -P)
BRANCH=$(git branch --show-current)

Three outcomes:

Condition	Meaning	Action
`GIT_DIR == GIT_COMMON`	Normal repo checkout	Proceed to Step 0.5
`GIT_DIR != GIT_COMMON`, named branch	Already in a linked worktree	Skip to Step 3 (project setup). Report: "Already in isolated workspace at `<path>` on branch `<name>`."
`GIT_DIR != GIT_COMMON`, detached HEAD	Externally managed worktree (e.g., Codex App sandbox)	Skip to Step 3. Report: "Already in isolated workspace at `<path>` (detached HEAD, externally managed)."

Step 0 does not care who created the worktree or which harness is running. A worktree is a worktree regardless of origin.

Submodule guard: GIT_DIR != GIT_COMMON is also true inside git submodules. Before concluding "already in a worktree," check that we're not in a submodule:

# If this returns a path, we're in a submodule, not a worktree
git rev-parse --show-superproject-working-tree 2>/dev/null

If in a submodule, treat as GIT_DIR == GIT_COMMON (proceed to Step 0.5).

When Step 0 finds no existing isolation (GIT_DIR == GIT_COMMON), ask before creating:

"Would you like me to set up an isolated worktree? This protects your current branch from changes. (y/n)"

If yes, proceed to Step 1. If no, work in place — skip to Step 3 with no worktree.

This step is skipped entirely when Step 0 detects existing isolation (no point asking about what already exists).

Step 1a: Native Tools (preferred)

The user has asked for an isolated workspace (Step 0 consent). Check your available tools — do you have EnterWorktree, WorktreeCreate, a /worktree command, or a --worktree flag? If YES: the user's consent to create a worktree is your authorization to use it. Use it now and skip to Step 3.

After using a native tool, skip to Step 3 (project setup).

Design note — TDD revision: The original spec used a deliberately short, abstract Step 1a ("You know your own toolkit — the skill does not need to name specific tools"). TDD validation disproved this: agents anchored on Step 1b's concrete git commands and ignored the abstract guidance (2/6 pass rate). Three changes fixed it (50/50 pass rate across GREEN and PRESSURE tests):

Explicit tool naming — listing EnterWorktree, WorktreeCreate, /worktree, --worktree by name transforms the decision from interpretation ("do I have a native tool?") into factual lookup ("is EnterWorktree in my tool list?"). Agents on platforms without these tools simply check, find nothing, and fall through to Step 1b. No false positives observed.
Consent bridge — "the user's consent to create a worktree is your authorization to use it" directly addresses EnterWorktree's tool-level guardrail ("ONLY when user explicitly asks"). Tool descriptions override skill instructions (Claude Code #29950), so the skill must frame user consent as the authorization the tool requires.
Red Flag entry — naming the specific anti-pattern ("Use git worktree add when you have a native worktree tool — this is the #1 mistake") in the Red Flags section.

File splitting (Step 1b in a separate skill) was tested and proven unnecessary. The anchoring problem is solved by the quality of Step 1a's text, not by physical separation of git commands. Control tests with the full 240-line skill (all git commands visible) passed 20/20.

Step 1b: Git Worktree Fallback

When no native tool is available, create a worktree manually.

Directory selection (priority order):

Check for existing .worktrees/ or worktrees/ directory — if found, use it. If both exist, .worktrees/ wins.
Check for existing ~/.config/superpowers/worktrees/<project>/ directory — if found, use it (backward compatibility with legacy global path).
Check the project's agent instruction file (CLAUDE.md, GEMINI.md, AGENTS.md, .cursorrules, or equivalent) for a worktree directory preference.
Default to .worktrees/.

No interactive directory selection prompt. The global path (~/.config/superpowers/worktrees/) is no longer offered as a choice to new users, but existing worktrees at that location are detected and used for backward compatibility.

Safety verification (project-local directories only):

git check-ignore -q .worktrees 2>/dev/null

If not ignored, add to .gitignore and commit before proceeding.

Create:

git worktree add "$path" -b "$BRANCH_NAME"
cd "$path"

Hooks awareness: Git worktrees do not inherit the parent repo's hooks directory. After creating a worktree via 1b, symlink the hooks directory from the main repo if one exists:

if [ -d "$MAIN_ROOT/.git/hooks" ]; then
    ln -sf "$MAIN_ROOT/.git/hooks" "$path/.git/hooks"
fi

This prevents pre-commit checks, linters, and other hooks from silently stopping when work moves to a worktree. (Idea from PR #965.)

Sandbox fallback: If git worktree add fails with a permission error, treat as a restricted environment. Skip creation, work in current directory, proceed to Step 3.

Step numbering note: The current skill has Steps 1-4 as a flat list. This redesign uses 0, 0.5, 1a, 1b, 3, 4. There is no Step 2 — it was the old monolithic "Create Isolated Workspace" which is now split into the 1a/1b structure. The implementation should renumber cleanly (e.g., 0 → "Step 0: Detect", 0.5 → within Step 0's flow, 1a/1b → "Step 1", 3 → "Step 2", 4 → "Step 3") or keep the current numbering with a note. Implementer's choice.

Steps 3-4: Project Setup and Baseline Tests (unchanged)

Regardless of which path created the workspace (Step 0 detected existing, Step 1a native tool, Step 1b git fallback, or no worktree at all), execution converges:

Step 3: Auto-detect and run project setup (npm install, cargo build, pip install, go mod download, etc.)
Step 4: Run the test suite. If tests fail, report failures and ask whether to proceed.

2. `finishing-a-development-branch` SKILL.md Rewrite

The finishing skill gains environment detection and fixes three bugs.

Step 1: Verify Tests (unchanged)

Run the project's test suite. If tests fail, stop. Don't offer completion options.

Step 1.5: Detect Environment (new)

Re-run the same detection as Step 0 in creation:

GIT_DIR=$(cd "$(git rev-parse --git-dir)" 2>/dev/null && pwd -P)
GIT_COMMON=$(cd "$(git rev-parse --git-common-dir)" 2>/dev/null && pwd -P)

Three paths:

State	Menu	Cleanup
`GIT_DIR == GIT_COMMON` (normal repo)	Standard 4 options	No worktree to clean up
`GIT_DIR != GIT_COMMON`, named branch	Standard 4 options	Provenance-based (see Step 5)
`GIT_DIR != GIT_COMMON`, detached HEAD	Reduced menu: push as new branch + PR, keep as-is, discard	No merge options (can't merge from detached HEAD)

Step 2: Determine Base Branch (unchanged)

Step 3: Present Options

Normal repo and named-branch worktree:

Merge back to <base-branch> locally
Push and create a Pull Request
Keep the branch as-is (I'll handle it later)
Discard this work

Detached HEAD:

Push as new branch and create a Pull Request
Keep as-is (I'll handle it later)
Discard this work

Step 4: Execute Choice

Option 1 (Merge locally):

# Get main repo root for CWD safety (Bug #238 fix)
MAIN_ROOT=$(git -C "$(git rev-parse --git-common-dir)/.." rev-parse --show-toplevel)
cd "$MAIN_ROOT"

# Merge first, verify success before removing anything
git checkout <base-branch>
git pull
git merge <feature-branch>
<run tests>

# Only after merge succeeds: remove worktree, then delete branch (Bug #999 fix)
git worktree remove "$WORKTREE_PATH"  # only if superpowers owns it
git branch -d <feature-branch>

The order is critical: merge → verify → remove worktree → delete branch. The old skill deleted the branch before removing the worktree (which fails because the worktree still references the branch). The naive fix of removing the worktree first is also wrong — if the merge then fails, the working directory is gone and changes are lost.

Option 2 (Create PR):

Push branch, create PR. Do NOT clean up worktree — user needs it for PR iteration. (Bug #940 fix: remove contradictory "Then: Cleanup worktree" prose.)

Option 3 (Keep as-is): No action.

Option 4 (Discard): Require typed "discard" confirmation. Then remove worktree (if superpowers owns it), force-delete branch.

Step 5: Cleanup (updated)

if GIT_DIR == GIT_COMMON:
    # Normal repo, no worktree to clean up
    done

if worktree path is under .worktrees/ or ~/.config/superpowers/worktrees/:
    # Superpowers created it — we own cleanup
    cd to main repo root       # Bug #238 fix
    git worktree remove <path>

else:
    # Harness created it — hands off
    # If platform provides a workspace-exit tool, use it
    # Otherwise, leave the worktree in place

Cleanup only runs for Options 1 and 4. Options 2 and 3 always preserve the worktree. (Bug #940 fix.)

Stale worktree pruning: After any git worktree remove, run git worktree prune as a self-healing step. Worktree directories can get deleted out-of-band (e.g., by harness cleanup, manual rm, or .claude/ cleanup), leaving stale registrations that cause confusing errors. One line, prevents silent rot. (Idea from PR #1072.)

3. Integration Updates

`subagent-driven-development` and `executing-plans`

Both currently list using-git-worktrees as REQUIRED in their integration sections. Change to:

using-git-worktrees — Ensures isolated workspace (creates one or verifies existing)

The skill itself now handles consent (Step 0.5) and detection (Step 0), so calling skills don't need to gate or prompt.

`writing-plans`

Remove the stale claim "should be run in a dedicated worktree (created by brainstorming skill)." Brainstorming is a design skill and does not create worktrees. The worktree prompt happens at execution time via using-git-worktrees.

4. Platform-Neutral Instruction File References

All instances of hardcoded CLAUDE.md in worktree-related skills are replaced with:

"your project's agent instruction file (CLAUDE.md, GEMINI.md, AGENTS.md, .cursorrules, or equivalent)"

This applies to directory preference checks in Step 1b.

Bug Fixes (bundled)

Bug	Problem	Fix	Location
#940	Option 2 prose says "Then: Cleanup worktree (Step 5)" but quick reference says keep it. Step 5 says "For Options 1, 2, 4" but Common Mistakes says "Options 1 and 4 only."	Remove cleanup from Option 2. Step 5 applies to Options 1 and 4 only.	finishing SKILL.md
#999	Option 1 deletes branch before removing worktree. `git branch -d` can fail because worktree still references the branch.	Reorder to: merge → verify tests → remove worktree → delete branch. Merge must succeed before anything is removed.	finishing SKILL.md
#238	`git worktree remove` fails silently if CWD is inside the worktree being removed.	Add CWD guard: `cd` to main repo root before `git worktree remove`.	finishing SKILL.md

Issues Resolved

Issue	Resolution
#940	Direct fix (Bug #940)
#991	Opt-in consent in Step 0.5
#918	Step 0 detection + Step 1.5 finishing detection
#1009	Resolved by Step 1a — agents use native tools (e.g., `EnterWorktree`) which create at harness-native paths. Depends on Step 1a working; see Risks.
#999	Direct fix (Bug #999)
#238	Direct fix (Bug #238)
#1049	Platform-neutral instruction file references
#279	Solved by detect-and-defer — native paths respected because we don't override them
#574	Deferred. Nothing in this spec touches the brainstorming skill where the bug lives. Full fix (adding a worktree step to brainstorming's checklist) is Phase 4.

Risks

Step 1a is the load-bearing assumption — RESOLVED

Step 1a — agents preferring native worktree tools over the git fallback — is the foundation the entire design rests on. If agents ignore Step 1a and fall through to Step 1b on harnesses with native support, detect-and-defer fails entirely.

Status: This risk materialized during implementation. The original abstract Step 1a ("You know your own toolkit") failed at 2/6 on Claude Code. The TDD gate worked as designed — it caught the failure before any skill files were modified, preventing a broken release. Three REFACTOR iterations identified the root causes (agent anchoring on concrete commands, tool-description guardrail overriding skill instructions) and produced a fix validated at 50/50 across GREEN and PRESSURE tests. See Step 1a design note above for details.

Cross-platform validation:

As of 2026-04-06, Claude Code is the only harness with an agent-callable mid-session worktree tool (EnterWorktree). All others either create worktrees before the agent starts (Codex App, Gemini CLI, Cursor) or have no native worktree support (Codex CLI, OpenCode). Step 1a is forward-compatible: when other harnesses add agent-callable worktree tools, agents will match them against the named examples and use them without skill changes.

Harness	Current worktree model	Skill mechanism	Tested
Claude Code	Agent-callable `EnterWorktree`	Step 1a	50/50 (GREEN + PRESSURE)
Codex CLI	No native tool (shell only)	Step 1b git fallback	6/6 (`codex exec`)
Gemini CLI	Launch-time `--worktree` flag, no agent tool	Step 0 if launched with flag, Step 1b if not	Step 0: 1/1, Step 1b: 1/1 (`gemini -p`)
Cursor Agent	User-facing `/worktree`, no agent tool	Step 0 if user activated, Step 1b if not	Step 0: 1/1, Step 1b: 1/1 (`cursor-agent -p`)
Codex App	Platform-managed, detached HEAD, no agent tool	Step 0 detects existing	1/1 simulated
OpenCode	Detection only (`ctx.worktree`), no agent tool	Step 1b git fallback	Untested (no CLI access)

Residual risks:

If Anthropic changes EnterWorktree's tool description to be more restrictive (e.g., "Do not use based on skill instructions"), the consent bridge breaks. Worth filing an issue requesting that the tool description accommodate skill-driven invocation.
When other harnesses add agent-callable worktree tools, they may use names not in Step 1a's list. The list should be updated as new tools appear. The generic phrasing ("a worktree or workspace-isolation tool") provides some forward coverage.

Provenance heuristic

The .worktrees/ or ~/.config/superpowers/worktrees/ = ours, anything else = hands offheuristic works for every current harness. If a future harness adopts.worktrees/as its convention, we'd have a false positive (superpowers tries to clean up a harness-owned worktree). Similarly, if a user manually runsgit worktree add .worktrees/experimentwithout superpowers, we'd incorrectly claim ownership. Both are low risk — every harness uses branded paths, and manual.worktrees/` creation is unlikely — but worth noting.

Detached HEAD finishing

The reduced menu for detached HEAD worktrees (no merge option) is correct for Codex App's sandbox model. If a user is in detached HEAD for another reason, the reduced menu still makes sense — you genuinely can't merge from detached HEAD without creating a branch first.

Implementation Notes

Both skill files contain sections beyond the core steps that need updating during implementation:

Frontmatter (name, description): Update to reflect detect-and-defer behavior
Quick Reference tables: Rewrite to match new step structure and bug fixes
Common Mistakes sections: Update or remove items that reference old behavior (e.g., "Skip CLAUDE.md check" is now wrong)
Red Flags sections: Update to reflect new priorities (e.g., "Never create a worktree when Step 0 detects existing isolation")
Integration sections: Update cross-references between skills

The spec describes what changes; the implementation plan will specify exact edits to these secondary sections.

Future Work (not in this spec)

Phase 3 remainder: $TMPDIR directory option (#666), setup docs for caching and env inheritance (#299)
Phase 4: PreToolUse hooks for path enforcement (#1040), per-worktree env conventions (#597), brainstorming checklist worktree step (#574), multi-repo documentation (#710)

20 KiB Raw Blame History