superpowers

mirror of https://github.com/obra/superpowers.git synced 2026-06-13 14:19:05 +08:00

Author	SHA1	Message	Date
Drew Ritter	fa07663322	fix(skills): plans reference the spec instead of restating it (SUP-333 #1) writing-plans told agents to "document everything they need to know" assuming zero context, so every agent in the 2026-06-09 six-agent quorum sweep obeyed and restated the entire spec inline in the plan (cost-spec-plan-duplication failed 5/5 completed agents; pi's plan was 683 lines of duplicated spec). - writing-plans: state the division of labor (spec owns WHAT/WHY, plan owns HOW); cite the spec by path/section, never restate it. "Zero context" means mechanically executable steps, not duplication. Add a Spec: line to the plan header template. - brainstorming: close the path loophole the re-run exposed: claude shortened docs/superpowers/specs/ to docs/specs/ in 2/2 runs; both path mentions now explicitly forbid the shortening. TDD evidence (quorum): - RED: batch-20260609T023452Z-68aa et al.: 5/5 agents fail - GREEN: cost-spec-plan-duplication-claude-20260609T234142Z-9625 pass (plan: "this plan does not restate them" + spec cited by path; both docs in docs/superpowers/) - Canary: triggering-writing-plans-claude pass (skill still fires) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-09 16:52:21 -07:00
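The path rule in this commit lends itself to a mechanical check. The sketch below is a hypothetical lint, not part of superpowers. It assumes the plan header carries a line beginning `Spec:` (optionally bolded as `**Spec:**`), because the commit names the line but not its exact template text, and it flags only the two failures the commit describes: a missing Spec: line and the shortened `docs/specs/` path.

```javascript
// Hypothetical lint for the "plans cite the spec, never restate it" rule.
// Assumes a plan header line such as "Spec: docs/superpowers/specs/<file>.md"
// or "**Spec:** ..."; the exact template wording is an assumption.
const SPEC_DIR = "docs/superpowers/specs/";

function checkSpecReference(planText) {
  // First line (multiline mode) that starts with an optionally bolded "Spec:".
  const match = planText.match(/^\s*(?:\*\*)?Spec:(?:\*\*)?\s*(\S+)/m);
  if (!match) {
    return { ok: false, specPath: null, problem: "plan header has no Spec: line" };
  }
  const specPath = match[1];
  // The commit forbids shortening docs/superpowers/specs/ to docs/specs/.
  if (specPath.startsWith("docs/specs/")) {
    return { ok: false, specPath, problem: `shortened path; expected ${SPEC_DIR}...` };
  }
  return { ok: true, specPath, problem: null };
}
```

The check does not model user-chosen locations, so a user who deliberately keeps specs in `docs/specs/` would get a false positive.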
Jesse Vincent	f2cbfbefeb	Release v5.1.0 (#1468) * docs: add Codex App compatibility design spec (PRI-823) Design for making using-git-worktrees, finishing-a-development-branch, and subagent-driven-development skills work in the Codex App's sandboxed worktree environment. Read-only environment detection via git-dir vs git-common-dir comparison, ~48 lines across 4 files, zero breaking changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address spec review feedback for PRI-823 Fix three Important issues from spec review: - Clarify Step 1.5 placement relative to existing Steps 2/3 - Re-derive environment state at cleanup time instead of relying on earlier skill output - Acknowledge pre-existing Step 5 cleanup inconsistency Also: precise step references, exact codex-tools.md content, clearer Integration section update instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address team review feedback for PRI-823 spec - Add commit SHA + data loss warning to handoff payload (HIGH) - Add explicit commit step before handoff (HIGH) - Remove misleading "mark as externally managed" from Path B - Add executing-plans 1-line edit (was missing) - Add branch name derivation rules - Add conditional UI language for non-App environments - Add sandbox fallback for permission errors - Add STOP directive after Step 0 reporting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify executing-plans in What Does NOT Change section Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add cleanup guard test (#5) and sandbox fallback test (#10) to spec Both tests address real risk scenarios: - #5: cleanup guard bug would delete Codex App's own worktree (data loss) - #10: Local thread sandbox fallback needs manual Codex App validation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add implementation plan for Codex App compatibility (PRI-823) 
8 tasks covering: environment detection in using-git-worktrees, Step 1.5 + cleanup guard in finishing-a-development-branch, Integration line updates, codex-tools.md docs, automated tests, and final verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(codex-tools): add named agent dispatch mapping for Codex (#647) * fix(writing-skills): correct false 'only two fields' frontmatter claim (#882) * Replace subagent review loops with lightweight inline self-review The subagent review loop (dispatching a fresh agent to review plans/specs) doubled execution time (~25 min overhead) without measurably improving plan quality. Regression testing across 5 versions (v3.6.0 through v5.0.4) with 5 trials each showed identical plan sizes, task counts, and quality scores regardless of whether the review loop ran. Changes: - writing-plans: Replace subagent Plan Review Loop with inline Self-Review checklist (spec coverage, placeholder scan, type consistency) - writing-plans: Add explicit "No Placeholders" section listing plan failures (TBD, vague descriptions, undefined references, "similar to Task N") - brainstorming: Replace subagent Spec Review Loop with inline Spec Self-Review (placeholder scan, internal consistency, scope check, ambiguity check) - Both skills now use "look at it with fresh eyes" framing Testing: 5 trials with the new skill show self-review catches 3-5 real bugs per run (spawn positions, API mismatches, seed bugs, grid indexing) in ~30s instead of ~25 min. Remaining defects are comparable to the subagent approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Revert "Replace subagent review loops with lightweight inline self-review" This reverts commit `bf8f7572eb`. * Reapply "Replace subagent review loops with lightweight inline self-review" This reverts commit `b045fa3950`. 
* Add v5.0.6 release notes * Move brainstorm server metadata to .meta/ subdirectory Metadata files (.server-info, .events, .server.pid, .server.log, .server-stopped) were stored in the same directory served over HTTP, making them accessible via the /files/ route. They now live in a .meta/ subdirectory that is not web-accessible. Also fixes a stale test assertion ("Waiting for Claude" → "Waiting for the agent"). Reported-By: 吉田仁 * Revert "Move brainstorm server metadata to .meta/ subdirectory" This reverts commit `ab500dade6`. * Separate brainstorm server content and state into peer directories The session directory now contains two peers: content/ (HTML served to the browser) and state/ (events, server-info, pid, log). Previously all files shared a single directory, making server state and user interaction data accessible over the /files/ HTTP route. Also fixes stale test assertion ("Waiting for Claude" → "Waiting for the agent"). Reported-By: 吉田仁 * Fix owner-PID false positive when owner runs as different user ownerAlive() treated EPERM (permission denied) the same as ESRCH (process not found), causing the server to self-terminate within 60s whenever the owner process ran as a different user. This affected WSL (owner is a Windows process), Tailscale SSH, and any cross-user scenario. The fix: `return e.code === 'EPERM'` — if we get permission denied, the process is alive; we just can't signal it. Tested on Linux via Tailscale SSH with a root-owned grandparent PID: - Server survives past the 60s lifecycle check (EPERM = alive) - Server still shuts down when owner genuinely dies (ESRCH = dead) Fixes #879 * Fix owner-PID lifecycle monitoring for cross-platform reliability Two bugs caused the brainstorm server to self-terminate within 60s: 1. ownerAlive() treated EPERM (permission denied) as "process dead". When the owner PID belongs to a different user (Tailscale SSH, system daemons), process.kill(pid, 0) throws EPERM — but the process IS alive. 
Fixed: return e.code === 'EPERM'. 2. On WSL, the grandparent PID resolves to a short-lived subprocess that exits before the first 60s lifecycle check. The PID is genuinely dead (ESRCH), so the EPERM fix alone doesn't help. Fixed: validate the owner PID at server startup — if it's already dead, it was a bad resolution, so disable monitoring and rely on the 30-minute idle timeout. This also removes the Windows/MSYS2-specific OWNER_PID="" carve-out from start-server.sh, since the server now handles invalid PIDs generically at startup regardless of platform. Tested on Linux (magic-kingdom) via Tailscale SSH: - Root-owned owner PID (EPERM): server survives ✓ - Dead owner PID at startup (WSL sim): monitoring disabled, survives ✓ - Valid owner that dies: server shuts down within 60s ✓ Fixes #879 * Release v5.0.6: inline self-review, brainstorm server restructure, owner-PID fixes * fix: add Copilot CLI platform detection for sessionStart context injection Copilot CLI v1.0.11 reads `additionalContext` from sessionStart hook output, but the session-start script only emits the Claude Code-specific nested format. Add COPILOT_CLI env var detection so Copilot CLI gets the SDK-standard top-level `additionalContext` while Claude Code continues getting `hookSpecificOutput`. Based on PR #910 by @culinablaz. * feat: add Copilot CLI tool mapping, docs, and install instructions - Add references/copilot-tools.md with full tool equivalence table - Add Copilot CLI to using-superpowers skill platform instructions - Add marketplace install instructions to README - Add changelog entry crediting @culinablaz for the hook fix * fix(opencode): align skills path across bootstrap, runtime, and tests The bootstrap text advertised a configDir-based skills path that didn't match the runtime path (resolved relative to the plugin file). Tests used yet another hardcoded path and referenced a nonexistent lib/ dir. 
- Remove misleading skills path from bootstrap text; the agent should use the native skill tool, not read files by path - Fix test setup to create a consistent layout matching the plugin's ../../skills resolution - Export SUPERPOWERS_SKILLS_DIR from setup.sh so tests use a single source of truth - Add regression test that bootstrap doesn't advertise the old path - Remove broken cp of nonexistent lib/ directory Fixes #847 * docs: add OpenCode path fix to release notes * fix(opencode): inject bootstrap as user message instead of system message Move bootstrap injection from experimental.chat.system.transform to experimental.chat.messages.transform, prepending to the first user message instead of adding a system message. This avoids two issues: - System messages repeated every turn inflate token usage (#750) - Multiple system messages break Qwen and other models (#894) Tested on OpenCode 1.3.2 with Claude Sonnet 4.5 — brainstorming skill fires correctly on "Let's make a React to do list" prompt. * docs: update release notes with OpenCode bootstrap change * docs: add worktree rototill design spec (PRI-974) Design for detect-and-defer worktree support. Superpowers defers to native harness worktree systems when available, falls back to manual git worktree creation when not. Covers Phases 0-2: detection, consent, native tool preference, finishing state detection, and three bug fixes (#940, #999, #238). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address SWE review feedback on worktree rototill spec - Fix Bug #999 order: merge → verify → remove worktree → delete branch (avoids losing work if merge fails after worktree removal) - Add submodule guard to Step 0 detection (GIT_DIR != GIT_COMMON is also true in submodules) - Preserve global path (~/.config/superpowers/worktrees/) in detection for backward compatibility, just stop offering it to new users - Add step numbering note and implementation notes section - Expand provenance heuristic to cover global path and manual creation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: honest spec revisions after issue/PR deep dive - Step 1a is the load-bearing assumption, not just a risk — if it fails, the entire design needs rework. TDD validation must be first impl task. - #1009 resolution depends on Step 1a working, stated explicitly - #574 honestly deferred, not "partially addressed" - Add hooks symlink to Step 1b (PR #965 idea, prevents silent hook loss) - Add stale worktree pruning to Step 5 (PR #1072 idea, one-line self-heal) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add worktree rototill implementation plan (PRI-974) 5 tasks: TDD gate for Step 1a, using-git-worktrees rewrite, finishing-a-development-branch rewrite, integration updates, end-to-end validation. Task 1 is a hard gate — if native tool preference fails RED/GREEN, stop and redesign. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add RED/GREEN validation for native worktree preference (PRI-974) Gate test for Step 1a — validates agents prefer EnterWorktree over git worktree add on Claude Code. Must pass before skill rewrite. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: rewrite using-git-worktrees with detect-and-defer (PRI-974) Step 0: GIT_DIR != GIT_COMMON detection (skip if already isolated) Step 0 consent: opt-in prompt before creating worktree (#991) Step 1a: native tool preference (short, first, declarative) Step 1b: git worktree fallback with hooks symlink and legacy path compat Submodule guard prevents false detection Platform-neutral instruction file references (#1049) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: rewrite finishing-a-development-branch with detect-and-defer (PRI-974) Step 2: environment detection (GIT_DIR != GIT_COMMON) before presenting menu Detached HEAD: reduced 3-option menu (no merge from detached HEAD) Provenance-based cleanup: .worktrees/ = ours, anything else = hands off Bug #940: Option 2 no longer cleans up worktree Bug #999: merge -> verify -> remove worktree -> delete branch Bug #238: cd to main repo root before git worktree remove Stale worktree pruning after removal (git worktree prune) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address spec review findings in both skill rewrites (PRI-974) using-git-worktrees: submodule guard now says "treat as normal repo" instead of "proceed to Step 1" (preserves consent flow) using-git-worktrees: directory priority summaries include global legacy finishing-a-development-branch: move git branch -d after Step 6 cleanup to make Bug #999 ordering unambiguous (merge -> worktree remove -> branch delete) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update worktree integration references across skills (PRI-974) Remove REQUIRED language from executing-plans and subagent-driven-development. Consent and detection now live inside using-git-worktrees itself. Fix stale 'created by brainstorming' claim in writing-plans. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: include worktrees/ (non-hidden) in finishing provenance check (PRI-974) The creation skill supports both .worktrees/ and worktrees/ directories, but the finishing skill's cleanup only checked .worktrees/. Worktrees under the non-hidden path would be orphaned on merge or discard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Step 1a validated through TDD — explicit naming + consent bridge (PRI-974) Step 1a failed at 2/6 with the spec's original abstract text ("use your native tool"). Three REFACTOR iterations found what works (50/50 runs): 1. Explicit tool naming — "do you have EnterWorktree, WorktreeCreate..." transforms interpretation into factual toolkit check 2. Consent bridge — "user's consent is your authorization" directly addresses EnterWorktree's "ONLY when user explicitly asks" guardrail 3. Red Flag entry naming the specific anti-pattern File split was tested but proven unnecessary — the fix is the Step 1a text quality, not physical separation of git commands. Control test with full 240-line skill (all git commands visible) passed 20/20. Test script updated: supports batch runs (./test.sh green 20), "all" phase, and checks absence of git worktree add (reliable signal) rather than presence of EnterWorktree text (agent sometimes omits tool name). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update spec with TDD findings on Step 1a (PRI-974) Step 1a's original "deliberately short, abstract" design was disproven by TDD (2/6 pass rate). Spec now documents the validated approach: explicit tool naming + consent bridge + red flag (50/50 pass rate). 
- Design Principles: updated to reflect explicit naming over abstraction - Step 1a: replaced abstract text with validated approach, added design note explaining the TDD revision and why file splitting was unnecessary - Risks: Step 1a risk marked RESOLVED with cross-platform validation table and residual risk note about upstream tool description dependency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: honest cross-platform validation table in spec (PRI-974) Research confirmed Claude Code is currently the only harness with an agent-callable mid-session worktree tool. All others either create worktrees before the agent starts (Codex App, Gemini, Cursor) or have no native support (Codex CLI, OpenCode). Table now shows: what was actually tested (Claude Code 50/50, Codex CLI 6/6), what was simulated (Codex App 1/1), and what's untested (Gemini, Cursor, OpenCode). Step 1a is forward-compatible for when other harnesses add agent-callable tools. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: cross-platform validation on 5 harnesses (PRI-974) Tested on Gemini CLI (gemini -p) and Cursor Agent (cursor-agent -p): - Gemini: Step 0 detection 1/1, Step 1b fallback 1/1 - Cursor: Step 0 detection 1/1, Step 1b fallback 1/1 Both correctly identified no native agent-callable worktree tool, fell through to git worktree add, and performed safety verification. Both correctly detected existing worktrees and skipped creation. 5 of 6 harnesses now tested. Only OpenCode untested (no CLI access). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove incorrect hooks symlink step from worktree skill Git worktrees inherit hooks from the main repo automatically via $GIT_COMMON_DIR — this has been the case since git 2.5 (2015). The symlink step was based on an incorrect premise from PR #965 and also fails in practice (.git is a file in worktrees, not a dir). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: address PR #1121 review — respect user preference, drop y/n - Consent prompt: drop "(y/n)" and add escape valve for users who have already declared their worktree preference in global or project agent instruction files. - Directory selection: reorder to put declared user preference ahead of observed filesystem state, and reframe the default as "if no other guidance available". - Sandbox fallback: require explicitly informing the user that the sandbox blocked creation, not just "report accordingly". - writing-plans: fully qualify the superpowers:using-git-worktrees reference. - Plan doc: mirror the consent-prompt change. Step 1a native-tool framing and the helper-scripts suggestion are still outstanding — the first needs a benchmark re-run before softer phrasing can be adopted without regressing compliance; the second is exploratory and will get a thread reply. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: soften Step 1a native-tool framing per PR #1121 review Address obra's comment on explicit step numbers / prescriptive tone. Drops "STOP HERE if available", the "If YES:" gate, and the "even if / even if / NO EXCEPTIONS" reinforcement paragraph. Keeps the specific tool-name anchors (EnterWorktree, WorktreeCreate, /worktree, --worktree), which the original TDD data showed are load-bearing. A/B verified against drill harness on the 3 creation/consent scenarios (consent-flow, creation-from-main, creation-from-main-spec-aware): baseline explicit wording scored 12/12 criteria, softened wording also scored 12/12. The "agent used the most appropriate tool" criterion passed in all 3 softened runs — agents still picked EnterWorktree via ToolSearch without the imperative framing. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: drop instruction file enumeration per PR #1121 review Jesse flagged that the verbose CLAUDE.md/AGENTS.md/GEMINI.md/.cursorrules enumeration (a) chews tokens, (b) confuses models that anchor on exact strings, and (c) is repeated DRY-violatingly across 3+ locations. Replace with abstract "your instructions" framing in four spots: - skills/using-git-worktrees/SKILL.md Step 0 → Step 1 transition - skills/using-git-worktrees/SKILL.md Step 1b Directory Selection - docs/superpowers/plans/2026-04-06-worktree-rototill.md (both mirror locations) Same intent, harness-agnostic phrasing, ~half the tokens. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: replace hardcoded /Users/jesse with generic placeholders (#858) * Remove the deprecated legacy slash commands (#1188) * fix: prevent subagent-driven-development from pausing every 3 tasks requesting-code-review had "review after each batch (3 tasks)" for executing-plans, which leaked into subagent-driven-development as a check-in cadence. Replaced with flexible "each task or at natural checkpoints" and added explicit continuous execution directive to subagent-driven-development. * Remove Integration sections from skills These sections don't help with steering and are a legacy of the time before agents had native skills systems. * fix(opencode): cache bootstrap content at module level to eliminate per-step file I/O getBootstrapContent() called fs.existsSync + fs.readFileSync + regex frontmatter parsing on every agent step with zero caching. The experimental.chat.messages.transform hook fires every step in opencode's agent loop (messages are reloaded from DB each step via filterCompactedEffect). A 10-step turn triggered 10 redundant file reads + 10 regex parses for content that never changes during a session. 
Changes: - Add module-level _bootstrapCache (undefined = not loaded, null = file missing) so the first call reads and parses SKILL.md, all subsequent calls return the cached string with zero filesystem access - Cache the null sentinel when SKILL.md is missing, preventing repeated fs.existsSync probes - Add _testing export (resetCache/getCache) for test infrastructure - Clarify the injection guard comment explaining how it interacts with opencode's per-step message reloading - Add 15 regression tests covering cache behavior, fs call counts, injection guard, missing file sentinel, cache reset, and source audit Fixes #1202 * test(opencode): simplify bootstrap cache coverage * docs: clarify opencode install caveats * test(opencode): modernize integration tests * docs: add Factory Droid installation instructions * Preserve Codex marketplace metadata * docs: add README quickstart install links (#1293) * docs(codex-tools): fix subagent wait mapping to wait_agent Update the Codex tool mapping so Claude Code 'Task returns result' maps to the current Codex spawned-agent result tool, wait_agent. Also clarify that older Codex builds exposed spawned-agent waiting as wait, while current bare wait is the code-mode exec/wait surface for yielded exec cells. Verified with Drill: - codex-tool-mapping-comprehension fails against dev with task_returns_result=wait - codex-tool-mapping-comprehension passes against this PR with task_returns_result=wait_agent and exec/wait scoped correctly - codex-subagent-wait-mapping passes against this PR with spawn_agent -> wait_agent -> close_agent and PR963_OK returned * fix(cursor): run SessionStart hook via run-hook.cmd on Windows Route Cursor's Windows SessionStart hook through the existing run-hook.cmd dispatcher instead of invoking the extensionless session-start script directly. This avoids Windows opening the extensionless hook file and lets Git Bash run the script as intended. 
Also removed an accidental UTF-8 BOM from hooks-cursor.json before merging. Verified: - hooks-cursor.json parses as JSON and has no BOM - command is ./hooks/run-hook.cmd session-start - CURSOR_PLUGIN_ROOT=/tmp/superpowers ./hooks/run-hook.cmd session-start emits valid Cursor JSON with additional_context * fix(tests): make SDD integration test actually run its assertions The SDD integration test silently bailed before printing any verification results. Three independent bugs caused this: 1. `WORKING_DIR_ESCAPED` was computed from `$SCRIPT_DIR/../..` without resolving `..` segments. The resulting "directory" name contained literal `..` so `find` was looking in a path that doesn't exist. 2. With `set -euo pipefail`, the `find ... | sort -r | head -1` pipeline could exit non-zero (SIGPIPE on the producer when head closes early), killing the script silently before assertions ran. 3. The `claude -p` invocation never passed `--plugin-dir`, so it loaded the installed plugin instead of the working tree. Local edits to skills under test were not actually being tested. Other adjustments: - Run claude from inside the unique TEST_PROJECT directory instead of from the plugin root, so its session JSONL lives in its own `~/.claude/projects/` folder and doesn't race other concurrent claude sessions for "most recent file". - Use the same character-normalization claude does (every non-alphanumeric becomes `-`) when computing the session dir name; macOS-resolved `/private/var/...` paths and tmp dirs with `.`/`_` in their names need this to round-trip correctly. - Accept either `"name":"Agent"` or `"name":"Task"` in the subagent count; the harness renamed the tool but the test wasn't updated. Verified on this branch: all six verification tests now pass against a real end-to-end SDD run (skill invoked, 7 subagents dispatched, 6 TodoWrite calls, working code produced, tests pass, no extra features). 
* feat: add Gemini CLI subagent support mapping Map Gemini Task dispatch to @agent-name/@generalist and document parallel subagent dispatch for independent tasks. * docs: update Codex plugin install guidance (#1288) * Lift superpowers:code-reviewer agent into the requesting-code-review skill The plugin had a single named agent (`agents/code-reviewer.md`) used by two skills, while every other reviewer/implementer subagent in the repo is dispatched as `general-purpose` with the prompt template living alongside its skill. That asymmetry had no upside and several costs: - Two sources of truth for the code review checklist (the agent file and `requesting-code-review/code-reviewer.md`), both drifting independently. - `Codex` users could not use the named agent directly; the codex-tools reference doc had a workaround section explaining how to flatten the named agent into a `worker` dispatch. - No third-party reliance on `superpowers:code-reviewer` inside this repo. Changes: - Merge `agents/code-reviewer.md` (persona + checklist) and `skills/requesting-code-review/code-reviewer.md` (placeholder template) into a single self-contained Task-dispatch template, matching the shape of `implementer-prompt.md`, `spec-reviewer-prompt.md`, etc. - Update `skills/requesting-code-review/SKILL.md` and `skills/subagent-driven-development/code-quality-reviewer-prompt.md` to dispatch `Task (general-purpose)` instead of the named agent. - Drop the now-obsolete "Named agent dispatch" workaround sections from `codex-tools.md` and `copilot-tools.md` — superpowers no longer ships any named agents, so those instructions documented nothing. - Delete `agents/code-reviewer.md` and the empty `agents/` directory. 
Tier 3 coverage for the change: a new behavioral test `tests/claude-code/test-requesting-code-review.sh` plants real bugs (SQL injection, plaintext password handling, credential logging) into a tiny project, runs the actual `requesting-code-review` skill against the working tree, and asserts the dispatched reviewer flags every planted issue at Critical/Important severity and refuses to approve the diff. Verified end-to-end on this branch: - The new test passes (5/5 assertions; reviewer caught all planted bugs and several others). - The existing SDD integration test still passes (7/7 subagents dispatched, all as `general-purpose`; spec compliance still rejects extra features; produced code is correct). - Session JSONLs confirm zero remaining `superpowers:code-reviewer` dispatches anywhere in the SDD pipeline. * Prepare v5.1.0: release notes and version bump Add v5.1.0 release notes covering: - Removals: legacy slash commands (/brainstorm, /execute-plan, /write-plan), skill Integration sections - Worktree skills rewrite (PRI-974, PR #1121) - Contributor guidelines for AI agents - Codex plugin mirror tooling (PR #1165) - OpenCode bootstrap caching (#1202) - SDD pause-every-3-tasks fix; SDD integration test fixes - Cursor Windows hook routing - Gemini CLI subagent dispatch mapping - Skill terminology cleanups - Install docs (Factory Droid, Codex, quickstart links) Bumps version 5.0.7 -> 5.1.0 across all declared files via scripts/bump-version.sh; not yet tagged or released. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Drew Ritter <drewritter@workerbee.local> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Drew Ritter <drew@primeradiant.com> Co-authored-by: Blaž Čulina <culina.blaz@nsoft.com> Co-authored-by: Jesse Vincent <jesse@primeradiant.com> Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com> Co-authored-by: Richard Luo <luo.richard@gmail.com> Co-authored-by: Drew Ritter <drew@ritter.dev> Co-authored-by: leonsong09 <59187950+leonsong09@users.noreply.github.com> Co-authored-by: YuXiang Hong <41331696+starumiQAQ@users.noreply.github.com> Co-authored-by: Sathvik Gilakamsetty <spacetime1007@gmail.com>	2026-05-04 15:05:01 -07:00
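The detection step that both the PRI-823 and PRI-974 entries rely on compares Git's per-worktree directory with its common directory. Below is a minimal sketch of that comparison, including the submodule guard, written as a pure function over the output of `git rev-parse --git-dir`, `git rev-parse --git-common-dir` and `git rev-parse --show-superproject-working-tree`. The function name and return labels are illustrative, not taken from the skills.

```javascript
const path = require("path");

// Illustrative classification of the current checkout (names are hypothetical).
// Inputs are the outputs of these commands, run from `cwd`:
//   git rev-parse --git-dir
//   git rev-parse --git-common-dir
//   git rev-parse --show-superproject-working-tree   (empty outside a submodule)
function classifyCheckout(gitDir, gitCommonDir, superprojectWorkTree, cwd) {
  if (superprojectWorkTree.trim() !== "") {
    // Submodule guard: the notes say GIT_DIR != GIT_COMMON can also hold in a
    // submodule, so a submodule is treated as a normal repo, not a worktree.
    return "normal-repo";
  }
  // rev-parse may print paths relative to cwd; resolve both before comparing.
  const dir = path.resolve(cwd, gitDir.trim());
  const common = path.resolve(cwd, gitCommonDir.trim());
  return dir === common ? "normal-repo" : "linked-worktree";
}
```

A `linked-worktree` result corresponds to the "already isolated, skip creation" branch of Step 0.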
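The OpenCode bootstrap cache (#1202) hinges on telling "not loaded yet" apart from "file missing". A sketch of that undefined/null sentinel scheme follows. To stay standalone it keeps the cache in a closure created by a hypothetical `createBootstrapLoader` rather than a true module-level `_bootstrapCache`, and it assumes the frontmatter parsing amounts to stripping a leading `---` YAML block.

```javascript
const fs = require("fs");

// Sentinel scheme described in the notes (helper names are hypothetical):
//   cache === undefined -> not loaded yet
//   cache === null      -> SKILL.md was missing; don't probe the filesystem again
//   cache is a string   -> parsed content, returned with no fs access
function createBootstrapLoader(skillPath) {
  let cache; // undefined until the first call

  function getBootstrapContent() {
    if (cache !== undefined) return cache; // hit: string or null sentinel
    if (!fs.existsSync(skillPath)) {
      cache = null; // remember the miss so later calls skip existsSync
      return cache;
    }
    const raw = fs.readFileSync(skillPath, "utf8");
    // Assumed parsing step: drop a leading YAML frontmatter block, keep the body.
    cache = raw.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
    return cache;
  }

  return {
    getBootstrapContent,
    // Test hooks, mirroring the _testing export (resetCache/getCache) in the notes.
    _testing: {
      resetCache: () => { cache = undefined; },
      getCache: () => cache,
    },
  };
}
```

Because the transform hook fires on every agent step, only the first call in a session touches the filesystem.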
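The Copilot CLI fix comes down to emitting a different JSON shape from the same session-start hook. The real hook is a shell script; this sketch only illustrates the two shapes the notes name. The `hookEventName: "SessionStart"` field follows Claude Code's hook output format, and treating any non-empty `COPILOT_CLI` value as a Copilot CLI session is an assumption.

```javascript
// Illustrative only: the real session-start hook is a shell script.
// Claude Code reads the context nested under hookSpecificOutput; Copilot CLI
// reads the SDK-standard top-level additionalContext.
function detectPlatform(env) {
  // Assumption: any non-empty COPILOT_CLI value means "running under Copilot CLI".
  return env.COPILOT_CLI ? "copilot-cli" : "claude-code";
}

function sessionStartOutput(context, platform) {
  if (platform === "copilot-cli") {
    return { additionalContext: context };
  }
  return {
    hookSpecificOutput: {
      hookEventName: "SessionStart",
      additionalContext: context,
    },
  };
}
```

A hook built this way would print `JSON.stringify(sessionStartOutput(text, detectPlatform(process.env)))` on stdout.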
Jesse Vincent	3f80f1c769	Reapply "Replace subagent review loops with lightweight inline self-review" This reverts commit `b045fa3950`.	2026-03-25 11:03:53 -07:00
Jesse Vincent	4ae1a3d6a6	Revert "Replace subagent review loops with lightweight inline self-review" This reverts commit `bf8f7572eb`.	2026-03-25 11:03:53 -07:00
Jesse Vincent	e6221a48c5	Replace subagent review loops with lightweight inline self-review The subagent review loop (dispatching a fresh agent to review plans/specs) doubled execution time (~25 min overhead) without measurably improving plan quality. Regression testing across 5 versions (v3.6.0 through v5.0.4) with 5 trials each showed identical plan sizes, task counts, and quality scores regardless of whether the review loop ran. Changes: - writing-plans: Replace subagent Plan Review Loop with inline Self-Review checklist (spec coverage, placeholder scan, type consistency) - writing-plans: Add explicit "No Placeholders" section listing plan failures (TBD, vague descriptions, undefined references, "similar to Task N") - brainstorming: Replace subagent Spec Review Loop with inline Spec Self-Review (placeholder scan, internal consistency, scope check, ambiguity check) - Both skills now use "look at it with fresh eyes" framing Testing: 5 trials with the new skill show self-review catches 3-5 real bugs per run (spawn positions, API mismatches, seed bugs, grid indexing) in ~30s instead of ~25 min. Remaining defects are comparable to the subagent approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-25 11:03:53 -07:00
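Most of the self-review checklist needs judgment, but the textual part of the placeholder scan can be approximated mechanically. This hypothetical pre-pass flags `TBD` and "similar to Task N" lines from the No Placeholders list. Matching `TODO` as well is an assumption, and vague descriptions or undefined references are left to the agent's own read-through.

```javascript
// Hypothetical pre-pass for the "placeholder scan" checklist item.
// Only textual patterns are covered; judgment calls stay with the agent.
const PLACEHOLDER_PATTERNS = [
  { label: "TBD/TODO marker", re: /\b(?:TBD|TODO)\b/ }, // TODO is an added assumption
  { label: "cross-reference instead of content", re: /\bsimilar to Task \d+\b/i },
];

function scanPlaceholders(planText) {
  const findings = [];
  planText.split(/\r?\n/).forEach((line, i) => {
    for (const { label, re } of PLACEHOLDER_PATTERNS) {
      // No /g flag, so re.test() keeps no lastIndex state between lines.
      if (re.test(line)) findings.push({ line: i + 1, label, text: line.trim() });
    }
  });
  return findings;
}
```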
jesse	f34ee479b7	fix: Windows brainstorm server lifecycle, restore execution choice - Skip OWNER_PID monitoring on Windows/MSYS2 where the PID namespace is invisible to Node.js, preventing server self-termination after 60s (#770) - Document run_in_background: true for Claude Code on Windows (#767) - Restore user choice between subagent-driven and inline execution after plan writing; subagent-driven is recommended but no longer mandatory - Add Windows lifecycle test script verified on Windows 11 VM - Note #723 (stop-server.sh reliability) as already fixed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 04:09:36 +00:00
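The owner-PID check that this commit disabled on Windows/MSYS2 was later reworked in the v5.1.0 entry above: EPERM counts as alive, and an owner that is already dead at startup disables monitoring in favor of the idle timeout. A minimal Node sketch of that later logic, using `process.kill(pid, 0)` as the probe as the v5.1.0 notes describe; `shouldMonitorOwner` is a hypothetical name.

```javascript
// Probe with signal 0: Node performs the existence/permission check
// without delivering a signal.
function ownerAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (e) {
    // EPERM: the process exists but we may not signal it (e.g. another user).
    // ESRCH: no such process.
    return e.code === "EPERM";
  }
}

// Hypothetical startup gate: an owner already dead at startup means the PID was
// resolved wrongly, so skip monitoring and rely on the idle timeout instead.
function shouldMonitorOwner(pid) {
  // pid <= 0 would hit kill()'s process-group semantics, so reject it outright.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  return ownerAlive(pid);
}
```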
Jesse Vincent	2c6a8a352d	Tone down review loops: single-pass plan review, raise issue bar - Remove chunk-based plan review in favor of single whole-plan review - Add Calibration sections to both reviewer prompts so only serious issues block approval - Reduce max review iterations from 5 to 3 - Streamline reviewer checklists (spec: 7→5, plan: 7→4 categories)	2026-03-16 15:57:23 -07:00
Jesse Vincent	9ccce3bf07	Add context isolation principle to all delegation skills Subagents should never inherit the parent session's context or history. The dispatcher constructs exactly what each subagent needs, keeping both sides focused: the subagent on its task, the controller on coordination. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 18:47:56 -07:00
Jesse Vincent	f3083e55b0	Replace 'For Claude' with 'For agentic workers' in plan headers	2026-03-06 19:33:30 -08:00
Jesse Vincent	d48b14e5ac	Add project-level scope assessment to brainstorming pipeline Brainstorming now assesses whether a project is too large for a single spec and helps decompose into sub-projects. Scope check is inline in the understanding phase (testing showed it was skipped as a separate step). Spec reviewer also checks scope. Writing-plans has a backstop.	2026-03-06 14:48:48 -08:00
Jesse Vincent	daa3fb2322	Add architecture guidance and capability-aware escalation to skills Add design-for-isolation and working-in-existing-codebases guidance to brainstorming. Add file size awareness and escalation prompts to SDD implementer and code quality reviewer. Writing-plans gets architecture section sizing guidance. Spec and plan reviewers get architecture and file size checks.	2026-03-06 14:48:48 -08:00
Jesse Vincent	7b99c39c08	Add plan review loop and checkbox syntax to writing-plans skill Plans now include a review loop dispatching plan-document-reviewer subagent. Checkbox syntax (- [ ]) on steps for tracking progress.	2026-03-06 14:26:27 -08:00
Jesse Vincent	5e51c3ee5a	feat: enforce subagent-driven-development on capable harnesses - Subagent-driven-development is now mandatory when harness supports it - No longer offer choice between subagent-driven and executing-plans - Executing-plans reserved for harnesses without subagent capability - Update plan header to reference both execution paths Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-06 13:01:31 -08:00
Jesse Vincent	f57638a747	refactor: restructure specs and plans directories - Specs (brainstorming output) now go to docs/superpowers/specs/ - Plans (writing-plans output) now go to docs/superpowers/plans/ - User preferences for locations override these defaults - Update all skill references and test files Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-06 13:01:31 -08:00
coltwindy	19df3db59b	fix(writing-plans): use 4-backtick fence for nested code blocks in Task Structure template	2026-02-12 12:40:35 +09:00
Jesse Vincent	030a222af1	Fix skill descriptions: remove workflow summaries that override flowcharts Testing revealed that descriptions summarizing workflow cause Claude to follow the description instead of reading the skill body. Changed all descriptions to "when to use" triggers only: - dispatching-parallel-agents: 2+ independent tasks without shared state - executing-plans: have a written plan to execute with review checkpoints - requesting-code-review: completing tasks, features, or before merging - systematic-debugging: encountering bugs before proposing fixes - test-driven-development: implementing features before writing code - writing-plans: have spec/requirements for multi-step task before coding - writing-skills: updated with "description trap" documentation The description trap: workflow summaries in descriptions create shortcuts Claude takes, skipping the skill body entirely.	2025-12-17 16:44:52 -08:00
Jesse Vincent	79436abffa	Update all superpowers skill references to use namespace prefix Skills are now namespaced as superpowers:<name> when referenced. Updated all REQUIRED SUB-SKILL, RECOMMENDED SUB-SKILL, and REQUIRED BACKGROUND references to use the superpowers: prefix. Also added -design suffix to brainstorming skill's design document filename to distinguish from implementation plan documents. Files updated: - brainstorming: Added -design suffix, updated skill references - executing-plans: Updated finishing-a-development-branch reference - subagent-driven-development: Updated finishing-a-development-branch reference - systematic-debugging: Updated root-cause-tracing and test-driven-development references - testing-skills-with-subagents: Updated test-driven-development reference - writing-plans: Updated executing-plans and subagent-driven-development references - writing-skills: Updated test-driven-development, systematic-debugging, and testing-skills-with-subagents references	2025-10-18 10:38:54 -07:00
Jesse Vincent	141953a4be	Improve skill cross-references for clarity and compliance Update all skill references to use explicit requirement markers: - REQUIRED BACKGROUND: For prerequisite understanding - REQUIRED SUB-SKILL: For mandatory workflow dependencies - Complementary skills: For optional but helpful related skills Changes: - Remove old path format (skills/collaboration/X → X) - Add explicit "REQUIRED" markers to make dependencies clear - Update Integration sections with categorized skill relationships - Fix non-existent skill references - Update cross-reference documentation in writing-skills This makes it immediately clear which skills MUST be used vs optional references, helping Claude understand and comply with skill dependencies.	2025-10-17 10:18:50 -07:00
Jesse Vincent	48410c7f19	Standardize skill frontmatter names to lowercase and kebab-case - Update all 20 skill frontmatter names to match their directory names in lowercase - Fix defense-in-depth name (was Defense-in-Depth-Validation) - Fix receiving-code-review name (was Code-Review-Reception) - Update all skill announcements and cross-references to use lowercase names - Update commands redirects to reference lowercase skill names Ensures consistent naming: skill directory names, frontmatter names, and documentation references all use lowercase kebab-case format (e.g., brainstorming, test-driven-development)	2025-10-17 09:40:36 -07:00
Jesse Vincent	9c9547cc04	Now that skills are a first-class thing in Claude Code, restore them to the primary plugin	2025-10-16 07:19:00 -07:00

20 Commits