Commit Graph

22 Commits

Author SHA1 Message Date
Jesse Vincent
471fe326c8 Lift superpowers:code-reviewer agent into the requesting-code-review skill
The plugin had a single named agent (`agents/code-reviewer.md`) used by
two skills, while every other reviewer/implementer subagent in the repo
is dispatched as `general-purpose` with the prompt template living
alongside its skill. That asymmetry had no upside and several costs:

- Two sources of truth for the code review checklist (the agent file
  and `requesting-code-review/code-reviewer.md`), both drifting
  independently.
- `Codex` users could not use the named agent directly; the codex-tools
  reference doc had a workaround section explaining how to flatten the
  named agent into a `worker` dispatch.
- No third-party reliance on `superpowers:code-reviewer` inside this
  repo.

Changes:
- Merge `agents/code-reviewer.md` (persona + checklist) and
  `skills/requesting-code-review/code-reviewer.md` (placeholder
  template) into a single self-contained Task-dispatch template,
  matching the shape of `implementer-prompt.md`,
  `spec-reviewer-prompt.md`, etc.
- Update `skills/requesting-code-review/SKILL.md` and
  `skills/subagent-driven-development/code-quality-reviewer-prompt.md`
  to dispatch `Task (general-purpose)` instead of the named agent.
- Drop the now-obsolete "Named agent dispatch" workaround sections from
  `codex-tools.md` and `copilot-tools.md` — superpowers no longer ships
  any named agents, so those instructions documented nothing.
- Delete `agents/code-reviewer.md` and the empty `agents/` directory.

Tier 3 coverage for the change: a new behavioral test
`tests/claude-code/test-requesting-code-review.sh` plants real bugs
(SQL injection, plaintext password handling, credential logging) into
a tiny project, runs the actual `requesting-code-review` skill against
the working tree, and asserts the dispatched reviewer flags every
planted issue at Critical/Important severity and refuses to approve
the diff.

Verified end-to-end on this branch:
- The new test passes (5/5 assertions; reviewer caught all planted
  bugs and several others).
- The existing SDD integration test still passes (7/7 subagents
  dispatched, all as `general-purpose`; spec compliance still
  rejects extra features; produced code is correct).
- Session JSONLs confirm zero remaining `superpowers:code-reviewer`
  dispatches anywhere in the SDD pipeline.
2026-04-30 14:26:30 -07:00
Jesse Vincent
e795530c23 fix(tests): make SDD integration test actually run its assertions
The SDD integration test silently bailed before printing any verification
results. Three independent bugs caused this:

1. `WORKING_DIR_ESCAPED` was computed from `$SCRIPT_DIR/../..` without
   resolving `..` segments. The resulting "directory" name contained
   literal `..` so `find` was looking in a path that doesn't exist.

2. With `set -euo pipefail`, the `find ... | sort -r | head -1` pipeline
   could exit non-zero (SIGPIPE on the producer when head closes early),
   killing the script silently before assertions ran.

3. The `claude -p` invocation never passed `--plugin-dir`, so it loaded
   the installed plugin instead of the working tree. Local edits to
   skills under test were not actually being tested.

Other adjustments:
- Run claude from inside the unique TEST_PROJECT directory instead of
  from the plugin root, so its session JSONL lives in its own
  `~/.claude/projects/` folder and doesn't race other concurrent
  claude sessions for "most recent file".
- Use the same character-normalization claude does (every non-alphanumeric
  becomes `-`) when computing the session dir name; macOS-resolved
  `/private/var/...` paths and tmp dirs with `.`/`_` in their names need
  this to round-trip correctly.
- Accept either `"name":"Agent"` or `"name":"Task"` in the subagent count
  — the harness renamed the tool but the test wasn't updated.

Verified on this branch: all six verification tests now pass against a
real end-to-end SDD run (skill invoked, 7 subagents dispatched, 6
TodoWrite calls, working code produced, tests pass, no extra features).
2026-04-28 12:20:31 -07:00
Drew Ritter
61ad4821da fix: Step 1a validated through TDD — explicit naming + consent bridge (PRI-974)
Step 1a failed at 2/6 with the spec's original abstract text ("use your
native tool"). Three REFACTOR iterations found what works (50/50 runs):

1. Explicit tool naming — "do you have EnterWorktree, WorktreeCreate..."
   transforms interpretation into factual toolkit check
2. Consent bridge — "user's consent is your authorization" directly
   addresses EnterWorktree's "ONLY when user explicitly asks" guardrail
3. Red Flag entry naming the specific anti-pattern

File split was tested but proven unnecessary — the fix is the Step 1a
text quality, not physical separation of git commands. Control test
with full 240-line skill (all git commands visible) passed 20/20.

Test script updated: supports batch runs (./test.sh green 20), "all"
phase, and checks absence of git worktree add (reliable signal) rather
than presence of EnterWorktree text (agent sometimes omits tool name).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:13:19 -07:00
Drew Ritter
abaaf8a6e6 test: add RED/GREEN validation for native worktree preference (PRI-974)
Gate test for Step 1a — validates agents prefer EnterWorktree over
git worktree add on Claude Code. Must pass before skill rewrite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:13:19 -07:00
daniel-graham
991e9d4de9 fix: replace bare except with except Exception
Co-authored-by: daniel-graham <daniel-graham@users.noreply.github.com>
2026-03-09 17:10:07 -07:00
Jesse Vincent
69eaf3cf34 Add end-to-end tests for document review system 2026-03-06 14:48:46 -08:00
Jesse Vincent
f57638a747 refactor: restructure specs and plans directories
- Specs (brainstorming output) now go to docs/superpowers/specs/
- Plans (writing-plans output) now go to docs/superpowers/plans/
- User preferences for locations override these defaults
- Update all skill references and test files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-06 13:01:31 -08:00
CL Kao
c7816ee2a6 docs: change main branch red flag to require explicit user consent
Instead of prohibiting main branch work entirely, allow it with explicit
user consent. This is more flexible while still ensuring users are aware
of the implications.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 15:12:50 -08:00
CL Kao
f8dbe7b196 test: add Test 9 - main branch red flag warning
TDD: Test verifies that subagent-driven-development skill warns
against starting implementation directly on main/master branch.
Test expects skill to recommend worktree or feature branch instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:41:39 -08:00
CL Kao
93cf2ee84f test: add worktree requirement test for subagent-driven-development
Add Test 8 to verify that using-git-worktrees is mentioned as a required
skill for subagent-driven-development. This test will initially fail per
TDD approach - the skill file needs to be updated to pass this test.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:39:50 -08:00
CL Kao
1872f50b64 fix(tests): handle case variations in skill recognition test
The assertion now matches "subagent-driven-development", "Subagent-Driven
Development", and "Subagent Driven" since Claude's responses may use
different casing and formatting styles.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 14:34:38 -08:00
Jesse Vincent
faa65e7163 Add token usage analysis to subagent-driven-development test
- Rewrote analyze-token-usage.py to parse main session file correctly
- Extracts usage from toolUseResult fields for each subagent
- Shows breakdown by agent with descriptions
- Integrated into test-subagent-driven-development-integration.sh
- Displays token usage automatically after each test run
2025-12-17 16:44:50 -08:00
Jesse Vincent
24ca8cd9d5 fix: verify skill usage via session transcript not text output
The skill instructions are internal and don't appear in user-facing
output. Updated verification to parse the session JSONL transcript
and check for actual tool usage:
- Skill tool invocation
- Task tool (subagents)
- TodoWrite (tracking)
- Implementation results
2025-12-17 16:44:50 -08:00
Jesse Vincent
8fbeca830a test: use bypassPermissions mode for unrestricted testing
dontAsk mode was auto-denying Write tool. Use bypassPermissions
instead to allow full tool access in this controlled test environment.
2025-12-17 16:44:50 -08:00
Jesse Vincent
67de772d7f test: auto-approve permissions with --permission-mode dontAsk
Headless tests need automatic permission approval to write files.
Using dontAsk mode to auto-approve permissions for test directory.
2025-12-17 16:44:50 -08:00
Jesse Vincent
cf72863792 test: add --add-dir flag for temp directory access
Claude needs explicit permission to access the temp test directory.
Added --add-dir flag to grant access to the test project.
2025-12-17 16:44:50 -08:00
Jesse Vincent
baa23b16bb test: show Claude output in real-time during integration test
Use tee instead of redirection so test output is visible during
execution while still being saved to file for analysis.
2025-12-17 16:44:50 -08:00
Jesse Vincent
0aba33be1c fix: run integration test from superpowers dir to access local dev skills
The superpowers-dev marketplace makes skills available only when running
from the plugin directory. Updated test to run claude from superpowers
directory while working on the test project.
2025-12-17 16:44:50 -08:00
Jesse Vincent
06310d6f5f Fix tests to use --allowed-tools flag
Claude Code headless mode requires --allowed-tools flag to actually
execute tool calls. Without it, Claude only responds as if it's doing
things but doesn't actually use tools.

Changes:
- Updated run_claude helper to accept allowed_tools parameter
- Updated integration test to use --allowed-tools=all
- This enables actual tool execution (Write, Task, Bash, etc.)

Now the integration test should actually execute the workflow instead
of just talking about it.
2025-12-17 16:44:50 -08:00
Jesse Vincent
dc11a093c3 Fix syntax error in integration test
Simplified command substitution to avoid shell parsing issues.
Instead of nested heredoc in command substitution, write prompt
to file first then read it.
2025-12-17 16:44:50 -08:00
Jesse Vincent
fa946ae465 Add integration test for subagent-driven-development
Created full end-to-end integration test that executes a real plan
and verifies the new workflow improvements actually work.

New test: test-subagent-driven-development-integration.sh
- Creates real Node.js test project
- Generates implementation plan (2 tasks)
- Executes using subagent-driven-development skill
- Verifies 8 key behaviors:
  1. Plan read once at beginning (not per task)
  2. Full task text provided to subagents (not file reading)
  3. Subagents perform self-review
  4. Spec compliance review before code quality
  5. Spec reviewer reads code independently
  6. Working implementation produced
  7. Tests pass
  8. No extra features added (spec compliance)

Integration tests are opt-in (--integration flag) due to 10-30 min runtime.

Updated run-skill-tests.sh:
- Added --integration flag
- Separates fast tests from integration tests
- Shows note when integration tests skipped

Updated README with integration test documentation.

Run with:
  ./run-skill-tests.sh                # Fast tests only
  ./run-skill-tests.sh --integration  # Include integration tests
2025-12-17 16:44:50 -08:00
Jesse Vincent
51a171cd14 Add Claude Code skills test framework
Created automated test suite for testing superpowers skills using
Claude Code CLI in headless mode.

New files:
- tests/claude-code/run-skill-tests.sh - Main test runner
- tests/claude-code/test-helpers.sh - Helper functions for testing
- tests/claude-code/test-subagent-driven-development.sh - First test
- tests/claude-code/README.md - Documentation

Test framework features:
- Run Claude Code with prompts and capture output
- Assertion helpers (contains, not_contains, count, order)
- Test project creation helpers
- Timeout support (default 5 minutes)
- Verbose mode for debugging
- Specific test selection

First test verifies subagent-driven-development skill:
- Skill loading
- Workflow ordering (spec compliance before code quality)
- Self-review requirements
- Plan reading efficiency (read once)
- Spec compliance reviewer skepticism
- Review loops
- Task context provision

Run with: cd tests/claude-code && ./run-skill-tests.sh
2025-12-17 16:44:50 -08:00