Files
superpowers/tests/claude-code/README.md
Jesse Vincent 471fe326c8 Lift superpowers:code-reviewer agent into the requesting-code-review skill
The plugin had a single named agent (`agents/code-reviewer.md`) used by
two skills, while every other reviewer/implementer subagent in the repo
is dispatched as `general-purpose` with the prompt template living
alongside its skill. That asymmetry had no upside and several costs:

- Two sources of truth for the code review checklist (the agent file
  and `requesting-code-review/code-reviewer.md`), both drifting
  independently.
- `Codex` users could not use the named agent directly; the codex-tools
  reference doc had a workaround section explaining how to flatten the
  named agent into a `worker` dispatch.
- No third-party reliance on `superpowers:code-reviewer` inside this
  repo.

Changes:
- Merge `agents/code-reviewer.md` (persona + checklist) and
  `skills/requesting-code-review/code-reviewer.md` (placeholder
  template) into a single self-contained Task-dispatch template,
  matching the shape of `implementer-prompt.md`,
  `spec-reviewer-prompt.md`, etc.
- Update `skills/requesting-code-review/SKILL.md` and
  `skills/subagent-driven-development/code-quality-reviewer-prompt.md`
  to dispatch `Task (general-purpose)` instead of the named agent.
- Drop the now-obsolete "Named agent dispatch" workaround sections from
  `codex-tools.md` and `copilot-tools.md` — superpowers no longer ships
  any named agents, so those instructions documented nothing.
- Delete `agents/code-reviewer.md` and the empty `agents/` directory.

Tier 3 coverage for the change: a new behavioral test
`tests/claude-code/test-requesting-code-review.sh` plants real bugs
(SQL injection, plaintext password handling, credential logging) into
a tiny project, runs the actual `requesting-code-review` skill against
the working tree, and asserts the dispatched reviewer flags every
planted issue at Critical/Important severity and refuses to approve
the diff.

Verified end-to-end on this branch:
- The new test passes (5/5 assertions; reviewer caught all planted
  bugs and several others).
- The existing SDD integration test still passes (7/7 subagents
  dispatched, all as `general-purpose`; spec compliance still
  rejects extra features; produced code is correct).
- Session JSONLs confirm zero remaining `superpowers:code-reviewer`
  dispatches anywhere in the SDD pipeline.
2026-04-30 14:26:30 -07:00

171 lines
4.8 KiB
Markdown

# Claude Code Skills Tests
Automated tests for superpowers skills using Claude Code CLI.
## Overview
This test suite verifies that skills are loaded correctly and Claude follows them as expected. Tests invoke Claude Code in headless mode (`claude -p`) and verify the behavior.
## Requirements
- Claude Code CLI installed and in PATH (`claude --version` should work)
- Local superpowers plugin installed (see main README for installation)
## Running Tests
### Run all fast tests (recommended):
```bash
./run-skill-tests.sh
```
### Run integration tests (slow, 10-30 minutes):
```bash
./run-skill-tests.sh --integration
```
### Run specific test:
```bash
./run-skill-tests.sh --test test-subagent-driven-development.sh
```
### Run with verbose output:
```bash
./run-skill-tests.sh --verbose
```
### Set custom timeout:
```bash
./run-skill-tests.sh --timeout 1800 # 30 minutes for integration tests
```
## Test Structure
### test-helpers.sh
Common functions for skills testing:
- `run_claude "prompt" [timeout]` - Run Claude with prompt
- `assert_contains output pattern name` - Verify pattern exists
- `assert_not_contains output pattern name` - Verify pattern absent
- `assert_count output pattern count name` - Verify exact count
- `assert_order output pattern_a pattern_b name` - Verify order
- `create_test_project` - Create temp test directory
- `create_test_plan project_dir` - Create sample plan file
### Test Files
Each test file:
1. Sources `test-helpers.sh`
2. Runs Claude Code with specific prompts
3. Verifies expected behavior using assertions
4. Returns 0 on success, non-zero on failure
## Example Test
```bash
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$SCRIPT_DIR/test-helpers.sh"
echo "=== Test: My Skill ==="
# Ask Claude about the skill
output=$(run_claude "What does the my-skill skill do?" 30)
# Verify response
assert_contains "$output" "expected behavior" "Skill describes behavior"
echo "=== All tests passed ==="
```
## Current Tests
### Fast Tests (run by default)
#### test-subagent-driven-development.sh
Tests skill content and requirements (~2 minutes):
- Skill loading and accessibility
- Workflow ordering (spec compliance before code quality)
- Self-review requirements documented
- Plan reading efficiency documented
- Spec compliance reviewer skepticism documented
- Review loops documented
- Task context provision documented
### Integration Tests (use --integration flag)
#### test-subagent-driven-development-integration.sh
Full workflow execution test (~10-30 minutes):
- Creates real test project with Node.js setup
- Creates implementation plan with 2 tasks
- Executes plan using subagent-driven-development
- Verifies actual behaviors:
- Plan read once at start (not per task)
- Full task text provided in subagent prompts
- Subagents perform self-review before reporting
- Spec compliance review happens before code quality
- Spec reviewer reads code independently
- Working implementation is produced
- Tests pass
- Proper git commits created
**What it tests:**
- The workflow actually works end-to-end
- Our improvements are actually applied
- Subagents follow the skill correctly
- Final code is functional and tested
#### test-requesting-code-review.sh
Behavioral test for the code reviewer subagent (~5 minutes):
- Builds a tiny project with a baseline commit
- Adds a second commit that plants two real bugs (SQL injection, plaintext password handling)
- Dispatches the code reviewer via the requesting-code-review skill
- Verifies the reviewer flags the planted bugs at Critical/Important severity and refuses to approve
**What it tests:**
- The skill actually dispatches a working code reviewer subagent
- The reviewer template produces reviewers that catch obvious security bugs
- The reviewer is not sycophantic — it does not approve a diff with planted Critical issues
## Adding New Tests
1. Create new test file: `test-<skill-name>.sh`
2. Source test-helpers.sh
3. Write tests using `run_claude` and assertions
4. Add to test list in `run-skill-tests.sh`
5. Make executable: `chmod +x test-<skill-name>.sh`
## Timeout Considerations
- Default timeout: 5 minutes per test
- Claude Code may take time to respond
- Adjust with `--timeout` if needed
- Tests should be focused to avoid long runs
## Debugging Failed Tests
With `--verbose`, you'll see full Claude output:
```bash
./run-skill-tests.sh --verbose --test test-subagent-driven-development.sh
```
Without verbose, only failures show output.
## CI/CD Integration
To run in CI:
```bash
# Run with explicit timeout for CI environments
./run-skill-tests.sh --timeout 900
# Exit code 0 = success, non-zero = failure
```
## Notes
- Tests verify skill *instructions*, not full execution
- Full workflow tests would be very slow
- Focus on verifying key skill requirements
- Tests should be deterministic
- Avoid testing implementation details