Mirror of https://github.com/obra/superpowers.git, synced 2026-05-01 06:29:05 +08:00
The plugin had a single named agent (`agents/code-reviewer.md`) used by two skills, while every other reviewer/implementer subagent in the repo is dispatched as `general-purpose` with the prompt template living alongside its skill. That asymmetry had no upside and several costs:

- Two sources of truth for the code review checklist (the agent file and `requesting-code-review/code-reviewer.md`), both drifting independently.
- Codex users could not use the named agent directly; the codex-tools reference doc had a workaround section explaining how to flatten the named agent into a `worker` dispatch.
- No third-party reliance on `superpowers:code-reviewer` inside this repo.

Changes:

- Merge `agents/code-reviewer.md` (persona + checklist) and `skills/requesting-code-review/code-reviewer.md` (placeholder template) into a single self-contained Task-dispatch template, matching the shape of `implementer-prompt.md`, `spec-reviewer-prompt.md`, etc.
- Update `skills/requesting-code-review/SKILL.md` and `skills/subagent-driven-development/code-quality-reviewer-prompt.md` to dispatch `Task (general-purpose)` instead of the named agent.
- Drop the now-obsolete "Named agent dispatch" workaround sections from `codex-tools.md` and `copilot-tools.md`: superpowers no longer ships any named agents, so those instructions documented nothing.
- Delete `agents/code-reviewer.md` and the empty `agents/` directory.

Tier 3 coverage for the change: a new behavioral test `tests/claude-code/test-requesting-code-review.sh` plants real bugs (SQL injection, plaintext password handling, credential logging) into a tiny project, runs the actual `requesting-code-review` skill against the working tree, and asserts the dispatched reviewer flags every planted issue at Critical/Important severity and refuses to approve the diff.

Verified end-to-end on this branch:

- The new test passes (5/5 assertions; the reviewer caught all planted bugs and several others).
- The existing SDD integration test still passes (7/7 subagents dispatched, all as `general-purpose`; spec compliance still rejects extra features; the produced code is correct).
- Session JSONLs confirm zero remaining `superpowers:code-reviewer` dispatches anywhere in the SDD pipeline.
Claude Code Skills Tests
Automated tests for superpowers skills using Claude Code CLI.
Overview
This test suite verifies that skills are loaded correctly and that Claude follows them as expected. Tests invoke Claude Code in headless mode (`claude -p`) and verify the behavior.
Requirements
- Claude Code CLI installed and in PATH (`claude --version` should work)
- Local superpowers plugin installed (see main README for installation)
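Both prerequisites can be checked from a shell before running the suite. The sketch below is illustrative only: `check_cli` is an invented helper, not part of the suite.

```shell
#!/usr/bin/env bash
# Hypothetical preflight sketch; `check_cli` is illustrative, not part of
# the suite's helpers.
check_cli() {
  local cli="$1"
  if command -v "$cli" >/dev/null 2>&1; then
    echo "found: $cli"
  else
    echo "missing: $cli (install it and re-run)"
  fi
}

check_cli bash     # sanity check: present on any machine running these tests
check_cli claude   # the Claude Code CLI this suite requires
```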
Running Tests
Run all fast tests (recommended):
./run-skill-tests.sh
Run integration tests (slow, 10-30 minutes):
./run-skill-tests.sh --integration
Run specific test:
./run-skill-tests.sh --test test-subagent-driven-development.sh
Run with verbose output:
./run-skill-tests.sh --verbose
Set custom timeout:
./run-skill-tests.sh --timeout 1800 # 30 minutes for integration tests
Test Structure
test-helpers.sh
Common functions for skills testing:
- `run_claude "prompt" [timeout]` - Run Claude with prompt
- `assert_contains output pattern name` - Verify pattern exists
- `assert_not_contains output pattern name` - Verify pattern absent
- `assert_count output pattern count name` - Verify exact count
- `assert_order output pattern_a pattern_b name` - Verify order
- `create_test_project` - Create temp test directory
- `create_test_plan project_dir` - Create sample plan file
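As a rough sketch of how two of these assertion helpers might be implemented (grep-based substring matching is an assumption; the real `test-helpers.sh` may differ):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of two test-helpers.sh assertions; the real helpers
# may differ (grep-based matching is an assumption).
set -euo pipefail

PASS=0
FAIL=0

# assert_contains output pattern name - pass if pattern appears in output
assert_contains() {
  local output="$1" pattern="$2" name="$3"
  if grep -q -- "$pattern" <<<"$output"; then
    echo "PASS: $name"; PASS=$((PASS + 1))
  else
    echo "FAIL: $name (pattern not found: $pattern)"; FAIL=$((FAIL + 1))
  fi
}

# assert_not_contains output pattern name - pass if pattern is absent
assert_not_contains() {
  local output="$1" pattern="$2" name="$3"
  if grep -q -- "$pattern" <<<"$output"; then
    echo "FAIL: $name (unexpected pattern: $pattern)"; FAIL=$((FAIL + 1))
  else
    echo "PASS: $name"; PASS=$((PASS + 1))
  fi
}

output="Reviewer flagged SQL injection at Critical severity"
assert_contains "$output" "SQL injection" "flags SQL injection"
assert_not_contains "$output" "Approved" "does not approve"
echo "passed=$PASS failed=$FAIL"
```

Counting passes and failures rather than exiting on the first miss lets a test report every broken assertion in one run.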
Test Files
Each test file:
- Sources `test-helpers.sh`
- Runs Claude Code with specific prompts
- Verifies expected behavior using assertions
- Returns 0 on success, non-zero on failure
Example Test
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$SCRIPT_DIR/test-helpers.sh"
echo "=== Test: My Skill ==="
# Ask Claude about the skill
output=$(run_claude "What does the my-skill skill do?" 30)
# Verify response
assert_contains "$output" "expected behavior" "Skill describes behavior"
echo "=== All tests passed ==="
Current Tests
Fast Tests (run by default)
test-subagent-driven-development.sh
Tests skill content and requirements (~2 minutes):
- Skill loading and accessibility
- Workflow ordering (spec compliance before code quality)
- Self-review requirements documented
- Plan reading efficiency documented
- Spec compliance reviewer skepticism documented
- Review loops documented
- Task context provision documented
Integration Tests (use --integration flag)
test-subagent-driven-development-integration.sh
Full workflow execution test (~10-30 minutes):
- Creates real test project with Node.js setup
- Creates implementation plan with 2 tasks
- Executes plan using subagent-driven-development
- Verifies actual behaviors:
- Plan read once at start (not per task)
- Full task text provided in subagent prompts
- Subagents perform self-review before reporting
- Spec compliance review happens before code quality
- Spec reviewer reads code independently
- Working implementation is produced
- Tests pass
- Proper git commits created
What it tests:
- The workflow actually works end-to-end
- Our improvements are actually applied
- Subagents follow the skill correctly
- Final code is functional and tested
test-requesting-code-review.sh
Behavioral test for the code reviewer subagent (~5 minutes):
- Builds a tiny project with a baseline commit
- Adds a second commit that plants two real bugs (SQL injection, plaintext password handling)
- Dispatches the code reviewer via the requesting-code-review skill
- Verifies the reviewer flags the planted bugs at Critical/Important severity and refuses to approve
What it tests:
- The skill actually dispatches a working code reviewer subagent
- The reviewer template produces reviewers that catch obvious security bugs
- The reviewer is not sycophantic: it does not approve a diff with planted Critical issues
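To make the "planted bug" idea concrete, here is a hedged sketch of what planting one might look like; the file name and contents are invented for this illustration, and the real test's project and commits differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of planting a bug for a reviewer to catch; the real
# test's project layout, file names, and commits differ.
set -euo pipefail

tmpdir="$(mktemp -d)"
cat > "$tmpdir/login.js" <<'EOF'
// Planted bug: SQL injection via string concatenation
function findUser(db, username) {
  return db.query("SELECT * FROM users WHERE name = '" + username + "'");
}
EOF

# A reviewer run against this change should flag the concatenated query.
grep -q "SELECT \* FROM users" "$tmpdir/login.js" && echo "bug planted"
```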
Adding New Tests
- Create new test file: `test-<skill-name>.sh`
- Source `test-helpers.sh`
- Write tests using `run_claude` and assertions
- Add to test list in `run-skill-tests.sh`
- Make executable: `chmod +x test-<skill-name>.sh`
Timeout Considerations
- Default timeout: 5 minutes per test
- Claude Code may take time to respond
- Adjust with `--timeout` if needed
- Tests should be focused to avoid long runs
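One plausible way a per-test timeout can be enforced is with coreutils `timeout`, which exits with status 124 when the deadline passes. This is only a sketch of what `run_claude`'s timeout handling might resemble; `run_with_timeout` is an invented name, and `timeout` may need installing separately on macOS.

```shell
#!/usr/bin/env bash
# Hypothetical sketch using coreutils `timeout` (exit status 124 on expiry);
# run_claude's actual timeout mechanism may differ.
run_with_timeout() {
  local secs="$1"; shift
  timeout "$secs" "$@"
}

run_with_timeout 5 sleep 1 && echo "finished within timeout"
run_with_timeout 1 sleep 3 || echo "timed out with exit $?"
```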
Debugging Failed Tests
With --verbose, you'll see full Claude output:
./run-skill-tests.sh --verbose --test test-subagent-driven-development.sh
Without verbose, only failures show output.
CI/CD Integration
To run in CI:
# Run with explicit timeout for CI environments
./run-skill-tests.sh --timeout 900
# Exit code 0 = success, non-zero = failure
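Since CI gates purely on that exit code, the gating logic can be sketched as follows; `run_suite` is a stand-in for `./run-skill-tests.sh` so the snippet is self-contained.

```shell
#!/usr/bin/env bash
# Minimal sketch of gating a CI step on the suite's exit code; `run_suite`
# is a stand-in for ./run-skill-tests.sh so the sketch runs anywhere.
run_suite() { return "${1:-0}"; }

if run_suite 0; then echo "CI: tests passed"; else echo "CI: tests failed"; fi
if run_suite 1; then echo "CI: tests passed"; else echo "CI: tests failed"; fi
```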
Notes
- Tests verify skill instructions, not full execution
- Full workflow tests would be very slow
- Focus on verifying key skill requirements
- Tests should be deterministic
- Avoid testing implementation details