Compare commits

20 Commits

Author SHA1 Message Date
Drew Ritter
bad4708a7b evals: use pre-commit hooks 2026-05-06 15:41:52 -07:00
Drew Ritter
ec9b96a7bf evals: add Gemini 2.5 Flash backend 2026-05-06 15:09:59 -07:00
Drew Ritter
2d4cdea2bb evals: drop drill source marker 2026-05-06 14:55:14 -07:00
Drew Ritter
af465f9687 evals: remove unreleased wave scenarios 2026-05-06 14:43:08 -07:00
Jesse Vincent
e4191c3609 Address adversarial review findings
- evals/README.md, evals/CLAUDE.md: fix uv install command from
  'uv sync --dev' to 'uv sync --extra dev'. Drill's pyproject.toml
  uses [project.optional-dependencies], so --dev is a no-op for
  pytest/ruff/ty; --extra dev is the correct invocation.
- tests/claude-code/run-skill-tests.sh: drop test-requesting-code-review.sh
  from integration_tests array (file deleted earlier in this branch).
- tests/claude-code/README.md: replace test-requesting-code-review.sh
  section with test-worktree-native-preference.sh (the worktree test
  is kept; the code-review test was lifted into drill).
- docs/testing.md, CLAUDE.md: remove "Copilot CLI" from the harness
  list. evals/backends/ has claude*, codex, gemini configs but no
  copilot.yaml, so the claim was unsupported.

Adversarial review credit: reviewer #2 found four legitimate issues
(uv-sync, run-skill-tests stale ref, README stale ref via #1, and
Copilot CLI fabrication); reviewer #1 found two distinct issues
(run-skill-tests + tests/claude-code/README.md). Reviewer #2 wins
this round.
2026-05-06 12:41:28 -07:00
Jesse Vincent
d545612825 docs: introduce evals/ as the canonical skill-behavior eval harness
- docs/testing.md split into Plugin tests + Skill behavior evals.
  Plugin tests section enumerates the bash tests that survive
  (kept by drill-coverage analysis or as describe-skill tests).
- CLAUDE.md adds Eval harness section pointing at evals/.
- README.md Contributing section mentions evals/ alongside tests/.
- .gitignore adds evals/{results,.venv,.env} as belt-and-suspenders
  (evals/.gitignore covers these locally; root-level entries help
  tooling that does not recurse into nested ignore files).
2026-05-06 12:33:10 -07:00
Jesse Vincent
b43d14f87f docs: annotate dated artifacts referencing lifted bash tests
- RELEASE-NOTES.md: note that test-requesting-code-review.sh and
  test-document-review-system.sh were lifted into drill scenarios
  on 2026-05-06; references are preserved as dated artifacts.
- docs/superpowers/plans/2026-03-23-codex-app-compatibility.md:
  note that tests/skill-triggering/ was lifted into drill scenarios
  on 2026-05-06; the run-all.sh reference is a dated artifact.

Subagent second-pass scrub confirmed no other active references in
the tree (excluding evals/ and the spec/plan for this work itself).
2026-05-06 12:32:00 -07:00
Jesse Vincent
11d5db1b22 tests: annotate three kept bash tests with drill coverage notes
- test-worktree-native-preference.sh: drill covers PRESSURE phase only;
  RED + GREEN baselines have no drill counterpart and are kept so
  the RED-GREEN-REFACTOR validation remains rerunnable end-to-end.
- test-subagent-driven-development-integration.sh: drill covers the
  YAGNI subset (forbidden exports + reviewer-as-gate). Bash adds
  >=3 commits, >=2 subagent dispatches, TodoWrite usage, test file
  existence check, and token-budget telemetry. Kept until drill
  scenario covers those or they are retired.
- test-subagent-driven-development.sh: tests agent's ability to
  *describe* SDD (string matches against expected keywords). Drill
  scenarios test behavior, not description-recall. Kept by design.

Subagent verification recorded in commit messages of subsequent
deletions; gap analyses driving these annotations are also in the
verification subagent reports for the gating sweep.
2026-05-06 12:29:59 -07:00
Jesse Vincent
051bff661b tests: remove test-requesting-code-review.sh (covered by drill code-review-catches-planted-bugs)
Subagent verification: every bash assertion (skill invocation,
subagent dispatch, SQL injection flagged, credential handling
flagged, no merge approval) maps to drill verify checks. Drill is
stricter: bundles severity (Critical/Important) into the same
criteria as the finding itself (bash split severity into a separate
test). Setup parity covered (src/db.js with string concat + identity
hash, two commits).

The drill scenario header explicitly says it is the
"cross-harness, semantically-judged replacement for the bash test."
2026-05-06 12:28:40 -07:00
Jesse Vincent
dc6255291b tests: remove test-document-review-system.sh (covered by drill spec-reviewer-catches-planted-flaws)
Subagent verification: every bash assertion (TODO in Requirements
section flagged, "specified later" deferral flagged, Issues section
present, did-not-approve verdict) maps to drill verify.criteria
entries. Setup parity covered by setup.assertions (test-feature-design.md
exists with TODO + 'specified later' content). Drill is stricter:
asserts tool-called Agent (subagent dispatch) which the bash test
did not check.
2026-05-06 12:28:40 -07:00
Jesse Vincent
d337f4a18a tests: remove subagent-driven-dev fixtures (covered by drill sdd-go-fractals + sdd-svelte-todo)
The bash test had ZERO output assertions — it just ran claude -p
and printed token usage. Drill's scenarios are strictly more
rigorous:

go-fractals: skill-called SDD + tool-called Agent + go test ./...
passes + cmd/fractals/main.go exists + >=4 commits + LLM criteria
verifying real SDD workflow.

svelte-todo: skill-called SDD + tool-called Agent + npm test passes
+ playwright e2e passes + package.json + svelte.config.js or
vite.config.ts + >=4 commits + LLM criteria.

design.md and plan.md are byte-identical between bash fixtures and
drill fixtures (evals/fixtures/sdd-{go-fractals,svelte-todo}/).
Drill's setup helper (scaffold_sdd_*) forces git init -b main
(stricter than bash's reliance on init.defaultBranch). The
.claude/settings.local.json from bash scaffold.sh is unnecessary
for drill since permissions are managed via backend YAML.

Subagent verification: SAFE TO DELETE for both.
2026-05-06 12:27:31 -07:00
Jesse Vincent
6fe9cf7515 tests: remove run-claude-describes-sdd.sh (covered by drill mid-conversation-skill-invocation)
Subagent verification: every bash assertion (Skill tool invoked +
specific skill name 'subagent-driven-development' loaded after the
agent describes it conversationally in turn 1) maps to the drill
scenario's skill-called assertion + criteria paragraph requiring
the skill to fire in direct response to the second user message.
Drill additionally asserts tool-called Agent (subagent dispatch)
which is stricter than the bash test.

Other runners in tests/explicit-skill-requests/ (haiku, multiturn,
extended-multiturn) and their prompt files are preserved — they
have no drill coverage and exercise different behaviors.
2026-05-06 12:25:46 -07:00
Jesse Vincent
3177c87aa8 tests: remove skill-triggering bash prompts (covered by drill triggering-* scenarios)
Subagent verification confirmed each prompt's intent matches its
corresponding drill scenario's turns[].intent verbatim, and each
scenario has both a deterministic skill-called assertion and a
semantic LLM criterion confirming the matching skill was loaded
(actually a stronger check than the bash test, which only confirms
the skill fires anywhere in the stream).

All 6 prompts deleted. The runner had no remaining prompts to drive,
so run-test.sh and run-all.sh deleted as well.
2026-05-06 12:24:53 -07:00
Jesse Vincent
a94d2cc414 evals: drop SUPERPOWERS_ROOT setup step from README/CLAUDE
The cli.py helper now defaults the env var. Mention as override only.
2026-05-06 12:21:35 -07:00
Jesse Vincent
dcffaa087a evals: drop SUPERPOWERS_ROOT from codex/gemini required_env
These backends only read SUPERPOWERS_ROOT via engine.py/setup.py's
os.environ access, which the new cli.py default helper supplies
automatically. claude*.yaml keep SUPERPOWERS_ROOT in required_env
because they interpolate ${SUPERPOWERS_ROOT} into --plugin-dir args.
2026-05-06 12:20:47 -07:00
Jesse Vincent
b3817bba4f evals: default SUPERPOWERS_ROOT to parent of evals/ if unset
Adds _set_superpowers_root_default() to drill/cli.py, called at
module import after load_dotenv(). PROJECT_ROOT resolves to evals/
post-lift; its parent is the superpowers repo root, which is the
correct value for SUPERPOWERS_ROOT.

Existing env values are respected as overrides via os.environ.setdefault.

Tests:
- helper sets default when var is unset
- helper does not override when var is already set
2026-05-06 12:19:39 -07:00
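The default-if-unset rule in b3817bba4f can be illustrated with a shell analogue. This is only a sketch: the real helper is Python (`_set_superpowers_root_default()` in drill/cli.py, using `os.environ.setdefault`), and the function name and `project_root` argument below are hypothetical.

```shell
#!/usr/bin/env bash
# Shell analogue of the default-if-unset rule; NOT the actual Python helper.
# Hypothetical function: takes the evals/ directory as its only argument.
set_superpowers_root_default() {
  local project_root="$1"
  # "${VAR=value}" assigns only when VAR is unset, so an existing value
  # (including one loaded earlier from a .env file) is kept as an override.
  : "${SUPERPOWERS_ROOT=$(dirname "$project_root")}"
  export SUPERPOWERS_ROOT
}
```

The `=` form is used rather than `:=` so that an explicitly empty value is left alone, which is closer to how `setdefault` treats a key that is already present.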
Jesse Vincent
3c046f579e Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
2026-05-06 12:15:46 -07:00
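A rough sketch of how the copy described in 3c046f579e could be assembled. The exclude patterns are taken from the commit message; the `-a` flag, the helper name `build_rsync_args`, and the source and destination paths are assumptions, not the command that was actually run.

```shell
#!/usr/bin/env bash
# Exclude patterns as listed in the commit message.
excludes=(.git/ .venv/ results/ .env/ __pycache__/ '*.egg-info/' .private-journal/)

# Hypothetical helper: print an rsync argument list, one argument per line.
build_rsync_args() {
  local src="$1" dest="$2" pattern
  printf '%s\n' -a                      # archive mode (assumed)
  for pattern in "${excludes[@]}"; do
    printf '%s\n' "--exclude=$pattern"
  done
  # Trailing slash on the source copies its contents, not the directory itself.
  printf '%s\n' "$src/" "$dest/"
}

# Possible usage (paths are placeholders, bash 4+ for mapfile):
#   mapfile -t args < <(build_rsync_args ../drill evals)
#   rsync "${args[@]}"
#   git -C ../drill rev-parse HEAD > evals/.drill-source-sha
```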
Jesse Vincent
895bb732d5 Plan: lift drill into superpowers as evals/
15-task implementation plan derived from the design spec at
docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md.

Each task is bite-sized (2-5 min steps) with exact commands, exact
file paths, and exact code where required. Subagent verification
gates per the spec are written out as concrete prompt templates.

Self-review:
- Spec coverage: every spec section maps to a task
- Placeholder scan: no TBD/TODO/placeholder/fill-in-later language
- Type consistency: helper named _set_superpowers_root_default
  consistently; drill SHA recorded in evals/.drill-source-sha
  consistently
2026-05-06 12:08:58 -07:00
Jesse Vincent
cf5914a31f Spec: address adversarial review findings
Two parallel reviewers raised legitimate issues against the lift-drill-
into-evals spec. Updates:

- Coverage map for tests/explicit-skill-requests/ corrected: 6 run-*.sh
  scripts + prompts, not "2 scenarios cover all". Several scripts
  (Haiku, multi-turn, please-use-brainstorming, use-systematic-debugging)
  have no drill counterpart and stay.
- tests/claude-code/test-subagent-driven-development.sh marked as
  meta/documentation test (asks agent to describe SDD); no drill
  scenario covers description tests; defaults to keep.
- Path-defaults section now shows verified evidence: PROJECT_ROOT
  resolves to evals/ post-move; only claude*.yaml substitute
  ${SUPERPOWERS_ROOT} in args (codex/gemini use it via os.environ
  in pre-run hooks); helper invocation order specified (after
  load_dotenv, before click definitions).
- Step 2 copy uses explicit rsync excludes (.git, .venv, results,
  .env, __pycache__, *.egg-info, .private-journal); checksum-level
  verification rather than file-count.
- Drill SHA recorded at copy time in commit message and
  evals/.drill-source-sha for divergence detection.
- evals/tests/ pytest suite added to verification protocol.
- Reference scrub list expanded: RELEASE-NOTES.md,
  docs/superpowers/plans/, .codex-plugin/ (corrected from .codex/),
  lefthook.yml. Excluded dirs called out (node_modules/, .venv/,
  evals/).
- Historical plan docs / RELEASE-NOTES handling: annotate, don't
  rewrite.
- evals/lefthook.yml move documented (drill ships its own;
  contributors run cd evals && lefthook run pre-commit manually).
- PR description checklist includes archival action item for
  obra/drill post-merge.

False finding rejected: svelte-todo fixture is complete on disk
(design.md + plan.md + scaffold.sh present); reviewer #1 #3 dropped.
2026-05-06 12:03:24 -07:00
Jesse Vincent
cf34cef01e Spec: lift drill into superpowers as evals/
Records scope, branching, architecture, deletion gate, verification
protocol, path/config edits, migration ordering, and post-implementation
verification. Frames CI integration, scenario co-location, and Python
package rename as deferred work.

Per-file deletion of bash tests under superpowers/tests/ is gated by a
subagent that compares each bash assertion to its drill scenario's
verify block. Default keeps the bash test if any assertion is unmatched.

Branching: independent off dev (f/evals-lift), not stacked on
f/cross-platform.
2026-05-06 11:54:12 -07:00
11 changed files with 75 additions and 124 deletions

@@ -19,5 +19,7 @@
"workflows" "workflows"
], ],
"skills": "./skills/", "skills": "./skills/",
"agents": "./agents/",
"commands": "./commands/",
"hooks": "./hooks/hooks-cursor.json" "hooks": "./hooks/hooks-cursor.json"
} }

@@ -275,16 +275,23 @@ If no native tool is available, create a worktree manually using git.
Follow this priority order: Follow this priority order:
1. **Check your instructions for a worktree directory preference.** If specified, use it without asking. 1. **Check existing directories:**
2. **Check existing project-local directories:**
```bash ```bash
ls -d .worktrees 2>/dev/null # Preferred (hidden) ls -d .worktrees 2>/dev/null # Preferred (hidden)
ls -d worktrees 2>/dev/null # Alternative ls -d worktrees 2>/dev/null # Alternative
``` ```
If found, use that directory. If both exist, `.worktrees` wins. If found, use that directory. If both exist, `.worktrees` wins.
3. **Default to `.worktrees/`.** 2. **Check for existing global directory:**
```bash
project=$(basename "$(git rev-parse --show-toplevel)")
ls -d ~/.config/superpowers/worktrees/$project 2>/dev/null
```
If found, use it (backward compatibility with legacy global path).
3. **Check your instructions for a worktree directory preference.** If specified, use it without asking.
4. **Default to `.worktrees/`.**
#### Safety Verification (project-local directories only) #### Safety Verification (project-local directories only)
@@ -298,11 +305,16 @@ git check-ignore -q .worktrees 2>/dev/null || git check-ignore -q worktrees 2>/d
**Why critical:** Prevents accidentally committing worktree contents to repository. **Why critical:** Prevents accidentally committing worktree contents to repository.
Global directories (`~/.config/superpowers/worktrees/`) need no verification.
#### Create the Worktree #### Create the Worktree
```bash ```bash
project=$(basename "$(git rev-parse --show-toplevel)")
# Determine path based on chosen location # Determine path based on chosen location
path="$LOCATION/$BRANCH_NAME" # For project-local: path="$LOCATION/$BRANCH_NAME"
# For global: path="~/.config/superpowers/worktrees/$project/$BRANCH_NAME"
git worktree add "$path" -b "$BRANCH_NAME" git worktree add "$path" -b "$BRANCH_NAME"
cd "$path" cd "$path"
@@ -375,6 +387,7 @@ Ready to implement <feature-name>
| `worktrees/` exists | Use it (verify ignored) | | `worktrees/` exists | Use it (verify ignored) |
| Both exist | Use `.worktrees/` | | Both exist | Use `.worktrees/` |
| Neither exists | Check instruction file, then default `.worktrees/` | | Neither exists | Check instruction file, then default `.worktrees/` |
| Global path exists | Use it (backward compat) |
| Directory not ignored | Add to .gitignore + commit | | Directory not ignored | Add to .gitignore + commit |
| Permission error on create | Sandbox fallback, work in place | | Permission error on create | Sandbox fallback, work in place |
| Tests fail during baseline | Report failures + ask | | Tests fail during baseline | Report failures + ask |
@@ -451,7 +464,7 @@ git commit -m "feat: rewrite using-git-worktrees with detect-and-defer (PRI-974)
Step 0: GIT_DIR != GIT_COMMON detection (skip if already isolated) Step 0: GIT_DIR != GIT_COMMON detection (skip if already isolated)
Step 0 consent: opt-in prompt before creating worktree (#991) Step 0 consent: opt-in prompt before creating worktree (#991)
Step 1a: native tool preference (short, first, declarative) Step 1a: native tool preference (short, first, declarative)
Step 1b: git worktree fallback with project-local directory policy Step 1b: git worktree fallback with hooks symlink and legacy path compat
Submodule guard prevents false detection Submodule guard prevents false detection
Platform-neutral instruction file references (#1049)" Platform-neutral instruction file references (#1049)"
``` ```
@@ -650,7 +663,7 @@ WORKTREE_PATH=$(git rev-parse --show-toplevel)
**If `GIT_DIR == GIT_COMMON`:** Normal repo, no worktree to clean up. Done. **If `GIT_DIR == GIT_COMMON`:** Normal repo, no worktree to clean up. Done.
**If worktree path is under `.worktrees/` or `worktrees/`:** Superpowers created this worktree — we own cleanup. **If worktree path is under `.worktrees/` or `~/.config/superpowers/worktrees/`:** Superpowers created this worktree — we own cleanup.
```bash ```bash
MAIN_ROOT=$(git -C "$(git rev-parse --git-common-dir)/.." rev-parse --show-toplevel) MAIN_ROOT=$(git -C "$(git rev-parse --git-common-dir)/.." rev-parse --show-toplevel)
@@ -694,7 +707,7 @@ git worktree prune # Self-healing: clean up any stale registrations
**Cleaning up harness-owned worktrees** **Cleaning up harness-owned worktrees**
- **Problem:** Removing a worktree the harness created causes phantom state - **Problem:** Removing a worktree the harness created causes phantom state
- **Fix:** Only clean up worktrees under `.worktrees/` or `worktrees/` - **Fix:** Only clean up worktrees under `.worktrees/` or `~/.config/superpowers/worktrees/`
**No confirmation for discard** **No confirmation for discard**
- **Problem:** Accidentally delete work - **Problem:** Accidentally delete work

@@ -46,7 +46,7 @@ The skill describes the goal ("ensure work happens in an isolated workspace") an
### Provenance-based ownership ### Provenance-based ownership
Whoever creates the worktree owns its cleanup. If the harness created it, superpowers doesn't touch it. If superpowers created it (via git fallback), superpowers cleans it up. The heuristic: if the worktree lives under `.worktrees/` or `worktrees/`, superpowers owns it. Anything else (`.claude/worktrees/`, `~/.codex/worktrees/`, `.gemini/worktrees/`, or old user-global Superpowers paths) belongs to the harness or user and is left alone. Whoever creates the worktree owns its cleanup. If the harness created it, superpowers doesn't touch it. If superpowers created it (via git fallback), superpowers cleans it up. The heuristic: if the worktree lives under `.worktrees/` or `~/.config/superpowers/worktrees/`, superpowers owns it. Anything else (`.claude/worktrees/`, `~/.codex/worktrees/`, `.gemini/worktrees/`) belongs to the harness.
## Design ## Design
@@ -110,11 +110,12 @@ File splitting (Step 1b in a separate skill) was tested and proven unnecessary.
When no native tool is available, create a worktree manually. When no native tool is available, create a worktree manually.
**Directory selection** (priority order): **Directory selection** (priority order):
1. Check the project's agent instruction file (CLAUDE.md, GEMINI.md, AGENTS.md, .cursorrules, or equivalent) for a worktree directory preference. 1. Check for existing `.worktrees/` or `worktrees/` directory — if found, use it. If both exist, `.worktrees/` wins.
2. Check for existing `.worktrees/` or `worktrees/` directory — if found, use it. If both exist, `.worktrees/` wins. 2. Check for existing `~/.config/superpowers/worktrees/<project>/` directory — if found, use it (backward compatibility with legacy global path).
3. Default to `.worktrees/`. 3. Check the project's agent instruction file (CLAUDE.md, GEMINI.md, AGENTS.md, .cursorrules, or equivalent) for a worktree directory preference.
4. Default to `.worktrees/`.
No interactive directory selection prompt. Old user-global Superpowers worktree paths are not detected or offered; new manual worktrees are project-local unless the user explicitly specifies another location. No interactive directory selection prompt. The global path (`~/.config/superpowers/worktrees/`) is no longer offered as a choice to new users, but existing worktrees at that location are detected and used for backward compatibility.
**Safety verification** (project-local directories only): **Safety verification** (project-local directories only):
@@ -231,7 +232,7 @@ if GIT_DIR == GIT_COMMON:
# Normal repo, no worktree to clean up # Normal repo, no worktree to clean up
done done
if worktree path is under .worktrees/ or worktrees/: if worktree path is under .worktrees/ or ~/.config/superpowers/worktrees/:
# Superpowers created it — we own cleanup # Superpowers created it — we own cleanup
cd to main repo root # Bug #238 fix cd to main repo root # Bug #238 fix
git worktree remove <path> git worktree remove <path>
@@ -317,7 +318,7 @@ As of 2026-04-06, Claude Code is the only harness with an agent-callable mid-ses
### Provenance heuristic ### Provenance heuristic
The `.worktrees/` or `worktrees/` = ours, anything else = hands off` heuristic works for every current harness. If a future harness adopts one of those project-local directories as its convention, we'd have a false positive (superpowers tries to clean up a harness-owned worktree). Similarly, if a user manually runs `git worktree add .worktrees/experiment` without superpowers, we'd incorrectly claim ownership. Both are low risk — every harness uses branded paths, and manual `.worktrees/` creation is unlikely — but worth noting. The `.worktrees/` or `~/.config/superpowers/worktrees/` = ours, anything else = hands off` heuristic works for every current harness. If a future harness adopts `.worktrees/` as its convention, we'd have a false positive (superpowers tries to clean up a harness-owned worktree). Similarly, if a user manually runs `git worktree add .worktrees/experiment` without superpowers, we'd incorrectly claim ownership. Both are low risk — every harness uses branded paths, and manual `.worktrees/` creation is unlikely — but worth noting.
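The provenance heuristic amounts to a prefix check on the worktree path. A minimal sketch, assuming the two prefixes named in the right-hand column above; the function name `is_superpowers_owned` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the worktree path sits under a directory
# that the heuristic treats as superpowers-owned.
is_superpowers_owned() {
  local worktree_path="$1" main_root="$2"
  case "$worktree_path" in
    "$main_root"/.worktrees/*) return 0 ;;                 # project-local
    "$HOME"/.config/superpowers/worktrees/*) return 0 ;;   # legacy global path
    *) return 1 ;;                                         # harness or user owned
  esac
}
```

`$HOME` is used instead of `~` because a tilde inside double quotes is not expanded by the shell. A variant that also claims `worktrees/` would add one more pattern to the `case`.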
### Detached HEAD finishing ### Detached HEAD finishing

@@ -180,7 +180,7 @@ WORKTREE_PATH=$(git rev-parse --show-toplevel)
**If `GIT_DIR == GIT_COMMON`:** Normal repo, no worktree to clean up. Done. **If `GIT_DIR == GIT_COMMON`:** Normal repo, no worktree to clean up. Done.
**If worktree path is under `.worktrees/` or `worktrees/`:** Superpowers created this worktree — we own cleanup. **If worktree path is under `.worktrees/`, `worktrees/`, or `~/.config/superpowers/worktrees/`:** Superpowers created this worktree — we own cleanup.
```bash ```bash
MAIN_ROOT=$(git -C "$(git rev-parse --git-common-dir)/.." rev-parse --show-toplevel) MAIN_ROOT=$(git -C "$(git rev-parse --git-common-dir)/.." rev-parse --show-toplevel)
@@ -224,7 +224,7 @@ git worktree prune # Self-healing: clean up any stale registrations
**Cleaning up harness-owned worktrees** **Cleaning up harness-owned worktrees**
- **Problem:** Removing a worktree the harness created causes phantom state - **Problem:** Removing a worktree the harness created causes phantom state
- **Fix:** Only clean up worktrees under `.worktrees/` or `worktrees/` - **Fix:** Only clean up worktrees under `.worktrees/`, `worktrees/`, or `~/.config/superpowers/worktrees/`
**No confirmation for discard** **No confirmation for discard**
- **Problem:** Accidentally delete work - **Problem:** Accidentally delete work

@@ -356,7 +356,7 @@ Never fix bugs without a test.
## Testing Anti-Patterns ## Testing Anti-Patterns
When adding mocks or test utilities, read [testing-anti-patterns.md](testing-anti-patterns.md) to avoid common pitfalls: When adding mocks or test utilities, read @testing-anti-patterns.md to avoid common pitfalls:
- Testing mock behavior instead of real behavior - Testing mock behavior instead of real behavior
- Adding test-only methods to production classes - Adding test-only methods to production classes
- Mocking without understanding dependencies - Mocking without understanding dependencies

@@ -30,7 +30,7 @@ BRANCH=$(git branch --show-current)
git rev-parse --show-superproject-working-tree 2>/dev/null git rev-parse --show-superproject-working-tree 2>/dev/null
``` ```
**If `GIT_DIR != GIT_COMMON` (and not a submodule):** You are already in a linked worktree. Skip to Step 2 (Project Setup). Do NOT create another worktree. **If `GIT_DIR != GIT_COMMON` (and not a submodule):** You are already in a linked worktree. Skip to Step 3 (Project Setup). Do NOT create another worktree.
Report with branch state: Report with branch state:
- On a branch: "Already in isolated workspace at `<path>` on branch `<name>`." - On a branch: "Already in isolated workspace at `<path>` on branch `<name>`."
@@ -42,7 +42,7 @@ Has the user already indicated their worktree preference in your instructions? I
> "Would you like me to set up an isolated worktree? It protects your current branch from changes." > "Would you like me to set up an isolated worktree? It protects your current branch from changes."
Honor any existing declared preference without asking. If the user declines consent, work in place and skip to Step 2. Honor any existing declared preference without asking. If the user declines consent, work in place and skip to Step 3.
## Step 1: Create Isolated Workspace ## Step 1: Create Isolated Workspace
@@ -50,7 +50,7 @@ Honor any existing declared preference without asking. If the user declines cons
### 1a. Native Worktree Tools (preferred) ### 1a. Native Worktree Tools (preferred)
The user has asked for an isolated workspace (Step 0 consent). Do you already have a way to create a worktree? It might be a tool with a name like `EnterWorktree`, `WorktreeCreate`, a `/worktree` command, or a `--worktree` flag. If you do, use it and skip to Step 2. The user has asked for an isolated workspace (Step 0 consent). Do you already have a way to create a worktree? It might be a tool with a name like `EnterWorktree`, `WorktreeCreate`, a `/worktree` command, or a `--worktree` flag. If you do, use it and skip to Step 3.
Native tools handle directory placement, branch creation, and cleanup automatically. Using `git worktree add` when you have a native tool creates phantom state your harness can't see or manage. Native tools handle directory placement, branch creation, and cleanup automatically. Using `git worktree add` when you have a native tool creates phantom state your harness can't see or manage.
@@ -73,7 +73,14 @@ Follow this priority order. Explicit user preference always beats observed files
``` ```
If found, use it. If both exist, `.worktrees` wins. If found, use it. If both exist, `.worktrees` wins.
3. **If there is no other guidance available**, default to `.worktrees/` at the project root. 3. **Check for an existing global directory:**
```bash
project=$(basename "$(git rev-parse --show-toplevel)")
ls -d ~/.config/superpowers/worktrees/$project 2>/dev/null
```
If found, use it (backward compatibility with legacy global path).
4. **If there is no other guidance available**, default to `.worktrees/` at the project root.
#### Safety Verification (project-local directories only) #### Safety Verification (project-local directories only)
@@ -87,11 +94,16 @@ git check-ignore -q .worktrees 2>/dev/null || git check-ignore -q worktrees 2>/d
**Why critical:** Prevents accidentally committing worktree contents to repository. **Why critical:** Prevents accidentally committing worktree contents to repository.
Global directories (`~/.config/superpowers/worktrees/`) need no verification.
#### Create the Worktree #### Create the Worktree
```bash ```bash
project=$(basename "$(git rev-parse --show-toplevel)")
# Determine path based on chosen location # Determine path based on chosen location
path="$LOCATION/$BRANCH_NAME" # For project-local: path="$LOCATION/$BRANCH_NAME"
# For global: path="~/.config/superpowers/worktrees/$project/$BRANCH_NAME"
git worktree add "$path" -b "$BRANCH_NAME" git worktree add "$path" -b "$BRANCH_NAME"
cd "$path" cd "$path"
@@ -99,7 +111,7 @@ cd "$path"
**Sandbox fallback:** If `git worktree add` fails with a permission error (sandbox denial), tell the user the sandbox blocked worktree creation and you're working in the current directory instead. Then run setup and baseline tests in place. **Sandbox fallback:** If `git worktree add` fails with a permission error (sandbox denial), tell the user the sandbox blocked worktree creation and you're working in the current directory instead. Then run setup and baseline tests in place.
## Step 2: Project Setup ## Step 3: Project Setup
Auto-detect and run appropriate setup: Auto-detect and run appropriate setup:
@@ -118,7 +130,7 @@ if [ -f pyproject.toml ]; then poetry install; fi
if [ -f go.mod ]; then go mod download; fi if [ -f go.mod ]; then go mod download; fi
``` ```
## Step 3: Verify Clean Baseline ## Step 4: Verify Clean Baseline
Run tests to ensure workspace starts clean: Run tests to ensure workspace starts clean:
@@ -151,6 +163,7 @@ Ready to implement <feature-name>
| `worktrees/` exists | Use it (verify ignored) | | `worktrees/` exists | Use it (verify ignored) |
| Both exist | Use `.worktrees/` | | Both exist | Use `.worktrees/` |
| Neither exists | Check instruction file, then default `.worktrees/` | | Neither exists | Check instruction file, then default `.worktrees/` |
| Global path exists | Use it (backward compat) |
| Directory not ignored | Add to .gitignore + commit | | Directory not ignored | Add to .gitignore + commit |
| Permission error on create | Sandbox fallback, work in place | | Permission error on create | Sandbox fallback, work in place |
| Tests fail during baseline | Report failures + ask | | Tests fail during baseline | Report failures + ask |
@@ -176,7 +189,7 @@ Ready to implement <feature-name>
### Assuming directory location ### Assuming directory location
- **Problem:** Creates inconsistency, violates project conventions - **Problem:** Creates inconsistency, violates project conventions
- **Fix:** Follow priority: explicit instructions > existing project-local directory > default - **Fix:** Follow priority: existing > global legacy > instruction file > default
### Proceeding with failing tests ### Proceeding with failing tests
@@ -196,7 +209,7 @@ Ready to implement <feature-name>
**Always:** **Always:**
- Run Step 0 detection first - Run Step 0 detection first
- Prefer native tools over git fallback - Prefer native tools over git fallback
- Follow directory priority: explicit instructions > existing project-local directory > default - Follow directory priority: existing > global legacy > instruction file > default
- Verify directory is ignored for project-local - Verify directory is ignored for project-local
- Auto-detect and run project setup - Auto-detect and run project setup
- Verify clean test baseline - Verify clean test baseline
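The priority order "existing > global legacy > instruction file > default" can be sketched as a single lookup. This is an illustration only: the function name `choose_worktree_dir` and the way the instruction-file preference is passed in (as an optional second argument) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose_worktree_dir <repo_root> [instruction_preference]
# Prints the directory that new worktrees should be created under.
choose_worktree_dir() {
  local root="$1" pref="${2:-}"
  local project
  project=$(basename "$root")
  if [ -d "$root/.worktrees" ]; then
    echo "$root/.worktrees"                                # existing, preferred
  elif [ -d "$root/worktrees" ]; then
    echo "$root/worktrees"                                 # existing, alternative
  elif [ -d "$HOME/.config/superpowers/worktrees/$project" ]; then
    echo "$HOME/.config/superpowers/worktrees/$project"    # legacy global path
  elif [ -n "$pref" ]; then
    echo "$pref"                                           # instruction-file preference
  else
    echo "$root/.worktrees"                                # default
  fi
}
```

Because `.worktrees` is tested before `worktrees`, the "if both exist, `.worktrees` wins" rule falls out of the branch order without a separate check.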

@@ -553,7 +553,7 @@ Run same scenarios WITH skill. Agent should now comply.
Agent found new rationalization? Add explicit counter. Re-test until bulletproof. Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
**Testing methodology:** See [testing-skills-with-subagents.md](testing-skills-with-subagents.md) for the complete testing methodology: **Testing methodology:** See @testing-skills-with-subagents.md for the complete testing methodology:
- How to write pressure scenarios - How to write pressure scenarios
- Pressure types (time, sunk cost, authority, exhaustion) - Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically - Plugging holes systematically

@@ -25,7 +25,7 @@ fi
# Parse command line arguments # Parse command line arguments
VERBOSE=false VERBOSE=false
SPECIFIC_TEST="" SPECIFIC_TEST=""
TIMEOUT=600 # Default 10 minute timeout per test TIMEOUT=300 # Default 5 minute timeout per test
RUN_INTEGRATION=false RUN_INTEGRATION=false
while [[ $# -gt 0 ]]; do while [[ $# -gt 0 ]]; do
@@ -73,7 +73,6 @@ done
# List of skill tests to run (fast unit tests) # List of skill tests to run (fast unit tests)
tests=( tests=(
"test-worktree-path-policy.sh"
"test-subagent-driven-development.sh" "test-subagent-driven-development.sh"
) )

@@ -9,14 +9,14 @@ run_claude() {
local allowed_tools="${3:-}" local allowed_tools="${3:-}"
local output_file=$(mktemp) local output_file=$(mktemp)
# Build command as an argv array so timeout wraps claude directly. # Build command
local cmd=(claude -p "$prompt") local cmd="claude -p \"$prompt\""
if [ -n "$allowed_tools" ]; then if [ -n "$allowed_tools" ]; then
cmd+=(--allowed-tools="$allowed_tools") cmd="$cmd --allowed-tools=$allowed_tools"
fi fi
# Run Claude in headless mode with timeout # Run Claude in headless mode with timeout
if timeout "$timeout" "${cmd[@]}" > "$output_file" 2>&1; then if timeout "$timeout" bash -c "$cmd" > "$output_file" 2>&1; then
cat "$output_file" cat "$output_file"
rm -f "$output_file" rm -f "$output_file"
return 0 return 0
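
The two sides of this hunk differ in how the command reaches `timeout`: one builds an argv array and runs `"${cmd[@]}"`, the other builds a string and hands it to `bash -c`. A minimal sketch of the quoting difference follows. It uses `printf` as a stand-in for the `claude` CLI, and the prompt text is made up for illustration; neither comes from the test suite.

```shell
#!/usr/bin/env bash
# Hypothetical prompt containing characters a shell would reinterpret.
prompt='Say "hi" and print $HOME'

# argv-array form: every element is passed as exactly one argument,
# and nothing is parsed a second time.
cmd=(printf '<%s>\n' -p "$prompt")
array_out=$("${cmd[@]}")

# string form: the text is parsed again by the inner shell, so the
# embedded quotes are consumed and $HOME is expanded.
cmd_str="printf '<%s>\n' -p \"$prompt\""
string_out=$(bash -c "$cmd_str")

echo "array form:"
echo "$array_out"
echo "string form:"
echo "$string_out"
```

In the array form the prompt arrives unchanged. In the string form the inner shell strips the double quotes around `hi` and substitutes the value of `$HOME`, so a prompt containing quotes, `$`, or backticks can be altered before the CLI sees it.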

@@ -12,15 +12,13 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$SCRIPT_DIR/test-helpers.sh" source "$SCRIPT_DIR/test-helpers.sh"
CLAUDE_PROMPT_TIMEOUT="${CLAUDE_PROMPT_TIMEOUT:-90}"
echo "=== Test: subagent-driven-development skill ===" echo "=== Test: subagent-driven-development skill ==="
echo "" echo ""
# Test 1: Verify skill can be loaded # Test 1: Verify skill can be loaded
echo "Test 1: Skill loading..." echo "Test 1: Skill loading..."
output=$(run_claude "What is the subagent-driven-development skill? Describe its key steps briefly." "$CLAUDE_PROMPT_TIMEOUT") output=$(run_claude "What is the subagent-driven-development skill? Describe its key steps briefly." 30)
if assert_contains "$output" "subagent-driven-development\|Subagent-Driven Development\|Subagent Driven" "Skill is recognized"; then if assert_contains "$output" "subagent-driven-development\|Subagent-Driven Development\|Subagent Driven" "Skill is recognized"; then
: # pass : # pass
@@ -39,11 +37,9 @@ echo ""
# Test 2: Verify skill describes correct workflow order # Test 2: Verify skill describes correct workflow order
echo "Test 2: Workflow ordering..." echo "Test 2: Workflow ordering..."
output=$(run_claude "In the subagent-driven-development skill, what comes first: spec compliance review or code quality review? Answer using exactly this structure: output=$(run_claude "In the subagent-driven-development skill, what comes first: spec compliance review or code quality review? Be specific about the order." 30)
First: <review type>
Second: <review type>" "$CLAUDE_PROMPT_TIMEOUT")
if assert_order "$output" "First:.*spec.*compliance" "Second:.*code.*quality" "Spec compliance before code quality"; then if assert_order "$output" "spec.*compliance" "code.*quality" "Spec compliance before code quality"; then
: # pass : # pass
else else
exit 1 exit 1
@@ -54,17 +50,15 @@ echo ""
# Test 3: Verify self-review is mentioned # Test 3: Verify self-review is mentioned
echo "Test 3: Self-review requirement..." echo "Test 3: Self-review requirement..."
output=$(run_claude "Does the subagent-driven-development skill require implementers to self-review before handoff, and can self-review replace the external reviews? Answer using exactly this structure: output=$(run_claude "Does the subagent-driven-development skill require implementers to do self-review? What should they check?" 30)
Self-review required: <yes or no>
Self-review replaces external review: <yes or no>" "$CLAUDE_PROMPT_TIMEOUT")
if assert_contains "$output" "Self-review required:.*yes" "Mentions self-review"; then if assert_contains "$output" "self-review\|self review" "Mentions self-review"; then
: # pass : # pass
else else
exit 1 exit 1
fi fi
if assert_contains "$output" "Self-review replaces external review:.*no" "Self-review does not replace external review"; then if assert_contains "$output" "completeness\|Completeness" "Checks completeness"; then
: # pass : # pass
else else
exit 1 exit 1
@@ -75,7 +69,7 @@ echo ""
# Test 4: Verify plan is read once # Test 4: Verify plan is read once
echo "Test 4: Plan reading efficiency..." echo "Test 4: Plan reading efficiency..."
output=$(run_claude "In subagent-driven-development, how many times should the controller read the plan file? When does this happen?" "$CLAUDE_PROMPT_TIMEOUT") output=$(run_claude "In subagent-driven-development, how many times should the controller read the plan file? When does this happen?" 30)
if assert_contains "$output" "once\|one time\|single" "Read plan once"; then if assert_contains "$output" "once\|one time\|single" "Read plan once"; then
: # pass : # pass
@@ -94,7 +88,7 @@ echo ""
# Test 5: Verify spec compliance reviewer is skeptical # Test 5: Verify spec compliance reviewer is skeptical
echo "Test 5: Spec compliance reviewer mindset..." echo "Test 5: Spec compliance reviewer mindset..."
output=$(run_claude "What is the spec compliance reviewer's attitude toward the implementer's report in subagent-driven-development?" "$CLAUDE_PROMPT_TIMEOUT") output=$(run_claude "What is the spec compliance reviewer's attitude toward the implementer's report in subagent-driven-development?" 30)
if assert_contains "$output" "not trust\|don't trust\|skeptical\|verify.*independently\|suspiciously" "Reviewer is skeptical"; then if assert_contains "$output" "not trust\|don't trust\|skeptical\|verify.*independently\|suspiciously" "Reviewer is skeptical"; then
: # pass : # pass
@@ -113,7 +107,7 @@ echo ""
# Test 6: Verify review loops # Test 6: Verify review loops
echo "Test 6: Review loop requirements..." echo "Test 6: Review loop requirements..."
output=$(run_claude "In subagent-driven-development, what happens if a reviewer finds issues? Is it a one-time review or a loop?" "$CLAUDE_PROMPT_TIMEOUT") output=$(run_claude "In subagent-driven-development, what happens if a reviewer finds issues? Is it a one-time review or a loop?" 30)
if assert_contains "$output" "loop\|again\|repeat\|until.*approved\|until.*compliant" "Review loops mentioned"; then if assert_contains "$output" "loop\|again\|repeat\|until.*approved\|until.*compliant" "Review loops mentioned"; then
: # pass : # pass
@@ -132,9 +126,7 @@ echo ""
# Test 7: Verify full task text is provided # Test 7: Verify full task text is provided
echo "Test 7: Task context provision..." echo "Test 7: Task context provision..."
output=$(run_claude "In subagent-driven-development, how does the controller provide task information to the implementer subagent? Answer using exactly this structure: output=$(run_claude "In subagent-driven-development, how does the controller provide task information to the implementer subagent? Does it make them read a file or provide it directly?" 30)
Controller provides: <directly or by file>
Implementer must read plan file: <yes or no>" "$CLAUDE_PROMPT_TIMEOUT")
if assert_contains "$output" "provide.*directly\|full.*text\|paste\|include.*prompt" "Provides text directly"; then if assert_contains "$output" "provide.*directly\|full.*text\|paste\|include.*prompt" "Provides text directly"; then
: # pass : # pass
@@ -142,7 +134,7 @@ else
exit 1 exit 1
fi fi
if assert_contains "$output" "Implementer must read plan file:.*no" "Doesn't make subagent read file"; then if assert_not_contains "$output" "read.*file\|open.*file" "Doesn't make subagent read file"; then
: # pass : # pass
else else
exit 1 exit 1
@@ -153,7 +145,7 @@ echo ""
# Test 8: Verify worktree requirement # Test 8: Verify worktree requirement
echo "Test 8: Worktree requirement..." echo "Test 8: Worktree requirement..."
output=$(run_claude "What workflow skills are required before using subagent-driven-development? List any prerequisites or required skills." "$CLAUDE_PROMPT_TIMEOUT") output=$(run_claude "What workflow skills are required before using subagent-driven-development? List any prerequisites or required skills." 30)
if assert_contains "$output" "using-git-worktrees\|worktree" "Mentions worktree requirement"; then if assert_contains "$output" "using-git-worktrees\|worktree" "Mentions worktree requirement"; then
: # pass : # pass
@@ -166,7 +158,7 @@ echo ""
# Test 9: Verify main branch warning # Test 9: Verify main branch warning
echo "Test 9: Main branch red flag..." echo "Test 9: Main branch red flag..."
output=$(run_claude "In subagent-driven-development, is it okay to start implementation directly on the main branch?" "$CLAUDE_PROMPT_TIMEOUT") output=$(run_claude "In subagent-driven-development, is it okay to start implementation directly on the main branch?" 30)
if assert_contains "$output" "worktree\|feature.*branch\|not.*main\|never.*main\|avoid.*main\|don't.*main\|consent\|permission" "Warns against main branch"; then if assert_contains "$output" "worktree\|feature.*branch\|not.*main\|never.*main\|avoid.*main\|don't.*main\|consent\|permission" "Warns against main branch"; then
: # pass : # pass
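
Several hunks in this file differ in whether the prompt asks for a fixed answer structure (`First: ...`, `Implementer must read plan file: <yes or no>`) or for free-form prose. The sketch below shows why a pattern anchored to a labelled field tends to be less brittle than a negative match over prose. The `matches` helper and both sample replies are hypothetical; the real `assert_contains` and `assert_not_contains` live in `test-helpers.sh`, which is not shown in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the text matches the grep pattern.
matches() { printf '%s\n' "$1" | grep -q "$2"; }

# Made-up replies that both give the correct answer.
free_form='No. The controller pastes the full task text; the implementer never has to read the plan file.'
structured='Controller provides: directly
Implementer must read plan file: no'

# A negative check over prose trips on a correct answer, because the
# words "read ... file" appear inside the denial itself.
if matches "$free_form" 'read.*file'; then
  free_form_result="false alarm"
else
  free_form_result="ok"
fi

# A check anchored to the labelled field only looks at the yes/no value.
if matches "$structured" 'Implementer must read plan file:.*no'; then
  structured_result="ok"
else
  structured_result="missed"
fi

echo "free-form negative check: $free_form_result"
echo "structured check: $structured_result"
```

The free-form check reports a false alarm even though the reply is correct, while the structured check passes. This only illustrates the matching behavior; it says nothing about how reliably a model follows the requested structure.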

@@ -1,69 +0,0 @@
#!/usr/bin/env bash
# Regression check: Superpowers should not route new worktrees through the old
# global worktree directory.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
USING_SKILL="$REPO_ROOT/skills/using-git-worktrees/SKILL.md"
FINISHING_SKILL="$REPO_ROOT/skills/finishing-a-development-branch/SKILL.md"
ROTOTILL_SPEC="$REPO_ROOT/docs/superpowers/specs/2026-04-06-worktree-rototill-design.md"
ROTOTILL_PLAN="$REPO_ROOT/docs/superpowers/plans/2026-04-06-worktree-rototill.md"
failures=0
assert_contains() {
local file="$1"
local pattern="$2"
local label="$3"
if grep -Fq "$pattern" "$file"; then
echo " [PASS] $label"
else
echo " [FAIL] $label"
echo " Expected to find: $pattern"
echo " In file: $file"
failures=$((failures + 1))
fi
}
assert_not_contains() {
local file="$1"
local pattern="$2"
local label="$3"
if grep -Fq "$pattern" "$file"; then
echo " [FAIL] $label"
echo " Did not expect to find: $pattern"
echo " In file: $file"
failures=$((failures + 1))
else
echo " [PASS] $label"
fi
}
echo "=== Worktree Path Policy Test ==="
echo ""
assert_not_contains "$USING_SKILL" "~/.config/superpowers/worktrees" "using-git-worktrees does not mention old global path"
assert_not_contains "$USING_SKILL" "global legacy" "using-git-worktrees does not use unclear global legacy shorthand"
assert_not_contains "$USING_SKILL" "Global path" "using-git-worktrees has no global path quick-reference row"
assert_contains "$USING_SKILL" 'default to `.worktrees/` at the project root' "using-git-worktrees defaults new manual worktrees to .worktrees/"
assert_not_contains "$FINISHING_SKILL" "~/.config/superpowers/worktrees" "finishing-a-development-branch does not treat old global path as owned"
assert_contains "$FINISHING_SKILL" '`.worktrees/` or `worktrees/`' "finishing-a-development-branch keeps project-local cleanup ownership"
assert_not_contains "$ROTOTILL_SPEC" "~/.config/superpowers/worktrees" "rototill spec does not preserve old global path policy"
assert_not_contains "$ROTOTILL_PLAN" "~/.config/superpowers/worktrees" "rototill plan does not preserve old global path policy"
assert_not_contains "$ROTOTILL_PLAN" "legacy path compat" "rototill plan does not advertise legacy path compatibility"
echo ""
if [ "$failures" -gt 0 ]; then
echo "STATUS: FAILED ($failures failures)"
exit 1
fi
echo "STATUS: PASSED"
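
The assertions in this script call `grep -Fq`, which treats the pattern as a fixed string. That matters for patterns such as `.worktrees/`, where a regex dot would match any character. A small sketch of the difference, using a made-up input line:

```shell
#!/usr/bin/env bash
# Hypothetical line that does NOT contain the literal text ".worktrees/".
line='put checkouts under myworktrees/ instead'

# Regex mode: "." matches any character, so the pattern matches "yworktrees/".
if printf '%s\n' "$line" | grep -q '.worktrees/'; then
  regex_hit=yes
else
  regex_hit=no
fi

# Fixed-string mode: the dot must be a literal dot, so nothing matches.
if printf '%s\n' "$line" | grep -Fq '.worktrees/'; then
  fixed_hit=yes
else
  fixed_hit=no
fi

echo "regex: $regex_hit, fixed: $fixed_hit"
```

With `-F`, patterns containing dots, backticks, or tildes can be written exactly as they appear in the files being checked, without escaping.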