From 2d4cdea2bb38dd0b2db8302a74fd1e935a90ff22 Mon Sep 17 00:00:00 2001
From: Drew Ritter
Date: Wed, 6 May 2026 14:55:14 -0700
Subject: [PATCH] evals: drop drill source marker

---
 .../plans/2026-05-06-lift-drill-into-evals.md | 40 ++++++------------
 ...2026-05-06-lift-drill-into-evals-design.md |  6 +--
 evals/.drill-source-sha                       |  1 -
 3 files changed, 16 insertions(+), 31 deletions(-)
 delete mode 100644 evals/.drill-source-sha

diff --git a/docs/superpowers/plans/2026-05-06-lift-drill-into-evals.md b/docs/superpowers/plans/2026-05-06-lift-drill-into-evals.md
index 48fb579a..b1c01ca2 100644
--- a/docs/superpowers/plans/2026-05-06-lift-drill-into-evals.md
+++ b/docs/superpowers/plans/2026-05-06-lift-drill-into-evals.md
@@ -53,8 +53,7 @@ Expected output begins with whatever commit `origin/dev` points to (currently `b
 
 ## Task 2: Capture drill SHA at copy time
 
-**Files:**
-- Create: `evals/.drill-source-sha` (in next task; this task just records the value)
+**Files:** none (records the value for the lift commit message)
 
 - [ ] **Step 1: Get the current drill HEAD SHA**
 
@@ -85,7 +84,6 @@ echo "DRILL_SHA=$DRILL_SHA" # write this down for use in Task 3
 
 **Files:**
 - Create: `evals/` (entire directory tree from drill, minus excludes)
-- Create: `evals/.drill-source-sha` (records the source SHA)
 
 - [ ] **Step 1: Verify source and destination paths**
 
@@ -127,14 +125,12 @@ find evals -name '*.egg-info' -type d
 
 Expected: every command returns no output. If any returns a path, manually `rm -rf` it before continuing.
 
-- [ ] **Step 4: Write the SHA file**
+- [ ] **Step 4: Confirm the source SHA for the commit message**
 
 ```bash
 cd /Users/jesse/Documents/GitHub/superpowers/drill
 DRILL_SHA=$(git rev-parse HEAD)
-cd /Users/jesse/Documents/GitHub/superpowers/superpowers
-echo "$DRILL_SHA" > evals/.drill-source-sha
-cat evals/.drill-source-sha
+echo "$DRILL_SHA"
 ```
 
 Expected: the SHA from Task 2 step 1.
@@ -151,7 +147,7 @@ Expected output starts with `A evals/...` lines listing many added files. Many
 
 - [ ] **Step 6: Commit**
 
 ```bash
-DRILL_SHA=$(cat evals/.drill-source-sha)
+: "${DRILL_SHA:?Set DRILL_SHA from Task 2 before committing}"
 git commit -m "$(cat < ./.drill-source-sha
-```
-
-(One additional file in evals: the SHA pin. No other differences.)
+Expected: no output.
 
 - [ ] **Step 4: Per-file checksum verification**
 
@@ -259,17 +248,14 @@ You are verifying a verbatim copy of the drill repo at
 
 Verify:
 
-1. The file
-/Users/jesse/Documents/GitHub/superpowers/superpowers/evals/.drill-source-sha
-exists and contains the SHA reported by:
+1. The lift commit message records the SHA reported by:
 cd /Users/jesse/Documents/GitHub/superpowers/drill && git rev-parse HEAD
 
 2. None of these excluded paths exist under evals/:
 .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/.
 
 3. Every non-excluded file in drill has a SHA-256-identical
-counterpart in evals/, and there is no extra file in evals/ except
-.drill-source-sha.
+counterpart in evals/, and there are no extra files in evals/.
 
 4. The pyproject.toml, uv.lock, scenarios/*.yaml, backends/*.yaml,
 setup_helpers/*.py, drill/*.py, prompts/*.md, fixtures/, bin/, and
@@ -1247,7 +1233,7 @@ Run: git log --oneline dev..HEAD; git diff dev..HEAD --stat
 Look hard at:
 1. Did the rsync-with-excludes actually exclude what it claimed?
 (find evals -name '.git' -type d should return nothing)
-2. Does evals/.drill-source-sha point at a real commit in obra/drill?
+2. Does the lift commit message point at a real commit in obra/drill?
 3. Does the SUPERPOWERS_ROOT helper actually default correctly when
 the env var is unset? (cd evals && unset SUPERPOWERS_ROOT && uv run
 drill list — does it work?)
@@ -1305,7 +1291,7 @@ Drill — the standalone Python skill-compliance benchmark at obra/drill — is
 
 ## What does this PR change?
 
-- Lifts the obra/drill repo (at SHA ``) into superpowers as `evals/`, with explicit rsync excludes (.git, .venv, results, .env, __pycache__, *.egg-info, .private-journal).
+- Lifts the obra/drill repo into superpowers as `evals/`, with explicit rsync excludes (.git, .venv, results, .env, __pycache__, *.egg-info, .private-journal). The lift commit records the source SHA.
 - Adds a `_set_superpowers_root_default()` helper to drill/cli.py so SUPERPOWERS_ROOT defaults to the parent of evals/ — no manual env-var setup.
 - Drops SUPERPOWERS_ROOT from required_env in codex.yaml/gemini.yaml (the helper supplies it). Claude*.yaml keep it because they interpolate ${SUPERPOWERS_ROOT} into --plugin-dir args.
 - Deletes redundant bash tests under tests/skill-triggering/, tests/explicit-skill-requests/, tests/subagent-driven-dev/, and tests/claude-code/ — gated per-file by a subagent that compared each bash test's assertions to its drill scenario's verify block. Anything not 100% covered was kept.
@@ -1377,12 +1363,12 @@ Expected: browser opens to the new PR. Take a screenshot or note the URL for fol
 ## Verification checklist (run after Task 15)
 
 - [ ] `git log --oneline dev..HEAD` shows the expected commits in order
-- [ ] `evals/.drill-source-sha` matches the SHA recorded in the lift commit message
+- [ ] The lift commit message records the source SHA
 - [ ] `find evals -name '.git' -type d` returns no output
 - [ ] `cd evals && unset SUPERPOWERS_ROOT && uv run pytest` passes
 - [ ] `cd evals && unset SUPERPOWERS_ROOT && uv run drill list` returns scenarios
 - [ ] `cd evals && unset SUPERPOWERS_ROOT && uv run drill run triggering-test-driven-development -b claude` passes
 - [ ] `tests/brainstorm-server/server.test.js` still passes (regression gate for non-LLM tests)
 - [ ] `git diff dev..HEAD docs/superpowers/plans/2026-04-06-worktree-rototill.md docs/superpowers/plans/2026-03-23-codex-app-compatibility.md RELEASE-NOTES.md` shows annotations only, no path rewrites
-- [ ] `cd ../drill && git log --oneline -1` shows obra/drill is unchanged from the recorded source SHA
+- [ ] `cd ../drill && git log --oneline -1` shows obra/drill is unchanged from the source SHA recorded in the lift commit
 - [ ] PR body lists the post-merge archival action item
diff --git a/docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md b/docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md
index be69e873..b3e63e5a 100644
--- a/docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md
+++ b/docs/superpowers/specs/2026-05-06-lift-drill-into-evals-design.md
@@ -106,7 +106,7 @@ Every change in the implementation plan gets cross-checked by an independent sub
 | Change category | Subagent verification |
 |----------------|----------------------|
 | Each bash-test deletion | Dispatch a subagent with: (a) the bash test file content, (b) the candidate drill scenario YAML, (c) the prompt: *"List every assertion the bash test makes. List every verify entry in the drill scenario. For each bash assertion, find a matching drill check or report it as unmatched. Output a per-assertion table."* The subagent's output is the gate — only delete if every bash assertion has a match. |
-| Initial `evals/` copy | Subagent verifies: (a) drill SHA being copied is recorded in commit message and `evals/.drill-source-sha` (a checked-in file) so divergence is detectable; (b) **per-file SHA-256 checksum** matches drill repo for every file (not just file count); (c) excluded paths (`.git/`, `.venv/`, `results/`, `.env`, `__pycache__/`, `*.egg-info/`, any `.private-journal/`) are absent from `evals/`; (d) all backend YAMLs reference paths that exist post-move; (e) `pyproject.toml`, `uv.lock`, `.gitignore` are intact. |
+| Initial `evals/` copy | Subagent verifies: (a) drill SHA being copied is recorded in the lift commit message so provenance is auditable; (b) **per-file SHA-256 checksum** matches drill repo for every file (not just file count); (c) excluded paths (`.git/`, `.venv/`, `results/`, `.env`, `__pycache__/`, `*.egg-info/`, any `.private-journal/`) are absent from `evals/`; (d) all backend YAMLs reference paths that exist post-move; (e) `pyproject.toml`, `uv.lock`, `.gitignore` are intact. |
 | Drill's own pytest suite | Subagent runs `cd evals && uv run pytest` after the path-default change. Drill ships its own pytest suite at `evals/tests/` including `test_backend.py` which exercises `SUPERPOWERS_ROOT` env-var behavior — these tests must update to match the helper and continue to pass. |
 | Reference scrubbing after deletion | Subagent greps the entire superpowers tree (excluding `node_modules/`, `.venv/`, and `evals/`) for references to deleted bash test paths. Search targets: `docs/`, `docs/superpowers/plans/`, `RELEASE-NOTES.md`, `CLAUDE.md`, `GEMINI.md`, `AGENTS.md`, `README.md`, `.github/`, `scripts/`, `.opencode/INSTALL.md`, `.codex-plugin/INSTALL.md`, `lefthook.yml`. Any hit is either updated or surfaces a missed dependency. |
 | Path defaults change (`SUPERPOWERS_ROOT` default) | Subagent runs at least one cheap drill scenario after the path changes (e.g., `triggering-test-driven-development`) and confirms it still passes. Real validation, not just code review. |
@@ -149,7 +149,7 @@ Each step is a separate commit (or small group of commits). Step 2 is the bigges
 
 1. Branch off `dev` (f/evals-lift)
 2. Copy drill repo into evals/ (single commit, easy to revert)
-   ├─ Record drill SHA at copy time → commit message + evals/.drill-source-sha
+   ├─ Record drill SHA at copy time → commit message
    ├─ Use `rsync -a --exclude=.git --exclude=.venv --exclude=results
    │       --exclude=.env --exclude=__pycache__ --exclude='*.egg-info'
    │       --exclude=.private-journal /path/to/drill/ evals/`
@@ -220,7 +220,7 @@ The implementation plan must show:
 
 - All non-excluded drill source files present at `evals/` after step 2 (subagent **per-file SHA-256 checksum diff** vs `obra/drill@`).
 - Excluded paths (`.git/`, `.venv/`, `results/`, `.env`, `__pycache__/`, `*.egg-info/`, `.private-journal/`) absent from `evals/`.
-- `evals/.drill-source-sha` matches the SHA referenced in the step-2 commit message.
+- The step-2 commit message records the drill source SHA.
 - `cd evals && uv sync` succeeds without `SUPERPOWERS_ROOT` set.
 - `cd evals && uv run pytest` passes (drill's own pytest suite).
 - `cd evals && uv run drill list` returns the same scenario count as the standalone drill repo at the recorded SHA.
diff --git a/evals/.drill-source-sha b/evals/.drill-source-sha
deleted file mode 100644
index 94c39314..00000000
--- a/evals/.drill-source-sha
+++ /dev/null
@@ -1 +0,0 @@
-013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b