Commit Graph

10 Commits

Author SHA1 Message Date
Jesse Vincent
7d06d7e4f0 evals: add pi backend 2026-05-07 11:11:18 -07:00
Drew Ritter
7f02ccd91b evals: use pre-commit hooks 2026-05-06 15:47:39 -07:00
Drew Ritter
35e42a16ce evals: add Gemini 2.5 Flash backend 2026-05-06 15:47:39 -07:00
Drew Ritter
58082d04f8 evals: drop drill source marker 2026-05-06 15:47:39 -07:00
Drew Ritter
3dc0ea6876 evals: remove unreleased wave scenarios 2026-05-06 15:47:39 -07:00
Jesse Vincent
0bf37499b4 Address adversarial review findings
- evals/README.md, evals/CLAUDE.md: fix uv install command from
  'uv sync --dev' to 'uv sync --extra dev'. Drill's pyproject.toml
  uses [project.optional-dependencies], so --dev is a no-op for
  pytest/ruff/ty; --extra dev is the correct invocation.
- tests/claude-code/run-skill-tests.sh: drop test-requesting-code-review.sh
  from integration_tests array (file deleted earlier in this branch).
- tests/claude-code/README.md: replace test-requesting-code-review.sh
  section with test-worktree-native-preference.sh (the worktree test
  is kept; the code-review test was lifted into drill).
- docs/testing.md, CLAUDE.md: remove "Copilot CLI" from the harness
  list. evals/backends/ has claude*, codex, gemini configs but no
  copilot.yaml, so the claim was unsupported.

Adversarial review credit: reviewer #2 found four legitimate issues
(uv-sync, run-skill-tests stale ref, README stale ref via #1, and
Copilot CLI fabrication); reviewer #1 found two distinct issues
(run-skill-tests + tests/claude-code/README.md). Reviewer #2 wins
this round.
2026-05-06 15:47:39 -07:00
Jesse Vincent
6f0adebe96 evals: drop SUPERPOWERS_ROOT setup step from README/CLAUDE
The cli.py helper now defaults the env var. Mention as override only.
2026-05-06 15:47:39 -07:00
Jesse Vincent
fd5b53cb85 evals: drop SUPERPOWERS_ROOT from codex/gemini required_env
These backends only read SUPERPOWERS_ROOT via engine.py/setup.py's
os.environ access, which the new cli.py default helper supplies
automatically. claude*.yaml keep SUPERPOWERS_ROOT in required_env
because they interpolate ${SUPERPOWERS_ROOT} into --plugin-dir args.
2026-05-06 15:47:39 -07:00
Jesse Vincent
be0357f98a evals: default SUPERPOWERS_ROOT to parent of evals/ if unset
Adds _set_superpowers_root_default() to drill/cli.py, called at
module import after load_dotenv(). PROJECT_ROOT resolves to evals/
post-lift; its parent is the superpowers repo root, which is the
correct value for SUPERPOWERS_ROOT.

Existing env values are respected as overrides via os.environ.setdefault.

Tests:
- helper sets default when var is unset
- helper does not override when var is already set
2026-05-06 15:47:39 -07:00
Jesse Vincent
3b412a3836 Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
2026-05-06 15:47:39 -07:00