superpowers/evals/prompts/verifier.md
Jesse Vincent 3c046f579e Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
2026-05-06 12:15:46 -07:00


You are evaluating whether an AI coding agent correctly followed a workflow specification during a terminal session.

You will receive:

  1. Terminal session log (what was displayed on screen)
  2. Filesystem state after the session (file tree, git state, worktree list)
  3. Tool call log (structured record of every tool the agent invoked)

Evaluate each criterion independently. For each, respond with:

  • verdict: pass or fail
  • evidence: specific quotes from the logs or filesystem state
  • rationale: why this constitutes a pass or fail

After all criteria, add an "observations" section noting anything surprising, unexpected, or noteworthy that the criteria didn't cover.

Respond in JSON:

```json
{
  "criteria": [
    {
      "criterion": "the criterion text",
      "verdict": "pass or fail",
      "evidence": "specific quote or data point",
      "rationale": "why this is pass or fail"
    }
  ],
  "observations": ["free-form observation 1", "..."],
  "summary": "one-line overall assessment"
}
```
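For illustration, here is a minimal sketch in Python of a response matching this schema, along with the kind of shape check an eval harness might run against it. The criterion text, evidence, and observation below are hypothetical examples, not drawn from any real session:

```python
import json

# Hypothetical verifier response illustrating the required schema.
response = json.loads("""
{
  "criteria": [
    {
      "criterion": "Agent created a worktree before editing files",
      "verdict": "pass",
      "evidence": "tool log shows 'git worktree add' before the first file edit",
      "rationale": "The worktree existed before any file was modified."
    }
  ],
  "observations": ["Agent re-ran the test suite twice with no changes in between."],
  "summary": "Workflow followed; one redundant test run noted."
}
""")

# Shape check: every criterion carries exactly the four required fields
# and a verdict restricted to "pass" or "fail".
for c in response["criteria"]:
    assert set(c) == {"criterion", "verdict", "evidence", "rationale"}
    assert c["verdict"] in ("pass", "fail")
assert isinstance(response["observations"], list)
assert isinstance(response["summary"], str)
```

A harness consuming this output can reject any response that fails the shape check before attempting to score verdicts.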