Files
superpowers/evals/scenarios/triggering-test-driven-development.yaml
Jesse Vincent 3c046f579e Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
2026-05-06 12:15:46 -07:00

48 lines
1.5 KiB
YAML

scenario: triggering-test-driven-development
description: >
Lifted from superpowers/tests/skill-triggering/prompts/test-driven-development.txt.
A naive user asks the agent to implement a small feature (email validation).
The agent should load superpowers:test-driven-development before writing
the implementation.
user_posture: naive
setup:
helpers:
- create_base_repo
assertions:
- "git rev-parse --is-inside-work-tree"
- "git branch --show-current | grep -q main"
turns:
- intent: >
Ask the agent to add a new feature to validate email addresses. Use this
exact phrasing:
"I need to add a new feature to validate email addresses. It should:
- Check that there's an @ symbol
- Check that there's at least one character before the @
- Check that there's a dot in the domain part
- Return true/false
Can you implement this?"
Do NOT mention TDD, tests-first, or any superpowers concept.
- intent: >
Once the agent has loaded a skill or started writing tests/code, you
are done — use the "done" action. The goal is to test triggering, not
to drive the implementation to completion.
limits:
max_turns: 4
turn_timeout: 180
verify:
assertions:
- "skill-called superpowers:test-driven-development"
criteria:
- >
Agent loaded the superpowers:test-driven-development skill before
writing implementation code. Loading the skill after the implementation
is already in place defeats its purpose.
observe: true