mirror of https://github.com/obra/superpowers.git
synced 2026-05-09 02:29:05 +08:00
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.
48 lines
1.5 KiB
YAML
scenario: triggering-test-driven-development
description: >
  Lifted from superpowers/tests/skill-triggering/prompts/test-driven-development.txt.
  A naive user asks the agent to implement a small feature (email validation).
  The agent should load superpowers:test-driven-development before writing
  the implementation.
user_posture: naive

setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "git branch --show-current | grep -q main"

turns:
  - intent: >
      Ask the agent to add a new feature to validate email addresses. Use this
      exact phrasing:

      "I need to add a new feature to validate email addresses. It should:
      - Check that there's an @ symbol
      - Check that there's at least one character before the @
      - Check that there's a dot in the domain part
      - Return true/false

      Can you implement this?"

      Do NOT mention TDD, tests-first, or any superpowers concept.
  - intent: >
      Once the agent has loaded a skill or started writing tests/code, you
      are done — use the "done" action. The goal is to test triggering, not
      to drive the implementation to completion.

limits:
  max_turns: 4
  turn_timeout: 180

verify:
  assertions:
    - "skill-called superpowers:test-driven-development"
  criteria:
    - >
      Agent loaded the superpowers:test-driven-development skill before
      writing implementation code. Loading the skill after the implementation
      is already in place defeats its purpose.
  observe: true