Files
superpowers/evals/scenarios/triggering-writing-plans.yaml
Jesse Vincent 3b412a3836 Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
2026-05-06 15:47:39 -07:00

52 lines
1.6 KiB
YAML

scenario: triggering-writing-plans
description: >
Lifted from superpowers/tests/skill-triggering/prompts/writing-plans.txt.
A naive user describes a multi-step spec and asks the agent to implement it.
Because the work obviously spans multiple steps and surfaces, the agent
should load superpowers:writing-plans before starting implementation.
user_posture: naive
setup:
helpers:
- create_base_repo
assertions:
- "git rev-parse --is-inside-work-tree"
- "git branch --show-current | grep -q main"
turns:
- intent: >
Hand the agent a spec for a new auth system that obviously requires
multiple steps. Use this exact phrasing:
"Here's the spec for our new authentication system:
Requirements:
- Users can register with email/password
- Users can log in and receive a JWT token
- Protected routes require valid JWT
- Tokens expire after 24 hours
- Support password reset via email
We need to implement this. There are multiple steps involved - user
model, auth routes, middleware, email service integration."
Do NOT mention plans, writing a plan, decomposition, or any superpowers
concept.
- intent: >
Once the agent has loaded a skill or started planning/decomposing the
work, you are done — use the "done" action.
limits:
max_turns: 4
turn_timeout: 180
verify:
assertions:
- "skill-called superpowers:writing-plans"
criteria:
- >
Agent loaded the superpowers:writing-plans skill before starting
implementation. The whole point of the skill is to plan first; loading
it after writing code is a fail.
observe: true