mirror of https://github.com/obra/superpowers.git, synced 2026-05-09 18:49:04 +08:00
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.
52 lines
1.6 KiB
YAML
scenario: triggering-writing-plans
description: >
  Lifted from superpowers/tests/skill-triggering/prompts/writing-plans.txt.
  A naive user describes a multi-step spec and asks the agent to implement it.
  Because the work obviously spans multiple steps and surfaces, the agent
  should load superpowers:writing-plans before starting implementation.
user_posture: naive

setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "git branch --show-current | grep -q main"

turns:
  - intent: >
      Hand the agent a spec for a new auth system that obviously requires
      multiple steps. Use this exact phrasing:

      "Here's the spec for our new authentication system:

      Requirements:
      - Users can register with email/password
      - Users can log in and receive a JWT token
      - Protected routes require valid JWT
      - Tokens expire after 24 hours
      - Support password reset via email

      We need to implement this. There are multiple steps involved - user
      model, auth routes, middleware, email service integration."

      Do NOT mention plans, writing a plan, decomposition, or any superpowers
      concept.
  - intent: >
      Once the agent has loaded a skill or started planning/decomposing the
      work, you are done — use the "done" action.

limits:
  max_turns: 4
  turn_timeout: 180

verify:
  assertions:
    - "skill-called superpowers:writing-plans"
  criteria:
    - >
      Agent loaded the superpowers:writing-plans skill before starting
      implementation. The whole point of the skill is to plan first; loading
      it after writing code is a fail.
  observe: true