superpowers/evals/scenarios/triggering-systematic-debugging.yaml
Jesse Vincent 3b412a3836 Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
2026-05-06 15:47:39 -07:00

scenario: triggering-systematic-debugging
description: >
  Lifted from superpowers/tests/skill-triggering/prompts/systematic-debugging.txt.
  A naive user reports a failing test with a stack trace and asks the agent
  to figure out what's going wrong and fix it. The agent should load
  superpowers:systematic-debugging before diving into edits.
user_posture: naive
setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "git branch --show-current | grep -q main"
turns:
  - intent: >
      Open the conversation by reporting a failing test. Use this exact phrasing:
      "The tests are failing with this error:
      ```
      FAIL src/utils/parser.test.ts
      ● Parser should handle nested objects
      TypeError: Cannot read property 'value' of undefined
      at parse (src/utils/parser.ts:42:18)
      at Object.<anonymous> (src/utils/parser.test.ts:28:20)
      ```
      Can you figure out what's going wrong and fix it?"
      Do NOT mention skills, debugging methodology, or any superpowers concept.
  - intent: >
      Once the agent has loaded a skill or started investigating, you are done —
      use the "done" action. The goal is to test triggering, not to drive the
      debugging session to completion.
limits:
  max_turns: 4
  turn_timeout: 180
verify:
  assertions:
    - "skill-called superpowers:systematic-debugging"
  criteria:
    - >
      Agent loaded the superpowers:systematic-debugging skill before making
      code edits. Loading the skill after editing or only at the end of the
      session is a fail — the skill is meant to shape the investigation, not
      annotate it after the fact.
observe: true