mirror of
https://github.com/obra/superpowers.git
synced 2026-05-11 03:29:04 +08:00
Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.
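The lift described in the commit message can be sketched roughly as below. This is a minimal illustration, not the command the author ran: the `-a` flag, the function names `build_rsync_command` and `has_diverged`, and the prefix-based SHA comparison are all assumptions. Only the exclusion patterns and the idea of comparing against a recorded source SHA come from the message.

```python
from typing import List

# Exclusion patterns copied from the commit message.
EXCLUDES = [
    ".git/", ".venv/", "results/", ".env/",
    "__pycache__/", "*.egg-info/", ".private-journal/",
]


def build_rsync_command(src: str, dst: str) -> List[str]:
    """Build an rsync argv list. The -a flag is an assumption; the commit
    message does not record which flags were actually used."""
    cmd = ["rsync", "-a"]
    for pattern in EXCLUDES:
        cmd.append(f"--exclude={pattern}")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


def has_diverged(recorded_sha: str, current_sha: str) -> bool:
    """Return True when the SHAs differ. Either one may be an abbreviation
    of the other (hypothetical policy, chosen for illustration)."""
    a, b = recorded_sha.strip().lower(), current_sha.strip().lower()
    if not a or not b:
        raise ValueError("empty SHA")
    shorter, longer = sorted((a, b), key=len)
    return not longer.startswith(shorter)
```

In use, `recorded_sha` would be the contents of evals/.drill-source-sha and `current_sha` the head of the drill repo; how the latter is obtained is left out here.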
committed by
Drew Ritter
parent
2e46e9590d
commit
3b412a3836
52
evals/scenarios/triggering-dispatching-parallel-agents.yaml
Normal file
@@ -0,0 +1,52 @@
scenario: triggering-dispatching-parallel-agents
description: >
  Lifted from superpowers/tests/skill-triggering/prompts/dispatching-parallel-agents.txt.
  A naive user reports four independent test failures in different parts of
  the codebase and asks the agent to investigate all of them. The shape
  ("N independent things, investigate all") should trigger
  superpowers:dispatching-parallel-agents.
user_posture: naive

setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "git branch --show-current | grep -q main"

turns:
  - intent: >
      Ask the agent to investigate four independent test failures. Use this
      exact phrasing:

      "I have 4 independent test failures happening in different modules:

      1. tests/auth/login.test.ts - 'should redirect after login' is failing
      2. tests/api/users.test.ts - 'should return user list' returns 500
      3. tests/components/Button.test.tsx - snapshot mismatch
      4. tests/utils/date.test.ts - timezone handling broken

      These are unrelated issues in different parts of the codebase. Can you
      investigate all of them?"

      Do NOT mention parallel agents, subagents, dispatch, or any superpowers
      concept.
  - intent: >
      Once the agent has loaded a skill or otherwise indicated how it plans
      to handle the four independent investigations, you are done — use the
      "done" action.

limits:
  max_turns: 4
  turn_timeout: 180

verify:
  assertions:
    - "skill-called superpowers:dispatching-parallel-agents"
  criteria:
    - >
      Agent loaded the superpowers:dispatching-parallel-agents skill in
      response to the four-independent-investigations request. Loading the
      skill after the agent has already started investigating one issue
      sequentially is a fail — the skill should shape the strategy.
  observe: true
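The verify block of the scenario combines a mechanical assertion ("skill-called ...") with a prose criterion about ordering. The sketch below shows one way such a check could work; it is not drill's actual harness. The event format, the function names `parse_skill_called` and `skill_loaded_first`, and the rule that any tool call before the skill load counts as "already started investigating" are all hypothetical simplifications of the criterion.

```python
from typing import List, Tuple

# Hypothetical transcript event: (kind, name), e.g. ("skill", "superpowers:x")
# or ("tool", "Read"). The real harness format is not shown in the diff.
Event = Tuple[str, str]

PREFIX = "skill-called "


def parse_skill_called(assertion: str) -> str:
    """Extract the skill name from an assertion such as 'skill-called <name>'."""
    if not assertion.startswith(PREFIX):
        raise ValueError(f"not a skill-called assertion: {assertion!r}")
    return assertion[len(PREFIX):].strip()


def skill_loaded_first(assertion: str, events: List[Event]) -> bool:
    """Return True only if the named skill is loaded before any tool call.

    This is a crude proxy for the prose criterion: a tool call seen before
    the skill load is treated as sequential investigation having begun.
    """
    skill = parse_skill_called(assertion)
    for kind, name in events:
        if kind == "skill" and name == skill:
            return True
        if kind == "tool":
            return False
    return False
```

A stricter or looser ordering rule may be appropriate; the scenario leaves that judgment to the prose criterion rather than to the assertion string.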