mirror of https://github.com/obra/superpowers.git, synced 2026-05-11 11:39:04 +08:00
Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.
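The commit message does not say how the divergence check works. As a minimal sketch, assuming the check only compares the SHA recorded in evals/.drill-source-sha with a SHA supplied by the caller, it could look like this. The function names `recorded_sha` and `same_commit` are hypothetical and are not part of drill or superpowers.

```python
from pathlib import Path


def recorded_sha(path: Path) -> str:
    """Read the recorded source SHA, dropping whitespace and normalizing case."""
    return path.read_text(encoding="utf-8").strip().lower()


def same_commit(recorded: str, candidate: str) -> bool:
    """Return True when two SHAs name the same commit.

    Assumption: an abbreviated SHA (such as 013fcb8b7d) matches a full SHA
    when it is a prefix of it. Empty strings never match.
    """
    a = recorded.strip().lower()
    b = candidate.strip().lower()
    if not a or not b:
        return False
    return a.startswith(b) or b.startswith(a)
```

A caller would pass the current HEAD of the drill checkout as `candidate`. A `False` result means the lifted copy and the source no longer point at the same commit.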
53
evals/scenarios/codex-subagent-wait-mapping.yaml
Normal file
@@ -0,0 +1,53 @@
scenario: codex-subagent-wait-mapping
description: >
  Measures whether a Codex agent follows Superpowers' Codex tool mapping
  when translating Claude Code Task subagent result collection.
user_posture: spec-aware

setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "test -f .agents/skills/superpowers/using-superpowers/references/codex-tools.md"

turns:
  - intent: >
      When Codex is at its input prompt, type this exact message and press
      Enter. Do not type a placeholder, a period, or a shorter paraphrase:

      "Read .agents/skills/superpowers/using-superpowers/references/codex-tools.md.
      Treat that file as authoritative for translating Claude Code tool
      names to Codex tool names. Do not use your prior knowledge of Codex
      tool names when the file gives a mapping. Now simulate this Claude
      Code workflow using the mapped Codex tools: Task tool dispatches a
      subagent; Task returns result; Task completes automatically. The
      subagent's whole job is to answer exactly PR963_OK. Wait for the
      subagent result using the mapped equivalent for 'Task returns result',
      then report the result. Do not edit files."
  - intent: >
      If the agent asks a short clarifying question, answer briefly and
      tell it to follow the mapping file. If it reports PR963_OK or says
      it cannot complete the mapped wait step, you are done.

limits:
  max_turns: 8
  turn_timeout: 240

verify:
  assertions:
    - "tool-called spawn_agent"
    - "tool-called wait_agent"
    - "tool-not-called wait"
    - "tool-before spawn_agent wait_agent"
    - "grep -q PR963_OK session.log"
  criteria:
    - >
      Agent read the Codex tool mapping file before choosing which tool
      waits for a spawned subagent result.
    - >
      Agent used wait_agent, not wait, to collect the spawned subagent
      result.
    - >
      Agent successfully reported the spawned subagent sentinel PR963_OK.
  observe: true