mirror of
https://github.com/obra/superpowers.git
synced 2026-05-10 02:59:04 +08:00
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.
64 lines
2.8 KiB
YAML
scenario: gemini-subagent-tool-mapping-comprehension
description: >
  Measures whether a Gemini CLI agent correctly reports the Superpowers Gemini
  mapping for Claude Code Task subagent dispatch, including parallel dispatch.
user_posture: spec-aware

setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "test -f GEMINI.md"

turns:
  - intent: >
      When Gemini is at its input prompt, type this exact message and press
      Enter. Do not type a placeholder, a period, or a shorter paraphrase:

      "Use read_file to read GEMINI.md. Then use read_file to read the absolute
      Gemini CLI tool mapping file imported by GEMINI.md. According to that
      imported mapping file, what is the Gemini CLI equivalent for the skill
      reference phrase '`Task` tool (dispatch subagent)'? Do not perform any
      subagent workflow. Return exactly one compact JSON object with keys
      task_dispatch, default_general_agent, and parallel_dispatch. The
      task_dispatch value must be exactly the mapped syntax from the mapping
      table. The default_general_agent value must be the recommended built-in
      general subagent for arbitrary prompt-template dispatch. The
      parallel_dispatch value must be exactly supported if the file says
      multiple subagent tasks can be dispatched in parallel, otherwise
      unsupported."
  - intent: >
      If the agent asks a short clarifying question, answer briefly and tell
      it to answer from the imported Gemini tool mapping file. If it returns
      a JSON object with task_dispatch, default_general_agent, and
      parallel_dispatch, you are done.

limits:
  max_turns: 8
  turn_timeout: 240

verify:
  assertions:
    - "grep -Eq '\"task_dispatch\"[[:space:]]*:[[:space:]]*\"(invoke_agent|@generalist|@agent-name)' session.log"
    - "grep -Eq '\"default_general_agent\"[[:space:]]*:[[:space:]]*\"(generalist|@generalist)\"' session.log"
    - "grep -Eq '\"parallel_dispatch\"[[:space:]]*:[[:space:]]*\"supported\"' session.log"
    - "! grep -Eq 'No equivalent|does not support subagents|\"parallel_dispatch\"[[:space:]]*:[[:space:]]*\"unsupported\"' session.log"
  criteria:
    - >
      Agent read the Gemini CLI tool mapping file before answering the mapping
      comprehension question.
    - >
      Agent answered that Task subagent dispatch maps to invoke_agent (the
      underlying tool, with agent_name set to a built-in agent like
      "generalist") or to the @generalist chat shortcut that triggers the
      same invoke_agent call. Either form is correct per Gemini CLI's source
      and docs.
    - >
      Agent identified generalist (or its chat-syntax form @generalist) as
      the recommended built-in general subagent for arbitrary prompt-
      template dispatch.
    - >
      Agent reported parallel subagent dispatch as supported.
  observe: true
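
The four `verify.assertions` above can be sanity-checked offline. What follows is a minimal sketch, not part of the scenario or of drill itself: it translates the `grep -E` patterns into Python's `re` syntax (`[[:space:]]` becomes `\s`) and applies them line by line, as grep does. The function names and the two sample log lines are hypothetical, invented only to illustrate a passing and a failing answer.

```python
import re

# Python translations of the scenario's grep -E patterns.
# Assumption: [[:space:]] is approximated by \s, which is close enough here.
REQUIRED = [
    r'"task_dispatch"\s*:\s*"(invoke_agent|@generalist|@agent-name)',
    r'"default_general_agent"\s*:\s*"(generalist|@generalist)"',
    r'"parallel_dispatch"\s*:\s*"supported"',
]
FORBIDDEN = (
    r'No equivalent|does not support subagents|'
    r'"parallel_dispatch"\s*:\s*"unsupported"'
)


def any_line_matches(pattern: str, log_text: str) -> bool:
    """Mimic `grep -Eq`: succeed if at least one line matches the pattern."""
    return any(re.search(pattern, line) for line in log_text.splitlines())


def passes(log_text: str) -> bool:
    """True when all required patterns match and the forbidden one never does."""
    if any_line_matches(FORBIDDEN, log_text):
        return False
    return all(any_line_matches(p, log_text) for p in REQUIRED)


# Hypothetical session.log excerpts, invented for illustration only.
good = ('{"task_dispatch":"invoke_agent","default_general_agent":"generalist",'
        '"parallel_dispatch":"supported"}')
bad = ('{"task_dispatch":"invoke_agent","default_general_agent":"generalist",'
       '"parallel_dispatch":"unsupported"}')

print(passes(good))  # True
print(passes(bad))   # False
```

Note that the first pattern has no closing quote after the alternation, so any value that merely begins with `invoke_agent`, `@generalist`, or `@agent-name` satisfies it, while the other two patterns require the full quoted value.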