mirror of
https://github.com/obra/superpowers.git
synced 2026-05-09 02:29:05 +08:00
Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.
committed by
Drew Ritter
parent
2e46e9590d
commit
3b412a3836
@@ -0,0 +1,63 @@
scenario: gemini-subagent-tool-mapping-comprehension
description: >
  Measures whether a Gemini CLI agent correctly reports the Superpowers Gemini
  mapping for Claude Code Task subagent dispatch, including parallel dispatch.
user_posture: spec-aware

setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "test -f GEMINI.md"

turns:
  - intent: >
      When Gemini is at its input prompt, type this exact message and press
      Enter. Do not type a placeholder, a period, or a shorter paraphrase:

      "Use read_file to read GEMINI.md. Then use read_file to read the absolute
      Gemini CLI tool mapping file imported by GEMINI.md. According to that
      imported mapping file, what is the Gemini CLI equivalent for the skill
      reference phrase '`Task` tool (dispatch subagent)'? Do not perform any
      subagent workflow. Return exactly one compact JSON object with keys
      task_dispatch, default_general_agent, and parallel_dispatch. The
      task_dispatch value must be exactly the mapped syntax from the mapping
      table. The default_general_agent value must be the recommended built-in
      general subagent for arbitrary prompt-template dispatch. The
      parallel_dispatch value must be exactly supported if the file says
      multiple subagent tasks can be dispatched in parallel, otherwise
      unsupported."
  - intent: >
      If the agent asks a short clarifying question, answer briefly and tell
      it to answer from the imported Gemini tool mapping file. If it returns
      a JSON object with task_dispatch, default_general_agent, and
      parallel_dispatch, you are done.

limits:
  max_turns: 8
  turn_timeout: 240

verify:
  assertions:
    - "grep -Eq '\"task_dispatch\"[[:space:]]*:[[:space:]]*\"(invoke_agent|@generalist|@agent-name)' session.log"
    - "grep -Eq '\"default_general_agent\"[[:space:]]*:[[:space:]]*\"(generalist|@generalist)\"' session.log"
    - "grep -Eq '\"parallel_dispatch\"[[:space:]]*:[[:space:]]*\"supported\"' session.log"
    - "! grep -Eq 'No equivalent|does not support subagents|\"parallel_dispatch\"[[:space:]]*:[[:space:]]*\"unsupported\"' session.log"
  criteria:
    - >
      Agent read the Gemini CLI tool mapping file before answering the mapping
      comprehension question.
    - >
      Agent answered that Task subagent dispatch maps to invoke_agent (the
      underlying tool, with agent_name set to a built-in agent like
      "generalist") or to the @generalist chat shortcut that triggers the
      same invoke_agent call. Either form is correct per Gemini CLI's source
      and docs.
    - >
      Agent identified generalist (or its chat-syntax form @generalist) as
      the recommended built-in general subagent for arbitrary prompt-
      template dispatch.
    - >
      Agent reported parallel subagent dispatch as supported.
  observe: true
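The four `verify.assertions` above are plain `grep -E` checks against `session.log`. As a rough illustration of what they accept and reject, the sketch below re-expresses them with Python's `re` module. It is an approximation under stated assumptions, not part of the drill harness: `[[:space:]]` is translated to `\s`, matching is done line by line to mimic grep, and the function name `passes` and the sample replies are hypothetical.

```python
import json
import re

# Python approximations of the grep -E patterns; [[:space:]] is written as \s.
REQUIRED = [
    r'"task_dispatch"\s*:\s*"(invoke_agent|@generalist|@agent-name)',
    r'"default_general_agent"\s*:\s*"(generalist|@generalist)"',
    r'"parallel_dispatch"\s*:\s*"supported"',
]
FORBIDDEN = (
    r'No equivalent|does not support subagents|'
    r'"parallel_dispatch"\s*:\s*"unsupported"'
)


def line_matches(pattern: str, log_text: str) -> bool:
    """Mimic `grep -Eq`: succeed if any single line contains a match."""
    return any(re.search(pattern, line) for line in log_text.splitlines())


def passes(log_text: str) -> bool:
    """True when all required patterns match and the forbidden one does not."""
    required_ok = all(line_matches(p, log_text) for p in REQUIRED)
    return required_ok and not line_matches(FORBIDDEN, log_text)


# Hypothetical agent replies, not taken from a real session.log.
good = json.dumps({
    "task_dispatch": "invoke_agent",
    "default_general_agent": "generalist",
    "parallel_dispatch": "supported",
})
bad = json.dumps({
    "task_dispatch": "invoke_agent",
    "default_general_agent": "generalist",
    "parallel_dispatch": "unsupported",
})

print(passes(good))
print(passes(bad))
```

Note that the `task_dispatch` pattern has no closing quote, so it is a prefix match: a value that begins with `invoke_agent` and continues with further text would still satisfy it, whereas `default_general_agent` and `parallel_dispatch` must match their values exactly.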