Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b

rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding
.git/, .venv/, results/, .env/, __pycache__/, *.egg-info/,
.private-journal/.
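
The exact rsync invocation is not recorded in this message. A minimal sketch of what it might have looked like, assuming `-a` (archive mode) and hypothetical `SRC` and `DEST` paths; only the exclude patterns come from the list above:

```shell
#!/bin/sh
# Hypothetical locations; the real checkout paths are not stated in the commit.
SRC="${SRC:-../drill/}"
DEST="${DEST:-evals/}"

# Emit one --exclude flag per pattern named in the commit message.
build_excludes() {
  for pattern in .git/ .venv/ results/ .env/ __pycache__/ '*.egg-info/' .private-journal/; do
    printf '%s\n' "--exclude=$pattern"
  done
}

# Print the command instead of running it, so the sketch has no side effects.
lift_command() {
  printf 'rsync -a'
  build_excludes | while IFS= read -r flag; do
    printf " '%s'" "$flag"
  done
  printf ' %s %s\n' "$SRC" "$DEST"
}

lift_command
```

Quoting `'*.egg-info/'` keeps the shell from expanding the glob before rsync sees it.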

The drill repo is unaffected by this commit; archival is a separate
manual step after this PR merges.

Source SHA recorded at evals/.drill-source-sha for divergence
detection.
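
How the recorded SHA is consumed is not described here. One way a divergence check could work, assuming `evals/.drill-source-sha` holds just the SHA on a single line; `check_drill_sha` is a hypothetical helper, not part of this commit:

```shell
#!/bin/sh
# Compare the SHA recorded at lift time with a SHA from the drill repo.
# Prints "in-sync" and returns 0 on a match; otherwise prints both SHAs and returns 1.
check_drill_sha() {
  sha_file="$1"      # e.g. evals/.drill-source-sha
  upstream_sha="$2"  # e.g. from `git -C ../drill rev-parse HEAD` (hypothetical path)
  # Strip the trailing newline and any stray whitespace before comparing.
  recorded_sha=$(tr -d '[:space:]' < "$sha_file")
  if [ "$recorded_sha" = "$upstream_sha" ]; then
    echo "in-sync"
  else
    echo "diverged: recorded $recorded_sha, upstream $upstream_sha"
    return 1
  fi
}
```

A call such as `check_drill_sha evals/.drill-source-sha "$(git -C ../drill rev-parse HEAD)"` would then flag any commits made to drill after the lift.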
Jesse Vincent
2026-05-06 12:15:46 -07:00
committed by Drew Ritter
parent 2e46e9590d
commit 3b412a3836
124 changed files with 13806 additions and 0 deletions

@@ -0,0 +1,63 @@
scenario: gemini-subagent-tool-mapping-comprehension
description: >
  Measures whether a Gemini CLI agent correctly reports the Superpowers Gemini
  mapping for Claude Code Task subagent dispatch, including parallel dispatch.
user_posture: spec-aware
setup:
  helpers:
    - create_base_repo
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "test -f GEMINI.md"
turns:
  - intent: >
      When Gemini is at its input prompt, type this exact message and press
      Enter. Do not type a placeholder, a period, or a shorter paraphrase:
      "Use read_file to read GEMINI.md. Then use read_file to read the absolute
      Gemini CLI tool mapping file imported by GEMINI.md. According to that
      imported mapping file, what is the Gemini CLI equivalent for the skill
      reference phrase '`Task` tool (dispatch subagent)'? Do not perform any
      subagent workflow. Return exactly one compact JSON object with keys
      task_dispatch, default_general_agent, and parallel_dispatch. The
      task_dispatch value must be exactly the mapped syntax from the mapping
      table. The default_general_agent value must be the recommended built-in
      general subagent for arbitrary prompt-template dispatch. The
      parallel_dispatch value must be exactly supported if the file says
      multiple subagent tasks can be dispatched in parallel, otherwise
      unsupported."
  - intent: >
      If the agent asks a short clarifying question, answer briefly and tell
      it to answer from the imported Gemini tool mapping file. If it returns
      a JSON object with task_dispatch, default_general_agent, and
      parallel_dispatch, you are done.
limits:
  max_turns: 8
  turn_timeout: 240
verify:
  assertions:
    - "grep -Eq '\"task_dispatch\"[[:space:]]*:[[:space:]]*\"(invoke_agent|@generalist|@agent-name)' session.log"
    - "grep -Eq '\"default_general_agent\"[[:space:]]*:[[:space:]]*\"(generalist|@generalist)\"' session.log"
    - "grep -Eq '\"parallel_dispatch\"[[:space:]]*:[[:space:]]*\"supported\"' session.log"
    - "! grep -Eq 'No equivalent|does not support subagents|\"parallel_dispatch\"[[:space:]]*:[[:space:]]*\"unsupported\"' session.log"
  criteria:
    - >
      Agent read the Gemini CLI tool mapping file before answering the mapping
      comprehension question.
    - >
      Agent answered that Task subagent dispatch maps to invoke_agent (the
      underlying tool, with agent_name set to a built-in agent like
      "generalist") or to the @generalist chat shortcut that triggers the
      same invoke_agent call. Either form is correct per Gemini CLI's source
      and docs.
    - >
      Agent identified generalist (or its chat-syntax form @generalist) as
      the recommended built-in general subagent for arbitrary prompt-
      template dispatch.
    - >
      Agent reported parallel subagent dispatch as supported.
observe: true
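
The four verify assertions can be tried outside the harness. This sketch runs them, with the YAML escaping removed, against a hypothetical one-line sample of what a passing `session.log` might contain. The sample line and the `check_log` wrapper are illustrative assumptions; only the grep patterns come from the scenario:

```shell
#!/bin/sh
# Hypothetical sample log; a real session.log would contain the full transcript.
log=$(mktemp)
cat > "$log" <<'EOF'
{"task_dispatch": "@generalist", "default_general_agent": "generalist", "parallel_dispatch": "supported"}
EOF

# Succeeds only if all three positive patterns match and the negative one does not.
check_log() {
  grep -Eq '"task_dispatch"[[:space:]]*:[[:space:]]*"(invoke_agent|@generalist|@agent-name)' "$1" &&
  grep -Eq '"default_general_agent"[[:space:]]*:[[:space:]]*"(generalist|@generalist)"' "$1" &&
  grep -Eq '"parallel_dispatch"[[:space:]]*:[[:space:]]*"supported"' "$1" &&
  ! grep -Eq 'No equivalent|does not support subagents|"parallel_dispatch"[[:space:]]*:[[:space:]]*"unsupported"' "$1"
}

if check_log "$log"; then echo "pass"; else echo "fail"; fi
```

The `task_dispatch` pattern has no closing quote, so it accepts any value that begins with one of the three alternatives, while the other two patterns require the whole quoted value to match.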