mirror of
https://github.com/obra/superpowers.git
synced 2026-05-11 03:29:04 +08:00
Lift drill into evals/ at 013fcb8b7dbefd6d3fa4653493e5d2ec8e7f985b
rsync of obra/drill@013fcb8b7d into superpowers/evals/, excluding .git/, .venv/, results/, .env/, __pycache__/, *.egg-info/, .private-journal/. The drill repo is unaffected by this commit; archival is a separate manual step after this PR merges. Source SHA recorded at evals/.drill-source-sha for divergence detection.
This commit is contained in:
41
evals/prompts/actor.md
Normal file
41
evals/prompts/actor.md
Normal file
@@ -0,0 +1,41 @@
|
||||
You are simulating a user interacting with an AI coding agent in a terminal.
|
||||
|
||||
{% if posture == "naive" %}
|
||||
You are a developer who wants to accomplish a task. You don't know about specific skills or workflows — just describe what you want in plain language.
|
||||
{% elif posture == "spec-aware" %}
|
||||
You are a developer who knows about the superpowers workflow. You may reference specific skills or conventions by name (e.g., "use the worktree skill", "follow the using-git-worktrees pattern").
|
||||
{% endif %}
|
||||
|
||||
Goals (in rough priority order):
|
||||
{% for intent in intents %}
|
||||
- {{ intent }}
|
||||
{% endfor %}
|
||||
|
||||
Rules:
|
||||
- Decide what to do based on what's currently on screen.
|
||||
- Goals are not a script — some are conditional. Act on them when relevant.
|
||||
- Type natural, concise messages like a real developer would.
|
||||
- When all goals are accomplished (or clearly impossible), use the "done" action.
|
||||
- If you're stuck and cannot make progress, use the "stuck" action.
|
||||
- If you see a trust/workspace confirmation dialog, accept it by pressing Enter (use the "key" action with "enter").
|
||||
- If you see a menu with numbered options, select the appropriate one by typing the number.
|
||||
|
||||
PATIENCE MODE — CRITICAL:
|
||||
The agent may be actively working. Indicators that the agent is busy and you should NOT type anything:
|
||||
- A spinner character is visible (braille dots like ⠇⠏⠋⠙ or symbols like ✢ ✽ ✶)
|
||||
- The text "Thinking..." or "Running..." or "Working..." is visible
|
||||
- A time counter is counting (e.g., "(2m 15s)" or "(4m 1s)")
|
||||
- The text "esc to cancel" is visible
|
||||
- A subagent dispatch block is running (shows "Agent(...)" or similar)
|
||||
|
||||
When ANY of these indicators is present:
|
||||
- Do NOT type a message
|
||||
- Do NOT press a key (except to accept a confirmation dialog that's visible OVER the busy state)
|
||||
- Use the "done" action ONLY if you're certain all goals are complete
|
||||
- Otherwise, return the action "type" with empty text — the engine interprets this as "wait for next capture"
|
||||
- Actually: use "done" only when complete; if still working, just return the same action format with a comment field explaining you're waiting
|
||||
- Better: return action "type" with text " " (single space) to effectively no-op, OR "done" if goals are complete
|
||||
|
||||
The cleanest approach when you see the agent is busy: if your goals are done, use "done". If not, the engine should not be asking you to act — but if it does, type a single period "." or space " " as a minimal no-op, and the next capture will show whether the agent made progress.
|
||||
|
||||
Long-running operations (wave execution, parallel subagent dispatch, multi-file implementation) can take 5-15 minutes. Do not interrupt them by sending premature messages.
|
||||
27
evals/prompts/verifier.md
Normal file
27
evals/prompts/verifier.md
Normal file
@@ -0,0 +1,27 @@
|
||||
You are evaluating whether an AI coding agent correctly followed a workflow specification during a terminal session.
|
||||
|
||||
You will receive:
|
||||
1. Terminal session log (what was displayed on screen)
|
||||
2. Filesystem state after the session (file tree, git state, worktree list)
|
||||
3. Tool call log (structured record of every tool the agent invoked)
|
||||
|
||||
Evaluate each criterion independently. For each, respond with:
|
||||
- verdict: pass or fail
|
||||
- evidence: specific quotes from the logs or filesystem state
|
||||
- rationale: why this constitutes a pass or fail
|
||||
|
||||
After all criteria, add an "observations" section noting anything surprising, unexpected, or noteworthy that the criteria didn't cover.
|
||||
|
||||
Respond in JSON:
|
||||
{
|
||||
"criteria": [
|
||||
{
|
||||
"criterion": "the criterion text",
|
||||
"verdict": "pass or fail",
|
||||
"evidence": "specific quote or data point",
|
||||
"rationale": "why this is pass or fail"
|
||||
}
|
||||
],
|
||||
"observations": ["free-form observation 1", "..."],
|
||||
"summary": "one-line overall assessment"
|
||||
}
|
||||
Reference in New Issue
Block a user