superpowers/evals/scenarios/mid-conversation-skill-invocation.yaml

scenario: mid-conversation-skill-invocation
description: >
  Lifted from superpowers/tests/explicit-skill-requests/run-claude-describes-sdd.sh.
  Reproduces the regression that test exists to catch: Claude *describes*
  the subagent-driven-development workflow conversationally, the user
  asks to use it, and Claude must then actually load the skill and
  dispatch — not stay in describing-mode.

  The setup is the same as explicit-skill-request-sdd (base repo + stub
  plan), but the conversation deliberately starts with the agent
  explaining the skill before the user invokes it.
user_posture: spec-aware

setup:
  helpers:
    - create_base_repo
    - add_sdd_auth_plan
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "git branch --show-current | grep -q main"
    - "test -f docs/superpowers/plans/auth-system.md"

turns:
  - intent: >
      Open the conversation by asking the agent to summarize, in plain
      English, how the superpowers:subagent-driven-development workflow
      executes a multi-task plan. Use phrasing like:

      "Quick question before we start — can you describe how
      subagent-driven-development works? I want to make sure I
      understand the workflow before I commit to using it."

      Do NOT ask the agent to use the skill yet. The point is to put
      the agent in describing-mode first.
  - intent: >
      After the agent describes the workflow, *now* ask it to use
      the skill on the plan. Use phrasing like:

      "Got it, that's what I want. I have a plan at
      docs/superpowers/plans/auth-system.md. subagent-driven-development,
      please — dispatch the first subagent."

      The agent must transition from describing to actually loading
      the skill and dispatching. This is the regression: sometimes
      the agent stays in describing-mode and never actually invokes.
  - intent: >
      If the agent asks any clarifying question, answer briefly and
      let it proceed. If it offers to start, say "yes, go ahead."
  - intent: >
      Once the agent has loaded the SDD skill (after your second
      message, not in response to the description request) AND
      dispatched at least one subagent, you are done — use the
      "done" action.

limits:
  max_turns: 10
  turn_timeout: 300

verify:
  assertions:
    - "skill-called superpowers:subagent-driven-development"
    - "tool-called Agent"
  criteria:
    - >
      Agent transitioned from describing the skill to actually using
      it. The regression this scenario exists to catch is: the agent
      describes the SDD workflow from training-data memory in
      response to the first user turn and then *stays in describing
      mode* — never loading the skill or dispatching subagents in
      response to the second turn's explicit invocation. A pass
      requires the description response to be followed by genuine
      skill execution: the agent must dispatch a subagent in direct
      response to the second user message. (Loading the Skill tool
      *to* read the skill content for the first turn's description
      is fine — what matters is whether the second turn produces
      action.)
  observe: true