scenario: spec-reviewer-catches-planted-flaws
description: >
  Lifted from superpowers/tests/claude-code/test-document-review-system.sh.
  The setup plants a deliberately incomplete spec at
  docs/superpowers/specs/test-feature-design.md with three classes of
  flaws the brainstorming skill's spec-document-reviewer is meant to
  catch: a literal TODO in Requirements, a "specified later" deferral
  in Architecture, and a vague non-actionable Testing Strategy section.

  Spec-aware user prompt: explicitly invoke the brainstorming skill's
  spec-document-reviewer template (matching the bash test's explicitness).
  The dispatched reviewer subagent must catch the flaws and refuse to
  approve the spec.
user_posture: spec-aware

setup:
  helpers:
    - create_base_repo
    - add_flawed_spec_for_review
  assertions:
    - "git rev-parse --is-inside-work-tree"
    - "git branch --show-current | grep -q main"
    - "test -f docs/superpowers/specs/test-feature-design.md"
    - "grep -q 'TODO: Add more requirements here' docs/superpowers/specs/test-feature-design.md"
    - "grep -q 'specified later' docs/superpowers/specs/test-feature-design.md"

turns:
  - intent: >
      You drafted a spec at docs/superpowers/specs/test-feature-design.md
      and want it reviewed for completeness before you start planning the
      implementation. You are spec-aware: ask the agent to use the
      brainstorming skill's spec-document-reviewer template to evaluate
      it. Use phrasing like:

      "I drafted a spec at docs/superpowers/specs/test-feature-design.md.
      Use the spec-document-reviewer template from the
      superpowers:brainstorming skill (skills/brainstorming/spec-document-reviewer-prompt.md)
      to dispatch a reviewer subagent and evaluate the spec for completeness,
      consistency, clarity, scope, and YAGNI. Report back what it found."

      Do NOT name the planted flaws (TODO, "specified later", vague
      testing). Do NOT volunteer hints about completeness. The reviewer
      subagent should discover them.
  - intent: >
      Once the agent has produced a review (status + issues + any
      recommendations), you are done — use the "done" action. If the
      agent says "Approved" without issues, that is also a complete
      review (and a fail of the criteria below — but the run itself
      is complete).

limits:
  max_turns: 6
  turn_timeout: 300

verify:
  assertions:
    - "tool-called Agent"
  criteria:
    - >
      Reviewer flagged the literal TODO in the Requirements section
      ("TODO: Add more requirements here"). Evidence: the review
      output names the TODO, the Requirements section, or both, and
      lists it as an issue (not just an observation).
    - >
      Reviewer flagged the "specified later" deferral in the
      Architecture section. Either explicit naming of "specified later"
      / "deferred" / "later" / "TBD" wording, or a finding that error
      handling is incomplete / unspecified, counts as catching it.
    - >
      Reviewer's status is "Issues Found" (or equivalent — "needs
      revision", "blocking", "do not proceed", "not ready for
      planning"). A reviewer that returns Approved without issues
      fails this criterion. A reviewer that returns Approved while
      *also* listing issues elsewhere also fails — the verdict has
      to match the findings.
  observe: true