Iteration-1 profiling: implementers and per-dispatch overhead dominate
(429 of 686 subagent turns; controller coordination is half the dollars
and scales with dispatch count), reviewers are individually lean, and
the controller pasted the diff in only 2 of 22 review dispatches when
the guidance was phrased as optional.
Changes: spec-reviewer-prompt.md + code-quality-reviewer-prompt.md
replaced by task-reviewer-prompt.md (one reviewer, one reading of a
pasted diff, two verdicts: spec compliance ✅/❌/⚠️ and task quality);
one fix dispatch can address both kinds of findings; controller now
runs git diff itself and pastes it (imperative, not optional);
implementers run focused tests while iterating and the full suite once
before committing; flowchart, example, Red Flags, tool tables updated.
The broad final whole-branch review is unchanged.
Live eval deliverables shipped five polish defects; tracing each through
the transcripts showed three mechanisms, each now addressed:
- reviewers answered pointed checklist items with unsupported yes
(evidence rule: every What-to-Check answer needs file:line evidence)
- no reviewer ever saw the design's global constraints (controllers now
paste binding constraints into task requirements)
- test output noise was invisible everywhere (pristine-output checks in
implementer self-review and quality review)
Add a conditional TDD Evidence field to the implementer report format so controllers can verify RED and GREEN output when TDD was required.
The field asks for the command run, relevant RED/GREEN output, and the expected RED failure reason rather than raw full logs.
Fixes#994.
Replace Claude-Code-specific tool names in skill prose, prompt
templates, and OpenCode-facing docs with action-language descriptions
that resolve to each runtime's native tool via the per-platform refs.
Changes by category:
- Prose mentions ("Use TodoWrite to track...", "Use Task tool with
general-purpose type") → action language ("Track each item as a
todo", "Dispatch a general-purpose subagent")
- Prompt template headers (6 files): "Task tool (general-purpose):"
→ "Subagent (general-purpose):" — preserves the type information
without naming Claude Code's specific dispatch tool
- DOT flowchart node labels: "Invoke Skill tool" → "Invoke the
skill"; "Create TodoWrite todo per item" → "Create a todo per
item"
- OpenCode INSTALL.md and docs/README.opencode.md: replace the old
"TodoWrite → todowrite, Task → @mention" mapping (which both
taught a vocabulary skills no longer use AND was wrong about
@mention being a real OpenCode syntax) with an action-language
mapping verified against the installed OpenCode CLI's tool
inventory.
The platform-tools refs landed in Phase B already document each
runtime's resolution; skills now speak in the actions those refs
map. Tool names that genuinely belong only in the per-platform
dispatch section ("In Claude Code: Use the `Skill` tool") and the
Claude-Code-specific Bash run_in_background flag note in
visual-companion remain — those are intentional carve-outs.
Add design-for-isolation and working-in-existing-codebases guidance to
brainstorming. Add file size awareness and escalation prompts to SDD
implementer and code quality reviewer. Writing-plans gets architecture
section sizing guidance. Spec and plan reviewers get architecture and
file size checks.