Close three review blind spots found by defect tracing

Live eval deliverables shipped five polish defects; tracing each through the
transcripts showed three mechanisms, each now addressed:

- reviewers answered pointed checklist items with an unsupported "yes"
  (evidence rule: every What-to-Check answer needs file:line evidence)
- no reviewer ever saw the design's global constraints (controllers now
  paste binding constraints into task requirements)
- test output noise was invisible everywhere (pristine-output checks in
  implementer self-review and quality review)
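As a rough illustration, the evidence rule and the pristine-output check could be approximated mechanically. This Python sketch is hypothetical. `unsupported_answers`, `noisy_lines`, and both regex patterns are assumptions and not part of the skill, which leaves these judgments to the reviewer subagent. The noise pattern is a crude keyword heuristic. Catching stray prints or log lines would need project-specific patterns.

```python
import re

# Hypothetical pattern for a "file:line" citation such as
# "src/app.py:42" or "README.md:10-14" (an assumption, not from the skill).
EVIDENCE = re.compile(r"\b[\w./-]+\.\w+:\d+(?:-\d+)?\b")

# Crude, hypothetical noise heuristic: any line mentioning a warning or
# deprecation (matches "UserWarning", "1 warning", "DeprecationWarning").
NOISE = re.compile(r"warn|deprecat", re.IGNORECASE)


def unsupported_answers(answers: dict[str, str]) -> list[str]:
    """Return checklist items whose answer cites no file:line evidence."""
    return [item for item, answer in answers.items() if not EVIDENCE.search(answer)]


def noisy_lines(test_output: str) -> list[str]:
    """Return lines of test output that look like warnings or other noise."""
    return [line for line in test_output.splitlines() if NOISE.search(line)]
```

A bare "Yes" is flagged, while "Yes, see tests/test_api.py:88" passes. A test run that prints a `UserWarning` would be reported as a finding, not accepted as green.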
2026-07-27 12:44:01 +08:00 · 2026-06-09 21:19:08 -07:00
parent 5cfdb75b94
commit c7900f1698
4 changed files with 13 additions and 2 deletions
--- a/skills/subagent-driven-development/SKILL.md
+++ b/skills/subagent-driven-development/SKILL.md
@@ -149,6 +149,9 @@ final whole-branch review. When you fill a reviewer template:
  ignore or not flag a specific issue. If you believe a finding would be a
  false positive, let the reviewer raise it and adjudicate it in the review
  loop.
+- Include the spec/design's global constraints that bind the task (version
+  floors, naming and copy rules, platform requirements) in the requirements
+  you paste — a reviewer can only enforce what you hand them.

 ## Prompt Templates

--- a/skills/subagent-driven-development/code-quality-reviewer-prompt.md
+++ b/skills/subagent-driven-development/code-quality-reviewer-prompt.md
@@ -61,6 +61,9 @@ Subagent (general-purpose):
    running it. If you cannot run commands in this environment, name the
    test you would run.

+    Warnings or other noise in the implementer's reported test output are
+    findings — test output should be pristine.
+
    ## What to Check

    **Code quality:**
@@ -81,6 +84,9 @@ Subagent (general-purpose):
      significantly grow existing files? (Don't flag pre-existing file
      sizes — focus on what this change contributed.)

+    Answer each item above with file:line evidence, not a bare yes or no.
+    An unsupported "yes" is not a review.
+
    ## Calibration

    Categorize issues by actual severity. Not everything is Critical.
--- a/skills/subagent-driven-development/implementer-prompt.md
+++ b/skills/subagent-driven-development/implementer-prompt.md
@@ -94,6 +94,7 @@ Subagent (general-purpose):
    - Do tests actually verify behavior (not just mock behavior)?
    - Did I follow TDD if required?
    - Are tests comprehensive?
+    - Is the test output pristine (no stray warnings or noise)?

    If you find issues during self-review, fix them now before reporting.