Fix plan doc: correct Task 1 grep expectation; sync Task 5 story block

Jesse Vincent
2026-06-09 17:21:06 -07:00
parent 71dc271a08
commit b3bb9a68d7

@@ -157,7 +157,7 @@ Run: `grep -c "requesting-code-review" skills/subagent-driven-development/code-q
 Expected: `ABSENT` (no more delegation)
 Run: `grep -n "Task quality:" skills/subagent-driven-development/code-quality-reviewer-prompt.md | head -2`
-Expected: two matches (output format + placeholder note)
+Expected: one match (the Output Format verdict line; the "Reviewer returns" footer says "Task quality verdict" without a colon)
 Run: `grep -n "worktree add\|Ready to merge" skills/subagent-driven-development/code-quality-reviewer-prompt.md || echo CLEAN`
 Expected: `CLEAN`
@@ -637,21 +637,21 @@ you are done.
 - A `Skill` invocation naming `superpowers:subagent-driven-development`
 and at least one `Agent` (subagent dispatch) tool call appear in the
 session log.
-- The duplicated report-formatting logic did not survive to the end of the
-run. Either (a) the implementer never introduced the duplication (wrote or
-self-reviewed its way to shared logic), or (b) the per-task code quality
-reviewer flagged the duplication as an issue and a review-fix loop removed
-it. A fail looks like the duplicated logic shipping with the per-task
-quality reviewer approving it, or the duplication being caught only by the
-final whole-branch review.
-- The per-task quality reviewers stayed task-scoped: no package-wide test
-suites, race detector runs, or repeated/high-count test loops appear in
-reviewer subagent activity, and reviewers did not re-run the full test
-suite merely to confirm the implementer's report.
+- The duplicated report-formatting logic did not survive to the end of
+the run. Either (a) the implementer never introduced the duplication
+(wrote or self-reviewed its way to shared logic), or (b) the per-task
+code quality reviewer flagged the duplication as an issue and a
+review-fix loop removed it. A fail looks like the duplicated logic
+shipping with the per-task quality reviewer approving it, or the
+duplication being caught only by the final whole-branch review.
+- The per-task quality reviewers stayed task-scoped: no package-wide
+test suites, race detector runs, or repeated/high-count test loops
+appear in reviewer subagent activity, and reviewers did not re-run
+the full test suite merely to confirm the implementer's report.
 - `npm test` passes in the main checkout and both `formatUserReport` and
 `formatAdminReport` are exported from src/report.js. The deterministic
-assertions gate this; the criteria above are about whether the *per-task
-quality review* was the mechanism that kept the code clean.
+assertions gate this; the criteria above are about whether the
+*per-task quality review* was the mechanism that kept the code clean.
 ```
 - [ ] **Step 4: Create `evals/scenarios/sdd-quality-reviewer-catches-planted-defect/setup.sh`:**