mirror of
https://github.com/obra/superpowers.git
synced 2026-06-10 20:59:05 +08:00
Forbid controllers pre-judging reviewer findings
A live eval run of sdd-quality-reviewer-catches-planted-defect caught the SDD controller fabricating a plan constraint and instructing the quality reviewer not to flag the planted DRY violation. The duplication shipped. Constructing Reviewer Prompts now bans suppression directives alongside open-ended broadening directives.
This commit is contained in:
@@ -65,7 +65,7 @@ Stop delegating to `requesting-code-review/code-reviewer.md`. The per-task quali
|
||||
### 3. `skills/subagent-driven-development/SKILL.md` controller changes
|
||||
|
||||
- **Model Selection:** replace "Architecture, design, and review tasks: use the most capable available model" with judgment guidance — pick reviewer models the way implementer models are picked, scaled to the diff's size, complexity, and risk. The "Task complexity signals" list is rescoped to make clear its bullets describe implementation tasks; reviewer model choice follows the same judgment, so a narrow diff review does not automatically map to "broad codebase understanding → most capable model."
|
||||
- **Reviewer prompt construction** (new guidance near Red Flags): when dispatching reviewers, do not write open-ended directives ("check all uses," "run race tests if useful") without a concrete task-specific reason; do not ask reviewers to re-run tests the implementer already ran on the same code; per-task reviews are task-scoped gates — the broad review happens once, at the final whole-branch review.
|
||||
- **Reviewer prompt construction** (new guidance near Red Flags): when dispatching reviewers, do not write open-ended directives ("check all uses," "run race tests if useful") without a concrete task-specific reason; do not ask reviewers to re-run tests the implementer already ran on the same code; do not pre-judge findings for the reviewer (never instruct a reviewer to ignore or not flag a specific issue — adjudicate suspected false positives in the review loop instead); per-task reviews are task-scoped gates — the broad review happens once, at the final whole-branch review. (The pre-judging rule was added after a live eval run caught the controller fabricating a "the plan forbids a shared helper" claim and instructing the quality reviewer not to flag a planted DRY violation.)
|
||||
- **Handling spec-reviewer ⚠️ items** (new guidance, alongside Handling Implementer Status): the controller resolves each "cannot verify from diff" item itself before marking the task complete; confirmed gaps go back to the implementer as failed spec review.
|
||||
- **Final review stays broad, explicitly:** the final whole-branch reviewer dispatch node gains an explicit pointer to `../requesting-code-review/code-reviewer.md`. (Today that template is reachable only through the per-task quality prompt's delegation; once that delegation is removed, an unreferenced final-review template would be orphaned.) The Integration section's note that `superpowers:requesting-code-review` provides "the code review template for reviewer subagents" is corrected to apply to the final review only.
|
||||
- **Example workflow:** the quality-reviewer lines in the example are updated to the new verdict vocabulary ("Task quality: Approved"); the final reviewer's "ready to merge" line stays.
|
||||
|
||||
@@ -141,6 +141,10 @@ final whole-branch review. When you fill a reviewer template:
|
||||
if useful" without a concrete, task-specific reason
|
||||
- Do not ask a reviewer to re-run tests the implementer already ran on the
|
||||
same code — the implementer's report carries the test evidence
|
||||
- Do not pre-judge findings for the reviewer — never instruct a reviewer to
|
||||
ignore or not flag a specific issue. If you believe a finding would be a
|
||||
false positive, let the reviewer raise it and adjudicate it in the review
|
||||
loop.
|
||||
|
||||
## Prompt Templates
|
||||
|
||||
|
||||
Reference in New Issue
Block a user