mirror of
https://github.com/obra/superpowers.git
synced 2026-06-12 21:59:04 +08:00
Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not
@@ -118,7 +118,25 @@ target economics and ambiguity, not placeholder hygiene.

 ### L2 — Controller tier (est. −$4-5/run; the biggest single lever, gated hardest)

-**Status 2026-06-11: recon positive (n=2), gates still owed.**
+**Status 2026-06-11 (final): DIED AT THE GATES, as pre-registered — with
+useful anatomy.** Recon was positive ($6.68/$8.05, n=2, mechanics clean).
+The full battery split the judgment surface: the new
+`sdd-escalates-broken-plan` scenario (explicit plan self-contradiction;
+the human never volunteers it) passed **5/5 at sonnet** ($1.02-1.37/run;
+opus baseline 2/2) — explicit conflicts get escalated. But the
+planted-defect battery failed decisively: under a sonnet controller the
+per-task quality gate collapsed into plan-compliance advocacy ("no
+assertion, as required" listed under Strengths), the defect shipped in
+4/5 runs (deterministic check), and only the tier-pinned opus final
+reviewer ever caught it — while the same sonnet-tier reviewers under an
+opus controller flagged it 5/5. Cheap controllers handle explicit
+escalation; they absorb implicit authority-vs-quality adjudication.
+A possible L2b (discrete rule: "a reviewer finding that conflicts with
+the plan's text is the human's decision — escalate it") would route the
+failing judgment through the escalation behavior that held; untested.
+Original recon notes follow.
+
+**Recon (superseded):**
 Sonnet-controller runs (claude-sonnet coding-agent): all gates green at
 **$6.68 and $8.05** / 31-41 min (combo band $11.67-14.84), tokens inside
 the combo band — no cheap-controller turn inflation. 26/26 and 31/31