Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance

2026-07-28 21:21:37 +08:00 · 2026-06-10 21:44:23 -07:00
parent ec014e7a7f
commit de1d35e5e7
1 changed files with 15 additions and 7 deletions
--- a/docs/superpowers/specs/2026-06-10-strict-cost-sdd-design.md
+++ b/docs/superpowers/specs/2026-06-10-strict-cost-sdd-design.md
@@ -65,13 +65,21 @@ fewer, better-sized tasks, SDD still runs one fresh subagent per task.

 ### L1 — Plan-side crispness (writing-plans changes; est. −$1.5-3/run, plus variance reduction)

-**Status 2026-06-11: validated in effect.** A hand-crisped fractals plan
-(10 → 7 tasks, `## Global Constraints` header, per-task `Interfaces:`
-lines — scenario `sdd-go-fractals-crisp`) ran 3/3 green at $9.51-12.65
-(mean $11.60 vs combo band $11.67-14.84), 20-24 dispatches vs 28, fix
-waves flat. What remains is elicitation: getting writing-plans guidance
-to *produce* such plans (micro-test per the doctrine, then the follow-up
-PR). See the experiments log, Batch A-E.
+**Status 2026-06-11 (final): elicitation tested end-to-end; claims
+re-attributed.** Micro-tests: constraints header and Interfaces blocks
+elicit deterministically (0→5/5, 0→100% of tasks, exact values);
+right-sizing is modest and scale-dependent (9.4→8.4 tasks at svelte
+scale, nothing to move at fractals scale). Full runs: an elicited plan
+executed at $6.34/$8.49 — but the no-guidance control (opus plan,
+complete code) hit $7.59/$7.73, inside that range. **The cost win
+belongs to opus-written complete-code plans; the hand-written prose
+fixture plans all prior numbers used are unrepresentative and ~2×
+costlier to execute.** The guidance owns fidelity and variance instead:
+deterministic constraints propagation (the one elicited-run fix was a
+version-floor catch), exact cross-task interfaces, fix waves 1 vs 2-4
+(the control plan shipped a real Sierpinski bug both runs had to fix).
+The writing-plans PR claims those grounds, not dollars. Draft at
+/tmp/sdd-exp/writing-plans-l1 (branch writing-plans-crisp).

 The plan is upstream of every cost: task count sets dispatch count; plan
 ambiguity sets review-loop count; plan completeness sets implementer