Commit Graph

  • afcbf8bacb Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed) Jesse Vincent 2026-06-10 17:11:26 -07:00
  • fa14c8d671 Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead) Jesse Vincent 2026-06-10 16:59:43 -07:00
  • 8b76932337 Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first Jesse Vincent 2026-06-10 14:35:00 -07:00
  • 0702ec2c6f Record writing-plans micro-test result: resolved, no change needed Jesse Vincent 2026-06-10 14:31:50 -07:00
  • 85a9324a53 Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges) Jesse Vincent 2026-06-10 13:08:40 -07:00
  • 610b09874e Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist Jesse Vincent 2026-06-10 13:08:19 -07:00
  • 6df501ea5d Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget Jesse Vincent 2026-06-10 13:08:06 -07:00
  • 1585f40c8e Spec: positive-instruction redesign — audit results, micro-test method, writing-plans variants Jesse Vincent 2026-06-10 12:32:06 -07:00
  • 60c0b744b4 Shared: unique review-package collateral names Jesse Vincent 2026-06-10 09:39:21 -07:00
  • 6b3e4ad407 Add review-package script; close fix-dispatch test gap Jesse Vincent 2026-06-10 08:51:16 -07:00
  • a84bb0f52b Describe the review design as current state, not as a delta Jesse Vincent 2026-06-10 08:28:28 -07:00
  • 69d396a676 Spec: record iterations 2-3 results and final frozen-config matrix Jesse Vincent 2026-06-10 05:06:59 -07:00
  • e4457c970e Hand reviewers the diff as a file, not a paste Jesse Vincent 2026-06-10 03:44:19 -07:00
  • fac5888846 Reviewer skepticism covers the implementer's design rationales Jesse Vincent 2026-06-10 02:20:28 -07:00
  • 8ac14c0450 Make diff-pasting non-optional for task reviewer dispatch Jesse Vincent 2026-06-10 02:10:34 -07:00
  • 3ed554d557 Close the Minor-severity escape hatch Jesse Vincent 2026-06-10 02:09:10 -07:00
  • 4e8edca36e Spec: document cost iterations and the per-task review consolidation Jesse Vincent 2026-06-09 23:59:22 -07:00
  • d7726d99dc Merge per-task reviews into one task reviewer (iteration 2) Jesse Vincent 2026-06-09 23:58:28 -07:00
  • 4c1f1e5cc5 Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence Jesse Vincent 2026-06-09 22:42:54 -07:00
  • 7288393773 Add phrase-level pre-judging triggers to reviewer prompt rule Jesse Vincent 2026-06-09 21:49:51 -07:00
  • 254a8e2e32 Red Flags: never tell a reviewer what not to flag or pre-rate severity Jesse Vincent 2026-06-09 21:47:41 -07:00
  • 7c11cee649 Close three review blind spots found by defect tracing Jesse Vincent 2026-06-09 21:19:08 -07:00
  • b36cf86afd Require explicit model on subagent dispatch Jesse Vincent 2026-06-09 21:11:45 -07:00
  • 06bec17a34 Forbid controllers pre-judging reviewer findings Jesse Vincent 2026-06-09 18:28:24 -07:00
  • 236524413b Sync plan: escaped pre() pattern in Task 5 checks block Jesse Vincent 2026-06-09 18:19:00 -07:00
  • 6e019e0316 Fix plan doc: correct Task 1 grep expectation; sync Task 5 story block Jesse Vincent 2026-06-09 17:21:06 -07:00
  • d4bb8d268f Sync plan's Task 5 blocks with review fixes Jesse Vincent 2026-06-09 17:13:03 -07:00
  • d519ba65fd SDD controller: reviewer prompt budgets, ⚠️ handling, final-review pointer, model judgment Jesse Vincent 2026-06-09 16:59:05 -07:00
  • d32a56dc32 Implementer prompt: re-run covering tests after fixing review findings Jesse Vincent 2026-06-09 16:56:28 -07:00
  • 994bc26d2a Scope spec reviewer's Your Job wording to the diff Jesse Vincent 2026-06-09 16:55:28 -07:00
  • d5850df1bc Spec reviewer: judge from the diff, grounded skepticism, ⚠️ verdict channel Jesse Vincent 2026-06-09 16:53:30 -07:00
  • b5edd40d2c Use bare placeholder names in quality reviewer prompt body Jesse Vincent 2026-06-09 16:51:54 -07:00
  • 6a02446953 Make per-task quality reviewer prompt self-contained and task-scoped Jesse Vincent 2026-06-09 16:47:27 -07:00
  • 042d238b26 Add implementation plan for task-scoped review dispatch Jesse Vincent 2026-06-09 16:42:50 -07:00
  • cf81ad2ac3 Harden review-dispatch spec per adversarial review findings Jesse Vincent 2026-06-09 16:33:44 -07:00
  • cb0dbeb095 Add design spec: task-scoped review dispatch for SDD Jesse Vincent 2026-06-09 16:26:00 -07:00
  • db6077bb21 Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not (sdd-review-dispatch) Jesse Vincent 2026-06-11 13:11:32 -07:00
  • 65e702f92a writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks Jesse Vincent 2026-06-10 20:44:48 -07:00
  • 7f126acda6 Constraints block is the reviewer's attention lens: copy spec verbatim, never improvise process rules Jesse Vincent 2026-06-11 10:31:48 -07:00
  • 06f7789487 Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance Jesse Vincent 2026-06-10 21:44:23 -07:00
  • 330aba6dd6 Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed) Jesse Vincent 2026-06-10 17:11:26 -07:00
  • 5b9eb20f76 Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead) Jesse Vincent 2026-06-10 16:59:43 -07:00
  • 7e421713ac Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first Jesse Vincent 2026-06-10 14:35:00 -07:00
  • 8476908a1b Record writing-plans micro-test result: resolved, no change needed Jesse Vincent 2026-06-10 14:31:50 -07:00
  • e6118e02b9 Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges) Jesse Vincent 2026-06-10 13:08:40 -07:00
  • 9a221229a5 Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist Jesse Vincent 2026-06-10 13:08:19 -07:00
  • 7d8f0ce9e9 Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget Jesse Vincent 2026-06-10 13:08:06 -07:00
  • f37c5e5115 Spec: positive-instruction redesign — audit results, micro-test method, writing-plans variants Jesse Vincent 2026-06-10 12:32:06 -07:00
  • 618698d9b3 Shared: unique review-package collateral names Jesse Vincent 2026-06-10 09:39:21 -07:00
  • 2d6e56ee90 Add review-package script; close fix-dispatch test gap Jesse Vincent 2026-06-10 08:51:16 -07:00
  • 4a92407ae7 Describe the review design as current state, not as a delta Jesse Vincent 2026-06-10 08:28:28 -07:00
  • cc81ffe7f3 Spec: record iterations 2-3 results and final frozen-config matrix Jesse Vincent 2026-06-10 05:06:59 -07:00
  • a0dcb77596 Hand reviewers the diff as a file, not a paste Jesse Vincent 2026-06-10 03:44:19 -07:00
  • bc7d93de1a Reviewer skepticism covers the implementer's design rationales Jesse Vincent 2026-06-10 02:20:28 -07:00
  • 63a155692b Make diff-pasting non-optional for task reviewer dispatch Jesse Vincent 2026-06-10 02:10:34 -07:00
  • 4866fe8b2d Close the Minor-severity escape hatch Jesse Vincent 2026-06-10 02:09:10 -07:00
  • e45a8f2548 Spec: document cost iterations and the per-task review consolidation Jesse Vincent 2026-06-09 23:59:22 -07:00
  • fc75b0b3b4 Merge per-task reviews into one task reviewer (iteration 2) Jesse Vincent 2026-06-09 23:58:28 -07:00
  • da0a11f6d4 Cut review-cost drivers: turn-aware models, inline diffs, scoped evidence Jesse Vincent 2026-06-09 22:42:54 -07:00
  • b42846401f Add phrase-level pre-judging triggers to reviewer prompt rule Jesse Vincent 2026-06-09 21:49:51 -07:00
  • c087105ff3 Red Flags: never tell a reviewer what not to flag or pre-rate severity Jesse Vincent 2026-06-09 21:47:41 -07:00
  • 29e5842917 Close three review blind spots found by defect tracing Jesse Vincent 2026-06-09 21:19:08 -07:00
  • 1d94bc939d Require explicit model on subagent dispatch Jesse Vincent 2026-06-09 21:11:45 -07:00
  • 833ec4177e Forbid controllers pre-judging reviewer findings Jesse Vincent 2026-06-09 18:28:24 -07:00
  • c4abda336c Sync plan: escaped pre() pattern in Task 5 checks block Jesse Vincent 2026-06-09 18:19:00 -07:00
  • c874cf0cb3 Fix plan doc: correct Task 1 grep expectation; sync Task 5 story block Jesse Vincent 2026-06-09 17:21:06 -07:00
  • 08a2e7eed3 Sync plan's Task 5 blocks with review fixes Jesse Vincent 2026-06-09 17:13:03 -07:00
  • 077dd192a7 SDD controller: reviewer prompt budgets, ⚠️ handling, final-review pointer, model judgment Jesse Vincent 2026-06-09 16:59:05 -07:00
  • 441d22a2c0 Implementer prompt: re-run covering tests after fixing review findings Jesse Vincent 2026-06-09 16:56:28 -07:00
  • efcaa40f1f Scope spec reviewer's Your Job wording to the diff Jesse Vincent 2026-06-09 16:55:28 -07:00
  • 622a3887f3 Spec reviewer: judge from the diff, grounded skepticism, ⚠️ verdict channel Jesse Vincent 2026-06-09 16:53:30 -07:00
  • d3d6800b07 Use bare placeholder names in quality reviewer prompt body Jesse Vincent 2026-06-09 16:51:54 -07:00
  • 246b493db4 Make per-task quality reviewer prompt self-contained and task-scoped Jesse Vincent 2026-06-09 16:47:27 -07:00
  • 7dc323c28b Add implementation plan for task-scoped review dispatch Jesse Vincent 2026-06-09 16:42:50 -07:00
  • 55938589d3 Harden review-dispatch spec per adversarial review findings Jesse Vincent 2026-06-09 16:33:44 -07:00
  • 450b02a11b Add design spec: task-scoped review dispatch for SDD Jesse Vincent 2026-06-09 16:26:00 -07:00
  • 85a635a6f1 Job posting hiring Jesse Vincent 2026-06-15 11:46:19 -07:00
  • 9eb452afe7 chore: bump evals submodule to claude transcript-capture fix Drew Ritter 2026-06-13 15:15:46 -07:00
  • 16856963f2 chore: bump evals submodule to claude transcript-capture fix (bump/evals-claude-transcript-capture) Drew Ritter 2026-06-13 15:15:46 -07:00
  • 9d2b0e971d writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks (writing-plans-crisp) Jesse Vincent 2026-06-10 20:44:48 -07:00
  • 93f2ce91b8 Fix companion stop metadata and token permissions Drew Ritter 2026-06-11 10:25:19 -07:00
  • e9ee6c5b4d Harden Windows browser launcher Drew Ritter 2026-06-10 20:33:56 -07:00
  • 5415cb8ccf Fix Windows lifecycle validation Drew Ritter 2026-06-10 20:09:55 -07:00
  • 1c21a91e01 Align visual companion docs with shipped scope Drew Ritter 2026-06-10 19:41:28 -07:00
  • 441335ee3e Fix companion test cleanup and argv assertions Drew Ritter 2026-06-10 19:37:30 -07:00
  • 377192f7a1 Harden companion platform tests Drew Ritter 2026-06-10 19:26:53 -07:00
  • 5eea0d09d7 Fix companion lifecycle test ownership metadata Drew Ritter 2026-06-10 19:12:17 -07:00
  • a6a4cd85b9 Harden companion stop ownership proof Drew Ritter 2026-06-10 18:49:38 -07:00
  • 8034176801 Isolate companion fallback tokens Drew Ritter 2026-06-10 18:39:37 -07:00
  • 2bab677ba7 Fix server test fallback cleanup Drew Ritter 2026-06-10 18:33:38 -07:00
  • c4cde1eed9 Harden root screen containment Drew Ritter 2026-06-10 18:25:03 -07:00
  • 5f3b317741 Plan visual companion final hardening fixup Drew Ritter 2026-06-10 18:19:31 -07:00
  • 7bb6af2f67 Tighten visual companion hardening spec Drew Ritter 2026-06-10 18:13:18 -07:00
  • 4f88b89c75 Document visual companion final hardening fixup Drew Ritter 2026-06-10 18:05:55 -07:00
  • c7d7e3550f Harden companion Windows lifecycle coverage Drew Ritter 2026-06-10 16:23:13 -07:00
  • a2e67bbd9b Harden brainstorm companion auth regressions Drew Ritter 2026-06-10 14:58:16 -07:00
  • fe812c418f Document visual companion auth hardening plan Drew Ritter 2026-06-10 14:14:15 -07:00
  • f4d1788ffb fix(brainstorm-server): fix auth-integration bugs from full-branch review Jesse Vincent 2026-06-09 19:13:52 -07:00
  • 4341c3f4d5 test(brainstorm-server): thread session key through tests after auth merge Jesse Vincent 2026-06-09 18:33:00 -07:00
  • c64c4ea6f4 feat(brainstorm-server): gate every endpoint behind a per-session key Jesse Vincent 2026-06-09 12:22:53 -07:00