Commit Graph

627 Commits

Author SHA1 Message Date
Drew Ritter
79d83245aa Preserve hooks in Codex package manifest 2026-06-30 17:45:41 -07:00
Drew Ritter
f0fece9404 Strip hooks from Codex portal package 2026-06-30 17:32:44 -07:00
Drew Ritter
4575372ed3 docs: re-anchor Shape A examples away from Codex 2026-06-30 17:28:14 -07:00
Drew Ritter
af6104527b chore(codex): remove orphaned session-start-codex hook + refresh hook docs
hooks/session-start-codex has had no caller since "Remove Codex hooks"
(#1845) deleted hooks-codex.json and its manifest registration; the
Codex manifest now declares an empty hooks object so Codex registers no
session-start hook at all. The script is Codex-specific dead code —
nothing executes it on Codex or any other harness.

- Delete hooks/session-start-codex.
- tests/hooks/test-session-start.sh: drop the two Codex cases that are
  redundant with the generic session-start tests (nested-format and the
  legacy-warning omission are already covered by the Claude Code cases).
  Re-point the "wrapper dispatches" case to the live `session-start`
  script so run-hook.cmd dispatch coverage — used by Claude Code and
  Cursor in production — is preserved rather than lost.
- docs/porting-to-a-new-harness.md: Codex is no longer a Shape A
  (shell-hook) harness, so re-anchor that worked example to Cursor (a
  live shell-hook harness that demonstrates the same per-harness field,
  schema, and matcher variance) and mark Codex as native skill discovery
  with no session-start hook. Clears the references to the deleted
  hooks-codex.json.
- docs/windows/polyglot-hooks.md: the "check hooks-codex.json" pointer
  referenced a file deleted in #1845; re-point to hooks-cursor.json.

RELEASE-NOTES.md keeps its historical mention of hooks-codex.json (it
accurately records what that release did). The tests/codex-plugin-sync
fixtures build their own synthetic session-start-codex and test the sync
mechanism generically, so they are intentionally left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 17:28:14 -07:00
Drew Ritter
43c10985cf Fix Codex plugin category 2026-06-30 17:27:33 -07:00
Drew Ritter
8e19a0c3e6 Default Codex portal package to zip 2026-06-30 17:02:56 -07:00
Drew Ritter
6770bfbcc5 Harden Codex package script checks 2026-06-30 17:02:56 -07:00
Drew Ritter
3a1d8fe8d7 Add Codex portal package script 2026-06-30 17:02:56 -07:00
Drew Ritter
b15ef6ebbe fix(codex): suppress SessionStart hook auto-discovery with empty hooks object
Codex auto-discovers a plugin's hooks/hooks.json whenever the Codex
manifest has no `hooks` field: load_plugin_hooks falls back to a
hardcoded DEFAULT_HOOKS_CONFIG_FILE = "hooks/hooks.json" and registers
it. hooks/hooks.json is the Claude Code SessionStart hook, it is tracked
in this repo, and the Codex marketplace installs the whole repo root
(source url "./"), so the fallback re-registered the SessionStart hook
and its install-time trust prompt on Codex.

Removing the Codex hook file and the manifest `hooks` pointer (commit
"Remove Codex hooks") did not disable the hook on Codex — it removed the
explicit declaration that was overriding the fallback, so the fallback
took over and found the Claude hooks/hooks.json.

Declare an empty inline hooks object ({}) in .codex-plugin/plugin.json.
It parses as an empty inline hook set and stops Codex reaching the
auto-discovery fallback. An absent field, an empty array ([]), and an
empty inline list all collapse back to the fallback, so the value must
be exactly {}.

Update the test to assert the manifest declares hooks: {} (and that
hooks/hooks.json exists, which is what makes the declaration necessary),
replacing the prior assertion that the field was absent — which passed
while the hook was still being auto-discovered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 15:52:20 -07:00
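The fallback rule described in b15ef6ebbe can be modelled in a few lines. This is a sketch of the behaviour as the commit message states it, not Codex's actual loader: `uses_default_hooks_file` is a hypothetical name, and only the cases the message names (absent field, empty array, empty object, explicit pointer) are covered.

```python
import json

# Default path quoted in the commit message.
DEFAULT_HOOKS_CONFIG_FILE = "hooks/hooks.json"


def uses_default_hooks_file(manifest: dict) -> bool:
    """Hypothetical model: True when the loader would fall back to
    DEFAULT_HOOKS_CONFIG_FILE, per the cases the commit message lists."""
    hooks = manifest.get("hooks")
    if hooks is None:
        # Absent field: the fallback takes over.
        return True
    if isinstance(hooks, list) and not hooks:
        # Empty array: also collapses back to the fallback.
        return True
    # An object (even an empty one) or an explicit pointer is a declaration.
    return False


manifest = json.loads('{"name": "superpowers", "hooks": {}}')
print(uses_default_hooks_file(manifest))
```

The check deliberately avoids a plain truthiness test: in Python `{}` and `[]` are both falsy, so `if not hooks` would erase the distinction the commit depends on.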
Jesse Vincent
8554b7215c Release v6.1.0: leaner per-session bootstrap, Codex marketplace install, Gemini removed
Bump all manifests to 6.1.0 and add RELEASE-NOTES for v6.1.0:
- Compress the using-superpowers bootstrap and prune per-harness
  tool-mapping references (lower per-session token cost).
- Add a Codex marketplace manifest so the plugin installs from Codex;
  drop the Codex SessionStart hook.
- Remove Gemini CLI support (Google EOLed the Gemini CLI 2026-06-18).
2026-06-30 10:29:02 -07:00
Jesse Vincent
9c9b9bd7c8 test(codex): assert Codex manifest ships no hooks
Commit 1f0c76e removed the Codex SessionStart hook — dropping the hooks
field from .codex-plugin/plugin.json and deleting hooks-codex.json — but
left test-marketplace-manifest.sh asserting the old hooks pointer, so the
test has failed on dev since. Assert the field is absent instead, locking
in the no-Codex-hooks decision.
2026-06-30 10:28:53 -07:00
Jesse Vincent
98b080041d Compress the using-superpowers bootstrap
The bootstrap is injected into every session, so its token cost is paid
constantly. Condense it without dropping behavior-shaping content:

- Replace the graphviz skill-flow diagram with the prose it encoded (the
  1% rule, the plan-mode to brainstorm gate, announce + checklist to todos).
- Fold the standalone Instruction-Priority section into User Instructions.
- Drop the per-platform 'How to Access Skills' walkthrough.
- Trim the Platform Adaptation pointer to the harnesses that still have a
  reference file (Codex, Pi, Antigravity).

Keeps the full Red Flags rationalization table, skill priority framed as
process-before-implementation, and user-instruction precedence.
2026-06-24 19:35:57 -07:00
Jesse Vincent
4000288dac Prune per-harness tool-mapping boilerplate
The verbose action-to-tool tables and skill-loading explainers in the
per-harness reference files restated guidance modern agents already
follow. Trim each file to the harness-specific notes that still carry
weight (subagent dispatch, task tracking, instructions-file paths), and
delete claude-code-tools.md and copilot-tools.md, which had nothing left
that wasn't generic.
2026-06-24 19:35:20 -07:00
Jesse Vincent
6be431b772 Remove Gemini CLI support
Google EOLed the Gemini CLI on 2026-06-18; the extension can no longer
be installed or updated. Remove Gemini from the install docs, the
subagent-capable platform lists, and the eval-harness description, and
delete its tool-mapping reference.
2026-06-24 19:34:40 -07:00
Jesse Vincent
1f0c76e0b0 Remove Codex hooks
Codex reliably triggers skills on its own, and the SessionStart hook
made the UX worse rather than better. Drop the Codex hook config and
its registration in the plugin manifest.
2026-06-24 19:33:57 -07:00
Ada Sen
321c8cd24c fix(codex): stop bootstrap re-firing on resume (match Claude startup|clear|compact)
Bug: the SessionStart hook matcher in hooks-codex.json included "resume",
causing the superpowers bootstrap to re-fire on every Codex session resume.

Fix: align with Claude's hooks/hooks.json matcher "startup|clear|compact":
- drop "resume" (the bug: resume should not trigger re-bootstrap)
- add "compact" (so bootstrap re-injects after context compaction, like Claude)

Before: "matcher": "startup|resume|clear"
After:  "matcher": "startup|clear|compact"
2026-06-23 16:15:56 -07:00
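The effect of the matcher change in 321c8cd24c can be tabulated with a small sketch. It assumes the matcher is a `|`-separated list of exact session-start source names; the log does not say how Codex evaluates the string, and `matcher_fires` is a hypothetical helper.

```python
BEFORE = "startup|resume|clear"
AFTER = "startup|clear|compact"


def matcher_fires(matcher: str, source: str) -> bool:
    # Assumed semantics: fire when the source equals one of the alternatives.
    return source in matcher.split("|")


for source in ("startup", "resume", "clear", "compact"):
    print(f"{source:8} before={matcher_fires(BEFORE, source)} "
          f"after={matcher_fires(AFTER, source)}")
```

Under that assumption, `resume` stops firing the bootstrap and `compact` starts firing it, matching the two bullet points in the commit message.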
Jesse Vincent
bfa3e4137a Keep Codex hooks manifest in plugin metadata
Prompt: Jesse questioned whether the PR should remove the hooks config from the Codex plugin manifest.

Runtime investigation showed Codex accepts a committed plugin manifest with hooks and installs the plugin successfully. Removing the field changes behavior: Codex falls back to the default hooks/hooks.json, which uses the non-Codex session-start hook and CLAUDE_PLUGIN_ROOT path, instead of hooks/hooks-codex.json and the session-start-codex script.

Changes: restore .codex-plugin/plugin.json hooks to ./hooks/hooks-codex.json and update the Codex marketplace manifest test to require that Codex-specific hook pointer instead of rejecting hooks.

Validation: bash tests/codex/test-marketplace-manifest.sh; scripts/lint-shell.sh tests/codex/test-marketplace-manifest.sh; bash tests/codex-plugin-sync/test-sync-to-codex-plugin.sh; bash tests/kimi/test-plugin-manifest.sh; bash tests/shell-lint/test-lint-shell.sh.
2026-06-22 11:51:28 -07:00
Jesse Vincent
a17aaaef3a Add Codex marketplace manifest
Prompt: Jesse asked for a new worktree off the local superpowers dev branch to add the Codex manifest after diagnosing why github.com/obra/superpowers did not show installable Codex plugins.

Root cause: Codex marketplace sources expect a .agents/plugins/marketplace.json at the marketplace root. The superpowers repo only had the Claude marketplace file and the Codex plugin manifest, so Codex could configure the marketplace name but found no installable plugin entries.

Changes: add a repo-local Codex marketplace manifest for superpowers-dev that points at this same repository root via the same-root source pattern Codex already accepts; add a focused marketplace manifest test; remove the unsupported hooks field from .codex-plugin/plugin.json so the plugin validator accepts the manifest.

Validation: bash tests/codex/test-marketplace-manifest.sh; uv run --with PyYAML python /Users/jesse/.codex/skills/.system/plugin-creator/scripts/validate_plugin.py /Users/jesse/git/superpowers/superpowers/.worktrees/codex-marketplace-manifest; throwaway HOME codex plugin marketplace add/list/add; bash tests/codex-plugin-sync/test-sync-to-codex-plugin.sh; bash tests/kimi/test-plugin-manifest.sh; bash tests/shell-lint/test-lint-shell.sh; scripts/lint-shell.sh tests/codex/test-marketplace-manifest.sh.
2026-06-22 11:51:28 -07:00
Jesse Vincent
896224c4b1 Release v6.0.3: SDD artifacts move out of the .git/ protected path
Bump all plugin manifests to 6.0.3. This release moves subagent-driven-
development's scratch artifacts (task briefs, implementer reports, review
diffs, progress ledger) from .git/sdd/ — which Claude Code denies agent
writes to — into a self-ignoring working-tree .superpowers/sdd/ dir, and
bumps the brainstorm-server test harness's ws dependency to clear two
dependabot alerts. See RELEASE-NOTES.md.
v6.0.3
2026-06-18 15:44:22 -07:00
Jesse Vincent
549dee6f64 test(deps): bump ws to ^8.21.0 in brainstorm-server tests
Clears two dependabot alerts on the test harness's ws dependency:
GHSA-96hv-2xvq-fx4p (high, memory-exhaustion DoS, fixed 8.21.0) and
GHSA-58qx-3vcg-4xpx (medium, uninitialized memory disclosure, fixed
8.20.1). Test-only — the shipped brainstorm server hand-rolls its
WebSocket framing and does not depend on ws. Suite passes (57/57).
2026-06-18 15:44:22 -07:00
Jesse Vincent
4f9bd3131e docs: add v6.0.3 release notes for the SDD .git/ workspace fix 2026-06-18 15:44:22 -07:00
Jesse Vincent
caf14aac66 test(sdd): wire test-sdd-workspace.sh into the runner; note git clean -fdx
The per-worktree workspace test was added but never registered in
run-skill-tests.sh, so it only ran when invoked by hand. Add it to the
fast unit-test array alongside the other pure-shell test.

Also document, in the Durable Progress section, that the ledger now
lives in git-ignored working-tree scratch, so `git clean -fdx` deletes
it — recover from `git log` if that happens.
2026-06-18 15:44:22 -07:00
Jesse Vincent
667b2c4a2e test(sdd): lock in per-worktree workspace isolation (#1780) 2026-06-18 15:44:22 -07:00
Jesse Vincent
93b8444b51 fix(sdd): write artifacts to working-tree .superpowers/sdd, not .git/ (#1780) 2026-06-18 15:44:22 -07:00
Jesse Vincent
207a12b203 feat(sdd): add sdd-workspace helper for a self-ignoring artifact dir 2026-06-18 15:44:22 -07:00
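One common way to make a scratch directory "self-ignoring" is to drop a `.gitignore` containing a single `*` inside it. The sketch below shows that technique only. The log does not show how the real sdd-workspace helper works, so `ensure_self_ignoring_dir` and its default path argument are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def ensure_self_ignoring_dir(root: Path, rel: str = ".superpowers/sdd") -> Path:
    """Create root/rel and make git ignore everything inside it."""
    workspace = root / rel
    workspace.mkdir(parents=True, exist_ok=True)
    ignore_file = workspace / ".gitignore"
    if not ignore_file.exists():
        # A lone "*" ignores every entry in this directory,
        # including the .gitignore file itself.
        ignore_file.write_text("*\n")
    return workspace


with tempfile.TemporaryDirectory() as tmp:
    ws = ensure_self_ignoring_dir(Path(tmp))
    print(sorted(p.name for p in ws.iterdir()))
```

Because the ignore rule lives inside the directory, no change to the repository's top-level `.gitignore` is needed. As caf14aac66 notes, `git clean -fdx` still deletes such ignored scratch.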
Jesse Vincent
b62616fc12 Release v6.0.2: stop shipping the evals submodule
It broke plugin installs for some users (#1778, #1774). The eval harness
now lives in its own repo, separate from the published plugin.
v6.0.2
2026-06-16 22:42:19 -07:00
Jesse Vincent
a21956e48c Release v6.0.1: Codex fixes
- Brainstorm companion reads version from .codex-plugin/plugin.json when package.json is absent (PRI-2240)
- sync-to-codex script excludes .gitmodules and .pre-commit-config.yaml (PRI-1168)
2026-06-16 17:02:33 -07:00
Drew Ritter
29c0b1b7db fix: read Codex plugin version from manifest (PRI-2240) 2026-06-16 17:02:33 -07:00
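The fallback behind PRI-2240 might look roughly like the sketch below: prefer `package.json`, and read `.codex-plugin/plugin.json` only when `package.json` is absent. The `version` key in the Codex manifest and the function name `read_plugin_version` are assumptions; the log does not show the actual fix.

```python
import json
from pathlib import Path
from typing import Optional


def read_plugin_version(root: Path) -> Optional[str]:
    """Return the version from the first manifest that exists, else None."""
    for rel in ("package.json", ".codex-plugin/plugin.json"):
        path = root / rel
        if path.is_file():
            # Assumed: both files keep the version under a top-level "version" key.
            return json.loads(path.read_text()).get("version")
    return None
```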
Drew Ritter
cf32920d3a fix: exclude repo metadata from Codex sync (PRI-1168) 2026-06-16 17:02:33 -07:00
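The exclusion in PRI-1168 can be illustrated with a Python stand-in for the sync step. The real sync-to-codex script is not shown in this log, so `sync_tree` is a hypothetical sketch of the idea rather than a description of that script.

```python
import shutil
from pathlib import Path

# Repo metadata files named in the v6.0.1 release commit.
EXCLUDED = (".gitmodules", ".pre-commit-config.yaml")


def sync_tree(src: Path, dest: Path) -> None:
    """Copy src into dest, skipping the excluded metadata files."""
    shutil.copytree(
        src,
        dest,
        ignore=shutil.ignore_patterns(*EXCLUDED),
        dirs_exist_ok=True,  # allow syncing into an existing destination
    )
```

Note that `shutil.ignore_patterns` matches names at every directory level, so a nested `.gitmodules` would be skipped as well.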
Jesse Vincent
284be5905e Set v6.0.0 release date to 2026-06-16
v6.0.0
2026-06-16 10:09:47 -07:00
Jesse Vincent
77879bbb91 Bump evals submodule: unify per-agent bootstrap scenarios
Points evals at superpowers-evals 70a245c, which replaces the seven
per-agent *-superpowers-bootstrap scenarios with one cross-agent
superpowers-bootstrap scenario (adds the QUORUM_CODING_AGENT env var and
the bootstrap-installed dispatcher check verb).
2026-06-16 10:09:47 -07:00
Jesse Vincent
c5a965101b Bump version to 6.0.0 2026-06-16 10:09:47 -07:00
Drew Ritter
b3ee712d3a Add visual companion Prime Radiant branding 2026-06-16 10:09:47 -07:00
Jesse Vincent
9c61797773 Draft Superpowers 6 release notes 2026-06-16 10:09:47 -07:00
Jesse Vincent
b61b55013a E37: pre-flight plan review — surface plan conflicts as one batched question before Task 1 2026-06-16 10:09:47 -07:00
Jesse Vincent
be400204b3 Spec: L2b tested — opus structural win, sonnet transmission+attention gap (E35/E36); bump evals to 9919b27 2026-06-16 10:09:47 -07:00
Jesse Vincent
530476fd00 L2b: plan-mandated defects are findings the human adjudicates
Reviewer tripwire (Calibration): a plan-mandated defect IS a finding,
reported as Important and labeled plan-mandated — the plan's authorship
does not grade its own work.

Controller rule (review loop): a plan-mandated finding, or any finding
conflicting with the plan's text, escalates to the human like any plan
contradiction — never dismissed because the plan mandates it.

E35 micro (frozen 0a98 replay, sonnet reviewer, 6v6): without the
tripwire 0/6 reports give the controller anything to escalate on (all
Approved, defect endorsed as spec-required); with it 6/6 report the
defect as a labeled finding.
2026-06-16 10:09:47 -07:00
Jesse Vincent
e97faafb5a E27 stack: conditional impl tier + final-review tier pin + narration recipe + terse reviewer contract 2026-06-16 10:09:47 -07:00
Jesse Vincent
cfe48c28ac E03: cheapest-tier implementers when plan carries complete code (transcription hypothesis) 2026-06-16 10:09:47 -07:00
Jesse Vincent
8bcefb12cb Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not 2026-06-16 10:09:47 -07:00
Jesse Vincent
8e1262a3ba writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks
Claims are fidelity and variance, not dollars (full attribution in the
superpowers-evals experiment log, 2026-06-11 L1 entry):
- Global Constraints header: 0/5 -> 5/5 adoption in micro-tests, exact
  values verbatim; makes constraints mechanically propagatable to briefs
  and reviewers (a version-floor violation class shipped because they
  weren't). The one fix wave in the elicited full runs was a version-floor
  catch this header enabled.
- Per-task Interfaces blocks: 0 -> 100% of tasks, exact signatures,
  within-plan consistent; removes the controller's per-dispatch interface
  re-derivation.
- Task right-sizing: 9.4 -> 8.4 mean tasks at svelte scale (kills
  standalone Types/README micro-tasks); no effect at small scale.
- End-to-end (opus-written plan executed under SDD): guidance plan ran 1
  fix wave vs control's 2-4 (control plan shipped a real Sierpinski bug);
  execution cost equal within noise.
2026-06-16 10:09:47 -07:00
Jesse Vincent
de4672b171 Constraints block is the reviewer's attention lens: copy spec verbatim, never improvise process rules
E30 replay: the planted-DRY catch is causally determined by the
controller-composed constraints block (0/6 with process-shaped vs 5/6
with the spec's own wording). E31 micro: this recipe doubles the rate
at which composed blocks carry the spec's cross-component relationship
(6/6 vs 3/6). Affects dev and the redesign equally (E29: both 4/5).
2026-06-16 10:09:47 -07:00
Jesse Vincent
25192df30b Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance 2026-06-16 10:09:47 -07:00
Jesse Vincent
f5e8df4252 Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed) 2026-06-16 10:09:47 -07:00
Jesse Vincent
b5b3b5d99c Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead) 2026-06-16 10:09:47 -07:00
Jesse Vincent
30bbeefe89 Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first 2026-06-16 10:09:47 -07:00
Jesse Vincent
d3dd1ecc7d Record writing-plans micro-test result: resolved, no change needed 2026-06-16 10:09:47 -07:00
Jesse Vincent
b2872a4a66 Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges) 2026-06-16 10:09:47 -07:00
Jesse Vincent
e9b88d05c8 Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist 2026-06-16 10:09:47 -07:00
Jesse Vincent
4298eac856 Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget
Validated 2026-06-10 (all gates pass): go-fractals 54.1-54.7 min / $12.81-14.31
(baseline 64.9 / $16.07); svelte-todo 55.0 min / 19.3M / $14.99 (baseline
79.7 / 27.3M / $20.98); planted-defect pass $2.77. Dispatch-model discipline
3/3 runs after moving model: into the templates as a REQUIRED line.
Full experiment log: evals docs/experiments/2026-06-10-sdd-cost-experiments.md
2026-06-16 10:09:47 -07:00
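The svelte-todo figures quoted in 4298eac856 can be turned into relative reductions with a short calculation. The numbers are copied from the commit message. The log does not state the unit of the M-suffixed figure, so it is labelled generically here, and `pct_reduction` is an illustrative helper.

```python
def pct_reduction(baseline: float, tuned: float) -> float:
    """Percentage by which `tuned` is lower than `baseline`."""
    return (baseline - tuned) / baseline * 100.0


# (baseline, tuned) pairs for svelte-todo, as quoted in the commit message.
svelte_todo = {
    "minutes": (79.7, 55.0),
    "M (unit not stated)": (27.3, 19.3),
    "USD": (20.98, 14.99),
}

for label, (baseline, tuned) in svelte_todo.items():
    print(f"{label}: {pct_reduction(baseline, tuned):.1f}% lower")
```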