superpowers

mirror of https://github.com/obra/superpowers.git synced 2026-07-01 23:19:04 +08:00

Author	SHA1	Message	Date
Drew Ritter	ded04a41e3	fix(codex): suppress SessionStart hook auto-discovery with empty hooks object Codex auto-discovers a plugin's hooks/hooks.json whenever the Codex manifest has no `hooks` field: load_plugin_hooks falls back to a hardcoded DEFAULT_HOOKS_CONFIG_FILE = "hooks/hooks.json" and registers it. hooks/hooks.json is the Claude Code SessionStart hook, it is tracked in this repo, and the Codex marketplace installs the whole repo root (source url "./"), so the fallback re-registered the SessionStart hook and its install-time trust prompt on Codex. Removing the Codex hook file and the manifest `hooks` pointer (commit "Remove Codex hooks") did not disable the hook on Codex — it removed the explicit declaration that was overriding the fallback, so the fallback took over and found the Claude hooks/hooks.json. Declare an empty inline hooks object ({}) in .codex-plugin/plugin.json. It parses as an empty inline hook set and stops Codex reaching the auto-discovery fallback. An absent field, an empty array ([]), and an empty inline list all collapse back to the fallback, so the value must be exactly {}. Update the test to assert the manifest declares hooks: {} (and that hooks/hooks.json exists, which is what makes the declaration necessary), replacing the prior assertion that the field was absent — which passed while the hook was still being auto-discovered. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-30 15:38:20 -07:00
Jesse Vincent	8554b7215c	Release v6.1.0: leaner per-session bootstrap, Codex marketplace install, Gemini removed Bump all manifests to 6.1.0 and add RELEASE-NOTES for v6.1.0: - Compress the using-superpowers bootstrap and prune per-harness tool-mapping references (lower per-session token cost). - Add a Codex marketplace manifest so the plugin installs from Codex; drop the Codex SessionStart hook. - Remove Gemini CLI support (Google EOLed the Gemini CLI 2026-06-18).	2026-06-30 10:29:02 -07:00
Jesse Vincent	9c9b9bd7c8	test(codex): assert Codex manifest ships no hooks Commit `1f0c76e` removed the Codex SessionStart hook — dropping the hooks field from .codex-plugin/plugin.json and deleting hooks-codex.json — but left test-marketplace-manifest.sh asserting the old hooks pointer, so the test has failed on dev since. Assert the field is absent instead, locking in the no-Codex-hooks decision.	2026-06-30 10:28:53 -07:00
Jesse Vincent	98b080041d	Compress the using-superpowers bootstrap The bootstrap is injected into every session, so its token cost is paid constantly. Condense it without dropping behavior-shaping content: - Replace the graphviz skill-flow diagram with the prose it encoded (the 1% rule, the plan-mode to brainstorm gate, announce + checklist to todos). - Fold the standalone Instruction-Priority section into User Instructions. - Drop the per-platform 'How to Access Skills' walkthrough. - Trim the Platform Adaptation pointer to the harnesses that still have a reference file (Codex, Pi, Antigravity). Keeps the full Red Flags rationalization table, skill priority framed as process-before-implementation, and user-instruction precedence.	2026-06-24 19:35:57 -07:00
Jesse Vincent	4000288dac	Prune per-harness tool-mapping boilerplate The verbose action-to-tool tables and skill-loading explainers in the per-harness reference files restated guidance modern agents already follow. Trim each file to the harness-specific notes that still carry weight (subagent dispatch, task tracking, instructions-file paths), and delete claude-code-tools.md and copilot-tools.md, which had nothing left that wasn't generic.	2026-06-24 19:35:20 -07:00
Jesse Vincent	6be431b772	Remove Gemini CLI support Google EOLed the Gemini CLI on 2026-06-18; the extension can no longer be installed or updated. Remove Gemini from the install docs, the subagent-capable platform lists, and the eval-harness description, and delete its tool-mapping reference.	2026-06-24 19:34:40 -07:00
Jesse Vincent	1f0c76e0b0	Remove Codex hooks Codex reliably triggers skills on its own, and the SessionStart hook made the UX worse rather than better. Drop the Codex hook config and its registration in the plugin manifest.	2026-06-24 19:33:57 -07:00
Ada Sen	321c8cd24c	fix(codex): stop bootstrap re-firing on resume (match Claude startup|clear|compact) Bug: the SessionStart hook matcher in hooks-codex.json included "resume", causing the superpowers bootstrap to re-fire on every Codex session resume. Fix: align with Claude's hooks/hooks.json matcher "startup|clear|compact": - drop "resume" (the bug: resume should not trigger re-bootstrap) - add "compact" (so bootstrap re-injects after context compaction, like Claude) Before: "matcher": "startup|resume|clear" After: "matcher": "startup|clear|compact"	2026-06-23 16:15:56 -07:00
Jesse Vincent	bfa3e4137a	Keep Codex hooks manifest in plugin metadata Prompt: Jesse questioned whether the PR should remove the hooks config from the Codex plugin manifest. Runtime investigation showed Codex accepts a committed plugin manifest with hooks and installs the plugin successfully. Removing the field changes behavior: Codex falls back to the default hooks/hooks.json, which uses the non-Codex session-start hook and CLAUDE_PLUGIN_ROOT path, instead of hooks/hooks-codex.json and the session-start-codex script. Changes: restore .codex-plugin/plugin.json hooks to ./hooks/hooks-codex.json and update the Codex marketplace manifest test to require that Codex-specific hook pointer instead of rejecting hooks. Validation: bash tests/codex/test-marketplace-manifest.sh; scripts/lint-shell.sh tests/codex/test-marketplace-manifest.sh; bash tests/codex-plugin-sync/test-sync-to-codex-plugin.sh; bash tests/kimi/test-plugin-manifest.sh; bash tests/shell-lint/test-lint-shell.sh.	2026-06-22 11:51:28 -07:00
Jesse Vincent	a17aaaef3a	Add Codex marketplace manifest Prompt: Jesse asked for a new worktree off the local superpowers dev branch to add the Codex manifest after diagnosing why github.com/obra/superpowers did not show installable Codex plugins. Root cause: Codex marketplace sources expect a .agents/plugins/marketplace.json at the marketplace root. The superpowers repo only had the Claude marketplace file and the Codex plugin manifest, so Codex could configure the marketplace name but found no installable plugin entries. Changes: add a repo-local Codex marketplace manifest for superpowers-dev that points at this same repository root via the same-root source pattern Codex already accepts; add a focused marketplace manifest test; remove the unsupported hooks field from .codex-plugin/plugin.json so the plugin validator accepts the manifest. Validation: bash tests/codex/test-marketplace-manifest.sh; uv run --with PyYAML python /Users/jesse/.codex/skills/.system/plugin-creator/scripts/validate_plugin.py /Users/jesse/git/superpowers/superpowers/.worktrees/codex-marketplace-manifest; throwaway HOME codex plugin marketplace add/list/add; bash tests/codex-plugin-sync/test-sync-to-codex-plugin.sh; bash tests/kimi/test-plugin-manifest.sh; bash tests/shell-lint/test-lint-shell.sh; scripts/lint-shell.sh tests/codex/test-marketplace-manifest.sh.	2026-06-22 11:51:28 -07:00
Jesse Vincent	896224c4b1	Release v6.0.3: SDD artifacts move out of the .git/ protected path Bump all plugin manifests to 6.0.3. This release moves subagent-driven-development's scratch artifacts (task briefs, implementer reports, review diffs, progress ledger) from .git/sdd/ — which Claude Code denies agent writes to — into a self-ignoring working-tree .superpowers/sdd/ dir, and bumps the brainstorm-server test harness's ws dependency to clear two dependabot alerts. See RELEASE-NOTES.md. v6.0.3	2026-06-18 15:44:22 -07:00
Jesse Vincent	549dee6f64	test(deps): bump ws to ^8.21.0 in brainstorm-server tests Clears two dependabot alerts on the test harness's ws dependency: GHSA-96hv-2xvq-fx4p (high, memory-exhaustion DoS, fixed 8.21.0) and GHSA-58qx-3vcg-4xpx (medium, uninitialized memory disclosure, fixed 8.20.1). Test-only — the shipped brainstorm server hand-rolls its WebSocket framing and does not depend on ws. Suite passes (57/57).	2026-06-18 15:44:22 -07:00
Jesse Vincent	4f9bd3131e	docs: add v6.0.3 release notes for the SDD .git/ workspace fix	2026-06-18 15:44:22 -07:00
Jesse Vincent	caf14aac66	test(sdd): wire test-sdd-workspace.sh into the runner; note git clean -fdx The per-worktree workspace test was added but never registered in run-skill-tests.sh, so it only ran when invoked by hand. Add it to the fast unit-test array alongside the other pure-shell test. Also document, in the Durable Progress section, that the ledger now lives in git-ignored working-tree scratch, so `git clean -fdx` deletes it — recover from `git log` if that happens.	2026-06-18 15:44:22 -07:00
Jesse Vincent	667b2c4a2e	test(sdd): lock in per-worktree workspace isolation (#1780)	2026-06-18 15:44:22 -07:00
Jesse Vincent	93b8444b51	fix(sdd): write artifacts to working-tree .superpowers/sdd, not .git/ (#1780)	2026-06-18 15:44:22 -07:00
Jesse Vincent	207a12b203	feat(sdd): add sdd-workspace helper for a self-ignoring artifact dir	2026-06-18 15:44:22 -07:00
Jesse Vincent	b62616fc12	Release v6.0.2: stop shipping the evals submodule It broke plugin installs for some users (#1778, #1774). The eval harness now lives in its own repo, separate from the published plugin. v6.0.2	2026-06-16 22:42:19 -07:00
Jesse Vincent	a21956e48c	Release v6.0.1: Codex fixes - Brainstorm companion reads version from .codex-plugin/plugin.json when package.json is absent (PRI-2240) - sync-to-codex script excludes .gitmodules and .pre-commit-config.yaml (PRI-1168)	2026-06-16 17:02:33 -07:00
Drew Ritter	29c0b1b7db	fix: read Codex plugin version from manifest (PRI-2240)	2026-06-16 17:02:33 -07:00
Drew Ritter	cf32920d3a	fix: exclude repo metadata from Codex sync (PRI-1168)	2026-06-16 17:02:33 -07:00
Jesse Vincent	284be5905e	Set v6.0.0 release date to 2026-06-16 v6.0.0	2026-06-16 10:09:47 -07:00
Jesse Vincent	77879bbb91	Bump evals submodule: unify per-agent bootstrap scenarios Points evals at superpowers-evals 70a245c, which replaces the seven per-agent *-superpowers-bootstrap scenarios with one cross-agent superpowers-bootstrap scenario (adds the QUORUM_CODING_AGENT env var and the bootstrap-installed dispatcher check verb).	2026-06-16 10:09:47 -07:00
Jesse Vincent	c5a965101b	Bump version to 6.0.0	2026-06-16 10:09:47 -07:00
Drew Ritter	b3ee712d3a	Add visual companion Prime Radiant branding	2026-06-16 10:09:47 -07:00
Jesse Vincent	9c61797773	Draft Superpowers 6 release notes	2026-06-16 10:09:47 -07:00
Jesse Vincent	b61b55013a	E37: pre-flight plan review — surface plan conflicts as one batched question before Task 1	2026-06-16 10:09:47 -07:00
Jesse Vincent	be400204b3	Spec: L2b tested — opus structural win, sonnet transmission+attention gap (E35/E36); bump evals to 9919b27	2026-06-16 10:09:47 -07:00
Jesse Vincent	530476fd00	L2b: plan-mandated defects are findings the human adjudicates Reviewer tripwire (Calibration): a plan-mandated defect IS a finding, reported as Important and labeled plan-mandated — the plan's authorship does not grade its own work. Controller rule (review loop): a plan-mandated finding, or any finding conflicting with the plan's text, escalates to the human like any plan contradiction — never dismissed because the plan mandates it. E35 micro (frozen 0a98 replay, sonnet reviewer, 6v6): without the tripwire 0/6 reports give the controller anything to escalate on (all Approved, defect endorsed as spec-required); with it 6/6 report the defect as a labeled finding.	2026-06-16 10:09:47 -07:00
Jesse Vincent	e97faafb5a	E27 stack: conditional impl tier + final-review tier pin + narration recipe + terse reviewer contract	2026-06-16 10:09:47 -07:00
Jesse Vincent	cfe48c28ac	E03: cheapest-tier implementers when plan carries complete code (transcription hypothesis)	2026-06-16 10:09:47 -07:00
Jesse Vincent	8bcefb12cb	Strict-cost spec: L2 final — died at gates; explicit escalation holds at sonnet, implicit adjudication does not	2026-06-16 10:09:47 -07:00
Jesse Vincent	8e1262a3ba	writing-plans: task right-sizing, Global Constraints header, per-task Interfaces blocks Claims are fidelity and variance, not dollars (full attribution in the superpowers-evals experiment log, 2026-06-11 L1 entry): - Global Constraints header: 0/5 -> 5/5 adoption in micro-tests, exact values verbatim; makes constraints mechanically propagatable to briefs and reviewers (a version-floor violation class shipped because they weren't). The one fix wave in the elicited full runs was a version-floor catch this header enabled. - Per-task Interfaces blocks: 0 -> 100% of tasks, exact signatures, within-plan consistent; removes the controller's per-dispatch interface re-derivation. - Task right-sizing: 9.4 -> 8.4 mean tasks at svelte scale (kills standalone Types/README micro-tasks); no effect at small scale. - End-to-end (opus-written plan executed under SDD): guidance plan ran 1 fix wave vs control's 2-4 (control plan shipped a real Sierpinski bug); execution cost equal within noise.	2026-06-16 10:09:47 -07:00
Jesse Vincent	de4672b171	Constraints block is the reviewer's attention lens: copy spec verbatim, never improvise process rules E30 replay: the planted-DRY catch is causally determined by the controller-composed constraints block (0/6 with process-shaped vs 5/6 with the spec's own wording). E31 micro: this recipe doubles the rate at which composed blocks carry the spec's cross-component relationship (6/6 vs 3/6). Affects dev and the redesign equally (E29: both 4/5).	2026-06-16 10:09:47 -07:00
Jesse Vincent	25192df30b	Strict-cost spec: L1 final — cost win re-attributed to complete-code plans; guidance owns fidelity/variance	2026-06-16 10:09:47 -07:00
Jesse Vincent	f5e8df4252	Strict-cost spec: L2 recon n=2 (sonnet controller $6.68/$8.05, judgment clean, escalation points unstressed)	2026-06-16 10:09:47 -07:00
Jesse Vincent	b5b3b5d99c	Strict-cost spec: record batch A-E rung verdicts (L1 validated, L2 recon positive, L3 dead)	2026-06-16 10:09:47 -07:00
Jesse Vincent	30bbeefe89	Spec: strict-cost SDD experiment ladder — judgment as co-invariant, plan-side crispness first	2026-06-16 10:09:47 -07:00
Jesse Vincent	d3dd1ecc7d	Record writing-plans micro-test result: resolved, no change needed	2026-06-16 10:09:47 -07:00
Jesse Vincent	b2872a4a66	Spec: record iterations 4-5 (variance honesty, structural fixes, final validated ranges)	2026-06-16 10:09:47 -07:00
Jesse Vincent	e9b88d05c8	Adopt audited positive phrasings: evidence rule leads positive; fix-report completeness as checklist	2026-06-16 10:09:47 -07:00
Jesse Vincent	4298eac856	Land eval-tuned combo: file handoffs, progress ledger, final-review package, REQUIRED model lines, reviewer risk budget Validated 2026-06-10 (all gates pass): go-fractals 54.1-54.7 min / $12.81-14.31 (baseline 64.9 / $16.07); svelte-todo 55.0 min / 19.3M / $14.99 (baseline 79.7 / 27.3M / $20.98); planted-defect pass $2.77. Dispatch-model discipline 3/3 runs after moving model: into the templates as a REQUIRED line. Full experiment log: evals docs/experiments/2026-06-10-sdd-cost-experiments.md	2026-06-16 10:09:47 -07:00
Jesse Vincent	69a00350ff	Spec: positive-instruction redesign — audit results, micro-test method, writing-plans variants	2026-06-16 10:09:47 -07:00
Jesse Vincent	d7a8c07fe3	Shared: unique review-package collateral names	2026-06-16 10:09:47 -07:00
Jesse Vincent	c30d822efe	Add review-package script; close fix-dispatch test gap scripts/review-package generates the reviewer's input deterministically: commit list, stat summary, and net diff with -U10 context, written to a file from an explicit BASE. Live runs showed controllers improvising 'git diff HEAD~1..HEAD', which silently truncates multi-commit tasks, and svelte's five fix dispatches shipped without re-running any tests — fix dispatches now explicitly carry the implementer's re-run-and-report contract.	2026-06-16 10:09:47 -07:00
Jesse Vincent	68c9ddb870	Describe the review design as current state, not as a delta The skill read as a changelog: 'combined task review,' 'one reviewer, one reading,' 'one dispatch,' and an example still showing diffs pasted into prompts. A reader who never saw the two-reviewer design has no referent for 'combined.' Prose now states the design directly, and the flowchart/example reflect the diff-file handoff.	2026-06-16 10:09:47 -07:00
Jesse Vincent	aa80399355	Spec: record iterations 2-3 results and final frozen-config matrix	2026-06-16 10:09:47 -07:00
Jesse Vincent	ee656563c9	Hand reviewers the diff as a file, not a paste Paste adoption stayed at 0/15 even as a Red Flag — and the controller's reluctance is locally rational: pasting loads the diff into the (most expensive) controller context permanently, while a reviewer self-fetch costs a few cheap turns. The diff-file handoff is cheap for both sides: the controller redirects git diff to /tmp without reading it, and the reviewer gets the whole change in one Read call.	2026-06-16 10:09:47 -07:00
Jesse Vincent	3280a32259	Reviewer skepticism covers the implementer's design rationales Fourth planted-defect failure mode: the implementer's self-report said 'noted mild structural duplication; left unabstracted per YAGNI' and the reviewer deferred to that framing, rating the duplication no finding at all. The pre-judging keeps relocating — controller prompt, then reviewer calibration, now the implementer's report. Rationales are claims; they never downgrade severity.	2026-06-16 10:09:47 -07:00
Jesse Vincent	84d033e967	Make diff-pasting non-optional for task reviewer dispatch Adoption was 6/11 reviews on fractals and 0/17 on svelte when phrased as guidance; reviewers without the diff re-derive it by hand, which is the single largest remaining reviewer cost. Now a Red Flags Never entry and a REQUIRED marker on the template placeholder.	2026-06-16 10:09:47 -07:00


619 Commits