mirror of
https://github.com/obra/superpowers.git
synced 2026-06-12 13:49:05 +08:00
Compare commits
1 Commits
writing-sk
...
codex/shel
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
cb6bdf9dd3 |
7
.github/ISSUE_TEMPLATE/bug_report.md
vendored
7
.github/ISSUE_TEMPLATE/bug_report.md
vendored
@@ -12,17 +12,14 @@ add a comment or reaction to the existing one instead.
|
||||
|
||||
- [ ] I searched existing issues and this is not a duplicate
|
||||
|
||||
## Environment (required)
|
||||
<!-- Required. We assume an agent filed this report — tell us which one and
|
||||
where it ran. We weigh reports by what produced them. -->
|
||||
## Environment
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Superpowers version | |
|
||||
| Harness (Claude Code, Cursor, etc.) | |
|
||||
| Harness version | |
|
||||
| Your model + version | |
|
||||
| All plugins installed | |
|
||||
| Model | |
|
||||
| OS + shell | |
|
||||
|
||||
## Is this a Superpowers issue or a platform issue?
|
||||
|
||||
15
.github/ISSUE_TEMPLATE/feature_request.md
vendored
15
.github/ISSUE_TEMPLATE/feature_request.md
vendored
@@ -30,18 +30,5 @@ progress, and some were intentionally declined.
|
||||
of project? If this is specific to your domain, workflow, or a
|
||||
third-party tool, it may belong as its own plugin instead. -->
|
||||
|
||||
## Environment (required)
|
||||
<!-- Required. We assume an agent wrote this request — tell us which one and
|
||||
where it ran. We weigh proposals reasoned from documentation differently
|
||||
than ones grounded in a real session where the problem actually came up. -->
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Superpowers version | |
|
||||
| Harness (Claude Code, Cursor, etc.) | |
|
||||
| Harness version | |
|
||||
| Your model + version | |
|
||||
| All plugins installed | |
|
||||
|
||||
## Context
|
||||
<!-- Optional: the workflow where you hit this, links, transcripts. -->
|
||||
<!-- Optional: version info, harness, model, workflow where you hit this. -->
|
||||
|
||||
11
.github/ISSUE_TEMPLATE/platform_support.md
vendored
11
.github/ISSUE_TEMPLATE/platform_support.md
vendored
@@ -21,14 +21,3 @@ requested or discussed.
|
||||
## Have you tried manual installation?
|
||||
<!-- Many tools work with Superpowers through manual setup even without
|
||||
official support. Did you try? What happened? -->
|
||||
|
||||
## Environment (required)
|
||||
<!-- Required. We assume an agent wrote this request — tell us which one and
|
||||
where it ran. -->
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Harness you currently use (Claude Code, Cursor, etc.) | |
|
||||
| Harness version | |
|
||||
| Your model + version | |
|
||||
| All plugins installed | |
|
||||
|
||||
17
.github/PULL_REQUEST_TEMPLATE.md
vendored
17
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -4,23 +4,6 @@ sections blank, contain multiple unrelated changes, or show no evidence
|
||||
of human involvement will be closed without review.
|
||||
-->
|
||||
|
||||
> **This PR MUST target the `dev` branch, not `main`.** `main` is the
|
||||
> released branch; active work lands on `dev` first. PRs opened against
|
||||
> `main` will be asked to retarget `dev` before review.
|
||||
|
||||
## Who is submitting this PR? (required)
|
||||
<!-- Required. PRs that omit this will be closed. We assume an agent wrote
|
||||
this PR — tell us which one and where it ran. We weigh contributions by
|
||||
what produced them: content reasoned from documentation is held to a
|
||||
different bar than work grounded in a real session. -->
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Your model + version | |
|
||||
| Harness + version | |
|
||||
| All plugins installed | |
|
||||
| Human partner who reviewed this diff | |
|
||||
|
||||
## What problem are you trying to solve?
|
||||
<!-- Describe the specific problem you encountered. If this was a session
|
||||
issue, include: what you were doing, what went wrong, the model's
|
||||
|
||||
@@ -14,8 +14,7 @@ Before you open a PR against this repo, you MUST:
|
||||
2. **Search for existing PRs** — open AND closed — that address the same problem. If duplicates exist, STOP and tell your human partner. Do not open another duplicate.
|
||||
3. **Verify this is a real problem.** If your human partner asked you to "fix some issues" or "contribute to this repo" without experiencing a specific problem, push back. Ask them what broke, what failed, what the user experience was.
|
||||
4. **Confirm the change belongs in core.** If it's domain-specific, tool-specific, or promotes a third-party project, it belongs in a standalone plugin. Tell your human partner.
|
||||
5. **Identify yourself.** Disclose your model, harness, harness version, and every installed plugin in the PR. Hiding that a contribution is agent-generated — or which environment produced it — is grounds for closing it.
|
||||
6. **Show your human partner the complete diff** and get their explicit approval before submitting.
|
||||
5. **Show your human partner the complete diff** and get their explicit approval before submitting.
|
||||
|
||||
If any of these checks fail, do not open the PR. Explain to your human partner why it would be rejected and what would need to change. They will thank you for saving them the embarrassment.
|
||||
|
||||
@@ -27,10 +26,6 @@ If any of these checks fail, do not open the PR. Explain to your human partner w
|
||||
|
||||
**PRs that show no evidence of human involvement will be closed.** A human must review the complete proposed diff before submission.
|
||||
|
||||
**Submitters MUST identify themselves.** Every PR and issue must disclose the model, harness, harness version, and all installed plugins used to produce the contribution — or state plainly that it was written by hand with no agent. This is not optional. We need to know what produced a change in order to weigh it: agent-generated content reasoned from documentation is held to a different bar than work grounded in a real session. Contributions that hide their authoring environment will be closed.
|
||||
|
||||
**All PRs MUST target the `dev` branch, not `main`.** `main` is the released branch; active work lands on `dev` first. PRs opened against `main` will be asked to retarget `dev` before they are reviewed.
|
||||
|
||||
## What We Will Not Accept
|
||||
|
||||
### Third-party dependencies
|
||||
|
||||
2
evals
2
evals
Submodule evals updated: f8e5a9949f...e2b37138c8
@@ -97,7 +97,7 @@ if [[ -f "$PID_FILE" ]]; then
|
||||
rm -f "$PID_FILE"
|
||||
fi
|
||||
|
||||
cd "$SCRIPT_DIR"
|
||||
cd "$SCRIPT_DIR" || exit
|
||||
|
||||
# Resolve the harness PID (grandparent of this script).
|
||||
# $PPID is the ephemeral shell the harness spawned to run us — it dies
|
||||
@@ -135,7 +135,7 @@ disown "$SERVER_PID" 2>/dev/null
|
||||
echo "$SERVER_PID" > "$PID_FILE"
|
||||
|
||||
# Wait for server-started message (check log file)
|
||||
for i in {1..50}; do
|
||||
for _ in {1..50}; do
|
||||
if grep -q "server-started" "$LOG_FILE" 2>/dev/null; then
|
||||
# Verify server is still alive after a short window (catches process reapers)
|
||||
alive="true"
|
||||
|
||||
@@ -23,7 +23,7 @@ if [[ -f "$PID_FILE" ]]; then
|
||||
kill "$pid" 2>/dev/null || true
|
||||
|
||||
# Wait for graceful shutdown (up to ~2s)
|
||||
for i in {1..20}; do
|
||||
for _ in {1..20}; do
|
||||
if ! kill -0 "$pid" 2>/dev/null; then
|
||||
break
|
||||
fi
|
||||
|
||||
@@ -456,29 +456,10 @@ Different skill types need different test approaches:
|
||||
|
||||
**All of these mean: Test before deploying. No exceptions.**
|
||||
|
||||
## Match the Form to the Failure
|
||||
|
||||
Before writing guidance, classify the baseline failure. The form that bulletproofs one failure type measurably backfires on another.
|
||||
|
||||
| Baseline failure | Right form | Wrong form |
|
||||
|---|---|---|
|
||||
| Skips/violates a rule under pressure (knows better, does it anyway) | Prohibition + rationalization table + red flags (see Bulletproofing below) | Soft guidance ("prefer...", "consider...") |
|
||||
| Complies, but output has the wrong shape (bloated prompt, buried verdict, restated spec) | Positive recipe or contract: state what the output IS — its parts, in order | Prohibition list ("don't restate", "never narrate") |
|
||||
| Omits a required element from something they already produce | Structural: REQUIRED field or slot in the template they fill in | Prose reminders near the template |
|
||||
| Behavior should depend on a condition | Conditional keyed to an observable predicate ("if the brief exists, reference it") | Unconditional rule + exemption clauses |
|
||||
|
||||
**Why prohibitions backfire on shaping problems:** under a competing incentive ("make the prompt self-contained"), agents negotiate with "don't X". In head-to-head wording tests on dispatch-prompt guidance, the prohibition arm produced clearly more of the unwanted content than the recipe arm (fully separated distributions), and trended worse than even the no-guidance control — micro-test your own case rather than assuming, but never reach for the prohibition by default. A recipe leaves nothing to negotiate: the output matches the stated shape or it doesn't.
|
||||
|
||||
**Rules for whichever form you pick:**
|
||||
- **No nuance clauses.** "Don't X unless it matters" reopens the negotiation — appending a single nuance clause to a winning recipe degraded it from consistent to noisy in the same wording tests. Express a real exception as its own conditional on an observable predicate.
|
||||
- **Exemption clauses don't scope.** "This limit doesn't apply to code blocks" still suppresses code blocks. If part of the output must be exempt, restructure so the rule can't reach it.
|
||||
|
||||
## Bulletproofing Skills Against Rationalization
|
||||
|
||||
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
|
||||
|
||||
**Scope:** this toolkit is for discipline failures — an agent that knows the rule and skips it under pressure. For wrong-shaped output or omitted elements, prohibition-based bulletproofing backfires; use the forms in Match the Form to the Failure instead.
|
||||
|
||||
**Psychology note:** Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
|
||||
|
||||
### Close Every Loophole Explicitly
|
||||
@@ -572,18 +553,6 @@ Run same scenarios WITH skill. Agent should now comply.
|
||||
|
||||
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
|
||||
|
||||
### Micro-Test Wording Before Full Scenarios
|
||||
|
||||
Full pressure-scenario runs are the final gate, but they are slow and expensive per iteration. Verify the wording itself first with micro-tests:
|
||||
|
||||
1. **One fresh-context sample per call** — a raw API call, or a single-shot subagent if you don't have API access. System prompt = the realistic context the guidance will live in (the full skill or prompt template, not the guidance in isolation); user message = a task that tempts the failure.
|
||||
2. **Always include a no-guidance control.** If the control doesn't exhibit the failure, there is nothing to fix — stop, don't author the guidance.
|
||||
3. **5+ reps per variant.** Single samples lie.
|
||||
4. **Manually read every flagged match.** Score programmatically if you like, but template echoes and quoted counter-examples masquerade as hits; automated counts alone overstate both failure and success.
|
||||
5. **Variance is a metric.** When guidance lands, reps converge on the same shape. Five different interpretations across five reps means the wording isn't binding — tighten the form before adding words.
|
||||
|
||||
Micro-tests verify wording; they do not replace pressure scenarios for discipline skills.
|
||||
|
||||
**Testing methodology:** See [testing-skills-with-subagents.md](testing-skills-with-subagents.md) for the complete testing methodology:
|
||||
- How to write pressure scenarios
|
||||
- Pressure types (time, sunk cost, authority, exhaustion)
|
||||
@@ -641,8 +610,6 @@ Deploying untested skills = deploying untested code. It's a violation of quality
|
||||
- [ ] Keywords throughout for search (errors, symptoms, tools)
|
||||
- [ ] Clear overview with core principle
|
||||
- [ ] Address specific baseline failures identified in RED
|
||||
- [ ] Guidance form matches the failure type (see Match the Form to the Failure)
|
||||
- [ ] For behavior-shaping guidance: wording micro-tested against a no-guidance control (5+ reps, every flagged match read manually) — N/A for pure reference skills
|
||||
- [ ] Code inline OR link to separate file
|
||||
- [ ] One excellent example (not multi-language)
|
||||
- [ ] Run scenarios WITH skill - verify agents now comply
|
||||
|
||||
@@ -7,7 +7,8 @@ run_claude() {
|
||||
local prompt="$1"
|
||||
local timeout="${2:-60}"
|
||||
local allowed_tools="${3:-}"
|
||||
local output_file=$(mktemp)
|
||||
local output_file
|
||||
output_file="$(mktemp)"
|
||||
|
||||
# Build command as an argv array so timeout wraps claude directly.
|
||||
local cmd=(claude -p "$prompt")
|
||||
@@ -74,7 +75,8 @@ assert_count() {
|
||||
local expected="$3"
|
||||
local test_name="${4:-test}"
|
||||
|
||||
local actual=$(echo "$output" | grep -c "$pattern" || echo "0")
|
||||
local actual
|
||||
actual="$(echo "$output" | grep -c "$pattern" || true)"
|
||||
|
||||
if [ "$actual" -eq "$expected" ]; then
|
||||
echo " [PASS] $test_name (found $actual instances)"
|
||||
@@ -98,8 +100,10 @@ assert_order() {
|
||||
local test_name="${4:-test}"
|
||||
|
||||
# Get line numbers where patterns appear
|
||||
local line_a=$(echo "$output" | grep -n "$pattern_a" | head -1 | cut -d: -f1)
|
||||
local line_b=$(echo "$output" | grep -n "$pattern_b" | head -1 | cut -d: -f1)
|
||||
local line_a
|
||||
local line_b
|
||||
line_a="$(echo "$output" | grep -n "$pattern_a" | head -1 | cut -d: -f1 || true)"
|
||||
line_b="$(echo "$output" | grep -n "$pattern_b" | head -1 | cut -d: -f1 || true)"
|
||||
|
||||
if [ -z "$line_a" ]; then
|
||||
echo " [FAIL] $test_name: pattern A not found: $pattern_a"
|
||||
@@ -125,7 +129,8 @@ assert_order() {
|
||||
# Create a temporary test project directory
|
||||
# Usage: test_project=$(create_test_project)
|
||||
create_test_project() {
|
||||
local test_dir=$(mktemp -d)
|
||||
local test_dir
|
||||
test_dir="$(mktemp -d)"
|
||||
echo "$test_dir"
|
||||
}
|
||||
|
||||
|
||||
@@ -37,7 +37,10 @@ TEST_PROJECT=$(create_test_project)
|
||||
echo "Test project: $TEST_PROJECT"
|
||||
|
||||
# Trap to cleanup
|
||||
trap "cleanup_test_project $TEST_PROJECT" EXIT
|
||||
cleanup_integration_test_project() {
|
||||
cleanup_test_project "$TEST_PROJECT"
|
||||
}
|
||||
trap cleanup_integration_test_project EXIT
|
||||
|
||||
# Set up minimal Node.js project
|
||||
cd "$TEST_PROJECT"
|
||||
@@ -164,12 +167,19 @@ PLUGIN_DIR=$(cd "$SCRIPT_DIR/../.." && pwd)
|
||||
# other concurrent claude sessions.
|
||||
echo "Running Claude (plugin-dir: $PLUGIN_DIR, cwd: $TEST_PROJECT)..."
|
||||
echo "================================================================================"
|
||||
cd "$TEST_PROJECT" && timeout 1800 claude -p "$PROMPT" --plugin-dir "$PLUGIN_DIR" --allowed-tools=all --permission-mode bypassPermissions 2>&1 | tee "$OUTPUT_FILE" || {
|
||||
set +e
|
||||
(
|
||||
cd "$TEST_PROJECT" &&
|
||||
timeout 1800 claude -p "$PROMPT" --plugin-dir "$PLUGIN_DIR" --allowed-tools=all --permission-mode bypassPermissions
|
||||
) 2>&1 | tee "$OUTPUT_FILE"
|
||||
execution_status=$?
|
||||
set -e
|
||||
if [[ "$execution_status" -ne 0 ]]; then
|
||||
echo ""
|
||||
echo "================================================================================"
|
||||
echo "EXECUTION FAILED (exit code: $?)"
|
||||
echo "EXECUTION FAILED (exit code: $execution_status)"
|
||||
exit 1
|
||||
}
|
||||
fi
|
||||
echo "================================================================================"
|
||||
|
||||
echo ""
|
||||
|
||||
@@ -47,16 +47,20 @@ assert_not_contains() {
|
||||
echo "=== Worktree Path Policy Test ==="
|
||||
echo ""
|
||||
|
||||
assert_not_contains "$USING_SKILL" "~/.config/superpowers/worktrees" "using-git-worktrees does not mention old global path"
|
||||
# Intentionally search for the literal legacy path, not the current user's home.
|
||||
# shellcheck disable=SC2088
|
||||
legacy_global_worktree_path="~/.config/superpowers/worktrees"
|
||||
|
||||
assert_not_contains "$USING_SKILL" "$legacy_global_worktree_path" "using-git-worktrees does not mention old global path"
|
||||
assert_not_contains "$USING_SKILL" "global legacy" "using-git-worktrees does not use unclear global legacy shorthand"
|
||||
assert_not_contains "$USING_SKILL" "Global path" "using-git-worktrees has no global path quick-reference row"
|
||||
assert_contains "$USING_SKILL" 'default to `.worktrees/` at the project root' "using-git-worktrees defaults new manual worktrees to .worktrees/"
|
||||
|
||||
assert_not_contains "$FINISHING_SKILL" "~/.config/superpowers/worktrees" "finishing-a-development-branch does not treat old global path as owned"
|
||||
assert_not_contains "$FINISHING_SKILL" "$legacy_global_worktree_path" "finishing-a-development-branch does not treat old global path as owned"
|
||||
assert_contains "$FINISHING_SKILL" '`.worktrees/` or `worktrees/`' "finishing-a-development-branch keeps project-local cleanup ownership"
|
||||
|
||||
assert_not_contains "$ROTOTILL_SPEC" "~/.config/superpowers/worktrees" "rototill spec does not preserve old global path policy"
|
||||
assert_not_contains "$ROTOTILL_PLAN" "~/.config/superpowers/worktrees" "rototill plan does not preserve old global path policy"
|
||||
assert_not_contains "$ROTOTILL_SPEC" "$legacy_global_worktree_path" "rototill spec does not preserve old global path policy"
|
||||
assert_not_contains "$ROTOTILL_PLAN" "$legacy_global_worktree_path" "rototill plan does not preserve old global path policy"
|
||||
assert_not_contains "$ROTOTILL_PLAN" "legacy path compat" "rototill plan does not advertise legacy path compatibility"
|
||||
|
||||
echo ""
|
||||
|
||||
Reference in New Issue
Block a user