fix(skills): brainstorming gate exempts nothing-to-design requests; description exceptions are authoritative (SUP-333 C)

Consolidates the brainstorming exception with its routing-layer semantics, so this PR is independently mergeable (previously split across two stacked PRs whose intermediate state left the always- injected routing text contradicting the shipped description). brainstorming: the nothing-to-design exception, earned by a tripwire scan stated in one line before acting. Tripwires precede the permission (skimmers stop at "implement directly"); security-posture touches re-gate even with the exact value stated; requested deletions re-gate; rationalization table per writing-skills bulletproofing. Description 971/1024 chars, YAML-validated. using-superpowers: description-level exceptions are authoritative (compliance, not rationalization); doubt means invoke; only the description can define one; the skip must state its scan; flowchart routes the exempt path through the scan statement; <EXTREMELY-IMPORTANT> defers in one parenthetical. writing-skills: negative triggering conditions are scope (allowed, required at the description) vs workflow summaries (still forbidden) — prevents a future checklist pass from stripping the exception. Eval evidence (quorum): RED cost-checkbox-over-trigger failed 5/6 agents (pi ⊘); GREEN claude 3/3, codex ✓, antigravity ✓ (kimi unchanged from baseline — does not read description exceptions); gate-still-fires: brainstorming-resists 2/2 + codex, spec-plan brainstorm leg 3/3. Boundary scenarios (security one-liner, requested deletion): pre-stack dev baseline 0/3 + 0/3 (silent edit every time — the blanket gate never fired on one-liners); this text 2/3 + 2/3, the first text in the corpus to catch these at any rate; scenarios ship as regression instruments (proposed in prime-radiant-inc/superpowers-evals#11, open). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Canary-caught addition: on the assembled text, triggering-writing-plans went 0/3 with claude citing "your explicit instruction wins per the priority rules" to skip writing-plans under the scenario's "don't ask me any questions" pressure — the Instruction Priority section read as licensing ad-hoc pressure to skip workflow steps. User Instructions now distinguishes pressure phrasing (changes interaction style) from instructions that name what to skip (honored), and tags the quoted rationalization.
chore(evals): bump submodule for Claude Haiku target
2026-06-12 13:49:05 +08:00 · 2026-06-11 00:36:41 -07:00 · 2026-06-10 16:31:16 -07:00 · 2026-06-08 22:14:34 -07:00 · 2026-06-02 22:46:00 -07:00 · 2026-06-02 16:27:35 -07:00
14 changed files with 100 additions and 46 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -12,14 +12,17 @@ add a comment or reaction to the existing one instead.
 - [ ] I searched existing issues and this is not a duplicate
-## Environment
+## Environment (required)
 <!-- Required. We assume an agent filed this report — tell us which one and
     where it ran. We weigh reports by what produced them. -->
 | Field | Value |
 |-------|-------|
 | Superpowers version | |
 | Harness (Claude Code, Cursor, etc.) | |
 | Harness version | |
-| Model | |
+| Your model + version | |
 | All plugins installed | |
 | OS + shell | |
 ## Is this a Superpowers issue or a platform issue?
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -30,5 +30,18 @@ progress, and some were intentionally declined.
     of project? If this is specific to your domain, workflow, or a
     third-party tool, it may belong as its own plugin instead. -->
 ## Environment (required)
 <!-- Required. We assume an agent wrote this request — tell us which one and
     where it ran. We weigh proposals reasoned from documentation differently
     than ones grounded in a real session where the problem actually came up. -->
 | Field | Value |
 |-------|-------|
 | Superpowers version | |
 | Harness (Claude Code, Cursor, etc.) | |
 | Harness version | |
 | Your model + version | |
 | All plugins installed | |
 ## Context
-<!-- Optional: version info, harness, model, workflow where you hit this. -->
+<!-- Optional: the workflow where you hit this, links, transcripts. -->
--- a/.github/ISSUE_TEMPLATE/platform_support.md
+++ b/.github/ISSUE_TEMPLATE/platform_support.md
@@ -21,3 +21,14 @@ requested or discussed.
 ## Have you tried manual installation?
 <!-- Many tools work with Superpowers through manual setup even without
     official support. Did you try? What happened? -->
 ## Environment (required)
 <!-- Required. We assume an agent wrote this request — tell us which one and
     where it ran. -->
 | Field | Value |
 |-------|-------|
 | Harness you currently use (Claude Code, Cursor, etc.) | |
 | Harness version | |
 | Your model + version | |
 | All plugins installed | |
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -4,6 +4,23 @@ sections blank, contain multiple unrelated changes, or show no evidence
 of human involvement will be closed without review.
 -->
 > **This PR MUST target the `dev` branch, not `main`.** `main` is the
 > released branch; active work lands on `dev` first. PRs opened against
 > `main` will be asked to retarget `dev` before review.
 ## Who is submitting this PR? (required)
 <!-- Required. PRs that omit this will be closed. We assume an agent wrote
     this PR — tell us which one and where it ran. We weigh contributions by
     what produced them: content reasoned from documentation is held to a
     different bar than work grounded in a real session. -->
 | Field | Value |
 |-------|-------|
 | Your model + version | |
 | Harness + version | |
 | All plugins installed | |
 | Human partner who reviewed this diff | |
 ## What problem are you trying to solve?
 <!-- Describe the specific problem you encountered. If this was a session
     issue, include: what you were doing, what went wrong, the model's
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -14,7 +14,8 @@ Before you open a PR against this repo, you MUST:
 2. **Search for existing PRs** — open AND closed — that address the same problem. If duplicates exist, STOP and tell your human partner. Do not open another duplicate.
 3. **Verify this is a real problem.** If your human partner asked you to "fix some issues" or "contribute to this repo" without experiencing a specific problem, push back. Ask them what broke, what failed, what the user experience was.
 4. **Confirm the change belongs in core.** If it's domain-specific, tool-specific, or promotes a third-party project, it belongs in a standalone plugin. Tell your human partner.
-5. **Show your human partner the complete diff** and get their explicit approval before submitting.
+5. **Identify yourself.** Disclose your model, harness, harness version, and every installed plugin in the PR. Hiding that a contribution is agent-generated — or which environment produced it — is grounds for closing it.
 6. **Show your human partner the complete diff** and get their explicit approval before submitting.
 If any of these checks fail, do not open the PR. Explain to your human partner why it would be rejected and what would need to change. They will thank you for saving them the embarrassment.
@@ -26,6 +27,10 @@ If any of these checks fail, do not open the PR. Explain to your human partner w
 **PRs that show no evidence of human involvement will be closed.** A human must review the complete proposed diff before submission.
 **Submitters MUST identify themselves.** Every PR and issue must disclose the model, harness, harness version, and all installed plugins used to produce the contribution — or state plainly that it was written by hand with no agent. This is not optional. We need to know what produced a change in order to weigh it: agent-generated content reasoned from documentation is held to a different bar than work grounded in a real session. Contributions that hide their authoring environment will be closed.
 **All PRs MUST target the `dev` branch, not `main`.** `main` is the released branch; active work lands on `dev` first. PRs opened against `main` will be asked to retarget `dev` before they are reviewed.
 ## What We Will Not Accept
 ### Third-party dependencies
--- a/2
+++ b/2
--- a/skills/brainstorming/SKILL.md
+++ b/skills/brainstorming/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: brainstorming
-description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation."
+description: "You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores user intent, requirements and design before implementation. The one exception (nothing-to-design) must be EARNED by a tripwire scan first - invoke this skill if the change: adds a file or dependency; touches a schema, API contract, or persisted data (even when the user stated the outcome); deletes or disables working functionality (even when asked); touches security posture at all (auth, sessions, timeouts, permissions, CORS, crypto - even with the exact value stated); alters user-visible behavior beyond the stated change; has more than one plausible reading; or is framed as a feature or project. Only when NO tripwire hits and the outcome is fully specified (e.g. 'add a basic checkbox, nothing fancy' where context leaves nothing to choose): state your scan in one line, then implement directly without invoking this skill."
 ---
 # Brainstorming Ideas Into Designs
@@ -10,12 +10,22 @@ Help turn ideas into fully formed designs and specs through natural collaborativ
 Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design and get user approval.
 <HARD-GATE>
-Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity.
+Do NOT invoke any implementation skill, write any code, scaffold any project, or take any implementation action until you have presented a design and the user has approved it. This applies to EVERY project regardless of perceived simplicity, with exactly one exception.
 Exception — nothing to design: when the exception in this skill's description applies (zero open design decisions; its tripwire list puts the gate back on), implement directly. TDD and verification-before-completion still apply. Brainstorming exists to surface decisions; when there are none, the user's request IS the design. Any doubt: the gate holds.
 </HARD-GATE>
 ## Anti-Pattern: "This Is Too Simple To Need A Design"
-Every project goes through this process. A todo list, a single-function utility, a config change — all of them. "Simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but you MUST present it and get approval.
+Anything with open decisions goes through this process. A todo list, a single-function utility, a data migration — "simple" projects are where unexamined assumptions cause the most wasted work. The design can be short (a few sentences for truly simple projects), but if anything remains to decide, you MUST present it and get approval. Do not confuse this with the nothing-to-design exception above: "this seems simple, I'll skip the design" is a rationalization whenever decisions exist.
 | Excuse | Reality |
 |--------|---------|
 | "The codebase has an established pattern, so nothing is open" | A pattern answers HOW, not WHETHER or WHAT. Those decisions are still open unless the user made them. |
 | "I can infer the obvious choice" | If there is a choice to infer, a decision is open. Invoke. |
 | "The user said keep it simple / nothing fancy" | A hedge describes the solution's size, not the request's completeness. Check what remains undecided, not the tone. |
 | "Asking would waste the user's time" | One design question costs seconds; an unexamined assumption costs a rewrite. |
 | "The user already made that decision — they told me to delete it" | A requested deletion still has consequences the user may not have weighed (working feature, no usage data, alternatives). Surface them first; the tripwire applies to requested deletions. |
 ## Checklist
--- a/skills/brainstorming/scripts/start-server.sh
+++ b/skills/brainstorming/scripts/start-server.sh
@@ -97,7 +97,7 @@ if [[ -f "$PID_FILE" ]]; then
  rm -f "$PID_FILE"
 fi
-cd "$SCRIPT_DIR" || exit
+cd "$SCRIPT_DIR"
 # Resolve the harness PID (grandparent of this script).
 # $PPID is the ephemeral shell the harness spawned to run us — it dies
@@ -135,7 +135,7 @@ disown "$SERVER_PID" 2>/dev/null
 echo "$SERVER_PID" > "$PID_FILE"
 # Wait for server-started message (check log file)
-for _ in {1..50}; do
+for i in {1..50}; do
  if grep -q "server-started" "$LOG_FILE" 2>/dev/null; then
    # Verify server is still alive after a short window (catches process reapers)
    alive="true"
--- a/skills/brainstorming/scripts/stop-server.sh
+++ b/skills/brainstorming/scripts/stop-server.sh
@@ -23,7 +23,7 @@ if [[ -f "$PID_FILE" ]]; then
  kill "$pid" 2>/dev/null || true
  # Wait for graceful shutdown (up to ~2s)
-  for _ in {1..20}; do
+  for i in {1..20}; do
    if ! kill -0 "$pid" 2>/dev/null; then
      break
    fi
--- a/skills/using-superpowers/SKILL.md
+++ b/skills/using-superpowers/SKILL.md
@@ -12,7 +12,7 @@ If you think there is even a 1% chance a skill might apply to what you are doing
 IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
-This is not negotiable. This is not optional. You cannot rationalize your way out of this.
+This is not negotiable. This is not optional. You cannot rationalize your way out of this. (The single carve-out: a skill whose own description says it does not apply — see The Rule.)
 </EXTREMELY-IMPORTANT>
 ## Instruction Priority
@@ -49,6 +49,10 @@ Skills speak in actions ("dispatch a subagent", "create a todo", "read a file")
 **Invoke relevant or requested skills BEFORE any response or action.** Even a 1% chance a skill might apply means that you should invoke the skill to check. If an invoked skill turns out to be wrong for the situation, you don't need to use it.
 **Documented exceptions in a skill's own description are authoritative.** When a description itself says the skill does not apply to a request (e.g. brainstorming's nothing-to-design exception), not invoking it is compliance, not rationalization. Any doubt about whether the exception's conditions hold means invoke. Only the skill's description can define such an exception; you cannot infer one.
 **An exception skip must be stated, never silent.** Before your first action, write one line naming the exception and the tripwire scan that came up empty — e.g. "Skipping brainstorming per its exception: no security/deletion/schema/new-file tripwires; outcome fully specified." If you did not write the scan line, you did not scan — invoke the skill instead.
 ```dot
 digraph skill_flow {
    "User message received" [shape=doublecircle];
@@ -69,7 +73,12 @@ digraph skill_flow {
    "Invoke brainstorming skill" -> "Might any skill apply?";
    "User message received" -> "Might any skill apply?";
-    "Might any skill apply?" -> "Invoke the skill" [label="yes, even 1%"];
+    "Might any skill apply?" -> "Skill's own description exempts this request?" [label="yes, even 1%"];
    "Skill's own description exempts this request?" [shape=diamond];
    "Skill's own description exempts this request?" -> "Invoke the skill" [label="no / any doubt"];
    "Skill's own description exempts this request?" -> "State the one-line tripwire scan, then proceed" [label="yes, clearly"];
    "State the one-line tripwire scan, then proceed" [shape=box];
    "State the one-line tripwire scan, then proceed" -> "Respond (including clarifications)";
    "Might any skill apply?" -> "Respond (including clarifications)" [label="definitely not"];
    "Invoke the skill" -> "Announce: 'Using [skill] to [purpose]'";
    "Announce: 'Using [skill] to [purpose]'" -> "Has checklist?";
@@ -94,6 +103,7 @@ These thoughts mean STOP—you're rationalizing:
 | "I remember this skill" | Skills evolve. Read current version. |
 | "This doesn't count as a task" | Action = task. Check for skills. |
 | "The skill is overkill" | Simple things become complex. Use it. |
 | "Too trivial to scan the tripwire list" | The scan is one sentence. Write it or invoke the skill. |
 | "I'll just do this one thing first" | Check BEFORE doing anything. |
 | "This feels productive" | Undisciplined action wastes time. Skills prevent this. |
 | "I know what that means" | Knowing the concept ≠ using the skill. Invoke it. |
@@ -118,4 +128,6 @@ The skill itself tells you which.
 ## User Instructions
-Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows.
+Instructions say WHAT, not HOW. "Add X" or "Fix Y" doesn't mean skip workflows — unless a skill's own description exempts the request (see The Rule above).
 Pressure phrasing — "don't ask questions", "make assumptions", "just build it" — changes how you interact (state assumptions instead of asking), not which skills you invoke. Only an instruction that names what to skip ("don't write a plan", "skip TDD") or a description exception skips a workflow step. "Your instruction wins per the priority rules" applied to an unnamed workflow step is a rationalization.
--- a/skills/writing-skills/SKILL.md
+++ b/skills/writing-skills/SKILL.md
@@ -151,6 +151,8 @@ Concrete results
 The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.
 (Negative triggering conditions are still triggering conditions: a description MAY state when the skill does NOT apply — including its tripwires — and per using-superpowers' Rule such description-level exceptions are authoritative, so they must live here, not only in the body. That is scope, not workflow.)
 **Why this matters:** Testing revealed that when a description summarizes the skill's workflow, an agent may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused an agent to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).
 When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), the agent correctly read the flowchart and followed the two-stage review process.
--- a/tests/claude-code/test-helpers.sh
+++ b/tests/claude-code/test-helpers.sh
@@ -7,8 +7,7 @@ run_claude() {
    local prompt="$1"
    local timeout="${2:-60}"
    local allowed_tools="${3:-}"
-    local output_file
+    local output_file=$(mktemp)
    output_file="$(mktemp)"
    # Build command as an argv array so timeout wraps claude directly.
    local cmd=(claude -p "$prompt")
@@ -75,8 +74,7 @@ assert_count() {
    local expected="$3"
    local test_name="${4:-test}"
-    local actual
+    local actual=$(echo "$output" | grep -c "$pattern" || echo "0")
    actual="$(echo "$output" | grep -c "$pattern" || true)"
    if [ "$actual" -eq "$expected" ]; then
        echo "  [PASS] $test_name (found $actual instances)"
@@ -100,10 +98,8 @@ assert_order() {
    local test_name="${4:-test}"
    # Get line numbers where patterns appear
-    local line_a
+    local line_a=$(echo "$output" | grep -n "$pattern_a" | head -1 | cut -d: -f1)
-    local line_b
+    local line_b=$(echo "$output" | grep -n "$pattern_b" | head -1 | cut -d: -f1)
    line_a="$(echo "$output" | grep -n "$pattern_a" | head -1 | cut -d: -f1 || true)"
    line_b="$(echo "$output" | grep -n "$pattern_b" | head -1 | cut -d: -f1 || true)"
    if [ -z "$line_a" ]; then
        echo "  [FAIL] $test_name: pattern A not found: $pattern_a"
@@ -129,8 +125,7 @@ assert_order() {
 # Create a temporary test project directory
 # Usage: test_project=$(create_test_project)
 create_test_project() {
-    local test_dir
+    local test_dir=$(mktemp -d)
    test_dir="$(mktemp -d)"
    echo "$test_dir"
 }
--- a/tests/claude-code/test-subagent-driven-development-integration.sh
+++ b/tests/claude-code/test-subagent-driven-development-integration.sh
@@ -37,10 +37,7 @@ TEST_PROJECT=$(create_test_project)
 echo "Test project: $TEST_PROJECT"
 # Trap to cleanup
-cleanup_integration_test_project() {
+trap "cleanup_test_project $TEST_PROJECT" EXIT
    cleanup_test_project "$TEST_PROJECT"
 }
 trap cleanup_integration_test_project EXIT
 # Set up minimal Node.js project
 cd "$TEST_PROJECT"
@@ -167,19 +164,12 @@ PLUGIN_DIR=$(cd "$SCRIPT_DIR/../.." && pwd)
 # other concurrent claude sessions.
 echo "Running Claude (plugin-dir: $PLUGIN_DIR, cwd: $TEST_PROJECT)..."
 echo "================================================================================"
-set +e
+cd "$TEST_PROJECT" && timeout 1800 claude -p "$PROMPT" --plugin-dir "$PLUGIN_DIR" --allowed-tools=all --permission-mode bypassPermissions 2>&1 | tee "$OUTPUT_FILE" || {
 (
    cd "$TEST_PROJECT" &&
        timeout 1800 claude -p "$PROMPT" --plugin-dir "$PLUGIN_DIR" --allowed-tools=all --permission-mode bypassPermissions
 ) 2>&1 | tee "$OUTPUT_FILE"
 execution_status=$?
 set -e
 if [[ "$execution_status" -ne 0 ]]; then
    echo ""
    echo "================================================================================"
-    echo "EXECUTION FAILED (exit code: $execution_status)"
+    echo "EXECUTION FAILED (exit code: $?)"
    exit 1
-fi
+}
 echo "================================================================================"
 echo ""
--- a/tests/claude-code/test-worktree-path-policy.sh
+++ b/tests/claude-code/test-worktree-path-policy.sh
@@ -47,20 +47,16 @@ assert_not_contains() {
 echo "=== Worktree Path Policy Test ==="
 echo ""
-# Intentionally search for the literal legacy path, not the current user's home.
+assert_not_contains "$USING_SKILL" "~/.config/superpowers/worktrees" "using-git-worktrees does not mention old global path"
 # shellcheck disable=SC2088
 legacy_global_worktree_path="~/.config/superpowers/worktrees"
 assert_not_contains "$USING_SKILL" "$legacy_global_worktree_path" "using-git-worktrees does not mention old global path"
 assert_not_contains "$USING_SKILL" "global legacy" "using-git-worktrees does not use unclear global legacy shorthand"
 assert_not_contains "$USING_SKILL" "Global path" "using-git-worktrees has no global path quick-reference row"
 assert_contains "$USING_SKILL" 'default to `.worktrees/` at the project root' "using-git-worktrees defaults new manual worktrees to .worktrees/"
-assert_not_contains "$FINISHING_SKILL" "$legacy_global_worktree_path" "finishing-a-development-branch does not treat old global path as owned"
+assert_not_contains "$FINISHING_SKILL" "~/.config/superpowers/worktrees" "finishing-a-development-branch does not treat old global path as owned"
 assert_contains "$FINISHING_SKILL" '`.worktrees/` or `worktrees/`' "finishing-a-development-branch keeps project-local cleanup ownership"
-assert_not_contains "$ROTOTILL_SPEC" "$legacy_global_worktree_path" "rototill spec does not preserve old global path policy"
+assert_not_contains "$ROTOTILL_SPEC" "~/.config/superpowers/worktrees" "rototill spec does not preserve old global path policy"
-assert_not_contains "$ROTOTILL_PLAN" "$legacy_global_worktree_path" "rototill plan does not preserve old global path policy"
+assert_not_contains "$ROTOTILL_PLAN" "~/.config/superpowers/worktrees" "rototill plan does not preserve old global path policy"
 assert_not_contains "$ROTOTILL_PLAN" "legacy path compat" "rototill plan does not advertise legacy path compatibility"
 echo ""
Author	SHA1	Message	Date
Drew Ritter	5c3af5f195	fix(skills): brainstorming gate exempts nothing-to-design requests; description exceptions are authoritative (SUP-333 C) Consolidates the brainstorming exception with its routing-layer semantics, so this PR is independently mergeable (previously split across two stacked PRs whose intermediate state left the always- injected routing text contradicting the shipped description). brainstorming: the nothing-to-design exception, earned by a tripwire scan stated in one line before acting. Tripwires precede the permission (skimmers stop at "implement directly"); security-posture touches re-gate even with the exact value stated; requested deletions re-gate; rationalization table per writing-skills bulletproofing. Description 971/1024 chars, YAML-validated. using-superpowers: description-level exceptions are authoritative (compliance, not rationalization); doubt means invoke; only the description can define one; the skip must state its scan; flowchart routes the exempt path through the scan statement; <EXTREMELY-IMPORTANT> defers in one parenthetical. writing-skills: negative triggering conditions are scope (allowed, required at the description) vs workflow summaries (still forbidden) — prevents a future checklist pass from stripping the exception. Eval evidence (quorum): RED cost-checkbox-over-trigger failed 5/6 agents (pi ⊘); GREEN claude 3/3, codex ✓, antigravity ✓ (kimi unchanged from baseline — does not read description exceptions); gate-still-fires: brainstorming-resists 2/2 + codex, spec-plan brainstorm leg 3/3. Boundary scenarios (security one-liner, requested deletion): pre-stack dev baseline 0/3 + 0/3 (silent edit every time — the blanket gate never fired on one-liners); this text 2/3 + 2/3, the first text in the corpus to catch these at any rate; scenarios ship as regression instruments (proposed in prime-radiant-inc/superpowers-evals#11, open). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Canary-caught addition: on the assembled text, triggering-writing-plans went 0/3 with claude citing "your explicit instruction wins per the priority rules" to skip writing-plans under the scenario's "don't ask me any questions" pressure — the Instruction Priority section read as licensing ad-hoc pressure to skip workflow steps. User Instructions now distinguishes pressure phrasing (changes interaction style) from instructions that name what to skip (honored), and tags the quoted rationalization.	2026-06-11 00:36:41 -07:00
Drew Ritter	0cb1960068	chore(evals): bump submodule for Claude Haiku target	2026-06-10 16:31:16 -07:00
Jesse Vincent	f55642e0dd	Require contributors to disclose authoring environment and target dev Add a mandatory self-identification disclosure (model, harness, harness version, all installed plugins) to the PR template and all three issue templates, and document the requirement in the contributor guidelines. We weigh contributions differently depending on what produced them: content reasoned from documentation is held to a different bar than work grounded in a real session. Also state explicitly, in both CLAUDE.md and the PR template, that all PRs must target the dev branch rather than main.	2026-06-08 22:14:34 -07:00
Drew Ritter	ae1eefb7f9	chore(evals): bump submodule to --scenarios filter (ff3ee83) Adds `run-all --scenarios` for resuming a scenario subset across the Code Assist rate-limit windows. Follows the agy rate-limit fix (79f9963). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 22:46:00 -07:00
Drew Ritter	617168aff5	chore(evals): bump submodule to antigravity rate-limit fix (79f9963) Serialize antigravity against the Gemini Code Assist rate limit (max_concurrency=1), diagnose 429/RESOURCE_EXHAUSTED honestly instead of as auth, fail-fast on a latched window, and tolerant preflight OK match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 16:27:35 -07:00