mirror of https://github.com/obra/superpowers.git synced 2026-06-10 20:59:05 +08:00

Jesse Vincent e3fe480b29 feat(brainstorm-server): gate every endpoint behind a per-session key

The companion server is reachable by any local browser tab (default loopback
bind) and by any host that can route to it (remote --host bind). It served
screens, files, and accepted event-injecting WebSocket connections with no
authentication, so a malicious browser tab or a direct remote client could read
brainstorm content or inject events that the agent reads as the user's input
(prompt injection into a live session).

Generate a per-session secret token, carry it in the served URL as ?key=, and
mirror it into an HttpOnly SameSite=Strict per-port cookie on first load so
same-origin subresources and the WebSocket handshake authenticate automatically.
Every HTTP request and WebSocket upgrade now requires a valid key (query or
cookie, constant-time compared); unauthenticated requests get a friendly 403
explaining they need the full URL. A secret authenticates the client uniformly
across loopback, tunnel, and remote binds and defeats DNS rebinding, which a
Host/Origin allowlist cannot.

Also guard handleMessage against a null JSON payload that crashed the process.

Tests: new auth.test.js (13 cases) covering the key on /, /files/*, and WS plus
cookie bootstrap and the null-payload guard; server.test.js threads the key;
ws-protocol.test.js + auth.test.js wired into npm test.

Closes #1014
Refs #1110, #1553, #1504

2026-06-09 18:29:49 -07:00

Visual Companion Guide

Browser-based visual brainstorming companion for showing mockups, diagrams, and options.

When to Use

Decide per-question, not per-session. The test: would the user understand this better by seeing it than reading it?

Use the browser when the content itself is visual:

UI mockups — wireframes, layouts, navigation structures, component designs
Architecture diagrams — system components, data flow, relationship maps
Side-by-side visual comparisons — comparing two layouts, two color schemes, two design directions
Design polish — when the question is about look and feel, spacing, visual hierarchy
Spatial relationships — state machines, flowcharts, entity relationships rendered as diagrams

Use the terminal when the content is text or tabular:

Requirements and scope questions — "what does X mean?", "which features are in scope?"
Conceptual A/B/C choices — picking between approaches described in words
Tradeoff lists — pros/cons, comparison tables
Technical decisions — API design, data modeling, architectural approach selection
Clarifying questions — anything where the answer is words, not a visual preference

A question about a UI topic is not automatically a visual question. "What kind of wizard do you want?" is conceptual — use the terminal. "Which of these wizard layouts feels right?" is visual — use the browser.

How It Works

The server watches a directory for HTML files and serves the newest one to the browser. You write HTML content to screen_dir; the user sees it in their browser and can click to select options. Selections are recorded to state_dir/events, which you read on your next turn.

Content fragments vs full documents: If your HTML file starts with <!DOCTYPE or <html, the server serves it as-is and only injects the helper script. Otherwise, the server automatically wraps your content in the frame template, adding the header, CSS theme, selection indicator, and all interactive infrastructure. Write content fragments by default. Only write full documents when you need complete control over the page.
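
The fragment-versus-document rule can be approximated locally before you write a file. The sketch below is only an illustration: is_full_document is a hypothetical helper, not part of the companion scripts, and it assumes leading whitespace is skipped and the match is case-insensitive, which this guide does not specify.

```shell
# Hypothetical helper, not part of the companion scripts.
# Returns 0 if FILE looks like a full document, 1 if it looks like a fragment.
# Assumption: leading whitespace is skipped and the match is case-insensitive;
# the server's actual check may be stricter.
is_full_document() {
  # Take the first 256 bytes, flatten whitespace, trim the left edge, lowercase.
  doc_head=$(head -c 256 "$1" | tr '\r\n\t' '   ' | sed 's/^ *//' | tr '[:upper:]' '[:lower:]')
  case "$doc_head" in
    '<!doctype'* | '<html'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_full_document layout.html || echo "will be wrapped in the frame template"` flags a fragment before it is pushed.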

Starting a Session

# Start AFTER the user approves the companion. --open auto-opens their browser on
# the first screen; --project-dir persists mockups and enables same-port restart.
scripts/start-server.sh --project-dir /path/to/project --open

# Returns: {"type":"server-started","port":52341,
#           "url":"http://localhost:52341/?key=ab12…",
#           "screen_dir":"/path/to/project/.superpowers/brainstorm/12345-1706000000/content",
#           "state_dir":"/path/to/project/.superpowers/brainstorm/12345-1706000000/state"}

Save screen_dir and state_dir from the response. With --open, the browser opens itself when you push the first screen — you don't need to ask the user to open it, but still share the URL as a fallback (headless/remote setups won't auto-open).

The URL contains a session key (?key=…). The server rejects any request without it, so always give the user the complete URL from the url field — never strip the query string, and never hand out a bare http://host:port. The key gates HTTP and WebSocket access so a stray browser tab or another machine on the network can't read the screens or inject events. After the first load the browser remembers the key via a cookie, so reloads and /files/* assets work without repeating it.

Finding connection info: The server writes its startup JSON to $STATE_DIR/server-info. If you launched the server in the background and didn't capture stdout, read that file to get the URL and port. When using --project-dir, check <project>/.superpowers/brainstorm/ for the session directory.
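
Recovering the URL from server-info might look like the following sketch. server_url is a hypothetical helper, not something the companion ships, and it assumes the file holds the startup JSON shown above with no escaped quotes inside the URL.

```shell
# Hypothetical helper, not part of the companion scripts.
# Prints the "url" field from a server-info file.
# Assumption: the file holds the startup JSON shown earlier and the URL
# contains no escaped quotes. Prefer a JSON-aware parser when one is available.
server_url() {
  sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}
```

Calling `server_url "$STATE_DIR/server-info"` prints the complete URL, including the ?key= query string, so it can be passed to the user unchanged.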

Note: Pass the project root as --project-dir so mockups persist in .superpowers/brainstorm/ and survive server restarts. Without it, files go to /tmp and get cleaned up. Remind the user to add .superpowers/ to .gitignore if it's not already there.
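
If the user asks how to add the ignore entry, a command along these lines is one option. ignore_superpowers is a hypothetical name, and the sketch only detects an exact `.superpowers/` line, so an equivalent pattern written differently would not be recognized.

```shell
# Hypothetical helper, not part of the companion scripts.
# Appends ".superpowers/" to PROJECT_DIR/.gitignore unless that exact line exists.
ignore_superpowers() {
  gi_file="$1/.gitignore"
  if grep -qxF '.superpowers/' "$gi_file" 2>/dev/null; then
    return 0   # already ignored
  fi
  # If the file exists and does not end in a newline, add one first so the
  # new entry is not glued onto the last line.
  if [ -s "$gi_file" ] && [ -n "$(tail -c 1 "$gi_file")" ]; then
    printf '\n' >> "$gi_file"
  fi
  printf '%s\n' '.superpowers/' >> "$gi_file"
}
```

Running `ignore_superpowers /path/to/project` twice leaves a single entry, so it is safe to repeat.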

Launching the server by platform:

Claude Code:

# Default mode works — the script backgrounds the server itself.
scripts/start-server.sh --project-dir /path/to/project

On Windows, the script detects the platform and switches to foreground mode, which blocks the tool call. Use run_in_background: true on the Bash tool call so the server survives across conversation turns, then read $STATE_DIR/server-info on the next turn to get the URL and port.

Codex:

# Codex reaps background processes. The script auto-detects CODEX_CI and
# switches to foreground mode. Run it normally — no extra flags needed.
scripts/start-server.sh --project-dir /path/to/project

Gemini CLI:

# Use --foreground and set is_background: true on your shell tool call
# so the process survives across turns
scripts/start-server.sh --project-dir /path/to/project --foreground

Copilot CLI:

# Use --foreground and start the server via the bash tool with mode: "async"
# so the process survives across turns. Capture the returned shellId for
# read_bash / stop_bash if you need to interact with it later.
scripts/start-server.sh --project-dir /path/to/project --foreground

Other environments: The server must keep running in the background across conversation turns. If your environment reaps detached processes, use --foreground and launch the command with your platform's background execution mechanism.

If the URL is unreachable from your browser (common in remote/containerized setups), bind a non-loopback host:

scripts/start-server.sh \
  --project-dir /path/to/project \
  --host 0.0.0.0 \
  --url-host localhost

Use --url-host to control which hostname appears in the url field of the returned JSON.

The Loop

1. Check the server is alive, then write HTML to a new file in screen_dir:
   - Required: confirm the server is alive before referring to the URL or pushing a screen. Check that $STATE_DIR/server-info exists and $STATE_DIR/server-stopped does not. If the server has shut down, restart it with start-server.sh using the same --project-dir. It reuses the same port, so the user's open tab reconnects on its own (it shows a "paused" overlay while the server is down) and you don't need to send a new URL. The server auto-exits after 4 hours idle (configurable with --idle-timeout-minutes).
   - Use semantic filenames: platform.html, visual-style.html, layout.html
   - Never reuse filenames; each screen gets a fresh file
   - Use your file-creation tool; never use cat/heredoc, which dumps noise into the terminal
   - The server automatically serves the newest file
2. Tell the user what to expect and end your turn:
   - Remind them of the URL (every step, not just the first)
   - Give a brief text summary of what's on screen (e.g., "Showing 3 layout options for the homepage")
   - Ask them to respond in the terminal: "Take a look and let me know what you think. Click to select an option if you'd like."
3. On your next turn, after the user responds in the terminal:
   - Read $STATE_DIR/events if it exists; it contains the user's browser interactions (clicks, selections) as JSON lines
   - Merge with the user's terminal text to get the full picture
   - The terminal message is the primary feedback; state_dir/events provides structured interaction data
4. Iterate or advance: if feedback changes the current screen, write a new file (e.g., layout-v2.html). Only move to the next question when the current step is validated.
5. Unload when returning to the terminal: when the next step doesn't need the browser (e.g., a clarifying question, a tradeoff discussion), push a waiting screen to clear the stale content:
   ```html
   <div style="display:flex;align-items:center;justify-content:center;min-height:60vh">
     <p class="subtitle">Continuing in terminal...</p>
   </div>
   ```
   This prevents the user from staring at a resolved choice while the conversation has moved on. When the next visual question comes up, push a new content file as usual.
6. Repeat until done.
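
The liveness check that opens the loop can be sketched as a small shell function. server_alive is a hypothetical name; it only inspects the two marker files named in this guide and does not probe the port.

```shell
# Hypothetical helper, not part of the companion scripts.
# Succeeds when STATE_DIR contains server-info and no server-stopped marker.
# This only inspects the marker files; it does not probe the port.
server_alive() {
  [ -f "$1/server-info" ] && [ ! -e "$1/server-stopped" ]
}
```

A typical use is `server_alive "$STATE_DIR" || scripts/start-server.sh --project-dir /path/to/project`, which restarts the server only when the markers say it is down.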

Writing Content Fragments

Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, selection indicator, and all interactive infrastructure).

Minimal example:

<h2>Which layout works better?</h2>
<p class="subtitle">Consider readability and visual hierarchy</p>

<div class="options">
  <div class="option" data-choice="a" onclick="toggleSelect(this)">
    <div class="letter">A</div>
    <div class="content">
      <h3>Single Column</h3>
      <p>Clean, focused reading experience</p>
    </div>
  </div>
  <div class="option" data-choice="b" onclick="toggleSelect(this)">
    <div class="letter">B</div>
    <div class="content">
      <h3>Two Column</h3>
      <p>Sidebar navigation with main content</p>
    </div>
  </div>
</div>

That's it. No <html>, no CSS, no <script> tags needed. The server provides all of that.

CSS Classes Available

The frame template provides these CSS classes for your content:

Options (A/B/C choices)

<div class="options">
  <div class="option" data-choice="a" onclick="toggleSelect(this)">
    <div class="letter">A</div>
    <div class="content">
      <h3>Title</h3>
      <p>Description</p>
    </div>
  </div>
</div>

Multi-select: Add data-multiselect to the container to let users select multiple options. Each click toggles the item. The indicator bar shows the count.

<div class="options" data-multiselect>
  <!-- same option markup — users can select/deselect multiple -->
</div>

Cards (visual designs)

<div class="cards">
  <div class="card" data-choice="design1" onclick="toggleSelect(this)">
    <div class="card-image"><!-- mockup content --></div>
    <div class="card-body">
      <h3>Name</h3>
      <p>Description</p>
    </div>
  </div>
</div>

Mockup container

<div class="mockup">
  <div class="mockup-header">Preview: Dashboard Layout</div>
  <div class="mockup-body"><!-- your mockup HTML --></div>
</div>

Split view (side-by-side)

<div class="split">
  <div class="mockup"><!-- left --></div>
  <div class="mockup"><!-- right --></div>
</div>

Pros/Cons

<div class="pros-cons">
  <div class="pros"><h4>Pros</h4><ul><li>Benefit</li></ul></div>
  <div class="cons"><h4>Cons</h4><ul><li>Drawback</li></ul></div>
</div>

Mock elements (wireframe building blocks)

<div class="mock-nav">Logo | Home | About | Contact</div>
<div style="display: flex;">
  <div class="mock-sidebar">Navigation</div>
  <div class="mock-content">Main content area</div>
</div>
<button class="mock-button">Action Button</button>
<input class="mock-input" placeholder="Input field">
<div class="placeholder">Placeholder area</div>

Typography and sections

h2 — page title
h3 — section heading
.subtitle — secondary text below title
.section — content block with bottom margin
.label — small uppercase label text

Browser Events Format

When the user clicks options in the browser, their interactions are recorded to $STATE_DIR/events (one JSON object per line). The file is cleared automatically when you push a new screen.

{"type":"click","choice":"a","text":"Option A - Simple Layout","timestamp":1706000101}
{"type":"click","choice":"c","text":"Option C - Complex Grid","timestamp":1706000108}
{"type":"click","choice":"b","text":"Option B - Hybrid","timestamp":1706000115}

The full event stream shows the user's exploration path — they may click multiple options before settling. The last choice event is typically the final selection, but the pattern of clicks can reveal hesitation or preferences worth asking about.

If $STATE_DIR/events doesn't exist, the user didn't interact with the browser — use only their terminal text.
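
Pulling the most recent selection out of the events file might look like this sketch. last_choice is a hypothetical helper; it assumes one JSON object per line in the format shown above, with no escaped quotes inside the choice value.

```shell
# Hypothetical helper, not part of the companion scripts.
# Prints the "choice" of the most recent click event in an events file.
# Assumption: one JSON object per line, as in the sample above, with no
# escaped quotes inside the choice value.
last_choice() {
  [ -f "$1" ] || return 1   # no events file: the user did not use the browser
  grep -E '"type"[[:space:]]*:[[:space:]]*"click"' "$1" \
    | tail -n 1 \
    | sed -n 's/.*"choice"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

The function returns a non-zero status when the file is missing, which maps to the "terminal text only" case. Treat the printed value as a hint and confirm it against what the user said in the terminal.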

Design Tips

Scale fidelity to the question: wireframes for layout questions, polish for polish questions
Explain the question on each page: "Which layout feels more professional?" not just "Pick one"
Iterate before advancing: if feedback changes the current screen, write a new version
2-4 options max per screen
Use real content when it matters: for a photography portfolio, use actual images (Unsplash). Placeholder content obscures design issues.
Keep mockups simple: focus on layout and structure, not pixel-perfect design

File Naming

Use semantic names: platform.html, visual-style.html, layout.html
Never reuse filenames; each screen must be a new file
For iterations, append a version suffix such as layout-v2.html, layout-v3.html
The server serves the newest file by modification time
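
One way to follow the never-reuse rule is to compute the next free filename before writing. next_screen_name below is a hypothetical helper that applies the -v2, -v3 suffix convention from this section; it is a sketch, not something the companion provides.

```shell
# Hypothetical helper, not part of the companion scripts.
# Prints the first unused name among BASE.html, BASE-v2.html, BASE-v3.html, ...
next_screen_name() {
  screen_dir_arg=$1
  base_name=$2
  candidate="$base_name.html"
  version=2
  while [ -e "$screen_dir_arg/$candidate" ]; do
    candidate="$base_name-v$version.html"
    version=$((version + 1))
  done
  printf '%s\n' "$candidate"
}
```

With layout.html already in the directory, `next_screen_name "$SCREEN_DIR" layout` prints layout-v2.html. SCREEN_DIR here stands for the screen_dir value saved from the startup JSON.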

Cleaning Up

scripts/stop-server.sh "$SESSION_DIR"

If the session used --project-dir, mockup files persist in .superpowers/brainstorm/ for later reference. Only /tmp sessions get deleted on stop.
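
This guide does not say how to obtain SESSION_DIR. The sample startup JSON suggests it is the parent of state_dir, so the sketch below derives it that way. Both that relationship and the session_dir_of name are assumptions to verify against your own paths.

```shell
# Hypothetical helper, not part of the companion scripts.
# Assumption: the session directory is the parent of state_dir, as the sample
# startup JSON suggests (".../<session>/state"). Verify against your own paths.
session_dir_of() {
  dirname "$1"
}
```

Under that assumption, `SESSION_DIR=$(session_dir_of "$STATE_DIR")` yields the value to pass to stop-server.sh.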

Reference

Frame template (CSS reference): scripts/frame-template.html
Helper script (client-side): scripts/helper.js
