Refactor visual brainstorming: browser displays, terminal commands (#509)

* Refactor visual brainstorming: browser displays, terminal commands

Replaces the blocking TaskOutput/wait-for-feedback.sh pattern with a
non-blocking model where the browser is an interactive display and the
terminal stays available for conversation.

Server changes:
- Write user click events to .events JSONL file (per-screen, cleared on
  new screen push) so Claude reads them on its next turn
- Replace regex-based wrapInFrame with <!-- CONTENT --> placeholder
- Add --foreground flag to start-server.sh for sandbox environments
- Harden startup with nohup/disown and liveness check

UI changes:
- Remove feedback footer (textarea + Send button)
- Add selection indicator bar ("Option X selected — return to terminal")
- Narrow click handler to [data-choice] elements only

Skill changes:
- Rewrite visual-companion.md for non-blocking loop
- Fix visual companion being skipped on Codex (no browser tools needed)
- Make visual companion offer a standalone question (one question rule)

Deletes wait-for-feedback.sh entirely.

* Add visual companion offer to brainstorming checklist for UX topics

The visual companion was a disconnected section at the bottom of SKILL.md
that agents never reached because it wasn't in the mandatory checklist.
Now step 2 evaluates whether the topic involves visual/UX decisions and
offers the companion if so. Non-visual topics (APIs, data models, etc.)
skip the step entirely.

* Add multi-select support to visual companion

Containers with data-multiselect allow toggling multiple selections.
Without it, behavior is unchanged (single-select). Indicator bar shows
count when multiple items are selected.
This commit is contained in:
Drew Ritter
2026-02-19 16:31:51 -08:00
committed by GitHub
parent 3a254ba002
commit ce0f9a28be
10 changed files with 960 additions and 207 deletions

View File

@@ -4,20 +4,31 @@ Browser-based visual brainstorming companion for showing mockups, diagrams, and
## When to Use
Use the visual companion when seeing beats describing:
- **UI mockups** — layouts, navigation, component designs
- **Architecture diagrams** — system components, data flow, relationships
- **Complex choices** — multi-option decisions with visual trade-offs
- **Design polish** — when the question is about look and feel
- **Spatial relationships** — file structures, database schemas, state machines
Decide per-question, not per-session. The test: **would the user understand this better by seeing it than reading it?**
Don't use it for simple text questions, code review, or when the user prefers terminal-only interaction.
**Use the browser** when the content itself is visual:
- **UI mockups** — wireframes, layouts, navigation structures, component designs
- **Architecture diagrams** — system components, data flow, relationship maps
- **Side-by-side visual comparisons** — comparing two layouts, two color schemes, two design directions
- **Design polish** — when the question is about look and feel, spacing, visual hierarchy
- **Spatial relationships** — state machines, flowcharts, entity relationships rendered as diagrams
**Use the terminal** when the content is text or tabular:
- **Requirements and scope questions** — "what does X mean?", "which features are in scope?"
- **Conceptual A/B/C choices** — picking between approaches described in words
- **Tradeoff lists** — pros/cons, comparison tables
- **Technical decisions** — API design, data modeling, architectural approach selection
- **Clarifying questions** — anything where the answer is words, not a visual preference
A question *about* a UI topic is not automatically a visual question. "What kind of wizard do you want?" is conceptual — use the terminal. "Which of these wizard layouts feels right?" is visual — use the browser.
## How It Works
The server watches a directory for HTML files and serves the newest one to the browser. You write HTML content, the user sees it in their browser, clicks options or types feedback, and you receive their response as JSON.
The server watches a directory for HTML files and serves the newest one to the browser. You write HTML content, the user sees it in their browser and can click to select options. Selections are recorded to a `.events` file that you read on your next turn.
**Content fragments vs full documents:** If your HTML file starts with `<!DOCTYPE` or `<html`, the server serves it as-is (just injects the helper script). Otherwise, the server automatically wraps your content in the frame template — adding the header, CSS theme, feedback footer, and all interactive infrastructure. **Write content fragments by default.** Only write full documents when you need complete control over the page.
**Content fragments vs full documents:** If your HTML file starts with `<!DOCTYPE` or `<html`, the server serves it as-is (just injects the helper script). Otherwise, the server automatically wraps your content in the frame template — adding the header, CSS theme, selection indicator, and all interactive infrastructure. **Write content fragments by default.** Only write full documents when you need complete control over the page.
## Starting a Session
@@ -33,38 +44,55 @@ Save `screen_dir` from the response. Tell user to open the URL.
**Note:** Pass the project root as `--project-dir` so mockups persist in `.superpowers/brainstorm/` and survive server restarts. Without it, files go to `/tmp` and get cleaned up. Remind the user to add `.superpowers/` to `.gitignore` if it's not already there.
**Codex behavior:** In Codex (`CODEX_CI=1`), `start-server.sh` auto-switches to foreground mode by default because background jobs may be reaped. Use `--background` only if your environment reliably preserves detached processes.
**If background processes are reaped in your environment:** run in foreground from a persistent terminal session:
```bash
${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/start-server.sh --project-dir /path/to/project --foreground
```
In `--foreground` mode, the command stays attached and serves until interrupted.
If the URL is unreachable from your browser (common in remote/containerized setups), bind a non-loopback host:
```bash
${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/start-server.sh \
--project-dir /path/to/project \
--host 0.0.0.0 \
--url-host localhost
```
Use `--url-host` to control what hostname is printed in the returned URL JSON.
## The Loop
1. **Start watcher first** (background bash) — avoids race condition:
```bash
${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/wait-for-feedback.sh $SCREEN_DIR
```
2. **Write HTML** to a new file in `screen_dir`:
1. **Write HTML** to a new file in `screen_dir`:
- Use semantic filenames: `platform.html`, `visual-style.html`, `layout.html`
- **Never reuse filenames** — each screen gets a fresh file
- Use Write tool — **never use cat/heredoc** (dumps noise into terminal)
- Server automatically serves the newest file
3. **Tell user what to expect:**
2. **Tell user what to expect and end your turn:**
- Remind them of the URL (every step, not just first)
- Give a brief text summary of what's on screen (e.g., "Showing 3 layout options for the homepage")
- Ask them to respond in the terminal: "Take a look and let me know what you think. Click to select an option if you'd like."
4. **Wait for feedback** — call `TaskOutput(task_id, block=true, timeout=600000)`
- If timeout, call TaskOutput again (watcher still running)
- After 3 timeouts (30 min), say "Let me know when you want to continue"
3. **On your next turn**after the user responds in the terminal:
- Read `$SCREEN_DIR/.events` if it exists — this contains the user's browser interactions (clicks, selections) as JSON lines
- Merge with the user's terminal text to get the full picture
- The terminal message is the primary feedback; `.events` provides structured interaction data
5. **Process feedback** — returns JSON like `{"choice": "a", "feedback": "make header smaller"}`
4. **Iterate or advance** — if feedback changes current screen, write a new file (e.g., `layout-v2.html`). Only move to the next question when the current step is validated.
6. **Iterate or advance** — if feedback changes current screen, write a new file (e.g., `layout-v2.html`). Only move to the next question when the current step is validated.
7. Repeat until done.
5. Repeat until done.
## Writing Content Fragments
Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, feedback footer, interactive JS).
Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, selection indicator, and all interactive infrastructure).
**Minimal example:**
```html
<h2>Which layout works better?</h2>
<p class="subtitle">Consider readability and visual hierarchy</p>
@@ -94,6 +122,7 @@ That's it. No `<html>`, no CSS, no `<script>` tags needed. The server provides a
The frame template provides these CSS classes for your content:
### Options (A/B/C choices)
```html
<div class="options">
<div class="option" data-choice="a" onclick="toggleSelect(this)">
@@ -106,7 +135,16 @@ The frame template provides these CSS classes for your content:
</div>
```
**Multi-select:** Add `data-multiselect` to the container to let users select multiple options. Each click toggles the item. The indicator bar shows the count.
```html
<div class="options" data-multiselect>
<!-- same option markup — users can select/deselect multiple -->
</div>
```
### Cards (visual designs)
```html
<div class="cards">
<div class="card" data-choice="design1" onclick="toggleSelect(this)">
@@ -120,6 +158,7 @@ The frame template provides these CSS classes for your content:
```
### Mockup container
```html
<div class="mockup">
<div class="mockup-header">Preview: Dashboard Layout</div>
@@ -128,6 +167,7 @@ The frame template provides these CSS classes for your content:
```
### Split view (side-by-side)
```html
<div class="split">
<div class="mockup"><!-- left --></div>
@@ -136,6 +176,7 @@ The frame template provides these CSS classes for your content:
```
### Pros/Cons
```html
<div class="pros-cons">
<div class="pros"><h4>Pros</h4><ul><li>Benefit</li></ul></div>
@@ -144,6 +185,7 @@ The frame template provides these CSS classes for your content:
```
### Mock elements (wireframe building blocks)
```html
<div class="mock-nav">Logo | Home | About | Contact</div>
<div style="display: flex;">
@@ -156,22 +198,26 @@ The frame template provides these CSS classes for your content:
```
### Typography and sections
- `h2` — page title
- `h3` — section heading
- `.subtitle` — secondary text below title
- `.section` — content block with bottom margin
- `.label` — small uppercase label text
## User Feedback Format
## Browser Events Format
```json
{
"choice": "option-id",
"feedback": "user notes"
}
When the user clicks options in the browser, their interactions are recorded to `$SCREEN_DIR/.events` (one JSON object per line). The file is cleared automatically when you push a new screen.
```jsonl
{"type":"click","choice":"a","text":"Option A - Simple Layout","timestamp":1706000101}
{"type":"click","choice":"c","text":"Option C - Complex Grid","timestamp":1706000108}
{"type":"click","choice":"b","text":"Option B - Hybrid","timestamp":1706000115}
```
Both fields are optional — user may select without notes, or send notes without a selection.
The full event stream shows the user's exploration path — they may click multiple options before settling. The last `choice` event is typically the final selection, but the pattern of clicks can reveal hesitation or preferences worth asking about.
If `.events` doesn't exist, the user didn't interact with the browser — use only their terminal text.
## Design Tips
@@ -200,4 +246,4 @@ If the session used `--project-dir`, mockup files persist in `.superpowers/brain
## Reference
- Frame template (CSS reference): `${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/frame-template.html`
- Helper script (JS API): `${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/helper.js`
- Helper script (client-side): `${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/helper.js`