Refactor visual brainstorming: browser displays, terminal commands (#509)

* Refactor visual brainstorming: browser displays, terminal commands

Replaces the blocking TaskOutput/wait-for-feedback.sh pattern with a non-blocking model where the browser is an interactive display and the terminal stays available for conversation.

Server changes:
- Write user click events to .events JSONL file (per-screen, cleared on new screen push) so Claude reads them on its next turn
- Replace regex-based wrapInFrame with  placeholder
- Add --foreground flag to start-server.sh for sandbox environments
- Harden startup with nohup/disown and liveness check

UI changes:
- Remove feedback footer (textarea + Send button)
- Add selection indicator bar ("Option X selected — return to terminal")
- Narrow click handler to [data-choice] elements only

Skill changes:
- Rewrite visual-companion.md for non-blocking loop
- Fix visual companion being skipped on Codex (no browser tools needed)
- Make visual companion offer a standalone question (one question rule)

Deletes wait-for-feedback.sh entirely.

* Add visual companion offer to brainstorming checklist for UX topics

The visual companion was a disconnected section at the bottom of SKILL.md that agents never reached because it wasn't in the mandatory checklist. Now step 2 evaluates whether the topic involves visual/UX decisions and offers the companion if so. Non-visual topics (APIs, data models, etc.) skip the step entirely.

* Add multi-select support to visual companion

Containers with data-multiselect allow toggling multiple selections. Without it, behavior is unchanged (single-select). Indicator bar shows count when multiple items are selected.
2026-04-23 18:09:05 +08:00 · 2026-02-19 16:31:51 -08:00
parent 3a254ba002
commit ce0f9a28be
10 changed files with 960 additions and 207 deletions
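
The commit message says click events are written to a per-screen `.events` JSONL file that is read on the next turn. As a rough illustration of what that read step could look like, here is a minimal Python sketch. It is not part of this commit: the helper names `read_events` and `final_choice` are hypothetical, and the record shape (`type`, `choice`, `text`, `timestamp`) is assumed from the example events in the `visual-companion.md` diff.

```python
import json
from pathlib import Path
from typing import Optional


def read_events(screen_dir: str) -> list[dict]:
    """Parse <screen_dir>/.events (one JSON object per line); [] if the file is absent."""
    events_path = Path(screen_dir) / ".events"
    if not events_path.exists():
        # No file: the user did not interact with the browser.
        return []
    events = []
    for line in events_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # tolerate blank lines between records
            events.append(json.loads(line))
    return events


def final_choice(events: list[dict]) -> Optional[str]:
    """Return the `choice` of the most recent click event, or None if there is none."""
    for event in reversed(events):
        if event.get("type") == "click" and "choice" in event:
            return event["choice"]
    return None
```

The last click is only the likely selection; the full list returned by `read_events` preserves the exploration path, and the user's terminal message remains the primary feedback.
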
--- a/skills/brainstorming/SKILL.md
+++ b/skills/brainstorming/SKILL.md
@@ -22,17 +22,20 @@ Every project goes through this process. A todo list, a single-function utility,
 You MUST create a task for each of these items and complete them in order:

 1. **Explore project context** — check files, docs, recent commits
-2. **Ask clarifying questions** — one at a time, understand purpose/constraints/success criteria
-3. **Propose 2-3 approaches** — with trade-offs and your recommendation
-4. **Present design** — in sections scaled to their complexity, get user approval after each section
-5. **Write design doc** — save to `docs/plans/YYYY-MM-DD-<topic>-design.md` and commit
-6. **Transition to implementation** — invoke writing-plans skill to create implementation plan
+2. **Offer visual companion** (if topic will involve visual questions) — this is its own message, not combined with a clarifying question. See the Visual Companion section below.
+3. **Ask clarifying questions** — one at a time, understand purpose/constraints/success criteria
+4. **Propose 2-3 approaches** — with trade-offs and your recommendation
+5. **Present design** — in sections scaled to their complexity, get user approval after each section
+6. **Write design doc** — save to `docs/plans/YYYY-MM-DD-<topic>-design.md` and commit
+7. **Transition to implementation** — invoke writing-plans skill to create implementation plan

 ## Process Flow

 ```dot
 digraph brainstorming {
    "Explore project context" [shape=box];
+    "Visual questions ahead?" [shape=diamond];
+    "Offer Visual Companion\n(own message, no other content)" [shape=box];
    "Ask clarifying questions" [shape=box];
    "Propose 2-3 approaches" [shape=box];
    "Present design sections" [shape=box];
@@ -40,7 +43,10 @@ digraph brainstorming {
    "Write design doc" [shape=box];
    "Invoke writing-plans skill" [shape=doublecircle];

-    "Explore project context" -> "Ask clarifying questions";
+    "Explore project context" -> "Visual questions ahead?";
+    "Visual questions ahead?" -> "Offer Visual Companion\n(own message, no other content)" [label="yes"];
+    "Visual questions ahead?" -> "Ask clarifying questions" [label="no"];
+    "Offer Visual Companion\n(own message, no other content)" -> "Ask clarifying questions";
    "Ask clarifying questions" -> "Propose 2-3 approaches";
    "Propose 2-3 approaches" -> "Present design sections";
    "Present design sections" -> "User approves design?";
@@ -55,6 +61,7 @@ digraph brainstorming {
 ## The Process

 **Understanding the idea:**
+
 - Check out the current project state first (files, docs, recent commits)
 - Before asking detailed questions, assess scope: if the request describes multiple independent subsystems (e.g., "build a platform with chat, file storage, billing, and analytics"), flag this immediately. Don't spend questions refining details of a project that needs to be decomposed first.
 - If the project is too large for a single spec, help the user decompose into sub-projects: what are the independent pieces, how do they relate, what order should they be built? Then brainstorm the first sub-project through the normal design flow. Each sub-project gets its own spec → plan → implementation cycle.
@@ -64,11 +71,13 @@ digraph brainstorming {
 - Focus on understanding: purpose, constraints, success criteria

 **Exploring approaches:**
+
 - Propose 2-3 different approaches with trade-offs
 - Present options conversationally with your recommendation and reasoning
 - Lead with your recommended option and explain why

 **Presenting the design:**
+
 - Once you believe you understand what you're building, present the design
 - Scale each section to its complexity: a few sentences if straightforward, up to 200-300 words if nuanced
 - Ask after each section whether it looks right so far
@@ -76,12 +85,14 @@ digraph brainstorming {
 - Be ready to go back and clarify if something doesn't make sense

 **Design for isolation and clarity:**
+
 - Break the system into smaller units that each have one clear purpose, communicate through well-defined interfaces, and can be understood and tested independently
 - For each unit, you should be able to answer: what does it do, how do you use it, and what does it depend on?
 - Can someone understand what a unit does without reading its internals? Can you change the internals without breaking consumers? If not, the boundaries need work.
 - Smaller, well-bounded units are also easier for you to work with - you reason better about code you can hold in context at once, and your edits are more reliable when files are focused. When a file grows large, that's often a signal that it's doing too much.

 **Working in existing codebases:**
+
 - Explore the current structure before proposing changes. Follow existing patterns.
 - Where existing code has problems that affect the work (e.g., a file that's grown too large, unclear boundaries, tangled responsibilities), include targeted improvements as part of the design - the way a good developer improves code they're working in.
 - Don't propose unrelated refactoring. Stay focused on what serves the current goal.
@@ -89,6 +100,7 @@ digraph brainstorming {
 ## After the Design

 **Documentation:**
+
 - Write the validated design (spec) to `docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md`
  - (User preferences for spec location override this default)
 - Use elements-of-style:writing-clearly-and-concisely skill if available
@@ -96,11 +108,13 @@ digraph brainstorming {

 **Spec Review Loop:**
 After writing the spec document:
+
 1. Dispatch spec-document-reviewer subagent (see spec-document-reviewer-prompt.md)
 2. If Issues Found: fix, re-dispatch, repeat until Approved
 3. If loop exceeds 5 iterations, surface to human for guidance

 **Implementation:**
+
 - Invoke the writing-plans skill to create a detailed implementation plan
 - Do NOT invoke any other skill. writing-plans is the next step.

@@ -113,12 +127,21 @@ After writing the spec document:
 - **Incremental validation** - Present design, get approval before moving on
 - **Be flexible** - Go back and clarify when something doesn't make sense

-## Visual Companion (Claude Code Only)
+## Visual Companion

-A browser-based visual companion for showing mockups, diagrams, and options during brainstorming. Use it whenever visual representation would make feedback easier than text descriptions alone.
+A browser-based companion for showing mockups, diagrams, and visual options during brainstorming. Available as a tool — not a mode. Accepting the companion means it's available for questions that benefit from visual treatment; it does NOT mean every question goes through the browser.

-**When the topic involves visual decisions, ask:**
-> "This involves some visual decisions. I can show mockups in a browser window so you can see options and give feedback visually. This feature is still new — it can be token-intensive and a bit slow, but it works well for layout, design, and architecture questions. Want to try it? (Requires opening a local URL)"
+**Offering the companion:** When you anticipate that upcoming questions will involve visual content (mockups, layouts, diagrams), offer it once for consent:
+> "Some of the upcoming design questions would benefit from visual mockups. I can show those in a browser window so you can see and compare options visually. This feature is still new — it can be token-intensive and a bit slow, but it works well for layout and design questions. Want to try it? (Requires opening a local URL)"

-If they agree, read the detailed guide before proceeding:
-`${CLAUDE_PLUGIN_ROOT}/skills/brainstorming/visual-companion.md`
+**This offer MUST be its own message.** Do not combine it with clarifying questions, context summaries, or any other content. The message should contain ONLY the offer above and nothing else. Wait for the user's response before continuing. If they decline, proceed with text-only brainstorming.
+
+**Per-question decision:** Even after the user accepts, decide FOR EACH QUESTION whether to use the browser or the terminal. The test: **would the user understand this better by seeing it than reading it?**
+
+- **Use the browser** for content that IS visual — mockups, wireframes, layout comparisons, architecture diagrams, side-by-side visual designs
+- **Use the terminal** for content that is text — requirements questions, conceptual choices, tradeoff lists, A/B/C/D text options, scope decisions
+
+A question about a UI topic is not automatically a visual question. "What does personality mean in this context?" is a conceptual question — use the terminal. "Which wizard layout works better?" is a visual question — use the browser.
+
+If they agree to the companion, read the detailed guide before proceeding:
+`skills/brainstorming/visual-companion.md`
--- a/skills/brainstorming/visual-companion.md
+++ b/skills/brainstorming/visual-companion.md
@@ -4,20 +4,31 @@ Browser-based visual brainstorming companion for showing mockups, diagrams, and

 ## When to Use

-Use the visual companion when seeing beats describing:
- **UI mockups** — layouts, navigation, component designs
- **Architecture diagrams** — system components, data flow, relationships
- **Complex choices** — multi-option decisions with visual trade-offs
- **Design polish** — when the question is about look and feel
- **Spatial relationships** — file structures, database schemas, state machines
+Decide per-question, not per-session. The test: **would the user understand this better by seeing it than reading it?**

-Don't use it for simple text questions, code review, or when the user prefers terminal-only interaction.
+**Use the browser** when the content itself is visual:
+
+- **UI mockups** — wireframes, layouts, navigation structures, component designs
+- **Architecture diagrams** — system components, data flow, relationship maps
+- **Side-by-side visual comparisons** — comparing two layouts, two color schemes, two design directions
+- **Design polish** — when the question is about look and feel, spacing, visual hierarchy
+- **Spatial relationships** — state machines, flowcharts, entity relationships rendered as diagrams
+
+**Use the terminal** when the content is text or tabular:
+
+- **Requirements and scope questions** — "what does X mean?", "which features are in scope?"
+- **Conceptual A/B/C choices** — picking between approaches described in words
+- **Tradeoff lists** — pros/cons, comparison tables
+- **Technical decisions** — API design, data modeling, architectural approach selection
+- **Clarifying questions** — anything where the answer is words, not a visual preference
+
+A question *about* a UI topic is not automatically a visual question. "What kind of wizard do you want?" is conceptual — use the terminal. "Which of these wizard layouts feels right?" is visual — use the browser.

 ## How It Works

-The server watches a directory for HTML files and serves the newest one to the browser. You write HTML content, the user sees it in their browser, clicks options or types feedback, and you receive their response as JSON.
+The server watches a directory for HTML files and serves the newest one to the browser. You write HTML content, the user sees it in their browser and can click to select options. Selections are recorded to a `.events` file that you read on your next turn.

-**Content fragments vs full documents:** If your HTML file starts with `<!DOCTYPE` or `<html`, the server serves it as-is (just injects the helper script). Otherwise, the server automatically wraps your content in the frame template — adding the header, CSS theme, feedback footer, and all interactive infrastructure. **Write content fragments by default.** Only write full documents when you need complete control over the page.
+**Content fragments vs full documents:** If your HTML file starts with `<!DOCTYPE` or `<html`, the server serves it as-is (just injects the helper script). Otherwise, the server automatically wraps your content in the frame template — adding the header, CSS theme, selection indicator, and all interactive infrastructure. **Write content fragments by default.** Only write full documents when you need complete control over the page.

 ## Starting a Session

@@ -33,38 +44,55 @@ Save `screen_dir` from the response. Tell user to open the URL.

 **Note:** Pass the project root as `--project-dir` so mockups persist in `.superpowers/brainstorm/` and survive server restarts. Without it, files go to `/tmp` and get cleaned up. Remind the user to add `.superpowers/` to `.gitignore` if it's not already there.

+**Codex behavior:** In Codex (`CODEX_CI=1`), `start-server.sh` auto-switches to foreground mode by default because background jobs may be reaped. Use `--background` only if your environment reliably preserves detached processes.
+
+**If background processes are reaped in your environment:** run in foreground from a persistent terminal session:
+
+```bash
+${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/start-server.sh --project-dir /path/to/project --foreground
+```
+
+In `--foreground` mode, the command stays attached and serves until interrupted.
+
+If the URL is unreachable from your browser (common in remote/containerized setups), bind a non-loopback host:
+
+```bash
+${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/start-server.sh \
+  --project-dir /path/to/project \
+  --host 0.0.0.0 \
+  --url-host localhost
+```
+
+Use `--url-host` to control what hostname is printed in the returned URL JSON.
+
 ## The Loop

-1. **Start watcher first** (background bash) — avoids race condition:
-   ```bash
-   ${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/wait-for-feedback.sh $SCREEN_DIR
-   ```
-
-2. **Write HTML** to a new file in `screen_dir`:
+1. **Write HTML** to a new file in `screen_dir`:
   - Use semantic filenames: `platform.html`, `visual-style.html`, `layout.html`
   - **Never reuse filenames** — each screen gets a fresh file
   - Use Write tool — **never use cat/heredoc** (dumps noise into terminal)
   - Server automatically serves the newest file

-3. **Tell user what to expect:**
+2. **Tell user what to expect and end your turn:**
   - Remind them of the URL (every step, not just first)
   - Give a brief text summary of what's on screen (e.g., "Showing 3 layout options for the homepage")
+   - Ask them to respond in the terminal: "Take a look and let me know what you think. Click to select an option if you'd like."

-4. **Wait for feedback** — call `TaskOutput(task_id, block=true, timeout=600000)`
-   - If timeout, call TaskOutput again (watcher still running)
-   - After 3 timeouts (30 min), say "Let me know when you want to continue"
+3. **On your next turn** — after the user responds in the terminal:
+   - Read `$SCREEN_DIR/.events` if it exists — this contains the user's browser interactions (clicks, selections) as JSON lines
+   - Merge with the user's terminal text to get the full picture
+   - The terminal message is the primary feedback; `.events` provides structured interaction data

-5. **Process feedback** — returns JSON like `{"choice": "a", "feedback": "make header smaller"}`
+4. **Iterate or advance** — if feedback changes current screen, write a new file (e.g., `layout-v2.html`). Only move to the next question when the current step is validated.

-6. **Iterate or advance** — if feedback changes current screen, write a new file (e.g., `layout-v2.html`). Only move to the next question when the current step is validated.
-
-7. Repeat until done.
+5. Repeat until done.

 ## Writing Content Fragments

-Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, feedback footer, interactive JS).
+Write just the content that goes inside the page. The server wraps it in the frame template automatically (header, theme CSS, selection indicator, and all interactive infrastructure).

 **Minimal example:**
+
 ```html
 <h2>Which layout works better?</h2>
 <p class="subtitle">Consider readability and visual hierarchy</p>
@@ -94,6 +122,7 @@ That's it. No `<html>`, no CSS, no `<script>` tags needed. The server provides a
 The frame template provides these CSS classes for your content:

 ### Options (A/B/C choices)
+
 ```html
 <div class="options">
  <div class="option" data-choice="a" onclick="toggleSelect(this)">
@@ -106,7 +135,16 @@ The frame template provides these CSS classes for your content:
 </div>
 ```

+**Multi-select:** Add `data-multiselect` to the container to let users select multiple options. Each click toggles the item. The indicator bar shows the count.
+
+```html
+<div class="options" data-multiselect>
+  <!-- same option markup — users can select/deselect multiple -->
+</div>
+```
+
 ### Cards (visual designs)
+
 ```html
 <div class="cards">
  <div class="card" data-choice="design1" onclick="toggleSelect(this)">
@@ -120,6 +158,7 @@ The frame template provides these CSS classes for your content:
 ```

 ### Mockup container
+
 ```html
 <div class="mockup">
  <div class="mockup-header">Preview: Dashboard Layout</div>
@@ -128,6 +167,7 @@ The frame template provides these CSS classes for your content:
 ```

 ### Split view (side-by-side)
+
 ```html
 <div class="split">
  <div class="mockup"><!-- left --></div>
@@ -136,6 +176,7 @@ The frame template provides these CSS classes for your content:
 ```

 ### Pros/Cons
+
 ```html
 <div class="pros-cons">
  <div class="pros"><h4>Pros</h4><ul><li>Benefit</li></ul></div>
@@ -144,6 +185,7 @@ The frame template provides these CSS classes for your content:
 ```

 ### Mock elements (wireframe building blocks)
+
 ```html
 <div class="mock-nav">Logo | Home | About | Contact</div>
 <div style="display: flex;">
@@ -156,22 +198,26 @@ The frame template provides these CSS classes for your content:
 ```

 ### Typography and sections
+
 - `h2` — page title
 - `h3` — section heading
 - `.subtitle` — secondary text below title
 - `.section` — content block with bottom margin
 - `.label` — small uppercase label text

-## User Feedback Format
+## Browser Events Format

-```json
-{
-  "choice": "option-id",
-  "feedback": "user notes"
-}
+When the user clicks options in the browser, their interactions are recorded to `$SCREEN_DIR/.events` (one JSON object per line). The file is cleared automatically when you push a new screen.
+
+```jsonl
+{"type":"click","choice":"a","text":"Option A - Simple Layout","timestamp":1706000101}
+{"type":"click","choice":"c","text":"Option C - Complex Grid","timestamp":1706000108}
+{"type":"click","choice":"b","text":"Option B - Hybrid","timestamp":1706000115}
 ```

-Both fields are optional — user may select without notes, or send notes without a selection.
+The full event stream shows the user's exploration path — they may click multiple options before settling. The last `choice` event is typically the final selection, but the pattern of clicks can reveal hesitation or preferences worth asking about.
+
+If `.events` doesn't exist, the user didn't interact with the browser — use only their terminal text.

 ## Design Tips

@@ -200,4 +246,4 @@ If the session used `--project-dir`, mockup files persist in `.superpowers/brain
 ## Reference

 - Frame template (CSS reference): `${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/frame-template.html`
- Helper script (JS API): `${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/helper.js`
+- Helper script (client-side): `${CLAUDE_PLUGIN_ROOT}/lib/brainstorm-server/helper.js`
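
The "content fragments vs full documents" rule in the `visual-companion.md` diff above amounts to a prefix check followed by a template substitution. The Python sketch below is illustrative only. The server code is not shown in this diff, so the function names, the `{{CONTENT}}` marker, the tiny template, the leading-whitespace handling, and the case-insensitive match are all assumptions. It also omits the helper-script injection that the guide says happens for full documents.

```python
# Hypothetical stand-ins: the real frame template and placeholder are not shown in the diff.
PLACEHOLDER = "{{CONTENT}}"
FRAME_TEMPLATE = "<html><body><main>" + PLACEHOLDER + "</main></body></html>"


def is_full_document(html: str) -> bool:
    """True if the file starts with <!DOCTYPE or <html (assumed case-insensitive)."""
    head = html.lstrip().lower()
    return head.startswith("<!doctype") or head.startswith("<html")


def render(html: str) -> str:
    """Return full documents unchanged; wrap fragments in the frame template."""
    if is_full_document(html):
        return html
    # Substitute a single placeholder instead of regex-rewriting the template.
    return FRAME_TEMPLATE.replace(PLACEHOLDER, html, 1)
```

Under these assumptions, `render("<h2>Which layout works better?</h2>")` returns the fragment wrapped in the template, while a file starting with `<!DOCTYPE html>` comes back unchanged.
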