mirror of
https://github.com/obra/superpowers.git
synced 2026-06-16 15:49:05 +08:00
Spec visual brainstorming launch telemetry
# Visual Brainstorming Launch Telemetry

**Date:** 2026-06-15
**Status:** Draft for Drew review
**Linear:** PRI-2231
**Scope:** `skills/brainstorming/scripts/`, Superpowers docs, Terminus-owned Cloudflare Worker, Terminus telemetry collector

## Problem

Jesse wants to understand whether the Superpowers visual brainstorming companion is being launched in real use. The visible product affordance can say `Superpowers v<version> by Prime Radiant`, but the measurement should not be a logo view or static asset hit.

The useful event is:

```text
event=visual_brainstorming_launched
```

This event means a Superpowers Visual Companion session reached a browser. GitHub Pages can host static brand assets, but it cannot reliably capture this semantic launch event. The Brainstorm/Brooks product should not own the collector either: Brainstorm is a separate prototype, while Superpowers needs its own lightweight telemetry path.

## Goals

- Add default-on, env-var-only opt-out launch telemetry for the visual companion.
- Record one semantic launch event per companion launch under normal operation.
- Include Superpowers version, country, capped user agent, timestamp, launch ID, and Cloudflare ray ID.
- Keep the visual companion usable when telemetry infrastructure is down.
- Send v0 data to Loki through Terminus-owned infrastructure.
- Make abuse obvious and bounded without pretending a public OSS client can prove authentic human usage.

## Non-Goals

- Do not measure logo impressions, frame reloads, or every page load as product events.
- Do not load a remote logo in v0. Text branding is enough for the launch telemetry work.
- Do not add DynamoDB or pre-write deduplication in v0.
- Do not route Superpowers usage telemetry through the Brainstorm application.
- Do not add OpenPanel ingestion in v0. Dashboards can be added after Loki events are clean.
- Do not collect project paths, prompt text, user IDs, or browser interaction contents.

## Definitions

**Visual Companion:** The local browser display attached to the `brainstorming` skill. It is used for visual questions such as mockups, diagrams, layout comparisons, and spatial choices. The terminal remains the primary conversation channel.

**Launch:** A local visual companion server process starts and at least one browser page loads the companion helper. This is the moment worth counting.

**Launch ID:** An ephemeral random ID generated by the local visual companion server for one server process. It is not stable across sessions, machines, or installs.

## Proposed Architecture

```text
Superpowers browser helper
  -> Cloudflare Worker at t.primeradiant.com
  -> API Gateway HTTP API
  -> VPC Lambda collector
  -> Loki on monitoring.terminus.internal
```

The browser does not post directly to Terminus monitoring because Loki is intentionally private to the Terminus network. Cloudflare receives the public request, enriches it with edge metadata, signs a compact payload, and forwards it to the Terminus collector. The collector validates that the event came from the Worker before writing to Loki.

## Superpowers Changes

`server.cjs` should read the Superpowers version from package metadata at startup and generate a launch ID for the server process. It should expose a small telemetry config to `frame-template.html` and injected full-document helper pages unless `SUPERPOWERS_DISABLE_TELEMETRY=1`.

The config should include:

```json
{
  "enabled": true,
  "endpoint": "https://t.primeradiant.com/superpowers/visual-brainstorming/launch",
  "event": "visual_brainstorming_launched",
  "surface": "brainstorming.visual_companion",
  "superpowersVersion": "5.1.0",
  "launchId": "ephemeral-random-id"
}
```

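A minimal sketch of how `server.cjs` might assemble that config. The function name `buildTelemetryConfig`, the use of `randomUUID` for the launch ID, and returning `null` on opt-out are all assumptions for illustration; the spec only fixes the field names and the environment variable.

```javascript
// Sketch only: buildTelemetryConfig is a hypothetical helper, not existing code.
const { randomUUID } = require("node:crypto");

const TELEMETRY_ENDPOINT =
  "https://t.primeradiant.com/superpowers/visual-brainstorming/launch";

// Returns the config handed to the browser helper, or null when the user
// has opted out with SUPERPOWERS_DISABLE_TELEMETRY=1.
function buildTelemetryConfig(env, superpowersVersion, launchId = randomUUID()) {
  if (env.SUPERPOWERS_DISABLE_TELEMETRY === "1") {
    return null; // assumption: "no config" is how opt-out is represented
  }
  return {
    enabled: true,
    endpoint: TELEMETRY_ENDPOINT,
    event: "visual_brainstorming_launched",
    surface: "brainstorming.visual_companion",
    superpowersVersion,
    launchId, // one ID per server process, never persisted
  };
}
```

Under these assumptions the server would call `buildTelemetryConfig(process.env, version)` once at startup and reuse the result for every page it serves, so all pages in one process share a launch ID.
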
`helper.js` should send one best-effort GET request after the browser helper initializes:

```text
https://t.primeradiant.com/superpowers/visual-brainstorming/launch?event=visual_brainstorming_launched&surface=brainstorming.visual_companion&v=<version>&launch_id=<launchId>
```

The helper should use a source-side once-per-launch guard. The primary guard can be a `localStorage` key derived from the launch ID, with an in-memory fallback if storage is unavailable. This keeps the helper from firing on every frame reload. Rare duplicates from retries, private browsing behavior, or multiple browsers are acceptable and should be handled in Loki queries if they matter.

Telemetry failure must be silent. The helper should use a short best-effort request path such as `fetch(..., { method: "GET", mode: "no-cors", credentials: "omit", cache: "no-store", keepalive: true })`, ignore the response, and never affect rendering or WebSocket behavior.

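The guard and the silent send could be sketched roughly as follows. The function names, the storage key prefix, and the injectable `storage` and `fetchImpl` parameters are hypothetical; they exist here so the logic can be exercised outside a browser.

```javascript
// Sketch only: names and the storage key prefix are assumptions.
const sentInMemory = new Set();

function buildLaunchUrl(config) {
  const url = new URL(config.endpoint);
  url.searchParams.set("event", config.event);
  url.searchParams.set("surface", config.surface);
  url.searchParams.set("v", config.superpowersVersion);
  url.searchParams.set("launch_id", config.launchId);
  return url.toString();
}

// Returns true the first time a launch ID is claimed, false afterwards.
function claimLaunch(launchId, storage) {
  const key = `superpowers.telemetry.sent.${launchId}`;
  if (sentInMemory.has(key)) return false;
  sentInMemory.add(key);
  try {
    if (storage.getItem(key) !== null) return false; // sent before a reload
    storage.setItem(key, "1");
  } catch {
    // Storage missing or blocked: the in-memory set is the only guard.
  }
  return true;
}

// Fires at most one request per launch ID and swallows every failure.
function sendLaunchTelemetry(
  config,
  storage = globalThis.localStorage,
  fetchImpl = globalThis.fetch
) {
  if (!config || !config.enabled) return false;
  if (!claimLaunch(config.launchId, storage)) return false;
  try {
    Promise.resolve(
      fetchImpl(buildLaunchUrl(config), {
        method: "GET",
        mode: "no-cors",
        credentials: "omit",
        cache: "no-store",
        keepalive: true,
      })
    ).catch(() => {}); // network errors never reach the page
  } catch {
    // Synchronous failures (for example, fetch unavailable) are ignored too.
  }
  return true;
}
```

The in-memory set covers repeated calls within one page, while the storage key covers frame reloads, which start with fresh memory. Neither covers a second browser, which matches the spec's position that rare duplicates are acceptable.
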
The visible frame should include local branding:

```text
Superpowers v<version> by Prime Radiant
```

Local branding should still render when telemetry is disabled. The opt-out only suppresses the remote request.

## Cloudflare Worker

The Worker owns the public telemetry endpoint, for example:

```text
https://t.primeradiant.com/superpowers/visual-brainstorming/launch
```

The Worker should:

- Accept only the exact launch path.
- Record events only for GET requests. HEAD may return a write-free health response.
- Reject requests with a body.
- Cap and validate all query parameters.
- Require `event=visual_brainstorming_launched`.
- Require `surface=brainstorming.visual_companion`.
- Validate `v` as a bounded version-like string.
- Validate `launch_id` as a bounded opaque ID.
- Read country from Cloudflare edge metadata.
- Read user agent from the request header and cap its stored length.
- Include Cloudflare ray ID when available.
- Use a short timeout when forwarding to the collector.
- Return a small `204 No Content` response with `Cache-Control: no-store`.

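The validation rules in the list above could be sketched as a pure function, kept separate from the Worker's fetch handler so it is easy to test. The regular expressions, the 256-character user agent cap, the rejection status codes, and the return shape are assumptions; the spec only says the values must be bounded.

```javascript
// Sketch only: limits, patterns, and status codes are illustrative assumptions.
const LAUNCH_PATH = "/superpowers/visual-brainstorming/launch";
const VERSION_RE = /^\d{1,4}\.\d{1,4}\.\d{1,4}(?:[-+][0-9A-Za-z.-]{1,32})?$/;
const LAUNCH_ID_RE = /^[A-Za-z0-9_-]{8,64}$/;
const MAX_USER_AGENT = 256;

const reject = (status) => ({ action: "reject", status });

// Accept exactly one occurrence of a parameter; anything else is invalid.
function single(params, name) {
  const values = params.getAll(name);
  return values.length === 1 ? values[0] : null;
}

function validateLaunchRequest(method, rawUrl, userAgent = "") {
  const url = new URL(rawUrl);
  if (url.pathname !== LAUNCH_PATH) return reject(404);
  if (method === "HEAD") return { action: "health" }; // never forwarded
  if (method !== "GET") return reject(405);

  const q = url.searchParams;
  if (single(q, "event") !== "visual_brainstorming_launched") return reject(400);
  if (single(q, "surface") !== "brainstorming.visual_companion") return reject(400);
  const version = single(q, "v");
  const launchId = single(q, "launch_id");
  if (version === null || !VERSION_RE.test(version)) return reject(400);
  if (launchId === null || !LAUNCH_ID_RE.test(launchId)) return reject(400);

  return {
    action: "record",
    event: {
      event: "visual_brainstorming_launched",
      surface: "brainstorming.visual_companion",
      superpowersVersion: version,
      launchId,
      userAgent: userAgent.slice(0, MAX_USER_AGENT), // capped before storage
    },
  };
}
```

Country, ray ID, and timestamp are left out of this sketch because they come from edge metadata rather than from the request's query string. Body rejection and the forwarding timeout would also live in the handler that calls this function.
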
The Worker signs the collector payload with HMAC. A concrete v1 signature format:

```text
X-Superpowers-Timestamp: <unix-seconds>
X-Superpowers-Signature: v1=<hex hmac sha256 over "<timestamp>.<raw-json-body>">
```

The HMAC secret should be configured as a Cloudflare Worker secret. The same secret should be stored in AWS Secrets Manager for the Terminus collector, with Terraform wiring the Lambda to read and cache the secret on cold start. Rotation can be manual for v0.

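As an illustration of the v1 format, signing could look like the sketch below. It uses Node's `node:crypto` for brevity; a deployed Worker might use the Web Crypto API instead. The function name `signCollectorPayload` and the returned shape are hypothetical.

```javascript
// Sketch only: signCollectorPayload is a hypothetical helper.
const { createHmac } = require("node:crypto");

// Serializes the payload once and signs "<timestamp>.<raw-json-body>" so the
// collector can verify against the exact bytes it receives.
function signCollectorPayload(secret, payload, nowMs = Date.now()) {
  const timestamp = String(Math.floor(nowMs / 1000)); // unix seconds
  const body = JSON.stringify(payload);
  const digest = createHmac("sha256", secret)
    .update(`${timestamp}.${body}`)
    .digest("hex");
  return {
    body,
    headers: {
      "Content-Type": "application/json",
      "X-Superpowers-Timestamp": timestamp,
      "X-Superpowers-Signature": `v1=${digest}`,
    },
  };
}
```

Signing the serialized string rather than the object matters: the collector has to recompute the HMAC over the raw body, so the Worker should send `body` exactly as returned and not re-serialize it.
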
## Terminus Collector

The collector and Worker should live outside the Brainstorm app. The implementation should use this Terminus-owned placement:

```text
terminus/services/superpowers-telemetry-worker
terminus/services/superpowers-telemetry-collector
terminus/terraform/superpowers-telemetry
```

The public AWS surface should be an API Gateway HTTP API that forwards to a Lambda running in the Terminus VPC. The Lambda needs network access to:

```text
http://monitoring.terminus.internal:3100/loki/api/v1/push
```

The Lambda should reject before writing to Loki when:

- The signature is missing or invalid.
- The timestamp is outside a short freshness window, such as five minutes.
- The body is too large.
- The event name or surface is not the expected v0 value.
- Required fields are missing or malformed.

The collector should not attempt to identify users. It should write one compact JSON log line per valid launch event.

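The rejection rules above might be sketched as a single verification function that runs before any Loki write. The function name, the 2048-byte body cap, the reason strings, and the assumption that header keys arrive lower-cased are illustrative choices, not part of the spec.

```javascript
// Sketch only: names, limits, and reason strings are assumptions.
const { createHmac, timingSafeEqual } = require("node:crypto");

const FRESHNESS_SECONDS = 300; // "such as five minutes"
const MAX_BODY_BYTES = 2048; // assumed cap; the spec only says "too large"

const fail = (reason) => ({ ok: false, reason });

function verifyWorkerRequest(secret, headers, rawBody, nowMs = Date.now()) {
  if (Buffer.byteLength(rawBody, "utf8") > MAX_BODY_BYTES) return fail("body_too_large");

  const timestamp = headers["x-superpowers-timestamp"] || "";
  const signature = headers["x-superpowers-signature"] || "";
  if (!/^\d{1,12}$/.test(timestamp)) return fail("bad_signature");
  if (!/^v1=[0-9a-f]{64}$/.test(signature)) return fail("bad_signature");

  const age = Math.abs(Math.floor(nowMs / 1000) - Number(timestamp));
  if (age > FRESHNESS_SECONDS) return fail("stale_timestamp");

  // Recompute the HMAC over the raw body and compare in constant time.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const provided = Buffer.from(signature.slice(3), "hex");
  if (provided.length !== expected.length || !timingSafeEqual(provided, expected)) {
    return fail("bad_signature");
  }

  // Only parse the JSON after the signature and timestamp have passed.
  let event;
  try {
    event = JSON.parse(rawBody);
  } catch {
    return fail("malformed");
  }
  if (event === null || typeof event !== "object") return fail("malformed");
  if (event.event !== "visual_brainstorming_launched") return fail("unexpected_event");
  if (event.surface !== "brainstorming.visual_companion") return fail("unexpected_event");
  if (typeof event.launchId !== "string" || typeof event.superpowersVersion !== "string") {
    return fail("malformed");
  }
  return { ok: true, event };
}
```

The ordering is deliberate: cheap size and header checks come first, then the HMAC, and JSON parsing happens only for requests that carry a valid signature.
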
## Loki Shape

Use low-cardinality labels:

```json
{
  "app": "superpowers-telemetry",
  "event": "visual_brainstorming_launched",
  "env": "prod"
}
```

Put higher-cardinality values in the JSON body:

```json
{
  "event": "visual_brainstorming_launched",
  "surface": "brainstorming.visual_companion",
  "superpowersVersion": "5.1.0",
  "launchId": "ephemeral-random-id",
  "country": "US",
  "userAgent": "Mozilla/5.0 ...",
  "cfRay": "abc123",
  "timestamp": "2026-06-15T22:00:00.000Z"
}
```

Initial dashboard queries should count events directly. If duplicate analysis is needed, query by `launchId` in the JSON body rather than changing the write path.

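To show how the labels and the JSON body fit together, here is a sketch of the request body the collector might send to Loki's push endpoint. The function name `buildLokiPush` is hypothetical; the `streams` / `stream` / `values` layout with nanosecond string timestamps follows Loki's JSON push format.

```javascript
// Sketch only: buildLokiPush is a hypothetical helper.
function buildLokiPush(event, nowMs = Date.now()) {
  // Loki expects the entry timestamp as a string of unix nanoseconds.
  const timestampNs = (BigInt(nowMs) * 1000000n).toString();
  return {
    streams: [
      {
        // Low-cardinality labels only; everything else stays in the log line.
        stream: {
          app: "superpowers-telemetry",
          event: event.event,
          env: "prod",
        },
        values: [[timestampNs, JSON.stringify(event)]],
      },
    ],
  };
}
```

With this shape, a direct count could be a LogQL query along the lines of `count_over_time({app="superpowers-telemetry", event="visual_brainstorming_launched"}[24h])`, and duplicate analysis would parse the line with `| json` and group by `launchId`. Both queries are untested illustrations.
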
## Abuse and Cost Controls

Cloudflare should rate-limit the public telemetry path by client IP and path. The Worker should perform cheap validation before any collector call and emit at most one collector request per accepted event request.

API Gateway should use stage throttling. Lambda should use reserved concurrency to cap worst-case spend. The Lambda should validate HMAC and timestamp before parsing more deeply or writing to Loki. Loki payloads should remain tiny and labels should remain low-cardinality.

This design does not prove that every accepted event came from an authentic Superpowers user. The client is public and open source, so anyone can copy the endpoint format. The security goal is narrower: prevent casual spoofing from bypassing the Worker, bound unauthenticated public traffic, and preserve useful honest-usage telemetry.

## Privacy and Documentation

Superpowers docs should disclose:

- Visual companion launch telemetry is default-on.
- Opt out with `SUPERPOWERS_DISABLE_TELEMETRY=1`.
- Collected fields are event name, surface, Superpowers version, country, capped user agent, timestamp, Cloudflare ray ID, and ephemeral launch ID.
- No prompts, project paths, file contents, user IDs, or browser interaction contents are sent.
- Telemetry failure does not affect the visual companion.

## Testing

Superpowers tests should cover:

- Version and launch ID are injected into the helper config.
- `SUPERPOWERS_DISABLE_TELEMETRY=1` suppresses telemetry config.
- Full-document pages and framed fragment pages both receive the same helper behavior.
- Local branding renders even when telemetry is disabled.
- The helper's once-per-launch guard avoids repeated sends for the same launch ID.

Collector tests should cover:

- Valid Worker-signed payload writes the expected Loki entry.
- Invalid signature, stale timestamp, wrong event, oversized body, and malformed fields are rejected.
- Loki write failures return an error to the Worker but do not expose secrets.

Worker tests should cover:

- Exact route validation.
- Query parameter validation and caps.
- HMAC signing canonicalization.
- Country, user agent, and ray ID enrichment.
- No collector write on HEAD health requests.

## Rollout

1. Land the Superpowers UI and docs changes with the default-on telemetry config.
2. Deploy the Cloudflare Worker and Terminus collector.
3. Smoke test from a local visual companion launch with telemetry enabled.
4. Smoke test `SUPERPOWERS_DISABLE_TELEMETRY=1`.
5. Confirm Loki events are queryable by `event`, version, country, and launch ID.
6. Add Grafana/OpenPanel views only after the Loki stream is stable.

## Decisions and Future Work

- Use `t.primeradiant.com` for the telemetry endpoint unless DNS availability blocks deployment.
- Keep Worker ownership with the Terminus telemetry collector. The `prime-radiant-inc.github.io` repo can continue hosting public brand assets, but it should not own telemetry code.
- A remote Prime Radiant logo can be added later as a cosmetic UI change. If it is added, it should load from the static brand asset URL and must not define or fire telemetry.