Files
rrx-downloader/AGENTS.md

36 lines
4.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# AGENTS
## Scope
This repository has no existing conventions. These guidelines apply to all work in this repo.
## Network capture and assets
- Resource scope is limited to: audio, video, images, and viewport screenshots; do not collect other asset types unless explicitly requested.
- When a task asks to harvest page resources, prioritize what the user requests (e.g., only media or only core assets). Ask for scope if unclear.
- If the user provides a URL like `https://h5.rrx.cn/storeview/<page-id>.html`, extract `<page-id>`. Open a blank tab first, apply viewport override (width 375, height 667, devicePixelRatio 3, mobile: true, hasTouch: true), then navigate that tab to `https://ca.rrx.cn/v/<page-id>?rrxsrc=2&iframe=1&tpl=1`. Equivalent automation: call DevTools/Emulation to override device metrics with `{width:375,height:667,deviceScaleFactor:3, mobile:true, hasTouch:true}` before navigation to avoid double-loading assets.
- Use DevTools network captures to list requests; identify media by MIME or URL suffix.
- Save assets under `downloads/<date>-<title>-<page-id>/media/` (title from current page; date format `YYYYMMDD`) with clean filenames (strip query strings and `@!` size suffixes; keep proper extensions). After download, rename any files still containing size tokens or missing extensions to the original base name + proper extension.
- Also save the source page URL(s) provided by the user into the folder root as `downloads/<date>-<title>-<page-id>/urls.txt`.
- Prefer direct downloads (e.g., curl) if DevTools bodies are unavailable or truncated.
- After batch downloading, delete any 0-byte files, verify against the planned download list, and retry missing items up to 2 times; if still failing, stop and report the missing resources.
- After collecting all requested resources and screenshots, close any additional tabs/pages opened for capture. This is mandatory; do not leave capture tabs open.
## Download script usage
- Primary workflow: run `node run.mjs <page-url>` to capture network requests, screenshot, and download media in one step. This script uses Playwright Chromium to open a browser with mobile viewport (375×667 @ dpr 3), navigate to the page, capture audio/video/image URLs, take a viewport screenshot, then download assets directly (no separate `download.mjs`). For remote debugging, pass `--cdp ws://host:port/devtools/browser/<id>` or `--cdp http://host:port` (or set `ENV_CDP`) to resolve and connect to a Chrome DevTools endpoint.
- For manual control: use `python download.py --page-id <id> --title "<title>" --urls urls.txt --sources source_urls.txt` to batch download assets. The script generates `<date>` using format `YYYYMMDD`.
- `urls.txt` should list the target asset URLs (one per line) already filtered to the requested scope (e.g., media only).
- Downloads go to `downloads/<date>-<title>-<page-id>/media/`; filenames are cleaned (query/`@!` removed) and extensions retained/guessed; duplicates get numeric suffixes.
- After the batch finishes, the script deletes 0-byte files, compares against the planned list, retries missing items up to 2 times, and reports any still-missing resources.
- `urls.txt` is written to `downloads/<date>-<title>-<page-id>/urls.txt` to record user-provided page URLs.
## Screenshots
- Default viewport for screenshots: width 375, height 667, devicePixelRatio 3 (mobile portrait). Do not change unless the user explicitly requests another size.
- Match the screenshot to the users requested viewport. If they mention a size, emulate it and verify with `window.innerWidth/innerHeight` and `devicePixelRatio`.
- Capture screenshots with Chrome DevTools (device emulation per above) and save to `downloads/<date>-<title>-<page-id>/index.png` (title from current page; date format `YYYYMMDD`); use full-page only when explicitly asked.
## Communication and confirmation
- Do not ask for pre-work confirmation; proceed with default scope (media + viewport screenshot) unless the user explicitly specifies otherwise.
- After completion, briefly confirm collected assets (paths + key filenames); do not prompt for extra formats unless the user asks.
## Safety and precision
- Avoid downloading unrequested resources. If download failures occur, retry and report any missing items clearly.