AGENTS

Scope

This repository has no existing conventions. These guidelines apply to all work in this repo.

Network capture and assets

Resource scope is limited to: audio, video, images, and viewport screenshots; do not collect other asset types unless explicitly requested.
When a task asks to harvest page resources, prioritize what the user requests (e.g., only media or only core assets). Ask for scope if unclear.
If the user provides a URL like https://h5.rrx.cn/storeview/<page-id>.html, extract <page-id>. Open a blank tab first, apply the viewport override (width 390, height 844, devicePixelRatio 3, mobile: true, hasTouch: true), and only then navigate that tab to https://ca.rrx.cn/v/<page-id>?rrxsrc=2&iframe=1&tpl=1. Applying the override before navigation avoids loading assets twice. Equivalent automation: call the DevTools Emulation domain to override device metrics with {width: 390, height: 844, deviceScaleFactor: 3, mobile: true, hasTouch: true} before navigating.
Use DevTools network captures to list requests; identify media by MIME or URL suffix.
Save assets under downloads/<date>-<title>-<page-id>/media/ (title from current page; date format YYYYMMDD) with clean filenames (strip query strings and @! size suffixes; keep proper extensions). After download, rename any files still containing size tokens or missing extensions to the original base name + proper extension.
Also save the source page URL(s) provided by the user into the folder root as downloads/<date>-<title>-<page-id>/urls.txt.
Prefer direct downloads (e.g., curl) if DevTools bodies are unavailable or truncated.
After batch downloading, delete any 0-byte files, verify against the planned download list, and retry missing items up to 2 times; if still failing, stop and report the missing resources.
After collecting all requested resources and screenshots, close any additional tabs/pages opened for capture. This is mandatory; do not leave capture tabs open.
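
The URL rewrite described in this section can be sketched in Python as follows. This is a minimal illustration, not part of download.py: the helper names extract_page_id and capture_url and the VIEWPORT_OVERRIDE constant are hypothetical, and only the two URL shapes and the viewport values come from the guidelines above.

```python
import re
from urllib.parse import urlencode, urlsplit

# Hypothetical pattern for the storeview path; only the URL shape is from the guidelines.
STOREVIEW_PATH = re.compile(r"^/storeview/([^/]+)\.html$")

# Viewport values from the guidelines, collected in one place for the emulation call.
VIEWPORT_OVERRIDE = {
    "width": 390,
    "height": 844,
    "deviceScaleFactor": 3,
    "mobile": True,
    "hasTouch": True,
}


def extract_page_id(url: str) -> str:
    """Return <page-id> from https://h5.rrx.cn/storeview/<page-id>.html."""
    parts = urlsplit(url)
    match = STOREVIEW_PATH.match(parts.path)
    if parts.netloc != "h5.rrx.cn" or match is None:
        raise ValueError(f"not a storeview URL: {url}")
    return match.group(1)


def capture_url(page_id: str) -> str:
    """Build the URL to navigate to once the viewport override is in place."""
    query = urlencode({"rrxsrc": 2, "iframe": 1, "tpl": 1})
    return f"https://ca.rrx.cn/v/{page_id}?{query}"
```

Rejecting anything that does not match the storeview shape, rather than guessing, keeps a mistyped URL from silently producing a wrong capture target.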

Download script usage

Use python download.py --page-id <id> --title "<title>" --urls urls.txt --sources source_urls.txt to batch download assets. The script generates <date> using format YYYYMMDD.
urls.txt should list the target asset URLs (one per line) already filtered to the requested scope (e.g., media only).
Downloads go to downloads/<date>-<title>-<page-id>/media/; filenames are cleaned (query/@! removed) and extensions retained/guessed; duplicates get numeric suffixes.
After the batch finishes, the script deletes 0-byte files, compares against the planned list, retries missing items up to 2 times, and reports any still-missing resources.
The user-provided page URLs (the --sources file) are recorded in downloads/<date>-<title>-<page-id>/urls.txt. This output file is separate from the input urls.txt, which lists asset URLs.
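
The naming rules above could look roughly like the sketch below. It is not the actual download.py, whose internals are not shown here. The function names, the "-1", "-2" style of the numeric suffix, and the "unnamed" fallback are assumptions made for illustration.

```python
import mimetypes
import posixpath
from datetime import date
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlsplit


def clean_filename(url: str, content_type: Optional[str] = None) -> str:
    """Strip the query string and any '@!' size suffix; keep or guess the extension."""
    name = posixpath.basename(unquote(urlsplit(url).path))  # urlsplit drops the query
    name = name.split("@!", 1)[0]  # drop everything from the '@!' size token onward
    stem, ext = posixpath.splitext(name)
    if not ext and content_type:
        # Guess from the MIME type when the URL carries no extension.
        ext = mimetypes.guess_extension(content_type.split(";")[0].strip()) or ""
    return (stem or "unnamed") + ext  # "unnamed" fallback is an assumption


def unique_name(name: str, taken: set) -> str:
    """Append a numeric suffix when the name is already used (suffix style assumed)."""
    stem, ext = posixpath.splitext(name)
    candidate, n = name, 1
    while candidate in taken:
        candidate = f"{stem}-{n}{ext}"
        n += 1
    taken.add(candidate)
    return candidate


def media_dir(title: str, page_id: str, today: Optional[date] = None) -> Path:
    """downloads/<date>-<title>-<page-id>/media/ with <date> as YYYYMMDD."""
    stamp = (today or date.today()).strftime("%Y%m%d")
    return Path("downloads") / f"{stamp}-{title}-{page_id}" / "media"
```

Cleaning the name before checking for duplicates matters: two URLs that differ only in their size suffix collapse to the same base name and need distinct files.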

Screenshots

Default viewport for screenshots: width 390, height 844, devicePixelRatio 3 (mobile portrait). Do not change unless the user explicitly requests another size.
Match the screenshot to the user’s requested viewport. If they mention a size, emulate it and verify with window.innerWidth/innerHeight and devicePixelRatio.
Capture screenshots with Chrome DevTools (device emulation per above) and save to downloads/<date>-<title>-<page-id>/index.png (title from current page; date format YYYYMMDD); use full-page only when explicitly asked.
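
One way to double-check a saved index.png against the default viewport is to read the pixel size from the PNG header. The sketch below assumes that a viewport (not full-page) screenshot is rendered at CSS size times devicePixelRatio, so the default would give 1170 x 2532 pixels; that relationship and the helper names png_size and expected_pixels are assumptions, not something these guidelines state.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Read (width, height) in pixels from the IHDR chunk at the start of a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: 4-byte big-endian width, then 4-byte height.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def expected_pixels(width: int = 390, height: int = 844, dpr: int = 3) -> tuple:
    """Pixel size of a viewport screenshot, assuming CSS size multiplied by DPR."""
    return width * dpr, height * dpr
```

A possible use is to read the first 24 bytes of index.png and compare png_size(...) with expected_pixels() before reporting the screenshot as done.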

Communication and confirmation

Do not ask for pre-work confirmation; proceed with default scope (media + viewport screenshot) unless the user explicitly specifies otherwise.
After completion, briefly confirm collected assets (paths + key filenames); do not prompt for extra formats unless the user asks.

Safety and precision

Avoid downloading unrequested resources. If download failures occur, retry and report any missing items clearly.
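
The retry-and-report rule can be sketched as below. This is an illustration under stated assumptions rather than the logic of download.py: fetch is a hypothetical callable supplied by the caller (for example a wrapper around curl), and planned is a hypothetical mapping from asset URL to target path.

```python
from pathlib import Path
from typing import Callable, Dict, List

MAX_RETRIES = 2  # "retry missing items up to 2 times"


def attempt(fetch: Callable[[str, Path], None], url: str, path: Path) -> None:
    """Run one download; a failed fetch simply leaves the file missing."""
    try:
        fetch(url, path)
    except OSError:
        pass


def missing_items(planned: Dict[str, Path]) -> List[str]:
    """Delete 0-byte files, then list the URLs whose file is absent."""
    missing = []
    for url, path in planned.items():
        if path.exists() and path.stat().st_size == 0:
            path.unlink()
        if not path.exists():
            missing.append(url)
    return missing


def download_all(planned: Dict[str, Path],
                 fetch: Callable[[str, Path], None]) -> List[str]:
    """Download everything once, retry missing items, return what is still missing."""
    for url, path in planned.items():
        attempt(fetch, url, path)
    missing = missing_items(planned)
    for _ in range(MAX_RETRIES):
        if not missing:
            break
        for url in missing:
            attempt(fetch, url, planned[url])
        missing = missing_items(planned)
    return missing  # report these to the user and stop
```

Because only URLs from the planned list are ever fetched, the loop cannot pull in unrequested resources, and the returned list is exactly what needs to be reported.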
