AGENTS

Scope

This repository has no existing conventions. These guidelines apply to all work in this repo.

Network capture and assets

  • Resource scope is limited to: audio, video, images, and viewport screenshots; do not collect other asset types unless explicitly requested.
  • When a task asks to harvest page resources, prioritize what the user requests (e.g., only media or only core assets). Ask for scope if unclear.
  • If the user provides a URL like https://h5.rrx.cn/storeview/<page-id>.html, extract <page-id>. Open a blank tab first, apply the viewport override (width 390, height 844, devicePixelRatio 3, mobile: true, hasTouch: true), and only then navigate that tab to https://ca.rrx.cn/v/<page-id>?rrxsrc=2&iframe=1&tpl=1. The equivalent automation step is a DevTools Emulation device-metrics override with {width: 390, height: 844, deviceScaleFactor: 3, mobile: true, hasTouch: true}, issued before navigation (deviceScaleFactor is the DevTools-side name for what the page reports as devicePixelRatio). Applying the override before navigating avoids loading assets twice.
  • Use DevTools network captures to list requests; identify media by MIME or URL suffix.
  • Save assets under downloads/<date>-<title>-<page-id>/media/ (title from current page; date format YYYYMMDD) with clean filenames (strip query strings and @! size suffixes; keep proper extensions). After download, rename any files still containing size tokens or missing extensions to the original base name + proper extension.
  • Also save the source page URL(s) provided by the user into the folder root as downloads/<date>-<title>-<page-id>/urls.txt.
  • Prefer direct downloads (e.g., curl) if DevTools bodies are unavailable or truncated.
  • After batch downloading, delete any 0-byte files, verify against the planned download list, and retry missing items up to 2 times; if still failing, stop and report the missing resources.
  • After collecting all requested resources and screenshots, close any additional tabs/pages opened for capture. This is mandatory; do not leave capture tabs open.
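
The URL handling and folder layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in download.py: the helper names (extract_page_id, capture_url, capture_dir) are hypothetical, the page id is assumed to be any run of characters without a slash, and the title is assumed to be already safe to use in a path.

```python
import re
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Path shape of a share URL: /storeview/<page-id>.html
# Assumption: <page-id> contains no "/" characters.
_STOREVIEW_PATH = re.compile(r"^/storeview/([^/]+)\.html$")


def extract_page_id(url: str) -> str:
    """Return <page-id> from an h5.rrx.cn storeview URL, or raise ValueError."""
    parsed = urlparse(url)
    match = _STOREVIEW_PATH.match(parsed.path)
    if parsed.hostname != "h5.rrx.cn" or match is None:
        raise ValueError(f"not a storeview URL: {url!r}")
    return match.group(1)


def capture_url(page_id: str) -> str:
    """Build the URL to navigate to after the viewport override is applied."""
    return f"https://ca.rrx.cn/v/{page_id}?rrxsrc=2&iframe=1&tpl=1"


def capture_dir(title: str, page_id: str, day: date) -> PurePosixPath:
    """Return downloads/<date>-<title>-<page-id> with the date as YYYYMMDD.

    No sanitizing is done here; the title is assumed to be path-safe.
    """
    return PurePosixPath("downloads") / f"{day:%Y%m%d}-{title}-{page_id}"
```

Media files would then go under capture_dir(...) / "media" and the source page URLs into capture_dir(...) / "urls.txt".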

Download script usage

  • Use python download.py --page-id <id> --title "<title>" --urls urls.txt --sources source_urls.txt to batch download assets. The script generates <date> using format YYYYMMDD.
  • urls.txt should list the target asset URLs (one per line) already filtered to the requested scope (e.g., media only).
  • Downloads go to downloads/<date>-<title>-<page-id>/media/; filenames are cleaned (query/@! removed) and extensions retained/guessed; duplicates get numeric suffixes.
  • After the batch finishes, the script deletes 0-byte files, compares against the planned list, retries missing items up to 2 times, and reports any still-missing resources.
  • The user-provided page URLs are recorded in downloads/<date>-<title>-<page-id>/urls.txt. This output file is separate from the input urls.txt passed to --urls, which lists asset URLs.
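
The filename cleaning described above (query strings and @! size suffixes removed, extensions kept, duplicates numbered) might look roughly like the sketch below. It is an illustration under assumptions, not the script's actual code: the function names, the "unnamed" fallback, the default_ext parameter, and the "-1", "-2" suffix style are all hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlparse


def clean_filename(url: str, default_ext: str = "") -> str:
    """Derive a clean filename from an asset URL.

    urlparse keeps the query string out of .path, so only the @! size
    suffix has to be cut off by hand.
    """
    name = posixpath.basename(unquote(urlparse(url).path))
    name = name.split("@!", 1)[0]  # e.g. "photo.jpg@!w640" -> "photo.jpg"
    if not name:
        name = "unnamed"  # hypothetical fallback for URLs ending in "/"
    root, ext = posixpath.splitext(name)
    if not ext and default_ext:
        name = root + default_ext  # caller supplies a guess, e.g. from MIME
    return name


def dedupe(names: list[str]) -> list[str]:
    """Give repeated names a numeric suffix so no two results collide."""
    used: set[str] = set()
    result = []
    for name in names:
        root, ext = posixpath.splitext(name)
        candidate, n = name, 0
        while candidate in used:
            n += 1
            candidate = f"{root}-{n}{ext}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Tracking used names in a set, rather than counting occurrences, keeps a generated name such as "a-1.jpg" from clashing with a file that already carries that name.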

Screenshots

  • Default viewport for screenshots: width 390, height 844, devicePixelRatio 3 (mobile portrait). Do not change unless the user explicitly requests another size.
  • Match the screenshot to the user's requested viewport. If they mention a size, emulate it and verify with window.innerWidth/innerHeight and devicePixelRatio.
  • Capture screenshots with Chrome DevTools (device emulation per above) and save to downloads/<date>-<title>-<page-id>/index.png (title from current page; date format YYYYMMDD); use full-page only when explicitly asked.
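
One way to make the verification step concrete is to compare the values the page reports against the expected viewport before taking the screenshot. The sketch below assumes the three values have already been read from the page (for example by evaluating window.innerWidth, window.innerHeight, and window.devicePixelRatio) and passed in as a dict; the function name and dict layout are hypothetical.

```python
# Default viewport from the guideline above, keyed by the window properties
# used to verify it.
DEFAULT_VIEWPORT = {"innerWidth": 390, "innerHeight": 844, "devicePixelRatio": 3}


def viewport_mismatches(reported: dict, expected: dict = DEFAULT_VIEWPORT) -> dict:
    """Return {property: (expected, reported)} for every value that differs.

    An empty dict means the emulated viewport matches and the screenshot
    can be taken.
    """
    return {
        key: (want, reported.get(key))
        for key, want in expected.items()
        if reported.get(key) != want
    }
```

If the result is non-empty, reapply the override and check again rather than capturing at the wrong size.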

Communication and confirmation

  • Do not ask for pre-work confirmation; proceed with default scope (media + viewport screenshot) unless the user explicitly specifies otherwise.
  • After completion, briefly confirm collected assets (paths + key filenames); do not prompt for extra formats unless the user asks.

Safety and precision

  • Avoid downloading unrequested resources. If download failures occur, retry and report any missing items clearly.
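
The retry-then-report rule can be sketched as a small loop. This is an illustration under assumptions, not the logic in download.py: download_with_retries is a hypothetical name, and fetch stands in for whatever performs a single download (for example a curl wrapper) and returns True on success.

```python
from typing import Callable, Iterable, List


def download_with_retries(
    urls: Iterable[str],
    fetch: Callable[[str], bool],
    max_retries: int = 2,
) -> List[str]:
    """Try every URL once, then retry only the failures up to max_retries times.

    Returns the URLs that are still missing, so the caller can stop and
    report them.
    """
    missing = [url for url in urls if not fetch(url)]
    for _ in range(max_retries):
        if not missing:
            break
        missing = [url for url in missing if not fetch(url)]
    return missing
```

Only the failed items are retried, so a successful download is never fetched a second time, and a non-empty return value is the list to report to the user.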