Support ENV_CDP and HTTP CDP endpoints
@@ -15,7 +15,7 @@ This repository has no existing conventions. These guidelines apply to all work
- After collecting all requested resources and screenshots, close any additional tabs/pages opened for capture. This is mandatory; do not leave capture tabs open.
## Download script usage
-- Primary workflow: run `node run.mjs <page-url>` to capture network requests, screenshot, and download media in one step. This script uses Puppeteer to open a browser with mobile viewport (390×844 @ dpr 3), navigate to the page, capture audio/video/image URLs, take a viewport screenshot, then call `download.mjs` to batch download assets.
+- Primary workflow: run `node run.mjs <page-url>` to capture network requests, screenshot, and download media in one step. This script uses Playwright Chromium to open a browser with mobile viewport (390×844 @ dpr 3), navigate to the page, capture audio/video/image URLs, take a viewport screenshot, then call `download.mjs` to batch download assets. For remote debugging, pass `--cdp ws://host:port/devtools/browser/<id>` or `--cdp http://host:port` (or set `ENV_CDP`) to resolve and connect to a Chrome DevTools endpoint.
- For manual control: use `python download.py --page-id <id> --title "<title>" --urls urls.txt --sources source_urls.txt` (Python) or `node download.mjs --page-id <id> --title "<title>" --urls urls.txt --sources source_urls.txt` (Node.js) to batch download assets. The script generates `<date>` using format `YYYYMMDD`.
- `urls.txt` should list the target asset URLs (one per line) already filtered to the requested scope (e.g., media only).
- Downloads go to `downloads/<date>-<title>-<page-id>/media/`. Filenames are cleaned (query strings and `@!` suffixes removed), extensions are kept or guessed when missing, and duplicate names get numeric suffixes.
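
The `--cdp` / `ENV_CDP` handling described above could be sketched roughly as follows. This is not the code from `run.mjs`, which is not shown here: the helper name `resolveCdpEndpoint`, the injectable `fetchImpl` parameter, and the flag-before-environment precedence are all assumptions made for illustration. The one fact it relies on is that a Chrome remote-debugging port serves a JSON document at `/json/version` containing `webSocketDebuggerUrl`.

```javascript
// Hypothetical helper (not taken from run.mjs): turn a --cdp value or ENV_CDP
// into a WebSocket URL for a Chrome DevTools endpoint.
// `fetchImpl` is injectable so the sketch can be exercised without a live browser.
async function resolveCdpEndpoint(input, fetchImpl = fetch) {
  // Assumption: an explicit flag value wins over the environment variable.
  const endpoint = (input || process.env.ENV_CDP || "").trim();
  if (!endpoint) return null; // nothing requested; caller launches a local browser

  // ws:// and wss:// endpoints are already connectable as-is.
  if (/^wss?:\/\//i.test(endpoint)) return endpoint;

  if (!/^https?:\/\//i.test(endpoint)) {
    throw new Error(`Unsupported CDP endpoint: ${endpoint}`);
  }

  // Chrome's debugging port exposes browser metadata at /json/version,
  // including the browser-level webSocketDebuggerUrl.
  const versionUrl = new URL("/json/version", endpoint);
  const res = await fetchImpl(versionUrl);
  if (!res.ok) {
    throw new Error(`CDP discovery failed: HTTP ${res.status}`);
  }
  const info = await res.json();
  if (!info.webSocketDebuggerUrl) {
    throw new Error("CDP discovery response has no webSocketDebuggerUrl");
  }
  return info.webSocketDebuggerUrl;
}
```

The resolved URL could then be handed to Playwright's `chromium.connectOverCDP(...)`. That method also accepts an `http://` endpoint directly, so the explicit lookup is mainly useful when the script wants to log or validate the WebSocket URL before connecting.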
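
The filename rules in the last bullet can be illustrated with a small sketch. It is an approximation under stated assumptions, not the logic of `download.mjs`: the function names `cleanFilename` and `uniqueName`, the `.bin` fallback extension, and the `-1`, `-2` suffix style are hypothetical, and the real script may guess extensions differently (for example from response headers).

```javascript
// Hypothetical helpers (not taken from download.mjs).

// Derive a clean filename from an asset URL: drop the query string,
// cut any "@!..." suffix, and make sure there is some extension.
function cleanFilename(assetUrl, fallbackExt = ".bin") {
  const { pathname } = new URL(assetUrl); // pathname excludes ?query and #hash
  let name = pathname.split("/").pop() || "";
  const cut = name.indexOf("@!");
  if (cut !== -1) name = name.slice(0, cut);
  if (!name) name = "asset";
  // Assumption: fall back to a fixed extension when none is present.
  if (!/\.[A-Za-z0-9]+$/.test(name)) name += fallbackExt;
  return name;
}

// Return a name not yet in `seen`, appending -1, -2, ... before the extension.
function uniqueName(name, seen) {
  if (!seen.has(name)) {
    seen.add(name);
    return name;
  }
  const dot = name.lastIndexOf(".");
  const stem = dot > 0 ? name.slice(0, dot) : name;
  const ext = dot > 0 ? name.slice(dot) : "";
  let n = 1;
  let candidate;
  do {
    candidate = `${stem}-${n}${ext}`;
    n += 1;
  } while (seen.has(candidate));
  seen.add(candidate);
  return candidate;
}
```

With these helpers, a URL such as `https://cdn.example.com/a/photo.jpg@!thumb?x=1` would map to `photo.jpg`, and a second asset resolving to the same name would be stored as `photo-1.jpg`.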