Bulk-download every PDF in your Adobe Creative Cloud account to a local folder, with a manifest that tracks what you've already pulled so re-runs only fetch new or changed files.
Adobe's web UI has no "download everything" button. If you've accumulated hundreds of PDFs in Cloud Documents over the years and want a local copy — for backup, migration off Creative Cloud, or just because — clicking each file by hand isn't reasonable. This script signs in once via a browser window, then talks directly to the same storage API the Creative Cloud Home web app uses.
Tested end-to-end on one developer's account (~876 PDFs). It resumes cleanly, skips files that haven't changed since the last run, and reconciles local deletions.
Working but rough. It does what I needed and it's been pushed up here in the hope that it's useful to someone else. Contributions welcome — see Open issues / wanted contributions below.
- Playwright launches a real Chromium window using a persistent profile in `~/.adobe_pdf_downloader/chrome_profile/`. First run, you sign in to Adobe in that window. Subsequent runs reuse the saved session — no re-login unless Adobe expires it.
- The script captures your IMS bearer token directly from `window.adobeIMS.getAccessToken()` in the page context.
- It auto-detects your account's root URN (`urn:aaid:sc:US:...`) by listening for the first `/links?assetId=...` request the SPA fires after sign-in. The URN is then cached in `manifest.json` so future runs don't need to detect it again.
- It hits the storage discovery endpoint:

  ```
  GET <regional-host>/content/storage/id/<root_urn>/:page?type=application/pdf&limit=500
  ```

  This single paginated walk returns every PDF in your entire Cloud Documents tree (recursive, not just the root folder).
- For each PDF, it downloads the bytes via `<regional-host>/content/storage/id/<assetId>`. Large files that exceed the direct-asset response limit fall back to a `:block_download` descriptor → signed blobstore URL.
- Downloads stream through `urllib.request` on a worker thread (atomic `.part` → final rename), so big files don't buffer through Playwright's IPC channel.
- Every successful file gets recorded in `manifest.json` with `sha256`, sizes, modification time, etag, local path, and a status field (`downloaded` / `failed` / `missing_locally` / `deleted_remotely`).
The script sends x-api-key: CCHomeWeb1 — this is the public client identifier that Adobe's own Creative Cloud Home web app sends from every user's browser. It's not a credential and not a secret; you can see the same value in your browser's Network panel any time you visit adobe.com/files/cloud-documents. Per Adobe's own developer docs, an API key only identifies the calling application and cannot authenticate a user. Your actual authentication is the IMS bearer token captured from your signed-in session.
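Concretely, each storage request carries both values. A minimal sketch of the header set (the `Accept` header here is an illustrative assumption; only `x-api-key: CCHomeWeb1` and the bearer token come from the description above):

```python
def storage_headers(ims_token: str) -> dict[str, str]:
    # x-api-key only identifies the calling app (the public value used by
    # Creative Cloud Home); the IMS bearer token is what authenticates you.
    return {
        "x-api-key": "CCHomeWeb1",
        "Authorization": f"Bearer {ims_token}",
        "Accept": "application/json",  # assumption for illustration
    }


h = storage_headers("eyJ...example")
```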
- Python 3.10 or newer (the code uses PEP 604 union syntax: `str | None`).
- macOS, Linux, or Windows. Developed on macOS; Linux should be fine; Windows is untested — see Open issues.
- ~150 MB of disk for the Playwright Chromium build, plus however much your PDFs total.
- An active Adobe Creative Cloud account with PDFs you want to back up.
```sh
git clone https://github.com/pasolomon/Adobe-Clawback.git
cd adobe-clawback
./setup.sh
```

`setup.sh` creates `.venv/`, installs `playwright` from `requirements.txt`, and downloads the Chromium browser binary.
Always activate the venv first:

```sh
cd adobe-clawback
source .venv/bin/activate
```

Then:
```sh
# Download everything (resumes / catches up on subsequent runs)
python adobe_pdf_downloader.py

# Just list what's in your account, no downloads
python adobe_pdf_downloader.py --list

# Reconcile manifest against disk (no Chrome, no network)
# Useful if you deleted some files locally and want the manifest to reflect that.
python adobe_pdf_downloader.py --reconcile

# Manually override the root URN (rarely needed; only if auto-detection fails)
python adobe_pdf_downloader.py --root urn:aaid:sc:US:00000000-0000-0000-0000-000000000000
```

A Chromium window opens pointing at https://www.adobe.com/files/cloud-documents. Sign in with your Adobe credentials in that window. The script watches in the background and proceeds automatically once it detects:

- `window.adobeIMS.isSignedInUser()` returns true, and
- It has captured your root URN from a `/links?assetId=urn:aaid:sc:US:...` request.
Default sign-in timeout is 600 seconds (10 minutes). After sign-in, downloads begin and the window stays open until the run finishes.
The persistent Chromium profile keeps you signed in, so the window appears, immediately registers you as authenticated, and discovery + download starts within a second or two. If Adobe has expired your session you'll be prompted to sign in again.
```
adobe-clawback/
├── downloads/               # all your PDFs land here
│   ├── Some Document.pdf
│   ├── Another File.pdf
│   └── ...
├── manifest.json            # state file (do not commit; in .gitignore)
├── adobe_pdf_downloader.py
├── setup.sh
├── requirements.txt
├── README.md
├── LICENSE
└── .gitignore
```
Filenames are sanitized (`/`, `\`, `:`, `<`, `>`, `"`, `|`, `?`, `*` → `_`) and capped at 200 characters. Collisions get `(1)`, `(2)`, etc. appended.
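Those rules can be sketched as a small helper (hypothetical `sanitize` function; the real script's exact truncation and collision order may differ):

```python
import re

MAX_NAME = 200


def sanitize(name: str, taken: set[str]) -> str:
    """Replace path-hostile characters, cap length, and dodge collisions."""
    safe = re.sub(r'[/\\:<>"|?*]', "_", name)[:MAX_NAME]
    if safe not in taken:
        return safe
    stem, dot, ext = safe.rpartition(".")
    if not dot:                     # no extension at all
        stem, ext = safe, ""
    n = 1
    while True:
        candidate = f"{stem} ({n}).{ext}" if dot else f"{stem} ({n})"
        if candidate not in taken:
            return candidate
        n += 1


# e.g. sanitize('a/b:c.pdf', {"a_b_c.pdf"}) → "a_b_c (1).pdf"
```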
manifest.json is the source of truth for what's been downloaded. Top-level shape:
```json
{
  "version": 3,
  "created_at": "2025-...",
  "root_urn": "urn:aaid:sc:US:...",
  "regional_host": "https://platform-cs-edge-va6.adobe.io",
  "files": {
    "urn:aaid:sc:US:<asset-id>": {
      "id": "urn:aaid:sc:US:<asset-id>",
      "name": "Some Document.pdf",
      "adobe_path": "/Folder/Subfolder/Some Document.pdf",
      "size_remote": 123456,
      "size_local": 123456,
      "modified": "2024-...",
      "etag": "...",
      "local_path": "downloads/Some Document.pdf",
      "sha256": "abc123...",
      "downloaded_at": "2025-...",
      "last_seen_remote": "2025-...",
      "status": "downloaded"
    }
  },
  "runs": [ { "started_at": "...", "ended_at": "...", "mode": "...", "discovered": 0, "downloaded": 0, "skipped": 0, "failed": [] } ],
  "last_run": { "...": "..." }
}
```

`status` values:
| Status | Meaning |
|---|---|
| `downloaded` | File is on disk and matches the remote. |
| `failed` | Last attempt errored out. `last_error` field has details. |
| `missing_locally` | Manifest says it was downloaded, but the file is no longer on disk. |
| `deleted_remotely` | File no longer appears in the Adobe listing on the most recent run. |
If manifest.json ever gets corrupted (interrupted write, etc.), it's automatically backed up to manifest.corrupt.<timestamp>.json and a fresh one is started — your downloads are not deleted, but they'll be re-hashed on the next run.
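The recovery behaviour can be sketched as follows (hypothetical `load_manifest` helper; the real script tracks more top-level fields than shown here):

```python
import json
import time
from pathlib import Path


def load_manifest(path: Path) -> dict:
    """Load manifest.json; on corruption, back it up and start fresh."""
    if not path.exists():
        return {"version": 3, "files": {}}
    try:
        return json.loads(path.read_text())
    except json.JSONDecodeError:
        # Keep the bad copy for inspection, then start over.
        backup = path.with_name(f"manifest.corrupt.{int(time.time())}.json")
        path.rename(backup)
        return {"version": 3, "files": {}}
```

Renaming rather than deleting means an interrupted write never costs you state you might still be able to salvage by hand.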
A file is skipped when all of these are true:
- It exists in the manifest with `status == "downloaded"`,
- The file at `local_path` still exists on disk,
- The remote `modified` timestamp matches the manifest entry's `modified`.

Otherwise it gets re-downloaded. So:

- New files in Adobe → downloaded.
- File modified in Adobe (different `modified` timestamp) → re-downloaded, overwriting the local copy.
- File deleted from disk → re-downloaded (unless you also `--reconcile`, in which case the manifest is updated to `missing_locally` first).
- File deleted from Adobe → kept locally; the manifest entry flips to `deleted_remotely`.
"Couldn't auto-detect root URN"
The script tries three strategies: listening for a /links?assetId=... request, parsing it from a /files/id/<urn> URL, and using the cached value in the manifest. If all three fail, navigate into any folder in the open Chromium window — the URL will contain the URN — or run with --root urn:aaid:sc:US:... once.
Sign-in window timed out
Default is 10 minutes. If you need longer, edit SIGN_IN_TIMEOUT_S near the top of the script.
401 after a long-running download
The script catches 401s and refreshes the IMS token from window.adobeIMS.getAccessToken() automatically. If it still fails, your session has expired entirely — close the script (Ctrl+C) and re-run; you'll be prompted to sign in again.
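The refresh-and-retry path can be sketched with injected callables (names here are hypothetical; in the real script, `get_token` re-evaluates `window.adobeIMS.getAccessToken()` in the page):

```python
from typing import Callable


class Unauthorized(Exception):
    """Stand-in for an HTTP 401 raised by the request layer."""


def fetch_with_refresh(do_request: Callable[[str], bytes],
                       get_token: Callable[[], str]) -> bytes:
    """Try once; on 401, pull a fresh IMS token and retry exactly once.

    A second 401 propagates to the caller, which is the "session expired
    entirely, sign in again" case described above.
    """
    try:
        return do_request(get_token())
    except Unauthorized:
        return do_request(get_token())
```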
`responsetoolarge` for a particular file
Handled automatically: the script falls back to :block_download to get a signed blobstore URL and streams from there.
429 / 5xx
Retried with exponential backoff, honoring `Retry-After` where the server sends it. If you're seeing sustained throttling, slow your re-runs down.
In rough priority order:
- Windows + Linux testing. Developed on macOS only.
- Headless mode after first sign-in. Currently always runs headful. Once a session is cached, there's no reason the browser needs to be visible.
- Concurrent downloads. Sequential is simple but slow for many small files. Cap at ~4 concurrent to stay polite.
- Progress bar. `rich` or `tqdm` would be a big UX upgrade; currently it just prints lines.
- More file types. Code is hard-coded to `type=application/pdf`. The same endpoint will serve `image/*`, `application/illustrator`, `application/photoshop`, etc. — a `--type` flag (or "everything") would make this much more useful.
- Resume partial downloads. `.part` files are deleted on error; HTTP `Range` requests + checksums could resume.
- Tests. None currently. Mocking Adobe's API is non-trivial but valuable.
- Better edge-case handling for non-US accounts / non-`platform-cs-edge-va6` regions. Discovery should work via the `/links` rel walk, but only `va6` has been observed in practice.
- Docker / single-binary build for users who don't want to install Python.
If you're picking one up, an issue or draft PR before you start saves us both time.
- The script runs entirely on your machine.
- Your Adobe credentials are entered in the Chromium window — they go to Adobe, not to this code.
- The IMS bearer token lives in memory during a run and is never written to disk by this script. (Playwright's persistent profile stores Adobe's session cookies in `~/.adobe_pdf_downloader/chrome_profile/`, same as any browser would.)
- `manifest.json` contains your account's root URN, regional host, and full list of file paths/names. Do not commit it to a public repo. The supplied `.gitignore` excludes it.
This tool uses your own credentials to download your own files via the same endpoints Adobe's web app uses. It does not bypass authentication, scrape other users' content, or violate any access controls. That said:
- It is not affiliated with or endorsed by Adobe Inc.
- Adobe could change their API at any time and break it.
- Your account's terms of service apply. Use within them.
- No warranty — see LICENSE.
MIT — see LICENSE for full text.