Skip to content

Improve cross-user prompt cache sharing with --exclude-dynamic-system-prompt-sections #807

@alx32

Description

@alx32

TLDR

The --exclude-dynamic-system-prompt-sections flag currently achieves ~82% cross-user cache sharing, but actually ~98% of the content is sharable. Currently, the flag moves a ~14K-char block from the system prompt into the user message, but ~12K of that block is shared boilerplate (auto-memory type definitions, examples, save/access instructions) identical for every user. Only ~550 chars actually vary per user. Keeping the shared boilerplate in the system prompt and only moving the truly per-user values would significantly improve cross-user cache hit rates.

Context

The --exclude-dynamic-system-prompt-sections flag (CLI) / exclude_dynamic_sections (SDK SystemPromptPreset) was a great addition for enabling cross-user prompt caching. It moves per-user dynamic sections from the system prompt into a <system-reminder> block in the first user message, so that the system prompt is identical across users and can be cached.

The flag works correctly — system prompts are byte-for-byte identical across different users/directories when it's enabled. However, there's room to further improve how much content stays in the cacheable system prompt.

Observation

When exclude_dynamic_sections: true is set, the following content gets moved from the system prompt to the first user message:

Content Size Varies per user?
Auto-memory type definitions, examples, save/access instructions ~12,000 chars No — identical for every user
Auto-memory header + memory storage path ~740 chars Yes — the path varies, the instructions don't
Environment section (CWD, git status, platform, shell, OS, model info) ~985 chars Partially — CWD, git status, platform, shell, OS vary; model info boilerplate doesn't
Date ~40 chars Yes
CLAUDE.md file path and contents ~100 chars Yes
Git recent commits ~15 chars Yes

The relocated block totals ~13.9K chars. About ~12K of that (~87%) is the auto-memory instructions template — identical for every user. The rest contains a mix of per-user values and shared boilerplate. Only ~550 chars actually vary between users/machines.

Caching impact

Measuring system prompt + first user message (excluding tools):

Configuration Cacheable (system prompt) Per-request (user message) Cache sharing
Without flag (baseline) 0% — system prompt differs per user 100% 0%
With flag (current) 46% 54% ~46%
With proposed improvement 93% 7% ~93%

Reproducer

See this repro folder for:

  • reproduce.sh — Self-contained script that captures actual API request bodies with and without the flag. Requires the ANTHROPIC_API_KEY environment variable to be set.
  • proxy_server.py — Intercepting HTTP proxy that logs request bodies.

Diffs

Here's how the flag currently changes the outgoing API request: baseline vs. with flag enabled.

And here's the additional change we'd like to see — moving shared boilerplate back into the cacheable system prompt while keeping only per-user values in the user message: current behavior vs desired behavior.

Suggestion

A template-and-bind approach could work well here: keep the auto-memory instructions, environment template text, and other shared documentation in the system prompt with placeholders (e.g., {{MEMORY_PATH}}, {{CWD}}), and resolve them using per-user values provided in the first user message.

Everything that doesn't vary per user/machine (memory type definitions, save/access instructions, examples, model info boilerplate, environment template text) is identical for all users on the same model and could stay in the system prompt.

Environment

  • Claude Code CLI: 2.1.98
  • Claude Agent SDK (Python): 0.1.58
  • Model tested: haikuclaude-haiku-4-5-20251001
  • OS: Linux arm64

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions