context-chef — AI SDK middleware for automatic history compression, tool result truncation & token budget management #14266
> **Note:** This discussion was automatically closed because the community moved to community.vercel.com/ai-sdk.
Built an AI SDK middleware that handles context window management transparently. Wrap your model once and everything else stays the same: it works with `generateText`, `streamText`, and agent tool loops.

## The Problem
Building multi-turn agents with the AI SDK, I kept hitting the same issues:

- A tool like `run_bash` can return 50KB+ of output, eating 10% of your context window
- Calling `reportTokenUsage()` or counting tokens manually on every turn is tedious and error-prone

## The Solution
A single `withContextChef()` wrapper handles all of this behind the scenes.

## What it does
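At the call site, nothing changes: conceptually, middleware of this kind is just a function from model to model. Here is a generic sketch of that wrap-once pattern with stand-in types (`Model`, `withPromptCap`, and `echo:` are all illustrative, not context-chef's or the AI SDK's actual API):

```typescript
// A stand-in "model" type; the real AI SDK model interface is much richer.
type Model = (prompt: string) => string;

// A wrapper can rewrite the prompt (or the result) while the call site
// stays exactly the same — this is the shape withContextChef() relies on.
function withPromptCap(model: Model, maxChars: number): Model {
  return (prompt) =>
    model(prompt.length > maxChars ? prompt.slice(-maxChars) : prompt);
}

const baseModel: Model = (p) => `echo:${p}`;
const wrapped = withPromptCap(baseModel, 5);

// The wrapped model is called exactly like the base model.
console.log(wrapped("hi"));          // "echo:hi"
console.log(wrapped("hello world")); // "echo:world" (only the last 5 chars survive)
```

The same principle lets context-chef compress history and truncate tool results without any changes to `generateText`/`streamText` call sites.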
- **History compression**: When the conversation exceeds the token budget, older messages are automatically summarized by a cheap model (e.g. `gpt-4o-mini`) while recent messages are preserved. Includes a circuit breaker: if the compression model fails 3 times, it degrades gracefully instead of crashing your agent.
- **Tool result truncation**: Large tool outputs are automatically truncated with head + tail preservation (keeps the command at the top and errors at the bottom). Optionally persists the full output to a storage adapter so the LLM can retrieve it later via a `context://vfs/` URI.
- **Automatic token tracking**: Extracts token usage from `generateText`/`streamText` responses and feeds it back to the compression engine. No manual tracking needed.
- **Compact (zero-cost pruning)**: Mechanical message pruning that removes reasoning blocks and old tool calls without any LLM cost. Delegates to the AI SDK's `pruneMessages` under the hood.

## How it works
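The head + tail truncation described above can be sketched roughly like this (my approximation of the idea, not the library's exact heuristics; the split ratio and marker are made up):

```typescript
// Keep the start of a large tool output (the command) and the end
// (errors, exit status), replacing the middle with an elision marker.
function truncateHeadTail(text: string, maxChars: number, headRatio = 0.5): string {
  if (text.length <= maxChars) return text;
  const marker = "\n…[truncated]…\n";
  const budget = maxChars - marker.length;
  const head = Math.floor(budget * headRatio);
  const tail = budget - head;
  return text.slice(0, head) + marker + text.slice(text.length - tail);
}

// A 5KB build log shrinks to 200 chars, but both ends survive.
const output = "$ make build\n" + "x".repeat(5000) + "\nerror: linker failed";
const short = truncateHeadTail(output, 200);
console.log(short.startsWith("$ make build"));       // true: command preserved
console.log(short.endsWith("error: linker failed")); // true: error preserved
console.log(short.length <= 200);                    // true
```

Persisting the full `output` to a storage adapter before truncating is what would let the LLM fetch it again later via a `context://vfs/` URI.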
The middleware is stateful — it tracks token usage across calls to know when compression is needed. Create one wrapped model per conversation/session.
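That per-session state can be pictured as a small accumulator that records the usage reported by each `generateText`/`streamText` response and decides when to trigger compression. A minimal sketch, assuming a hypothetical `TokenBudget` helper and an 80% trigger threshold (neither is confirmed by the package):

```typescript
// Illustrative internals: tracks reported token usage across calls and
// flags when the conversation is close enough to the budget to compress.
class TokenBudget {
  private used = 0;

  constructor(
    private readonly maxTokens: number,
    private readonly threshold = 0.8, // assumed trigger point, not the real default
  ) {}

  record(promptTokens: number, completionTokens: number): void {
    // The latest call's prompt already contains the whole history,
    // so the most recent report reflects the current conversation size.
    this.used = promptTokens + completionTokens;
  }

  needsCompression(): boolean {
    return this.used >= this.maxTokens * this.threshold;
  }
}

const budget = new TokenBudget(128_000);
budget.record(90_000, 2_000);
console.log(budget.needsCompression()); // false: 92k < 102.4k
budget.record(110_000, 3_000);
console.log(budget.needsCompression()); // true: 113k >= 102.4k
```

Because the counter lives inside the wrapped model, sharing one instance across unrelated conversations would conflate their budgets, which is why the post recommends one wrapped model per session.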
## Need more control?

The middleware covers the most common use case. For more advanced features, use `@context-chef/core` directly.

## Links

- `@context-chef/ai-sdk-middleware`
- `@context-chef/core`

Would love feedback, especially on what other middleware hooks would be useful for AI SDK agent workflows!