Envoy AI Gateway v0.5.0
Multi-gateway configuration, prompt caching cost savings, fine-grained MCP authorization, OpenAI Responses API, and Google Search grounding for Gemini.
Envoy AI Gateway v0.5.0 makes multi-gateway deployments easier with the new GatewayConfig CRD, cuts costs with prompt caching for AWS Bedrock and GCP Claude, and unlocks fine-grained access control with CEL-based MCP authorization. Developers gain OpenAI Responses API support, Google Search grounding for Gemini, and the ability to mutate request bodies per-route. Under the hood, the switch to sonic JSON processing reduces latency across all requests.
✨ New Features
Gateway Configuration
- New `GatewayConfig` CRD — Gateway-scoped configuration via a new custom resource. Reference it from a Gateway via the `aigateway.envoyproxy.io/gateway-config` annotation to configure the external processor container (env vars, resource requirements, container settings). Multiple Gateways can share the same `GatewayConfig`.
- Configurable endpoint prefixes — New `prefix` field on `VersionedAPISchema` for backends with non-standard OpenAI-compatible prefixes (e.g., Gemini's `/v1beta/openai`, Cohere's `/compatibility/v1`).
OpenAI API Support
- OpenAI Responses API (`/v1/responses`) — Full support with streaming and non-streaming modes, function calling, MCP tools, reasoning, multi-turn conversations, multimodal capabilities, token usage tracking, and OpenInference tracing.
Provider Caching Enhancements
- Prompt caching for AWS Bedrock Claude — Reuse cached system prompts with Bedrock Anthropic models. Cache point markers are handled automatically with separate tracking for cache creation and cache hit tokens.
- Prompt caching for GCP Vertex AI Claude — Same cost-saving prompt caching for Claude models on GCP Vertex AI for system prompts and few-shot examples.
MCP Gateway Enhancements
- Fine-grained authorization with CEL, JWT claims, and external auth — Write expressive CEL rules using request attributes (HTTP method, headers, JWT claims, tool names, call arguments), enforce access based on JWT claim values, or delegate to external gRPC/HTTP authorization services.
- Real-time tool list synchronization — MCP clients automatically receive `notifications/tools/list_changed` when MCPRoutes update, refreshing available tools without reconnection.
- Stdio server proxy in standalone mode — Run command-line MCP tools (e.g., `npx`-based servers) without code changes via the `aigw` CLI HTTP proxy.
- Improved OAuth metadata discovery — Well-known endpoints now serve at the MCPRoute path prefix for correct authorization discovery across multiple routes.
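To illustrate the authorization model, here is a hedged sketch of a CEL-based rule block using the `authorization`, `defaultAction`, `rules`, and `extAuth` names listed in the API Updates section. The rule structure, the `jwt.claims` and `mcp.tool` attribute paths, and the rule names shown are assumptions for illustration — consult the v0.5 MCPRoute security policy reference for the exact schema.

```yaml
# Hypothetical fragment of an MCP security policy; field shapes are assumptions.
authorization:
  defaultAction: Deny
  rules:
    # Allow admins (per JWT claim) to call read-only tools
    - name: allow-admin-read-tools
      action: Allow
      expression: >-
        has(jwt.claims.groups) && "admin" in jwt.claims.groups
        && mcp.tool.name.startsWith("read_")
  # Optionally delegate remaining decisions to an external gRPC/HTTP authorizer
  extAuth:
    grpc:
      backendRef:
        name: my-authz-service
        port: 9191
```

The deny-by-default posture with narrow allow rules mirrors common policy-engine practice; a single CEL expression can combine JWT claims, tool names, and call arguments in one rule.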
Inference Extension
- Security policies for inference pools — Apply `BackendSecurityPolicy` to `InferencePool` resources for consistent authentication across dynamically-selected inference endpoints.
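A minimal sketch of what targeting an `InferencePool` might look like, based on the `targetRefs` expansion described in the API Updates section. The policy name, secret name, and auth type here are placeholders, and the exact `BackendSecurityPolicy` spec layout should be verified against the v0.5 API reference.

```yaml
# Sketch only — names are placeholders; verify field layout against the docs.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: inference-pool-auth
spec:
  targetRefs:
    # New in v0.5: InferencePool alongside AIServiceBackend
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: my-inference-pool
  type: APIKey
  apiKey:
    secretRef:
      name: my-api-key-secret
```

Attaching the policy to the pool rather than to individual backends means every dynamically-selected endpoint in the pool authenticates the same way.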
Gemini Provider Enhancements
- Google Search grounding — Give Gemini models access to real-time web information via the `google_search` tool type, with domain filtering, blocking confidence thresholds, and time range restrictions.
- Consistent thinking configuration across providers — The same `thinking` configuration works for both Anthropic and Gemini models, enabling provider-agnostic reasoning features.
- Gemini 3 reasoning and image quality controls — `thinking_level` (reasoning depth) and `media_resolution` (image quality vs. speed), with graceful degradation on older Gemini versions.
- Visibility into model reasoning — Thought summaries are extracted and surfaced from Gemini responses when thinking is enabled.
- Enterprise web search integration — `enterprise_search` tool type for grounding responses in organization-specific search infrastructure and data sources.
Traffic Management
- Route-level body mutation — Inject or remove JSON fields in request bodies per-backend using `bodyMutation` with `set` and `remove` operations. Route-level settings override backend defaults.
- AWS Bedrock service tier control — Choose between standard, flex, priority, and reserved tiers for latency-sensitive or cost-optimized workloads, with automatic fallback handling.
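As a rough sketch, a route rule's backend reference might carry a `bodyMutation` like the following, using the `set` (field/value pairs) and `remove` (field names) shapes described in the API Updates section. The match condition, backend name, and exact field spellings are assumptions for illustration.

```yaml
# Hypothetical AIGatewayRoute rule fragment; field shapes are assumptions.
rules:
  - matches:
      - headers:
          - name: x-ai-eg-model
            value: gpt-4o
    backendRefs:
      - name: openai-backend
        bodyMutation:
          set:
            # Inject a JSON field into every request body for this backend
            - field: service_tier
              value: '"flex"'
          remove:
            # Strip a field the upstream provider should not receive
            - user
```

Because route-level settings override backend defaults, a shared backend can still receive different body shapes depending on which route matched.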
Observability Enhancements
- Per-provider cost attribution — New `gen_ai.provider.name` metric attribute for filtering dashboards and alerts by provider.
- Full tracing for Anthropic Messages API — OpenInference-compliant tracing for the native `/messages` endpoint, compatible with Arize Phoenix and OpenTelemetry platforms.
- Cohere Rerank visibility — Full OpenTelemetry support for Cohere's v2 rerank endpoint, capturing query, documents, and relevance scores.
Performance and Operations
- Faster request processing with sonic JSON — Migrated to bytedance/sonic for JSON encoding/decoding with measurable latency improvements and lower CPU usage.
- Faster cross-namespace reference validation — Optimized ReferenceGrant indexing reduces controller reconciliation time.
- Improved MCP proxy throughput — HTTP connection reuse across MCP proxy requests eliminates per-request connection overhead.
🔗 API Updates
- New `GatewayConfig` CRD — Gateway-level configuration with `extProc.kubernetes` for container settings. Reference via the `aigateway.envoyproxy.io/gateway-config` annotation.
- `VersionedAPISchema.prefix` — New `prefix` field replaces overloading `version` for endpoint path customization.
- `AIGatewayRouteRuleBackendRef.bodyMutation` — New field with `set` (field/value pairs) and `remove` (field names) for request body manipulation.
- `LLMRequestCostType.CacheCreationInputToken` — New cost type for tokens written to cache, separate from `CachedInputToken`.
- `MCPRouteSecurityPolicy` authorization fields — New `authorization` block with `defaultAction`, a `rules` array (CEL, JWT scopes/claims, tools targeting), and `extAuth` for external authorization.
- `BackendSecurityPolicy.targetRefs` expansion — Now accepts `InferencePool` (`inference.networking.x-k8s.io`) in addition to `AIServiceBackend`.
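To show how the new cost type might be used alongside the existing ones, here is a hedged sketch of an `llmRequestCosts` block that tracks cache writes and cache hits separately. The `metadataKey` names are arbitrary, and the placement of this block within the route spec is an assumption — check the v0.5 API reference for the authoritative location.

```yaml
# Sketch: distinguishing cache-write tokens from cache-hit tokens
# for cost attribution; metadata key names are placeholders.
llmRequestCosts:
  - metadataKey: cache_creation_tokens
    type: CacheCreationInputToken   # tokens written to the prompt cache
  - metadataKey: cached_tokens
    type: CachedInputToken          # tokens served from the cache
  - metadataKey: input_tokens
    type: InputToken
```

Tracking the two cache counters separately matters for billing, since providers typically price cache writes above cache reads.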
Deprecations
- `AIGatewayFilterConfigExternalProcessor.resources` — Deprecated. Use `GatewayConfig` instead. Will be removed in v0.6.
- `version` field as prefix for OpenAI schema — Deprecated. Use the new `prefix` field. Legacy behavior will be removed in v0.6.
🐛 Bug Fixes
- AWS Bedrock Claude streaming reliability — Streaming responses from Bedrock Claude models now complete correctly without truncation.
- Gemini streaming token counts — Token usage in Gemini streaming responses now matches OpenAI format.
- Multi-chunk Gemini tool calls — Tool calls spanning multiple streaming chunks now have correct indices.
- GCP Claude reasoning content — Reasoning/thinking content correctly passes through for Claude on GCP Vertex AI.
- Zero-weight backend references — Backend references with zero weight no longer cause routing errors.
- Umbrella chart image pull secrets — Helm deployments within umbrella charts correctly inherit `global.imagePullSecrets`.
- GCP global region backends — Vertex AI backends with the global region now work correctly.
- Accurate per-token latency metrics — Fixed integer truncation in the `time_per_output_token` calculation.
- Anthropic token counting — Improved accuracy of input and output token counts for Anthropic models.
📖 Upgrade Guidance
Migrating to GatewayConfig
If you're using `AIGatewayFilterConfigExternalProcessor.resources`, migrate to the new `GatewayConfig` CRD:
- Create a `GatewayConfig` resource:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: GatewayConfig
metadata:
  name: my-gateway-config
  namespace: default
spec:
  extProc:
    kubernetes:
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
```

- Reference it from your Gateway:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
  annotations:
    aigateway.envoyproxy.io/gateway-config: my-gateway-config
```

Migrating Endpoint Prefix Configuration
Before:

```yaml
schema:
  name: OpenAI
  version: "/v1beta/openai"  # Deprecated
```

After:

```yaml
schema:
  name: OpenAI
  prefix: "/v1beta/openai"
```

📦 Dependencies
| Dependency | Version |
|---|---|
| Go | 1.25.6 |
| Envoy Gateway | v1.6 |
| Envoy Proxy | v1.36.4 |
| Gateway API | v1.4.0 |
| Gateway API Inference Extension | v1.0.2 |
🙏 Acknowledgements
Special thanks to the growing community of adopters including Bloomberg, LY Corporation, Alan by Comma Soft, and NRP for their production insights, everyone who reported bugs, submitted PRs, and participated in design discussions, and the Envoy Gateway team for continued collaboration.
🔮 What's Next
- Additional provider integrations (AWS Bedrock InvokeModel, Gemini embeddings, Azure/AKS workload identity)
- Batch inference APIs for high-volume workloads
- Advanced caching strategies with prompt cache key and retention controls
- Upstream provider quota policies
- Sensitive data redaction for request and response bodies