
Streaming tool call input parameters dropped when translating Gemini → Anthropic Messages API format #25836

@dkindlund

Description


When a client uses the Anthropic Messages API with stream: true to call a Gemini model through LiteLLM, tool call input parameters are silently dropped. The content_block_start event has input: {} (correct per spec), but no input_json_delta events follow to deliver the actual arguments. The tool name is preserved — only the parameters are lost.

Non-streaming requests to the same model with the same tools work correctly.

Steps to reproduce

Non-streaming (works):

curl -X POST http://localhost:4000/v1/messages \
  -H "Authorization: Bearer sk-key" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gemini-2.5-pro",
    "max_tokens": 200,
    "tools": [{
      "name": "add_numbers",
      "description": "Add two numbers together.",
      "input_schema": {
        "type": "object",
        "properties": {
          "a": {"type": "number"},
          "b": {"type": "number"}
        },
        "required": ["a", "b"]
      }
    }],
    "messages": [{"role": "user", "content": "Use the add_numbers tool to compute 17 + 25."}]
  }'

Response correctly includes:

{"type": "tool_use", "name": "add_numbers", "input": {"a": 17, "b": 25}}

Streaming (broken):

curl -X POST http://localhost:4000/v1/messages \
  -H "Authorization: Bearer sk-key" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gemini-2.5-pro",
    "max_tokens": 200,
    "stream": true,
    "tools": [{
      "name": "add_numbers",
      "description": "Add two numbers together.",
      "input_schema": {
        "type": "object",
        "properties": {
          "a": {"type": "number"},
          "b": {"type": "number"}
        },
        "required": ["a", "b"]
      }
    }],
    "messages": [{"role": "user", "content": "Use the add_numbers tool to compute 17 + 25."}]
  }'

Streaming response shows:

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "tool_use", "id": "...", "name": "add_numbers", "input": {}}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

No content_block_delta events with input_json_delta are emitted between content_block_start and content_block_stop. The tool arguments are completely lost.
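For comparison, per the Anthropic streaming spec a conformant stream should carry the arguments in one or more content_block_delta events between those two, along these lines (values illustrative):

```
event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "input_json_delta", "partial_json": "{\"a\": 17, \"b\": 25}"}}
```

The client accumulates the partial_json fragments and parses the result when content_block_stop arrives; with no deltas emitted, that accumulated string is empty.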

Root cause analysis

The streaming translation path is: Gemini functionCall.args → OpenAI tool_calls[].function.arguments → Anthropic input_json_delta events.
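A minimal sketch of that three-hop translation for a single streamed chunk (helper names and dict shapes are illustrative, not LiteLLM's actual internals):

```python
import json

def gemini_to_openai_tool_delta(gemini_part: dict) -> dict:
    """Gemini functionCall part -> OpenAI-style tool_calls delta."""
    fc = gemini_part["functionCall"]
    return {
        "index": 0,
        "function": {
            "name": fc["name"],
            # args arrive as a dict and are JSON-stringified at this hop
            "arguments": json.dumps(fc["args"]),
        },
    }

def openai_to_anthropic_input_delta(tool_delta: dict, block_index: int) -> dict:
    """OpenAI arguments string -> Anthropic input_json_delta event payload."""
    return {
        "type": "content_block_delta",
        "index": block_index,
        "delta": {
            "type": "input_json_delta",
            "partial_json": tool_delta["function"]["arguments"],
        },
    }

chunk = {"functionCall": {"name": "add_numbers", "args": {"a": 17, "b": 25}}}
openai_delta = gemini_to_openai_tool_delta(chunk)
anthropic_event = openai_to_anthropic_input_delta(openai_delta, block_index=1)
```

Each hop is lossless in isolation, which is consistent with the per-file checks below all passing.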

Gemini → OpenAI translation (vertex_and_google_ai_studio_gemini.py lines 1440-1446): correctly extracts functionCall.args and json-stringifies them into function.arguments. ✓

OpenAI → Anthropic content_block_start (transformation.py lines 1213-1218): creates ToolUseBlock(input={}) — correct per the Anthropic streaming spec, since arguments should follow as deltas. ✓

OpenAI → Anthropic input_json_delta (transformation.py lines 1263-1302): code exists to extract tool.function.arguments and emit input_json_delta events. ✓

The gap: The code to generate input_json_delta events exists, but the events never reach the client. When Gemini sends the function call name and arguments in the same streaming chunk (rather than incrementally), the chunk is consumed by _translate_streaming_openai_chunk_to_anthropic_content_block() to create the content_block_start event, but the arguments from that same chunk are not re-processed to generate input_json_delta events.

The streaming iterator (streaming_iterator.py lines 121-148) detects a new content block, queues content_block_start, but the arguments that arrived in the same chunk are lost — the chunk is consumed and subsequent processing doesn't see them.
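One possible shape of a fix, sketched below with illustrative names (not LiteLLM's actual iterator code): when the chunk that opens a tool_use block already carries arguments, emit both the content_block_start and an input_json_delta from that same chunk instead of discarding the arguments.

```python
def events_for_tool_chunk(tool_call: dict, block_index: int) -> list:
    """Translate one OpenAI-style tool_calls delta that carries both the
    function name and its arguments into the Anthropic event(s) it implies."""
    events = [{
        "type": "content_block_start",
        "index": block_index,
        "content_block": {
            "type": "tool_use",
            "id": tool_call.get("id", ""),
            "name": tool_call["function"]["name"],
            "input": {},  # per spec: always empty here; arguments follow as deltas
        },
    }]
    args = tool_call["function"].get("arguments")
    if args:
        # Don't drop arguments that arrived in the same chunk as the name:
        # surface them as an input_json_delta before the block is closed.
        events.append({
            "type": "content_block_delta",
            "index": block_index,
            "delta": {"type": "input_json_delta", "partial_json": args},
        })
    return events

tool_call = {
    "id": "toolu_01",
    "function": {"name": "add_numbers", "arguments": "{\"a\": 17, \"b\": 25}"},
}
events = events_for_tool_chunk(tool_call, block_index=1)
```

The key point is only that the arguments must be read out of the chunk before it is consumed; whether that happens in the iterator or in the transformation helper is an implementation choice.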

Impact

  • All MCP tool-based workflows are broken for Gemini models through the Anthropic Messages API streaming path
  • The Claude Agent SDK (@anthropic-ai/claude-agent-sdk) uses streaming by default, making all Gemini tool calls fail
  • The model correctly understands it should call tools but can never pass parameters, causing infinite retry loops until max_turns is exceeded
  • Affects gemini-2.5-pro and likely all Gemini models with tool use

Environment

  • LiteLLM v1.83.3-stable
  • Gemini models via Vertex AI
  • Client: Anthropic Messages API with stream: true
  • Same tools work correctly with Claude models through the same proxy
  • Same tools work correctly with Gemini in non-streaming mode

Workarounds

  1. Use Claude models instead of Gemini for tool-calling agent workflows
  2. Use the Claude Agent SDK's single message (non-streaming) input mode instead of streaming input mode — but this loses image upload, hooks, and interruption support
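Until the proxy is fixed, a client can at least detect when it has hit this bug by tracking whether any input_json_delta arrived for each tool_use block, and falling back to a non-streaming request when one closed empty. A hypothetical detector (not part of any SDK):

```python
def find_empty_tool_blocks(events: list) -> list:
    """Return indices of tool_use blocks that closed without any
    input_json_delta, i.e. tool calls whose arguments were dropped."""
    open_tools = {}  # block index -> whether any input_json_delta was seen
    empty = []
    for ev in events:
        if (ev["type"] == "content_block_start"
                and ev["content_block"]["type"] == "tool_use"):
            open_tools[ev["index"]] = False
        elif (ev["type"] == "content_block_delta"
                and ev["delta"]["type"] == "input_json_delta"
                and ev["index"] in open_tools):
            open_tools[ev["index"]] = True
        elif ev["type"] == "content_block_stop" and ev["index"] in open_tools:
            if not open_tools.pop(ev["index"]):
                empty.append(ev["index"])
    return empty

# The broken sequence from this report: start then stop, no deltas.
broken = [
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "x",
                       "name": "add_numbers", "input": {}}},
    {"type": "content_block_stop", "index": 1},
]
flagged = find_empty_tool_blocks(broken)
```

This is a diagnostic aid only; it cannot recover the lost arguments.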
