Skip to content

fix(security): add SafeInt overflow protection in Expand and constant folding output size limit#28055

Open
tianleiwu wants to merge 4 commits intomainfrom
tlwu/safe_const_folding
Open

fix(security): add SafeInt overflow protection in Expand and constant folding output size limit#28055
tianleiwu wants to merge 4 commits intomainfrom
tlwu/safe_const_folding

Conversation

@tianleiwu
Copy link
Copy Markdown
Contributor

Description

Harden the constant folding optimizer and the Expand CPU kernel against integer overflow attacks from crafted ONNX models.

Problem: The Expand::Compute() kernel performs cumulative dimension multiplications (input_count *= input_dim, output_count *= output_dim) using raw int64_t arithmetic. When triggered during constant folding at CreateSession() time via a crafted model with extreme shape values, signed integer overflow can produce corrupted values used for buffer offset calculations and memcpy lengths, creating a potential out-of-bounds write. The downstream SafeInt check in the allocator catches overflow only when the total byte count wraps, but carefully chosen dimensions can make the overflowed value appear valid.

Additionally, the constant folding optimizer has no output size budget — any deterministic node with constant inputs is eligible for constant folding regardless of output size, enabling memory exhaustion attacks at model load time.

Key Changes

1. SafeInt-protected arithmetic in expand.cc

Wraps all dimension accumulation and offset/length calculations with SafeInt<int64_t> or SafeInt<size_t> to catch overflow before it can corrupt buffer arithmetic:

Location Before After
Accumulator loop (L97-98) input_count *= input_dim SafeInt<int64_t>(input_count) * input_dim
Accumulator loop (L109) last_dim_size *= expand_dim_size[...] SafeInt<int64_t>(last_dim_size) * ...
copy_byte (L116) copy_len * sizeof(T) SafeInt<size_t>(copy_len) * sizeof(T)
input_offset (L122) i * copy_len SafeInt<int64_t>(i) * copy_len
output_offset (L126) output_offset += current_count * ... SafeInt<int64_t>(output_offset) + SafeInt<int64_t>(current_count) * ...

2. Constant folding output size limit in constant_folding.cc

  • Pre-execution check: EstimateNodeOutputSizeInBytes() uses shape inference results with SafeInt-protected arithmetic to estimate total output bytes. Nodes exceeding the limit are skipped.
  • Post-execution check: After kernel->Compute(), actual output SizeInBytes() is verified against the limit (catches cases where shape inference couldn't determine output size).
  • Exception isolation: kernel->Compute() is wrapped in try/catch so that SafeInt overflow exceptions from individual nodes skip the node rather than aborting the entire optimization pass.
  • Configurable limit: New session option optimization.constant_folding_max_output_size_in_bytes (default: 1 GB, "0" to disable).

3. Session option

New key kOrtSessionOptionsConstantFoldingMaxOutputSizeInBytes in onnxruntime_session_options_config_keys.h.

Motivation and Context

This addresses a security vulnerability where a malicious ONNX model can cause signed integer overflow in the Expand kernel during constant folding at model load time (CreateSession()), potentially leading to out-of-bounds memory writes. The constant folding size limit provides defense-in-depth against memory exhaustion attacks from untrusted models.

Testing

  • ConstantFoldingOutputSizeLimit — Verifies 4 MB Expand is blocked at 1 MB limit, allowed at 8 MB limit.
  • ConstantFoldingDefaultLimitBlocksLargeExpand — Verifies 1 GB ConstantOfShape is blocked at 512 MB limit.
  • ConstantFoldingSmallOutputAllowed — Verifies small Expand (64 bytes) is still folded normally.
  • ConstantFoldingExpandOverflowDimsSkipped — Verifies Expand with [2^32, 2^32] dimensions (int64 overflow) is gracefully skipped during constant folding.

…size check

The post-execution size check was calling IsTensor() and then
Get<Tensor>().SizeInBytes() on OrtValue entries from fetches.
For optional outputs that are not produced, IsTensor() returns true
(the type is set) but the data pointer is null, causing a SEGFAULT
when SizeInBytes() dereferences the null tensor's dtype_ member.

Add IsAllocated() check to skip unallocated (optional/None) outputs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant