Skip to content

Fix CApi tests on S390x#28074

Open
AlekseiNikiforovIBM wants to merge 2 commits intomicrosoft:mainfrom
AlekseiNikiforovIBM:s390x_capi_tests
Open

Fix CApi tests on S390x#28074
AlekseiNikiforovIBM wants to merge 2 commits intomicrosoft:mainfrom
AlekseiNikiforovIBM:s390x_capi_tests

Conversation

@AlekseiNikiforovIBM
Copy link
Copy Markdown
Contributor

Description

When loading data into tensors from memory buffers from external files, byteswap it if necessary.

Also add a fix for deleter when byteswapping: keep copy of AllocatorPtr instead of reference.

Motivation and Context

While trying to setup local s390x CI, I've found 4 more tests that fail on s390x:

CApiTest.TestLoadModelFromArrayWithExternalInitializerFromFileArray
CApiTest.TestLoadModelFromArrayWithExternalInitializersFromFileArray
CApiTest.TestLoadModelFromArrayWithExternalInitializersFromFileArrayPathRobust
CApiTest.TestLoadModelFromArrayWithExternalInitializersFromFileMmap

…rom external files

This change fixes following tests:
CApiTest.TestLoadModelFromArrayWithExternalInitializerFromFileArray
CApiTest.TestLoadModelFromArrayWithExternalInitializersFromFileArray
CApiTest.TestLoadModelFromArrayWithExternalInitializersFromFileArrayPathRobust
CApiTest.TestLoadModelFromArrayWithExternalInitializersFromFileMmap
@AlekseiNikiforovIBM AlekseiNikiforovIBM changed the title S390x capi tests Fix CApi tests on S390x Apr 15, 2026
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the big-endian external-initializer path. I found one blocking correctness issue before this lands: the new conversion buffer size is based on logical element count, which can over-read packed sub-byte tensor payloads. Please reuse the already-computed TensorProto storage byte size for the allocation and spans.

auto allocator = CPUAllocator::DefaultInstance();

auto deleter = [allocator](uint8_t* ptr) { allocator->Free(ptr); };
std::unique_ptr<uint8_t[], decltype(deleter)> native_data{reinterpret_cast<uint8_t*>(allocator->Alloc(element_size * element_count)), deleter};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use the TensorProto storage byte size that was already computed in tensor_byte_size, not element_size * tensor_shape.Size(). For packed sub-byte types such as INT4/UINT4/INT2/UINT2/FLOAT4E2M1, tensor_shape.Size() is the logical element count while the external payload has fewer storage bytes. For example, 3 INT4 values occupy 2 bytes, but this code would allocate/span 3 bytes and read one byte past the provided external initializer buffer on big-endian builds.

Please size native_data, src_span, and dst_span from tensor_byte_size (or Tensor::CalculateTensorStorageSize) and use utils::GetElementSizeOfTensor(old_initializer.data_type()) only as the byteswap granularity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants