Skip to content

Fix CoreML EP issue with external weight path handling.#28062

Open
skottmckay wants to merge 6 commits intomainfrom
skottmckay/GH28043
Open

Fix CoreML EP issue with external weight path handling.#28062
skottmckay wants to merge 6 commits intomainfrom
skottmckay/GH28043

Conversation

@skottmckay
Copy link
Copy Markdown
Contributor

Description

GitHub issue #28005: CoreML EP crashes with SIGBUS when loading a model with external data via CreateSession from a file path.

Root Cause
In tensorprotoutils.cc:362, TensorProtoWithExternalDataToTensorProto() passes model_path directly to ReadExternalDataForTensor(), but ReadExternalDataForTensor() expects a directory (it joins the path with the external data filename). This causes path construction like /path/to/model.onnx/model.onnx_data instead of /path/to/model.onnx_data.

The CPU EP's UnpackInitializerData() at line ~2572 correctly uses model_path.parent_path(). The CoreML EP is the only caller that passes a full model file path (via graph_viewer_.ModelPath() in model_builder.cc:791), triggering the bug.

Changes
Fix (tensorprotoutils.cc:364): Changed ReadExternalDataForTensor(ten_proto, model_path, ...) to ReadExternalDataForTensor(ten_proto, model_path.parent_path(), ...), matching the pattern used by UnpackInitializerData().

Test (coreml_basic_test.cc:444): Added CoreMLExecutionProviderTest.ExternalDataInitializer — creates a model with external data, saves it to disk, and loads it from a file path with CoreML EP to verify the fix. The test passes with the fix applied.

Motivation and Context

#28005

GitHub issue #28005: CoreML EP crashes with SIGBUS when loading a model with external data via CreateSession from a file path.

Root Cause
In tensorprotoutils.cc:362, TensorProtoWithExternalDataToTensorProto() passes model_path directly to ReadExternalDataForTensor(), but ReadExternalDataForTensor() expects a directory (it joins the path with the external data filename). This causes path construction like /path/to/model.onnx/model.onnx_data instead of /path/to/model.onnx_data.

The CPU EP's UnpackInitializerData() at line ~2572 correctly uses model_path.parent_path(). The CoreML EP is the only caller that passes a full model file path (via graph_viewer_.ModelPath() in model_builder.cc:791), triggering the bug.

Changes
Fix (tensorprotoutils.cc:364): Changed ReadExternalDataForTensor(ten_proto, model_path, ...) to ReadExternalDataForTensor(ten_proto, model_path.parent_path(), ...), matching the pattern used by UnpackInitializerData().

Test (coreml_basic_test.cc:444): Added CoreMLExecutionProviderTest.ExternalDataInitializer — creates a model with external data, saves it to disk, and loads it from a file path with CoreML EP to verify the fix. The test passes with the fix applied.
@skottmckay skottmckay requested a review from Copilot April 14, 2026 06:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes CoreML EP loading of ONNX models that store initializer tensors in external data files by ensuring external data paths are resolved relative to the model directory (not the model file path), and adds a regression test for the reported crash.

Changes:

  • Adjust external data loading to pass the model’s directory (parent_path()) to ReadExternalDataForTensor.
  • Add a CoreML EP regression test that writes an external initializer data file and loads the model via a file path.
  • Add supporting test includes/utilities for temporary directories and file I/O.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
onnxruntime/test/providers/coreml/coreml_basic_test.cc Adds a regression test that creates a model with external initializer data and verifies CoreML EP can load/run it from a file path.
onnxruntime/core/framework/tensorprotoutils.cc Fixes external data path resolution by passing the model directory to ReadExternalDataForTensor.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/framework/tensorprotoutils.cc Outdated
Comment thread onnxruntime/core/framework/tensorprotoutils.cc Outdated
Comment thread onnxruntime/test/providers/coreml/coreml_basic_test.cc
skottmckay and others added 2 commits April 15, 2026 10:12
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/coreml/builders/impl/pad_op_builder.cc Outdated
Comment thread onnxruntime/core/providers/coreml/builders/impl/pad_op_builder.cc Outdated
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the focused fix. I found one cross-platform test issue: the new regression test executes CoreML inference on all platforms, but non-Apple CoreML builds use stubs whose Predict() intentionally fails. Please gate the run/output verification while keeping the load/initialize path covered for the external-data regression.

Comment thread onnxruntime/test/providers/coreml/coreml_basic_test.cc
tianleiwu
tianleiwu previously approved these changes Apr 15, 2026
Copy link
Copy Markdown
Contributor

@tianleiwu tianleiwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating my previous review after checking the CoreML support expectation and CI results. CoreML is expected to run on Apple/macOS, and the CoreML macOS build-and-test lanes for this PR passed. The earlier non-Apple stub concern is therefore non-blocking for this PR. From the CoreML path fix and regression coverage perspective, this looks good to me.

@skottmckay
Copy link
Copy Markdown
Contributor Author

Updating my previous review after checking the CoreML support expectation and CI results. CoreML is expected to run on Apple/macOS, and the CoreML macOS build-and-test lanes for this PR passed. The earlier non-Apple stub concern is therefore non-blocking for this PR. From the CoreML path fix and regression coverage perspective, this looks good to me.

Added a fix. It's developer friendly to be able to validate/debug most of the CoreML code on any device, so having a test that fails in that scenario because it's incorrectly trying to run the model is a distraction.

@skottmckay skottmckay requested a review from tianleiwu April 16, 2026 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants