DataManagement update by KulikovNikita · Pull Request #1568 · uxlfoundation/scikit-learn-intelex

KulikovNikita · 2023-11-06T15:22:00Z

Decoupling of data structures and interoperability module
- Low-level oneDAL data management wrappers
- Isolation of data conversion functionality to Interoperability modules
- Removing compile-time dependency on DPCTL
Interoperability modules:
- PyBuffer protocol (via pybind11 intermediate) - input & output
- dlpack protocol - input only
- SYCL USM Array Interface (aka sua) - input & output
Tests:
- Test conversion to and from array
- Test conversion to and from CSR table
- Test conversion to and from homogeneous table
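
The three interoperability routes listed above can be told apart by duck typing on the Python side. The following is a minimal sketch, not the PR's implementation: `detect_protocol`, `can_convert`, and the `SUPPORTED` table are hypothetical names, and the probing order (sua first, then dlpack, then buffer) is an assumption. Only the attribute names `__sycl_usm_array_interface__` and `__dlpack__` and the input/output directions are taken from the protocols and the description.

```python
from array import array

# Directions as listed in the PR description; the table itself is illustrative.
SUPPORTED = {
    "buffer": {"input", "output"},
    "dlpack": {"input"},
    "sua": {"input", "output"},
}


def detect_protocol(obj):
    """Return the name of the first interop protocol that obj appears to expose."""
    # Assumed order: device arrays may expose several protocols, so probe sua first.
    if hasattr(obj, "__sycl_usm_array_interface__"):
        return "sua"
    if hasattr(obj, "__dlpack__"):
        return "dlpack"
    try:
        memoryview(obj)  # succeeds only for objects implementing the buffer protocol
    except TypeError:
        return None
    return "buffer"


def can_convert(obj, direction):
    """Check whether the detected protocol supports 'input' or 'output'."""
    protocol = detect_protocol(obj)
    return protocol is not None and direction in SUPPORTED[protocol]


host = array("d", [1.0, 2.0, 3.0])
print(detect_protocol(host), can_convert(host, "output"))
```

A table like `SUPPORTED` keeps the "input only" restriction for dlpack in one place instead of spreading it across conversion functions.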

Alexsandruss

The same comments apply in multiple places:

samir-nasibli · 2023-11-10T14:18:21Z

@KulikovNikita please rename interoperability.dpctl to interoperability.sua_iface, because it is actually interop for the sua interface, which is supported by dpctl.tensor and dpnp.ndarray
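
For context, the attribute in question is `__sycl_usm_array_interface__`, which both `dpctl.tensor.usm_ndarray` and `dpnp.ndarray` expose, so a consumer can read it without importing dpctl. A minimal sketch under stated assumptions: `read_sua` and `FakeUsmArray` are hypothetical, the key list is a subset recalled from dpctl's documentation rather than taken from this PR, and the pointer value is a placeholder that is never dereferenced.

```python
# Subset of interface keys assumed here; the real interface may define more.
REQUIRED_KEYS = ("data", "shape", "typestr", "version", "syclobj")


def read_sua(obj):
    """Return (pointer, readonly, shape, typestr) from a sua-exposing object."""
    iface = getattr(obj, "__sycl_usm_array_interface__", None)
    if iface is None:
        raise TypeError("object does not expose __sycl_usm_array_interface__")
    missing = [key for key in REQUIRED_KEYS if key not in iface]
    if missing:
        raise ValueError(f"incomplete interface, missing: {missing}")
    pointer, readonly = iface["data"]  # (address as int, read-only flag)
    return pointer, readonly, tuple(iface["shape"]), iface["typestr"]


class FakeUsmArray:
    """Stand-in for a USM-backed array; no SYCL runtime is involved."""

    __sycl_usm_array_interface__ = {
        "data": (0xDEADBEEF, False),  # placeholder address, never dereferenced
        "shape": (2, 3),
        "typestr": "<f4",
        "version": 1,
        "syclobj": None,  # a real array stores a SYCL queue or context here
    }


print(read_sua(FakeUsmArray()))
```

Because the reader depends only on the attribute, the name `sua_iface` describes the module more accurately than `dpctl` would.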

samir-nasibli · 2023-11-10T14:24:07Z

/intelci: run

samir-nasibli · 2023-11-15T17:55:06Z

@KulikovNikita we need to add version guards for the 2024.0.0 features, since that version is not available yet. That is why we have these build failures
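
One way to express this kind of version gating is a numeric comparison against an encoded oneDAL version. This is a hedged sketch: `encode_version`, `select_packages`, and both package lists are hypothetical, and the `major * 10000 + minor * 100 + update` encoding is an assumption inferred from the `ONEDAL_VERSION >= 20240001` threshold quoted in the review summary of this PR.

```python
def encode_version(major, minor, update):
    """Pack a version triple into one comparable integer (assumed encoding)."""
    return major * 10000 + minor * 100 + update


# Threshold quoted in the review summary for setup.py.
INTEROP_MIN_VERSION = encode_version(2024, 0, 1)

# Placeholder package names, for illustration only.
BASE_PACKAGES = ["onedal", "onedal.common"]
INTEROP_PACKAGES = ["onedal.interop", "onedal.interop.sua"]


def select_packages(onedal_version):
    """Return the packages to build for the given encoded oneDAL version."""
    packages = list(BASE_PACKAGES)
    if onedal_version >= INTEROP_MIN_VERSION:
        packages += INTEROP_PACKAGES
    return packages


print(select_packages(encode_version(2023, 2, 0)))
print(select_packages(encode_version(2024, 0, 1)))
```

With a guard like this, builds against an older oneDAL skip the new modules instead of failing at compile time.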

KulikovNikita · 2023-11-19T12:36:40Z

/intelci: run

KulikovNikita · 2023-11-19T12:44:19Z

/intelci: run

KulikovNikita · 2023-11-19T13:10:25Z

/intelci: run

KulikovNikita · 2023-11-19T13:35:22Z

Manually started CI job: http://intel-ci.intel.com/ee86e068-850b-f1b1-8bea-a4bf010d0e2e

KulikovNikita · 2023-11-20T07:52:21Z

Manually started CI job: http://intel-ci.intel.com/ee8779a3-43dc-f172-9b32-a4bf010d0e2e

KulikovNikita · 2023-11-20T15:25:24Z

/intelci: run

KulikovNikita · 2023-11-22T11:27:49Z

/intelci: run

KulikovNikita · 2023-11-22T11:35:45Z

/intelci: run

KulikovNikita · 2023-11-22T11:43:18Z

http://intel-ci.intel.com/ee892c3e-e623-f186-86e2-a4bf010d0e2e

KulikovNikita · 2023-11-22T15:02:14Z

/intelci: run

KulikovNikita · 2023-11-22T15:03:31Z

/intelci: run

KulikovNikita · 2023-11-22T15:50:28Z

/intelci: run

moved changes to the uxlfoundation#1684

* start from #2195 * add apache license * add files from #1568 * start converting over * attempts to fix copyright checker * remove table_metadata * merge * weird merge * renaming * change location * will implement these elsewhere * move files to follow naming * change headers further * interim standpoint which will fail * interim changes * helper -> utils * move macro to a central spot * remove whitespace * commit before merge * current status * interim * remove and format * more fixes * add fixes * add fixes * more fixes * add fixes * more fixes * more fixes * updates * updates * more fixes * formatting * formatting * formatting * changes * move header include * fix tensor issue * type change * move literals to see if it helps * status * remove inline * move ordering in table.cpp * missing whitespace * change macro section * add some commentary * attempt to shorten with a macro * first fixes for array_api_strict * formatting * remove unneeded code * remove unneeded code * oops * make better logic * attempt to include byte_offset * convert to static_cast entirely * back to reinterpret_cast * add initial tests * remove array api from contiguous test * hide behind if statement, definite a TODO * working on test case failures * switch to length_error * Update test_data.py * Update test_data.py * Update test_data.py * retest with dpnp and dpctl * fix stride counting * begin testing strategy change * add tests * fix mistakes in test * move class out of if statement * Update test_validation.py * remove pandas * missing change * add initial memory leak checking * remove get_namespace * remove dpc backend skip * fix issue in get_namespace change * further fixes * attempt again * address changes * add recursion block on suggestion * Update doc/third-party-programs-sklearnex.txt Co-authored-by: Alexander Andreev <alexander.andreev@intel.com> * add testing for emptys and simple types * rewrite test * attempt to solve empty and dlpack scalars * deal with odd scenario * fix 
some tests * attempt at making things consistent * oops * try again * try to swap * Update test_data.py * Update dlpack_utils.cpp * Update test_data.py * Update test_data.py * Update test_data.py * Update data_conversion.cpp * Update data_conversion.cpp * try again * bad logic correction * make consistent * std -> py * missing bracket * additional fixes * further homogenation * try again * fix dlpack again * set a ticket up to solve this edge case * Update test_data.py * Update test_data.py * Update dtype_conversion.cpp * Update data_conversion.cpp * Update data_conversion.cpp * Update data_conversion.cpp * Update dtype_conversion.cpp * Update data_conversion.cpp * Update test_data.py * Update test_data.py * Update test_data.py * Update onedal/datatypes/dlpack/dlpack_utils.cpp Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com> * Update onedal/datatypes/dlpack/data_conversion.cpp Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com> * Update onedal/datatypes/sycl_usm/dtype_conversion.cpp Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com> * Update test_data.py * Update test_data.py * Update test_data.py * Update test_data.py * Update dlpack_utils.cpp * Update dlpack_utils.cpp * Update data_conversion.cpp * Update data_conversion.cpp * Update data_conversion.cpp * Update data_conversion.cpp * clang formatting * fight segfault with dpctl * Update data_conversion.cpp * formatting * Update test_data.py * Update test_data.py * formatting' * Update dlpack_utils.cpp --------- Co-authored-by: Alexander Andreev <alexander.andreev@intel.com> Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com>

Copilot

Pull Request Overview

This PR refactors and expands the data management and interoperability modules by decoupling data structures and isolating conversion functionality. Key modifications include:

- Adding new interop submodules and updating the setup.py logic to include additional packages for newer oneDAL versions.
- Incorporating data conversion wrappers for multiple protocols (PyBuffer, dlpack, sua) and updating SVM training to enforce device policy for sparse input.
- Expanding the interoperability layer with extensive C++ and Python code changes across modules like dlpack, sua, buffer, and CSR table interop.

Reviewed Changes

Copilot reviewed 109 out of 109 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| setup.py | Adds new packages for tests when ONEDAL_VERSION >= 20240001. |
| onedal/svm/svm.py | Introduces a check to restrict sparse input on GPU and updates parameter passing for table conversion. |
| onedal/interop/utils/tensor_and_table.hpp | Introduces utility functions for wrapping tensors to homogen tables using new interop patterns. |
| onedal/interop/utils/tensor_and_array.hpp | Provides functions to wrap arrays from homogen tables with a temporary “Fixed for now” comment in the conversion routine. |
| onedal/interop/sua/* | Implements extensive changes in the sua module to support new SYCL USM array interfaces using dpctl and dpnp fallbacks. |
| onedal/interop/dlpack/* | Adds and refines dlpack conversion routines along with device conversion utilities. |
| onedal/interop/buffer/* | Introduces new functions and utilities for buffer protocol–based interoperability and type conversion. |
| onedal/interop/csr_table.py | Enhances conversion logic between native onedal CSR tables and SciPy sparse matrices. |
| onedal/interop/common.hpp | Adds helper functions for endian detection and inverse map computations. |
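
The `common.hpp` helpers are C++, but their intent can be illustrated in Python. A minimal sketch, assuming the helpers do roughly what the summary says: `is_little_endian`, `inverse_map`, and the `FORMAT_TO_DTYPE` table are hypothetical and not taken from the PR.

```python
import sys


def is_little_endian():
    """Report the host byte order, as needed when reading typed buffers."""
    return sys.byteorder == "little"


def inverse_map(mapping):
    """Swap keys and values, refusing mappings that are not one-to-one."""
    inverse = {}
    for key, value in mapping.items():
        if value in inverse:
            raise ValueError(f"value {value!r} maps back to more than one key")
        inverse[value] = key
    return inverse


# Illustrative struct-style format codes mapped to dtype names.
FORMAT_TO_DTYPE = {"f": "float32", "d": "float64", "i": "int32", "q": "int64"}
DTYPE_TO_FORMAT = inverse_map(FORMAT_TO_DTYPE)

print(is_little_endian(), DTYPE_TO_FORMAT["float64"])
```

Deriving the reverse table from the forward one keeps the two directions of a type conversion from drifting apart.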

Comments suppressed due to low confidence (1)

onedal/interop/utils/tensor_and_array.hpp:137

[nitpick] Clarify or document the temporary fix with specific details about future improvements to avoid ambiguity for future developers.

    return tensor;

Copilot · 2025-06-27T14:00:55Z

@@ -272,8 +275,12 @@ def _fit(self, X, y, sample_weight, module, queue):
            self._scale_, self._sigma_ = self._compute_gamma_sigma(self.gamma, X)

        policy = _get_policy(queue, X, y, sample_weight)


Consider adding a brief comment explaining why sparse inputs are not supported on GPU to aid future maintainers.

Suggested change

policy = _get_policy(queue, X, y, sample_weight)

policy = _get_policy(queue, X, y, sample_weight)
# Sparse inputs are not supported on GPU due to limitations in efficient sparse matrix
# operations and compatibility issues with the underlying GPU libraries.
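
The kind of guard under discussion can be sketched as follows. This is an illustration only: `looks_sparse`, `check_sparse_on_device`, `FakeCSR`, and the `"gpu"` string are hypothetical stand-ins for the actual policy and queue objects, and the diff hunk quoted in this comment does not show the real check.

```python
def looks_sparse(X):
    """Duck-typed CSR test: SciPy CSR matrices expose these three attributes."""
    return all(hasattr(X, name) for name in ("data", "indices", "indptr"))


def check_sparse_on_device(X, device_kind):
    """Return X unchanged, or raise if sparse data would be sent to a GPU."""
    if device_kind == "gpu" and looks_sparse(X):
        raise ValueError("sparse input is not supported on GPU")
    return X


class FakeCSR:
    """Stand-in for a CSR matrix holding a single non-zero element."""

    data, indices, indptr = [1.0], [0], [0, 1]


check_sparse_on_device(FakeCSR(), "cpu")  # accepted
check_sparse_on_device([[1.0, 0.0]], "gpu")  # dense input, accepted
```

Raising before the table conversion gives the caller a clear message rather than a failure inside the backend.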

Alexsandruss reviewed Nov 6, 2023

Comment thread onedal/common/device_lookup.cpp Outdated

Comment thread onedal/common/device_lookup.cpp Outdated

samir-nasibli reviewed Nov 8, 2023

Comment thread onedal/interoperability/dpctl/dpctl_and_table.cpp Outdated

KulikovNikita force-pushed the dev/dataframe-interchange-api branch 2 times, most recently from 03b4d73 to 66215c1 Compare November 19, 2023 09:36

KulikovNikita changed the title ~~WIP: DataManagement update~~ DataManagement update Nov 19, 2023

KulikovNikita marked this pull request as ready for review November 19, 2023 11:54

KulikovNikita requested review from aepanchi, maria-Petrova and napetrov as code owners November 19, 2023 13:08

KulikovNikita force-pushed the dev/dataframe-interchange-api branch from ad64186 to 783e704 Compare November 20, 2023 15:25

KulikovNikita assigned KulikovNikita, samir-nasibli and Alexsandruss Nov 22, 2023

KulikovNikita mentioned this pull request Nov 24, 2023

DataFrame API interchange protocol as an input #1587

Draft

One commit to rule them all

df00064

samir-nasibli added 3 commits January 29, 2024 08:08

revert changes for onedal/common/dispatch_utils.hpp

067f3e7

Merge branch 'intel:main' into dev/dataframe-interchange-api

9c9ad80

reverted changes for onedal/neighbors/neighbors.py

39a2db5

samir-nasibli mentioned this pull request Jan 30, 2024

ENH: update of onedal4py SVM fit method for sparse support #1683

Closed

samir-nasibli added 17 commits January 29, 2024 18:17

reverted svm tests disabling

cbde66c

reverted changes for the onedal dbscan

bb7fc22

moved changes to the uxlfoundation#1684

fix for the revert changes for onedal/neighbors/neighbors.py

c2f514c

Merge branch 'intel:main' into dev/dataframe-interchange-api

df4f6a9

Merge branch 'intel:main' into dev/dataframe-interchange-api

dc1bc36

Merge branch 'main' into dev/dataframe-interchange-api

acb3433

lint onedal/__init__.py

0678aae

Merge branch 'intel:main' into dev/dataframe-interchange-api

3eef6cd

update onedal/common/dtype_dispatcher.hpp

9ade1f0

Merge branch 'intel:main' into dev/dataframe-interchange-api

02ed639

Merge branch 'intel:main' into dev/dataframe-interchange-api

e299b8f

mem leak fix

aeeb0a1

Merge branch 'intel:main' into dev/dataframe-interchange-api

d7d2ba3

Merge branch 'intel:main' into dev/dataframe-interchange-api

37a20c1

Merge branch 'intel:main' into dev/dataframe-interchange-api

4ad92cd

Merge branch 'intel:main' into dev/dataframe-interchange-api

c88e7dd

Merge branch 'intel:main' into dev/dataframe-interchange-api

a15f9f9

samir-nasibli mentioned this pull request Oct 20, 2024

ENH: Data management update to support SUA ifaces for Homogen OneDAL tables #2045

Merged

10 tasks

icfaust mentioned this pull request Nov 18, 2024

[bugfix, enhancement] enable proper GPU offloading with fp64 support when dpctl unavailable #2152

Merged

13 tasks

samir-nasibli mentioned this pull request Dec 20, 2024

ENH: Buffer API interop - data management update #1773

Draft

This was referenced Jan 26, 2025

[enhancement] Refactor onedal/datatypes in preparation for dlpack support #2195

Merged

[enhancement] add dlpack support to to_table #2275

Merged

icfaust added a commit to icfaust/scikit-learn-intelex that referenced this pull request Jan 27, 2025

add files from uxlfoundation#1568

df87e74

icfaust requested a review from Copilot June 27, 2025 14:00

Copilot AI reviewed Jun 27, 2025

6 participants