Skip to content

DataManagement update#1568

Draft
KulikovNikita wants to merge 57 commits intouxlfoundation:mainfrom
KulikovNikita:dev/dataframe-interchange-api
Draft

DataManagement update#1568
KulikovNikita wants to merge 57 commits intouxlfoundation:mainfrom
KulikovNikita:dev/dataframe-interchange-api

Conversation

@KulikovNikita
Copy link
Copy Markdown
Contributor

@KulikovNikita KulikovNikita commented Nov 6, 2023

  • Decoupling of data structures and interoperability module
    • Low-level oneDAL data management wrappers
    • Isolation of data conversion functionality to Interoperability modules
    • Removing compile-time dependency on DPCTL
  • Interoperability modules:
  • Tests:
    • Test conversion to and from array
    • Test conversion to and from CSR table
    • Test conversion to and from homogenous table

Copy link
Copy Markdown
Contributor

@Alexsandruss Alexsandruss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments are applicable for multiple places:

Comment thread onedal/common/device_lookup.cpp Outdated
Comment thread onedal/common/device_lookup.cpp Outdated
Comment thread onedal/interoperability/dpctl/dpctl_and_table.cpp Outdated
@samir-nasibli
Copy link
Copy Markdown
Contributor

@KulikovNikita please rename inteoperability.dpctl to inteoperability.sua_iface, because actually it is interop for sua iface, that support by dpctl.tensor and dpnp.ndarrays

@samir-nasibli
Copy link
Copy Markdown
Contributor

/intelci: run

@samir-nasibli
Copy link
Copy Markdown
Contributor

@KulikovNikita we have to make versioning for 2024.0.0 features, since this not available yet. That is why we have this build failures

@KulikovNikita KulikovNikita force-pushed the dev/dataframe-interchange-api branch 2 times, most recently from 03b4d73 to 66215c1 Compare November 19, 2023 09:36
@KulikovNikita KulikovNikita changed the title WIP: DataManagement update DataManagement update Nov 19, 2023
@KulikovNikita KulikovNikita marked this pull request as ready for review November 19, 2023 11:54
@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

1 similar comment
@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

@KulikovNikita
Copy link
Copy Markdown
Contributor Author

@KulikovNikita
Copy link
Copy Markdown
Contributor Author

@KulikovNikita KulikovNikita force-pushed the dev/dataframe-interchange-api branch from ad64186 to 783e704 Compare November 20, 2023 15:25
@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

2 similar comments
@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

@KulikovNikita
Copy link
Copy Markdown
Contributor Author

@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

1 similar comment
@KulikovNikita
Copy link
Copy Markdown
Contributor Author

/intelci: run

icfaust added a commit to icfaust/scikit-learn-intelex that referenced this pull request Jan 27, 2025
icfaust added a commit that referenced this pull request Mar 18, 2025
* start from #2195

* add apache license

* add files from #1568

* start converting over

* attempts to fix copyright checker

* remove table_metadata

* merge

* weird merge

* renaming

* change location

* will implement these elsewhere

* move files to follow naming

* change headers further

* interim standpoint which will fail

* interim changes

* helper -> utils

* move macro to a central spot

* remove whitespace

* commit before merge

* current status

* interim

* remove and format

* more fixes

* add fixes

* add fixes

* more fixes

* add fixes

* more fixes

* more fixes

* updates

* updates

* more fixes

* formatting

* formatting

* formatting

* changes

* move header include

* fix tensor issue

* type change

* move literals to see if it helps

* status

* remove inline

* move ordering in table.cpp

* missing whitespace

* change macro section

* add some commentary

* attempt to shorten with a macro

* first fixes for array_api_strict

* formatting

* remove unneeded code

* remove unneeded code

* oops

* make better logic

* attempt to include byte_offset

* convert to static_cast entirely

* back to reinterpret_cast

* add initial tests

* remove array api from contiguous test

* hide behind if statement, definite a TODO

* working on test case failures

* switch to length_error

* Update test_data.py

* Update test_data.py

* Update test_data.py

* retest with dpnp and dpctl

* fix stride counting

* begin testing strategy change

* add tests

* fix mistakes in test

* move class out of if statement

* Update test_validation.py

* remove pandas

* missing change

* add initial memory leak checking

* remove get_namespace

* remove dpc backend skip

* fix issue in get_namespace change

* further fixes

* attempt again

* address changes

* add recursion block on suggestion

* Update doc/third-party-programs-sklearnex.txt

Co-authored-by: Alexander Andreev <alexander.andreev@intel.com>

* add testing for emptys and simple types

* rewrite test

* attempt to solve empty and dlpack scalars

* deal with odd scenario

* fix some tests

* attempt at making things consistent

* oops

* try again

* try to swap

* Update test_data.py

* Update dlpack_utils.cpp

* Update test_data.py

* Update test_data.py

* Update test_data.py

* Update data_conversion.cpp

* Update data_conversion.cpp

* try again

* bad logic correction

* make consistent

* std -> py

* missing bracket

* additional fixes

* further homogenation

* try again

* fix dlpack again

* set a ticket up to solve this edge case

* Update test_data.py

* Update test_data.py

* Update dtype_conversion.cpp

* Update data_conversion.cpp

* Update data_conversion.cpp

* Update data_conversion.cpp

* Update dtype_conversion.cpp

* Update data_conversion.cpp

* Update test_data.py

* Update test_data.py

* Update test_data.py

* Update onedal/datatypes/dlpack/dlpack_utils.cpp

Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com>

* Update onedal/datatypes/dlpack/data_conversion.cpp

Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com>

* Update onedal/datatypes/sycl_usm/dtype_conversion.cpp

Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com>

* Update test_data.py

* Update test_data.py

* Update test_data.py

* Update test_data.py

* Update dlpack_utils.cpp

* Update dlpack_utils.cpp

* Update data_conversion.cpp

* Update data_conversion.cpp

* Update data_conversion.cpp

* Update data_conversion.cpp

* clang formatting

* fight segfault with dpctl

* Update data_conversion.cpp

* formatting

* Update test_data.py

* Update test_data.py

* formatting'

* Update dlpack_utils.cpp

---------

Co-authored-by: Alexander Andreev <alexander.andreev@intel.com>
Co-authored-by: Victoriya Fedotova <viktoria.nn@gmail.com>
@icfaust icfaust requested a review from Copilot June 27, 2025 14:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors and expands the data management and interoperability modules by decoupling data structures and isolating conversion functionality. Key modifications include:

  • Adding new interop submodules and updating the setup.py logic to include additional packages for newer oneDAL versions.
  • Incorporating data conversion wrappers for multiple protocols (PyBuffer, dlpack, sua) and updating SVM training to enforce device policy for sparse input.
  • Expanding the interoperability layer with extensive C++ and Python code changes across modules like dlpack, sua, buffer, and CSR table interop.

Reviewed Changes

Copilot reviewed 109 out of 109 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
setup.py Adds new packages for tests when ONEDAL_VERSION >= 20240001.
onedal/svm/svm.py Introduces a check to restrict sparse input on GPU and updates parameter passing for table conversion.
onedal/interop/utils/tensor_and_table.hpp Introduces utility functions for wrapping tensors to homogen tables using new interop patterns.
onedal/interop/utils/tensor_and_array.hpp Provides functions to wrap arrays from homogen tables with a temporary “Fixed for now” comment in the conversion routine.
onedal/interop/sua/* Implements extensive changes in the sua module to support new SYCL USM array interfaces using dpctl and dpnp fallbacks.
onedal/interop/dlpack/* Adds and refines dlpack conversion routines along with device conversion utilities.
onedal/interop/buffer/* Introduces new functions and utilities for buffer protocol–based interoperability and type conversion.
onedal/interop/csr_table.py Enhances conversion logic between native onedal CSR tables and SciPy sparse matrices.
onedal/interop/common.hpp Adds helper functions for endian detection and inverse map computations.
Comments suppressed due to low confidence (1)

onedal/interop/utils/tensor_and_array.hpp:137

  • [nitpick] Clarify or document the temporary fix with specific details about future improvements to avoid ambiguity for future developers.
    return tensor;

Comment thread onedal/svm/svm.py
@@ -272,8 +275,12 @@ def _fit(self, X, y, sample_weight, module, queue):
self._scale_, self._sigma_ = self._compute_gamma_sigma(self.gamma, X)

policy = _get_policy(queue, X, y, sample_weight)
Copy link

Copilot AI Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a brief comment explaining why sparse inputs are not supported on GPU to aid future maintainers.

Suggested change
policy = _get_policy(queue, X, y, sample_weight)
policy = _get_policy(queue, X, y, sample_weight)
# Sparse inputs are not supported on GPU due to limitations in efficient sparse matrix
# operations and compatibility issues with the underlying GPU libraries.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants