2f1232c to
31ea520
I'm marking this as ready, but it depends on the mentioned PR.
ac398f0 to
a7828f8
Use ghcr.io/nvidia/k8s-device-plugin:1bb36583, which includes upstream fixes for WSL2 CDI spec compatibility (cdiVersion and device naming), removing the need for any local spec transformation. See NVIDIA/k8s-device-plugin#1671.

TODO: revert to chart-default image once a released version includes these fixes.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
a7828f8 to
8865107
/ok-to-test 8865107
On WSL2, NVIDIA GPUs are exposed through the DXG kernel driver (/dev/dxg) rather than the native nvidia* devices. CDI injects /dev/dxg as the sole GPU device node, plus GPU libraries under /usr/lib/wsl/.

has_gpu_devices() previously only checked for /dev/nvidiactl, which does not exist on WSL2, so GPU enrichment never ran. This meant /dev/dxg was never permitted by Landlock and /proc write access (required by CUDA for thread naming) was never granted.

Fix by:
- Extending has_gpu_devices() to also detect /dev/dxg
- Adding /dev/dxg to GPU_BASELINE_READ_WRITE (device nodes need O_RDWR)
- Adding /usr/lib/wsl to GPU_BASELINE_READ_ONLY for CDI-injected GPU library bind-mounts that may not be covered by the /usr parent rule across filesystem boundaries

The existing path existence check in enrich_proto_baseline_paths() ensures all new entries are silently skipped on native Linux where these paths do not exist.
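The detection and skip-if-missing behaviour described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in crates/openshell-sandbox/src/lib.rs: the `root` parameter, the helper names `rooted`, `has_gpu_devices_under` and `existing_paths`, and the abbreviated contents of the constants are assumptions made so the example can run against a scratch directory.

```rust
use std::path::{Path, PathBuf};

// Assumed marker list: any one of these existing means "a GPU is present".
// /dev/nvidiactl covers native Linux, /dev/dxg covers WSL2.
const GPU_MARKER_DEVICES: &[&str] = &["/dev/nvidiactl", "/dev/dxg"];

// Abbreviated, illustrative baselines (the real lists are longer).
const GPU_BASELINE_READ_WRITE: &[&str] = &["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/dxg"];
const GPU_BASELINE_READ_ONLY: &[&str] = &["/usr/lib/wsl"];

// Hypothetical helper: re-root an absolute path under `root` so the logic
// can be exercised without touching the real /dev.
fn rooted(root: &Path, abs: &str) -> PathBuf {
    root.join(abs.trim_start_matches('/'))
}

// Returns true if any marker device exists under `root`.
fn has_gpu_devices_under(root: &Path) -> bool {
    GPU_MARKER_DEVICES.iter().any(|p| rooted(root, p).exists())
}

// Keeps only the candidates that exist, so WSL2-only entries are silently
// dropped on native Linux and vice versa.
fn existing_paths(root: &Path, candidates: &[&str]) -> Vec<PathBuf> {
    candidates
        .iter()
        .map(|p| rooted(root, p))
        .filter(|p| p.exists())
        .collect()
}
```

Calling `has_gpu_devices_under(Path::new("/"))` would check the real system; on a WSL2 host only the /dev/dxg and /usr/lib/wsl entries would be expected to survive the filter.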
33834f3 to
482aae3
/ok-to-test 482aae3
64c9d25 to
126b554
/ok-to-test 126b554
9b5317e to
5c01d5b
/ok-to-test 5c01d5b
/ok-to-test 2105c21
2105c21 to
2bff2d8
/ok-to-test 2bff2d8
2bff2d8 to
482aae3
elezar commented Apr 16, 2026
    "/dev/nvidia-uvm",
    "/dev/nvidia-uvm-tools",
    "/dev/nvidia-modeset",
    "/dev/dxg", // WSL2: DXG device (GPU via DirectX kernel driver, injected by CDI)
@pimlock when considering Tegra-based systems as in #625, the list of device nodes (and other paths) is much longer and is also system-dependent, so I don't think hardcoding this list would be feasible. Would it be possible to process the container config instead to get the list of device nodes that we expect to access?
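One way to picture this suggestion is the sketch below. It is illustrative only: `ConfigDevice` and `device_allowlist` are hypothetical names, and it assumes the container config exposes device entries with a path (as OCI runtime configs do under `linux.devices`). Parsing the config itself is left out.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical, minimal view of one device entry taken from the container
/// config (only the path is modelled here).
struct ConfigDevice {
    path: PathBuf,
}

/// Builds a sorted, de-duplicated list of device nodes to allow, using the
/// config as the source of truth instead of a hardcoded list.
/// Entries outside /dev are ignored as a conservative assumption.
fn device_allowlist(devices: &[ConfigDevice]) -> Vec<PathBuf> {
    let unique: BTreeSet<PathBuf> = devices
        .iter()
        .map(|d| d.path.clone())
        .filter(|p| p.starts_with("/dev"))
        .collect();
    unique.into_iter().collect()
}
```

With this shape, a WSL2 container whose config lists only /dev/dxg and a Tegra container with a much longer device list would both be handled without changing the sandbox code.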
Summary
Adds GPU sandbox support for WSL2-based systems. On WSL2, NVIDIA GPUs are exposed through the DXG kernel driver (/dev/dxg) rather than the native nvidia* devices, and GPU libraries are injected by CDI into /usr/lib/wsl/ rather than standard Linux paths.

Two changes are required:
- Device plugin image pin: pins ghcr.io/nvidia/k8s-device-plugin to 1bb36583, which includes upstream fixes for WSL2 CDI spec compatibility (correct cdiVersion and device naming). Includes a TODO to revert once a released version includes these fixes. See NVIDIA/k8s-device-plugin#1671 (wsl: report a single "all" device to kubelet).
- Landlock baseline: has_gpu_devices() previously only checked for /dev/nvidiactl, which does not exist on WSL2, so GPU enrichment never ran. This left /dev/dxg (the WSL2 GPU device node) and /proc write access (required by CUDA for thread naming) unpermitted by Landlock. Fixes by extending GPU detection to also check /dev/dxg, adding it to the read-write baseline, and adding /usr/lib/wsl to the read-only baseline for CDI-injected GPU libraries.

The existing path existence checks in the enrichment logic ensure all new baseline entries are silently skipped on native Linux where these paths do not exist.
Related Issue
Closes #404
Depends on #495 and #503.
Changes
- deploy/kube/gpu-manifests/nvidia-device-plugin-helmchart.yaml: pin device plugin image to ghcr.io/nvidia/k8s-device-plugin:1bb36583
- crates/openshell-sandbox/src/lib.rs: extend has_gpu_devices() to detect /dev/dxg; add /dev/dxg to GPU_BASELINE_READ_WRITE and /usr/lib/wsl to GPU_BASELINE_READ_ONLY
Testing
- mise run pre-commit passes
- test_gpu_sandbox_reports_available_gpu passes
Checklist