Open
ROCm 7.2 added native __shfl_*_sync and __any_sync functions that accept a 64-bit mask (unsigned long long), matching the 64-lane wavefront. Prior ROCm versions used macros in hip_workaround.cuh that stripped the mask entirely.
- Guard the __shfl_*_sync compatibility macros behind HIP_VERSION < 70200000 so they do not conflict with the native declarations
- Use ~0ULL as the full-lane mask in Cython kernel templates for scan, sort, and nonzero operations (covers both wf32 and wf64)
- Cast the user-supplied mask to unsigned long long in the JIT rawkernel path when targeting ROCm 7.2+
- Query actual device warp size via _get_warpsize() for the JIT default shuffle width instead of assuming 64
- Derive the test mask from the queried warp size so the test is portable across wf32 and wf64 hardware
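The version threshold and the warp-size-derived mask mentioned in this commit message can be illustrated with a small Python sketch. It is not code from this PR: `hip_version_code` and `full_lane_mask` are hypothetical helper names, and the version encoding (major * 10,000,000 + minor * 100,000 + patch) is assumed from the `70200000` constant used for ROCm 7.2 above.

```python
def hip_version_code(major, minor, patch=0):
    """Encode a version the way the 70200000 threshold implies (assumption)."""
    return major * 10_000_000 + minor * 100_000 + patch


def full_lane_mask(warp_size):
    """Return an integer with one bit set per lane of the warp/wavefront."""
    if warp_size not in (32, 64):
        raise ValueError(f'unexpected warp size: {warp_size}')
    return (1 << warp_size) - 1


# The compatibility macros would only apply below this threshold.
needs_compat_macros = hip_version_code(7, 1) < hip_version_code(7, 2)
```

Deriving the mask from the queried warp size keeps a test valid on both wf32 and wf64 devices, whereas a hard-coded 32-bit mask would leave the upper 32 lanes of a 64-lane wavefront unselected.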
…arlo example
The monte_carlo kernel preamble had a '#ifndef __HIPCC__' guard around 'typedef unsigned long long uint64_t', which prevented the typedef from being defined when compiling with hipcc. Remove the guard so the type is available on both CUDA and HIP.
The Cython declaration of cudaTextureDesc used plain 'int' for the addressMode, filterMode, and readMode fields. On HIP these are distinct enum types (TextureAddressMode, TextureFilterMode, TextureReadMode), causing type mismatch warnings and potential undefined behavior. Use the proper enum types to match the C header.
Contributor
Author
/test rocm
leofang requested changes Mar 6, 2026
Comment on lines +480 to +483
# ROCm 7.2+ requires a 64-bit mask type for __shfl_*_sync / __any_sync.
# ~0ULL sets all bits, covering any wavefront size (32 or 64).
_full_mask = ('~0ULL'
              if _runtime.is_hip else '0xffffffff')
Member
Let's consolidate these in e.g. cupy/_core/_kernel.pyx or somewhere else, and import from there. For example, there is already a _get_warpsize.
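One possible shape for the consolidation suggested here is a single helper that every kernel template imports, instead of repeating the conditional in each file. This is only a sketch: `full_mask_literal` and `render_any_sync` are hypothetical names, and the real helper would presumably read the HIP flag from the runtime module rather than take it as an argument.

```python
def full_mask_literal(is_hip):
    """Return the C source literal for an all-lanes mask.

    '~0ULL' sets every bit of a 64-bit mask, so it covers wavefront sizes
    of 32 and 64; CUDA keeps the usual 32-bit literal.
    """
    return '~0ULL' if is_hip else '0xffffffff'


def render_any_sync(is_hip, predicate='pred'):
    """Illustrative template use: build an __any_sync call as source text."""
    return f'__any_sync({full_mask_literal(is_hip)}, {predicate})'
```

With one definition, the two snippets under review (which test `_runtime.is_hip` and `runtime._is_hip_environment` respectively) would share the same condition.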
Member
btw for my own curiosity:
- Is this change universal to all ROCm-supported devices, or is it device-dependent?
- Does this change break older (<7.2) ROCm users?
Comment on lines +158 to +161
# ROCm 7.2+ requires a 64-bit mask type for __shfl_*_sync / __any_sync.
# ~0ULL sets all bits, covering any wavefront size (32 or 64).
_full_mask = ('~0ULL'
              if runtime._is_hip_environment else '0xffffffff')
Comment on lines +384 to +388
_hip_72_plus = (runtime.is_hip
                and driver.get_build_version() >= 7_02_00000)
if runtime.is_hip and not _hip_72_plus:
    warnings.warn(f'mask {mask} is ignored on HIP', RuntimeWarning)
elif not (0x0 <= mask <= 0xffffffff):
    max_mask = 0xffffffffffffffff if _hip_72_plus else 0xffffffff
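The diff excerpt above is cut off after `max_mask`, so the rest of the check is not visible here. As a self-contained sketch of how such a validation could read, assuming the mask is range-checked against `max_mask` and that an out-of-range value raises `ValueError` (both assumptions, as is the function name `check_shfl_mask`):

```python
import warnings

HIP_7_2 = 7_02_00000  # same constant as in the diff; equals 70200000


def check_shfl_mask(mask, is_hip, build_version):
    """Validate a shuffle mask; return it, or None when it is ignored."""
    hip_72_plus = is_hip and build_version >= HIP_7_2
    if is_hip and not hip_72_plus:
        # Older ROCm: the compatibility macros drop the mask entirely.
        warnings.warn(f'mask {mask} is ignored on HIP', RuntimeWarning)
        return None
    # ROCm 7.2+ takes a 64-bit mask; CUDA takes a 32-bit mask.
    max_mask = 0xffffffffffffffff if hip_72_plus else 0xffffffff
    if not (0x0 <= mask <= max_mask):
        # Hypothetical error handling; the PR's actual behavior is not shown.
        raise ValueError(f'mask {mask:#x} is out of range (max {max_mask:#x})')
    return mask
```

Taking `is_hip` and `build_version` as parameters, rather than querying the runtime inside the function, keeps the sketch runnable without a GPU.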
Many crashes are caused by passing an incorrect mask parameter to some of the shfl intrinsics when a full mask is requested. This PR adds HIP support by defining a full mask for the case where the warp/wavefront size is 64.