URL: http://github.com/cupy/cupy/pull/9748

Fix hip mask errors by gpinkert · Pull Request #9748 · cupy/cupy · GitHub

Fix hip mask errors #9748

Open
gpinkert wants to merge 3 commits into cupy:main from ROCm:fix-hip-mask-errors

Conversation

@gpinkert
Contributor

Many crashes are caused by passing an incorrect parameter to some of the shfl intrinsics when setting a full mask. This PR adds HIP support by defining a full mask for a warp/wavefront size of 64.

ROCm 7.2 added native __shfl_*_sync and __any_sync functions that
accept a 64-bit mask (unsigned long long), matching the 64-lane
wavefront.  Prior ROCm versions used macros in hip_workaround.cuh
that stripped the mask entirely.

- Guard the __shfl_*_sync compatibility macros behind HIP_VERSION
  < 70200000 so they do not conflict with the native declarations
- Use ~0ULL as the full-lane mask in Cython kernel templates for
  scan, sort, and nonzero operations (covers both wf32 and wf64)
- Cast the user-supplied mask to unsigned long long in the JIT
  rawkernel path when targeting ROCm 7.2+
- Query actual device warp size via _get_warpsize() for the JIT
  default shuffle width instead of assuming 64
- Derive the test mask from the queried warp size so the test is
  portable across wf32 and wf64 hardware
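
As a rough illustration of the last bullet, a full-lane mask for a given wavefront size can be computed on the host as shown below. This is a minimal sketch, not the PR's code: `full_lane_mask` is a hypothetical helper, and the warp size is passed in as a plain integer instead of being queried via `_get_warpsize()`.

```python
def full_lane_mask(warp_size: int) -> int:
    """Return an integer with one bit set per lane (hypothetical helper)."""
    if warp_size not in (32, 64):
        raise ValueError(f"unexpected warp size: {warp_size}")
    return (1 << warp_size) - 1


# wf32 hardware needs 32 set bits, wf64 hardware needs 64.
mask32 = full_lane_mask(32)
mask64 = full_lane_mask(64)
print(hex(mask32), hex(mask64))
```

A 64-bit all-ones value (what `~0ULL` evaluates to in C) equals `full_lane_mask(64)` and has every bit of `full_lane_mask(32)` set, which is why a single literal can cover both wavefront sizes.
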
…arlo example

The monte_carlo kernel preamble had a '#ifndef __HIPCC__' guard around
'typedef unsigned long long uint64_t', which prevented the typedef from
being defined when compiling with hipcc.  Remove the guard so the type
is available on both CUDA and HIP.

The Cython declaration of cudaTextureDesc used plain 'int' for the
addressMode, filterMode, and readMode fields.  On HIP these are
distinct enum types (TextureAddressMode, TextureFilterMode,
TextureReadMode), causing type mismatch warnings and potential
undefined behavior.  Use the proper enum types to match the C header.
@gpinkert requested a review from a team as a code owner on February 25, 2026, 01:31
@gpinkert
Contributor Author

/test rocm

Comment on lines +480 to +483
# ROCm 7.2+ requires a 64-bit mask type for __shfl_*_sync / __any_sync.
# ~0ULL sets all bits, covering any wavefront size (32 or 64).
_full_mask = ('~0ULL'
              if _runtime.is_hip else '0xffffffff')
Member

Let's consolidate these in e.g. cupy/_core/_kernel.pyx or somewhere else, and import from there. For example, there is already a _get_warpsize.
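
One possible shape for such a consolidation is sketched below. The function name `full_mask_literal` and the boolean `is_hip` parameter are assumptions made for illustration; a real helper would presumably read the runtime flag itself and live wherever the maintainers decide.

```python
def full_mask_literal(is_hip: bool) -> str:
    """Return the C literal used as the full-lane mask in kernel source.

    Hypothetical helper: mirrors the expression repeated in the diff.
    """
    # ~0ULL is a 64-bit all-ones value; 0xffffffff covers a 32-lane warp.
    return '~0ULL' if is_hip else '0xffffffff'


# Illustrative template only; not taken from the CuPy sources.
_TEMPLATE = 'x = __shfl_up_sync({mask}, x, 1);'

hip_line = _TEMPLATE.format(mask=full_mask_literal(True))
cuda_line = _TEMPLATE.format(mask=full_mask_literal(False))
```

Each kernel template would then import the one helper instead of repeating the conditional expression.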

Member

btw for my own curiosity:

  1. Is this change universal to all ROCm-supported devices, or is it device-dependent?
  2. Does this change break older (<7.2) ROCm users?

Comment on lines +158 to +161
# ROCm 7.2+ requires a 64-bit mask type for __shfl_*_sync / __any_sync.
# ~0ULL sets all bits, covering any wavefront size (32 or 64).
_full_mask = ('~0ULL'
              if runtime._is_hip_environment else '0xffffffff')
Member

ditto

Comment on lines +384 to +388
_hip_72_plus = (runtime.is_hip
                and driver.get_build_version() >= 7_02_00000)
if runtime.is_hip and not _hip_72_plus:
    warnings.warn(f'mask {mask} is ignored on HIP', RuntimeWarning)
elif not (0x0 <= mask <= 0xffffffff):
max_mask = 0xffffffffffffffff if _hip_72_plus else 0xffffffff
Member

ditto
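
The branch logic in this hunk can be read as the following standalone sketch. Everything here is an assumption layered on the visible lines: `check_mask` is a hypothetical function, `is_hip` and `build_version` stand in for `runtime.is_hip` and `driver.get_build_version()`, and raising `ValueError` for an out-of-range mask is a guess, since the hunk does not show what happens after `max_mask` is computed.

```python
import warnings

HIP_72 = 7_02_00000  # version threshold used in the diff (ROCm 7.2)


def check_mask(mask: int, is_hip: bool, build_version: int) -> None:
    """Validate a shuffle mask (hypothetical sketch, not CuPy's code)."""
    hip_72_plus = is_hip and build_version >= HIP_72
    if is_hip and not hip_72_plus:
        # Older ROCm: the compatibility macros strip the mask.
        warnings.warn(f'mask {mask} is ignored on HIP', RuntimeWarning)
        return
    max_mask = 0xffffffffffffffff if hip_72_plus else 0xffffffff
    if not (0x0 <= mask <= max_mask):
        # Assumed behavior; the error type is not shown in the diff.
        raise ValueError(f'mask {mask:#x} is out of range')
```

Under these assumptions, a 64-bit mask passes on ROCm 7.2+, is rejected on CUDA, and triggers only a warning on older ROCm.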

@leofang self-assigned this on Mar 6, 2026

Labels

cat:bug (Bugs), prio:high, to-be-backported (Pull-requests to be backported to stable branch)

Development

Successfully merging this pull request may close these issues.

ROCm 7.x: Boolean indexing fails to compile on CuPy 14.0.1 due to 32-bit mask passed to __shfl_*_sync

3 participants
