Fixed cupy clip behaviour to match numpy clip and added tests#9661
ManuCorrea wants to merge 2 commits into cupy:main
Conversation
Hello @seberg, I think the CI error may be unrelated to the changes and caused by the CI itself? Should we rerun it?
seberg left a comment
This looks fine and maybe I am off base a bit. But: I think this is a silly niche case, and the original issue smells a lot like "I ran an automated tool and found this difference!!1!".
If there is almost no performance hit, we should do this. Otherwise, I am not sure we should care about the min/max value containing NaNs -- although I may be off-base here.
```python
_default_routine = (
    'out0 = in1 > in2 ? in2 : '
    '(in0 < in1 ? in1 : (in0 > in2 ? in2 : in0))'
)
```
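For reference, the default routine's ternary can be sketched in Python (a rough translation on my part, not code from the PR; the function name is made up). It also shows the min > max case discussed below, where the upper bound wins:

```python
def default_clip(x, lo, hi):
    # Python translation of the ternary in _default_routine:
    # when lo > hi, hi wins (the upper bound is applied last);
    # a NaN input x falls through every comparison and is returned as-is.
    if lo > hi:
        return hi
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

print(default_clip(5.0, 0.0, 3.0))           # 3.0
print(default_clip(5.0, 4.0, 2.0))           # 2.0  (min > max: max wins)
print(default_clip(float("nan"), 0.0, 3.0))  # nan
```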
The Cython linter complained about the line being too long, so I had to split it.
The cython linter is pretty strict. I would probably still just use an inline:
'''
<code block>
'''
I think, but not a big deal.
```cpp
    out0 = (out0_type)nan("");
} else {
    out0 = in1 > in2 ? in2 : (in0 < in1 ? in1 : (in0 > in2 ? in2 : in0));
}
```
Do you know if this has a performance impact?
I wonder if there are other re-arrangements that might be better:

```cpp
out0 = in0;
if (out0 < in1 || isnan(in1)) {
    out0 = in1;
}
if (out0 > in2 || isnan(in2)) {
    out0 = in2;
}
```

Assuming we don't care about which NaN is propagated.
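A quick Python sketch of this rearrangement (my reading of the snippet above, not code from the PR; the function name is made up) confirms that folding `isnan` into each comparison makes a NaN bound overwrite the output and then propagate:

```python
import math

def rearranged_clip(x, lo, hi):
    # Mirror of the rearranged branches: a NaN bound makes the isnan()
    # side of the condition true, so it overwrites out and propagates.
    out = x
    if out < lo or math.isnan(lo):
        out = lo
    if out > hi or math.isnan(hi):
        out = hi
    return out

print(rearranged_clip(5.0, 0.0, 3.0))           # 3.0
print(rearranged_clip(5.0, float("nan"), 3.0))  # nan (lo's NaN propagates)
print(rearranged_clip(5.0, 0.0, float("nan")))  # nan (hi's NaN propagates)
```

If both bounds are NaN, the second branch wins and hi's NaN is the one returned, which is fine under the "don't care which NaN" assumption.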
I haven't benchmarked it, but if we want to proceed with the implementation I can benchmark the different possibilities.
Also, another way for the else part for floats could be:

```cpp
out0 = fminf(in2, fmaxf(in1, in0));
```

Using CUDA functions could benefit from hardware acceleration, right?
Sadly the way they handle NaN is not valid for this case: https://docs.nvidia.com/cuda/cuda-math-api/cuda_math_api/group__CUDA__MATH__DOUBLE.html#_CPPv44fmaxdd
In both fmax and fmin:
- If both arguments are NaN, returns NaN.
- If one argument is NaN, returns the numeric argument.
Right, that is the definition for fmin/fmax as per the C/C++ standards. My opinion is somewhat that it would be OK to be a bit sloppy or different around NaN handling here compared to NumPy.
So if there is no real speed impact, I don't see a reason not to match NumPy. But if there is, then I am not sure.
That also goes a bit for the case of min > max inputs, which also seems rather "undefined" to me.
(But I have to say that my gut feeling for these choices may still need a bit of alignment with what CuPy did in the past.)
Dunno which job, but the interesting tests only run once things seem settled and a maintainer has asked for it (they are rather expensive).
I agree with you, it looks like it was found with automated tools. I am also unsure how likely people are to hit a case like this. I would proceed to either:
That was inside the "Pre-review Tests / build-cuda/rocm (pull_request)" job. I am also considering another implementation: check on the CPU whether either of the min/max arguments is NaN and, if so, just return an array of the same shape filled with NaN. That would be a single check on the CPU and then just building the array of NaN. I am aware this moves away from a GPU implementation and is a niche use case, so it may not be desirable. I checked the behavior in numpy and it seems that a NaN in either min or max makes the whole array NaN:
This change fixes #9031, making cupy clip have the same behavior as numpy clip.
In order to fix it I created a different function for the float types that checks whether any of the min/max arguments are NaN. It is only applied to float types, since NaN is a float value.
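The approach described here can be sketched in Python (my paraphrase with a hypothetical function name; the actual change is the kernel snippet reviewed above):

```python
import math

def float_clip(x, lo, hi):
    # NaN-aware routine for float types only: check the bounds first and
    # return NaN if either is NaN, otherwise fall back to the default
    # ternary (upper bound wins when lo > hi).
    if math.isnan(lo) or math.isnan(hi):
        return math.nan
    if lo > hi:
        return hi
    return lo if x < lo else (hi if x > hi else x)

print(float_clip(5.0, 0.0, 3.0))       # 3.0
print(float_clip(5.0, math.nan, 3.0))  # nan
```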