gh-146393: Optimize float division operations by mutating uniquely-referenced operands in place (JIT only) #146397
eendebakpt wants to merge 18 commits into python:main
…izer

Add inplace float true division ops that the tier 2 optimizer emits when at least one operand is a known float:
- _BINARY_OP_TRUEDIV_FLOAT_INPLACE (unique LHS)
- _BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT (unique RHS)

The optimizer inserts _GUARD_TOS_FLOAT / _GUARD_NOS_FLOAT for operands not yet known to be float, enabling specialization in expressions like `(a + b) / c`.

Also marks the result of all NB_TRUE_DIVIDE operations as unique float in the abstract interpreter, enabling downstream inplace ops even for generic `a / b` (the `+=` can reuse the division result).

Speeds up chain division patterns by ~2.3x and simple `total += a/b` by ~1.5x.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Operations that always return a new float (true division, float**int, int**negative_int, mixed int/float arithmetic) now mark their result with PyJitRef_MakeUnique. This enables downstream operations to mutate the result in place instead of allocating a new float.

Int results are NOT marked unique because small ints are cached/immortal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
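The result types this commit relies on can be checked from plain Python. A small sketch (the `results` dictionary and its labels are illustrative, not part of the PR):

```python
# Operations on int/float operands that always produce a float result.
results = {
    "int / int": 7 / 2,
    "float ** int": 2.0 ** 3,
    "int ** negative int": 2 ** -1,
    "int + float": 1 + 2.5,
}

for label, value in results.items():
    print(f"{label}: {type(value).__name__} {value!r}")

# By contrast, int ** non-negative int stays an int, so it does not
# belong to the "always returns a new float" group.
int_power = 2 ** 3
print(f"int ** int: {type(int_power).__name__} {int_power!r}")
```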
Only set the result of NB_TRUE_DIVIDE to float when both operands are known int/float. Types like Fraction and Decimal override __truediv__ and return non-float results. The unconditional type propagation caused _POP_TOP_FLOAT to be emitted for Fraction results, crashing with an assertion failure.

Fixes the segfault in test_math.testRemainder and test_random.test_binomialvariate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
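To illustrate why a true division cannot be assumed to yield a float unless both operands are known int/float, here is a short sketch using only the standard library (variable names are illustrative):

```python
from decimal import Decimal
from fractions import Fraction

# Fraction and Decimal define their own __truediv__ / __rtruediv__.
frac = Fraction(1, 3) / 2            # Fraction / int -> Fraction
reflected = 2 / Fraction(1, 3)       # int / Fraction -> Fraction (via __rtruediv__)
dec = Decimal("1.5") / Decimal(2)    # Decimal / Decimal -> Decimal

# Only when both operands are int/float is the result guaranteed to be a float.
plain = 1 / 3

for name, value in [("frac", frac), ("reflected", reflected),
                    ("dec", dec), ("plain", plain)]:
    print(f"{name}: {type(value).__name__}")
```

Note that `reflected` has an int on the left-hand side and still does not produce a float, so knowing the type of a single operand is not enough.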
@eendebakpt we shouldn't speculatively add guards if we don't have history that they are actually floats, otherwise this will cause bad perf for overloaded binary ops. The fix is to record the binop types at trace recording time to speculate on. Check out for example
I removed the speculative guards. Updated benchmarks:
Improving the We could remove the
Please update main and fix the merge conflicts. Thanks!
…oat ops

The merge of main (which added the int-int inplace ops and regenerated pycore_uop_ids.h) overwrote the truediv op IDs that were added in this branch. This caused a compilation failure in JIT builds because executor_cases.c.h references _BINARY_OP_TRUEDIV_FLOAT (ID 333), _BINARY_OP_TRUEDIV_FLOAT_INPLACE (334) and _BINARY_OP_TRUEDIV_FLOAT_INPLACE_RIGHT (335), which were absent from the header.
We optimize float divisions for the case where one of the operands is a unique reference. This is similar to #146307, but with a guard for division by zero.
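The idea can be pictured with a toy model in Python. This is only a sketch under loud assumptions: `FloatBox`, its `refcount` field and the `truediv` helper are hypothetical stand-ins and not CPython code. In the PR the optimizer chooses the uop ahead of time, whereas this toy checks uniqueness at run time, and it simply raises on a zero divisor.

```python
class FloatBox:
    """Hypothetical stand-in for a float object with an explicit refcount."""

    def __init__(self, value, refcount=1):
        self.value = value
        self.refcount = refcount


def truediv(lhs, rhs):
    # Guard first: no operand may be mutated if the division cannot succeed.
    if rhs.value == 0.0:
        raise ZeroDivisionError("float division by zero")
    quotient = lhs.value / rhs.value
    if lhs.refcount == 1:
        # Unique LHS: reuse its storage for the result.
        lhs.value = quotient
        return lhs
    if rhs.refcount == 1:
        # Unique RHS: reuse its storage for the result.
        rhs.value = quotient
        return rhs
    # Both operands are shared: allocate a fresh result.
    return FloatBox(quotient)


temp = FloatBox(6.0)                # e.g. the result of a prior a + b
shared = FloatBox(2.0, refcount=2)  # e.g. a value also held by a local
out = truediv(temp, shared)
print(out is temp, out.value, shared.value)
```

Checking the divisor before writing anything matters because an in-place update cannot be undone once the operand has been overwritten.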
`_BINARY_TRUEDIV_FLOAT`, used where there are no unique references (or where uniqueness information is missing), brings no performance improvement by itself; its purpose is better type propagation. This opcode has guards, so the type is propagated even when the inputs come from locals.

Micro-benchmarks (min of 3 runs, 2M iterations)

Update: benchmark no longer valid (see a new one below)
Patterns benchmarked: `(a+b) * c`, `(a+b) + (c+d)`, `a / b`, `(a+b) / c`, `(2.0+x) / y`, `c / (a+b)`, `(a/b) / (c/d)`, `(a/b) + (c/d)`

All patterns are `total += <expr>` in a tight loop.

Benchmark script
Analysis
The inplace truediv kicks in when at least one operand is a uniquely-referenced float (e.g. the result of a prior add/multiply). The optimizer emits `_BINARY_OP_TRUEDIV_FLOAT_INPLACE` or `_INPLACE_RIGHT`, saving one `PyFloat_FromDouble` allocation + deallocation per iteration.

The optimization works well for several cases. For some (e.g. `(a/b) + (c/d)`) the performance gain comes not from an inplace division but from better type propagation, which allows the `+` to be specialized inplace. The `a / b` case is also faster because of better type propagation and the `+=` in the test script.

In typical code, intermediate results are often stored in local variables. For these cases it is important to pick up (speculative) type information as soon as possible.
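For readers who want to try patterns like these themselves, here is a minimal timing sketch. It is not the benchmark script from this PR: the function names, the operand values and the `best_of` helper are assumptions, and it only mirrors the stated setup (`total += <expr>` in a tight loop, minimum of 3 runs, 2M iterations).

```python
import timeit


def chain_div(n, a=1.5, b=2.5, c=3.5):
    # (a + b) yields a temporary float that the division could reuse.
    total = 0.0
    for _ in range(n):
        total += (a + b) / c
    return total


def simple_div(n, a=1.5, b=2.5):
    # Both operands are locals; only the += can reuse the division result.
    total = 0.0
    for _ in range(n):
        total += a / b
    return total


def best_of(func, n, repeats=3):
    # Minimum wall-clock time over several runs of func(n).
    return min(timeit.repeat(lambda: func(n), number=1, repeat=repeats))


if __name__ == "__main__":
    for func in (chain_div, simple_div):
        print(f"{func.__name__}: {best_of(func, 2_000_000):.3f} s")
```

Any difference only shows up on a JIT-enabled build, since the PR title limits the optimization to the JIT; on other builds the two interpreters being compared should behave the same for these loops.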