49763 – Sub-optimal optimization of abs(x) % n

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 49763 - Sub-optimal optimization of abs(x) % n

Summary: Sub-optimal optimization of abs(x) % n

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	All All

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-03-29 13:13 PDT by Jeremy R.
Modified:	2021-04-03 07:02 PDT (History)
CC List:	4 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jeremy R. 2021-03-29 13:13:28 PDT

LLVM does not find all optimizations related to taking the remainder of the
absolute value of a signed number.

The following equivilent expressions result in sub-optimal code generation:
    x % 2 == 1 || x % 2 == -1
    std::abs(x % 2) == 1
    std::abs(x) % 2 == 1

LLVM may be ignoring poison values for std::abs / @llvm.abs.* in making
optimizations for the remainder operation. Alternatively, LLVM may not be taking
the absolute value into account when performing algebraic operations on the
remainder instructions.

If x is guaranteed to be positive (e.g. it is unsigned or an induction variable),
LLVM does produce expected codegen. When std::abs is present, LLVM should be able
to take advantage of the remainder operation being odd (i.e. -x % n = -(x % n)).

The expected codegen is that of x % 2 != 0, x & 1, where LLVM generates a single
and instruction.

Godbolt comparison of these expressions: https://godbolt.org/z/1YPW1z4v7

Comment 1 Jeremy R. 2021-03-29 14:37:27 PDT

I think I was mistaken when I speculated about a problem with ignoring poison values from the absolute value calculation. It may be as simple as not optimizing abs(x) & 1 -> x & 1 and abs(x & 1) -> x & 1.

Comment 2 Sanjay Patel 2021-03-31 05:44:07 PDT

This should help the examples with "abs":
https://reviews.llvm.org/rGc2ebad8d55bd

The examples with "srem" are trickier. Clearly, we are missing these basic transforms in IR:
https://alive2.llvm.org/ce/z/BagXUu
https://alive2.llvm.org/ce/z/aapvrz

But we would have to bend our normal value usage rules to get those most basic folds to fire on the example with 2 uses of the srem result. 

That might be easier to justify as a backend (codegen) optimization.

Comment 3 Sanjay Patel 2021-03-31 06:27:31 PDT

Here's another missed fold in IR (not sure if there's a way to generalize this any further):
https://alive2.llvm.org/ce/z/xEHdTv

Comment 4 Sanjay Patel 2021-03-31 07:35:04 PDT

Here's a potential generalization of the above:
https://alive2.llvm.org/ce/z/Cav8Pv

; replace abs-of-srem-by-positive-divisor with urem-of-abs
define i5 @src(i5 %x, i5 %y) {
  %d = call i5 @llvm.abs.i5(i5 %y, i1 false) ; int min arg doesn't matter
  %s = srem i5 %x, %d
  %r = call i5 @llvm.abs.i5(i5 %s, i1 false) ; int min arg doesn't matter
  ret i5 %r
}


define i5 @tgt(i5 %x, i5 %y) {
  %d = call i5 @llvm.abs.i5(i5 %y, i1 false)
  %a = call i5 @llvm.abs.i5(i5 %x, i1 false)
  %r = urem i5 %a, %d
  ret i5 %r
}

-------------------------------------------------------------------------------

As with many folds, there's a trade-off that we have to limit the power of the transform based on number of uses the more that we generalize (because we need to create extra instructions to keep the transform valid).

If "x % 2" is the special and common case, we might as well just try to get that unless/until it is shown that we need to try harder.

Comment 5 Sanjay Patel 2021-04-01 06:50:02 PDT

(In reply to Sanjay Patel from comment #3)
> Here's another missed fold in IR (not sure if there's a way to generalize
> this any further):
> https://alive2.llvm.org/ce/z/xEHdTv

Committed as:
https://reviews.llvm.org/rG1462bdf1b985


So if I'm seeing it correctly, that leaves just the "is_odd_3" example as sub-optimal. That needs something like what I suggested in comment 2.

Comment 6 Jeremy R. 2021-04-02 15:46:24 PDT

(In reply to Sanjay Patel from comment #5)
> (In reply to Sanjay Patel from comment #3)
> > Here's another missed fold in IR (not sure if there's a way to generalize
> > this any further):
> > https://alive2.llvm.org/ce/z/xEHdTv
> 
> Committed as:
> https://reviews.llvm.org/rG1462bdf1b985
> 
> 
> So if I'm seeing it correctly, that leaves just the "is_odd_3" example as
> sub-optimal. That needs something like what I suggested in comment 2.

If I'm not mistaken, the abs (srem X, 2) --> and X, 1 transformation can be made more generally as abs (srem X, n) --> urem X, n. The urem X, 2 special case already has a transformation --> and X, 1.

Comment 7 Jeremy R. 2021-04-02 15:51:49 PDT

The abs (srem X, n) --> urem X, n transformation makes most sense to me though I understand your point regarding limiting the power of the transformation. I don't have any examples right off-hand that would benefit from this more powerful transform.

Comment 8 Sanjay Patel 2021-04-03 07:02:53 PDT

(In reply to Jeremy Morgan Rifkin from comment #6) 
> If I'm not mistaken, the abs (srem X, 2) --> and X, 1 transformation can be
> made more generally as abs (srem X, n) --> urem X, n.

That's almost the same as what I was suggesting in comment 4, but the transform is only valid if we can guarantee that X is positive.

This might make it clearer than my earlier example:
https://alive2.llvm.org/ce/z/ASha97

If you comment out the 'assume' line, the transform doesn't hold.