Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to simplify -min(-a, a) => max(-a, a) #38828

Closed
RKSimon opened this issue Oct 29, 2018 · 17 comments
Closed

Failure to simplify -min(-a, a) => max(-a, a) #38828

RKSimon opened this issue Oct 29, 2018 · 17 comments
Labels
bugzilla Issues migrated from bugzilla llvm:codegen

Comments

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 29, 2018

Bugzilla Link 39480
Version trunk
OS Windows NT
CC @adibiagio,@filcab,@LebedevRI,@rotateright,@Tilka
Fixed by commit(s) 664e178 5060f56 8878b79 3885207 01a6c4b 59387c0 16c642f 0e1241a 132be1f 1413576 a512c89

Extended Description

#include <algorithm>
#include <x86intrin.h>

int min_neg_s32(int a) {
    return -std::min(-a, a);
}

__m128i min_neg_v4s32(__m128i a) {
    return _mm_sub_epi32(_mm_setzero_si128(), _mm_min_epi32(_mm_sub_epi32(_mm_setzero_si128(), a), a));
}

float min_neg_f32(float a) {
    return -std::min(-a, a);
}

https://godbolt.org/z/h4zTcS

clang:

_Z11min_neg_s32i: # @_Z11min_neg_s32i
  movl %edi, %eax
  negl %eax
  cmpl %edi, %eax
  cmovgl %edi, %eax
  negl %eax
  retq
_Z13min_neg_v4s32Dv2_x: # @_Z13min_neg_v4s32Dv2_x
  vpxor %xmm1, %xmm1, %xmm1
  vpsubd %xmm0, %xmm1, %xmm2
  vpminsd %xmm0, %xmm2, %xmm0
  vpsubd %xmm0, %xmm1, %xmm0
  retq
_Z11min_neg_f32f: # @_Z11min_neg_f32f
  vmovaps .LCPI2_0(%rip), %xmm1 # xmm1 = [-0,-0,-0,-0]
  vxorps %xmm1, %xmm0, %xmm2
  vminss %xmm2, %xmm0, %xmm0
  vxorps %xmm1, %xmm0, %xmm0
  retq

gcc manages the scalar cases (s32 is interesting....) but not the vector case:

_Z11min_neg_s32i:
  movl %edi, %eax
  cltd
  xorl %edx, %eax
  subl %edx, %eax
  ret
_Z13min_neg_v4s32Dv2_x:
  vpxor %xmm1, %xmm1, %xmm1
  vpsubd %xmm0, %xmm1, %xmm2
  vpminsd %xmm0, %xmm2, %xmm0
  vpsubd %xmm0, %xmm1, %xmm0
  ret
_Z11min_neg_f32f:
  vmovss %xmm0, %xmm0, %xmm1
  vxorps .LC0(%rip), %xmm0, %xmm0
  vmaxss %xmm1, %xmm0, %xmm0
  ret
@rotateright
Copy link
Contributor

int min_neg_s32(int a)
{
return -std::min(-a, a);
}

That's a hard way to write 'std::abs':
https://rise4fun.com/Alive/Cha

%negx = sub i32 0, %x
%cmp = icmp sgt i32 %negx, %x
%min = select i1 %cmp, i32 %x, i32 %negx
%r = sub i32 0, %min
=>
%cmp2 = icmp slt i32 %x, 0
%r = select i1 %cmp2, i32 %negx, i32 %x

@Tilka
Copy link

Tilka commented Nov 6, 2019

-min(-x, x) can be generalized to -min(-x, y) => max(x, -y) (https://rise4fun.com/Alive/FBpl) but requires nsw at least on the min negation. The opposite case only works for y = x: -max(-x, x) => max(-x, x) (https://rise4fun.com/Alive/FDjL) but works for any combination of nsw flags.

@adibiagio
Copy link
Collaborator

Not sure about the nsw/nuw flags, but according to Alive this is correct too:

; -min(-x, -y) => max(x, y)

%negx = sub nsw 0, %x
%negy = sub nsw 0, %y
%cmp1 = icmp slt %negx, %negy
%sel1 = select %cmp1, %negx, %negy
%r = sub nsw 0, %sel1
=>
%cmp2 = icmp sgt %x, %y
%r = select %cmp2, %x, %y

I've noticed this while looking at the codegen of functions in llvm/test/CodeGen/X86/vector-reduce-umax.ll

Specifically, the sequence:

; X64-SSE2-NEXT: movdqa {{.*#+}} xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm0

Which is basically doing a - MIN (-A, -B)

And could be rewritten as a simple MAX(A, B):

; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1

@adibiagio
Copy link
Collaborator

Acutally sorry, the original code from that test was trying to do float negations (doing a xor of the sign bit).

Anyway, this is the Alive link: https://rise4fun.com/Alive/5Y7

@Tilka
Copy link

Tilka commented Feb 7, 2020

Note that

-min(-x, y) => max(x, -y)

is more general than

-min(-x, -y) => max(x, y)

it just requires an extra -(-y) => y step.

@LebedevRI
Copy link
Member

Nowadays with the help of Negator we get:

define dso_local i32 @​_Z11min_neg_s32i(i32 %0) local_unnamed_addr #​0 {
%2 = sub nsw i32 0, %0
%3 = icmp sgt i32 %2, %0
%4 = select i1 %3, i32 %2, i32 %0
ret i32 %4
}

So i think the transform we are missing is

Name: (-x) s> x --> x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sgt i8 %neg_x, %x
=>
%r = icmp slt i8 %x, 0

Name: (-x) s>= x --> x s< 1
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sge i8 %neg_x, %x
=>
%r = icmp slt i8 %x, 1

  • commutative variants. I'm not seeing a generalization for (-x) s?? y case.

https://rise4fun.com/Alive/moSi

Ignoring the question of intrinsics, i can take a look

@LebedevRI
Copy link
Member

Added folds for (-NSW x) pred x --> x pred' 0 folds.
We now seem to handle the original (integer!) case well:
https://godbolt.org/z/qPPn97

@LebedevRI
Copy link
Member

Added folds for (-NSW x) pred x --> x pred' 0 folds.
We now seem to handle the original (integer!) case well:
https://godbolt.org/z/qPPn97

... but don't handle the vector case, since it lacks NSW.
I think it should be done after canonicalization to the new intrinsics.

@LebedevRI
Copy link
Member

Hm, i lost track here. Let me revisit this...

@rotateright
Copy link
Contributor

Hm, i lost track here. Let me revisit this...

There are multiple examples here, so I'm not sure what's left either.

On the FP side, we probably just need to hoist an fneg through a select (fneg is a unary op, so it misses the binop transforms we would try with a select operand):
https://alive2.llvm.org/ce/z/7FKWLP

We want to make sure that FMF get propagated through those transforms, so it needs a pile of tests. Let me know if I should work on that part.

@LebedevRI
Copy link
Member

Note that

-min(-x, y) => max(x, -y)

is more general than

-min(-x, -y) => max(x, y)

it just requires an extra -(-y) => y step.

And once again, i'm not sure what specific transform we are missing?

-min(-x, y) => max(x, -y)
... seems to not be valid:
https://alive2.llvm.org/ce/z/5grsyy

Hm, i lost track here. Let me revisit this...

There are multiple examples here, so I'm not sure what's left either.

On the FP side, we probably just need to hoist an fneg through a select
(fneg is a unary op, so it misses the binop transforms we would try with a
select operand):
https://alive2.llvm.org/ce/z/7FKWLP

We want to make sure that FMF get propagated through those transforms, so it
needs a pile of tests. Let me know if I should work on that part.

@LebedevRI
Copy link
Member

Posted too soon.

@​Sanjay yes, i don't intend to touch FP side of things, feel free to.

@LebedevRI
Copy link
Member

Ok, i'm seeing one missed Negator pattern:
-({u,s}{min,max}(-x, x)) -> -({u,s}{max,min}(-x, x))
https://alive2.llvm.org/ce/z/xyP9U2
https://alive2.llvm.org/ce/z/uDVZGV
https://alive2.llvm.org/ce/z/wfMJ4-
https://alive2.llvm.org/ce/z/vrrc-5

@rotateright
Copy link
Contributor

FP neg fold:
https://reviews.llvm.org/rG3cdd05e519dd

I think that gives us the optimal code for the example in the original description:

float min_neg_f32(float a)
{
return -std::min(-a, a);
}

-->
vxorps LCPI2_0(%rip), %xmm0, %xmm1
vmaxss %xmm0, %xmm1, %xmm0

This isn't fabs because of -0.0, but we miss that too even with -ffast-math, so that's another bug...

@rotateright
Copy link
Contributor

Filled in another missing FP min/max transform:
https://reviews.llvm.org/rG1e9b6b89a7b5

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@RKSimon
Copy link
Collaborator Author

RKSimon commented Mar 10, 2022

AFAICT the integer min/max -> abs folds are now complete - @rotateright are there any more fabs patterns that you think should be handled on this ticket or can we close it?

@rotateright
Copy link
Contributor

Let's close this one since it was mostly about the integer patterns. I have some notes about the missing FP transforms. I'll open new issues for those or just post patches.

@RKSimon RKSimon closed this as completed Mar 10, 2022
rotateright added a commit that referenced this issue Mar 15, 2022
This can be viewed as swapping the select arms:
https://alive2.llvm.org/ce/z/jUvFMJ
...so we don't have the 'nsz' problem with the more general fold.

This unlocks other folds for the motivating fabs example.
This was discussed in issue #38828.
rotateright added a commit that referenced this issue Mar 31, 2022
This inverts a fold recently added to IR with:
3491f2f

We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB

The motivation for the IR change was to improve matching to
'fabs' in IR (see #38828 ),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
Ie, when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NAN semantics
of MINSS/MAXSS.

I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.

Differential Revision: https://reviews.llvm.org/D122726
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
This inverts a fold recently added to IR with:
3491f2f4b033

We can put -bidirectional on the Alive2 examples to show that
the reverse transforms work:
https://alive2.llvm.org/ce/z/8iVQwB

The motivation for the IR change was to improve matching to
'fabs' in IR (see llvm/llvm-project#38828 ),
but it regressed x86 codegen for 'not-quite-fabs' patterns like
(X > -X) ? X : -X.
Ie, when there is no fast-math (nsz), the cmp+select is not a proper
fabs operation, but it does map nicely to the unusual NAN semantics
of MINSS/MAXSS.

I drafted this as a target-independent fold, but it doesn't appear to
help any other targets and seems to cause regressions for SystemZ at
least.

Differential Revision: https://reviews.llvm.org/D122726
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:codegen
Projects
None yet
Development

No branches or pull requests

5 participants