Failure to simplify -min(-a, a) => max(-a, a) #38828

RKSimon · 2018-10-29T14:26:02Z


Bugzilla Link	39480
Version	trunk
OS	Windows NT
CC	@adibiagio,@filcab,@LebedevRI,@rotateright,@Tilka
Fixed by commit(s)	`664e178` `5060f56` `8878b79` `3885207` `01a6c4b` `59387c0` `16c642f` `0e1241a` `132be1f` `1413576` `a512c89`

Extended Description

#include <algorithm>
#include <x86intrin.h>

int min_neg_s32(int a) {
    return -std::min(-a, a);
}

__m128i min_neg_v4s32(__m128i a) {
    return _mm_sub_epi32(_mm_setzero_si128(), _mm_min_epi32(_mm_sub_epi32(_mm_setzero_si128(), a), a));
}

float min_neg_f32(float a) {
    return -std::min(-a, a);
}

https://godbolt.org/z/h4zTcS

clang:

_Z11min_neg_s32i: # @_Z11min_neg_s32i
  movl %edi, %eax
  negl %eax
  cmpl %edi, %eax
  cmovgl %edi, %eax
  negl %eax
  retq
_Z13min_neg_v4s32Dv2_x: # @_Z13min_neg_v4s32Dv2_x
  vpxor %xmm1, %xmm1, %xmm1
  vpsubd %xmm0, %xmm1, %xmm2
  vpminsd %xmm0, %xmm2, %xmm0
  vpsubd %xmm0, %xmm1, %xmm0
  retq
_Z11min_neg_f32f: # @_Z11min_neg_f32f
  vmovaps .LCPI2_0(%rip), %xmm1 # xmm1 = [-0,-0,-0,-0]
  vxorps %xmm1, %xmm0, %xmm2
  vminss %xmm2, %xmm0, %xmm0
  vxorps %xmm1, %xmm0, %xmm0
  retq

gcc manages the scalar cases (s32 is interesting....) but not the vector case:

_Z11min_neg_s32i:
  movl %edi, %eax
  cltd
  xorl %edx, %eax
  subl %edx, %eax
  ret
_Z13min_neg_v4s32Dv2_x:
  vpxor %xmm1, %xmm1, %xmm1
  vpsubd %xmm0, %xmm1, %xmm2
  vpminsd %xmm0, %xmm2, %xmm0
  vpsubd %xmm0, %xmm1, %xmm0
  ret
_Z11min_neg_f32f:
  vmovss %xmm0, %xmm0, %xmm1
  vxorps .LC0(%rip), %xmm0, %xmm0
  vmaxss %xmm1, %xmm0, %xmm0
  ret
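
For readers who want to confirm the identity in the title at the source level, here is a minimal sketch. The helper names `max_neg_s32` and `forms_agree` are hypothetical and not part of the report. INT_MIN is assumed to be excluded, because negating it overflows a signed int in C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <limits>

// Form from the report: negate the smaller of -a and a.
int min_neg_s32(int a) { return -std::min(-a, a); }

// Form the report would like the compiler to reach (hypothetical name).
int max_neg_s32(int a) { return std::max(-a, a); }

// Returns true when both forms agree with std::abs for every a in [lo, hi].
// INT_MIN must stay out of the range: -INT_MIN overflows a signed int,
// which is undefined behavior.
bool forms_agree(int lo, int hi) {
    assert(lo > std::numeric_limits<int>::min());
    for (long long a = lo; a <= hi; ++a) {  // wide counter so hi == INT_MAX is safe
        const int v = static_cast<int>(a);
        if (min_neg_s32(v) != max_neg_s32(v) || max_neg_s32(v) != std::abs(v)) {
            return false;
        }
    }
    return true;
}
```

Calling `forms_agree(-1000, 1000)` from a test harness is enough to exercise negative, zero, and positive inputs.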


rotateright · 2018-10-29T14:50:28Z

int min_neg_s32(int a)
{
    return -std::min(-a, a);
}

That's a hard way to write 'std::abs':
https://rise4fun.com/Alive/Cha

%negx = sub i32 0, %x
%cmp = icmp sgt i32 %negx, %x
%min = select i1 %cmp, i32 %x, i32 %negx
%r = sub i32 0, %min
=>
%cmp2 = icmp slt i32 %x, 0
%r = select i1 %cmp2, i32 %negx, i32 %x
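
The Alive snippet above has no nsw flags, so it should hold even when the negation wraps. A rough way to see that without Alive is to emulate both sides on 8-bit values and try every input. This is only a sketch: the 8-bit width and all helper names (`as_signed`, `wrap_neg`, `neg_of_min`, `abs_select`, `check_all_i8`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Interpret an 8-bit pattern as a two's-complement signed value.
int as_signed(std::uint8_t v) { return v < 128 ? v : static_cast<int>(v) - 256; }

// Wrapping negation, standing in for `sub 0, %x` without nsw.
std::uint8_t wrap_neg(std::uint8_t v) { return static_cast<std::uint8_t>(0u - v); }

// Source pattern: icmp sgt, select (the min), then negate.
std::uint8_t neg_of_min(std::uint8_t x) {
    const std::uint8_t negx = wrap_neg(x);
    const bool cmp = as_signed(negx) > as_signed(x);
    const std::uint8_t min = cmp ? x : negx;
    return wrap_neg(min);
}

// Target pattern: x < 0 ? -x : x, the usual abs idiom.
std::uint8_t abs_select(std::uint8_t x) {
    const std::uint8_t negx = wrap_neg(x);
    return as_signed(x) < 0 ? negx : x;
}

// Aborts via assert if any 8-bit input separates the two patterns.
void check_all_i8() {
    for (int bits = 0; bits < 256; ++bits) {
        const auto x = static_cast<std::uint8_t>(bits);
        assert(neg_of_min(x) == abs_select(x));
    }
}
```

Both sides return the bit pattern 0x80 for the input 0x80 (-128), which is why the wrapping case does not break the equivalence.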

Tilka · 2019-11-06T12:28:40Z

-min(-x, x) can be generalized to -min(-x, y) => max(x, -y) (https://rise4fun.com/Alive/FBpl) but requires nsw at least on the min negation. The opposite case only works for y = x: -max(-x, x) => min(-x, x) (https://rise4fun.com/Alive/FDjL) but works for any combination of nsw flags.
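
To illustrate why the two-operand form needs nsw, the sketch below emulates -min(-x, y) and max(x, -y) on 8-bit values with wrapping negation and counts disagreements. The 8-bit width and the helper names are assumptions. Skipping inputs equal to -128 only approximates "the nsw negation is not poison" and is not a substitute for the Alive proof.

```cpp
#include <cassert>
#include <cstdint>

// Interpret an 8-bit pattern as a two's-complement signed value.
int as_signed(std::uint8_t v) { return v < 128 ? v : static_cast<int>(v) - 256; }

// Wrapping negation (no nsw).
std::uint8_t wrap_neg(std::uint8_t v) { return static_cast<std::uint8_t>(0u - v); }

// Signed min/max on 8-bit patterns.
std::uint8_t smin(std::uint8_t a, std::uint8_t b) { return as_signed(a) < as_signed(b) ? a : b; }
std::uint8_t smax(std::uint8_t a, std::uint8_t b) { return as_signed(a) > as_signed(b) ? a : b; }

// -min(-x, y) with wrapping negations.
std::uint8_t lhs(std::uint8_t x, std::uint8_t y) { return wrap_neg(smin(wrap_neg(x), y)); }

// max(x, -y) with a wrapping negation.
std::uint8_t rhs(std::uint8_t x, std::uint8_t y) { return smax(x, wrap_neg(y)); }

// Counts (x, y) pairs on which the two sides differ. With skip_int_min set,
// pairs containing the pattern 0x80 (-128) are left out.
int count_mismatches(bool skip_int_min) {
    int mismatches = 0;
    for (int xb = 0; xb < 256; ++xb) {
        for (int yb = 0; yb < 256; ++yb) {
            if (skip_int_min && (xb == 0x80 || yb == 0x80)) continue;
            const auto x = static_cast<std::uint8_t>(xb);
            const auto y = static_cast<std::uint8_t>(yb);
            if (lhs(x, y) != rhs(x, y)) ++mismatches;
        }
    }
    return mismatches;
}

// Aborts via assert if a mismatch exists that does not involve -128.
void check_generalization() { assert(count_mismatches(true) == 0); }
```

One counterexample under wrapping is x = -128, y = 0: the left side stays at -128 while the right side is 0.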

adibiagio · 2020-02-07T17:50:38Z

Not sure about the nsw/nuw flags, but according to Alive this is correct too:

; -min(-x, -y) => max(x, y)

%negx = sub nsw 0, %x
%negy = sub nsw 0, %y
%cmp1 = icmp slt %negx, %negy
%sel1 = select %cmp1, %negx, %negy
%r = sub nsw 0, %sel1
=>
%cmp2 = icmp sgt %x, %y
%r = select %cmp2, %x, %y

I've noticed this while looking at the codegen of functions in llvm/test/CodeGen/X86/vector-reduce-umax.ll

Specifically, the sequence:

; X64-SSE2-NEXT: movdqa {{.*#+}} xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm0

Which is basically doing a - MIN (-A, -B)

And could be rewritten as a simple MAX(A, B):

; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1

adibiagio · 2020-02-07T17:58:48Z

Actually, sorry, the original code from that test was trying to do float negations (doing a xor of the sign bit).

Anyway, this is the Alive link: https://rise4fun.com/Alive/5Y7

Tilka · 2020-02-07T20:39:12Z

Note that

-min(-x, y) => max(x, -y)

is more general than

-min(-x, -y) => max(x, y)

it just requires an extra -(-y) => y step.

LebedevRI · 2020-08-05T21:24:05Z

Nowadays with the help of Negator we get:

define dso_local i32 @_Z11min_neg_s32i(i32 %0) local_unnamed_addr #0 {
  %2 = sub nsw i32 0, %0
  %3 = icmp sgt i32 %2, %0
  %4 = select i1 %3, i32 %2, i32 %0
  ret i32 %4
}

So i think the transform we are missing is

Name: (-x) s> x --> x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sgt i8 %neg_x, %x
=>
%r = icmp slt i8 %x, 0

Name: (-x) s>= x --> x s< 1
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sge i8 %neg_x, %x
=>
%r = icmp slt i8 %x, 1

Plus the commutative variants. I'm not seeing a generalization for the (-x) s?? y case.

https://rise4fun.com/Alive/moSi
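
The two compare folds above can be sanity-checked by brute force on 8-bit values. This is a sketch under assumptions: the 8-bit width matches the i8 in the Alive input, the helper names are made up, and the input 0x80 (INT_MIN for i8) is skipped because the nsw negation would be poison there.

```cpp
#include <cassert>
#include <cstdint>

// Interpret an 8-bit pattern as a two's-complement signed value.
int as_signed(std::uint8_t v) { return v < 128 ? v : static_cast<int>(v) - 256; }

// Wrapping negation; the nsw precondition is enforced by skipping 0x80 below.
std::uint8_t wrap_neg(std::uint8_t v) { return static_cast<std::uint8_t>(0u - v); }

// (-x) s> x
bool neg_sgt(std::uint8_t x) { return as_signed(wrap_neg(x)) > as_signed(x); }

// (-x) s>= x
bool neg_sge(std::uint8_t x) { return as_signed(wrap_neg(x)) >= as_signed(x); }

// Checks both folds on every 8-bit input except INT_MIN (0x80).
void check_folds() {
    for (int bits = 0; bits < 256; ++bits) {
        if (bits == 0x80) continue;  // %x must not be INT_MIN
        const auto x = static_cast<std::uint8_t>(bits);
        assert(neg_sgt(x) == (as_signed(x) < 0));  // (-x) s> x  --> x s< 0
        assert(neg_sge(x) == (as_signed(x) < 1));  // (-x) s>= x --> x s< 1
    }
}
```

At 0x80 the first fold would not hold under wrapping: the negation wraps back to -128, so (-x) s> x is false while x s< 0 is true.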

Ignoring the question of intrinsics, i can take a look

LebedevRI · 2020-08-06T08:55:06Z

Added folds for (-NSW x) pred x --> x pred' 0.
We now seem to handle the original (integer!) case well:
https://godbolt.org/z/qPPn97

LebedevRI · 2020-08-06T09:02:39Z

Added folds for (-NSW x) pred x --> x pred' 0.
We now seem to handle the original (integer!) case well:
https://godbolt.org/z/qPPn97

... but don't handle the vector case, since it lacks NSW.
I think it should be done after canonicalization to the new intrinsics.

LebedevRI · 2021-05-13T11:12:05Z

Hm, i lost track here. Let me revisit this...

rotateright · 2021-05-13T12:13:26Z

> Hm, i lost track here. Let me revisit this...

There are multiple examples here, so I'm not sure what's left either.

On the FP side, we probably just need to hoist an fneg through a select (fneg is a unary op, so it misses the binop transforms we would try with a select operand):
https://alive2.llvm.org/ce/z/7FKWLP

We want to make sure that FMF get propagated through those transforms, so it needs a pile of tests. Let me know if I should work on that part.

LebedevRI · 2021-05-13T17:57:34Z

> Note that
>
> -min(-x, y) => max(x, -y)
>
> is more general than
>
> -min(-x, -y) => max(x, y)
>
> it just requires an extra -(-y) => y step.

And once again, i'm not sure what specific transform we are missing?

-min(-x, y) => max(x, -y)
... seems to not be valid:
https://alive2.llvm.org/ce/z/5grsyy

> > Hm, i lost track here. Let me revisit this...
>
> There are multiple examples here, so I'm not sure what's left either.
>
> On the FP side, we probably just need to hoist an fneg through a select (fneg is a unary op, so it misses the binop transforms we would try with a select operand):
> https://alive2.llvm.org/ce/z/7FKWLP
>
> We want to make sure that FMF get propagated through those transforms, so it needs a pile of tests. Let me know if I should work on that part.

LebedevRI · 2021-05-13T17:58:25Z

Posted too soon.

@Sanjay yes, i don't intend to touch FP side of things, feel free to.

LebedevRI · 2021-05-13T19:13:13Z

Ok, i'm seeing one missed Negator pattern:
-({u,s}{min,max}(-x, x)) -> {u,s}{max,min}(-x, x)
https://alive2.llvm.org/ce/z/xyP9U2
https://alive2.llvm.org/ce/z/uDVZGV
https://alive2.llvm.org/ce/z/wfMJ4-
https://alive2.llvm.org/ce/z/vrrc-5
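
Read as "negating a min of (-x, x) gives the max of (-x, x), and vice versa", the pattern can be checked exhaustively on 8-bit values for both the signed and unsigned flavors. The sketch below assumes wrapping negation and an 8-bit width, and its helper names are hypothetical. It does not model the Negator itself.

```cpp
#include <cassert>
#include <cstdint>

// Interpret an 8-bit pattern as a two's-complement signed value.
int as_signed(std::uint8_t v) { return v < 128 ? v : static_cast<int>(v) - 256; }

// Wrapping negation.
std::uint8_t wrap_neg(std::uint8_t v) { return static_cast<std::uint8_t>(0u - v); }

// Unsigned and signed min/max on 8-bit patterns.
std::uint8_t umin(std::uint8_t a, std::uint8_t b) { return a < b ? a : b; }
std::uint8_t umax(std::uint8_t a, std::uint8_t b) { return a > b ? a : b; }
std::uint8_t smin(std::uint8_t a, std::uint8_t b) { return as_signed(a) < as_signed(b) ? a : b; }
std::uint8_t smax(std::uint8_t a, std::uint8_t b) { return as_signed(a) > as_signed(b) ? a : b; }

// Aborts via assert if negating a min/max of (-x, x) ever differs from the
// opposite max/min of (-x, x).
void check_negated_minmax() {
    for (int bits = 0; bits < 256; ++bits) {
        const auto x = static_cast<std::uint8_t>(bits);
        const std::uint8_t nx = wrap_neg(x);
        assert(wrap_neg(umin(nx, x)) == umax(nx, x));
        assert(wrap_neg(umax(nx, x)) == umin(nx, x));
        assert(wrap_neg(smin(nx, x)) == smax(nx, x));
        assert(wrap_neg(smax(nx, x)) == smin(nx, x));
    }
}
```

The unsigned cases work because x and -x sum to zero modulo 256, so whichever of the two is the min, the other is both the max and the negation of the min.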

rotateright · 2021-05-17T19:26:37Z

FP neg fold:
https://reviews.llvm.org/rG3cdd05e519dd

I think that gives us the optimal code for the example in the original description:

float min_neg_f32(float a)
{
    return -std::min(-a, a);
}

-->
vxorps LCPI2_0(%rip), %xmm0, %xmm1
vmaxss %xmm0, %xmm1, %xmm0

This isn't fabs because of -0.0, but we miss that too even with -ffast-math, so that's another bug...
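
The -0.0 remark can be reproduced with a small sketch, assuming strict IEEE semantics (no -ffast-math) and the usual std::min behavior of returning its first argument when neither compares less. `matches_fabs` and `check_signed_zero` are hypothetical helpers, not part of any patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The floating-point function from the original report.
float min_neg_f32(float a) { return -std::min(-a, a); }

// True when min_neg_f32(a) equals std::fabs(a) in both value and sign bit.
bool matches_fabs(float a) {
    const float r = min_neg_f32(a);
    const float f = std::fabs(a);
    return r == f && std::signbit(r) == std::signbit(f);
}

// Aborts via assert unless -0.0 is an input on which the sign bits differ.
void check_signed_zero() {
    assert(matches_fabs(0.0f));
    assert(!matches_fabs(-0.0f));
}
```

For a = -0.0 the call is std::min(+0.0, -0.0). Neither zero compares less than the other, so the first argument, +0.0, is returned, and the outer negation produces -0.0, whereas fabs would give +0.0.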

rotateright · 2021-06-23T15:43:34Z

Filled in another missing FP min/max transform:
https://reviews.llvm.org/rG1e9b6b89a7b5

RKSimon · 2022-03-10T07:46:41Z

AFAICT the integer min/max -> abs folds are now complete - @rotateright are there any more fabs patterns that you think should be handled on this ticket or can we close it?

rotateright · 2022-03-10T15:28:01Z

Let's close this one since it was mostly about the integer patterns. I have some notes about the missing FP transforms. I'll open new issues for those or just post patches.

This can be viewed as swapping the select arms: https://alive2.llvm.org/ce/z/jUvFMJ ...so we don't have the 'nsz' problem with the more general fold. This unlocks other folds for the motivating fabs example. This was discussed in issue #38828.

This inverts a fold recently added to IR with: 3491f2f

We can put -bidirectional on the Alive2 examples to show that the reverse transforms work: https://alive2.llvm.org/ce/z/8iVQwB

The motivation for the IR change was to improve matching to 'fabs' in IR (see #38828), but it regressed x86 codegen for 'not-quite-fabs' patterns like (X > -X) ? X : -X. I.e., when there is no fast-math (nsz), the cmp+select is not a proper fabs operation, but it does map nicely to the unusual NAN semantics of MINSS/MAXSS.

I drafted this as a target-independent fold, but it doesn't appear to help any other targets and seems to cause regressions for SystemZ at least.

Differential Revision: https://reviews.llvm.org/D122726


llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

RKSimon closed this as completed Mar 10, 2022
