[X86] or/add-like mismatch in uchar 'splat' to uint #48399

RKSimon · 2021-02-05T12:12:56Z


Bugzilla Link	49055
Version	trunk
OS	Windows NT
CC	@topperc,@LebedevRI,@rotateright

Extended Description

https://simd.godbolt.org/z/E15MrP

void loop_add(const unsigned char* __restrict pIn, unsigned int* __restrict pOut, int s) {
  for (int i = 0; i < s; i++) {
    unsigned int pixelChar = pIn[i];
    unsigned int pixel = pixelChar + (pixelChar << 8) + (pixelChar << 16) + (255 << 24);
    pOut[i] = pixel;
  }
}

This successfully combines to a zext+mul+add (I think technically it could be zext+mul+or)?

But this:

void loop_or(const unsigned char* __restrict pIn, unsigned int* __restrict pOut, int s) {
  for (int i = 0; i < s; i++) {
    unsigned int pixelChar = pIn[i];
    unsigned int pixel = pixelChar | (pixelChar << 8) | (pixelChar << 16) | (255 << 24);
    pOut[i] = pixel;
  }
}

completely fails and we end up with a long shift+or chain.

The text was updated successfully, but these errors were encountered:

RKSimon · 2021-02-05T12:12:56Z

assigned to @rotateright

rotateright · 2021-02-05T15:22:22Z

The root question - which of these is canonical?

define i32 @src(i8 %x) {
%zx = zext i8 %x to i32
%s1 = shl nuw nsw i32 %zx, 8
%s2 = shl nuw nsw i32 %zx, 16
%or = or i32 %s1, %s2
ret i32 %or
}

define i32 @tgt(i8 %x) {
%zx = zext i8 %x to i32
%or = mul i32 %zx, 65792
ret i32 %or
}

https://alive2.llvm.org/ce/z/dqT39Y

The difference in instcombine is in InstCombinerImpl::SimplifyUsingDistributiveLaws() -> getBinOpsForFactorization().

We recognize add-of-shifts as a factorized multiply, but not or-of-shifts.

We're arguing for "mul" in this example, and we already do that partially, so the backend should be equipped to decompose back to shift+logic.

Interestingly, the -reassociate pass already has a reverse transform from -instcombine that turns the 'or' into 'add': "ShouldConvertOrWithNoCommonBitsToAdd"

If we allow shift opcodes there, that's probably the quickest fix.

rotateright · 2021-02-05T18:59:34Z

A little git digging turns up that the -reassociate logic was added relatively recently:
https://reviews.llvm.org/rG70472f3
https://reviews.llvm.org/rG93f3d7f

So it seems fine to add 'shl' to the opcode list. It's awkward that we have opposing canonicalizations in different passes, but I don't have any better ideas.

I'll add a reassociate test and a phase ordering test to prevent this from regressing.

rotateright · 2021-02-09T12:48:09Z

The small fix to -reassociate improves, but does not solve the larger example shown here:
https://reviews.llvm.org/rG6fd91be35444

We'll need to adjust reassociate and/or instcombine with at least 1 more change to get this to canonicalize as expected.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

rotateright closed this as completed in 892c731 Dec 20, 2021

RKSimon added llvm:codegen llvm:core labels Dec 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[X86] or/add-like mismatch in uchar 'splat' to uint #48399

[X86] or/add-like mismatch in uchar 'splat' to uint #48399

RKSimon commented Feb 5, 2021 •

edited

RKSimon commented Feb 5, 2021

rotateright commented Feb 5, 2021

rotateright commented Feb 5, 2021

rotateright commented Feb 9, 2021

[X86] or/add-like mismatch in uchar 'splat' to uint #48399

[X86] or/add-like mismatch in uchar 'splat' to uint #48399

Comments

RKSimon commented Feb 5, 2021 • edited

Extended Description

RKSimon commented Feb 5, 2021

rotateright commented Feb 5, 2021

rotateright commented Feb 5, 2021

rotateright commented Feb 9, 2021

RKSimon commented Feb 5, 2021 •

edited