Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] or/add-like mismatch in uchar 'splat' to uint #48399

Closed
RKSimon opened this issue Feb 5, 2021 · 4 comments
Closed

[X86] or/add-like mismatch in uchar 'splat' to uint #48399

RKSimon opened this issue Feb 5, 2021 · 4 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla llvm:codegen llvm:core

Comments

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 5, 2021

Bugzilla Link 49055
Version trunk
OS Windows NT
CC @topperc,@LebedevRI,@rotateright

Extended Description

https://simd.godbolt.org/z/E15MrP

void loop_add(const unsigned char* __restrict pIn, unsigned int* __restrict pOut, int s) {
  for (int i = 0; i < s; i++) {
    unsigned int pixelChar = pIn[i];
    unsigned int pixel = pixelChar + (pixelChar << 8) + (pixelChar << 16) + (255 << 24);
    pOut[i] = pixel;
  }
}

This successfully combines to a zext+mul+add (I think technically it could be zext+mul+or)?

But this:

void loop_or(const unsigned char* __restrict pIn, unsigned int* __restrict pOut, int s) {
  for (int i = 0; i < s; i++) {
    unsigned int pixelChar = pIn[i];
    unsigned int pixel = pixelChar | (pixelChar << 8) | (pixelChar << 16) | (255 << 24);
    pOut[i] = pixel;
  }
}

completely fails and we end up with a long shift+or chain.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 5, 2021

assigned to @rotateright

@rotateright
Copy link
Contributor

The root question - which of these is canonical?

define i32 @​src(i8 %x) {
%zx = zext i8 %x to i32
%s1 = shl nuw nsw i32 %zx, 8
%s2 = shl nuw nsw i32 %zx, 16
%or = or i32 %s1, %s2
ret i32 %or
}

define i32 @​tgt(i8 %x) {
%zx = zext i8 %x to i32
%or = mul i32 %zx, 65792
ret i32 %or
}

https://alive2.llvm.org/ce/z/dqT39Y

The difference in instcombine is in InstCombinerImpl::SimplifyUsingDistributiveLaws() -> getBinOpsForFactorization().

We recognize add-of-shifts as a factorized multiply, but not or-of-shifts.

We're arguing for "mul" in this example, and we already do that partially, so the backend should be equipped to decompose back to shift+logic.

Interestingly, the -reassociate pass already has a reverse transform from -instcombine that turns the 'or' into 'add': "ShouldConvertOrWithNoCommonBitsToAdd"

If we allow shift opcodes there, that's probably the quickest fix.

@rotateright
Copy link
Contributor

A little git digging turns up that the -reassociate logic was added relatively recently:
https://reviews.llvm.org/rG70472f3
https://reviews.llvm.org/rG93f3d7f

So it seems fine to add 'shl' to the opcode list. It's awkward that we have opposing canonicalizations in different passes, but I don't have any better ideas.

I'll add a reassociate test and a phase ordering test to prevent this from regressing.

@rotateright
Copy link
Contributor

The small fix to -reassociate improves, but does not solve the larger example shown here:
https://reviews.llvm.org/rG6fd91be35444

We'll need to adjust reassociate and/or instcombine with at least 1 more change to get this to canonicalize as expected.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:codegen llvm:core
Projects
None yet
Development

No branches or pull requests

2 participants