https://simd.godbolt.org/z/E15MrP void loop_add(const unsigned char* __restrict pIn, unsigned int* __restrict pOut, int s) { for (int i = 0; i < s; i++) { unsigned int pixelChar = pIn[i]; unsigned int pixel = pixelChar + (pixelChar << 8) + (pixelChar << 16) + (255 << 24); pOut[i] = pixel; } } This successfully combines to a zext+mul+add (I think technically it could be zext+mul+or)? But this: void loop_or(const unsigned char* __restrict pIn, unsigned int* __restrict pOut, int s) { for (int i = 0; i < s; i++) { unsigned int pixelChar = pIn[i]; unsigned int pixel = pixelChar | (pixelChar << 8) | (pixelChar << 16) | (255 << 24); pOut[i] = pixel; } } completely fails and we end up with a long shift+or chain.
The root question - which of these is canonical? define i32 @src(i8 %x) { %zx = zext i8 %x to i32 %s1 = shl nuw nsw i32 %zx, 8 %s2 = shl nuw nsw i32 %zx, 16 %or = or i32 %s1, %s2 ret i32 %or } define i32 @tgt(i8 %x) { %zx = zext i8 %x to i32 %or = mul i32 %zx, 65792 ret i32 %or } https://alive2.llvm.org/ce/z/dqT39Y The difference in instcombine is in InstCombinerImpl::SimplifyUsingDistributiveLaws() -> getBinOpsForFactorization(). We recognize add-of-shifts as a factorized multiply, but not or-of-shifts. We're arguing for "mul" in this example, and we already do that partially, so the backend should be equipped to decompose back to shift+logic. Interestingly, the -reassociate pass already has a reverse transform from -instcombine that turns the 'or' into 'add': "ShouldConvertOrWithNoCommonBitsToAdd" If we allow shift opcodes there, that's probably the quickest fix.
A little git digging turns up that the -reassociate logic was added relatively recently: https://reviews.llvm.org/rG70472f3 https://reviews.llvm.org/rG93f3d7f So it seems fine to add 'shl' to the opcode list. It's awkward that we have opposing canonicalizations in different passes, but I don't have any better ideas. I'll add a reassociate test and a phase ordering test to prevent this from regressing.
The small fix to -reassociate improves, but does not solve the larger example shown here: https://reviews.llvm.org/rG6fd91be35444 We'll need to adjust reassociate and/or instcombine with at least 1 more change to get this to canonicalize as expected.