This is an archive of the discontinued LLVM Phabricator instance.

Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue
ClosedPublic

Authored by jfb on Sep 6 2018, 2:09 PM.

Details

Summary

This code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types.

clang part of this patch: D51752

Diff Detail

Event Timeline

jfb created this revision.Sep 6 2018, 2:09 PM
jfb edited the summary of this revision. (Show Details)Sep 6 2018, 2:10 PM
jfb added a reviewer: MatzeB.
jfb planned changes to this revision.Sep 6 2018, 2:25 PM

As noted in D51752 I'll instead improve isBytewiseValue to do this.

jfb updated this revision to Diff 164485.Sep 7 2018, 11:49 AM
  • Augment isBytewiseValue to be as powerful as clang's version. Update code that used it to handle undef merging. This captures more memcpy patterns and can generate memcpy of undef (which selection DAG later gets rid of).
jfb retitled this revision from NFC: move isRepeatedBytePattern from clang to Constant to Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue.Sep 7 2018, 11:50 AM
jfb edited the summary of this revision. (Show Details)
efriedma added inline comments.
lib/Analysis/ValueTracking.cpp
3127

Do you really need separate loops for ConstantVector and ConstantArray?

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
657

No test coverage for this change?

lib/Transforms/Scalar/MemCpyOptimizer.cpp
391

What's the point of changing this signature? It doesn't seem to have any effect.

MatzeB added a comment.Sep 7 2018, 1:51 PM

LGTM once Eli approves.

lib/Analysis/ValueTracking.cpp
3046

You could move this down, closer to its first user.

3058–3063

Indeed an interesting discussion whether we should call into something like InstructionSimplify here. I agree with you that we'd rather see this handled in a separate step by whoever calls isByteWiseValue(). That way we can keep this function simple and only handle the patterns that are already simplified.

3130

not sure how valuable this comment is :)

jfb updated this revision to Diff 164531.Sep 7 2018, 3:42 PM
jfb marked 3 inline comments as done.
  • Update comments and move Ctx down a bit.
lib/Analysis/ValueTracking.cpp
3058–3063

This comment was originally at the bottom of the function, I moved it to early-exit if it's not a constant because all the subsequent code is about constants (not values).

I can drop the comment if you want?

3127

I don't think ConstantArray has getSplatValue, which I use for ConstantVector. Or are you suggesting something else?

lib/Transforms/Scalar/LoopIdiomRecognize.cpp
657

Indeed, right now getMemSetPatternValue doesn't check for UndefValue so there's no way to trigger this line. I'm adding it opportunistically because it's obviously correct, and I added a FIXME in getMemSetPatternValue to do this.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
391

If ByteVal is UndefValue then it can be merged with any other valid pattern. The below change does this. The signature change allows updating ByteVal. I could instead return a pair if you want.

jfb added a subscriber: mehdi_amini.Sep 7 2018, 4:07 PM

@mehdi_amini from our discussion in D49771 I think this patch takes us partway to what you wanted (having LLVM do at least as much work as clang when it comes to generating memset). After this commits I'll look at making clang less smart, and relying on LLVM being smart instead. I want to balance compile-time versus early code quality.

efriedma added inline comments.Sep 7 2018, 4:50 PM
lib/Analysis/ValueTracking.cpp
3127

I meant you could just use the for loop for ConstantVector. Maybe a bit less efficient, but the difference is unlikely to matter.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
391

Whether you use an out parameter or a pair, the value still isn't actually used by either of the callers of tryMergingIntoMemset.

Looking again, I guess technically it's used in MemCpyOptPass::processStore, but only in the case where tryMergingIntoMemset fails, so I'm guessing that isn't intentional.

jfb updated this revision to Diff 165602.Sep 14 2018, 3:36 PM
  • Remove ref param
jfb added inline comments.Sep 14 2018, 3:37 PM
lib/Analysis/ValueTracking.cpp
3127

I'd rather not since getSplatValue does exactly what we want here.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
391

You're right, changed back.

Missing test coverage for half.

Missing test coverage here for ConstantStruct/ConstantArray/ConstantVector, although I guess the array/struct bits are covered by the clang patch. I'd like to see some coverage for vectors with unusual element sizes, though, like <i1 x 16>.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
757–758

Unnecessary change.

796

Unnecessary change.

bjope added a subscriber: bjope.Sep 20 2018, 12:15 AM

This patch will conflict with my patch here: https://reviews.llvm.org/D52092

Background: In our out-of-tree target we have 16-bit-bytes (so memset/memcpy etc operate on 16-bit units). We need to patch isBytewiseValue to handle both 8-bit and 16-bit bytes (we store BitsPerByte in various places such as DataLayout, so it could be 8 or 16 depending on target). The idea with D52092 was to make it possible to check BitsPerByte in isBytewiseValue, and then convert the existing isBytewiseValue into a more general isSplatValue that should work for any given bit length (similar to isSplat in APInt.h).

Do you think it is possible to use such a design here? (I doubt that it costs too much to have a more general isSplatValue in LLVM trunk)

jfb updated this revision to Diff 166359.Sep 20 2018, 2:09 PM
jfb marked 2 inline comments as done.
  • Augment isBytewiseValue to be as powerful as clang's version. Update code that used it to handle undef merging. This captures more memcpy patterns and can generate memcpy of undef (which selection DAG later gets rid of).
  • Update comments and move Ctx down a bit.
  • Remove ref param
  • Remove extra changes, left over from clang-format
jfb planned changes to this revision.Sep 20 2018, 2:09 PM

I'll add more tests soon.

jfb updated this revision to Diff 166377.Sep 20 2018, 3:55 PM
  • Add a bit more testing
jfb added a comment.Sep 20 2018, 3:56 PM

Missing test coverage for half.

Missing test coverage here for ConstantStruct/ConstantArray/ConstantVector, although I guess the array/struct bits are covered by the clang patch. I'd like to see some coverage for vectors with unusual element sizes, though, like <i1 x 16>.

I added some tests. Indeed the clang-side test/CodeGenCXX/auto-var-init.cpp already exercises a bunch of this.

efriedma added inline comments.Sep 20 2018, 4:10 PM
test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗(On Diff #166377)

You're not really testing any interesting codepaths with a constant zero? We just hit the "isNullValue()" early exit.

jfb added inline comments.Sep 20 2018, 4:57 PM
test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗(On Diff #166377)

It's the only interesting codepath for i1 though. Unless you're saying that we should add handling for i1 in isBytewiseValue?

jfb added inline comments.Sep 20 2018, 5:00 PM
test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗(On Diff #166377)

To be clear, this is where we return nullptr for all other i1:

// We can handle constant integers that are multiple of 8 bits.
if (ConstantInt *CI = dyn_cast<ConstantInt>(C)) {
  if (CI->getBitWidth() % 8 == 0) {
    assert(CI->getBitWidth() > 8 && "8 bits should be handled above!");
    if (!CI->getValue().isSplat(8))
      return nullptr;
    return ConstantInt::get(Ctx, CI->getValue().trunc(8));
  }
}
efriedma added inline comments.Sep 20 2018, 5:12 PM
test/Transforms/MemCpyOpt/memcpy-to-memset.ll
54 ↗(On Diff #166377)

Well, an improved isBytewiseValue could theoretically handle it... but I guess it's probably not worth bothering, at least for now. But better to have some test coverage so we it's clear what isn't handled.

jfb updated this revision to Diff 166385.Sep 20 2018, 5:15 PM
  • Test i1, show it isn't handled.
jfb marked an inline comment as done.Sep 20 2018, 5:16 PM

Added the negative test.

This revision is now accepted and ready to land.Sep 20 2018, 5:47 PM
This revision was automatically updated to reflect the committed changes.
vitalybuka added inline comments.
lib/Analysis/ValueTracking.cpp
3127

@jfb This vector processing does not handle undefs correctly. I'd like to fix that with D64031

Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2019, 10:32 AM
Herald added a subscriber: jkorous. · View Herald Transcript