This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Replace BranchOnCount with BranchOnCond if TC <= UF * VF.
ClosedPublic

Authored by fhahn on May 30 2022, 2:41 PM.

Details

Summary

Try to simplify BranchOnCount to BranchOnCond true if TC <= UF * VF.

This is an alternative to D121899 which simplifies the VPlan directly
instead of doing so late in code-gen.

The potential benefit of doing this in VPlan is that this may help
cost-modeling in the future. The reason this is done in prepareToExecute
at the moment is that a single plan may be used for multiple VFs/UFs.

There are further simplifications that can be applied as follow ups:

  1. Replace inductions with constants
  2. Replace vector region with regular block.

Fixes #55354.

Depends on D126679.

Diff Detail

Event Timeline

fhahn created this revision.May 30 2022, 2:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2022, 2:41 PM
fhahn requested review of this revision.May 30 2022, 2:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2022, 2:41 PM
Herald added a subscriber: vkmr. · View Herald Transcript
Ayal accepted this revision.Jun 2 2022, 3:07 AM

This looks good to me, thanks!

Added some nits and potential follow-ups.

No need to add a test for #55354, as it is already covered by existing tests, right?

llvm/lib/Transforms/Vectorize/VPlan.cpp
770

Independent of this patch?

905

CanonicalIVStartValue is assumed to be non negative, right?

This is as close to a VPlan2VPlan transformation as we can get at the moment in general. The TC <= VF case (using lower-bound of 1 for UF) could be optimized as an early VPlan2VPlan transformation, considering VPlan's VF range. But UF is currently set after finalizing a VPlan, except if set manually and/or optimizing for size.

907

nit: can fold condition e.g. into && isa<ConstantInt>...

909

getFixedValue()?

912

(Could alternatively set the two operands of Term to the same value for trivial comparison, given that BranchOnCount actually means ReiterateWhileNotEqual. Currently BranchOnCond can also represent loop branches, and using it is clearer. May be good to have a single representation for loop branches, independent of this patch.)

This revision is now accepted and ready to land.Jun 2 2022, 3:07 AM
fhahn updated this revision to Diff 434399.Jun 6 2022, 1:33 AM
fhahn marked 4 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/VPlan.cpp
905

CanonicalIVStartValue is assumed to be non negative, right?

Ah yes, it is always non-negative. When preparing to execute the plan for the main vector loop it will be set to nullptr, for the epilogue vector loop it will be set to a positive start value.

But I think it only makes sense for now to simplify for the main vector loop, so I'll added a check.

This is as close to a VPlan2VPlan transformation as we can get at the moment in general. The TC <= VF case (using lower-bound of 1 for UF) could be optimized as an early VPlan2VPlan transformation, considering VPlan's VF range. But UF is currently set after finalizing a VPlan, except if set manually and/or optimizing for size.

Yes, I think it would be good to move this to a VPlan2VPlan transform as follow-up, but we need to think a bit more on how to model constraints on the UF I think.

907

Reduced the nesting level, thanks!

909

Unfortunately I think getKnownMinValue is needed to handle scalable VFs here. there are some test cases that would crash with getFixedValue.

912

May be good to have a single representation for loop branches, independent of this patch.

Sounds good as follow-up!

This revision was landed with ongoing or failed builds.Jun 6 2022, 1:39 AM
This revision was automatically updated to reflect the committed changes.

Thanks a lot for this @fhahn! I think this means I can now abandon D121899, right?

fhahn added a comment.Jun 7 2022, 2:15 PM

Thanks a lot for this @fhahn! I think this means I can now abandon D121899, right?

Yes I think so ,thanks!

fhahn added a comment.Jul 1 2022, 5:52 AM

this seems to be causing a miscompile, see https://github.com/llvm/llvm-project/issues/56319

Thanks, the issue is that in some cases with epilogue vectorization the main vector loop is effectively dead because we need to execute at least one iteration of the scalar loop, e.g. due to interleave groups. If the same plan is re-used for both the main vector loop and the epilogue vector loop, then the re-used plan has been optimized in a way that doesn't apply to the epilogue loop.

I pushed 0dddf04caba5, which disables the optimization when doing epilogue vectorization. I'll also work on a better fix, which is to have 2 separate plans for the different loops, which avoids the problem by construction.