42405 – [ARM] SCE optimization pass hoisting DIV operation outside of is-else BB as a speculative execution

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 42405 - [ARM] SCE optimization pass hoisting DIV operation outside of is-else BB as a speculative execution

Summary: [ARM] SCE optimization pass hoisting DIV operation outside of is-else BB as a...

Status:	NEW

Alias:	None

Product:	tools
Classification:	Unclassified
Component:	llc (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P normal
Assignee:	Anton Afanasyev

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-06-26 03:44 PDT by Ayman Musa
Modified:	2019-08-13 04:10 PDT (History)
CC List:	10 users (show)

See Also:
Fixed By Commit(s):

Attachments
reproducer mir file (4.78 KB, text/plain) 2019-06-26 03:44 PDT, Ayman Musa	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Ayman Musa 2019-06-26 03:44:55 PDT

Created attachment 22144 [details]
reproducer mir file

Attached is an MIR test (reproducer.mir) that shows a case where the original sequence performs the DIV operation only if the divisor is not 0. But after running MachineCSE pass we can see that the DIV was hoisted outside of the if-else BBs, what would result in an integer division by zero.

The command line for running the pass is:
llc -mtriple=aarch64-none-linux-gnu -run-pass=machine-cse -asm-verbose=1 -verify-machineinstrs reproducer.mir -o after-pass

Most likely that this bug was introduced in: https://reviews.llvm.org/rL362901.
Owner of the change is already investigating the issue.

Comment 1 Tim Northover 2019-06-26 04:07:56 PDT

Why is this an issue for AArch64? Only (32-bit) M-class and R-class CPUs can trap on division by zero.

Comment 2 Igor Breger 2019-06-26 04:15:55 PDT

(In reply to Tim Northover from comment #1)
> Why is this an issue for AArch64? Only (32-bit) M-class and R-class CPUs can
> trap on division by zero.

X86 (didn't check other targets) for example mark SDIV/UDIV as hasSideEffect and it prevent the optimization/bug.

Comment 3 Tim Northover 2019-06-26 04:17:38 PDT

I'm asking why this is a bug at all on AArch64 (the target in your llc invocation). I could see it being a problem on some ARMv7, possibly also resolved by marking division as having side-effects.

Comment 4 Igor Breger 2019-06-26 04:26:23 PDT

I am not familiar with AArch64 target, we only use it as reproducer to show the problem ( we have out of tree target) .

Comment 5 Anton Afanasyev 2019-06-26 04:47:58 PDT

I believe adding `MCID::UnmodeledSideEffects` flag to arm `DIV` instr would be a correct fix of this bug. Ordinary instructions (like add or shift) still could be correctly hoisted in such a way for the given CFG.

Comment 6 Tim Northover 2019-06-26 04:50:34 PDT

I think that's what I'm leaning towards too. It's a shame "mayRaiseFPException" is named so specifically because it feels even more apt otherwise.

Comment 7 Igor Breger 2019-06-26 04:57:57 PDT

I don't think that UnmodeledSideEffects is the solution, it limit Scheduler a lot.  
I also think that in-order HW will get performance 
degradation due to https://reviews.llvm.org/rL362901. Is it code size or performance optimization ?

Comment 8 Tim Northover 2019-06-26 05:03:39 PDT

> I don't think that UnmodeledSideEffects is the solution, it limit Scheduler a lot.  

I'm sympathetic, but unless you can come up with a way it *is* modeled there's not really any alternative. It clearly has some undesirable side-effect.

Though the more I look at mayRaiseFPException, the more appropriate it looks for this use-case except for the name. In principle we can even arrange the "FPExcept" flag to be set correctly based on the architecture in ARM.

Comment 9 Anton Afanasyev 2019-06-26 08:49:11 PDT

(In reply to Igor Breger from comment #7)
> I don't think that UnmodeledSideEffects is the solution, it limit Scheduler
> a lot.  
> I also think that in-order HW will get performance 
> degradation due to https://reviews.llvm.org/rL362901. Is it code size or
> performance optimization ?

It's performance optimization primarily, since it eliminates partial redundancy of computation.

I can also add special condition to prevent hoisting for such CFG (not so strong as suggested here https://reviews.llvm.org/rL362901#663831 though) since it makes no performance effect (only code size one). This would be a workaround for CSE step possibly not being able to recognize regs pressure causing by elimination, but it is an other issue, not related to the original bug (though it fixes it).

The original bug should be fixed by something like "mayRaiseException" flag.

Comment 10 Ayman Musa 2019-06-27 00:44:11 PDT

IMHO, such an optimization, which adds a speculative execution of an operation, especially when done in Machine instruction level, should be very careful when hoisting an operation as a speculative execute.
This at least requires a dedicated flag like "canSpeculate" or "mayRaiseException" as stated by Anton.
Marking such an instruction with hasSideEffect is too aggressive and will result in an unjustified performance damage.
Or at least, to enable a target hook check that would exclude specific machine instructions from being candidates.

Comment 11 Anton Afanasyev 2019-06-28 08:12:08 PDT

Btw, I've made a modification to PRE condition preventing it from useless hoisting: https://reviews.llvm.org/D63934. This change fixes this bug implicitly but I still believe the actual solution should involve checking special flag like "mayRaiseFPException".

Comment 12 Alina Sbirlea 2019-07-08 14:39:44 PDT

FWIW, the failure I'm seeing in Halide due to https://reviews.llvm.org/rL362901 is not resolved https://reviews.llvm.org/D63934.

Comment 13 Anton Afanasyev 2019-07-08 15:23:30 PDT

Hi Alina, could you provide me with test case or at least its control flow graph? We can separate your case from this issue.

Comment 14 Alina Sbirlea 2019-07-09 09:54:54 PDT

Yes, I am working on it.

Comment 15 Anton Afanasyev 2019-07-31 08:55:30 PDT

(Moving discussion back to this thread from 
https://reviews.llvm.org/D63934)

Ping! Hi Alina, Ayman, Igor, I've asked before (https://reviews.llvm.org/D63934#1592654) whether the new patch from Kai Luo (https://reviews.llvm.org/D64394) fixes your particular issues. I can revert my original patch if it is not the case.

Comment 16 Alina Sbirlea 2019-07-31 12:16:02 PDT

Hi Anton,

Apologies for the delayed reply. We no longer see the original failure, but it appears it is not due to changes in your original patch, that triggered the failure. The possible source of the fix may be in the HVX backend.
Please rely on the other folks' input on whether to revert r362901.

Comment 17 Igor Breger 2019-08-01 00:25:26 PDT

Hi Anton,

Apologies for the delayed reply.
The Kai Luo patch doesn't fix the issue.
The alternative to revert it to let to Subtarget to decide if speculative execution allowed and if PerformSimplePRE optimization should/can be done.

Comment 18 Anton Afanasyev 2019-08-09 11:05:07 PDT

Hi all, I've benchmarked the effect of the revertion my and @lkail patches.
The benchmark showed some increase of the `exec_time` (details are here https://reviews.llvm.org/D56772#1623312).
So I'd like to leave these changes unreverted and to proceed with particular patches. Please give LGTM to this patch: https://reviews.llvm.org/D63934. I'm to prepare the next patch for Subtarget allowance of speculative execution.
Please tell if you have any objections.

Comment 19 Anton Afanasyev 2019-08-13 03:32:46 PDT

(In reply to Igor Breger from comment #17)
> Hi Anton,
> 
> Apologies for the delayed reply.
> The Kai Luo patch doesn't fix the issue.
> The alternative to revert it to let to Subtarget to decide if speculative
> execution allowed and if PerformSimplePRE optimization should/can be done.

Hi Igor,

Does this patch https://reviews.llvm.org/D66132 is what you need to fix the issue? You can set `isSpeculativeExecutionForbidden()` to true for your particular Subtarget. Alternative way is to set `hasSideEffect()` to particular instructions.

Comment 20 Igor Breger 2019-08-13 04:10:20 PDT

(In reply to Anton Afanasyev from comment #19)
> (In reply to Igor Breger from comment #17)
> > Hi Anton,
> > 
> > Apologies for the delayed reply.
> > The Kai Luo patch doesn't fix the issue.
> > The alternative to revert it to let to Subtarget to decide if speculative
> > execution allowed and if PerformSimplePRE optimization should/can be done.
> 
> Hi Igor,
> 
> Does this patch https://reviews.llvm.org/D66132 is what you need to fix the
> issue? You can set `isSpeculativeExecutionForbidden()` to true for your
> particular Subtarget. Alternative way is to set `hasSideEffect()` to
> particular instructions.

Yes, Thank you!