
Miscompile lencod (after Sanjoy's SCEV changes?) #28318

Closed
tobiasgrosser opened this issue May 31, 2016 · 7 comments
Labels
bugzilla (issues migrated from Bugzilla), polly

Comments

@tobiasgrosser
Contributor

Bugzilla Link 27944
Resolution FIXED
Resolved on Jul 11, 2016 07:21
Version unspecified
OS Linux
CC @hfinkel, @jdoerfert, @sanjoy

Extended Description

It seems that after Sanjoy's SCEV changes, which happened between 115336 and 115351, a miscompile was introduced, showing up as FAIL: lencod.execution_time

http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable/builds/3034

Due to the build breakage that happened before, it is not easily possible to track down the precise version.

@sanjoy
Contributor

sanjoy commented May 31, 2016

Hi Tobias,

It looks like it broke earlier, in http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable/builds/3026/, and the only SCEV change in that build is 271151. I didn't notice this earlier because I thought this too was a failed build due to removing the apply method.

Do you know why the test is failing? Is it a performance regression, or is the benchmark infinite looping?

Finally, can you point me to some instructions on how I can reproduce this?

@tobiasgrosser
Contributor Author

Hi Sanjoy,

you are right. This is really just the one commit. I tried to revert it locally, but unfortunately I already get conflicts. The command to try this locally should be:

lnt runtest nt --sandbox /tmp --cc llvm_build/bin/clang-3.9 --test-suite test-suite/ --only-test MultiSource/Applications/JM/lencod --mllvm=-polly --mllvm=-polly-position=before-vectorizer

I am currently traveling, so I won't be able to debug this until next week. As the SCEV nsw stuff has many corner cases, it might very well be that this just triggers a Polly bug. So don't feel obliged to debug this through. However, if you have some insights, I would be glad to hear them.

Best,
Tobias

@tobiasgrosser
Contributor Author

*** Bug llvm/llvm-bugzilla-archive#27980 has been marked as a duplicate of this bug. ***

@tobiasgrosser
Contributor Author

@​Johannes, this bug seems to be in code which you know best. Could you possibly have a look?

OK. This is really a nasty bug, but I can reproduce it. I first tried to create a big .bc file and then just wanted to run bugpoint to reduce it. Unfortunately, this did not work for two reasons: 1) opt and llc/lli took minutes to run this test case even once, so everything took ages. 2) When specifying the individual passes one-by-one instead of using -O3, the bug disappears. :( So I had to resort to manually bisecting this bug. After a couple of hours of debugging, this is what I got:

  1. https://llvm.org/svn/llvm-project/llvm/trunk@271151 is indeed the change that triggers this bug.

  2. There is only a single miscompile:

File: block.c
Function: dct_chroma
Region: %for.body660---%if.end1256

  3. The bug disappears if I add -polly-invariant-load-hoisting=false

Before Sanjoy's patch we get:

Invariant Accesses: {
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_if_else764[i0] -> MemRef_197[0] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 0] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 2] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 1] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 3] };
        Execution Context: [qp_per_dc.0] -> {  :  }
}

After Sanjoy's patch we get:

Invariant Accesses: {
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_if_else764[i0] -> MemRef_197[0] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 0] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 2] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 1] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 3] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
}

Without Load Hoisting we get:

Invariant Accesses: {                                                        
}                                                                            
Context:                                                                     
[qp_per_dc.0] -> {  : -2147483648 <= qp_per_dc.0 <= 2147483647 }             
Assumed Context:                                                             
[qp_per_dc.0] -> {  :  }                                                     
Invalid Context:                                                             
[qp_per_dc.0] -> {  : 1 = 0 }                                                
p0: %qp_per_dc.0     

How to reproduce the bug:

Go to /home/grosser/Projects/polly/test-suite/MultiSource/Applications/JM/lencod and run:

for i in `ls *.c`; do echo $i; /home/grosser/tmp/llvm-project/llvm-project/build/bin/clang -O3 -mllvm -polly -mllvm -polly-process-unprofitable -mllvm -polly-position=before-vectorizer $i -c -o $i.o; done && clang *.o -o executable && ./executable -d data/encoder_small.cfg -p InputFile=data/foreman_part_qcif_444.yuv -p LeakyBucketRateFile=data/leakybucketrate.cfg -p QmatrixFile=data/q_matrix.cfg > executable.out && diff lencod.reference_output.small executable.out

@tobiasgrosser
Contributor Author

Reduced test case
Hi guys,

I just managed to reduce this test case further. Interestingly, the new test case failed already before Sanjoy's patch.

I see the following with latest Polly (274430):

$ opt test.ll -polly-scops -polly-process-unprofitable -polly-invariant-load-hoisting=false -analyze

Invariant Accesses: {
}
Context:
{  :  }
Assumed Context:
{  :  }
Invalid Context:
{  : 1 = 0 }
Alias Groups (0):
    n/a
Statements {
	Stmt_loop

...
Stmt_loop_next
Domain :=
{ Stmt_loop_next[0] };
Schedule :=
{ Stmt_loop_next[i0] -> [0, 1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_loop_next[i0] -> MemRef_val__phi[] };
ReadAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_loop_next[i0] -> MemRef_a[1] };
}

$ opt test.ll -polly-scops -polly-process-unprofitable -analyze
Invariant Accesses: {
ReadAccess := [Reduction Type: NONE] [Scalar: 0]
{ Stmt_loop_next[i0] -> MemRef_a[1] };
Execution Context: { : 1 = 0 }
}
Context:
{ : }
Assumed Context:
{ : }
Invalid Context:
{ : 1 = 0 }
Statements {
Stmt_loop
...
Stmt_loop_next
Domain :=
{ Stmt_loop_next[0] };
Schedule :=
{ Stmt_loop_next[i0] -> [0, 1] };
MustWriteAccess := [Reduction Type: NONE] [Scalar: 1]
{ Stmt_loop_next[i0] -> MemRef_val__phi[] };
}

Even though Stmt_loop_next is always executed, it has an empty execution context when hoisted. This seems incorrect.

It seems the reason is that the subscript expression of the hoisted memory access is "{0,+,4}<%loop>" and does not carry nsw/nuw annotations. As a result, the invalid context computed in MemoryAccess::getPwAff is non-empty, namely '{ [i0] : i0 <= -2305843009213693953 or i0 >= 2305843009213693952 }', and, when projected onto the parameters, it indicates that any execution of this memory access is invalid and consequently won't happen.

It seems we have a mismatch here. The invalid-context of the entire scop still allows the scop to be executed while we assume locally that some parts won't be executed.
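
To make the projection step concrete, here is a small standalone sketch using the isl C API (not part of the original report; the set string is copied from the invalid context above, and the printed result reflects my reading of the situation):

/* Sketch only: project the per-access invalid set onto the (here
 * zero-dimensional) parameter space, as described above. */
#include <isl/ctx.h>
#include <isl/set.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  isl_ctx *ctx = isl_ctx_alloc();

  /* Invalid set computed for the access with subscript {0,+,4}<%loop>
   * that lacks nsw/nuw flags. */
  isl_set *Invalid = isl_set_read_from_str(
      ctx, "{ [i0] : i0 <= -2305843009213693953 or "
           "i0 >= 2305843009213693952 }");

  /* Project out the instance dimension i0; only parameters remain. */
  isl_set *Params = isl_set_params(isl_set_copy(Invalid));

  /* Prints "{  :  }": every parameter value is marked invalid, so the
   * execution context of the hoisted load collapses to { : 1 = 0 }. */
  char *Str = isl_set_to_str(Params);
  printf("%s\n", Str);

  free(Str);
  isl_set_free(Params);
  isl_set_free(Invalid);
  isl_ctx_free(ctx);
  return 0;
}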

The following patch fixes this bug as well as the LNT bug we see in MultiSource/Applications/JM/lencod:

--- a/lib/Analysis/ScopInfo.cpp
+++ b/lib/Analysis/ScopInfo.cpp
@@ -869,7 +869,9 @@ void MemoryAccess::dump() const { print(errs()); }
 __isl_give isl_pw_aff *MemoryAccess::getPwAff(const SCEV *E) {
   auto *Stmt = getStatement();
   PWACtx PWAC = Stmt->getParent()->getPwAff(E, Stmt->getEntryBlock());
 
-  InvalidDomain = isl_set_union(InvalidDomain, PWAC.second);
+  isl_set *StmtDom = isl_set_reset_tuple_id(getStatement()->getDomain());
+  isl_set *NewInvalidDom = isl_set_intersect(StmtDom, PWAC.second);
+  InvalidDomain = isl_set_union(InvalidDomain, NewInvalidDom);
   return PWAC.first;
 }

It restricts the invalid domain to the statement instances that are actually executed, which in this case is enough to fix the issue: the only value of i0 that is ever executed is i0 = 0, and the problematic values of i0 are larger.
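
To illustrate why this intersection is sufficient here, the following standalone sketch (again not part of the original report) mimics the patch with the isl C API: it restricts the invalid set to the single instance that actually runs and checks that nothing remains:

#include <isl/ctx.h>
#include <isl/set.h>
#include <stdio.h>

int main(void) {
  isl_ctx *ctx = isl_ctx_alloc();

  /* Per-access invalid set for the hoisted load (from the dump above). */
  isl_set *Invalid = isl_set_read_from_str(
      ctx, "{ [i0] : i0 <= -2305843009213693953 or "
           "i0 >= 2305843009213693952 }");

  /* Stmt_loop_next runs exactly once, for i0 = 0; with its tuple id
   * reset (as the patch does) the domain is just { [0] }. */
  isl_set *Domain = isl_set_read_from_str(ctx, "{ [0] }");

  /* Restrict the invalid set to iterations that are actually executed. */
  isl_set *Restricted = isl_set_intersect(Invalid, Domain);

  /* Prints "empty: 1": the problematic iterations never run, so the
   * accumulated invalid domain stays empty and the execution context of
   * the hoisted load is no longer forced to { : 1 = 0 }. */
  printf("empty: %d\n", isl_set_is_empty(Restricted));

  isl_set_free(Restricted);
  isl_ctx_free(ctx);
  return 0;
}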

It would probably also make sense to collect the invalid context from the MemoryAccesses to ensure we won't get into a situation again where these invalid contexts are used, but not enforced by our run-time checks.

Johannes, any thoughts on this?

Best,
Tobias

@tobiasgrosser
Contributor Author

I committed the fix proposed above in r275053. This addresses the most obvious issues and should make the buildbots green again.

We may still need to review the load hoisting code more carefully. The fact that the run-time check we construct does not take into account the invalid-assumptions we derive for memory accesses looks rather fragile and would clearly benefit from some further review and testing. Unfortunately, Johannes is currently not available to help with these issues.

@tobiasgrosser
Contributor Author

mentioned in issue llvm/llvm-bugzilla-archive#27980

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.