LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit new LLVM bugs.

Bug 27944 - Miscompile lencod (after Sanjoy's SCEV changes?)
Summary: Miscompile lencod (after Sanjoy's SCEV changes?)
Status: RESOLVED FIXED
Alias: None
Product: Polly
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: PC Linux
Importance: P normal
Assignee: Polly Development Mailinglist
URL:
Keywords:
Duplicates: 27980
Depends on:
Blocks:
 
Reported: 2016-05-30 23:35 PDT by Tobias Grosser
Modified: 2016-07-11 07:21 PDT
CC List: 5 users

See Also:
Fixed By Commit(s):


Attachments
Reduced test case (549 bytes, application/octet-stream)
2016-07-02 08:13 PDT, Tobias Grosser

Description Tobias Grosser 2016-05-30 23:35:24 PDT
It seems that after Sanjoy's SCEV changes that happened between 115336 and 115351, a miscompile was introduced: FAIL: lencod.execution_time

http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable/builds/3034

Due to the build breakage that happened just before, it is not easy to track down the precise revision.
Comment 1 Sanjoy Das 2016-05-30 23:53:22 PDT
Hi Tobias,

It looks like it broke earlier, in http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable/builds/3026/, and the only SCEV change in that build is 271151.  I didn't notice this earlier because I thought this too was a build that failed due to removing the `apply` method.

Do you know why the test is failing?  Is it a performance regression, or is the benchmark infinite looping?

Finally, can you point me to some instructions on how I can reproduce this?
Comment 2 Tobias Grosser 2016-06-01 08:12:56 PDT
Hi Sanjoy,

you are right. This is really just the one commit. I tried to revert it locally, but unfortunately I already run into conflicts. The command to try this locally should be:

lnt runtest nt --sandbox /tmp --cc llvm_build/bin/clang-3.9 --test-suite test-suite/ --only-test MultiSource/Applications/JM/lencod --mllvm=-polly --mllvm=-polly-position=before-vectorizer

I am currently traveling, so I won't be able to debug this until next week. As the SCEV nsw stuff has many corner cases, it might very well be that this just triggers a Polly bug. So don't feel obliged to debug this through. However, if you have some insights I would be glad to hear them.

Best,
Tobias
Comment 3 Tobias Grosser 2016-06-04 00:54:33 PDT
*** Bug 27980 has been marked as a duplicate of this bug. ***
Comment 4 Tobias Grosser 2016-06-11 12:37:28 PDT
@Johannes, this bug seems to be in code which you know best. Could you possibly have a look?



OK. This is really a nasty bug, but I can reproduce it. I first tried to create a big .bc file and then just wanted to run bugpoint to reduce it. Unfortunately, this did not work, for two reasons: 1) opt and llc/lli took minutes to run this test case even once, so everything took ages. 2) When specifying the individual passes one by one instead of using -O3, the bug disappears. :( So I had to resort to manually bisecting it. After a couple of hours of debugging, this is what I got:

1) https://llvm.org/svn/llvm-project/llvm/trunk@271151 is indeed the change that triggers this bug.

2) There is only a single miscompile:

File: block.c
Function: dct_chroma
Region: %for.body660---%if.end1256

3) The bug disappears if I add -polly-invariant-load-hoisting=false

Before Sanjoy's patch we get:

    Invariant Accesses: {
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_if_else764[i0] -> MemRef_197[0] };
            Execution Context: [qp_per_dc.0] -> {  :  }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 0] };
            Execution Context: [qp_per_dc.0] -> {  :  }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 2] };
            Execution Context: [qp_per_dc.0] -> {  :  }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 1] };
            Execution Context: [qp_per_dc.0] -> {  :  }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 3] };
            Execution Context: [qp_per_dc.0] -> {  :  }
    }

After Sanjoy's patch we get:

    Invariant Accesses: {
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_if_else764[i0] -> MemRef_197[0] };
            Execution Context: [qp_per_dc.0] -> {  :  }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 0] };
            Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 2] };
            Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 1] };
            Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 3] };
            Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
    }

Without load hoisting we get:

    Invariant Accesses: {                                                        
    }                                                                            
    Context:                                                                     
    [qp_per_dc.0] -> {  : -2147483648 <= qp_per_dc.0 <= 2147483647 }             
    Assumed Context:                                                             
    [qp_per_dc.0] -> {  :  }                                                     
    Invalid Context:                                                             
    [qp_per_dc.0] -> {  : 1 = 0 }                                                
    p0: %qp_per_dc.0     

How to reproduce the bug:

Go to /home/grosser/Projects/polly/test-suite/MultiSource/Applications/JM/lencod and run:

for i in `ls *.c`; do echo $i; /home/grosser/tmp/llvm-project/llvm-project/build/bin/clang -O3 -mllvm -polly -mllvm -polly-process-unprofitable -mllvm -polly-position=before-vectorizer $i -c -o $i.o; done && clang *.o -o executable && ./executable -d data/encoder_small.cfg -p InputFile=data/foreman_part_qcif_444.yuv -p LeakyBucketRateFile=data/leakybucketrate.cfg -p QmatrixFile=data/q_matrix.cfg > executable.out && diff lencod.reference_output.small executable.out
Comment 5 Tobias Grosser 2016-07-02 08:13:56 PDT
Created attachment 16682 [details]
Reduced test case

Hi guys,

I just managed to reduce this test case further. Interestingly, the new test case failed already before Sanjoy's patch.

I see the following with latest Polly (274430):

$ opt test.ll -polly-scops -polly-process-unprofitable -polly-invariant-load-hoisting=false -analyze

    Invariant Accesses: {
    }
    Context:
    {  :  }
    Assumed Context:
    {  :  }
    Invalid Context:
    {  : 1 = 0 }
    Alias Groups (0):
        n/a
    Statements {
    	Stmt_loop
...
    	Stmt_loop_next
            Domain :=
                { Stmt_loop_next[0] };
            Schedule :=
                { Stmt_loop_next[i0] -> [0, 1] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_loop_next[i0] -> MemRef_val__phi[] };
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                { Stmt_loop_next[i0] -> MemRef_a[1] };
    }

$ opt test.ll -polly-scops -polly-process-unprofitable -analyze
    Invariant Accesses: {
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 0]
                { Stmt_loop_next[i0] -> MemRef_a[1] };
            Execution Context: {  : 1 = 0 }
    }
    Context:
    {  :  }
    Assumed Context:
    {  :  }
    Invalid Context:
    {  : 1 = 0 }
    Statements {
    	Stmt_loop
...
    	Stmt_loop_next
            Domain :=
                { Stmt_loop_next[0] };
            Schedule :=
                { Stmt_loop_next[i0] -> [0, 1] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_loop_next[i0] -> MemRef_val__phi[] };
    }

Even though Stmt_loop_next is always executed, it has an empty execution context when hoisted. This seems incorrect.

It seems the reason is that the subscript expression of the hoisted memory access is "{0,+,4}<%loop>" and does not have nsw/nuw annotations. As a result, the invalid context computed in MemoryAccess::getPwAff is the non-empty set '{ [i0] : i0 <= -2305843009213693953 or i0 >= 2305843009213693952 }', which -- when projected onto the parameters -- indicates that any execution of this memory access is invalid and consequently won't happen.
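As a side note (my own sketch, not Polly code): the magic constant in that invalid context is consistent with the byte offset 4*i0 of the unflagged add-recurrence {0,+,4} wrapping the signed 64-bit range, which a few lines of arithmetic confirm:

```python
# Sketch: where the bound 2305843009213693952 in the invalid context
# '{ [i0] : i0 <= -2305843009213693953 or i0 >= 2305843009213693952 }'
# comes from. Without nsw/nuw on "{0,+,4}<%loop>", the byte offset
# 4*i0 must be assumed to possibly wrap a signed 64-bit integer.

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
STRIDE = 4  # step of the add-recurrence {0,+,4}, in bytes

# Smallest non-negative i0 for which STRIDE * i0 exceeds INT64_MAX:
upper_wrap = (INT64_MAX + 1) // STRIDE   # 2**61

# Largest (most negative) i0 for which STRIDE * i0 still fits:
# STRIDE * (INT64_MIN // STRIDE) == INT64_MIN, so one step beyond wraps.
lower_wrap = INT64_MIN // STRIDE - 1     # -(2**61) - 1

print(upper_wrap, lower_wrap)
```

Both values match the bounds printed in the invalid context above, so the set really is just the overflow region of the unannotated subscript.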

It seems we have a mismatch here. The invalid-context of the entire scop still allows the scop to be executed while we assume locally that some parts won't be executed.

The following patch fixes this bug as well as the LNT bug we see in MultiSource/Applications/JM/lencod:

--- a/lib/Analysis/ScopInfo.cpp
+++ b/lib/Analysis/ScopInfo.cpp
@@ -869,7 +869,9 @@ void MemoryAccess::dump() const { print(errs()); }
 __isl_give isl_pw_aff *MemoryAccess::getPwAff(const SCEV *E) {
   auto *Stmt = getStatement();
   PWACtx PWAC = Stmt->getParent()->getPwAff(E, Stmt->getEntryBlock());
-  InvalidDomain = isl_set_union(InvalidDomain, PWAC.second);
+  isl_set *StmtDom = isl_set_reset_tuple_id(getStatement()->getDomain());
+  isl_set *NewInvalidDom = isl_set_intersect(StmtDom, PWAC.second);
+  InvalidDomain = isl_set_union(InvalidDomain, NewInvalidDom);
   return PWAC.first;
 }

It restricts the invalid accesses to the memory accesses that are actually executed, which in this very case is enough to fix this issue, as the only value of i0 that is ever executed is i0 = 0, and the problematic values of i0 are larger.
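To make the effect of the patch concrete, here is a small model (my own sketch, not Polly/isl code) of the two sets involved, with iterations represented as plain integers and the sets as predicates:

```python
# Sketch of the fix: intersect the per-access invalid set with the
# statement's domain before accumulating it into InvalidDomain.

WRAP = 2305843009213693952  # bound from the invalid context above

def invalid_from_pwaff(i0):
    # Invalid iterations derived for the unflagged subscript {0,+,4}:
    # those where the byte offset 4*i0 may wrap a signed 64-bit value.
    return i0 >= WRAP or i0 <= -WRAP - 1

def stmt_domain(i0):
    # Stmt_loop_next only runs for i0 = 0 (its Domain in the scop dump).
    return i0 == 0

# Before the patch, invalid_from_pwaff was used as-is; projected onto
# the parameters, this non-empty set made the whole access "invalid",
# yielding the bogus execution context '{ : 1 = 0 }'.
# After the patch, the invalid set is first restricted to the domain:
def new_invalid(i0):
    return stmt_domain(i0) and invalid_from_pwaff(i0)

# The only executed iteration is not invalid any more:
print(new_invalid(0))  # False
```

Since no iteration in the domain can actually reach the overflow region, the intersection is empty, and no spurious invalid context leaks into the execution context of the hoisted load.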

It would probably also make sense to collect the invalid context from the MemoryAccesses to ensure we won't get into a situation again where these invalid contexts are used, but not enforced by our run-time checks.

Johannes, any thoughts on this?
 
Best,
Tobias
Comment 6 Tobias Grosser 2016-07-11 07:21:40 PDT
I committed the fix proposed above in r275053. This addresses the most obvious issues and should make the buildbots green again.

We may still need to review the load hoisting code more carefully. The fact that the run-time check we construct does not take into account the invalid-assumptions we derive for memory accesses looks rather fragile and would clearly benefit from some further review and testing. Unfortunately, Johannes is currently not available to help with these issues.