LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 32251 - Miscompilation of External/SPEC/CFP2006/444.namd
Summary: Miscompilation of External/SPEC/CFP2006/444.namd
Status: NEW
Alias: None
Product: Polly
Classification: Unclassified
Component: Optimizer (show other bugs)
Version: unspecified
Hardware: PC Windows NT
: P enhancement
Assignee: Polly Development Mailinglist
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-13 06:43 PDT by Michael Kruse
Modified: 2017-04-05 13:28 PDT (History)
2 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Kruse 2017-03-13 06:43:07 PDT
SPEC CPU 2016 1.1 444.namd currently fails with Polly (-O3 -mllvm -polly -mllvm -polly-process-unprofitable). Trying to bisect the commit causing it, I could go back to r288174 where it still fails. This is the first commit where Polly compiles with current LLVM/Clang trunk (r297414)

This might be something uncovered by an LLVM change. My next step would be to bisect with https://github.com/llvm-project/llvm-project .

Command line:
$ /tmp/lnt/bin/python /tmp/lnt/bin/lnt runtest test-suite --sandbox /tmp/runtests-858grax9 --test-suite /root/src/llvm/projects/test-suite --cc /tmp/runtests-858grax9/cc --cxx /tmp/runtests-858grax9/c++ --use-lit /root/build/llvm/release/bin/llvm-lit --cmake-define TEST_SUITE_LLVM_SIZE=/root/build/llvm/release/bin/llvm-size --use-cmake /usr/bin/cmake --cmake-cache ReleaseNoLTO --no-timestamp --benchmarking-only --threads 8 --build-threads 8 --only-test External/SPEC/CFP2006/444.namd --cppflags -mllvm --cppflags -polly --cppflags -mllvm --cppflags -polly-process-unprofitable

[...]

FAIL: test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test (1 of 1)
******************** TEST 'test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test' FAILED ********************

/tmp/runtests-858grax9/build/tools/timeit --limit-core 0 --limit-cpu 7200 --timeout 7200 --limit-file-size 104857600 --limit-rss-size 838860800 --redirect-stdout /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.stdout --redirect-input /dev/null --summary /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/Output/444.namd.test.time /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/444.namd --input /root/src/llvm/projects/test-suite/test-suite-externals/speccpu2006/benchspec/CPU2006/444.namd/data/all/input/namd.input --iterations 1 --output /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.out
/tmp/runtests-858grax9/build/tools/fpcmp -a 0.00001 /root/src/llvm/projects/test-suite/test-suite-externals/speccpu2006/benchspec/CPU2006/444.namd/data/train/output/namd.out /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.out

/tmp/runtests-858grax9/build/tools/fpcmp: FP Comparison failed, not a numeric difference between ' ' and 'R'
Comment 1 Michael Kruse 2017-03-20 14:30:13 PDT
Bisecting revealed commit r270559

Author: Than McIntosh <thanm@google.com>
Date:   Tue May 24 13:23:44 2016 +0000

    Rework/enhance stack coloring data flow analysis.

    Replace bidirectional flow analysis to compute liveness with forward
    analysis pass. Treat lifetimes as starting when there is a first
    reference to the stack slot, as opposed to starting at the point of the
    lifetime.start intrinsic, so as to increase the number of stack
    variables we can overlap.

    Reviewers: gbiv, qcolumbet, wmi
    Differential Revision: http://reviews.llvm.org/D18827

    Bug: 25776

being the cause. I assume this just revealed a bug in Polly. I find this somewhat surprising since I had SPEC2006 passing well after May last year. The only explanation I have is that test-suite/lnt changed as well since then (both were trunk and not bisected).
Comment 2 Michael Kruse 2017-03-26 08:03:38 PDT
The issue are the lifetime markers left behind by Polly. They are removed from the generated code, but the original code still contains them. 

In particular, the code before Polly was:

    llvm.lifetime.start(&var)
    [...]
    llvm.lifetime.end(&var)

The code after Polly looks like:

    if (VersioningCond) {
    } else {
      llvm.lifetime.start(&var)
    }
    [...]
    llvm.lifetime.end(&var)

Such that there are execution paths (the one taken in the test-suite execution) where llvm.lifetime.start is not executed before llvm.lifetime.end. According the semantics in http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic this doesn't seem wrong. It just means that &var is never dead before the llvm.lifetime.end marker.

This variant fixes it:

    if (VersioningCond) {
      llvm.lifetime.start(&var)
    } else {
      llvm.lifetime.start(&var)
    }
    [...]
    llvm.lifetime.end(&var)


I don't know whether the fault is at Polly to generate such code or with http://reviews.llvm.org/D18827 which does not correctly interpret the lifetimes. If it's Polly's fault, then I don't still know whether the second version is correct or does not miscompile because of other reasons. Polly at the moment cannot ensure the correct positioning of llvm.lifetime.start (that is, before every use that originally comes after llvm.lifetime.start)
Comment 3 Than McIntosh 2017-03-30 13:10:55 PDT
What's happening here is that stack coloring is deciding to overlap stack slot 0 (corresponding to the variable "iterations") with stack slot 1 (corresponding to the variable "simParams"), which is definitely incorrect -- these variables do not have disjoint lifetimes.

Looking at the IR in the base (-O3, no polly) case, the lifetime start marker for "iterations" appears towards the end of the entry block, meaning that the code for the first few lines of the functions looks roughly like

    define i32 @main(i32 %argc, i8** nocapture readonly %argv) ...
    entry:
      %iterations = alloca i32, align 4
      %simParams = alloca %class.SimParameters, align 8
      ...
      %comp = alloca %class.ResultSet, align 8
      %0 = bitcast i32* %iterations to i8*
      call void @llvm.lifetime.start(i64 4, i8* nonnull %0) #11
      store i32 -1, i32* %iterations, align 4, !tbaa !1
      %sub = and i32 %argc, 1
      %tobool = icmp eq i32 %sub, 0
      br i1 %tobool, label %if.then, label %for.cond.preheader

    ...

    if.then:                                          ; preds = %entry
      %2 = load i8*, i8** %argv, align 8, !tbaa !5
      tail call void @_z10exit_usagepkc(i8* %2)
      unreachable

This is the normal pattern for a function's top-level local variables (e.g. lifetime start appears in the entry block). Looking at the coming into the optimizer with Polly turned on, I see the following instead:

    define i32 @main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #3 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    entry:
      %iterations = alloca i32, align 4
      %0 = bitcast i32* %iterations to i8*
      %simParams = alloca %class.SimParameters, align 8
      ...
      %comp = alloca %class.ResultSet, align 8
      %1 = and i32 %argc, 1
      %2 = icmp eq i32 %1, 0
      br i1 %2, label %if.then, label %for.cond.preheader

    if.then:                                          ; preds = %entry
      call void @llvm.lifetime.start(i64 4, i8* nonnull %0) #11
      %3 = load i8*, i8** %argv, align 8, !tbaa !1
      tail call void @_Z10exit_usagePKc(i8* %3)
      unreachable

    for.cond.preheader:                               ; preds = %entry
      store i32 -1, i32* %iterations, align 4, !alias.scope !5, !noalias !7
      %cmp528 = icmp sgt i32 %argc, 1
      br i1 %cmp528, label %for.body.lr.ph, label %if.then50

Note that the lifetime start marker for "iterations" has been relocated, pushed down into the block containing the no-return call to exit (oddly, the bitcast feeding the lifetime start still seems to be stuck up in the entry block, not sure why). It would be nice to understand why the initial placement of the markers is happening this way.

The current implementation of stack coloring uses forward data flow analysis, meaning that it computes live intervals by propagating uses of a given slot in block N to N's successors, until hitting a "lifetime end" marker for the slot.

In addition, a given slot is classified as "normal" or "conservative", where "normal" implies that all uses of the slot are dominated by the lifetime start marker and "conservative" means that we found some path from the entry block to a use of a slot that didn't go through a start marker for the slot.

For "normal" slots, the live interval for the slot is treated as starting at the first use for it after the start marker, whereas for "conservative" slots we treat the marker itself as the start of the lifetime (for more detail on this, see http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/StackColoring.cpp?view=markup&pathrev=297904#l161).

In this case the slots for both "iterations" and "simParams" are being treated conservatively, however the conservative treatment doesn't help much due to the migration of the start lifetime for slot 0 out of the entry block. 

The reason this case works with the old version of stack coloring was that it used bidirectional flow analysis -- propagating not just from first uses down to lifetime ends, but also backwards (from a use block back to its predecessors), upwards within the control flow graph. This produces a more extensive lifetime for the "iterations" slot, resulting in an interference between the two slots that inhibits overlap.

I don't see an easy fix for this in StackColoring, short of going back to the bidirectional scheme. The old bidirectional analysis would "fix" the problem in this specific instance, but it is easy to think of examples where the bidirectional method would also fail if optimization passes are allowed to arbitrarily relocate markers.

I think at some level or another stack coloring has to trust that the lifetime start/end markers are a source of "truth". If optimization passes prior to stack coloring can move markers wherever they want, then there is really no point looking at the markers at all (which would greatly impair the effectiveness of stack coloring).
Comment 4 Michael Kruse 2017-04-05 13:28:20 PDT
Fixed in r299585.

However, I am leaving this bug open because while r299585 fixes the problem for 444.namd, it may not fix it in the general case.

Given that the lifetime start itself is a conditional that is outside of the SCoP;

    if (c) {
      llvm.lifetime.start(&var)
        [...]
    } else {
      llvm.lifetime.start(&var)
        [...]
    }
    [...]
    llvm.lifetime.end(&var)

then after r299585 we would generate

    if (c) {
      if (VersioningCond) {
        [...]
      } else {
        [...]
      }
    } else {
      llvm.lifetime.start(&var)
        [...]
    }
    [...]
    llvm.lifetime.end(&var)

Where again some paths cross llvm.lifetime.start while others don't.

According the the required form suggested by Daniel Berlin in
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111669.html
we need to hoist the llvm.ifetime.start markers to before the loop versioning, respectively sink the llvm.ifetime.end markers behind it:

    if (c) {
      llvm.lifetime.start(&var)
      if (VersioningCond) {
        [...]
      } else {
        [...]
      }
    } else {
      llvm.lifetime.start(&var)
        [...]
    }
    [...]
    llvm.lifetime.end(&var)