Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation of External/SPEC/CFP2006/444.namd #31599

Open
Meinersbur opened this issue Mar 13, 2017 · 4 comments
Open

Miscompilation of External/SPEC/CFP2006/444.namd #31599

Meinersbur opened this issue Mar 13, 2017 · 4 comments
Labels
bugzilla Issues migrated from bugzilla polly

Comments

@Meinersbur
Copy link
Member

Bugzilla Link 32251
Version unspecified
OS Windows NT
CC @thanm

Extended Description

SPEC CPU 2016 1.1 444.namd currently fails with Polly (-O3 -mllvm -polly -mllvm -polly-process-unprofitable). Trying to bisect the commit causing it, I could go back to r288174 where it still fails. This is the first commit where Polly compiles with current LLVM/Clang trunk (r297414)

This might be something uncovered by an LLVM change. My next step would be to bisect with https://github.com/llvm-project/llvm-project .

Command line:
$ /tmp/lnt/bin/python /tmp/lnt/bin/lnt runtest test-suite --sandbox /tmp/runtests-858grax9 --test-suite /root/src/llvm/projects/test-suite --cc /tmp/runtests-858grax9/cc --cxx /tmp/runtests-858grax9/c++ --use-lit /root/build/llvm/release/bin/llvm-lit --cmake-define TEST_SUITE_LLVM_SIZE=/root/build/llvm/release/bin/llvm-size --use-cmake /usr/bin/cmake --cmake-cache ReleaseNoLTO --no-timestamp --benchmarking-only --threads 8 --build-threads 8 --only-test External/SPEC/CFP2006/444.namd --cppflags -mllvm --cppflags -polly --cppflags -mllvm --cppflags -polly-process-unprofitable

[...]

FAIL: test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test (1 of 1)
******************** TEST 'test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test' FAILED ********************

/tmp/runtests-858grax9/build/tools/timeit --limit-core 0 --limit-cpu 7200 --timeout 7200 --limit-file-size 104857600 --limit-rss-size 838860800 --redirect-stdout /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.stdout --redirect-input /dev/null --summary /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/Output/444.namd.test.time /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/444.namd --input /root/src/llvm/projects/test-suite/test-suite-externals/speccpu2006/benchspec/CPU2006/444.namd/data/all/input/namd.input --iterations 1 --output /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.out
/tmp/runtests-858grax9/build/tools/fpcmp -a 0.00001 /root/src/llvm/projects/test-suite/test-suite-externals/speccpu2006/benchspec/CPU2006/444.namd/data/train/output/namd.out /tmp/runtests-858grax9/build/External/SPEC/CFP2006/444.namd/namd.out

/tmp/runtests-858grax9/build/tools/fpcmp: FP Comparison failed, not a numeric difference between ' ' and 'R'

@Meinersbur
Copy link
Member Author

Bisecting revealed commit r270559

Author: Than McIntosh thanm@google.com
Date: Tue May 24 13:23:44 2016 +0000

Rework/enhance stack coloring data flow analysis.

Replace bidirectional flow analysis to compute liveness with forward
analysis pass. Treat lifetimes as starting when there is a first
reference to the stack slot, as opposed to starting at the point of the
lifetime.start intrinsic, so as to increase the number of stack
variables we can overlap.

Reviewers: gbiv, qcolumbet, wmi
Differential Revision: http://reviews.llvm.org/D18827

Bug: 25776

being the cause. I assume this just revealed a bug in Polly. I find this somewhat surprising since I had SPEC2006 passing well after May last year. The only explanation I have is that test-suite/lnt changed as well since then (both were trunk and not bisected).

@Meinersbur
Copy link
Member Author

The issue are the lifetime markers left behind by Polly. They are removed from the generated code, but the original code still contains them.

In particular, the code before Polly was:

llvm.lifetime.start(&var)
[...]
llvm.lifetime.end(&var)

The code after Polly looks like:

if (VersioningCond) {
} else {
  llvm.lifetime.start(&var)
}
[...]
llvm.lifetime.end(&var)

Such that there are execution paths (the one taken in the test-suite execution) where llvm.lifetime.start is not executed before llvm.lifetime.end. According the semantics in http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic this doesn't seem wrong. It just means that &var is never dead before the llvm.lifetime.end marker.

This variant fixes it:

if (VersioningCond) {
  llvm.lifetime.start(&var)
} else {
  llvm.lifetime.start(&var)
}
[...]
llvm.lifetime.end(&var)

I don't know whether the fault is at Polly to generate such code or with http://reviews.llvm.org/D18827 which does not correctly interpret the lifetimes. If it's Polly's fault, then I don't still know whether the second version is correct or does not miscompile because of other reasons. Polly at the moment cannot ensure the correct positioning of llvm.lifetime.start (that is, before every use that originally comes after llvm.lifetime.start)

@thanm
Copy link
Contributor

thanm commented Mar 30, 2017

What's happening here is that stack coloring is deciding to overlap stack slot 0 (corresponding to the variable "iterations") with stack slot 1 (corresponding to the variable "simParams"), which is definitely incorrect -- these variables do not have disjoint lifetimes.

Looking at the IR in the base (-O3, no polly) case, the lifetime start marker for "iterations" appears towards the end of the entry block, meaning that the code for the first few lines of the functions looks roughly like

define i32 @​main(i32 %argc, i8** nocapture readonly %argv) ...
entry:
  %iterations = alloca i32, align 4
  %simParams = alloca %class.SimParameters, align 8
  ...
  %comp = alloca %class.ResultSet, align 8
  %0 = bitcast i32* %iterations to i8*
  call void @​llvm.lifetime.start(i64 4, i8* nonnull %0) #​11
  store i32 -1, i32* %iterations, align 4, !tbaa !​1
  %sub = and i32 %argc, 1
  %tobool = icmp eq i32 %sub, 0
  br i1 %tobool, label %if.then, label %for.cond.preheader

...

if.then:                                          ; preds = %entry
  %2 = load i8*, i8** %argv, align 8, !tbaa !​5
  tail call void @​_z10exit_usagepkc(i8* %2)
  unreachable

This is the normal pattern for a function's top-level local variables (e.g. lifetime start appears in the entry block). Looking at the coming into the optimizer with Polly turned on, I see the following instead:

define i32 @​main(i32 %argc, i8** nocapture readonly %argv) local_unnamed_addr #​3 personality i8* bitcast (i32 (...)* @​__gxx_personality_v0 to i8*) {
entry:
  %iterations = alloca i32, align 4
  %0 = bitcast i32* %iterations to i8*
  %simParams = alloca %class.SimParameters, align 8
  ...
  %comp = alloca %class.ResultSet, align 8
  %1 = and i32 %argc, 1
  %2 = icmp eq i32 %1, 0
  br i1 %2, label %if.then, label %for.cond.preheader

if.then:                                          ; preds = %entry
  call void @​llvm.lifetime.start(i64 4, i8* nonnull %0) #​11
  %3 = load i8*, i8** %argv, align 8, !tbaa !​1
  tail call void @​_Z10exit_usagePKc(i8* %3)
  unreachable

for.cond.preheader:                               ; preds = %entry
  store i32 -1, i32* %iterations, align 4, !alias.scope !​5, !noalias !​7
  %cmp528 = icmp sgt i32 %argc, 1
  br i1 %cmp528, label %for.body.lr.ph, label %if.then50

Note that the lifetime start marker for "iterations" has been relocated, pushed down into the block containing the no-return call to exit (oddly, the bitcast feeding the lifetime start still seems to be stuck up in the entry block, not sure why). It would be nice to understand why the initial placement of the markers is happening this way.

The current implementation of stack coloring uses forward data flow analysis, meaning that it computes live intervals by propagating uses of a given slot in block N to N's successors, until hitting a "lifetime end" marker for the slot.

In addition, a given slot is classified as "normal" or "conservative", where "normal" implies that all uses of the slot are dominated by the lifetime start marker and "conservative" means that we found some path from the entry block to a use of a slot that didn't go through a start marker for the slot.

For "normal" slots, the live interval for the slot is treated as starting at the first use for it after the start marker, whereas for "conservative" slots we treat the marker itself as the start of the lifetime (for more detail on this, see http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/StackColoring.cpp?view=markup&pathrev=297904#l161).

In this case the slots for both "iterations" and "simParams" are being treated conservatively, however the conservative treatment doesn't help much due to the migration of the start lifetime for slot 0 out of the entry block.

The reason this case works with the old version of stack coloring was that it used bidirectional flow analysis -- propagating not just from first uses down to lifetime ends, but also backwards (from a use block back to its predecessors), upwards within the control flow graph. This produces a more extensive lifetime for the "iterations" slot, resulting in an interference between the two slots that inhibits overlap.

I don't see an easy fix for this in StackColoring, short of going back to the bidirectional scheme. The old bidirectional analysis would "fix" the problem in this specific instance, but it is easy to think of examples where the bidirectional method would also fail if optimization passes are allowed to arbitrarily relocate markers.

I think at some level or another stack coloring has to trust that the lifetime start/end markers are a source of "truth". If optimization passes prior to stack coloring can move markers wherever they want, then there is really no point looking at the markers at all (which would greatly impair the effectiveness of stack coloring).

@Meinersbur
Copy link
Member Author

Fixed in r299585.

However, I am leaving this bug open because while r299585 fixes the problem for 444.namd, it may not fix it in the general case.

Given that the lifetime start itself is a conditional that is outside of the SCoP;

if (c) {
  llvm.lifetime.start(&var)
    [...]
} else {
  llvm.lifetime.start(&var)
    [...]
}
[...]
llvm.lifetime.end(&var)

then after r299585 we would generate

if (c) {
  if (VersioningCond) {
    [...]
  } else {
    [...]
  }
} else {
  llvm.lifetime.start(&var)
    [...]
}
[...]
llvm.lifetime.end(&var)

Where again some paths cross llvm.lifetime.start while others don't.

According the the required form suggested by Daniel Berlin in
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111669.html
we need to hoist the llvm.ifetime.start markers to before the loop versioning, respectively sink the llvm.ifetime.end markers behind it:

if (c) {
  llvm.lifetime.start(&var)
  if (VersioningCond) {
    [...]
  } else {
    [...]
  }
} else {
  llvm.lifetime.start(&var)
    [...]
}
[...]
llvm.lifetime.end(&var)

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla polly
Projects
None yet
Development

No branches or pull requests

2 participants