19029 – clang 3.4 aborts when compiling dlaed3_ function in Numeric-24.2 on i386 with -fPIC -march=athlon64

Status:	NEW

Alias:	None

Product:	clang
Classification:	Unclassified
Component:	-New Bugs
Version:	3.4
Hardware:	PC FreeBSD

Importance:	P normal
Assignee:	Unassigned Clang Bugs

Reported:	2014-03-03 12:58 PST by Don Lewis
Modified:	2018-10-25 20:12 PDT
CC List:	6 users

Attachments
run script (412 bytes, text/plain) 2014-03-03 13:00 PST, Don Lewis
preprocessed source (14.69 KB, text/plain) 2014-03-03 13:01 PST, Don Lewis
More general testcase, reproduces with any target CPU (726 bytes, application/octet-stream) 2014-04-23 01:48 PDT, Dimitry Andric
.ll version of first reduced testcase (5.88 KB, application/octet-stream) 2014-04-23 13:30 PDT, Dimitry Andric
.ll version of second reduced testcase (7.80 KB, application/octet-stream) 2014-04-23 13:32 PDT, Dimitry Andric
Tarball with intermediate .ll files (8.83 KB, application/x-tar) 2014-04-23 15:43 PDT, Dimitry Andric
IR of pr19029-1 just before FPPassManager's Loop Vectorization pass (2.89 KB, application/octet-stream) 2014-04-25 18:29 PDT, Dimitry Andric
IR of pr19029-1 after FPPassManager's Loop Vectorization pass (8.42 KB, application/octet-stream) 2014-04-25 18:30 PDT, Dimitry Andric

Description Don Lewis 2014-03-03 12:58:42 PST

When compiling the dlapack_lite.c file from the Python Numeric-24.2 module, clang aborts on the dlaed3_() function if -march=athlon64 is specified.  Without -march=athlon64, clang does not abort.

# cc -c -O2 -fno-strict-aliasing -fPIC -march=athlon64 dlaed3_.c
Instruction does not dominate all uses!
  %arrayidx106 = getelementptr inbounds double* %dlamda, i32 %sub83
  %bound1492 = icmp ule double* %arrayidx106, %scevgep473
Instruction does not dominate all uses!
  %arrayidx106 = getelementptr inbounds double* %dlamda, i32 %sub83
  %bound0491 = icmp ule double* %scevgep471, %arrayidx106
Broken module found, compilation aborted!
Stack dump:
0.	Program arguments: /usr/bin/cc -cc1 -triple i386-unknown-freebsd11.0 -emit-obj -disable-free -main-file-name dlaed3_.c -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -relaxed-aliasing -masm-verbose -mconstructor-aliases -target-cpu athlon64 -coverage-file /usr/ports/math/py-numeric/work/Numeric-24.2/Src/dlaed3_.o -resource-dir /usr/bin/../lib/clang/3.4 -O2 -fdebug-compilation-dir /usr/ports/math/py-numeric/work/Numeric-24.2/Src -ferror-limit 19 -fmessage-length 191 -mstackrealign -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o dlaed3_.o -x c dlaed3_.c 
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'dlaed3_.c'.
4.	Running pass 'Module Verifier' on function '@dlaed3_'
cc: error: unable to execute command: Abort trap (core dumped)
cc: error: clang frontend command failed due to signal (use -v to see invocation)
FreeBSD clang version 3.4 (tags/RELEASE_34/final 197956) 20140216
Target: i386-unknown-freebsd11.0
Thread model: posix
cc: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
cc: note: diagnostic msg: 
********************
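
For context on the verifier message above: in SSA form, the block that defines a value has to dominate every block that uses it, meaning every path from the function entry to the use passes through the definition. The following is a minimal Python sketch of that check on a toy control-flow graph. The graph, the block names and the helper functions are all made up for illustration and are not taken from the attached IR; in particular, putting the definition in a loop body and the use in an earlier check block is only an assumption about how this kind of error can arise.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks dominating it}, by iterating to a fixed point."""
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for block, succs in cfg.items():
        for succ in succs:
            preds[succ].add(block)

    # Start from "everything dominates everything" and shrink the sets.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for block in blocks - {entry}:
            incoming = [dom[p] for p in preds[block]]
            common = set.intersection(*incoming) if incoming else set()
            new = {block} | common
            if new != dom[block]:
                dom[block] = new
                changed = True
    return dom


def dominates(dom, def_block, use_block):
    """Block-level check only; within one block, instruction order also matters."""
    return def_block in dom[use_block]


# Hypothetical toy CFG: a runtime check block sits in front of two loop versions.
TOY_CFG = {
    "entry": ["memcheck"],
    "memcheck": ["vector_body", "scalar_body"],
    "vector_body": ["exit"],
    "scalar_body": ["scalar_body", "exit"],
    "exit": [],
}
DOM = dominators(TOY_CFG, "entry")

# A value defined in memcheck may be used in scalar_body ...
print(dominates(DOM, "memcheck", "scalar_body"))
# ... but a value defined in scalar_body may not be used in memcheck.
print(dominates(DOM, "scalar_body", "memcheck"))
```

A use for which this check fails is the kind of thing the verifier reports as "Instruction does not dominate all uses!".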

Comment 1 Don Lewis 2014-03-03 13:00:41 PST

Created attachment 12178 [details]
run script

Comment 2 Don Lewis 2014-03-03 13:01:52 PST

Created attachment 12179 [details]
preprocessed source

Comment 3 Dimitry Andric 2014-04-20 12:30:16 PDT

Testcase reduces to just this:

a;
dlaed3_(double *q, double *dlamda, double *w) {
  int b;
  static c, j;
  --dlamda;
  -a;
  for (;; ++j) {
    b = j - 1;
    for (; c <= b; ++c)
      w[c] = q[j] - dlamda[j];
  }
}

Strangely enough, it seems to be fixed by:

http://llvm.org/viewvc/llvm-project?view=revision&revision=205264

It also fixes a very similar-looking bug reported by a user of the FreeBSD editors/libreoffice port here:

http://www.freebsd.org/cgi/query-pr.cgi?pr=187177

Hal, I've put you on CC since you are the author of that commit.  Any idea if the commit might be just hiding some other problem?

Comment 4 Hal Finkel 2014-04-22 13:53:23 PDT

> 
> Hal, I've put you on CC since you are the author of that commit.  Any idea
> if the commit might be just hiding some other problem?

That commit did not fix anything, but did change some pass ordering. I'm fairly certain that anything "fixed" by that commit is now just hidden. If you compile with -fno-unroll-loops does the bug come back?

Comment 5 Dimitry Andric 2014-04-22 15:54:09 PDT

(In reply to comment #4)
> If you compile with -fno-unroll-loops does the bug come back?

Yep, with trunk r206915 and -fno-unroll-loops, it bombs again:

$ /share/dim/llvm/206915-trunk-freebsd11-i386-ninja-rel-1/bin/clang -cc1 -triple i386-unknown-freebsd11.0 -emit-obj -disable-free -main-file-name pr19029-reduced.c -mrelocation-model pic -pic-level 2 -mdisable-fp-elim -relaxed-aliasing -masm-verbose -mconstructor-aliases -target-cpu athlon64 -O2 -ferror-limit 19 -fmessage-length 191 -mstackrealign -fobjc-runtime=gnustep -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -fno-unroll-loops -x c pr19029-reduced.c
pr19029-reduced.c:1:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
a;
^
pr19029-reduced.c:2:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
dlaed3_(double *q, double *dlamda, double *w) {
^~~~~~~
pr19029-reduced.c:4:10: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  static c, j;
  ~~~~~~ ^
pr19029-reduced.c:4:13: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  static c, j;
  ~~~~~~    ^
pr19029-reduced.c:6:3: warning: expression result unused [-Wunused-value]
  -a;
  ^~
Instruction does not dominate all uses!
  %27 = getelementptr inbounds double* %dlamda, i32 %6
  %bound112 = icmp ule double* %27, %scevgep6
Instruction does not dominate all uses!
  %27 = getelementptr inbounds double* %dlamda, i32 %6
  %bound011 = icmp ule double* %scevgep, %27
Instruction does not dominate all uses!
  %25 = getelementptr inbounds double* %q, i32 %3
  %bound1 = icmp ule double* %25, %scevgep6
Instruction does not dominate all uses!
  %25 = getelementptr inbounds double* %q, i32 %3
  %bound0 = icmp ule double* %scevgep, %25
fatal error: error in backend: Broken function found, compilation aborted!

Comment 6 Dimitry Andric 2014-04-22 17:19:09 PDT

(In reply to comment #5)
...
> Yep, with trunk r206915 and -fno-unroll-loops, it bombs again:

By bisecting backwards, I found out this error seems to have been introduced here:

http://llvm.org/viewvc/llvm-project?view=revision&revision=189858

"Enable late-vectorization by default. This patch changes the default setting for the LateVectorization flag that controls where the loop-vectorizer is ran."

I guess the actual bug is yet another side-effect exposed by this change?  Nadav, since you authored r189858, I've put you on CC too, do you have any idea?
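
The bisection itself can be sketched roughly as below. This is only an illustration: is_bad() is a hypothetical callback that would build clang at a given revision and run the reproducer, and the starting "good" revision used in the example is an arbitrary placeholder, not a revision that was actually tested.

```python
def first_bad_revision(good, bad, is_bad):
    """Binary-search the range (good, bad] for the first failing revision.

    Assumes is_bad(good) is False, is_bad(bad) is True, and that the
    result flips from passing to failing exactly once in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid      # failure already present at mid
        else:
            good = mid     # mid still passes, look later
    return bad


# Stand-in predicate that simulates "fails from r189858 onwards".
def simulated_is_bad(revision):
    return revision >= 189858


# 189000 is an arbitrary placeholder for a known-good revision.
FOUND = first_bad_revision(189000, 206915, simulated_is_bad)
print(FOUND)
```

Note that such a search only reports where the symptom first shows up under the flags in use. A revision that merely changes defaults or pass ordering can be reported even if the underlying defect is older.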

Comment 7 Dimitry Andric 2014-04-23 01:48:56 PDT

Created attachment 12422 [details]
More general testcase, reproduces with any target CPU

(In reply to comment #3)
...
> It also fixes a very similar-looking bug reported by a user of the FreeBSD
> editors/libreoffice port here:
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=187177

Here is a small testcase reduced from that bug's original, which crashes clang trunk r206915 in the same manner, without even setting any CPU type:

$ clang -cc1 -triple x86_64-unknown-linux -emit-obj -relaxed-aliasing -O2 -vectorize-loops -fno-unroll-loops pr19029-2.cpp
Instruction does not dominate all uses!
  %24 = getelementptr inbounds %struct._rtl_uString* %2, i64 0, i32 2, i64 %13
  %bc7 = bitcast i16* %24 to i8*
Instruction does not dominate all uses!
  %24 = getelementptr inbounds %struct._rtl_uString* %2, i64 0, i32 2, i64 %13
  %bc = bitcast i16* %24 to i8*
fatal error: error in backend: Broken function found, compilation aborted!

Some additional data points:
- Removing -relaxed-aliasing makes the bug disappear.
- Lowering to -O1 makes the bug disappear.
- Removing -vectorize-loops makes the bug disappear.
- And obviously, removing -fno-unroll-loops also makes the bug disappear.
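
Checking flags one at a time like this can be automated along the following lines. This is a sketch under assumptions: still_fails() is a hypothetical callback that would run clang with the given flags and report whether it aborts, and the simulated version below only encodes the four observations listed above (dropping -O2 stands in for lowering to -O1). The extra flag is a made-up placeholder.

```python
def essential_flags(flags, still_fails):
    """Return the flags whose individual removal makes the failure go away."""
    essential = []
    for flag in flags:
        reduced = [f for f in flags if f != flag]
        if not still_fails(reduced):
            essential.append(flag)
    return essential


# Simulation of the reported behaviour, not a real compiler run.
NEEDED = {"-relaxed-aliasing", "-O2", "-vectorize-loops", "-fno-unroll-loops"}


def simulated_still_fails(flags):
    return NEEDED <= set(flags)


# "-placeholder-flag" is hypothetical and only shows a non-essential flag.
FLAGS = ["-placeholder-flag", "-relaxed-aliasing", "-O2",
         "-vectorize-loops", "-fno-unroll-loops"]
RESULT = essential_flags(FLAGS, simulated_still_fails)
print(RESULT)
```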

Comment 8 Dimitry Andric 2014-04-23 11:27:19 PDT

(In reply to comment #6)
> By bisecting backwards, I found out this error seems to have been introduced
> here:
> 
> http://llvm.org/viewvc/llvm-project?view=revision&revision=189858
> 
> "Enable late-vectorization by default. This patch changes the default
> setting for the LateVectorization flag that controls where the
> loop-vectorizer is ran."

So when forcing late vectorization on, using -mllvm -late-vectorize=true, I searched backwards again, and now ended up at this previous revision (again by Nadav), which seems to introduce the crash:

http://llvm.org/viewvc/llvm-project?view=revision&revision=189539

"This patch moves the SLP-vectorizer and BB-vectorizer back into SCC passes"

Is there any option I can enable for earlier revisions to partially undo this, so I can figure out where the actual problem originates?

Comment 9 Dimitry Andric 2014-04-23 11:31:40 PDT

For completeness' sake, both testcases can be reproduced by using the following flags:

clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -O2 -vectorize-loops -mllvm -late-vectorize=true

The actual triple does not matter too much, I also tried:
* i386-unknown-freebsd11.0
* i386-unknown-linux
* x86_64-unknown-linux

Comment 10 Nadav Rotem 2014-04-23 11:40:31 PDT

It looks like a bug in the loop-vectorizer. Can you reduce the test case to a bitcode file?

Comment 11 Dimitry Andric 2014-04-23 13:30:56 PDT

Created attachment 12425 [details]
.ll version of first reduced testcase

I can convert the testcases to .ll format; that seems to work fine.  Here is the .ll version of the first one.  When I run llvm-as (from trunk r206915) on it, I get the same verifier error as before:

llvm-as: assembly parsed, but does not verify as correct!
Instruction does not dominate all uses!
  %43 = getelementptr inbounds double* %dlamda, i64 %.sum
  %bound112 = icmp ule double* %43, %scevgep6
Instruction does not dominate all uses!
  %43 = getelementptr inbounds double* %dlamda, i64 %.sum
  %bound011 = icmp ule double* %scevgep, %43
Instruction does not dominate all uses!
  %41 = getelementptr inbounds double* %q, i64 %40
  %bound1 = icmp ule double* %41, %scevgep6
Instruction does not dominate all uses!
  %41 = getelementptr inbounds double* %q, i64 %40
  %bound0 = icmp ule double* %scevgep, %41
Broken module found, compilation terminated.
Broken module found, compilation terminated.

Comment 12 Dimitry Andric 2014-04-23 13:32:27 PDT

Created attachment 12426 [details]
.ll version of second reduced testcase

Similar to the first .ll testcase, this one is rejected by llvm-as:

$ llvm-as pr19029-2.ll
llvm-as: assembly parsed, but does not verify as correct!
Instruction does not dominate all uses!
  %33 = getelementptr inbounds %struct._rtl_uString* %2, i64 0, i32 2, i64 %12
  %bc8 = bitcast i16* %33 to i8*
Instruction does not dominate all uses!
  %33 = getelementptr inbounds %struct._rtl_uString* %2, i64 0, i32 2, i64 %12
  %bc = bitcast i16* %33 to i8*
Broken module found, compilation terminated.
Broken module found, compilation terminated.

Comment 13 Nadav Rotem 2014-04-23 13:59:10 PDT

There is a clang flag for printing the IR before every transformation. I think that the generated LL file that you attached is already invalid. We need to catch it before it becomes invalid.

Comment 14 Dimitry Andric 2014-04-23 14:51:44 PDT

The flag appears to be -mllvm -print-before-all, but most of the 79 intermediate IR files don't seem to be complete, e.g. the very first one prints:

llvm-as: temp01.ll:12:41: error: use of undefined metadata '!0'
  %5 = load double** %3, align 8, !tbaa !0
                                        ^

Others result in errors like:

llvm-as: temp24.ll:3:8: error: expected 'type' after '='
  %5 = load i32* @dlaed3_.c, align 4, !tbaa !0
       ^

The pass numbers that do work without errors are:

08: *** IR Dump Before Interprocedural Sparse Conditional Constant Propagation
09: *** IR Dump Before Dead Argument Elimination
60: *** IR Dump Before Function Integration/Inlining ***printing a <null> value
61: *** IR Dump Before Deduce function attributes ***printing a <null> value
62: *** IR Dump Before A No-Op Barrier Pass

Then pass 75 ('Before Strip Unused Function Prototypes') dies with the 'Instruction does not dominate all uses!' error.  The previous pass is 'Before Simplify the CFG', but the produced IR is apparently not valid.
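
Sorting the dumps into "assembles", "does not parse" and "parses but fails verification" can be scripted roughly as follows. This is a sketch, not what was actually run: it assumes llvm-as is on the PATH, that it exits non-zero on failure, and that the verifier message contains the text shown in the outputs in this report. The function names are hypothetical.

```python
import glob
import os
import subprocess


def classify(returncode, stderr_text):
    """Map one llvm-as run to a coarse verdict based on its exit code and stderr."""
    if returncode == 0:
        return "ok"
    if "does not verify as correct" in stderr_text:
        return "verifier failure"
    return "parse error"


def check_dumps(pattern="temp*.ll", llvm_as="llvm-as"):
    """Run llvm-as on every file matching pattern and collect the verdicts."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        proc = subprocess.run([llvm_as, path, "-o", os.devnull],
                              capture_output=True, text=True)
        results[path] = classify(proc.returncode, proc.stderr)
    return results
```

Calling check_dumps() in the directory with the temp*.ll files returns a dictionary from file name to verdict; the first "verifier failure" entry marks the earliest dump in which the IR is already broken.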

Comment 15 Nadav Rotem 2014-04-23 15:10:40 PDT

What was the last pass that finished successfully? You can manually place a breakpoint before that pass and dump the module.

Comment 16 Dimitry Andric 2014-04-23 15:43:18 PDT

Created attachment 12427 [details]
Tarball with intermediate .ll files

Here's the collection of each intermediate .ll file. The temp74.ll file is the one that causes the crash, most of the earlier ones don't get accepted by llvm-as.

Comment 17 Hal Finkel 2014-04-23 17:27:54 PDT

(In reply to comment #16)
> Created attachment 12427 [details]
> Tarball with intermediate .ll files
> 
> Here's the collection of each intermediate .ll file. The temp74.ll file is
> the one that causes the crash, most of the earlier ones don't get accepted
> by llvm-as.

How exactly do you run these to reproduce the bad output?

Comment 18 Dimitry Andric 2014-04-24 00:51:38 PDT

(In reply to comment #17)
> (In reply to comment #16)
> > Created attachment 12427 [details]
> > Tarball with intermediate .ll files
> > 
> > Here's the collection of each intermediate .ll file. The temp74.ll file is
> > the one that causes the crash, most of the earlier ones don't get accepted
> > by llvm-as.
> 
> How exactly do you run these to reproduce the bad output?

Just run 'llvm-as' on them; no special flags needed.

Comment 19 Hal Finkel 2014-04-24 01:34:46 PDT

(In reply to comment #18)
> (In reply to comment #17)
> > (In reply to comment #16)
> > > Created attachment 12427 [details]
> > > Tarball with intermediate .ll files
> > > 
> > > Here's the collection of each intermediate .ll file. The temp74.ll file is
> > > the one that causes the crash, most of the earlier ones don't get accepted
> > > by llvm-as.
> > 
> > How exactly do you run these to reproduce the bad output?
> 
> Just run 'llvm-as' on them; no special flags needed.

How did you generate the files? If we're to isolate the bug, we need to be able to run the optimization pass so that it generates the bad output. llvm-as will just verify that the output is invalid once the bug has already been triggered.

Comment 20 Dimitry Andric 2014-04-24 01:55:41 PDT

(In reply to comment #19)
> How did you generate the files? If we're to isolate the bug, we need to be
> able to run the optimization pass so that it generates the bad output.

I couldn't get bugpoint to work (it tries to run /usr/bin/gcc, which does not exist on my system... :), so I used -mllvm -print-before-all as a clang option, e.g.:

clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -O2 -vectorize-loops -mllvm -late-vectorize=true -mllvm -print-before-all pr19029-1.c 2> irdumps.txt

This logs all the IR into irdumps.txt.  I use the following Python fragment to split the dumps into separate files:

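#!/usr/bin/env python3
# Split the -print-before-all log into one temp<NN>.ll file per IR dump.
counter = 0
outfile = None
with open('irdumps.txt', 'r') as irfile:
    for line in irfile:
        if line.startswith('*** IR Dump'):
            # A new dump banner: close the previous file and open the next one.
            counter += 1
            if outfile:
                outfile.close()
            print('Opening output file %d...' % counter)
            outfile = open('temp%02d.ll' % counter, 'w')
            # Keep the banner as an IR comment so llvm-as skips it.
            outfile.write('; %s' % line)
        elif outfile:
            outfile.write(line)
if outfile:
    outfile.close()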

Unfortunately, not every pass logs the full IR, for some reason, so not every individual dump is useful at this time.  Nadav suggested instead to run clang in gdb and set a breakpoint on the pass manager, but I'm not sure how to dump the current IR as a file from gdb...

Comment 21 Dimitry Andric 2014-04-25 18:29:30 PDT

Created attachment 12445 [details]
IR of pr19029-1 just before FPPassManager's Loop Vectorization pass

It turns out the IR becomes invalid after FPPassManager's Loop Vectorization pass.  I will attach the .ll from before and after.

Comment 22 Dimitry Andric 2014-04-25 18:30:18 PDT

Created attachment 12446 [details]
IR of pr19029-1 after FPPassManager's Loop Vectorization pass

Comment 23 Dimitry Andric 2014-04-25 18:33:02 PDT

Note that LoopVectorize::runOnFunction() calls processLoop() only once.  Before the call, the module is still OK, after the call it is broken.

Comment 24 Dimitry Andric 2014-04-26 10:03:11 PDT

Some more investigation shows that LoopVectorize::processLoop() calls InnerLoopVectorizer::vectorize().  This first calls InnerLoopVectorizer::createEmptyLoop(), after which the IR is already bad.  This is not the case before the createEmptyLoop() call.

I'm not sure if the IR is supposed to be consistent throughout the InnerLoopVectorizer implementation, however...
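
The narrowing-down in the last few comments amounts to checking an invariant before and after each step and reporting the first step that breaks it. A toy Python sketch of that idea follows. Everything in it is hypothetical: the "IR" is just an ordered list of (name, operands) pairs, the invariant is a simplified straight-line version of "definitions come before uses", and the step names do not correspond to real LLVM functions.

```python
def defs_precede_uses(instrs):
    """Simplified invariant: every operand must be defined by an earlier entry."""
    seen = set()
    for name, operands in instrs:
        if any(op not in seen for op in operands):
            return False
        seen.add(name)
    return True


def first_breaking_step(state, steps, is_valid):
    """Apply steps in order; return the name of the first one that breaks is_valid."""
    if not is_valid(state):
        return "<input>"
    for name, step in steps:
        state = step(state)
        if not is_valid(state):
            return name
    return None


def no_op_cleanup(instrs):
    return list(instrs)


def hoist_last_to_front(instrs):
    # Deliberately buggy step: moves a use in front of its definition.
    return [instrs[-1]] + list(instrs[:-1])


TOY_IR = [("ptr", []), ("gep", ["ptr"]), ("bound", ["gep"])]
STEPS = [("no-op cleanup", no_op_cleanup),
         ("hoist check", hoist_last_to_front)]
CULPRIT = first_breaking_step(TOY_IR, STEPS, defs_precede_uses)
print(CULPRIT)
```

Whether a transformation is expected to keep the invariant at every intermediate point, or only once it has finished, decides how fine-grained the steps can usefully be.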

Comment 25 Dimitry Andric 2014-04-29 12:03:26 PDT

Nadav, do you need any other .ll output?  I think attachment 12445 [details] is the last stage before the LoopVectorizer does something bad to the IR.

Comment 26 Dimitry Andric 2014-05-11 16:21:29 PDT

Ping :)

Comment 27 Dimitry Andric 2014-06-03 15:32:01 PDT

Ping 2 :)

Comment 28 Dimitry Andric 2018-01-28 08:14:27 PST

Turns out this finally got fixed in https://reviews.llvm.org/rL229419 ("Run LICM as part of the cleanup phase from the scalar optimizer") by James Molloy.