Created attachment 13534 [details] Test case, minimized from Inkscape's geom.cpp I received a report from the maintainer of the FreeBSD inkscape port, about clang giving a lot of "Bad machine code" messages when compiling a certain source file in this program: http://beefy1.isc.freebsd.org/data/head-i386-default/2014-12-12_10h09m09s/logs/inkscape-0.48.5_1.log The errors are as follows: *** Bad machine code: MBB exits via conditional branch/fall-through but the CFG successors don't match the actual successors! *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#6 if.end3 (0x2d17a3f0) *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#18 (null) (0x2d29287c) *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#19 (null) (0x2d2928d8) *** Bad machine code: MBB exits via conditional branch/fall-through but the CFG successors don't match the actual successors! *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#9 if.end.i34 (0x2d1d3e38) *** Bad machine code: Using an undefined physical register *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#10 if.else.i43 (0x2d17a4a8) - instruction: FNSTSW16r %AX<imp-def>, %FPSW<imp-use,kill> - operand 1: %FPSW<imp-use,kill> This crash occurs both with clang 3.4.1 (current in FreeBSD 9.x through 11.x), and 3.5.0 (which is coming up in FreeBSD 11.x). When I tried with clang trunk r223525, it spun at 100% CPU, until I killed it after 10 minutes or so (but it didn't use a lot of memory). I have reduced the test case to the attached file, which can be compiled with: clang -cc1 -triple i386-unknown-linux -S -mrelocation-model static -target-cpu i486 -O2 inkscape-geom-reduced.cpp resulting in: *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#9 (null) (0x2add140c) *** Bad machine code: MBB exits via conditional branch/fall-through but the CFG successors don't match the actual successors! *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#2 if.end.i (0x2add7488) *** Bad machine code: Using an undefined physical register *** - function: _ZN4Geom23line_segment_intersectpERKNS_5PointES2_S2_S2_ - basic block: BB#3 if.end10.i (0x2adb7334) - instruction: FNSTSW16r %AX<imp-def>, %FPSW<imp-use> - operand 1: %FPSW<imp-use> fatal error: error in backend: Found 3 machine code errors. These errors show for -target-cpu values of i486, i586 and i686, but as far as I can determine, they disappear when using pentium2 or 'higher'.
compiles fine for me on clang version 3.6.0 (trunk 224110) (llvm/trunk 224168) Try updating and reopen if you can still reproduce.
FWIW, this bug affects clang 3.4, 3.4.1, 3.4.2 and 3.5.0. After bisection, it seems to be fixed by r222008, but that is very hard to apply to older releases. So I guess release users are SOL for now...
I got very similar crash report for the fontforge port [1], and this results in a crash, even with trunk r224516. Note that this only happens with assertions enabled! The resulting message is very similar, after BB dumps, it gives: *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: SplineIsLinear - basic block: BB#152 land.end.i573 (0x2b5254f8) *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: SplineIsLinear - basic block: BB#156 land.end11.i579 (0x2b525644) *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: SplineIsLinear - basic block: BB#159 if.then17.i588 (0x2b525790) *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: SplineIsLinear - basic block: BB#160 if.else19.i593 (0x2b5257ec) *** Bad machine code: MBB exits via unconditional fall-through but its successor differs from its CFG successor! *** - function: SplineIsLinear - basic block: BB#162 if.then27.i600 (0x2b5258a4) *** Bad machine code: Using an undefined physical register *** - function: SplineIsLinear - basic block: BB#163 if.else32.i605 (0x2b525900) - instruction: FNSTSW16r %AX<imp-def>, %FPSW<imp-use,kill> - operand 1: %FPSW<imp-use,kill> fatal error: error in backend: Found 6 machine code errors. Specifically, the "FNSTSW16r %AX<imp-def>, %FPSW<imp-use,kill>" is precisely the same as in the Inkscape case. I think this must be some sort of bug in the (x87) floating point optimization. I will add a reduced test case after this comment. [1] http://pb2.nyi.freebsd.org/data/head-i386-PR195480-default/2014-12-15_22h55m50s/logs/errors/fontforge-20120731.b_7.log
Created attachment 13571 [details] Test case, minimized from FontForge's splineutil2.c This is a test case, reduced from FontForge's splineutil2.c. Compile this with trunk (with asserts enabled), as follows: clang -cc1 -triple i386-unknown-linux -c -O2 fontforge-splineutil2-reduced.c Any other i386 triple is also fine, e.g. 'i386-unknown-freebsd', or even just 'i386'. Also, in contrast to the previous test case, this does not even need -target-cpu i486.
This still crashes with trunk r225622, and clang 3.6.0.
reduced ir: target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple = "i386-unknown-linux" declare void @f() define i1 @SplineIsLinear(double %a, float %e) { %tobool = fcmp une float undef, 0.000000e+00 br i1 %tobool, label %if.then, label %if.end if.then: ; preds = %0 call void @f() br label %if.end if.end: ; preds = %if.then, %0 %sub = fsub float -0.000000e+00, %e %conv = fpext float %sub to double %mul.i = fmul double %a, 0.000000e+00 %tobool.i = fcmp une double %mul.i, 0.000000e+00 br i1 %tobool.i, label %Within16RoundingErrors.exit, label %if.end.i if.end.i: ; preds = %if.end br i1 %tobool, label %if.then2.i, label %if.end4.i if.then2.i: ; preds = %if.end.i %conv3.i = fptosi float %e to i32 br label %Within16RoundingErrors.exit if.end4.i: ; preds = %if.end.i %sub.i = fsub double 0.000000e+00, %a %cmp5.i = fcmp ogt double %sub.i, 1.000000e+00 %conv6.i = zext i1 %cmp5.i to i32 br label %Within16RoundingErrors.exit Within16RoundingErrors.exit: ; preds = %if.end4.i, %if.then2.i, %if.end %retval.0.i = phi i32 [ %conv3.i, %if.then2.i ], [ %conv6.i, %if.end4.i ], [ undef, %if.end ] %tobool2 = icmp eq i32 %retval.0.i, 0 br i1 %tobool2, label %land.end, label %land.rhs land.rhs: ; preds = %Within16RoundingErrors.exit br i1 %tobool.i, label %if.then.i12, label %if.end.i14 if.then.i12: ; preds = %land.rhs %cmp.i10 = fcmp olt double %conv, 1.000000e-05 br label %land.end if.end.i14: ; preds = %land.rhs %tobool1.i13 = fcmp une float %e, -0.000000e+00 br i1 %tobool1.i13, label %land.end, label %if.end4.i20 if.end4.i20: ; preds = %if.end.i14 %sub.i17 = fsub double 0.000000e+00, %a %cmp5.i18 = fcmp ogt double %sub.i17, 1.000000e+00 br label %land.end land.end: ; preds = %if.end4.i20, %if.end.i14, %if.then.i12, %Within16RoundingErrors.exit %phi = phi i1 [ undef, %Within16RoundingErrors.exit ], [ %cmp.i10, %if.then.i12 ], [ %cmp5.i18, %if.end4.i20 ], [ undef, %if.end.i14 ] ret i1 %phi }
reduced further: target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple = "i386-unknown-linux" declare void @f() define i1 @SplineIsLinear(double %a, float %e) { %tobool = fcmp une float undef, 0.000000e+00 br i1 %tobool, label %if.then, label %if.end if.then: ; preds = %0 call void @f() br label %if.end if.end: ; preds = %if.then, %0 %sub = fsub float -0.000000e+00, %e %mul.i = fmul double %a, 0.000000e+00 %tobool.i = fcmp une double %mul.i, 0.000000e+00 br i1 %tobool.i, label %Within16RoundingErrors.exit, label %if.end.i if.end.i: ; preds = %if.end br i1 %tobool, label %if.then2.i, label %if.end4.i if.then2.i: ; preds = %if.end.i %1 = trunc i32 undef to i1 br label %Within16RoundingErrors.exit if.end4.i: ; preds = %if.end.i %sub.i = fsub double 0.000000e+00, %a %cmp5.i = fcmp ogt double %sub.i, 1.000000e+00 br label %Within16RoundingErrors.exit Within16RoundingErrors.exit: ; preds = %if.end4.i, %if.then2.i, %if.end %retval.0.ii = phi i1 [ undef, %if.then2.i ], [ %cmp5.i, %if.end4.i ], [ undef, %if.end ] %cmp.i10 = fcmp olt float %sub, 1.000000e+00 %tobool1.i13 = fcmp une float %e, -0.000000e+00 %brmerge1 = select i1 %tobool.i, i1 %retval.0.ii, i1 %tobool1.i13 %cmp5.i18 = fcmp ogt double %a, 1.000000e+00 %cmp.i10.cmp5.i18 = select i1 %brmerge1, i1 %cmp.i10, i1 %cmp5.i18 ret i1 %cmp.i10.cmp5.i18 }
Just for reference, still crashing with clang 3.7.0 release. :-(
Still crashes with trunk r252206 (as of 2015-11-07).
Created attachment 15262 [details] Don't consider unallocatable registers dead The error happens when BranchFolding's SplitMBBAt eventually calls RegisterScavanger's forward(). RegisterScavenger seems to duplicate some of MachineVerify's dead register checking. By not considering unallocatable registers (like FPSW) to be "dead", the IR successfully lowered. Is this correct? I have no idea!
(In reply to comment #10) > Created attachment 15262 [details] > Don't consider unallocatable registers dead > > The error happens when BranchFolding's SplitMBBAt eventually calls > RegisterScavanger's forward(). RegisterScavenger seems to duplicate some of > MachineVerify's dead register checking. By not considering unallocatable > registers (like FPSW) to be "dead", the IR successfully lowered. Is this > correct? I have no idea! Send a patch to llvm-commits and CC somebody like Matthias Braun.
(In reply to comment #10) > Created attachment 15262 [details] > Don't consider unallocatable registers dead > > The error happens when BranchFolding's SplitMBBAt eventually calls > RegisterScavanger's forward(). RegisterScavenger seems to duplicate some of > MachineVerify's dead register checking. By not considering unallocatable > registers (like FPSW) to be "dead", the IR successfully lowered. Is this > correct? I have no idea! FWIW, this patch solves the crash, in any case.
The patch looks bogus to me: !Allocatable registers have different semsntic from reserved registers. !Allocatable registers are tracked perfectly fine by the register scavenger and should not need the change in this patch. Conversely !allocatable registers must have correct def/uses/live in annotations we should not disable machine verification for them. I looked into what is happening here, it looks like we have some invalid defs/uses of the FPSW register after register allocation. You can check this with -verify-machineinstr and see the error being present after greedy register allocation (unfortunately -verify-machineinstr is not enabled by default even in debug builds because of compiletime concerns). The bad MBB message then stem from the fact that RegisterScavenger::forward() detects the mismatched def/uses as well and upon discovering this problem immediately calls the machine verifier while the branch folder was in the process of changing basic blocks, hence all the MBB exits messages.
As for fixing this properly: The problem appears to be that a reload of an x87 value is inserted at a point in the program where the FPSW register is live (it lives between an UCOM and the corresponding FNSTSW to get the comparison result into a GPR register). Unfortunately the reload instruction looks like this: %vreg78<def> = LD_Fp032 %FPSW<imp-def,dead> which clobbers the FPSW value. The register allocator is not prepared for spill and reload instructions clobbering unrelated registers AFAIK. From reading the intel manual it seems that fld does indeed clobber the C0,C2,C3 bits set by fucom. The only proper fix I see right now, would be to introduce a pseudo instruction for the UCOM+FNSTSW pair which expanded after register allocation so no reload can be placed between the two instructions.
Some kind of pseudo is indeed the way to go. The register allocator does not expect additional definitions (either of virtual or physical registers) to sneak in while performing reload of store.
Attempting to revive this bug again. It still occurs with trunk r258168. I now also encountered it while building gmsh (see http://gmsh.info/ ), where clang dies on its periodic.cpp file, with a similar error as described earlier in this bug. Reduced C++ test case: class voroMetal3D { void correspondance(double, double, double, double, int, bool &, double, double, double); bool equal(double, double, double); }; int a; void voroMetal3D::correspondance(double p1, double p2, double p3, double p4, int p5, bool &p6, double p7, double p8, double p9) { if (p5 == 0) p6 = 0; if (p5 == 1 && equal(p1, 0, p4) && equal(p2, p8, p4) && equal(p3, p9, p4)) p6 = 1; if (p5 == 6) p6 = 0; if (p5 == 7 && equal(p1, p7, p4) && equal(p2, p8, p4) && equal(p3, p9, p4)) p6 = 1; } bool voroMetal3D::equal(double p1, double p2, double p3) { if (p3 && p1 < p2) a = 1; else a = 0; return a; } Compiling this with "clang -cc1 -triple i386 -emit-obj -O2 testcase.cpp" will result in
Attempting to revive this bug again. It still occurs with trunk r258168. I now also encountered it while building gmsh (see http://gmsh.info/ ), where clang dies on its periodic.cpp file, with a similar error as described earlier in this bug. Reduced C++ test case: class voroMetal3D { void correspondance(double, double, double, double, int, bool &, double, double, double); bool equal(double, double, double); }; int a; void voroMetal3D::correspondance(double p1, double p2, double p3, double p4, int p5, bool &p6, double p7, double p8, double p9) { if (p5 == 0) p6 = 0; if (p5 == 1 && equal(p1, 0, p4) && equal(p2, p8, p4) && equal(p3, p9, p4)) p6 = 1; if (p5 == 6) p6 = 0; if (p5 == 7 && equal(p1, p7, p4) && equal(p2, p8, p4) && equal(p3, p9, p4)) p6 = 1; } bool voroMetal3D::equal(double p1, double p2, double p3) { if (p3 && p1 < p2) a = 1; else a = 0; return a; } Compiling this with "clang -cc1 -triple i386 -emit-obj -O2 testcase.cpp" will result in a machine code dump, followed by: *** Bad machine code: MBB exits via conditional branch/fall-through but the CFG successors don't match the actual successors! *** - function: _ZN11voroMetal3D14correspondanceEddddiRbddd - basic block: BB#2 land.lhs.true (0x2be19358) *** Bad machine code: Using an undefined physical register *** - function: _ZN11voroMetal3D14correspondanceEddddiRbddd - basic block: BB#5 land.lhs.true13 (0x2be19524) - instruction: FNSTSW16r - operand 1: %FPSW<imp-use,kill> *** Bad machine code: Using an undefined physical register *** - function: _ZN11voroMetal3D14correspondanceEddddiRbddd - basic block: BB#11 land.lhs.true13 (0x2be193b4) - instruction: JNE_1 - operand 1: %EFLAGS<imp-use> fatal error: error in backend: Found 3 machine code errors.
Can somebody please reduce the test case in comment 17 to .ll form again? I never understand how to use bugpoint to do this. :-)
(In reply to comment #18) > Can somebody please reduce the test case in comment 17 to .ll form again? I > never understand how to use bugpoint to do this. :-) This is how: PATH="$HOME/llvm/Debug+Asserts/bin:$PATH" ~/llvm/Debug+Asserts/bin/bugpoint PR21903.ll -llc-safe After some manual reduction, I am left with: target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple = "i686--linux-gnu" define void @f(double %p1, double %p2, double %p3, double %p4, i32 %p5, i8* %p6, double %p7, double %p8, double %p9) align 2 { entry: %cmp.i54 = fcmp olt double %p2, %p8 %tobool.i45 = fcmp une double %p4, 0.000000e+00 switch i32 %p5, label %if.end20 [ i32 0, label %if.then10 i32 1, label %land.lhs.true i32 2, label %if.then10 i32 4, label %land.lhs.true13 ] land.lhs.true: ; preds = %entry br i1 undef, label %land.lhs.true3, label %if.end20 land.lhs.true3: ; preds = %land.lhs.true15, %land.lhs.true br i1 %cmp.i54, label %land.lhs.true5, label %if.end20 land.lhs.true5: ; preds = %land.lhs.true3 %cmp.i50 = fcmp olt double %p3, %p9 br i1 %cmp.i50, label %if.then10, label %if.end20 if.then10: ; preds = %land.lhs.true5, %entry, %entry store i8 1, i8* %p6, align 1 br label %if.end20 land.lhs.true13: ; preds = %entry %cmp.i46 = fcmp olt double %p1, %p7 %or.cond.i47 = and i1 %tobool.i45, %cmp.i46 %storemerge.i48 = zext i1 %or.cond.i47 to i32 br i1 %or.cond.i47, label %land.lhs.true15, label %if.end20 land.lhs.true15: ; preds = %land.lhs.true13 %storemerge.i44 = zext i1 %cmp.i54 to i32 br label %land.lhs.true3 if.end20: ; preds = %land.lhs.true13, %if.then10, %land.lhs.true5, %land.lhs.true3, %land.lhs.true, %entry ret void }
We know what the bug is, see my previous comment. However at least for me this is low priority because: It only affects machines before i686 (it's easy to work around this with -march=i686 or similar) and the fix will probably take some time implementing and testing properly. If someone else wants to take a stab at fixing it?
Also the error symptom here is somewhat generic, it mainly means that there is some error in the internal representation of the function, it's not necessarily the same bug. The 2nd case you posted appears to be happening for i686 as well, so there's a high chance this is a different underlying bug which should be put into an own ticket.
Interestingly, r192635 ("Fix the ExecutionDepsFix pass to handle AVX instructions") appears to introduce the failure for the fontforge-splineutil2-reduced.c test case. E.g r192634 compiles it without complaints, r192635 results in the bad machine code dump. I'm unsure whether it really 'causes' it, or if it is just a side effect, or even a red herring. :)
r275201 (BranchFolding: Use LivePhysReg to update live in lists) fixes this.
I didn't see any work addressing the underlying problem (x87 reloads cannot be placed everywhere because they (possibly) clobber FPSW status flags). So this not failing anymore is probably luck...
And indeed, the fontforge-splineutil2-reduced.c testcase still fails with trunk r278432: ~/obj/llvm-trunk-278432/bin/clang -cc1 -triple i386 -S -target-cpu i686 -O2 -mllvm -verify-machineinstrs=1 fontforge-splineutil2-reduced.c [...lots of BB dumping...] *** Bad machine code: Using an undefined physical register *** - function: SplineIsLinear - basic block: BB#6 if.end4.i (0x2c2114cc) [1248B;1488B) - instruction: 1312B FNSTSW16r - operand 1: %FPSW<imp-use,kill> fatal error: error in backend: Found 1 machine code errors. So this problem is definitely not fixed.
This bug was finally fixed (after a few years! :) with https://reviews.llvm.org/rL353536 ("[X86] Remove isReMaterializable from X87 floating point constant loads and constant pool loads").