15840 – gstreamer causes assert "Wrong topological sorting", function InitDAGTopologicalSorting

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 15840 - gstreamer causes assert "Wrong topological sorting", function InitDAGTopologicalSorting

Summary: gstreamer causes assert "Wrong topological sorting", function InitDAGTopologi...

Status:	RESOLVED FIXED

Alias:	None

Product:	new-bugs
Classification:	Unclassified
Component:	new bugs (show other bugs)
Version:	trunk
Hardware:	PC FreeBSD

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-04-24 17:06 PDT by Dimitry Andric
Modified:	2013-09-22 16:52 PDT (History)
CC List:	8 users (show)

See Also:
Fixed By Commit(s):

Attachments
Testcase for InitDAGTopologicalSorting assertion failure (802 bytes, application/octet-stream) 2013-04-24 17:06 PDT, Dimitry Andric	Details
fix for assertion failure (1.82 KB, patch) 2013-08-09 16:55 PDT, Tijl Coosemans	Details
proposed fix (4.69 KB, patch) 2013-08-12 09:17 PDT, Tijl Coosemans	Details
Proposed fix, updated for ToT (4.71 KB, patch) 2013-08-12 16:20 PDT, Dimitry Andric	Details
Proposed fix, updated for ToT, with testcase (5.88 KB, patch) 2013-08-13 07:12 PDT, Dimitry Andric	Details
Show Obsolete (2) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dimitry Andric 2013-04-24 17:06:36 PDT

Created attachment 10423 [details]
Testcase for InitDAGTopologicalSorting assertion failure

Reduced testcase attached.  Compiling with "-target i386-unknown-freebsd10 -O2 -c gstatomicqueue-testcase.c" results in:

Assertion failed: (Node2Index[SU->NodeNum] > Node2Index[I->getSUnit()->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /share/dim/src/llvm/trunk/lib/CodeGen/ScheduleDAG.cpp, line 511.
Stack dump:
0.      Program arguments: /home/dim/obj/llvm-180108M-trunk-freebsd10-i386-aconf-rel-1/Release+Asserts/bin/clang -cc1 -triple i386-unknown-freebsd10 -emit-obj -disable-free -main-file-name gstatomicqueue-testcase.c -mrelocation-model static -mdisable-fp-elim -masm-verbose -mconstructor-aliases -target-cpu i486 -target-linker-version 2.23.1 -coverage-file /share/dim/src/llvm/bugs/gstreamer/gstatomicqueue-testcase.o -resource-dir /home/dim/obj/llvm-180108M-trunk-freebsd10-i386-aconf-rel-1/Release+Asserts/bin/../lib/clang/3.3 -O2 -fdebug-compilation-dir /share/dim/src/llvm/bugs/gstreamer -ferror-limit 19 -fmessage-length 278 -mstackrealign -fobjc-runtime=gnustep -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops -o gstatomicqueue-testcase.o -x c gstatomicqueue-testcase.c
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'gstatomicqueue-testcase.c'.
4.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@gst_atomic_queue_push'
clang: error: unable to execute command: Abort trap (core dumped)
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 3.3 (trunk 180108)
Target: i386-unknown-freebsd10
Thread model: posix

Comment 1 Benjamin Kramer 2013-05-03 14:13:38 PDT

Reduced test case, compile with llc -march=x86 -mcpu=i486 < t.ll:

define void @gst_atomic_queue_push() {
entry:
  br label %while.body

while.body:
  %0 = load volatile i32* undef, align 4
  fence seq_cst
  %1 = load volatile i32* undef, align 4
  %cmp = icmp sgt i32 %1, %0
  br i1 %cmp, label %while.body, label %if.then

if.then:
  ret void
}

Comment 2 Tijl Coosemans 2013-08-09 16:55:40 PDT

Created attachment 11013 [details]
fix for assertion failure

The attached patch changes the WorkList in InitDAGTopologicalSorting into a FIFO queue. Items that are added earlier to the list need to get a higher index than later items because they may be successors of the later items. This fixes the assertion failure, but does not fix the bug yet. The bug does not appear with -msse2. In that case the mfence instruction is used. So something seems to go wrong with the -mno-sse2 fence (lock orl $0,(%esp)).

Comment 3 Tijl Coosemans 2013-08-12 09:17:41 PDT

Created attachment 11024 [details]
proposed fix

This patch fixes the bug for me. It creates a new X86ISD::FENCE node and lowers ISD::ATOMIC_FENCE to it instead of to X86::OR32mrLocked. Then during instruction selection X86ISD::FENCE is replaced by X86::LOCK_OR32mi8.

Comment 4 Tijl Coosemans 2013-08-12 09:20:44 PDT

The patch to fix the assertion failure turns not to be necessary but it's still a valid patch I think.

Comment 5 Dimitry Andric 2013-08-12 16:20:04 PDT

Created attachment 11025 [details]
Proposed fix, updated for ToT

As-is, the proposed fix doesn't compile with top-of-trunk:

llvm[3]: Compiling X86ISelDAGToDAG.cpp for Release+Asserts build
lib/Target/X86/X86ISelDAGToDAG.cpp:1867:18: error: no matching member function for call to 'getMachineNode'
  return CurDAG->getMachineNode(X86::LOCK_OR32mi8, dl, MVT::Other, Ops);
         ~~~~~~~~^~~~~~~~~~~~~~
include/llvm/CodeGen/SelectionDAG.h:824:18: note: candidate function not viable: no known conversion from 'llvm::DebugLoc' to 'llvm::SDLoc' for 2nd argument
  MachineSDNode *getMachineNode(unsigned Opcode, SDLoc dl, EVT VT,
                 ^

and so on for all the possible overloads.  Apparently llvm:DebugLoc has been replaced with llvm::SDLoc after the 3.3 release.

I have updated the patch to compile with top-of-trunk.  Now the only thing missing is a good testcase...

And I agree, your earlier patch replacing vector with queue should also go in, as a separate commit of course.

Comment 6 Dimitry Andric 2013-08-12 17:03:59 PDT

For some reason, this fix also changes the behavior of the codegen a little, since it makes test/CodeGen/X86/2012-01-16-mfence-nosse-flags.ll fail.  This is because instead of the expected "cmpl $0, %eax", it now generates "testl %eax, %eax".  I have no idea why yet.  If it is correct, that other test must also be updated.

Comment 7 Tijl Coosemans 2013-08-13 05:47:33 PDT

(In reply to comment #6)
> For some reason, this fix also changes the behavior of the codegen a little,
> since it makes test/CodeGen/X86/2012-01-16-mfence-nosse-flags.ll fail.  This
> is because instead of the expected "cmpl $0, %eax", it now generates "testl
> %eax, %eax".  I have no idea why yet.  If it is correct, that other test
> must also be updated.

I'm not sufficiently familiar with llvm internals to know if this is a correct explanation but I suspect the reason is that the fence used to be "xorl r,r \ lock orl r,(%esp)" whereas now it is "lock orl $O,(%esp)" (in the old code there was getConstant(0) and now it's getTargetConstant(0)).

In the test case the comparison comes before the fence which means initially llvm expected to be able to reuse the zero for the fence in the comparison (cmpl r,%eax) and this isn't optimised into a testl. Later the comparison is rescheduled and llvm doesn't know r isn't clobbered so it becomes cmpl $0,%eax. With the patch the fence doesn't require a zero so initially there's cmpl $0,%eax which is optimised into a testl that is then rescheduled.

First I tried to keep the getConstant(0), as an operand to X86ISD::FENCE and then instruction selection would choose between LOCK_OR32mi8 and LOCK_OR32mr based on whether the zero was used only once or not. This basically worked but I couldn't figure out what to put in the .td file for X86ISD::FENCE with an operand and I couldn't get it to optimise {__sync_synchronize();return 0;} which had two "xorl %eax,%eax" (llvm doesn't seem to know the src operand in LOCK_OR isn't clobbered).

As for a test case, wouldn't Benjamin Kramer's test above be good enough?

Comment 8 Dimitry Andric 2013-08-13 07:12:34 PDT

Created attachment 11028 [details]
Proposed fix, updated for ToT, with testcase

Updated the fix with a testcase specific for this PR, taken from comment 1.

I also had to update the testcase for bug 11768, which now generates more efficient code, e.g. before the fix it generated:

        movl    ptr, %eax
        xorl    %ecx, %ecx
        lock
        orl     %ecx, (%esp)
        cmpl    $0, %eax
        je      .LBB0_1

after the fix it generates:

        movl    ptr, %eax
        lock
        orl     $0, (%esp)
        testl   %eax, %eax
        je      .LBB0_1

Comment 9 Tim Northover 2013-09-06 11:02:16 PDT

I've done some digging on exactly why it's failing, so I'll record my discoveries in case I forget over the weekend. The issue is that SelectionDAG comes across:

    A = Node(load volatile)
    MachineNode(ORfence)
    B Node(load volatile)
    Node(sub A, B)

in chain order. When deciding whether it can merge the sub and the "A load", as is x86's wont, it tries to check if combining those chains would create a cycle.

Unfortunately it stops looking when it reaches a MachineNode, thinking that everything beyond that point must be selected and hence OK (SelectionDAG.cpp:1863). That is an incorrect assumption in this case and a loopy SUB32rm is emitted.

So the patch works by making sure there isn't a pre-selected MachineNode to find during this check. It would be nice to assume that MachineNodes only exist after selection, but if so, we've got quite a bit of refactoring in multiple backends to get there.

I'll see if I can work out what the performance implications of doing a more thorough search here are.

Comment 10 Dimitry Andric 2013-09-22 16:52:00 PDT

Fixed in r191165. Thanks Tim!