Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gstreamer causes assert "Wrong topological sorting", function InitDAGTopologicalSorting #16212

Closed
DimitryAndric opened this issue Apr 25, 2013 · 11 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@DimitryAndric
Copy link
Collaborator

Bugzilla Link 15840
Resolution FIXED
Resolved on Sep 22, 2013 16:52
Version trunk
OS FreeBSD
Attachments Testcase for InitDAGTopologicalSorting assertion failure
CC @d0k,@rakuco,@TNorthover

Extended Description

Reduced testcase attached. Compiling with "-target i386-unknown-freebsd10 -O2 -c gstatomicqueue-testcase.c" results in:

Assertion failed: (Node2Index[SU->NodeNum] > Node2Index[I->getSUnit()->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /share/dim/src/llvm/trunk/lib/CodeGen/ScheduleDAG.cpp, line 511.
Stack dump:
0. Program arguments: /home/dim/obj/llvm-180108M-trunk-freebsd10-i386-aconf-rel-1/Release+Asserts/bin/clang -cc1 -triple i386-unknown-freebsd10 -emit-obj -disable-free -main-file-name gstatomicqueue-testcase.c -mrelocation-model static -mdisable-fp-elim -masm-verbose -mconstructor-aliases -target-cpu i486 -target-linker-version 2.23.1 -coverage-file /share/dim/src/llvm/bugs/gstreamer/gstatomicqueue-testcase.o -resource-dir /home/dim/obj/llvm-180108M-trunk-freebsd10-i386-aconf-rel-1/Release+Asserts/bin/../lib/clang/3.3 -O2 -fdebug-compilation-dir /share/dim/src/llvm/bugs/gstreamer -ferror-limit 19 -fmessage-length 278 -mstackrealign -fobjc-runtime=gnustep -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops -o gstatomicqueue-testcase.o -x c gstatomicqueue-testcase.c

  1.  <eof> parser at end of file
    
  2.  Code generation
    
  3.  Running pass 'Function Pass Manager' on module 'gstatomicqueue-testcase.c'.
    
  4.  Running pass 'X86 DAG->DAG Instruction Selection' on function '@gst_atomic_queue_push'
    

clang: error: unable to execute command: Abort trap (core dumped)
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 3.3 (trunk 180108)
Target: i386-unknown-freebsd10
Thread model: posix

@d0k
Copy link
Member

d0k commented May 3, 2013

Reduced test case, compile with llc -march=x86 -mcpu=i486 < t.ll:

define void @​gst_atomic_queue_push() {
entry:
br label %while.body

while.body:
%0 = load volatile i32* undef, align 4
fence seq_cst
%1 = load volatile i32* undef, align 4
%cmp = icmp sgt i32 %1, %0
br i1 %cmp, label %while.body, label %if.then

if.then:
ret void
}

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 9, 2013

fix for assertion failure
The attached patch changes the WorkList in InitDAGTopologicalSorting into a FIFO queue. Items that are added earlier to the list need to get a higher index than later items because they may be successors of the later items. This fixes the assertion failure, but does not fix the bug yet. The bug does not appear with -msse2. In that case the mfence instruction is used. So something seems to go wrong with the -mno-sse2 fence (lock orl $0,(%esp)).

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 12, 2013

proposed fix
This patch fixes the bug for me. It creates a new X86ISD::FENCE node and lowers ISD::ATOMIC_FENCE to it instead of to X86::OR32mrLocked. Then during instruction selection X86ISD::FENCE is replaced by X86::LOCK_OR32mi8.

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 12, 2013

The patch to fix the assertion failure turns not to be necessary but it's still a valid patch I think.

@DimitryAndric
Copy link
Collaborator Author

Proposed fix, updated for ToT
As-is, the proposed fix doesn't compile with top-of-trunk:

llvm[3]: Compiling X86ISelDAGToDAG.cpp for Release+Asserts build
lib/Target/X86/X86ISelDAGToDAG.cpp:1867:18: error: no matching member function for call to 'getMachineNode'
return CurDAG->getMachineNode(X86::LOCK_OR32mi8, dl, MVT::Other, Ops);
~~~~~~~~^~~~~~~~~~~~~~
include/llvm/CodeGen/SelectionDAG.h:824:18: note: candidate function not viable: no known conversion from 'llvm::DebugLoc' to 'llvm::SDLoc' for 2nd argument
MachineSDNode *getMachineNode(unsigned Opcode, SDLoc dl, EVT VT,
^

and so on for all the possible overloads. Apparently llvm:DebugLoc has been replaced with llvm::SDLoc after the 3.3 release.

I have updated the patch to compile with top-of-trunk. Now the only thing missing is a good testcase...

And I agree, your earlier patch replacing vector with queue should also go in, as a separate commit of course.

@DimitryAndric
Copy link
Collaborator Author

For some reason, this fix also changes the behavior of the codegen a little, since it makes test/CodeGen/X86/2012-01-16-mfence-nosse-flags.ll fail. This is because instead of the expected "cmpl $0, %eax", it now generates "testl %eax, %eax". I have no idea why yet. If it is correct, that other test must also be updated.

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 13, 2013

For some reason, this fix also changes the behavior of the codegen a little,
since it makes test/CodeGen/X86/2012-01-16-mfence-nosse-flags.ll fail. This
is because instead of the expected "cmpl $0, %eax", it now generates "testl
%eax, %eax". I have no idea why yet. If it is correct, that other test
must also be updated.

I'm not sufficiently familiar with llvm internals to know if this is a correct explanation but I suspect the reason is that the fence used to be "xorl r,r \ lock orl r,(%esp)" whereas now it is "lock orl $O,(%esp)" (in the old code there was getConstant(0) and now it's getTargetConstant(0)).

In the test case the comparison comes before the fence which means initially llvm expected to be able to reuse the zero for the fence in the comparison (cmpl r,%eax) and this isn't optimised into a testl. Later the comparison is rescheduled and llvm doesn't know r isn't clobbered so it becomes cmpl $0,%eax. With the patch the fence doesn't require a zero so initially there's cmpl $0,%eax which is optimised into a testl that is then rescheduled.

First I tried to keep the getConstant(0), as an operand to X86ISD::FENCE and then instruction selection would choose between LOCK_OR32mi8 and LOCK_OR32mr based on whether the zero was used only once or not. This basically worked but I couldn't figure out what to put in the .td file for X86ISD::FENCE with an operand and I couldn't get it to optimise {__sync_synchronize();return 0;} which had two "xorl %eax,%eax" (llvm doesn't seem to know the src operand in LOCK_OR isn't clobbered).

As for a test case, wouldn't Benjamin Kramer's test above be good enough?

@DimitryAndric
Copy link
Collaborator Author

Proposed fix, updated for ToT, with testcase
Updated the fix with a testcase specific for this PR, taken from comment 1.

I also had to update the testcase for bug 11768, which now generates more efficient code, e.g. before the fix it generated:

    movl    ptr, %eax
    xorl    %ecx, %ecx
    lock
    orl     %ecx, (%esp)
    cmpl    $0, %eax
    je      .LBB0_1

after the fix it generates:

    movl    ptr, %eax
    lock
    orl     $0, (%esp)
    testl   %eax, %eax
    je      .LBB0_1

@TNorthover
Copy link
Contributor

I've done some digging on exactly why it's failing, so I'll record my discoveries in case I forget over the weekend. The issue is that SelectionDAG comes across:

A = Node(load volatile)
MachineNode(ORfence)
B Node(load volatile)
Node(sub A, B)

in chain order. When deciding whether it can merge the sub and the "A load", as is x86's wont, it tries to check if combining those chains would create a cycle.

Unfortunately it stops looking when it reaches a MachineNode, thinking that everything beyond that point must be selected and hence OK (SelectionDAG.cpp:1863). That is an incorrect assumption in this case and a loopy SUB32rm is emitted.

So the patch works by making sure there isn't a pre-selected MachineNode to find during this check. It would be nice to assume that MachineNodes only exist after selection, but if so, we've got quite a bit of refactoring in multiple backends to get there.

I'll see if I can work out what the performance implications of doing a more thorough search here are.

@DimitryAndric
Copy link
Collaborator Author

Fixed in r191165. Thanks Tim!

@DimitryAndric
Copy link
Collaborator Author

mentioned in issue llvm/llvm-bugzilla-archive#15981

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 4, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants