-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SelectionDAG.cpp:900: bool llvm::SelectionDAG::RemoveNodeFromCSEMaps(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "DELETED_NODE in CSEMap!"' failed. with llc -O1 #46861
Comments
That one is on the 11.x branch, so let's treat this as a blocker. QingShan: can you take a look? |
Here we called |
Sent patch https://reviews.llvm.org/D87614 to fix this. |
That landed in 2508ef0. I've merged it to 11.x in 88e17a8. Thanks! |
Thanks! |
Reopening as https://reviews.llvm.org/D87614#2273931 suggests there are still problems. I'm not familiar with the code, but it sounds like we're now trying to fix the fixes... Is it possible to fix these problems in some more principled way, or can we revert back to green? |
I reverted the commit and plan to send a new revision.. I thought it was in urgent and committed without additional sanitize check. Sorry for the back-and-forth. |
Thanks. Cherry-picked the revert to 11.x as 22dab21. |
Hmm, we do some big refactor for getNegatedExpression and this is the last step to remove unused DAG Node we created to save compiling time. Though there are several patches but in fact, they are all doing the same thing. We miss the fact that, two DAG nodes can be the same no matter where it comes from. I think, we have clean it now. |
I see the latest patch is here: https://reviews.llvm.org/D87698 |
That landed and has been cherry-picked to 11.x as b78e5de. Please let me know if there are any follow-ups. |
Hi! Unfortunately we found a new problem related to this in our downstream testing. Reproduce with: Result: This starts happening wiith the fix https://reviews.llvm.org/D87698 submitted as a2fb544. If I revert a2fb544 this crash goes away. I haven't tried this out on 11.x, only on trunk. |
Qiu: sounds like there are still problems here, despite multiple fixes, and it's holding up the release. Can you please take a look? |
In fact, can we revert back to the behavior of 10.0.0? Given the number of fix attempts so far, I'm not confident that just adding another patch right before the release will do the trick here. |
Mikael, can you comment on the nature of the source code and how likely users are to run into this bug? Given the state of this problem, we may have to release with this as a known issue. |
Also could someone comment on how severe this assert is. For example, is the bug only that we're trying to remove an already deleted node from this map (which seems benign in itself) or could it lead to miscompilation when asserts are disabled? |
As far as I know we have only seen this in some generated tests involving loops with _Complex float. The latest example looks like _Complex float a; void f(void) and then we ran I'm not sure what is really required for it to happen. |
Hi, in patch D87698, will it resolve the crash if only revert the |
Yes if I just remove that line the crash seems to go away for the new reproducer. I haven't done any more testing so I've no idea if it comes with any negative effects. |
Qui, do you know the answer to this question? |
As I tested, the program still crashes due to memory error on a non-assert build. At the first look, the reason is that a constant node (combined from a fmul) is deleted but still referenced in BTW, removing the |
Okay, that sounds pretty bad.
I thought the RemoveDeadNode calls were added to fix llvm/llvm-bugzilla-archive#46877 which reports an infinite loop in DAGCombiner, not just slow compiles? (compile time "explosion" also doesn't sound good..) |
I've been trying to trace the history of these patches: The latest patch causing a problem is D87698 That is a re-land of D87614 but with a fix "in rare cases the node to be removed is the same as result of getNode. We missed that" D87614 was in turn a fix of D86183 for #46861 D86183 was a fix for llvm/llvm-bugzilla-archive#46877 which was caused by D77319 (which was already reverted once) It seems D77319 is another in a line of fixes: "We are struggling to fix such issues on D75501, D70975, D76638, D76439 etc"
|
It was exponentially increasing compile-time after closer inspection, but it went effectively-infinite-loop pretty quickly.
I don't have the current failure root caused, but so far I haven't gotten it to fail on x86 or aarch64, so there may be some target-specific behavior in play. Backing out the whole series of patches would probably be worse than taking 1 more patch since we were fixing a crash to begin with. But I can't say yet whether this is something we would hit in typical code yet. Killing a RemoveDeadNode() call seems like a relatively safe change, but our track record on this series of patches is poor as noted. cc'ing Craig for another opinion. I'll keep trying to reduce/isolate the test. |
In this particular case a target using AA during DAGCombin/Isel is needed. Our out of tree target does, and PPC too. When I forced useAA==true on x86 it crashed there too. |
Thanks for looking into it! |
The cause should be: We're negating Here it's some coincidence that NegX is equal to NegY. NegX was kept since the code did not go through the part in |
I have it reduced to this to crash on x86 (but ironically not with the original PPC target): define float @f(float* %p) { So to trigger the crash, we require fast-math (at least 'reassoc') and some constant folding (the load/store of 0.0 and subsequent fmul) that escaped IR optimization. That constant folding has to follow a very narrow path - even commuting the operands in some of those instructions avoids the bug. I don't know how to assess the risk of that scenario. I want to believe it's not worth delaying the 11.0 release again, but a fix should definitely go into any later builds since it's a compiler crash. |
I improved SDAG's constant folding: It's still a judgement call as to whether we should delay the release for my fix or a change in getNegatedExpression(). |
Thank you, Sanjay! Although we eliminated the failure, I'm afraid there might be cases similar to this but negated results are not constants? (NegX is equal to NegY, NegX was used but NegY was deleted) |
We've found another crash relasted to the "RemoveDeadNode(CFP)" part. Unfortunately I haven't been able to reproduce it on PPC or X86. Roughly what happens is that in TargetLowering::getNegatedExpression Op is t36: f32 = ConstantFP<4.218750e+08> and
returns a the same node so Op == CFP and then we do
which will remove Op. But the caller of getNegatedExpression has a handle to Op which then is NegX: t36: f32 = <<Deleted Node!>> and it will use it to build a new fsub node, which will reference a deleted node: Creating new node: t37: f32 = fsub nnan ninf nsz arcp contract afn reassoc t37, ConstantFP:f32<-4.218750e+08> The new x * 0.0 fold doesn't help in this case. |
(Bah, forgot to add the last part) Operand not processed? UNREACHABLE executed at ../lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:485! |
This looks like the node is its own operand? t37: f32 = fsub nnan ninf nsz arcp contract afn reassoc t37, ConstantFP:f32<-4.218750e+08> |
Yes I think it's due to the input actually being a DELETED NODE. |
Ok, now I've managed to reproduce the latest crash on X86 as well: llc -O0 -o - bbi-47837_x86.ll |
Ah, so it's the same issue as comment #28. I think I need to revert the Remove DeadNode(CFP) part since it's actually not necessary for the original bug of thi s. |
Mikael, how bad would it be for your group if these issues are not fixed in LLVM 11.0.0, but rather 11.0.1?
Please go ahead an do that, and include the test cases from this bug so we don't regress. |
It's ok for us. |
Reduced to crash with default settings (optimizations on) for most targets: define float @f(float %x, float %y) { |
Sure. See rGb326d4ff.
Thanks! |
Thanks. Pushed to 11.x as 121baba. It would be nice if this code were more reliable, but closing this bug for now since the currently known problems should be fixed now. |
Extended Description
llc -O1 -o - bbi-47255.ll
results in
llc: ../lib/CodeGen/SelectionDAG/SelectionDAG.cpp:900: bool llvm::SelectionDAG::RemoveNodeFromCSEMaps(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "DELETED_NODE in CSEMap!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: build-all/bin/llc -O1 -o - bbi-47255.ll
#0 0x00000000025d2c84 PrintStackTraceSignalHandler(void*) (build-all/bin/llc+0x25d2c84)
#1 0x00000000025d07fe llvm::sys::RunSignalHandlers() (build-all/bin/llc+0x25d07fe)
#2 0x00000000025d2f9c SignalHandler(int) (build-all/bin/llc+0x25d2f9c)
#3 0x00007f7f4f6c88a0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x128a0)
#4 0x00007f7f4e171f47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
#5 0x00007f7f4e1738b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0
#6 0x00007f7f4e16342a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0
#7 0x00007f7f4e1634a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2)
#8 0x00000000023be964 llvm::SelectionDAG::RemoveNodeFromCSEMaps(llvm::SDNode*) (build-all/bin/llc+0x23be964)
#9 0x00000000023bec4f llvm::SelectionDAG::DeleteNode(llvm::SDNode*) (build-all/bin/llc+0x23bec4f)
#10 0x000000000226da96 (anonymous namespace)::DAGCombiner::recursivelyDeleteUnusedNodes(llvm::SDNode*) (build-all/bin/llc+0x226da96)
#11 0x000000000226e990 llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) (build-all/bin/llc+0x226e990)
#12 0x0000000002416372 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (build-all/bin/llc+0x2416372)
#13 0x0000000002415392 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (build-all/bin/llc+0x2415392)
#14 0x00000000024119ce llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x24119ce)
#15 0x00000000012f6da5 (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x12f6da5)
#16 0x0000000001abf979 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (build-all/bin/llc+0x1abf979)
#17 0x0000000001eb83bf llvm::FPPassManager::runOnFunction(llvm::Function&) (build-all/bin/llc+0x1eb83bf)
#18 0x0000000001ebec58 llvm::FPPassManager::runOnModule(llvm::Module&) (build-all/bin/llc+0x1ebec58)
#19 0x0000000001eb898d llvm::legacy::PassManagerImpl::run(llvm::Module&) (build-all/bin/llc+0x1eb898d)
#20 0x00000000006c306d main (build-all/bin/llc+0x6c306d)
#21 0x00007f7f4e154b97 __libc_start_main /build/glibc-2ORdQG/glibc-2.27/csu/../csu/libc-start.c:344:0
#22 0x00000000006c09aa _start (build-all/bin/llc+0x6c09aa)
Abort (core dumped)
This is new, it starts crashing with deb4b25:
The text was updated successfully, but these errors were encountered: