Created attachment 22946 [details] The IR module Precondition: LLVM 9.0.0 built by MSVC 2019. Steps to reproduce: 1) Load 'module.bc' to memory from the file. 2) Add module to LLLazyJIT. 3) Request address of function 'calculate'. 4) Call function 'calculate'. Expected result: The function should be executed fine. Actual result: The OrcJIT failed on an assertion Core.cpp:867: Assertion failed: I != Symbols.end() && "Symbol not found", file C:\Workspace\llvm-9.0.0.src\lib\ExecutionEngine\Orc\Core.cpp, line 867 Note: I figured out that the missed symbol is __real@40091eb851eb851f. It appears after IR code is compiled by llc to assembler. This issue is not observed on Linux and MinGW builds on Windows.
Created attachment 22947 [details] Binary IR module
Created attachment 22948 [details] The x86-64 assembler version of module
; ModuleID = 'module.bc' source_filename = "module.ll" @PI = local_unnamed_addr constant double 3.140000e+00 @cachedValue = private unnamed_addr global double 0.000000e+00 ; Function Attrs: norecurse nounwind writeonly define double @"calculate"(i32) local_unnamed_addr #0 { %2 = sitofp i32 %0 to double %3 = fmul double %2, 3.140000e+00 %4 = fmul double %3, %2 store double %4, double* @cachedValue, align 8 ret double %4 } ; Function Attrs: norecurse nounwind readonly define double @"getCachedValue"() local_unnamed_addr #1 { %1 = load double, double* @cachedValue, align 8 ret double %1 } attributes #0 = { norecurse nounwind writeonly } attributes #1 = { norecurse nounwind readonly }
.text .def @feat.00; .scl 3; .type 0; .endef .globl @feat.00 .set @feat.00, 0 .file "module.ll" .def calculate; .scl 2; .type 32; .endef .globl __real@40091eb851eb851f # -- Begin function calculate .section .rdata,"dr",discard,__real@40091eb851eb851f .p2align 3 __real@40091eb851eb851f: .quad 4614253070214989087 # double 3.1400000000000001 .text .globl calculate .p2align 4, 0x90 calculate: # @calculate # %bb.0: cvtsi2sdl %ecx, %xmm1 movsd __real@40091eb851eb851f(%rip), %xmm0 # xmm0 = mem[0],zero mulsd %xmm1, %xmm0 mulsd %xmm1, %xmm0 movsd %xmm0, .LcachedValue(%rip) retq # -- End function .def getCachedValue; .scl 2; .type 32; .endef .globl getCachedValue # -- Begin function getCachedValue .p2align 4, 0x90 getCachedValue: # @getCachedValue # %bb.0: movsd .LcachedValue(%rip), %xmm0 # xmm0 = mem[0],zero retq # -- End function .section .rdata,"dr" .globl PI # @PI .p2align 3 PI: .quad 4614253070214989087 # double 3.1400000000000001 .lcomm .LcachedValue,8,8 # @cachedValue
Created attachment 22955 [details] The patch which fixes the issue.
Hi Alexander, Thanks for looking in to this. The proposed patch is not the right fix: It weakens responsibility checking and dynamically injects new definitions. I'm trying to wrap my head around what has gone wrong so that I can figure out the correct fix. From the original assertion it looks like the assertion that fired was in a call to JITDylib::resolve that included a symbol that wasn't defined in the JITDylib, but I'm not sure how that could have happened: We should have hit the corresponding assertion in MaterializationResponsibility::notifyResolved before we got to JITDylib::resolve. Would you be able to run this under a debugger and obtain a backtrace for the original assertion? If this analysis is correct: That the symbol has somehow made it into the responsibility set for the MaterializationResponsibility, but is missing from the JITDylib, then we need to track down where this correspondence broke. They should move in lock-step.
Created attachment 22960 [details] The x86-64 assembler version of module (Linux)
.text .file "module.ll" .section .rodata.cst8,"aM",@progbits,8 .p2align 3 # -- Begin function calculate .LCPI0_0: .quad 4614253070214989087 # double 3.1400000000000001 .text .globl calculate .p2align 4, 0x90 .type calculate,@function calculate: # @calculate # %bb.0: cvtsi2sd %edi, %xmm1 movsd .LCPI0_0(%rip), %xmm0 # xmm0 = mem[0],zero mulsd %xmm1, %xmm0 mulsd %xmm1, %xmm0 movsd %xmm0, .LcachedValue(%rip) retq .Lfunc_end0: .size calculate, .Lfunc_end0-calculate # -- End function .globl getCachedValue # -- Begin function getCachedValue .p2align 4, 0x90 .type getCachedValue,@function getCachedValue: # @getCachedValue # %bb.0: movsd .LcachedValue(%rip), %xmm0 # xmm0 = mem[0],zero retq .Lfunc_end1: .size getCachedValue, .Lfunc_end1-getCachedValue # -- End function .type PI,@object # @PI .section .rodata,"a",@progbits .globl PI .p2align 3 PI: .quad 4614253070214989087 # double 3.1400000000000001 .size PI, 8 .type .LcachedValue,@object # @cachedValue .local .LcachedValue .comm .LcachedValue,8,8 .section ".note.GNU-stack","",@progbits
Hi Lang, Thank you for the fast response. I had some debug session before I prepared my "hot fix" :) So per my light analysis, the root cause of the issue is a difference between low-level code generation on MSVC (COFF) and Linux/MinGW. I have attached the assembler version of the module which compiled by LLC on Linux. You can see a difference: MSVC version has additional constant __real@40091eb851eb851f(%rip) in function 'calculate'. This constant wasn't declared in the IR module, but was added during compilation to platform-specific code. Maybe it connected with some rules of compilation for the MSVC target. So what is happened in code in my understanding: 1) We requested to materialize function 'calculate' and inserted the function name 'calculate' in the 'Symbols' map. 2) We have got 'notifyResolved' notification from RTDyldObjectLinkingLayer.cpp with list of resolved symbols. 3) We expected to see in resolved symbols list only symbols which we requested (which contains in 'Symbols') and assertion failed. But because a new constant '__real@40091eb851eb851f' was generated during compilation it also presents in the resolved list which comes from the linking layer (RTDyldObjectLinkingLayer.cpp). So in my opinion, 'calculation' depends on constant '__real@40091eb851eb851f' and we have to extend our 'Symbol' map. And you right, my patch injects all additional symbols which come from the linking layer. Because all symbols already have real addresses it is safe. --- Best wishes, Alexander
Another way just discard all unexpected symbols, but I don't enough familiar with OrcJIT architecture to say what way is proper.
Created attachment 22965 [details] The patch which fixes the issue [updated]. Ignore already resolved symbols.
Hi Lang, Is there any chance that this patch (or "true" fix) will be in LLVM 10?
Hi Alexander, I'm just back from vacation. I'll try to get you an update on this today. -- Lang.
Hi Alexander, > I had some debug session before I prepared my "hot fix" :) > So per my light analysis, the root cause of the issue is a difference between low-level code generation on MSVC (COFF) and Linux/MinGW. > I have attached the assembler version of the module which compiled by LLC on Linux. You can see a difference: MSVC version has additional constant __real@40091eb851eb851f(%rip) in function 'calculate'. >This constant wasn't declared in the IR module, but was added during compilation to platform-specific code. Maybe it connected with some rules of compilation for the MSVC target. That all sounds right to me. I guess the code generator transformed this into a named constant (probably intended to be non-exported from the final symbol table). The thing that has me confused is the specific assertion that you saw: It is in JITDylib::resolve, but this is only called from MaterializationResponsibility::notifyResolved which contains what should have been an equivalent assertion: That the symbol appears in the SymbolFlags map of the MaterializationResponsibility. So what I want to know is how we managed to trigger the second assertion without triggering the first? Are you able to debug your LLVM build and tell me whether the loop in MaterializationResponsibility::notifyResolved is running? That mystery aside, I think the right answer here is to have RTDyldObjectLinkingLayer auto-claim responsibility for new symbols in the object. I've attached a patch to do that (auto-claim-obj-syms.patch). Would you be able to test it out for me? I do not have access to a Windows machine. -- Lang.
Created attachment 22978 [details] Patch to auto-claim new symbols discovered by the linking layer.
We had a winter holidays. Sure. I'll try your patch. Could you please also take a look https://bugs.llvm.org/show_bug.cgi?id=44336 (issue with an assertion of Export flag)? Because this bug depends on 44336.
What version of LLVM is your patch for? I use the 9.0.0 release. Could you please prepare a patch for 9.0.0?
> The thing that has me confused is the specific assertion that you saw: It is > in JITDylib::resolve, but this is only called from > MaterializationResponsibility::notifyResolved which contains what should > have been an equivalent assertion: That the symbol appears in the > SymbolFlags map of the MaterializationResponsibility. So what I want to know > is how we managed to trigger the second assertion without triggering the > first? Are you able to debug your LLVM build and tell me whether the loop in > MaterializationResponsibility::notifyResolved is running? Sorry for the confusion. This part of the patch fixes the issue https://bugs.llvm.org/show_bug.cgi?id=44336 (The root cause of the issue that the resolved symbol on COFF doesn't have flag JITSymbolFlags::Exported). Because without such fix I can't debug issue with extra symbols.
Created attachment 23004 [details] Patch to auto-claim new symbols discovered by the linking layer LLVM 9.0.0
Lang, I have adapted your patch for LLVM 9.0.0 (please take a look is it right) and have got the error at runtime: JIT session error: Duplicate definition of symbol '__real@40091eb851eb851f'
I think root cause of this error (Duplicate definition of symbol '__real@40091eb851eb851f') is here (return make_error<DuplicateDefinition>(*KV.first);): Error JITDylib::defineMaterializing(const SymbolFlagsMap &SymbolFlags) { return ES.runSessionLocked([&]() -> Error { std::vector<SymbolTable::iterator> AddedSyms; for (auto &KV : SymbolFlags) { SymbolTable::iterator EntryItr; bool Added; std::tie(EntryItr, Added) = Symbols.insert(std::make_pair(KV.first, SymbolTableEntry(KV.second))); if (Added) { AddedSyms.push_back(EntryItr); EntryItr->second.setState(SymbolState::Materializing); } else { // Remove any symbols already added. for (auto &SI : AddedSyms) Symbols.erase(SI); // FIXME: Return all duplicates. return make_error<DuplicateDefinition>(*KV.first); } } return Error::success(); }); } Called from: Error RTDyldObjectLinkingLayer::onObjLoad( VModuleKey K, MaterializationResponsibility &R, object::ObjectFile &Obj, std::unique_ptr<RuntimeDyld::LoadedObjectInfo> LoadedObjInfo, std::map<StringRef, JITEvaluatedSymbol> Resolved, std::set<StringRef> &InternalSymbols) { SymbolFlagsMap ExtraSymbolsToClaim; SymbolMap Symbols; for (auto &KV : Resolved) { // Scan the symbols and add them to the Symbols map for resolution. // We never claim internal symbols. if (InternalSymbols.count(KV.first)) continue; auto InternedName = getExecutionSession().intern(KV.first); auto Flags = KV.second.getFlags(); // Override object flags and claim responsibility for symbols if // requested. if (OverrideObjectFlags || AutoClaimObjectSymbols) { auto I = R.getSymbols().find(InternedName); if (OverrideObjectFlags && I != R.getSymbols().end()) Flags = I->second; else if (AutoClaimObjectSymbols && I == R.getSymbols().end()) ExtraSymbolsToClaim[InternedName] = Flags; } Symbols[InternedName] = JITEvaluatedSymbol(KV.second.getAddress(), Flags); } if (!ExtraSymbolsToClaim.empty()) if (auto Err = R.defineMaterializing(ExtraSymbolsToClaim)) return Err; R.notifyResolved(Symbols); if (NotifyLoaded) NotifyLoaded(K, Obj, *LoadedObjInfo); return Error::success(); } The second issue is here: if (OverrideObjectFlags && I != R.getSymbols().end()) Flags = I->second; else if (AutoClaimObjectSymbols && I == R.getSymbols().end()) ExtraSymbolsToClaim[InternedName] = Flags; } Symbols[InternedName] = JITEvaluatedSymbol(KV.second.getAddress(), Flags); Despite the AutoClaimObjectSymbols flag is true extra symbol will be added to Symbols[InternedName].
Hi Alexander, Someone has reported a similar bug in http://bugs.llvm.org/PR40074. I have committed a (likely) fix for their issue in 84217ad66115cc31b184374a03c8333e4578996f. Would you be able to test and see if that commit fixes your issue too? -- Lang.
(In reply to Lang Hames from comment #22) > Hi Alexander, > > Someone has reported a similar bug in http://bugs.llvm.org/PR40074. I have > committed a (likely) fix for their issue in > 84217ad66115cc31b184374a03c8333e4578996f. Would you be able to test and see > if that commit fixes your issue too? > > -- Lang. Hi Lang, I have ported your patch from #40074 to LLVM 9.0.0 and tested it in the mine environment. Your patch is working fine for me. Also, it fixes bug #44336. But are you sure that it is a good idea to pass a parameter by value? Error defineMaterializing(SymbolFlagsMap SymbolFlags); If you don't want to modify the map, a constant reference is a better solution, if you want to modify, a non-constant reference or pointer is more applicable here, in my opinion. One more question for you: is removing a module from OrcJIT will be available in LLVM 10? --- Best wishes, Alexander
> I have ported your patch from #40074 to LLVM 9.0.0 and tested it in the mine environment. > Your patch is working fine for me. > Also, it fixes bug #44336. That's great news. :) Ok -- closing this as fixed by 84217ad6611. > But are you sure that it is a good idea to pass a parameter by value? > Error defineMaterializing(SymbolFlagsMap SymbolFlags); > If you don't want to modify the map, a constant reference is a better solution, if you want to modify, a non-constant reference or pointer is more applicable here, in my opinion. The JITDylib::defineMaterializing method makes modifications to the map based on what definitions are accepted and rejected, and returns these to MaterializationResponsibilty::defineMaterializing so that it can update the responsibility set. Since these modifications are not interesting to the client, it makes sense to pass by value: It is cheap, and if the client does not need their value any more then no copies are required. > One more question for you: is removing a module from OrcJIT will be available in LLVM 10? Unfortunately support for removing modules will not make it in to LLVM 10. The best solution for now is to maintain a separate ExecutionSession for code that you want to throw away. I am working on removable module support, and hope to get it in to LLVM 11.