LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 40074 - Duplicate definition of symbol error to do with mergable comdat constant symbols on Windows/COFF
Summary: Duplicate definition of symbol error to do with mergable comdat constant symb...
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: OrcJIT (show other bugs)
Version: trunk
Hardware: PC Windows NT
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-18 03:45 PST by Machiel van Hooren
Modified: 2020-01-22 15:03 PST (History)
5 users (show)

See Also:
Fixed By Commit(s):


Attachments
Work around duplicate defs from COFF constant pool entries. (8.39 KB, patch)
2020-01-16 17:03 PST, Lang Hames
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Machiel van Hooren 2018-12-18 03:45:26 PST
It happens when the same literal is defined twice, each in a different module and after the second module has been added to the RTDyldObjectLinkingLayer.

So for example in some module:

define float @someFunction() {
entry:
  ret float 2.000000e+00
}

In some other module:
define float @someOtherFunction() {
entry:
  ret float 2.000000e+00
}

When materializing the second function, you will get this error: 

JIT session error: Duplicate definition of symbol '__real@40000000'

The symbol names come from comdat symbols that are created in TargetLoweringObjectFileImpl.cpp TargetLoweringObjectFileCOFF::getSectionForConstant

For COFF, I have the OverrideObjectFlagsWithResponsibilityFlags and AutoClaimResponsibilityForObjectSymbols set to true on the RTDyldObjectLinkingLayer or my function symbols are not found at all.

The same code compiles and runs just fine on Linux/Elf, even with the COFF workarounds enabled there as well.

A workaround I found is to set HasCOFFComdatConstants to false in MCAsmInfoCOFF.cpp
Comment 1 Machiel van Hooren 2019-05-18 14:30:32 PDT
This bug is still reproducible with ORCv2
Comment 2 Lang Hames 2019-05-22 14:10:29 PDT
Hi Machiel,

Oh -- this is awful. :)

Does it manifest with the same error on ORCv2? Or with a different error?

The relevant parts of RTDyldObjectLinkingLayer/RuntimeDyld's algorithm look something like this:

  (1) scan the object file, noting weak definitions in a side table
  (2) find out which weak definitions we are responsible for
  (3) perform the link
  (4) if auto-claim is on, add all provided definitions to the definition set
  (5) register definitions

The issue likely has something to do with the fact that determine responsibility for materializing weak defs in (2), but don't claim it until (4). Though I would have expected this to lead to a missing definition error, rather than a duplicate.

We probably need to add a new API to MaterializationResponsibility: "defineMaterializingWeak". Whereas "defineMaterializing" generates an error on duplicates, "defineMaterializingWeak" would silently continue without adding duplicates to the symbol table.

Could you include the target triple that you're seeing in your test cases so that I can make sure I match your setup?
Comment 3 Machiel van Hooren 2019-05-23 10:52:56 PDT
Hi Lang,

Yes, on ORCv2 it prints the same error message to the console and then crashes shortly afterwards.
My target tripple is x86_64-pc-windows-msvc.

The crash happens in RTDyldObjectLinkingLayer::onObjEmit when it tries to destroy a MemoryBuffer (the ObjBuffer). 

Let me know if you need anything else. I could create a minimal repro case using the Kaleidoscope examples if you want.
Comment 4 Machiel van Hooren 2020-01-15 08:03:05 PST
I have marked this bug as a release blocker because it causes a crash when using ORC on Windows in a very trivial case.
Comment 5 Lang Hames 2020-01-16 17:03:55 PST
Created attachment 23021 [details]
Work around duplicate defs from COFF constant pool entries.
Comment 6 Lang Hames 2020-01-16 17:06:34 PST
Setting priority back down. This isn’t ideal, but it’s not a release blocker: ORCv1 and MCJIT are still both in LLVM10.

Confirming the earlier analysis: __real@40000000 is (a COMDAT “any” symbol) for a constant pool entry that is created by MC. We’re failing when two materialization units create the same symbol late in the pipeline and both try to register it.

An ideal solution to this would involve teaching the JIT, libObject, and the JIT-linker about COMDAT symbols. That’s not going to happen for a while though.

In the mean time I have attached a patch for an possible workaround. This patch modifies defineMaterializing to support defining new weak symbols (we’ll want to take this part either way), then adds a hack to RTDyldObjectLinkingLayer to detect COFF objects, look for newly defined symbols, and then mark any newly defined symbols in COMDAT sections as weak.

Machiel — Could you see if this patch fixes your issue? Unfortunately I don’t have a windows machine to test execution of COFF objects.

— Lang.
Comment 7 Machiel van Hooren 2020-01-17 08:12:21 PST
(In reply to Lang Hames from comment #6)
> Setting priority back down. This isn’t ideal, but it’s not a release
> blocker: ORCv1 and MCJIT are still both in LLVM10.
> 
> ....
>
> Machiel — Could you see if this patch fixes your issue? Unfortunately I
> don’t have a windows machine to test execution of COFF objects.
> 
> — Lang.

Roger on it not being a release blocker. I'm not really familiar with the criteria for what should warrant a release blocker, sorry about that.

I have tested your patch and everything seems to work fine so great success! I've also verified that the issue still exists in unpatched ORC.
Comment 8 Lang Hames 2020-01-21 14:03:16 PST
Hi Machiel,

Can you confirm that 84217ad66115cc31b184374a03c8333e4578996f fixes this bug?

It looks like this might be related to http://bugs.llvm.org/PR44337 -- I want to make sure I figure out what has been fixed and what's still broken.

-- Lang.
Comment 9 Machiel van Hooren 2020-01-21 15:31:55 PST
Hi Lang,

I've tested my repro case on Windows with both debug and release builds without issue, using the unmodified current master branch.

I have also tested the IR posted in http://bugs.llvm.org/PR44337 by loading the IR from a file, adding the module to a dylib, then getting the address to the 'calculate' function and executing it.
This also works without problems, although there is an access violation when executing the compiled IR due to the usage of @cachedValue. (store double %4, double* @cachedValue, align 8). Removing that line will produce valid output.

Machiel
Comment 10 Machiel van Hooren 2020-01-21 15:37:17 PST
I must add that I do not use LLJIT but my own JIT implementation using ORC.
Comment 11 Lang Hames 2020-01-22 15:03:10 PST
Great! Thank you for checking this out Machiel.

Closing as fixed by 84217ad6611.