Duplicate definition of symbol error to do with mergable comdat constant symbols on Windows/COFF #39421

Closed
mvhooren opened this issue Dec 18, 2018 · 12 comments
Labels
bugzilla Issues migrated from bugzilla orcjit

@mvhooren

Bugzilla Link 40074
Resolution FIXED
Resolved on Jan 22, 2020 15:03
Version trunk
OS Windows NT
CC @AlexDenisov,@lhames,@vchuravy

Extended Description

It happens when the same literal constant appears in two different modules and the second module is added to the RTDyldObjectLinkingLayer.

For example, in one module:

define float @someFunction() {
entry:
  ret float 2.000000e+00
}

And in another module:

define float @someOtherFunction() {
entry:
  ret float 2.000000e+00
}

When materializing the second function, you will get this error:

JIT session error: Duplicate definition of symbol '__real@40000000'

The symbol names come from COMDAT symbols created in TargetLoweringObjectFileCOFF::getSectionForConstant (TargetLoweringObjectFileImpl.cpp).

For COFF, I have OverrideObjectFlagsWithResponsibilityFlags and AutoClaimResponsibilityForObjectSymbols set to true on the RTDyldObjectLinkingLayer; otherwise my function symbols are not found at all.

The same code compiles and runs fine on Linux/ELF, even with the COFF workarounds enabled.

A workaround I found is to set HasCOFFComdatConstants to false in MCAsmInfoCOFF.cpp.

@mvhooren
Author

This bug is still reproducible with ORCv2

@lhames
Contributor

lhames commented May 22, 2019

Hi Machiel,

Oh -- this is awful. :)

Does it manifest with the same error on ORCv2? Or with a different error?

The relevant parts of RTDyldObjectLinkingLayer/RuntimeDyld's algorithm look something like this:

(1) scan the object file, noting weak definitions in a side table
(2) find out which weak definitions we are responsible for
(3) perform the link
(4) if auto-claim is on, add all provided definitions to the definition set
(5) register definitions

The issue likely has something to do with the fact that we determine responsibility for materializing weak defs in (2) but don't claim it until (4). Though I would have expected this to lead to a missing-definition error rather than a duplicate.

We probably need to add a new API to MaterializationResponsibility: "defineMaterializingWeak". Whereas "defineMaterializing" generates an error on duplicates, "defineMaterializingWeak" would silently continue without adding duplicates to the symbol table.

Could you include the target triple that you're seeing in your test cases so that I can make sure I match your setup?

@mvhooren
Author

Hi Lang,

Yes, on ORCv2 it prints the same error message to the console and then crashes shortly afterwards.
My target triple is x86_64-pc-windows-msvc.

The crash happens in RTDyldObjectLinkingLayer::onObjEmit when it tries to destroy a MemoryBuffer (the ObjBuffer).

Let me know if you need anything else. I could create a minimal repro case using the Kaleidoscope examples if you want.

@mvhooren
Author

I have marked this bug as a release blocker because it causes a crash when using ORC on Windows in a very trivial case.

@lhames
Copy link
Contributor

lhames commented Jan 17, 2020

Setting priority back down. This isn’t ideal, but it’s not a release blocker: ORCv1 and MCJIT are still both in LLVM10.

Confirming the earlier analysis: __real@40000000 is a COMDAT “any” symbol for a constant-pool entry created by MC. We’re failing when two materialization units create the same symbol late in the pipeline and both try to register it.

An ideal solution to this would involve teaching the JIT, libObject, and the JIT-linker about COMDAT symbols. That’s not going to happen for a while though.

In the meantime I have attached a patch for a possible workaround. This patch modifies defineMaterializing to support defining new weak symbols (we’ll want to take this part either way), then adds a hack to RTDyldObjectLinkingLayer that detects COFF objects, looks for newly defined symbols, and marks any of them that live in COMDAT sections as weak.

Machiel, could you see whether this patch fixes your issue? Unfortunately I don’t have a Windows machine to test execution of COFF objects.

— Lang.

@mvhooren
Author

Roger on it not being a release blocker. I'm not really familiar with the criteria for what should warrant a release blocker, sorry about that.

I have tested your patch and everything seems to work fine, so great success! I've also verified that the issue still exists in unpatched ORC.

@lhames
Contributor

lhames commented Jan 21, 2020

Hi Machiel,

Can you confirm that 84217ad fixes this bug?

It looks like this might be related to bug #44337 -- I want to make sure I figure out what has been fixed and what's still broken.

-- Lang.

@mvhooren
Author

Hi Lang,

I've tested my repro case on Windows with both debug and release builds without issue, using the unmodified current master branch.

I have also tested the IR posted in bug #44337 by loading the IR from a file, adding the module to a dylib, then looking up the address of the 'calculate' function and executing it.
Compiling and looking it up also work without problems, although executing the compiled IR triggers an access violation caused by the use of @cachedValue (store double %4, double* @cachedValue, align 8). Removing that line produces valid output.

Machiel

@mvhooren
Author

I must add that I do not use LLJIT but my own JIT implementation using ORC.

@lhames
Contributor

lhames commented Jan 22, 2020

Great! Thank you for checking this out Machiel.

Closing as fixed by 84217ad.

@lhames
Contributor

lhames commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#44700

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.