LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 39427 - gcc ABI incompatibility when passing llvm::Optional
Summary: gcc ABI incompatibility when passing llvm::Optional
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Support Libraries (show other bugs)
Version: 7.0
Hardware: PC Linux
: P release blocker
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks: release-7.0.1
  Show dependency tree
 
Reported: 2018-10-24 23:05 PDT by tneumann
Modified: 2019-03-07 00:13 PST (History)
16 users (show)

See Also:
Fixed By Commit(s): r346985


Attachments
ABI demonstration (1009 bytes, text/x-c++src)
2018-10-24 23:05 PDT, tneumann
Details

Note You need to log in before you can comment on or make changes to this bug.
Description tneumann 2018-10-24 23:05:46 PDT
Created attachment 21034 [details]
ABI demonstration

Calling llvm::ConstantExpr::getGetElementPtr crashes when LLVM 7 is compiled with gcc 8.2 and the caller is compiled with clang 7.

The reason is an ABI incompatibility: clang wants to pass the llvm::Optional<unsigned> parameter by value (i.e., as a single i64), while gcc expects the value to be passed by reference.

I have attached a simple test program that demonstrates the problem explicitly, regardless how the underlying LLVM was compiled. (It still uses the LLVM 7 headers, but the problem is triggered purely in the example code here).

To reproduce, compile and run the attached program like this:

clang++-7 -c -opart1.o foo.cpp `llvm-config-7 --cxxflags`
g++ -c -opart2.cpp -DPART2 foo.cpp `llvm-config-7 --cxxflags`
clang++-7 -ofoo part1.o part2.o `llvm-config-7 --cxxflags`
./foo

This leads to a crash. Using either g++ or clang++ for both files is fine, but mixing them leads to a crash. By inspecting the assembler code we can see that clang++ passes by value, and g++ passes by reference.
Comment 1 Reid Kleckner 2018-10-25 11:27:15 PDT
Yeah, this is bad. It's been discussed on llvm-dev and the cause is known, but the ship has sailed:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126564.html
http://llvm.org/viewvc/llvm-project?view=revision&revision=342637

Maybe we can fix it for 7.0.1.

Surprisingly, I don't think we had a bug for it in the tracker until now, just mailing list discussion.
Comment 2 Tom Stellard 2018-10-25 11:45:06 PDT
Is it correct that the consequences of backporting r342637 are:

- 7.0.0->7.0.1 ABI break for libLLVM-7.so that was compiled by gcc.
- gcc < 7.3 won't be able to compile LLVM 7.0.1.
Comment 3 Hans Wennborg 2018-11-07 05:54:36 PST
(In reply to Tom Stellard from comment #2)
> Is it correct that the consequences of backporting r342637 are:
> 
> - 7.0.0->7.0.1 ABI break for libLLVM-7.so that was compiled by gcc.
> - gcc < 7.3 won't be able to compile LLVM 7.0.1.

No, r342637 just reverted trunk back to the state we were in when branching for 7.0.0 IIRC.

This isn't fixed on trunk yet.

My proposal is to go back to what we had before r322838.
Comment 4 Josh Stone 2018-11-13 12:35:05 PST
I believe they're hitting this in the Debian bug I just linked, where clang-built libLLVM.so is called by rustc's src/rustllvm built with GCC.
Comment 5 Sylvestre Ledru 2018-11-13 13:26:00 PST
Tom, do you know how you are going to fix that?
Thanks
Comment 6 Tom Stellard 2018-11-13 20:00:32 PST
(In reply to Sylvestre Ledru from comment #5)
> Tom, do you know how you are going to fix that?
> Thanks

I think Han's suggestion for fixing it is fine for trunk.

For our Fedora builds I know we are going to fix it by keeping the ABI the same on gcc builds and changing the ABI on clang builds to match gcc (i.e removing the #if block that only is active for clang).

For the upstream 7.0.1 release, I don't see a way to fix this without breaking the ABI for at least one of the compilers.  Since clang is an llvm project, my suggestion for the release_70 branch would be to fix it the opposite of how we will fix it in Fedora, which means to keep the clang ABI the same and change the gcc ABI to match it.
Comment 7 Hans Wennborg 2018-11-14 00:51:43 PST
It's confusing to keep track of the changes here.

I believe this is the change that introduced the Clang-GCC ABI difference, by adding the #ifdefs:

---
commit 933d1a7b93303c063ebf5e65e3aa0aae351ec8ee
Author: Benjamin Kramer <benny.kra@googlemail.com>
Date:   Thu Jan 18 16:23:40 2018 +0000

    [ADT] Just give up on GCC, I can't fix this.

    While the memmove workaround fixed it for GCC 6.3. GCC 4.8 and GCC 7.1
    are still broken. I have no clue what's going on, just blacklist GCC for
    now.

    Needless to say this code is ubsan, asan and msan-clean.

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322862 91177308-0d34-0410-b5e6-96231b3b80d8
---


IIUC that was the end of a chain of commits trying to make Optional trivially copyable, see PR35978.

The reason for #ifdefs is that several versions of GCC miscompiles #ifdefed out code otherwise.


My suggestion is that for trunk, we back out the whole thing. It's caused more problems than it's solving I think. I believe that means backing out to the state before r322838.


> For our Fedora builds I know we are going to fix it by keeping the ABI the same on gcc builds and changing the ABI on clang builds to match gcc (i.e removing the #if block that only is active for clang).

I think removing the clang-only #if block is the correct fix for the branch.

> For the upstream 7.0.1 release, I don't see a way to fix this without breaking the ABI for at least one of the compilers.  Since clang is an llvm project, my suggestion for the release_70 branch would be to fix it the opposite of how we will fix it in Fedora, which means to keep the clang ABI the same and change the gcc ABI to match it.

If we do this, many versions of GCC will misscompile the code. That's why the #if's were added. So I think removing the clang-only #if block is a better fix, even if it disturbs the ABI more (i.e. changes it for clang-built binaries).



Ben, do you have any input on this?
Comment 8 Benjamin Kramer 2018-11-14 02:35:10 PST
We had 3 more attempts at fixing this that broke with random versions of GCC. Let's just revert it, I don't think there are any dependencies on it in trunk at the moment.

It will make clang-tidy emit many more warnings and you can't put llvm::Optional into a union anymore, but that's better than an ABI break.
Comment 9 Sylvestre Ledru 2018-11-14 07:13:11 PST
In Debian & Ubuntu, I am taking the risk to break the ABI.
AFAIK, "only" rustc was impacted

I uploaded a new version of the package with this patch:
https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/blob/7/debian/patches/pr39427-misscompile.diff
It fixed the rust crash.

However, the test case from this bug still fail (there is a small typo in comment #0)
Should be 
g++ -c -opart2.o -DPART2 foo.cpp `llvm-config-7 --cxxflags`
(instead of g++ -c -opart2.cpp -DPART2 foo.cpp `llvm-config-7 --cxxflags`)

#0  0x00000000004005c0 in bar(llvm::Type*, llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool, llvm::Optional<unsigned int>, llvm::Type*) ()
#1  0x00000000004005ad in main ()
Comment 10 Tom Stellard 2018-11-14 13:46:57 PST
Patch to fix this in trunk: https://reviews.llvm.org/D54540
Comment 11 Tom Stellard 2018-11-26 21:09:05 PST
This has been merged to trunk as r346985.  For the stable release, what do people think about holding this out of 7.0.1 and then doing another release after 7.0.1 that includes only this patch (either 7.0.2 or 7.1.0), so that users can get all the fixes without the ABI change if they want.
Comment 12 Michał Górny 2018-12-30 01:53:59 PST
Ping.  What happens now?
Comment 13 Reid Kleckner 2019-01-02 10:42:43 PST
The way I see it, users care about GCC/clang library ABI compatibility, so we need to put out a release with this bug fixed somewhere, even though it technically breaks our ABI stability promise. I seem to recall Tom Stellard suggesting that we put out two simultaneous releases: 7.0.1 and 7.0.2, which are the same, except 7.0.2 takes the patch to revert the ifdefs from llvm::Optional. Personally, this seems like extra complexity, especially because most Linux distros are carrying the patch to revert the llvm::Optional ifdefs anyway. Pulling the revert into 7.0.1 simplifies their lives. Ultimately, Tom is the 7.0.1 release manager, so it is his decision as to whether it is worth the extra complexity and effort to make two releases with two sets of pre-built packages.
Comment 14 Michał Górny 2019-01-02 11:17:50 PST
Reid, 7.0.1 is out already.  Hence, I'm pinging for some kind of progress here ;-).

There were also suggestions to make it 7.1.0 but I don't know if the progress on removing minor version number didn't already made that impracticable.  One thing to note is that we need to change SOVERSION for this release.
Comment 15 Tom Stellard 2019-01-07 08:55:08 PST
I am planning to put this fix into a 7.1.0 release.  I will update this issue when the branch is ready for testing/review.
Comment 16 Sylvestre Ledru 2019-01-07 09:16:57 PST
7.1.0 will have only this fix, right?
Comment 17 Tom Stellard 2019-01-07 09:47:06 PST
(In reply to Sylvestre Ledru from comment #16)
> 7.1.0 will have only this fix, right?

Yes, that's the plan.
Comment 18 sguelton 2019-01-23 06:15:49 PST
With llvm::isPodlIke behind replaced by a portable llvmm:is_trivially_copyable, it seems possible to get back to an optimised version of llvm::Optional for trivially copyable types, without ABI issues.

Patch proposed in https://reviews.llvm.org/D57097
Comment 19 sguelton 2019-02-13 01:50:45 PST
This should be fixed as of https://reviews.llvm.org/rL353927
Comment 20 sguelton 2019-03-07 00:13:36 PST
Fixed as of r354264, the attached test case now compiles correctly.