Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang-overlay of <intrin.h> breaks <intrin0.h> in VS 2019 version 16.8p1 #46443

Open
BillyONeal opened this issue Aug 10, 2020 · 17 comments
Open
Labels
bugzilla Issues migrated from bugzilla clang:headers Headers provided by Clang, e.g. for intrinsics

Comments

@BillyONeal
Copy link
Contributor

Bugzilla Link 47099
Version 10.0
OS Windows NT
CC @CaseyCarter,@topperc,@RKSimon,@zygoloid,@rnk

Extended Description

Previously reported by a Visual Studio customer as https://developercommunity.visualstudio.com/content/problem/1144026/visual-studio-version-1680-preview-10-no-longer-co.html

The standard libraries have an <intrin0.h> where we declare intrinsics used by the standard library headers as a throughput optimization, because <intrin.h> is huge and causes measurable throughput costs to #include <atomic>.

As part of implementing C++20, we needed new intrinsics for <bit> so we moved them from <intrin.h> to <intrin0.h>. Unfortunately, that is breaking whatever overlay mechanism Clang on Windows uses to select its version of <intrin.h> because it tries to declare _tzcnt_u32 and _tzcnt_u64 as object-like macros.

We are interested in investigating a scheme whereby our official <intrin.h> would do #ifdef __clang__, then #include <whatever clang wants.hpp> or similar so that we would have a more firmly established contract for clang's extensions rather than needing overlays and #include_next.

Billy ONeal
Visual C++ Libraries

@zygoloid
Copy link
Mannequin

zygoloid mannequin commented Aug 10, 2020

Per the Intel Intrinsics Guide, _tzcnt_u32 is declared by <immintrin.h>:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=tzcnt&expand=5972

Clang provides its own <immintrin.h>, which declares this intrinsic as a macro. If <intrin0.h> is providing its own definition of _tzcnt_u32, rather than getting the one from <immintrin.h>, that's going to cause problems.

Now, Clang also provides a complete replacement for <intrin.h>, and doesn't use the one provided by MSVC. Clang's replacement <intrin.h> (indirectly) includes <immintrin.h>, which is where _tzcnt_u32 usually comes from when we target Windows.

It seems like there are a couple of different approaches we could take here:

  1. We continue to replace <intrin.h> with our own implementation, and extend that to also cover <intrin0.h>. We'd need to know which intrinsics should be provided by <intrin0.h> so that we can expose the proper set. No changes on the Visual C++ side.

  2. Your <intrin0.h> detects clang and includes the relevant header from the Intel Intrinsics Guide.

If we want to avoid pulling in all of <immintrin.h> into your standard library, option 1 seems like the way to go to me. Do you have any documentation for what <intrin0.h> should provide?

@zygoloid
Copy link
Mannequin

zygoloid mannequin commented Aug 10, 2020

Alternatively, could you define the functions in terms of Clang _builtin functions under #ifdef __clang__, and not include any intrinsic header at all in that case (and more generally never include <intrin0.h> when the compiler is Clang)?

@BillyONeal
Copy link
Contributor Author

Clang also provides a complete replacement for <intrin.h>

Not quite, because it does #include_next which ends up pulling in ours.

not include any intrinsic header at all in that case (and more generally never include <intrin0.h> when the compiler is Clang)?

That's the first thing I tried; unfortunately that doesn't work because the user can include <intrin.h> themselves which clobbers the attempt to use the intrinsic in .

This is one of those places that hasn't been really 'designed' because we never had to worry about a compiler changing this outside of the corresponding library parts.

  1. We continue to replace <intrin.h> with our own implementation, and extend that to also cover <intrin0.h>. We'd need to know which intrinsics should be provided by <intrin0.h>

That set is constantly changing; <intrin.h> includes <intrin0.h> and we have not before considered 'promoting' an intrinsic from one to the other a breaking change, but it has broken bits here.

  1. Your <intrin0.h> detects clang and includes the relevant header from the Intel Intrinsics Guide.

I tried something like this too, making our intrin0.h go "if clang, then #include <intrin.h>" to pick up your overridden version. Unfortunately due to the aforementioned include_next that causes circular include hell since that tries to include clang's intrin.h, which include_next's our intrin.h which then again tries to include intrin0.h...

That's why I think getting rid of what's forcing you folks to use include_next is going to be a key part of whatever solution on which we land.

@topperc
Copy link
Collaborator

topperc commented Aug 10, 2020

I believe our header is doing

#ifndef _MSC_VER
#include_next <intrin.h>
#else
// provide the intrinsics
#endif

So if we're not pretending to be MSVC we don't define anything and just pass on to the next include.

I think _MSC_VER being defined is controlled by -fms-compatibility-version.

@zygoloid
Copy link
Mannequin

zygoloid mannequin commented Aug 10, 2020

Clang also provides a complete replacement for <intrin.h>

Not quite, because it does #include_next which ends up pulling in ours.

The #include_next is under an #ifndef _MSC_VER. Clang defines _MSC_VER itself when targeting Windows. I think the intent is that the #include_next is unreachable on Windows, and instead exists only to make our <intrin.h> be as invisible as possible when not targeting Windows.

If we're including your <intrin.h> instead of our own, then perhaps something strange is happening there. (Or perhaps I'm misunderstanding what the #ifndef is doing.) Can you check whether Clang is defining _MSC_VER in your test environment?

not include any intrinsic header at all in that case (and more generally
never include <intrin0.h> when the compiler is Clang)?

That's the first thing I tried; unfortunately that doesn't work because the
user can include <intrin.h> themselves which clobbers the attempt to use
the intrinsic in .

I'm not sure I understand. If (when compiling with Clang) your directly uses Clang's __builtin_popcount and friends, and doesn't include <intrin0.h>, then I would think that nothing that uses should be clobbered by <intrin.h>. What am I missing?

This is one of those places that hasn't been really 'designed' because we
never had to worry about a compiler changing this outside of the
corresponding library parts.

  1. We continue to replace <intrin.h> with our own implementation, and extend that to also cover <intrin0.h>. We'd need to know which intrinsics should be provided by <intrin0.h>

That set is constantly changing; <intrin.h> includes <intrin0.h> and we have
not before considered 'promoting' an intrinsic from one to the other a
breaking change, but it has broken bits here.

I'd imagine that most of the time, things in your <intrin0.h> are consistent with things in our <intrin.h>, since (other than the <immintrin.h> and related stuff) it's mostly just a bunch of extern "C" function declarations that will typically have the exact same signature in your <intrin*.h> and in our <intrin.h>. But because in this instance you're (presumably) declaring one of the <immintrin.h> functions in <intrin0.h>, and those aren't just function declarations, this is a more risky change.

@BillyONeal
Copy link
Contributor Author

So if we're not pretending to be MSVC we don't define anything and just pass on to the next include.

I think _MSC_VER being defined is controlled by -fms-compatibility-version.

Hmmmm I see now that clang also overlays the headers intrin.h ends up also including, and... we're not generally prepared for that. I don't think in general it is reasonable for the VC Libraries to support this condition where some of the intrinsics infrastructure is being replaced but other parts are not. We have no contract that the different headers from our implementation are only implemented in terms of the publicly documented portions of the other headers, and intrin0.h is just the first one we noticed because it's the first one substantially changed since we officially started supporting Clang.

One wouldn't expect our to work with libc++'s , for example, and that's the kind of swiss cheese / sponge we've got here.

We can absolutely fix our intrinsics headers to step out of the way as needed by Clang; please let us know what you would like to do in that case. Alternately, Clang can just not load our intrinsics headers at all and we will fix the standard libraries to not implement the intrin0.h optimization there.

@BillyONeal
Copy link
Contributor Author

Can you check whether Clang is defining _MSC_VER in your test environment?

It is.

I'm not sure I understand. If (when compiling with Clang) your directly uses Clang's __builtin_popcount and friends, and doesn't include <intrin0.h>, then I would think that nothing that uses should be clobbered by <intrin.h>. What am I missing?

Sorry, what I mean is I tried to make the content of intrin0.h be "if clang, then #include <intrin.h> because clang will replace that". But clang is recursively relying on our <intrin.h> to provide _InterlockedAdd (for example) when it does include_next, because we declare _InterlockedAdd only in intrin0.h.

But because in this instance you're (presumably) declaring one of the <immintrin.h> functions in <intrin0.h>, and those aren't just function declarations, this is a more risky change.

Right in this particular example it came from immintrin but it highlights a contract problem in general here; that we need to make it easier for Clang to provide these bits without 'guessing' at what the contents of our intrin headers will be, and Clang needs to not try to replace the intrinsics package piecemeal.

@zygoloid
Copy link
Mannequin

zygoloid mannequin commented Aug 11, 2020

OK. So:

We need to implement <immintrin.h> and related headers ourselves. Those are implemented in terms of private compiler intrinsics which we don't want anything outside those headers to use, and I don't think we would want to support whatever mechanism your <immintrin.h> uses to communicate with cl.exe. This is an abstraction layer defined by an Intel spec, that's intended to be implemented by the compiler.

However, <intrin.h> is not an Intel thing, it's an MSVC thing. Perhaps we could remove our <intrin.h> entirely and use yours instead; I'm not sure. There was presumably a reason why we implemented our own instead of just using the platform <intrin.h>, but it might be historical at this point.

That said, if I'm understanding the discussion correctly, the new intrinsics that you want to add to <intrin0.h> are the tzcnt* ones. Those are from <immintrin.h>, which is the compiler's domain not the standard library's. As such, I don't think it's appropriate for _tzcnt_u32 to be declared anywhere other than by the compiler's own builtin headers.

So if we want tzcnt* to be provided by <intrin0.h>, I think the choice is either that

  1. <intrin.h> is not a compiler builtin header -- and Clang shouldn't be providing a copy of it -- in which case the only way it can get access to _tzcnt_u32 would be by including <immintrin.h> (which you quite reasonably don't want to do because that header is huge), or

  2. <intrin.h> is a compiler builtin header, in which case Clang needs to implement the whole thing, and needs to also implement <intrin0.h> and provide the contents for it.

Option (2) seems preferable to me. While we can't guess which of the MSVC-specific functions will get moved from <intrin.h> to <intrin0.h>, we can conservatively put all of them in <intrin0.h>. Given that you want to put _tzcnt_u32 in <intrin0.h> too, that presumably means also including <bmiintrin.h> from our <intrin0.h>.

That all seems doable to me.

Would it be possible for you to try https://reviews.llvm.org/D85699 and see if it fixes the problem for you?

@BillyONeal
Copy link
Contributor Author

We need to implement <immintrin.h> and related headers ourselves.

That makes sense to me.

However, <intrin.h> is not an Intel thing, it's an MSVC thing. Perhaps we could remove our <intrin.h> entirely and use yours instead; I'm not sure. There was presumably a reason why we implemented our own instead of just using the platform <intrin.h>, but it might be historical at this point.

Whether you use ours or not isn't really an issue, just hopefully want to get to a place where we aren't 'mixing'.

That said, if I'm understanding the discussion correctly, the new intrinsics that you want to add to <intrin0.h> are the tzcnt* ones.

Hmmm yes and no. The VS 16.8 STL needs to work with Clang 10, so the cat's already out of the bag. We will need to workaround the problem by declaring those intrinsics in intrin0.h only for MSVC and pulling in the full intrin.h for clang.

What I want is to figure out what protocol/contract we need to prevent similar cats from escaping in the future. That could be just contributing the contents of our intrin.h to the LLVM project so that the include_next gets removed, and the in the future they get maintained independently, with clang adding declarations as you all implement them.

Option (2) seems preferable to me. While we can't guess which of the MSVC-specific functions will get moved from <intrin.h> to <intrin0.h>, we can conservatively put all of them in <intrin0.h>. Given that you want to put _tzcnt_u32 in <intrin0.h> too, that presumably means also including <bmiintrin.h> from our <intrin0.h>.

Right, I was going to just completely defer to clang's intrin.h and skip the intrin0 part completely, but was foiled by that include_next. The include_next is really the source of the pain here as it means we can't fix the problem in either our sources or LLVM's, we need to make a coordinated change in both because the resulting TU is blurred between our release vehicles. Do you want us to submit a change that effectively makes your intrin.h and our intrin.h identical for further review? (I'll strip out declarations you already have) That lets us get rid of the problematic include_next.

@topperc
Copy link
Collaborator

topperc commented Aug 11, 2020

If _MSC_VER is being defined there shouldn’t be an include_next happening in clang’s header.

@BillyONeal
Copy link
Contributor Author

If _MSC_VER is being defined there shouldn’t be an include_next happening in
clang’s header.

Hmmmmm did I get confused with double negatives? I probably got confused with double negatives. I just tried this again and maybe it'll work. Will keep you posted.

@BillyONeal
Copy link
Contributor Author

OK, that seemed to work, as long as we have assurance that this include_next (and similar) won't ever engage on Windows just dummying <intrin0.h> out and making include <intrin.h> seeeeeeems to work for now.

It still seems like we need a checklist or something for communicating with you folks when the set of intrinsics changes.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 24, 2020

some _InterlockedCompareExchange128 intrinsic functions are missing from intrin.h.
e.g. _InterlockedCompareExchange128_nf used by Microsoft STL (see https://github.com/microsoft/STL/blob/master/stl/inc/atomic#L483) is missing.

https://docs.microsoft.com/en-us/cpp/intrinsics/interlockedcompareexchange128?view=msvc-160
microsoft/STL#1491

currently intrin.h only contains _InterlockedCompareExchange128 and _InterlockedCompareExchange128_np:
https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/intrin.h#L217

@rnk
Copy link
Collaborator

rnk commented Nov 24, 2020

These days, intrin.h mostly just provides declarations of builtins, and doesn't provide as many implementations. Being present in the header isn't enough. We also need an implementation in the compiler, and, surprise surprise, most MSVC intrinsics that were not necessary have not been implemented. _InterlockedCompareExchange128_nf is one such unimplemented intrinsic. I'll go ahead and look into implementing it.


Getting back to Billy's original concern, I would like to try to improve on the header shadowing situation. Can we set up a time to talk about this?

This is the list of headers that clang shadows:

$ ls /c/Program\ Files\ (x86)/Microsoft\ Visual\ Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/include/ > vc-headers.txt

$ ls ../clang/lib/Headers/ > clang-headers.txt

$ comm -12 vc-headers.txt clang-headers.txt
ammintrin.h
arm64intr.h
armintr.h
emmintrin.h
immintrin.h
intrin.h
iso646.h
limits.h
mm3dnow.h
mmintrin.h
nmmintrin.h
pmmintrin.h
smmintrin.h
stdarg.h
stdbool.h
stdint.h
tmmintrin.h
vadefs.h
varargs.h
wmmintrin.h
xmmintrin.h

The vast majority are CPU vendor intrinsic headers, ARM and X86. Those are obviously tightly coupled with the compiler, so clang needs to prefer its own over MSVC's.

stdarg.h is normal, it is the compiler's implementation of the standard va_start/va_arg macros. vadefs.h is unnecessary: we only need it because the CRT insists on using __crt_va_start instead of the standard macros. If Microsoft could change vadefs.h to use _builtin_va* if clang is defined, we could remove vadefs.h.

stdbool.h, stdint.h, and limits.h handshake with compiler predefined macros.

That leaves intrin.h and intrin0.h. I would also like clang to benefit from the intrin0.h optimization. I don't want to include every x86 intrinsic when using the MSVC STL with clang. We could re-structure our headers so that all the MSVC-specific builtins are declared in intrin0.h and have intrin.h include x86intrin.h and intrin0.h. Would that work? What would you like to see here?

In summary, it seems like there is nothing for clang to do, except perhaps around vadefs.h and intrin[0].h. Email me if you want to set up a time to coordinate on this.

@rnk
Copy link
Collaborator

rnk commented Nov 25, 2020

I have some patches that hopefully fix the case of the MSVC STL with arm64:
https://reviews.llvm.org/D92061
https://reviews.llvm.org/D92062

The first patch fixes a separate bug.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 19, 2021

Just for the record, it looks like there's a workaround in: microsoft/STL#1300 (comment)
and apparently the fix should land in MSVC 16.9

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#48283

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang:headers Headers provided by Clang, e.g. for intrinsics
Projects
None yet
Development

No branches or pull requests

4 participants