LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 47099 - Clang-overlay of <intrin.h> breaks <intrin0.h> in VS 2019 version 16.8p1
Summary: Clang-overlay of <intrin.h> breaks <intrin0.h> in VS 2019 version 16.8p1
Status: NEW
Alias: None
Product: clang
Classification: Unclassified
Component: Headers (show other bugs)
Version: 10.0
Hardware: PC Windows NT
: P enhancement
Assignee: Unassigned Clang Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-10 14:32 PDT by Billy O'Neal
Modified: 2021-02-19 12:09 PST (History)
8 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Billy O'Neal 2020-08-10 14:32:51 PDT
Previously reported by a Visual Studio customer as https://developercommunity.visualstudio.com/content/problem/1144026/visual-studio-version-1680-preview-10-no-longer-co.html

The standard libraries have an `<intrin0.h>` where we declare intrinsics used by the standard library headers as a throughput optimization, because `<intrin.h>` is *huge* and causes measurable throughput costs to `#include <atomic>`.

As part of implementing C++20, we needed new intrinsics for `<bit>` so we moved them from `<intrin.h>` to `<intrin0.h>`. Unfortunately, that is breaking whatever overlay mechanism Clang on Windows uses to select its version of `<intrin.h>` because it tries to declare `_tzcnt_u32` and `_tzcnt_u64` as object-like macros.

We are interested in investigating a scheme whereby our official `<intrin.h>` would do `#ifdef __clang__`, then `#include <whatever clang wants.hpp>` or similar so that we would have a more firmly established contract for clang's extensions rather than needing overlays and `#include_next`.

Billy ONeal
Visual C++ Libraries
Comment 1 Richard Smith 2020-08-10 15:22:55 PDT
Per the Intel Intrinsics Guide, _tzcnt_u32 is declared by <immintrin.h>:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=tzcnt&expand=5972

Clang provides its own <immintrin.h>, which declares this intrinsic as a macro. If <intrin0.h> is providing its own definition of _tzcnt_u32, rather than getting the one from <immintrin.h>, that's going to cause problems.

Now, Clang also provides a complete replacement for <intrin.h>, and doesn't use the one provided by MSVC. Clang's replacement <intrin.h> (indirectly) includes <immintrin.h>, which is where _tzcnt_u32 usually comes from when we target Windows.

It seems like there are a couple of different approaches we could take here:

1) We continue to replace <intrin.h> with our own implementation, and extend that to also cover <intrin0.h>. We'd need to know which intrinsics should be provided by <intrin0.h> so that we can expose the proper set. No changes on the Visual C++ side.

2) Your <intrin0.h> detects __clang__ and includes the relevant header from the Intel Intrinsics Guide.

If we want to avoid pulling in all of <immintrin.h> into your standard library, option 1 seems like the way to go to me. Do you have any documentation for what <intrin0.h> should provide?
Comment 2 Richard Smith 2020-08-10 15:24:11 PDT
Alternatively, could you define the <bit> functions in terms of Clang __builtin_ functions under `#ifdef __clang__`, and not include any intrinsic header at all in that case (and more generally never include <intrin0.h> when the compiler is Clang)?
Comment 3 Billy O'Neal 2020-08-10 15:49:13 PDT
>Clang also provides a complete replacement for <intrin.h>

Not quite, because it does #include_next which ends up pulling in ours.

>not include any intrinsic header at all in that case (and more generally never include <intrin0.h> when the compiler is Clang)?

That's the first thing I tried; unfortunately that doesn't work because the user can include <intrin.h> themselves which clobbers the attempt to *use* the intrinsic in <bit>.

This is one of those places that hasn't been really 'designed' because we never had to worry about a compiler changing this outside of the corresponding library parts.

>1) We continue to replace <intrin.h> with our own implementation, and extend that to also cover <intrin0.h>. We'd need to know which intrinsics should be provided by <intrin0.h>

That set is constantly changing; <intrin.h> includes <intrin0.h> and we have not before considered 'promoting' an intrinsic from one to the other a breaking change, but it has broken bits here.

>2) Your <intrin0.h> detects __clang__ and includes the relevant header from the Intel Intrinsics Guide.

I tried something like this too, making our `intrin0.h` go "if clang, then #include <intrin.h>" to pick up your overridden version. Unfortunately due to the aforementioned include_next that causes circular include hell since that tries to include clang's intrin.h, which include_next's our intrin.h which then again tries to include intrin0.h...

That's why I think getting rid of what's forcing you folks to use include_next is going to be a key part of whatever solution on which we land.
Comment 4 Craig Topper 2020-08-10 15:57:44 PDT
I believe our header is doing

#ifndef _MSC_VER
#include_next <intrin.h> 
#else
// provide the intrinsics
#endif

So if we're not pretending to be MSVC we don't define anything and just pass on to the next include.

I think _MSC_VER being defined is controlled by -fms-compatibility-version.
Comment 5 Richard Smith 2020-08-10 16:29:54 PDT
(In reply to Billy O'Neal from comment #3)
> >Clang also provides a complete replacement for <intrin.h>
> 
> Not quite, because it does #include_next which ends up pulling in ours.

The `#include_next` is under an `#ifndef _MSC_VER`. Clang defines `_MSC_VER` itself when targeting Windows. I think the intent is that the `#include_next` is unreachable on Windows, and instead exists only to make our `<intrin.h>` be as invisible as possible when *not* targeting Windows.

If we're including your `<intrin.h>` instead of our own, then perhaps something strange is happening there. (Or perhaps I'm misunderstanding what the `#ifndef` is doing.) Can you check whether Clang is defining _MSC_VER in your test environment?

> > not include any intrinsic header at all in that case (and more generally
> > never include <intrin0.h> when the compiler is Clang)?
> 
> That's the first thing I tried; unfortunately that doesn't work because the
> user can include <intrin.h> themselves which clobbers the attempt to *use*
> the intrinsic in <bit>.

I'm not sure I understand. If (when compiling with Clang) your <bit> directly uses Clang's __builtin_popcount and friends, and doesn't include <intrin0.h>, then I would think that nothing that <bit> uses should be clobbered by <intrin.h>. What am I missing?

> This is one of those places that hasn't been really 'designed' because we
> never had to worry about a compiler changing this outside of the
> corresponding library parts.
> 
> >1) We continue to replace <intrin.h> with our own implementation, and extend that to also cover <intrin0.h>. We'd need to know which intrinsics should be provided by <intrin0.h>
> 
> That set is constantly changing; <intrin.h> includes <intrin0.h> and we have
> not before considered 'promoting' an intrinsic from one to the other a
> breaking change, but it has broken bits here.

I'd imagine that most of the time, things in your <intrin0.h> are consistent with things in our <intrin.h>, since (other than the <immintrin.h> and related stuff) it's mostly just a bunch of extern "C" function declarations that will typically have the exact same signature in your <intrin*.h> and in our <intrin.h>. But because in this instance you're (presumably) declaring one of the <immintrin.h> functions in <intrin0.h>, and those aren't just function declarations, this is a more risky change.
Comment 6 Billy O'Neal 2020-08-10 16:32:28 PDT
>So if we're not pretending to be MSVC we don't define anything and just pass on to the next include.
>
>I think _MSC_VER being defined is controlled by -fms-compatibility-version.

Hmmmm I see now that clang also overlays the headers intrin.h ends up also including, and... we're not generally prepared for that. I don't think in general it is reasonable for the VC Libraries to support this condition where some of the intrinsics infrastructure is being replaced but other parts are not. We have no contract that the different headers from our implementation are only implemented in terms of the publicly documented portions of the other headers, and intrin0.h is just the first one we noticed because it's the first one substantially changed since we officially started supporting Clang.

One wouldn't expect our <vector> to work with libc++'s <algorithm>, for example, and that's the kind of swiss cheese / sponge we've got here.

We can absolutely fix our intrinsics headers to step out of the way as needed by Clang; please let us know what you would like to do in that case. Alternately, Clang can just not load our intrinsics headers at all and we will fix the standard libraries to not implement the intrin0.h optimization there.
Comment 7 Billy O'Neal 2020-08-10 16:43:56 PDT
>Can you check whether Clang is defining _MSC_VER in your test environment?

It is.

>I'm not sure I understand. If (when compiling with Clang) your <bit> directly uses Clang's __builtin_popcount and friends, and doesn't include <intrin0.h>, then I would think that nothing that <bit> uses should be clobbered by <intrin.h>. What am I missing?

Sorry, what I mean is I tried to make the content of intrin0.h be "if clang, then #include <intrin.h> because clang will replace that". But clang is recursively relying on our <intrin.h> to provide _InterlockedAdd (for example) when it does include_next, because we declare _InterlockedAdd only in intrin0.h.

>But because in this instance you're (presumably) declaring one of the <immintrin.h> functions in <intrin0.h>, and those aren't just function declarations, this is a more risky change.

Right in this particular example it came from immintrin but it highlights a contract problem in general here; that we need to make it easier for Clang to provide these bits without 'guessing' at what the contents of our intrin headers will be, and Clang needs to not try to replace the intrinsics package piecemeal.
Comment 8 Richard Smith 2020-08-10 17:54:42 PDT
OK. So:

We need to implement <immintrin.h> and related headers ourselves. Those are implemented in terms of private compiler intrinsics which we don't want anything outside those headers to use, and I don't think we would want to support whatever mechanism your <immintrin.h> uses to communicate with cl.exe. This is an abstraction layer defined by an Intel spec, that's intended to be implemented by the compiler.

However, <intrin.h> is not an Intel thing, it's an MSVC thing. Perhaps we could remove our <intrin.h> entirely and use yours instead; I'm not sure. There was presumably a reason why we implemented our own instead of just using the platform <intrin.h>, but it might be historical at this point.

That said, if I'm understanding the discussion correctly, the new intrinsics that you want to add to <intrin0.h> are the _tzcnt_* ones. Those are from <immintrin.h>, which is the compiler's domain not the standard library's. As such, I don't think it's appropriate for _tzcnt_u32 to be declared anywhere other than by the compiler's own builtin headers.

So if we want _tzcnt_* to be provided by <intrin0.h>, I think the choice is either that

1) <intrin.h> is not a compiler builtin header -- and Clang shouldn't be providing a copy of it -- in which case the only way it can get access to _tzcnt_u32 would be by including <immintrin.h> (which you quite reasonably don't want to do because that header is huge), or

2) <intrin.h> is a compiler builtin header, in which case Clang needs to implement the whole thing, and needs to also implement <intrin0.h> and provide the contents for it.

Option (2) seems preferable to me. While we can't guess which of the MSVC-specific functions will get moved from <intrin.h> to <intrin0.h>, we can conservatively put *all* of them in <intrin0.h>. Given that you want to put _tzcnt_u32 in <intrin0.h> too, that presumably means also including <bmiintrin.h> from our <intrin0.h>.

That all seems doable to me.

Would it be possible for you to try https://reviews.llvm.org/D85699 and see if it fixes the problem for you?
Comment 9 Billy O'Neal 2020-08-10 20:13:35 PDT
>We need to implement <immintrin.h> and related headers ourselves.

That makes sense to me.

>However, <intrin.h> is not an Intel thing, it's an MSVC thing. Perhaps we could remove our <intrin.h> entirely and use yours instead; I'm not sure. There was presumably a reason why we implemented our own instead of just using the platform <intrin.h>, but it might be historical at this point.

Whether you use ours or not isn't really an issue, just hopefully want to get to a place where we aren't 'mixing'.

>That said, if I'm understanding the discussion correctly, the new intrinsics that you want to add to <intrin0.h> are the _tzcnt_* ones.

Hmmm yes and no. The VS 16.8 STL needs to work with Clang 10, so the cat's already out of the bag. We will need to workaround the problem by declaring those intrinsics in intrin0.h only for MSVC and pulling in the full intrin.h for clang.

What I want is to figure out what protocol/contract we need to prevent similar cats from escaping in the future. That could be just contributing the contents of our intrin.h to the LLVM project so that the include_next gets removed, and the in the future they get maintained independently, with clang adding declarations as you all implement them.

>Option (2) seems preferable to me. While we can't guess which of the MSVC-specific functions will get moved from <intrin.h> to <intrin0.h>, we can conservatively put *all* of them in <intrin0.h>. Given that you want to put _tzcnt_u32 in <intrin0.h> too, that presumably means also including <bmiintrin.h> from our <intrin0.h>.

Right, I was going to just completely defer to clang's intrin.h and skip the intrin0 part completely, but was foiled by that include_next. The include_next is really the source of the pain here as it means we can't fix the problem in either our sources or LLVM's, we need to make a coordinated change in both because the resulting TU is blurred between our release vehicles. Do you want us to submit a change that effectively makes your intrin.h and our intrin.h identical for further review? (I'll strip out declarations you already have) That lets us get rid of the problematic include_next.
Comment 10 Craig Topper 2020-08-10 20:29:47 PDT
If _MSC_VER is being defined there shouldn’t be an include_next happening in clang’s header.
Comment 11 Billy O'Neal 2020-08-10 22:24:00 PDT
(In reply to Craig Topper from comment #10)
> If _MSC_VER is being defined there shouldn’t be an include_next happening in
> clang’s header.

Hmmmmm did I get confused with double negatives? I probably got confused with double negatives. I just tried this again and maybe it'll work. Will keep you posted.
Comment 12 Billy O'Neal 2020-08-10 22:39:27 PDT
OK, that seemed to work, as long as we have assurance that this include_next (and similar) won't ever engage on Windows just dummying <intrin0.h> out and making include <intrin.h> seeeeeeems to work for now.

It still seems like we need a checklist or something for communicating with you folks when the set of intrinsics changes.
Comment 13 Zufu Liu 2020-11-24 06:31:43 PST
some _InterlockedCompareExchange128 intrinsic functions are missing from intrin.h.
e.g. _InterlockedCompareExchange128_nf used by Microsoft STL <atomic> (see https://github.com/microsoft/STL/blob/master/stl/inc/atomic#L483) is missing.

https://docs.microsoft.com/en-us/cpp/intrinsics/interlockedcompareexchange128?view=msvc-160
https://github.com/microsoft/STL/issues/1491

currently intrin.h only contains _InterlockedCompareExchange128 and _InterlockedCompareExchange128_np:
https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/intrin.h#L217
Comment 14 Reid Kleckner 2020-11-24 10:58:19 PST
These days, intrin.h mostly just provides declarations of builtins, and doesn't provide as many implementations. Being present in the header isn't enough. We also need an implementation in the compiler, and, surprise surprise, most MSVC intrinsics that were not necessary have not been implemented. _InterlockedCompareExchange128_nf is one such unimplemented intrinsic. I'll go ahead and look into implementing it.

---

Getting back to Billy's original concern, I would like to try to improve on the header shadowing situation. Can we set up a time to talk about this?

This is the list of headers that clang shadows:


$ ls /c/Program\ Files\ \(x86\)/Microsoft\ Visual\ Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/include/ > vc-headers.txt

$ ls ../clang/lib/Headers/ > clang-headers.txt

$ comm -12 vc-headers.txt clang-headers.txt
ammintrin.h
arm64intr.h
armintr.h
emmintrin.h
immintrin.h
intrin.h
iso646.h
limits.h
mm3dnow.h
mmintrin.h
nmmintrin.h
pmmintrin.h
smmintrin.h
stdarg.h
stdbool.h
stdint.h
tmmintrin.h
vadefs.h
varargs.h
wmmintrin.h
xmmintrin.h


The vast majority are CPU vendor intrinsic headers, ARM and X86. Those are obviously tightly coupled with the compiler, so clang needs to prefer its own over MSVC's.

stdarg.h is normal, it is the compiler's implementation of the standard va_start/va_arg macros. vadefs.h is unnecessary: we only need it because the CRT insists on using __crt_va_start instead of the standard macros. If Microsoft could change vadefs.h to use __builtin_va_* if __clang__ is defined, we could remove vadefs.h.

stdbool.h, stdint.h, and limits.h handshake with compiler predefined macros.

That leaves intrin.h and intrin0.h. I would also like clang to benefit from the intrin0.h optimization. I don't want to include every x86 intrinsic when using the MSVC STL with clang. We could re-structure our headers so that all the MSVC-specific builtins are declared in intrin0.h and have intrin.h include x86intrin.h and intrin0.h. Would that work? What would you like to see here?

In summary, it seems like there is nothing for clang to do, except perhaps around vadefs.h and intrin[0].h. Email me if you want to set up a time to coordinate on this.
Comment 15 Reid Kleckner 2020-11-24 16:10:03 PST
I have some patches that hopefully fix the case of the MSVC STL with arm64:
https://reviews.llvm.org/D92061
https://reviews.llvm.org/D92062

The first patch fixes a separate bug.
Comment 16 Alexandre Ganea 2021-02-19 12:09:49 PST
Just for the record, it looks like there's a workaround in: https://github.com/microsoft/STL/issues/1300#issuecomment-718065833
and apparently the fix should land in MSVC 16.9