New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clang-overlay of <intrin.h> breaks <intrin0.h> in VS 2019 version 16.8p1 #46443
Comments
Per the Intel Intrinsics Guide, _tzcnt_u32 is declared by <immintrin.h>: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=tzcnt&expand=5972 Clang provides its own <immintrin.h>, which declares this intrinsic as a macro. If <intrin0.h> is providing its own definition of _tzcnt_u32, rather than getting the one from <immintrin.h>, that's going to cause problems. Now, Clang also provides a complete replacement for <intrin.h>, and doesn't use the one provided by MSVC. Clang's replacement <intrin.h> (indirectly) includes <immintrin.h>, which is where _tzcnt_u32 usually comes from when we target Windows. It seems like there are a couple of different approaches we could take here:
If we want to avoid pulling in all of <immintrin.h> into your standard library, option 1 seems like the way to go to me. Do you have any documentation for what <intrin0.h> should provide? |
Alternatively, could you define the functions in terms of Clang _builtin functions under |
Not quite, because it does #include_next which ends up pulling in ours.
That's the first thing I tried; unfortunately that doesn't work because the user can include <intrin.h> themselves which clobbers the attempt to use the intrinsic in . This is one of those places that hasn't been really 'designed' because we never had to worry about a compiler changing this outside of the corresponding library parts.
That set is constantly changing; <intrin.h> includes <intrin0.h> and we have not before considered 'promoting' an intrinsic from one to the other a breaking change, but it has broken bits here.
I tried something like this too, making our That's why I think getting rid of what's forcing you folks to use include_next is going to be a key part of whatever solution on which we land. |
I believe our header is doing #ifndef _MSC_VER So if we're not pretending to be MSVC we don't define anything and just pass on to the next include. I think _MSC_VER being defined is controlled by -fms-compatibility-version. |
The If we're including your
I'm not sure I understand. If (when compiling with Clang) your directly uses Clang's __builtin_popcount and friends, and doesn't include <intrin0.h>, then I would think that nothing that uses should be clobbered by <intrin.h>. What am I missing?
I'd imagine that most of the time, things in your <intrin0.h> are consistent with things in our <intrin.h>, since (other than the <immintrin.h> and related stuff) it's mostly just a bunch of extern "C" function declarations that will typically have the exact same signature in your <intrin*.h> and in our <intrin.h>. But because in this instance you're (presumably) declaring one of the <immintrin.h> functions in <intrin0.h>, and those aren't just function declarations, this is a more risky change. |
Hmmmm I see now that clang also overlays the headers intrin.h ends up also including, and... we're not generally prepared for that. I don't think in general it is reasonable for the VC Libraries to support this condition where some of the intrinsics infrastructure is being replaced but other parts are not. We have no contract that the different headers from our implementation are only implemented in terms of the publicly documented portions of the other headers, and intrin0.h is just the first one we noticed because it's the first one substantially changed since we officially started supporting Clang. One wouldn't expect our to work with libc++'s , for example, and that's the kind of swiss cheese / sponge we've got here. We can absolutely fix our intrinsics headers to step out of the way as needed by Clang; please let us know what you would like to do in that case. Alternately, Clang can just not load our intrinsics headers at all and we will fix the standard libraries to not implement the intrin0.h optimization there. |
It is.
Sorry, what I mean is I tried to make the content of intrin0.h be "if clang, then #include <intrin.h> because clang will replace that". But clang is recursively relying on our <intrin.h> to provide _InterlockedAdd (for example) when it does include_next, because we declare _InterlockedAdd only in intrin0.h.
Right in this particular example it came from immintrin but it highlights a contract problem in general here; that we need to make it easier for Clang to provide these bits without 'guessing' at what the contents of our intrin headers will be, and Clang needs to not try to replace the intrinsics package piecemeal. |
OK. So: We need to implement <immintrin.h> and related headers ourselves. Those are implemented in terms of private compiler intrinsics which we don't want anything outside those headers to use, and I don't think we would want to support whatever mechanism your <immintrin.h> uses to communicate with cl.exe. This is an abstraction layer defined by an Intel spec, that's intended to be implemented by the compiler. However, <intrin.h> is not an Intel thing, it's an MSVC thing. Perhaps we could remove our <intrin.h> entirely and use yours instead; I'm not sure. There was presumably a reason why we implemented our own instead of just using the platform <intrin.h>, but it might be historical at this point. That said, if I'm understanding the discussion correctly, the new intrinsics that you want to add to <intrin0.h> are the tzcnt* ones. Those are from <immintrin.h>, which is the compiler's domain not the standard library's. As such, I don't think it's appropriate for _tzcnt_u32 to be declared anywhere other than by the compiler's own builtin headers. So if we want tzcnt* to be provided by <intrin0.h>, I think the choice is either that
Option (2) seems preferable to me. While we can't guess which of the MSVC-specific functions will get moved from <intrin.h> to <intrin0.h>, we can conservatively put all of them in <intrin0.h>. Given that you want to put _tzcnt_u32 in <intrin0.h> too, that presumably means also including <bmiintrin.h> from our <intrin0.h>. That all seems doable to me. Would it be possible for you to try https://reviews.llvm.org/D85699 and see if it fixes the problem for you? |
That makes sense to me.
Whether you use ours or not isn't really an issue, just hopefully want to get to a place where we aren't 'mixing'.
Hmmm yes and no. The VS 16.8 STL needs to work with Clang 10, so the cat's already out of the bag. We will need to workaround the problem by declaring those intrinsics in intrin0.h only for MSVC and pulling in the full intrin.h for clang. What I want is to figure out what protocol/contract we need to prevent similar cats from escaping in the future. That could be just contributing the contents of our intrin.h to the LLVM project so that the include_next gets removed, and the in the future they get maintained independently, with clang adding declarations as you all implement them.
Right, I was going to just completely defer to clang's intrin.h and skip the intrin0 part completely, but was foiled by that include_next. The include_next is really the source of the pain here as it means we can't fix the problem in either our sources or LLVM's, we need to make a coordinated change in both because the resulting TU is blurred between our release vehicles. Do you want us to submit a change that effectively makes your intrin.h and our intrin.h identical for further review? (I'll strip out declarations you already have) That lets us get rid of the problematic include_next. |
If _MSC_VER is being defined there shouldn’t be an include_next happening in clang’s header. |
Hmmmmm did I get confused with double negatives? I probably got confused with double negatives. I just tried this again and maybe it'll work. Will keep you posted. |
OK, that seemed to work, as long as we have assurance that this include_next (and similar) won't ever engage on Windows just dummying <intrin0.h> out and making include <intrin.h> seeeeeeems to work for now. It still seems like we need a checklist or something for communicating with you folks when the set of intrinsics changes. |
some _InterlockedCompareExchange128 intrinsic functions are missing from intrin.h. https://docs.microsoft.com/en-us/cpp/intrinsics/interlockedcompareexchange128?view=msvc-160 currently intrin.h only contains _InterlockedCompareExchange128 and _InterlockedCompareExchange128_np: |
These days, intrin.h mostly just provides declarations of builtins, and doesn't provide as many implementations. Being present in the header isn't enough. We also need an implementation in the compiler, and, surprise surprise, most MSVC intrinsics that were not necessary have not been implemented. _InterlockedCompareExchange128_nf is one such unimplemented intrinsic. I'll go ahead and look into implementing it. Getting back to Billy's original concern, I would like to try to improve on the header shadowing situation. Can we set up a time to talk about this? This is the list of headers that clang shadows: $ ls /c/Program\ Files\ (x86)/Microsoft\ Visual\ Studio/2019/Professional/VC/Tools/MSVC/14.28.29333/include/ > vc-headers.txt $ ls ../clang/lib/Headers/ > clang-headers.txt $ comm -12 vc-headers.txt clang-headers.txt The vast majority are CPU vendor intrinsic headers, ARM and X86. Those are obviously tightly coupled with the compiler, so clang needs to prefer its own over MSVC's. stdarg.h is normal, it is the compiler's implementation of the standard va_start/va_arg macros. vadefs.h is unnecessary: we only need it because the CRT insists on using __crt_va_start instead of the standard macros. If Microsoft could change vadefs.h to use _builtin_va* if clang is defined, we could remove vadefs.h. stdbool.h, stdint.h, and limits.h handshake with compiler predefined macros. That leaves intrin.h and intrin0.h. I would also like clang to benefit from the intrin0.h optimization. I don't want to include every x86 intrinsic when using the MSVC STL with clang. We could re-structure our headers so that all the MSVC-specific builtins are declared in intrin0.h and have intrin.h include x86intrin.h and intrin0.h. Would that work? What would you like to see here? In summary, it seems like there is nothing for clang to do, except perhaps around vadefs.h and intrin[0].h. Email me if you want to set up a time to coordinate on this. |
I have some patches that hopefully fix the case of the MSVC STL with arm64: The first patch fixes a separate bug. |
Just for the record, it looks like there's a workaround in: microsoft/STL#1300 (comment) |
mentioned in issue llvm/llvm-bugzilla-archive#48283 |
Extended Description
Previously reported by a Visual Studio customer as https://developercommunity.visualstudio.com/content/problem/1144026/visual-studio-version-1680-preview-10-no-longer-co.html
The standard libraries have an
<intrin0.h>
where we declare intrinsics used by the standard library headers as a throughput optimization, because<intrin.h>
is huge and causes measurable throughput costs to#include <atomic>
.As part of implementing C++20, we needed new intrinsics for
<bit>
so we moved them from<intrin.h>
to<intrin0.h>
. Unfortunately, that is breaking whatever overlay mechanism Clang on Windows uses to select its version of<intrin.h>
because it tries to declare_tzcnt_u32
and_tzcnt_u64
as object-like macros.We are interested in investigating a scheme whereby our official
<intrin.h>
would do#ifdef __clang__
, then#include <whatever clang wants.hpp>
or similar so that we would have a more firmly established contract for clang's extensions rather than needing overlays and#include_next
.Billy ONeal
Visual C++ Libraries
The text was updated successfully, but these errors were encountered: