Clang really shouldn't have ASM tests in it #27189
Comments
We had a variation on this argument for x86's obscene pile of SSE/AVX intrinsics: in unoptimized code, we thought it was important that the one-to-one intrinsic -> asm mapping hold, because debugging those things is tough enough already. This is bug 24580. However, after: ...we split those up. I don't think we ever answered whether there is a designated place for end-to-end tests of this sort, but based on this recent llvm-dev thread: maybe test-suite is the right spot?
I think this goal is counter-productive. Imagine the case where the intrinsic is lowered to IR instructions, not builtins, and -O0 generates so much wrapper code that the patterns don't match in the back-end. You have a few not-ideal options:
All for the sake of an identity that is not really necessary. If, for any target, this identity is mandated by the ABI, then Clang must generate builtins for ALL intrinsics, and it's up to the back-end (or a target-specific middle-end pass) to convert them to IR early, at -O1 and above. In that case, just checking that the IR has the builtin you want is enough for a Clang test.
For x86 at least, I don't think there's any concern about preventing optimizations. We've mostly settled on doing these transforms in InstCombineCalls.cpp. That lets us write readable/debuggable C++ code to do the transform, ensures that we're doing the transform to IR very early, and doesn't interfere with the programmer's intrinsic-based intent when building at -O0. There are cases where we transform to native IR instructions in the header files. In addition to the vector programmer's debugging concerns, consider that we have macros in headers that look like the /* Vector shuffle */ one sketched below. Granted, not having an x86-specific builtin for this case is nice... but the proliferation of x86 intrinsics is unstoppable. A saner instruction set architecture may not have this problem. Some more details about the optimization options for intrinsics are here:
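For illustration, a sketch of the kind of header macro being referred to, paraphrased from clang's xmmintrin.h of that era (the exact casts and bit arithmetic may differ between releases):

    /* Vector shuffle */
    #define _mm_shuffle_ps(a, b, mask) __extension__ ({ \
      (__m128)__builtin_shufflevector((__v4sf)(__m128)(a), (__v4sf)(__m128)(b), \
                                      (mask) & 0x3, ((mask) & 0xc) >> 2,        \
                                      (((mask) & 0x30) >> 4) + 4,               \
                                      (((mask) & 0xc0) >> 6) + 4); })

This expands directly to a shufflevector in the IR rather than to a target-specific builtin, which is the "native IR instructions in the header files" case mentioned above.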
The complaint isn't really about asm tests, but about running LLVM optimisations. And unfortunately, I think that's the more burdensome one to fix. The IR we produce for

    float32x2_t res(float32x2_t l, float32x2_t r) {

is:

    define <2 x float> @res(<2 x float> %l, <2 x float> %r) #0 {

Testing that code, or even working out whether it's correct by eye, is nontrivial. On the other hand, breaking clang tests with an LLVM change can't be fun. I wonder if just running "opt -mem2reg" would be an acceptable compromise: I'd expect it to clean up the generated NEON code massively, and it's a very stable pass with a well-defined job.
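To make that concrete, a minimal sketch (the function body and IR are assumed for illustration, since the original snippets are truncated): at -O0 clang surrounds the one interesting instruction with alloca/store/load traffic, which "opt -mem2reg" strips away.

    #include <arm_neon.h>
    float32x2_t res(float32x2_t l, float32x2_t r) {
      return vadd_f32(l, r);   /* assumed body, for illustration */
    }

    ; At -O0 the emitted IR is mostly allocas, stores and loads wrapped around
    ; the single instruction that matters:
    ;   %add.i = fadd <2 x float> %0, %1
    ;
    ; After "opt -mem2reg" it collapses to roughly:
    ;   define <2 x float> @res(<2 x float> %l, <2 x float> %r) #0 {
    ;     %add.i = fadd <2 x float> %l, %r
    ;     ret <2 x float> %add.i
    ;   }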
Given how many tests we have now, I'll see if I can whip up a script to auto-convert the existing ones (i.e. replace the asm checks with the IR clang actually produces, using FileCheck [[VARS]]: not ideal, but certainly no worse than what we've got now, given that it already passes). Other than that, a related question we might want to consider is just how thorough we want these NEON tests to be. I suspect they currently have large but not complete coverage of arm_neon.h, and they take a fairly ridiculous amount of time to execute. Would a more orthogonal subset be better, especially now that we have the emperor tests in the test-suite? Not sure. And not sure how we'd decide which variants are worth testing either. We can probably delay answering those questions until later though.
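For reference, the kind of name-agnostic check such a script would emit (a sketch: the triple, RUN line and value names are illustrative, and this assumes the "opt -mem2reg" cleanup suggested earlier):

    // RUN: %clang_cc1 -triple arm64-none-linux-gnu -emit-llvm -o - %s \
    // RUN:   | opt -S -mem2reg | FileCheck %s
    #include <arm_neon.h>

    // CHECK-LABEL: define <2 x float> @res(
    // CHECK: [[SUM:%.*]] = fadd <2 x float> [[L:%.*]], [[R:%.*]]
    // CHECK: ret <2 x float> [[SUM]]
    float32x2_t res(float32x2_t l, float32x2_t r) {
      return vadd_f32(l, r);
    }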
Oh, and I also thoroughly agree with Renato that we make absolutely no guarantees about intrinsics mapping to instructions in the ARM world. arm_neon.h describes semantics, not the assembly code output.
There's also the question of how much we can really do on LLVM's side. Whenever Clang changes to support new intrinsics, so does LLVM, meaning the same Clang+LLVM revision is known to work and emit the expected assembly, but not necessarily an old Clang (or an LLVM test produced with an old Clang) against a new LLVM revision. By removing the tests from Clang, we may inadvertently cause the same effect on LLVM's side, by having old IR patterns not match because of some new optimisation pass. To solve this in a reasonable fashion, the best approach is to auto-generate LLVM tests from Clang's "current" output. This can be done with a mix of the original C code, a pass to IR, and concatenating some RUN/CHECK lines at the beginning/end. Even though this is only valid for a small class of problems, it's probably the class that will generate the most grief in LLVM testing. The rest will hopefully be more stable. I recommend that the first step be to duplicate the tests on the LLVM side and see how they end up. Once we're comfortable with the results, we start removing the tests on Clang's side, and we'll be left with only the hairy cases in Clang.
Half-baked idea: this seems like something that'd be more generally useful than a one-off script. Maybe it's worth teaching FileCheck more about LLVM's textual IR to make it do this transformation semi-automagically (under some new flag, of course):

    // RUN: %cc1 %s -emit-llvm -o - | FileCheck --auto-ir-vars %s
    float32x2_t res(float32x2_t l, float32x2_t r) {
    // CHECK-LABEL: define <2 x float> @res(<2 x float> %l, <2 x float> %r) #0 {

</hand waving> such that the %vars get automatically turned into the appropriate name-agnostic patterns.
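Conceptually, such a (hypothetical) flag would let the test author paste literal IR and have FileCheck rewrite the value names into capture-and-reuse patterns, i.e. it would treat

    ; CHECK: %add.i = fadd <2 x float> %l, %r
    ; CHECK: ret <2 x float> %add.i

as if it had been written

    ; CHECK: [[ADD:%.*]] = fadd <2 x float> [[L:%.*]], [[R:%.*]]
    ; CHECK: ret <2 x float> [[ADD]]

keeping the checks readable while staying agnostic about the exact names clang picks.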
Is it necessary to have complete checking for this kind of case? I think we just need one label and one check line to know that clang has done the intended job of translating the intrinsic to IR:

    ; CHECK-LABEL: define <2 x float> @res(<2 x float> %l, <2 x float> %r)

Can we assume that the rest of the IR is correct and covered by tests elsewhere? (Surprisingly, I don't see any x86 equivalents for these kinds of tests...)
Unfortunately, that doesn't cover everything. For trivial semantics (vadd_f32 -> fadd float) it should be enough, but others, like zip, vldN and the ones that rely on shuffles, casts, promotions and truncations, will all need a strict sequence of IR instructions with the correct dependencies between them. Having CHECK lines for just the sequence of instructions will be "enough" for most purposes, though, so we may only need full-blown variable usage for the few complex cases...
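For example, vzip_f32 lowers to a pair of shufflevectors, and a useful test has to pin down both the shuffle masks and the fact that the two shuffles read the same inputs (a sketch; value names and the label form are illustrative):

    // CHECK-LABEL: @test_vzip_f32(
    // CHECK: [[V0:%.*]] = shufflevector <2 x float> [[A:%.*]], <2 x float> [[B:%.*]], <2 x i32> <i32 0, i32 2>
    // CHECK: [[V1:%.*]] = shufflevector <2 x float> [[A]], <2 x float> [[B]], <2 x i32> <i32 1, i32 3>
    float32x2x2_t test_vzip_f32(float32x2_t a, float32x2_t b) {
      return vzip_f32(a, b);
    }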
Agreed. It looks like Altivec is tested quite efficiently this way:
What's wrong with very strict matching of the entire (unoptimized) IR sequence (à la update_llc_test_checks.py)? We assume they're correct today, so in theory we don't even need to look at the current IR. We don't expect these to change often, and when they do, updating the output and checking git-diff ought to be enough. I think I'd want to know if some unrelated IRGen commit changed the output (it's not unrelated anymore, is it?). Yes, it's some extra work when something does change. But if we provide an easy one-line script to update the checks, the change author can either ignore it (the current status quo) or make sure they didn't break anything (strictly better than the status quo).
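The workflow would look roughly like the existing llc one; the clang-side script name here is hypothetical (no such script exists yet), and the test path is illustrative:

    $ utils/update_clang_test_checks.py test/CodeGen/arm_neon_intrinsics.c   # regenerate the CHECK lines
    $ git diff    # review what actually changed before committing or investigating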
I'd much prefer to check dataflow in all cases. It's very easy to end up with "fadd %a, %a" or get the subtraction the wrong way round (we did that originally, in fact, and it took us 2 years to fix properly because people had started relying on the obviously-wrong operand order).
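A sketch of what that dataflow checking buys (names are illustrative, and this assumes the mem2reg-cleaned output discussed earlier): for vsub_f32 the test should pin down which operand is subtracted from which, not just that an fsub appears.

    // CHECK-LABEL: @test_vsub_f32(
    // CHECK: [[D:%.*]] = fsub <2 x float> %l, %r
    // CHECK: ret <2 x float> [[D]]
    float32x2_t test_vsub_f32(float32x2_t l, float32x2_t r) {
      return vsub_f32(l, r);
    }

A bare "CHECK: fsub" would also accept "fsub <2 x float> %r, %l" (or even "fsub %l, %l"), which is exactly the class of bug described above.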
I still think having readable tests is more important than wanting to completely isolate Clang from LLVM's mid-end.
Tim mentioned in comment 5 that these tests may be taking more time than they're worth. But an "update_clang_test_checks.py" and "update_opt_test_checks.py" would be extremely useful in general, and hopefully that's where we're headed. :)
I'm not against it, just trying to be pragmatic. We don't have perfect tests today, so we don't need perfect tests to replace them. But if people are happy to make them better at the same time as moving, I'm game.
I fully agree with you here. The thing is that builtins are slightly more involved for this exact reason: they expect to be able to bore through the frontend and hook into the mid- or back-end. I believe a similar argument has been made in favor of intrinsics over builtins in the past. As already pointed out (by you!), the dataflow analysis here is relevant and important, since it is possible to break things subtly otherwise. I think that the readability of the tests directly impacts that.
I've put a patch up for review at http://reviews.llvm.org/D17999.
I think this has been addressed by r263048.
Extended Description
After enough discussion on the mailing list, and what I believe is now the consensus, we should move all ASM tests to LLVM and transform all Clang tests into IR tests.
The few cases that people were arguing about:
That's not true. I love that the back-end combines vmul+vadd into vmla, and it should keep doing so. Testing for a sequence of IR instructions (or a builtin) instead is not worse.
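Concretely, the split might look like this (a sketch; instruction mnemonics, value names, and where each check lives are illustrative): the Clang test only pins the IR sequence that arm_neon.h's vmla_f32 expands to, and the vmul+vadd -> vmla combine is covered by an LLVM CodeGen (llc) test.

    // Clang side: check the IR only.
    // CHECK: [[MUL:%.*]] = fmul <2 x float> %b, %c
    // CHECK: [[ADD:%.*]] = fadd <2 x float> %a, [[MUL]]

    ; LLVM side: check that the back-end still forms the fused instruction.
    ; CHECK: vmla.f32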
No it doesn't. There are a number of cases where we (and GAS) do slight transformations on the asm and output different code (e.g. add r0, -1 -> sub r0, 1). Furthermore, inline asm has a nice IR representation, which we can also test.
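For reference, a sketch of that IR representation, which a Clang test can match directly without running the back-end (the constraint string and types are illustrative):

    ; Inline asm survives into the IR as a first-class call:
    ; CHECK: call i32 asm "add $0, $1, #1", "=r,r"(i32 %x)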
If C code gets translated into IR and then assembly, we can obviously always find an IR that will produce the same assembly. To get that IR, simply use -emit-llvm.
--
Having said that, the move itself will be very tedious, but it is necessary.
Given that most people were complaining about the ARM and AArch64 tests, I think it's only fair that we, the ARM folks, share the load in moving things. I'll copy as many people as I can into this bug so we can coordinate our efforts, but I'd like to have a much cleaner Clang test suite by 3.9.0.