-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid code generation (aliasing between scalar and vector type) #31404
Comments
Does bug 31928 provide the answer? |
I don't think it does -- this issue has something to do with the fact that 'a' and 'b' are vector types (__m256d). Two other things that are interesting:
struct A {
}; |
so.. any thoughts on this issue? PS: added Intel folks to CC in case it is related to X86 specifically. |
I don't think it's an x86 problem (unless there's some mis-specification of the x86 intrinsics in the front-end). There's nothing x86-specific in the IR when the undef appears. Here's the IR heading into -gvn with no undef on any printf, and "opt -tbaa -gvn" replaces it with undef. I don't know enough about aliasing or gvn to know what's happening here (cc'ing Davide and Daniel), but it does seem strange that we only make that one printf use undef. ; ModuleID = '32056.ll' %struct.A = type { %union.anon } @.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1 ; Function Attrs: norecurse ssp uwtable for.body: ; preds = %entry ; Function Attrs: argmemonly nounwind ; Function Attrs: nounwind ; Function Attrs: argmemonly nounwind ; Function Attrs: nounwind attributes #0 = { norecurse ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.module.flags = !{#0} !0 = !{i32 1, !"PIC Level", i32 2} |
Sorry, I pasted a bigger version than what I was looking at last. This removes some of the unnecessary bits (but can almost certainly be reduced some more): %struct.A = type { %union.anon } @.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1 define i32 @main(i32 %argc, i8** nocapture readnone %argv) { for.body: declare i32 @printf(i8* nocapture readonly, ...) !0 = !{i32 1, !"PIC Level", i32 2} |
This is definitely an aliasing issue. Here's a minimal example: %struct.wibble = type { %struct.eggs } @global = private unnamed_addr constant [4 x i8] c"%f \00", align 1 define i32 @quux(i32 %arg, i8** nocapture readnone %arg1) { bb7: ; preds = %bb declare i32 @hoge(i8* nocapture readonly, ...) declare i32 @quux.2(i8* nocapture readonly) !0 = !{#1, !2, i64 0} print-memoryssa shows: bb7: ; preds = %bb Note the load (which is the broken one), is a use of live on entry, instead of 1. This is wrong. |
For simplicity, here's a version with one okay load, one broken one, and no other calls: %struct.wibble = type { %struct.eggs } @global = private unnamed_addr constant [4 x i8] c"%f \00", align 1 define double @quux(i32 %arg, i8** nocapture readnone %arg1) { bb7: ; preds = %bb declare i32 @hoge(i8* nocapture readonly, ...) declare i32 @quux.2(i8* nocapture readonly) !0 = !{#1, !2, i64 0} The ret will become fadd %tmp20, undef |
Okay, here is what happens: In our AA pipeline, basicaa is currently above tbaa (i thought we had it the other way around, so maybe this is a bug). For the first load in my example, BasicAA returns "MustAlias" and so we return that. We never ask TBAA. For the second load, BasicAA returns "MayAlias", so we continue on. The reason TBAA returns NoAlias is because this tree walk is simply wrong. It is attempting to find NCA(tbaa !6, tbaa !1) The clear answer is TBAA !2. However, it gets "no common ancestor". It does the following: Walk A to root, see if we hit B If not, assume no NCA. Here is a counterexample: root Here is a trivially correct way to do this: While walking A to root, build the set of seen nodes This is O(N) DFS number the TBAA tree, use the dfs numbers to answer containment. I will implement #1 as a correctness fix, then #2 if we need it. |
Ugh, this won't work because we have offset parents. What will work, however, is the union-find approach detailed in https://en.wikipedia.org/wiki/Lowest_common_ancestor
|
So, as sanjoy points out, the way this tree is structured, NCA won't work either. The trees are inverted from the representation i'm used to, which is one where the types are children of the struct nodes, not parents. Even so, this tree says that double and the vector types don't alias, because they do not directly end up in the ancestor tree of each other. IE double either needs to be a parent of the struct, and of every other node that has a double in it, or it needs to be a child of every node that cont IE if you invert that tree into our form: double struct Then both have to be parents of the union node, and the tbaa access specificed to the be the union node, and we don't allow that. So i can't see a way to structure a tree that works here without allowing multiple parents (or multiple children). |
See thread [llvm-dev] RFC: Representing unions in TBAA |
A patch to remove TBAA info from union members is posted here: |
I just applied D31885 to Clang trunk to re-test the snippets here. Unfortunately, they still fail. |
I updated the patch and the testcase now works on my machine. |
I'll close the issue as D33328 was merged, which indeed fixes the testcase that prompted this issue. Thanks, |
mentioned in issue llvm/llvm-bugzilla-archive#33343 |
Extended Description
Consider the simple snippet of C++ code below. It is a somewhat contrived reduced example from a much larger codebase, where this issue was first detected. In a situation where a vector type and a normal scalar type alias (but where this is made very explicit to the compiler), Clang's aliasing optimization generates incorrect code.
The class "A" is a double array with 8 entries, which is initialized to [0..8]. The entries can be accessed using Intel-specific vector types (__m256d) and double variables. The begin() and end() functions expose a C++ iterator interface to step through the values, and the main() function then prints them.
When compiled on my machine (with "clang++ test.cpp -std=c++11 -O3 -mavx -o test"), I observe the following output
0.000000 1.000000 2.000000 3.000000 4.000000 0.000000 6.000000 7.000000
(^ note the zero value here, which should be 5)
If compiled with -fno-strict-aliasing, the fifth entry is equal to 5.000000, confirming that this is indeed an aliasing optimization-related issue.
In case this is an issue with the C++ code, I would be curious how I can signal to Clang that an array type may alias with its underlying scalar type (it appears to me that this is a fairly essential requirement in all sorts of situations)
Thanks,
Wenzel
======= C++ snippet =========
#include <immintrin.h>
#include <stdio.h>
struct A {
A () {
a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
b = _mm256_setr_pd(4.0, 5.0, 6.0, 7.0);
}
};
int main(int argc, char *argv[]) {
A a;
for (double value : a)
printf("%f ", value);
}
==== LLVM IR Disassembly =====
(note the undef in the line
%12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
)
define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 {
%3 = alloca %struct.Test, align 32
%4 = bitcast %struct.Test* %3 to i8*
call void @llvm.lifetime.start(i64 64, i8* nonnull %4) #3
%5 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %5, align 32, !tbaa !2
%6 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %6, align 32, !tbaa !6
%7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 0.000000e+00)
%8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.000000e+00)
%9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 2.000000e+00)
%10 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 3.000000e+00)
%11 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 4.000000e+00)
%12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
%13 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 6
%14 = load double, double* %13, align 16, !tbaa !7
%15 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %14)
%16 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 7
%17 = load double, double* %16, align 8, !tbaa !7
%18 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %17)
call void @llvm.lifetime.end(i64 64, i8* nonnull %4) #3
ret i32 0
}
The text was updated successfully, but these errors were encountered: