Invalid code generation (aliasing between scalar and vector type) #31404

wjakob · 2017-02-24T10:27:27Z


Bugzilla Link	32056
Resolution	FIXED
Resolved on	Jul 27, 2017 05:58
Version	trunk
OS	All
CC	@topperc,@erichkeane,@kparzysz-quic,@RKSimon,@sanjoy,@rotateright,@ZviRackover

Extended Description

Consider the simple snippet of C++ code below. It is a somewhat contrived reduced example from a much larger codebase, where this issue was first detected. In a situation where a vector type and a normal scalar type alias (but where this is made very explicit to the compiler), Clang's aliasing optimization generates incorrect code.

The class "A" is a double array with 8 entries, which is initialized to [0..8]. The entries can be accessed using Intel-specific vector types (__m256d) and double variables. The begin() and end() functions expose a C++ iterator interface to step through the values, and the main() function then prints them.

When compiled on my machine (with "clang++ test.cpp -std=c++11 -O3 -mavx -o test"), I observe the following output

0.000000 1.000000 2.000000 3.000000 4.000000 0.000000 6.000000 7.000000
(^ note the zero value here, which should be 5)

If compiled with -fno-strict-aliasing, the fifth entry is equal to 5.000000, confirming that this is indeed an aliasing optimization-related issue.

In case this is an issue with the C++ code, I would be curious how I can signal to Clang that an array type may alias with its underlying scalar type (it appears to me that this is a fairly essential requirement in all sorts of situations)

Thanks,
Wenzel

======= C++ snippet =========

#include <immintrin.h>
#include <stdio.h>

struct A {
A () {
a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
b = _mm256_setr_pd(4.0, 5.0, 6.0, 7.0);
}

const double *begin() { return c; }
const double *end() { return c+8; }

union {
    struct { __m256d a, b; };
    double c[8];
};

};

int main(int argc, char *argv[]) {
A a;
for (double value : a)
printf("%f ", value);

return 0;

}

==== LLVM IR Disassembly =====

(note the undef in the line
%12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
)

define i32 @main(i32, i8** nocapture readnone) local_unnamed_addr #0 {
%3 = alloca %struct.Test, align 32
%4 = bitcast %struct.Test* %3 to i8*
call void @llvm.lifetime.start(i64 64, i8* nonnull %4) #3
%5 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %5, align 32, !tbaa !2
%6 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %6, align 32, !tbaa !6
%7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 0.000000e+00)
%8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.000000e+00)
%9 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 2.000000e+00)
%10 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 3.000000e+00)
%11 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 4.000000e+00)
%12 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
%13 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 6
%14 = load double, double* %13, align 16, !tbaa !7
%15 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %14)
%16 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 7
%17 = load double, double* %16, align 8, !tbaa !7
%18 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %17)
call void @llvm.lifetime.end(i64 64, i8* nonnull %4) #3
ret i32 0
}

The text was updated successfully, but these errors were encountered:

rotateright · 2017-02-24T17:41:48Z

Does bug 31928 provide the answer?

wjakob · 2017-02-24T18:22:45Z

I don't think it does -- this issue has something to do with the fact that 'a' and 'b' are vector types (__m256d).

Two other things that are interesting:

Clang only miscompiles when 'a' and 'b' are initialized with constants (thus a part of the computation can be done at compile time, at which point something goes wrong). When the values are fetched from elsewhere (e.g. an external symbol), the issue disappears.
The issue seems to be specific to vector registers. For instance, if I replace the definition of the class 'A' by the following (technically equivalent) code that does not involve __m256d, everything works without problems.

struct A {
struct Foo { uint64_t a, b, c, d; };

A () {
    /* Binary encoding for double values 0..7 */
    a = Foo{ 0x0000000000000000ull, 0x3FF0000000000000ull,
             0x4000000000000000ull, 0x4008000000000000ull };
    b = Foo{ 0x4010000000000000ull, 0x4014000000000000ull,
             0x4018000000000000ull, 0x401C000000000000ull };
}

const double *begin() { return c; }
const double *end() { return c+8; }

union {
    struct { Foo a, b; };
    double c[8];
};

};

wjakob · 2017-03-01T10:09:19Z

so.. any thoughts on this issue?

PS: added Intel folks to CC in case it is related to X86 specifically.

rotateright · 2017-03-01T18:17:21Z

so.. any thoughts on this issue?

PS: added Intel folks to CC in case it is related to X86 specifically.

I don't think it's an x86 problem (unless there's some mis-specification of the x86 intrinsics in the front-end). There's nothing x86-specific in the IR when the undef appears.

Here's the IR heading into -gvn with no undef on any printf, and "opt -tbaa -gvn" replaces it with undef. I don't know enough about aliasing or gvn to know what's happening here (cc'ing Davide and Daniel), but it does seem strange that we only make that one printf use undef.

; ModuleID = '32056.ll'
source_filename = "32056.cpp"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

%struct.A = type { %union.anon }
%union.anon = type { %struct.anon }
%struct.anon = type { <4 x double>, <4 x double> }

@.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@.str.1 = private unnamed_addr constant [3 x i8] c"\0A\0A\00", align 1
@str = private unnamed_addr constant [2 x i8] c"\0A\00"

; Function Attrs: norecurse ssp uwtable
define i32 @main(i32 %argc, i8** nocapture readnone %argv) local_unnamed_addr #0 {
entry:
%a = alloca %struct.A, align 32
%0 = bitcast %struct.A* %a to i8*
call void @llvm.lifetime.start(i64 64, i8* nonnull %0) #3
%a.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %a.i.i, align 32, !tbaa !2
%b.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %b.i.i, align 32, !tbaa !6
%arraydecay.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0
%add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8
br label %for.body

for.body: ; preds = %entry
%1 = load double, double* %arraydecay.i, align 8, !tbaa !7
%call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1)
%incdec.ptr = getelementptr inbounds double, double* %arraydecay.i, i64 1
%2 = load double, double* %incdec.ptr, align 8, !tbaa !7
%call2.1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2)
%incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1
%3 = load double, double* %incdec.ptr.1, align 8, !tbaa !7
%call2.2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3)
%incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1
%4 = load double, double* %incdec.ptr.2, align 8, !tbaa !7
%call2.3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4)
%incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1
%5 = load double, double* %incdec.ptr.3, align 8, !tbaa !7
%call2.4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5)
%incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1
%6 = load double, double* %incdec.ptr.4, align 8, !tbaa !7
%call2.5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6)
%incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1
%7 = load double, double* %incdec.ptr.5, align 8, !tbaa !7
%call2.6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7)
%incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1
%8 = load double, double* %incdec.ptr.6, align 8, !tbaa !7
%call2.7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8)
%incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1
%puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @str, i64 0, i64 0))
call void @llvm.lifetime.end(i64 64, i8* nonnull %0) #3
ret i32 0
}

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start(i64, i8* nocapture) #1

; Function Attrs: nounwind
declare i32 @printf(i8* nocapture readonly, ...) local_unnamed_addr #2

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end(i64, i8* nocapture) #1

; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #3

attributes #0 = { norecurse ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind }
attributes #2 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { nounwind }

!llvm.module.flags = !{#0}
!llvm.ident = !{#1}

!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 5.0.0 (trunk 296618)"}
!2 = !{#3, !4, i64 0}
!3 = !{!"_ZTSN1AUt_Ut_E", !4, i64 0, !4, i64 32}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{#3, !4, i64 32}
!7 = !{#8, !8, i64 0}
!8 = !{!"double", !4, i64 0}

rotateright · 2017-03-01T18:20:11Z

Sorry, I pasted a bigger version than what I was looking at last. This removes some of the unnecessary bits (but can almost certainly be reduced some more):

%struct.A = type { %union.anon }
%union.anon = type { %struct.anon }
%struct.anon = type { <4 x double>, <4 x double> }

@.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@str = private unnamed_addr constant [2 x i8] c"\0A\00"

define i32 @main(i32 %argc, i8** nocapture readnone %argv) {
entry:
%a = alloca %struct.A, align 32
%0 = bitcast %struct.A* %a to i8*
%a.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.0, double 1.0, double 2.0, double 3.0>, <4 x double>* %a.i, align 32, !tbaa !2
%b.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.0, double 5.0, double 6.0, double 7.0>, <4 x double>* %b.i, align 32, !tbaa !6
%arraydecay = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0
%add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8
br label %for.body

for.body:
%1 = load double, double* %arraydecay, align 8, !tbaa !7
%call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1)
%incdec.ptr = getelementptr inbounds double, double* %arraydecay, i64 1
%2 = load double, double* %incdec.ptr, align 8, !tbaa !7
%call2.1 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2)
%incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1
%3 = load double, double* %incdec.ptr.1, align 8, !tbaa !7
%call2.2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3)
%incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1
%4 = load double, double* %incdec.ptr.2, align 8, !tbaa !7
%call2.3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4)
%incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1
%5 = load double, double* %incdec.ptr.3, align 8, !tbaa !7
%call2.4 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5)
%incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1
%6 = load double, double* %incdec.ptr.4, align 8, !tbaa !7
%call2.5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6)
%incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1
%7 = load double, double* %incdec.ptr.5, align 8, !tbaa !7
%call2.6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7)
%incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1
%8 = load double, double* %incdec.ptr.6, align 8, !tbaa !7
%call2.7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8)
%incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1
%puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @str, i64 0, i64 0))
ret i32 0
}

declare i32 @printf(i8* nocapture readonly, ...)
declare i32 @puts(i8* nocapture readonly)

!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 5.0.0 (trunk 296618)"}
!2 = !{#3, !4, i64 0}
!3 = !{!"_ZTSN1AUt_Ut_E", !4, i64 0, !4, i64 32}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C++ TBAA"}
!6 = !{#3, !4, i64 32}
!7 = !{#8, !8, i64 0}
!8 = !{!"double", !4, i64 0}

llvmbot · 2017-03-01T20:43:19Z

This is definitely an aliasing issue.
You don't need GVN to see it.
A simple -print-memoryssa will show that aliasing is giving interesting answers.
(note: Unions are wildly borked with TBAA right now, adn there is a discussion on the mailing list about it)

Here's a minimal example:
; ModuleID = 'broken.ll'
source_filename = "broken.ll"

%struct.wibble = type { %struct.eggs }
%struct.eggs = type { %struct.ham }
%struct.ham = type { <4 x double>, <4 x double> }

@global = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@global.1 = private unnamed_addr constant [2 x i8] c"\0A\00"

define i32 @quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
%tmp = alloca %struct.wibble, align 32
%tmp2 = bitcast %struct.wibble* %tmp to i8*
%tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0
%tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4
%tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
%tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
br label %bb7

bb7: ; preds = %bb
%tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
%tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
%tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
%tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
%tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
%tmp23 = load double, double* %tmp22, align 8, !tbaa !5
%tmp24 = call i32 (i8*, ...) @hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @global, i64 0, i64 0), double %tmp23)
%tmp25 = call i32 @quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @global.1, i64 0, i64 0))
ret i32 0
}

declare i32 @hoge(i8* nocapture readonly, ...)

declare i32 @quux.2(i8* nocapture readonly)

!0 = !{#1, !2, i64 0}
!1 = !{!"_ZTSN1AUt_Ut_E", !2, i64 0, !2, i64 32}
!2 = !{!"omnipotent char", !3, i64 0}
!3 = !{!"Simple C++ TBAA"}
!4 = !{#1, !2, i64 32}
!5 = !{#6, !6, i64 0}
!6 = !{!"double", !2, i64 0}

print-memoryssa shows:
define i32 @quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
%tmp = alloca %struct.wibble, align 32
%tmp2 = bitcast %struct.wibble* %tmp to i8*
%tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
; 1 = MemoryDef(liveOnEntry)
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0
%tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
; 2 = MemoryDef(1)
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4
%tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
%tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
br label %bb7

bb7: ; preds = %bb
%tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
%tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
%tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
%tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
%tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
; MemoryUse(liveOnEntry)
%tmp23 = load double, double* %tmp22, align 8, !tbaa !5
; 3 = MemoryDef(2)
%tmp24 = call i32 (i8*, ...) @hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @global, i64 0, i64 0), double %tmp23)
; 4 = MemoryDef(3)
%tmp25 = call i32 @quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @global.1, i64 0, i64 0))
ret i32 0
}

Note the load (which is the broken one), is a use of live on entry, instead of 1.

This is wrong.

llvmbot · 2017-03-01T20:47:30Z

For simplicity, here's a version with one okay load, one broken one, and no other calls:
; ModuleID = 'broken.ll'
source_filename = "broken.ll"

%struct.wibble = type { %struct.eggs }
%struct.eggs = type { %struct.ham }
%struct.ham = type { <4 x double>, <4 x double> }

@global = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@global.1 = private unnamed_addr constant [2 x i8] c"\0A\00"

define double @quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
%tmp = alloca %struct.wibble, align 32
%tmp2 = bitcast %struct.wibble* %tmp to i8*
%tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !0
%tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !4
%tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
%tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
br label %bb7

bb7: ; preds = %bb
%tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
%tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
%tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
%tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
%tmp20 = load double, double* %tmp19, align 8, !tbaa !5
%tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
%tmp23 = load double, double* %tmp22, align 8, !tbaa !5
%tmp24 = fadd double %tmp20, %tmp23
ret double %tmp24
}

declare i32 @hoge(i8* nocapture readonly, ...)

declare i32 @quux.2(i8* nocapture readonly)

!0 = !{#1, !2, i64 0}
!1 = !{!"_ZTSN1AUt_Ut_E", !2, i64 0, !2, i64 32}
!2 = !{!"omnipotent char", !3, i64 0}
!3 = !{!"Simple C++ TBAA"}
!4 = !{#1, !2, i64 32}
!5 = !{#6, !6, i64 0}
!6 = !{!"double", !2, i64 0}

The ret will become fadd %tmp20, undef

llvmbot · 2017-03-01T21:23:32Z

Okay, here is what happens:

In our AA pipeline, basicaa is currently above tbaa (i thought we had it the other way around, so maybe this is a bug).

For the first load in my example, BasicAA returns "MustAlias" and so we return that. We never ask TBAA.
If we did, it would say "NoAlias".

For the second load, BasicAA returns "MayAlias", so we continue on.
We ask TBAA, it says "NoAlias"
We return that.

The reason TBAA returns NoAlias is because this tree walk is simply wrong.

It is attempting to find NCA(tbaa !6, tbaa !1)

The clear answer is TBAA !2.

However, it gets "no common ancestor".

It does the following:

Walk A to root, see if we hit B
Walk B to root, see if we hit A.

If not, assume no NCA.

Here is a counterexample:

root
|
common ancestor
/
A B

Here is a trivially correct way to do this:

While walking A to root, build the set of seen nodes
while walking B to root, if the node appears in the set from A, that is the common ancestor.

This is O(N)
The constant time way to do this is the same thing we do to the dominator tree:

DFS number the TBAA tree, use the dfs numbers to answer containment.

I will implement #1 as a correctness fix, then #2 if we need it.

llvmbot · 2017-03-01T21:27:49Z

Okay, here is what happens:

In our AA pipeline, basicaa is currently above tbaa (i thought we had it the
other way around, so maybe this is a bug).

For the first load in my example, BasicAA returns "MustAlias" and so we
return that. We never ask TBAA.
If we did, it would say "NoAlias".

For the second load, BasicAA returns "MayAlias", so we continue on.
We ask TBAA, it says "NoAlias"
We return that.

The reason TBAA returns NoAlias is because this tree walk is simply wrong.

It is attempting to find NCA(tbaa !6, tbaa !1)

The clear answer is TBAA !2.

However, it gets "no common ancestor".

It does the following:

Walk A to root, see if we hit B
Walk B to root, see if we hit A.

If not, assume no NCA.

Here is a counterexample:

root
|
common ancestor
/
A B

Here is a trivially correct way to do this:

While walking A to root, build the set of seen nodes
while walking B to root, if the node appears in the set from A, that is the
common ancestor.

This is O(N)
The constant time way to do this is the same thing we do to the dominator
tree:

DFS number the TBAA tree, use the dfs numbers to answer containment.

Ugh, this won't work because we have offset parents.

What will work, however, is the union-find approach detailed in https://en.wikipedia.org/wiki/Lowest_common_ancestor

I will implement #1 as a correctness fix, then #2 if we need it.

llvmbot · 2017-03-01T22:22:02Z

So, as sanjoy points out, the way this tree is structured, NCA won't work either.
In any case, this tree is clearly wrong.

The trees are inverted from the representation i'm used to, which is one where the types are children of the struct nodes, not parents.

Even so, this tree says that double and the vector types don't alias, because they do not directly end up in the ancestor tree of each other.

IE double either needs to be a parent of the struct, and of every other node that has a double in it, or it needs to be a child of every node that cont
if we kept the trees in the same order as gcc, you would just make them both children of a union node, use the union tbaa node and declare victory.

IE
union
/ \
double struct

if you invert that tree into our form:

double struct
| /
| /
| /
| /
| /
union

Then both have to be parents of the union node, and the tbaa access specificed to the be the union node, and we don't allow that.

So i can't see a way to structure a tree that works here without allowing multiple parents (or multiple children).

llvmbot · 2017-03-01T23:31:20Z

See thread [llvm-dev] RFC: Representing unions in TBAA
for more details

rotateright · 2017-04-10T15:19:18Z

A patch to remove TBAA info from union members is posted here:
https://reviews.llvm.org/D31885

wjakob · 2017-04-10T15:55:09Z

I just applied D31885 to Clang trunk to re-test the snippets here. Unfortunately, they still fail.

kparzysz-quic · 2017-04-10T17:43:00Z

I just applied D31885 to Clang trunk to re-test the snippets here.
Unfortunately, they still fail.

I updated the patch and the testcase now works on my machine.

wjakob · 2017-07-27T12:58:42Z

I'll close the issue as D33328 was merged, which indeed fixes the testcase that prompted this issue.

Thanks,
Wenzel

tstellar · 2021-11-26T23:20:55Z

mentioned in issue llvm/llvm-bugzilla-archive#33343

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalid code generation (aliasing between scalar and vector type) #31404

Invalid code generation (aliasing between scalar and vector type) #31404

wjakob mannequin commented Feb 24, 2017

rotateright commented Feb 24, 2017

wjakob mannequin commented Feb 24, 2017

wjakob mannequin commented Mar 1, 2017

rotateright commented Mar 1, 2017

rotateright commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

rotateright commented Apr 10, 2017

wjakob mannequin commented Apr 10, 2017

kparzysz-quic commented Apr 10, 2017

wjakob mannequin commented Jul 27, 2017

tstellar commented Nov 26, 2021

Invalid code generation (aliasing between scalar and vector type) #31404

Invalid code generation (aliasing between scalar and vector type) #31404

Comments

wjakob mannequin commented Feb 24, 2017

Extended Description

rotateright commented Feb 24, 2017

wjakob mannequin commented Feb 24, 2017

wjakob mannequin commented Mar 1, 2017

rotateright commented Mar 1, 2017

rotateright commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

llvmbot commented Mar 1, 2017

rotateright commented Apr 10, 2017

wjakob mannequin commented Apr 10, 2017

kparzysz-quic commented Apr 10, 2017

wjakob mannequin commented Jul 27, 2017

tstellar commented Nov 26, 2021