Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid code generation (aliasing between scalar and vector type) #31404

Closed
wjakob mannequin opened this issue Feb 24, 2017 · 16 comments
Closed

Invalid code generation (aliasing between scalar and vector type) #31404

wjakob mannequin opened this issue Feb 24, 2017 · 16 comments
Labels
bugzilla Issues migrated from bugzilla clang:codegen

Comments

@wjakob
Copy link
Mannequin

wjakob mannequin commented Feb 24, 2017

Bugzilla Link 32056
Resolution FIXED
Resolved on Jul 27, 2017 05:58
Version trunk
OS All
CC @topperc,@erichkeane,@kparzysz-quic,@RKSimon,@sanjoy,@rotateright,@ZviRackover

Extended Description

Consider the simple snippet of C++ code below. It is a somewhat contrived reduced example from a much larger codebase, where this issue was first detected. In a situation where a vector type and a normal scalar type alias (but where this is made very explicit to the compiler), Clang's aliasing optimization generates incorrect code.

The class "A" is a double array with 8 entries, which is initialized to [0..8]. The entries can be accessed using Intel-specific vector types (__m256d) and double variables. The begin() and end() functions expose a C++ iterator interface to step through the values, and the main() function then prints them.

When compiled on my machine (with "clang++ test.cpp -std=c++11 -O3 -mavx -o test"), I observe the following output

0.000000 1.000000 2.000000 3.000000 4.000000 0.000000 6.000000 7.000000
(^ note the zero value here, which should be 5)

If compiled with -fno-strict-aliasing, the fifth entry is equal to 5.000000, confirming that this is indeed an aliasing optimization-related issue.

In case this is an issue with the C++ code, I would be curious how I can signal to Clang that an array type may alias with its underlying scalar type (it appears to me that this is a fairly essential requirement in all sorts of situations)

Thanks,
Wenzel

======= C++ snippet =========

#include <immintrin.h>
#include <stdio.h>

struct A {
A () {
a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
b = _mm256_setr_pd(4.0, 5.0, 6.0, 7.0);
}

const double *begin() { return c; }
const double *end() { return c+8; }

union {
    struct { __m256d a, b; };
    double c[8];
};

};

int main(int argc, char *argv[]) {
A a;
for (double value : a)
printf("%f ", value);

return 0;

}

==== LLVM IR Disassembly =====

(note the undef in the line
%12 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
)

define i32 @​main(i32, i8** nocapture readnone) local_unnamed_addr #​0 {
%3 = alloca %struct.Test, align 32
%4 = bitcast %struct.Test* %3 to i8*
call void @​llvm.lifetime.start(i64 64, i8* nonnull %4) #​3
%5 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %5, align 32, !tbaa !​2
%6 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %6, align 32, !tbaa !​6
%7 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 0.000000e+00)
%8 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.000000e+00)
%9 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 2.000000e+00)
%10 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 3.000000e+00)
%11 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 4.000000e+00)
%12 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double undef)
%13 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 6
%14 = load double, double* %13, align 16, !tbaa !​7
%15 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %14)
%16 = getelementptr inbounds %struct.Test, %struct.Test* %3, i64 0, i32 0, i32 0, i32 0, i64 7
%17 = load double, double* %16, align 8, !tbaa !​7
%18 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %17)
call void @​llvm.lifetime.end(i64 64, i8* nonnull %4) #​3
ret i32 0
}

@rotateright
Copy link
Contributor

Does bug 31928 provide the answer?

@wjakob
Copy link
Mannequin Author

wjakob mannequin commented Feb 24, 2017

I don't think it does -- this issue has something to do with the fact that 'a' and 'b' are vector types (__m256d).

Two other things that are interesting:

  1. Clang only miscompiles when 'a' and 'b' are initialized with constants (thus a part of the computation can be done at compile time, at which point something goes wrong). When the values are fetched from elsewhere (e.g. an external symbol), the issue disappears.

  2. The issue seems to be specific to vector registers. For instance, if I replace the definition of the class 'A' by the following (technically equivalent) code that does not involve __m256d, everything works without problems.

struct A {
struct Foo { uint64_t a, b, c, d; };

A () {
    /* Binary encoding for double values 0..7 */
    a = Foo{ 0x0000000000000000ull, 0x3FF0000000000000ull,
             0x4000000000000000ull, 0x4008000000000000ull };
    b = Foo{ 0x4010000000000000ull, 0x4014000000000000ull,
             0x4018000000000000ull, 0x401C000000000000ull };
}

const double *begin() { return c; }
const double *end() { return c+8; }

union {
    struct { Foo a, b; };
    double c[8];
};

};

@wjakob
Copy link
Mannequin Author

wjakob mannequin commented Mar 1, 2017

so.. any thoughts on this issue?

PS: added Intel folks to CC in case it is related to X86 specifically.

@rotateright
Copy link
Contributor

so.. any thoughts on this issue?

PS: added Intel folks to CC in case it is related to X86 specifically.

I don't think it's an x86 problem (unless there's some mis-specification of the x86 intrinsics in the front-end). There's nothing x86-specific in the IR when the undef appears.

Here's the IR heading into -gvn with no undef on any printf, and "opt -tbaa -gvn" replaces it with undef. I don't know enough about aliasing or gvn to know what's happening here (cc'ing Davide and Daniel), but it does seem strange that we only make that one printf use undef.

; ModuleID = '32056.ll'
source_filename = "32056.cpp"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

%struct.A = type { %union.anon }
%union.anon = type { %struct.anon }
%struct.anon = type { <4 x double>, <4 x double> }

@.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@.str.1 = private unnamed_addr constant [3 x i8] c"\0A\0A\00", align 1
@​str = private unnamed_addr constant [2 x i8] c"\0A\00"

; Function Attrs: norecurse ssp uwtable
define i32 @​main(i32 %argc, i8** nocapture readnone %argv) local_unnamed_addr #​0 {
entry:
%a = alloca %struct.A, align 32
%0 = bitcast %struct.A* %a to i8*
call void @​llvm.lifetime.start(i64 64, i8* nonnull %0) #​3
%a.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %a.i.i, align 32, !tbaa !​2
%b.i.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %b.i.i, align 32, !tbaa !​6
%arraydecay.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0
%add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8
br label %for.body

for.body: ; preds = %entry
%1 = load double, double* %arraydecay.i, align 8, !tbaa !​7
%call2 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1)
%incdec.ptr = getelementptr inbounds double, double* %arraydecay.i, i64 1
%2 = load double, double* %incdec.ptr, align 8, !tbaa !​7
%call2.1 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2)
%incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1
%3 = load double, double* %incdec.ptr.1, align 8, !tbaa !​7
%call2.2 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3)
%incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1
%4 = load double, double* %incdec.ptr.2, align 8, !tbaa !​7
%call2.3 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4)
%incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1
%5 = load double, double* %incdec.ptr.3, align 8, !tbaa !​7
%call2.4 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5)
%incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1
%6 = load double, double* %incdec.ptr.4, align 8, !tbaa !​7
%call2.5 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6)
%incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1
%7 = load double, double* %incdec.ptr.5, align 8, !tbaa !​7
%call2.6 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7)
%incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1
%8 = load double, double* %incdec.ptr.6, align 8, !tbaa !​7
%call2.7 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8)
%incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1
%puts = call i32 @​puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @​str, i64 0, i64 0))
call void @​llvm.lifetime.end(i64 64, i8* nonnull %0) #​3
ret i32 0
}

; Function Attrs: argmemonly nounwind
declare void @​llvm.lifetime.start(i64, i8* nocapture) #​1

; Function Attrs: nounwind
declare i32 @​printf(i8* nocapture readonly, ...) local_unnamed_addr #​2

; Function Attrs: argmemonly nounwind
declare void @​llvm.lifetime.end(i64, i8* nocapture) #​1

; Function Attrs: nounwind
declare i32 @​puts(i8* nocapture readonly) #​3

attributes #​0 = { norecurse ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #​1 = { argmemonly nounwind }
attributes #​2 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+avx,+cx16,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #​3 = { nounwind }

!llvm.module.flags = !{#0}
!llvm.ident = !{#1}

!​0 = !{i32 1, !"PIC Level", i32 2}
!​1 = !{!"clang version 5.0.0 (trunk 296618)"}
!​2 = !{#3, !​4, i64 0}
!​3 = !{!"_ZTSN1AUt_Ut_E", !​4, i64 0, !​4, i64 32}
!​4 = !{!"omnipotent char", !​5, i64 0}
!​5 = !{!"Simple C++ TBAA"}
!​6 = !{#3, !​4, i64 32}
!​7 = !{#8, !​8, i64 0}
!​8 = !{!"double", !​4, i64 0}

@rotateright
Copy link
Contributor

Sorry, I pasted a bigger version than what I was looking at last. This removes some of the unnecessary bits (but can almost certainly be reduced some more):

%struct.A = type { %union.anon }
%union.anon = type { %struct.anon }
%struct.anon = type { <4 x double>, <4 x double> }

@.str = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@​str = private unnamed_addr constant [2 x i8] c"\0A\00"

define i32 @​main(i32 %argc, i8** nocapture readnone %argv) {
entry:
%a = alloca %struct.A, align 32
%0 = bitcast %struct.A* %a to i8*
%a.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.0, double 1.0, double 2.0, double 3.0>, <4 x double>* %a.i, align 32, !tbaa !​2
%b.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.0, double 5.0, double 6.0, double 7.0>, <4 x double>* %b.i, align 32, !tbaa !​6
%arraydecay = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 0
%add.ptr.i = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i32 0, i32 0, i64 8
br label %for.body

for.body:
%1 = load double, double* %arraydecay, align 8, !tbaa !​7
%call2 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %1)
%incdec.ptr = getelementptr inbounds double, double* %arraydecay, i64 1
%2 = load double, double* %incdec.ptr, align 8, !tbaa !​7
%call2.1 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %2)
%incdec.ptr.1 = getelementptr inbounds double, double* %incdec.ptr, i64 1
%3 = load double, double* %incdec.ptr.1, align 8, !tbaa !​7
%call2.2 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %3)
%incdec.ptr.2 = getelementptr inbounds double, double* %incdec.ptr.1, i64 1
%4 = load double, double* %incdec.ptr.2, align 8, !tbaa !​7
%call2.3 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %4)
%incdec.ptr.3 = getelementptr inbounds double, double* %incdec.ptr.2, i64 1
%5 = load double, double* %incdec.ptr.3, align 8, !tbaa !​7
%call2.4 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %5)
%incdec.ptr.4 = getelementptr inbounds double, double* %incdec.ptr.3, i64 1
%6 = load double, double* %incdec.ptr.4, align 8, !tbaa !​7
%call2.5 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %6)
%incdec.ptr.5 = getelementptr inbounds double, double* %incdec.ptr.4, i64 1
%7 = load double, double* %incdec.ptr.5, align 8, !tbaa !​7
%call2.6 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %7)
%incdec.ptr.6 = getelementptr inbounds double, double* %incdec.ptr.5, i64 1
%8 = load double, double* %incdec.ptr.6, align 8, !tbaa !​7
%call2.7 = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %8)
%incdec.ptr.7 = getelementptr inbounds double, double* %incdec.ptr.6, i64 1
%puts = call i32 @​puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @​str, i64 0, i64 0))
ret i32 0
}

declare i32 @​printf(i8* nocapture readonly, ...)
declare i32 @​puts(i8* nocapture readonly)

!​0 = !{i32 1, !"PIC Level", i32 2}
!​1 = !{!"clang version 5.0.0 (trunk 296618)"}
!​2 = !{#3, !​4, i64 0}
!​3 = !{!"_ZTSN1AUt_Ut_E", !​4, i64 0, !​4, i64 32}
!​4 = !{!"omnipotent char", !​5, i64 0}
!​5 = !{!"Simple C++ TBAA"}
!​6 = !{#3, !​4, i64 32}
!​7 = !{#8, !​8, i64 0}
!​8 = !{!"double", !​4, i64 0}

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 1, 2017

This is definitely an aliasing issue.
You don't need GVN to see it.
A simple -print-memoryssa will show that aliasing is giving interesting answers.
(note: Unions are wildly borked with TBAA right now, adn there is a discussion on the mailing list about it)

Here's a minimal example:
; ModuleID = 'broken.ll'
source_filename = "broken.ll"

%struct.wibble = type { %struct.eggs }
%struct.eggs = type { %struct.ham }
%struct.ham = type { <4 x double>, <4 x double> }

@​global = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@​global.1 = private unnamed_addr constant [2 x i8] c"\0A\00"

define i32 @​quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
%tmp = alloca %struct.wibble, align 32
%tmp2 = bitcast %struct.wibble* %tmp to i8*
%tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !​0
%tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !​4
%tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
%tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
br label %bb7

bb7: ; preds = %bb
%tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
%tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
%tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
%tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
%tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
%tmp23 = load double, double* %tmp22, align 8, !tbaa !​5
%tmp24 = call i32 (i8*, ...) @​hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @​global, i64 0, i64 0), double %tmp23)
%tmp25 = call i32 @​quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @​global.1, i64 0, i64 0))
ret i32 0
}

declare i32 @​hoge(i8* nocapture readonly, ...)

declare i32 @​quux.2(i8* nocapture readonly)

!​0 = !{#1, !​2, i64 0}
!​1 = !{!"_ZTSN1AUt_Ut_E", !​2, i64 0, !​2, i64 32}
!​2 = !{!"omnipotent char", !​3, i64 0}
!​3 = !{!"Simple C++ TBAA"}
!​4 = !{#1, !​2, i64 32}
!​5 = !{#6, !​6, i64 0}
!​6 = !{!"double", !​2, i64 0}

print-memoryssa shows:
define i32 @​quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
%tmp = alloca %struct.wibble, align 32
%tmp2 = bitcast %struct.wibble* %tmp to i8*
%tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
; 1 = MemoryDef(liveOnEntry)
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !​0
%tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
; 2 = MemoryDef(1)
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !​4
%tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
%tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
br label %bb7

bb7: ; preds = %bb
%tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
%tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
%tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
%tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
%tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
; MemoryUse(liveOnEntry)
%tmp23 = load double, double* %tmp22, align 8, !tbaa !​5
; 3 = MemoryDef(2)
%tmp24 = call i32 (i8*, ...) @​hoge(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @​global, i64 0, i64 0), double %tmp23)
; 4 = MemoryDef(3)
%tmp25 = call i32 @​quux.2(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @​global.1, i64 0, i64 0))
ret i32 0
}

Note the load (which is the broken one), is a use of live on entry, instead of 1.

This is wrong.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 1, 2017

For simplicity, here's a version with one okay load, one broken one, and no other calls:
; ModuleID = 'broken.ll'
source_filename = "broken.ll"

%struct.wibble = type { %struct.eggs }
%struct.eggs = type { %struct.ham }
%struct.ham = type { <4 x double>, <4 x double> }

@​global = private unnamed_addr constant [4 x i8] c"%f \00", align 1
@​global.1 = private unnamed_addr constant [2 x i8] c"\0A\00"

define double @​quux(i32 %arg, i8** nocapture readnone %arg1) {
bb:
%tmp = alloca %struct.wibble, align 32
%tmp2 = bitcast %struct.wibble* %tmp to i8*
%tmp3 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0
store <4 x double> <double 0.000000e+00, double 1.000000e+00, double 2.000000e+00, double 3.000000e+00>, <4 x double>* %tmp3, align 32, !tbaa !​0
%tmp4 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 1
store <4 x double> <double 4.000000e+00, double 5.000000e+00, double 6.000000e+00, double 7.000000e+00>, <4 x double>* %tmp4, align 32, !tbaa !​4
%tmp5 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 0
%tmp6 = getelementptr inbounds %struct.wibble, %struct.wibble* %tmp, i64 0, i32 0, i32 0, i32 0, i64 8
br label %bb7

bb7: ; preds = %bb
%tmp10 = getelementptr inbounds double, double* %tmp5, i64 1
%tmp13 = getelementptr inbounds double, double* %tmp10, i64 1
%tmp16 = getelementptr inbounds double, double* %tmp13, i64 1
%tmp19 = getelementptr inbounds double, double* %tmp16, i64 1
%tmp20 = load double, double* %tmp19, align 8, !tbaa !​5
%tmp22 = getelementptr inbounds double, double* %tmp19, i64 1
%tmp23 = load double, double* %tmp22, align 8, !tbaa !​5
%tmp24 = fadd double %tmp20, %tmp23
ret double %tmp24
}

declare i32 @​hoge(i8* nocapture readonly, ...)

declare i32 @​quux.2(i8* nocapture readonly)

!​0 = !{#1, !​2, i64 0}
!​1 = !{!"_ZTSN1AUt_Ut_E", !​2, i64 0, !​2, i64 32}
!​2 = !{!"omnipotent char", !​3, i64 0}
!​3 = !{!"Simple C++ TBAA"}
!​4 = !{#1, !​2, i64 32}
!​5 = !{#6, !​6, i64 0}
!​6 = !{!"double", !​2, i64 0}

The ret will become fadd %tmp20, undef

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 1, 2017

Okay, here is what happens:

In our AA pipeline, basicaa is currently above tbaa (i thought we had it the other way around, so maybe this is a bug).

For the first load in my example, BasicAA returns "MustAlias" and so we return that. We never ask TBAA.
If we did, it would say "NoAlias".

For the second load, BasicAA returns "MayAlias", so we continue on.
We ask TBAA, it says "NoAlias"
We return that.

The reason TBAA returns NoAlias is because this tree walk is simply wrong.

It is attempting to find NCA(tbaa !​6, tbaa !​1)

The clear answer is TBAA !​2.

However, it gets "no common ancestor".

It does the following:

Walk A to root, see if we hit B
Walk B to root, see if we hit A.

If not, assume no NCA.

Here is a counterexample:

root
|
common ancestor
/
A B

Here is a trivially correct way to do this:

While walking A to root, build the set of seen nodes
while walking B to root, if the node appears in the set from A, that is the common ancestor.

This is O(N)
The constant time way to do this is the same thing we do to the dominator tree:

DFS number the TBAA tree, use the dfs numbers to answer containment.

I will implement #​1 as a correctness fix, then #​2 if we need it.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 1, 2017

Okay, here is what happens:

In our AA pipeline, basicaa is currently above tbaa (i thought we had it the
other way around, so maybe this is a bug).

For the first load in my example, BasicAA returns "MustAlias" and so we
return that. We never ask TBAA.
If we did, it would say "NoAlias".

For the second load, BasicAA returns "MayAlias", so we continue on.
We ask TBAA, it says "NoAlias"
We return that.

The reason TBAA returns NoAlias is because this tree walk is simply wrong.

It is attempting to find NCA(tbaa !​6, tbaa !​1)

The clear answer is TBAA !​2.

However, it gets "no common ancestor".

It does the following:

Walk A to root, see if we hit B
Walk B to root, see if we hit A.

If not, assume no NCA.

Here is a counterexample:

root
|
common ancestor
/
A B

Here is a trivially correct way to do this:

While walking A to root, build the set of seen nodes
while walking B to root, if the node appears in the set from A, that is the
common ancestor.

This is O(N)
The constant time way to do this is the same thing we do to the dominator
tree:

DFS number the TBAA tree, use the dfs numbers to answer containment.

Ugh, this won't work because we have offset parents.

What will work, however, is the union-find approach detailed in https://en.wikipedia.org/wiki/Lowest_common_ancestor

I will implement #​1 as a correctness fix, then #​2 if we need it.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 1, 2017

So, as sanjoy points out, the way this tree is structured, NCA won't work either.
In any case, this tree is clearly wrong.

The trees are inverted from the representation i'm used to, which is one where the types are children of the struct nodes, not parents.

Even so, this tree says that double and the vector types don't alias, because they do not directly end up in the ancestor tree of each other.

IE double either needs to be a parent of the struct, and of every other node that has a double in it, or it needs to be a child of every node that cont
if we kept the trees in the same order as gcc, you would just make them both children of a union node, use the union tbaa node and declare victory.

IE
union
/ \
double struct

if you invert that tree into our form:

double struct
| /
| /
| /
| /
| /
union

Then both have to be parents of the union node, and the tbaa access specificed to the be the union node, and we don't allow that.

So i can't see a way to structure a tree that works here without allowing multiple parents (or multiple children).

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 1, 2017

See thread [llvm-dev] RFC: Representing unions in TBAA
for more details

@rotateright
Copy link
Contributor

A patch to remove TBAA info from union members is posted here:
https://reviews.llvm.org/D31885

@wjakob
Copy link
Mannequin Author

wjakob mannequin commented Apr 10, 2017

I just applied D31885 to Clang trunk to re-test the snippets here. Unfortunately, they still fail.

@kparzysz-quic
Copy link

I just applied D31885 to Clang trunk to re-test the snippets here.
Unfortunately, they still fail.

I updated the patch and the testcase now works on my machine.

@wjakob
Copy link
Mannequin Author

wjakob mannequin commented Jul 27, 2017

I'll close the issue as D33328 was merged, which indeed fixes the testcase that prompted this issue.

Thanks,
Wenzel

@tstellar
Copy link
Collaborator

mentioned in issue llvm/llvm-bugzilla-archive#33343

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang:codegen
Projects
None yet
Development

No branches or pull requests

4 participants