Invalid register allocation for AVX512 gather #41400
Comments
Have you seen this occur without the pointer operand being undef?
Unreduced test case
It looks like this gather ends up getting split into two gathers, where the high one has all-undef indices. Presumably this occurs because the mask for those elements is also all zeros.
%4 = load <16 x float>, <16 x float>** %3, align 8, !invariant.load !0, !dereferenceable !3, !align !4
The easiest fix might be to add a DAG combine that detects the all-zeros mask and converts the gather into just the passthru result.
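To make the reasoning above concrete, here is a minimal Python sketch of the idea. It is an illustrative model only, not LLVM code: the helper names (`masked_gather`, `split_halves`, `fold_gather`) are hypothetical, and `None` is used as a stand-in for an undef index. It assumes the usual masked-gather semantics, where a lane whose mask bit is clear takes its value from the passthru operand.

```python
# Illustrative model only; this is not LLVM code.
# None stands in for an undef index (an assumption of this sketch).

def masked_gather(memory, indices, mask, passthru):
    """Lane i loads memory[indices[i]] when mask[i] is set, else keeps passthru[i]."""
    return [memory[i] if m else p for i, m, p in zip(indices, mask, passthru)]

def split_halves(vec):
    """Split a vector into its low and high halves."""
    half = len(vec) // 2
    return vec[:half], vec[half:]

def fold_gather(memory, indices, mask, passthru):
    """Sketch of the suggested combine: an all-false mask yields passthru."""
    if not any(mask):
        # No lane is active, so no index is ever read and no load is issued.
        return list(passthru)
    return masked_gather(memory, indices, mask, passthru)

# Mask from the reduced test case: 5 active lanes out of 16.
mask = [True] * 5 + [False] * 11
lo_mask, hi_mask = split_halves(mask)

memory = [float(i) for i in range(32)]
hi_indices = [None] * 8      # high half: all indices undef
hi_passthru = [0.0] * 8
hi_result = fold_gather(memory, hi_indices, hi_mask, hi_passthru)
```

After the split, only the low half has active lanes. The high half folds to its passthru without ever looking at the indices, so in this model no index register would be needed for it.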
Patch: https://reviews.llvm.org/D62613 (committed at: rL362015)
Resolving
Extended Description
While chasing a SIGILL I came across this. The intrinsics were generated by LoopVectorizer. The manual says: "If any pair of the index, mask, or destination registers are the same, this instruction results a #UD fault."
$ cat t.ll
define void @reduced() #0 {
%wide.masked.gather38 = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> undef, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x float> undef)
%1 = fadd fast <16 x float> %wide.masked.gather38, zeroinitializer
call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> %1, <16 x float>* undef, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
ret void
}
; Function Attrs: nounwind readonly
declare <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*>, i32 immarg, <16 x i1>, <16 x float>) #1
; Function Attrs: argmemonly nounwind
declare void @llvm.masked.store.v16f32.p0v16f32(<16 x float>, <16 x float>*, i32 immarg, <16 x i1>) #2
attributes #0 = { "unsafe-fp-math"="true" }
attributes #1 = { nounwind readonly }
attributes #2 = { argmemonly nounwind }
$ llc -mcpu=skx < t.ll | llvm-mc
:10:2: warning: index and destination registers should be distinct
vgatherqps (,%zmm0), %ymm0 {%k1}
^
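The condition that llvm-mc warns about can be checked mechanically. The following is a rough Python sketch, not part of any LLVM tool: the function names `gather_index_and_dest` and `index_aliases_dest` are hypothetical, and the regular expressions assume AT&T-syntax lines shaped like the one above. It relies on the fact that `%xmmN`, `%ymmN`, and `%zmmN` name the same physical register, so only the register numbers are compared.

```python
import re

# Hypothetical helper: matches "(<mem operand>), %<x|y|z>mmN" in AT&T syntax.
_GATHER = re.compile(r"\(([^)]*)\)\s*,\s*%[xyz]mm(\d+)")
_VEC_REG = re.compile(r"%[xyz]mm(\d+)")

def gather_index_and_dest(asm_line):
    """Return (index, dest) vector register numbers, or None if not recognized."""
    m = _GATHER.search(asm_line)
    if m is None:
        return None
    index = _VEC_REG.search(m.group(1))  # vector index inside the memory operand
    if index is None:
        return None
    return int(index.group(1)), int(m.group(2))

def index_aliases_dest(asm_line):
    """True when the index and destination share a register number."""
    regs = gather_index_and_dest(asm_line)
    return regs is not None and regs[0] == regs[1]

bad = "vgatherqps (,%zmm0), %ymm0 {%k1}"
ok = "vgatherqps (,%zmm1), %ymm0 {%k1}"
```

Here `index_aliases_dest(bad)` is true because `%zmm0` and `%ymm0` overlap, which is the same situation the warning reports.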