Created attachment 22042: reduced test case

While chasing a SIGILL I came across this. The intrinsics were generated by LoopVectorizer. The manual says: "If any pair of the index, mask, or destination registers are the same, this instruction results a UD fault."

$ cat t.ll
define void @reduced() #0 {
  %wide.masked.gather38 = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> undef, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x float> undef)
  %1 = fadd fast <16 x float> %wide.masked.gather38, zeroinitializer
  call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> %1, <16 x float>* undef, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
  ret void
}

; Function Attrs: nounwind readonly
declare <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*>, i32 immarg, <16 x i1>, <16 x float>) #1

; Function Attrs: argmemonly nounwind
declare void @llvm.masked.store.v16f32.p0v16f32(<16 x float>, <16 x float>*, i32 immarg, <16 x i1>) #2

attributes #0 = { "unsafe-fp-math"="true" }
attributes #1 = { nounwind readonly }
attributes #2 = { argmemonly nounwind }

$ llc -mcpu=skx < t.ll | llvm-mc
<stdin>:10:2: warning: index and destination registers should be distinct
	vgatherqps	(,%zmm0), %ymm0 {%k1}
	^
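The warning fires because %ymm0 is the low 256 bits of %zmm0, so the index and destination operands of this vgatherqps are the same physical register. A minimal Python sketch of that aliasing check, assuming AT&T operand names of the form %xmmN/%ymmN/%zmmN only; `vector_reg_number` and `gather_regs_overlap` are hypothetical helpers for illustration, not llvm-mc's implementation. The mask here is %k1, which cannot alias a vector register, so only the index/destination pair is checked.

```python
import re

# Hypothetical helper: %xmmN, %ymmN and %zmmN all name (parts of) the same
# physical vector register N, so they are mapped to the same number.
_VREG = re.compile(r"%[xyz]mm(\d+)$")


def vector_reg_number(name):
    m = _VREG.match(name)
    if m is None:
        raise ValueError(f"not a vector register: {name}")
    return int(m.group(1))


def gather_regs_overlap(index_reg, dest_reg):
    """Return True if the gather's index and destination registers alias,
    i.e. the index/destination case of the manual's #UD rule."""
    return vector_reg_number(index_reg) == vector_reg_number(dest_reg)


# vgatherqps (,%zmm0), %ymm0 {%k1}: index is %zmm0, destination is %ymm0.
print(gather_regs_overlap("%zmm0", "%ymm0"))  # True
print(gather_regs_overlap("%zmm1", "%ymm0"))  # False
```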
Have you seen this occur without the pointer operand being undef?
Created attachment 22046: unreduced test case

Bugpoint made them undef; I attached the unreduced test case.
It looks like this gather ends up getting split into two gathers, where the high one has all-undef indices. Presumably this happens because the mask for those elements is also all 0s.

  %4 = load <16 x float>*, <16 x float>** %3, align 8, !invariant.load !0, !dereferenceable !3, !align !4
  %5 = getelementptr inbounds [256 x float], [256 x float]* %1, i64 0, <16 x i64> <i64 0, i64 56, i64 112, i64 168, i64 224, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef>
  %wide.masked.gather = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> %5, i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x float> undef), !invariant.load !0, !noalias !5

The easiest fix might be to add a DAG combine that detects the all-0s mask and replaces the gather with just its passthru value.
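To make the proposed combine concrete, here is a minimal Python model (not LLVM code) of masked gather semantics: each active lane loads from memory, each inactive lane returns its passthru value. It splits a 16-lane gather into 8-lane parts, as happens here, and folds any part whose mask is all zeros to its passthru slice, so no gather is emitted for it. Assumptions: undef indices are modeled as None, the undef passthru as 0.0, and `masked_gather`/`split_gather` are hypothetical names.

```python
def masked_gather(memory, indices, mask, passthru):
    """Model of llvm.masked.gather: each active lane loads memory[index];
    each inactive lane returns its passthru value without touching memory."""
    return [memory[i] if m else p for i, m, p in zip(indices, mask, passthru)]


def split_gather(memory, indices, mask, passthru, width=8):
    """Split a wide gather into `width`-lane parts. Parts whose mask is all
    zeros are folded to their passthru slice, so no gather is emitted for them.
    Returns (result, number_of_gathers_emitted)."""
    result = []
    emitted = 0
    for lo in range(0, len(mask), width):
        part = slice(lo, lo + width)
        if not any(mask[part]):
            # All-zeros mask: the gather is just its passthru value.
            result.extend(passthru[part])
        else:
            result.extend(masked_gather(memory, indices[part], mask[part], passthru[part]))
            emitted += 1
    return result, emitted


# Lanes 0-4 active, as in the unreduced test case; None stands in for undef.
memory = [float(i) for i in range(256)]   # models [256 x float]
indices = [0, 56, 112, 168, 224] + [None] * 11
mask = [True] * 5 + [False] * 11
passthru = [0.0] * 16                     # undef passthru modeled as 0.0

result, emitted = split_gather(memory, indices, mask, passthru)
print(result[:5])  # [0.0, 56.0, 112.0, 168.0, 224.0]
print(emitted)     # 1
```

Only the low half produces a gather; the high half, whose indices are all undef, never reaches instruction selection, so its index register can no longer be allocated on top of the destination.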
Patch: https://reviews.llvm.org/D62613 (committed at: rL362015)
(In reply to Simon Pilgrim from comment #4)
> Patch: https://reviews.llvm.org/D62613 (committed at: rL362015)

Resolving.