
Bug 42055 - Invalid register allocation for AVX512 gather
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86
Version: trunk
Hardware: PC Linux
Importance: P normal
Assignee: Unassigned LLVM Bugs
Reported: 2019-05-29 02:31 PDT by Benjamin Kramer
Modified: 2019-07-23 06:43 PDT
Fixed By Commit(s): r362015


Attachments
reduced test case (1.05 KB, text/plain)
2019-05-29 02:31 PDT, Benjamin Kramer
Unreduced test case (21.68 KB, text/plain)
2019-05-29 08:46 PDT, Benjamin Kramer

Description Benjamin Kramer 2019-05-29 02:31:18 PDT
Created attachment 22042
reduced test case

While chasing a SIGILL I came across this. The intrinsics were generated by LoopVectorizer. The manual says: "If any pair of the index, mask, or destination registers are the same, this instruction results a #UD fault."

$ cat t.ll
define void @reduced() #0 {
  %wide.masked.gather38 = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> undef, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x float> undef)
  %1 = fadd fast <16 x float> %wide.masked.gather38, zeroinitializer
  call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> %1, <16 x float>* undef, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>)
  ret void
}

; Function Attrs: nounwind readonly
declare <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*>, i32 immarg, <16 x i1>, <16 x float>) #1

; Function Attrs: argmemonly nounwind
declare void @llvm.masked.store.v16f32.p0v16f32(<16 x float>, <16 x float>*, i32 immarg, <16 x i1>) #2

attributes #0 = { "unsafe-fp-math"="true" }
attributes #1 = { nounwind readonly }
attributes #2 = { argmemonly nounwind }



$ llc -mcpu=skx < t.ll | llvm-mc
<stdin>:10:2: warning: index and destination registers should be distinct
        vgatherqps      (,%zmm0), %ymm0 {%k1}
        ^
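
The warning above follows from the constraint quoted from the manual. In the emitted instruction the index is %zmm0 and the destination is %ymm0, which is the low half of the same physical register. A minimal Python sketch of that distinctness check follows. The helper names `register_id` and `gather_operands_conflict` are hypothetical, and this is not how llvm-mc implements its diagnostic. It assumes that xmm/ymm/zmm registers with the same number alias each other and that k registers form a separate file.

```python
import re

# Matches AT&T-style register names such as %zmm0, %ymm15 or %k1.
_REG = re.compile(r"%?(xmm|ymm|zmm|k)(\d+)$")


def register_id(name):
    """Map a register name to (register file, number).

    Assumption: xmm/ymm/zmm with the same number share one physical
    vector register; k registers live in a separate mask file.
    """
    match = _REG.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized register: {name!r}")
    kind, number = match.group(1), int(match.group(2))
    reg_file = "mask" if kind == "k" else "vector"
    return (reg_file, number)


def gather_operands_conflict(dest, index, mask):
    """Return True if any pair of the three operands is the same register."""
    ids = [register_id(reg) for reg in (dest, index, mask)]
    return len(set(ids)) < len(ids)


# The operands from the llc output above: index %zmm0, destination %ymm0.
print(gather_operands_conflict("%ymm0", "%zmm0", "%k1"))
# A hypothetical allocation with distinct registers.
print(gather_operands_conflict("%ymm1", "%zmm0", "%k1"))
```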
Comment 1 Craig Topper 2019-05-29 08:06:55 PDT
Have you seen this occur without the pointer operand being undef?
Comment 2 Benjamin Kramer 2019-05-29 08:46:06 PDT
Created attachment 22046
Unreduced test case

Bugpoint made them undef, I attached the unreduced test case.
Comment 3 Craig Topper 2019-05-29 10:22:10 PDT
Looks like this gather ends up getting split into two gathers, where the high one has all undef indices. Presumably this occurs because the mask for these elements is also all 0s.

  %4 = load <16 x float>*, <16 x float>** %3, align 8, !invariant.load !0, !dereferenceable !3, !align !4
  %5 = getelementptr inbounds [256 x float], [256 x float]* %1, i64 0, <16 x i64> <i64 0, i64 56, i64 112, i64 168, i64 224, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef>
  %wide.masked.gather = call <16 x float> @llvm.masked.gather.v16f32.v16p0f32(<16 x float*> %5, i32 16, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x float> undef), !invariant.load !0, !noalias !5
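
As a rough illustration of the split described above, the following Python sketch cuts the 16-lane mask and index vectors from this IR into two 8-lane halves. It assumes the split is a plain low/high halving; `split_halves` is a hypothetical helper, and `None` stands in for `undef`.

```python
def split_halves(lanes):
    """Split a vector (as a list) into its low and high halves."""
    half = len(lanes) // 2
    return lanes[:half], lanes[half:]


# Mask and indices copied from the IR above; None models an undef index.
mask = [True] * 5 + [False] * 11
indices = [0, 56, 112, 168, 224] + [None] * 11

mask_lo, mask_hi = split_halves(mask)
idx_lo, idx_hi = split_halves(indices)

# The high half has no active lanes and only undef indices.
print(any(mask_hi), all(i is None for i in idx_hi))
```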


The easiest fix might be to add a DAG combine to detect the all 0s mask and convert the gather into just the passthru result.
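
The idea behind that suggestion can be modeled on plain lists. This is a scalar Python sketch of masked-gather semantics and of the proposed fold, not the code committed in D62613. The function names are hypothetical, memory is modeled as a list indexed by "pointer", and `None` stands in for an undef pointer.

```python
def masked_gather(memory, pointers, mask, passthru):
    """Reference semantics: load active lanes, keep passthru for the rest."""
    return [memory[p] if m else d for p, m, d in zip(pointers, mask, passthru)]


def fold_all_zero_mask(memory, pointers, mask, passthru):
    """Replace the gather with its passthru operand when no lane is active."""
    if not any(mask):
        # No lane reads memory, so the pointers (even undef ones) are unused.
        return list(passthru)
    return masked_gather(memory, pointers, mask, passthru)


memory = [10.0, 20.0, 30.0]
passthru = [1.0, 2.0, 3.0, 4.0]

# All-zero mask with undef pointers: the result is just the passthru value.
print(fold_all_zero_mask(memory, [None] * 4, [False] * 4, passthru))

# A partially active mask still performs the gather.
print(fold_all_zero_mask(memory, [0, 2, None, None], [True, True, False, False], passthru))
```

With the gather folded away, no instruction is emitted for the all-inactive half, so its undef index operand never reaches the register allocator.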
Comment 4 Simon Pilgrim 2019-05-30 08:27:49 PDT
Patch: https://reviews.llvm.org/D62613 (committed at: rL362015)
Comment 5 Simon Pilgrim 2019-07-23 06:43:59 PDT
(In reply to Simon Pilgrim from comment #4)
> Patch: https://reviews.llvm.org/D62613 (committed at: rL362015)

Resolving