LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 49716 - [x86] crash and fail to match pmaddwd
Summary: [x86] crash and fail to match pmaddwd
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86
Version: trunk
Hardware: PC Windows NT
Importance: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-24 14:45 PDT by Evan Nemerson
Modified: 2021-03-31 07:19 PDT
CC List: 7 users

See Also:
Fixed By Commit(s): a283d7258360 e694e19a7931


Attachments
Non-reduced, un-preprocessed source (128.25 KB, application/x-xz)
2021-03-24 14:45 PDT, Evan Nemerson

Description Evan Nemerson 2021-03-24 14:45:23 PDT
Created attachment 24690
Non-reduced, un-preprocessed source

Here is a reduced test case:


#include <stdlib.h>
typedef union {
  int16_t i16 __attribute__((__vector_size__(32)));
  int32_t i32 __attribute__((__vector_size__(32)));
} simde__m256i_private;
simde__m256i_private simde__m256i_to_private();
int simde_mm256_madd_epi16() {
  simde__m256i_private r_, a_ = simde__m256i_to_private(),
                           b_ = simde__m256i_to_private();
  for (size_t i = 0; i < sizeof sizeof(r_); i += 2)
    r_.i32[i] = a_.i16[i] * b_.i16[i] + a_.i16[i + 1] * b_.i16[i + 1];
  simde__m256i_from_private(r_);
}


Compile with -O2 using clang (clang++ works) on x86_64.  Godbolt link: https://godbolt.org/z/71o5hdY4h

The problem only manifests in my codebase with clang 12, but this test case seems to reliably reproduce the issue in earlier versions as well (back to 7 on godbolt).

I'm also attaching the original (non-reduced) source.  Please let me know if you need any additional information.
Comment 1 Sanjay Patel 2021-03-27 02:50:01 PDT
This should prevent the crashing:
https://reviews.llvm.org/rGa283d7258360

But we want the output to be a "pmaddwd" instruction?
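
For reference, here is a minimal scalar sketch of what a pmaddwd-style operation computes: each 32-bit output lane is the sum of two adjacent signed 16-bit products. The helper name `madd_epi16_ref` and its array-based signature are hypothetical and chosen only for illustration. This is not the backend's matching code.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a pmaddwd-style multiply-add (hypothetical helper).
 * a and b each hold 2*n signed 16-bit lanes; r receives n 32-bit lanes. */
static void madd_epi16_ref(int32_t *r, const int16_t *a, const int16_t *b,
                           size_t n) {
  for (size_t k = 0; k < n; k++) {
    /* Widen before multiplying so each product is exact. */
    int64_t even = (int64_t)a[2 * k] * b[2 * k];
    int64_t odd = (int64_t)a[2 * k + 1] * b[2 * k + 1];
    /* Wrap the sum to 32 bits. Only the case where all four inputs are
     * -32768 exceeds INT32_MAX; the final cast assumes the usual
     * two's-complement conversion that GCC and clang implement. */
    r[k] = (int32_t)(uint32_t)(even + odd);
  }
}
```

Note that output lane `k` reads input lanes `2*k` and `2*k + 1`, whereas the reduced test case above stores to `r_.i32[i]` while stepping `i` by 2, so it is not an exact instance of this pattern.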
Comment 2 Sanjay Patel 2021-03-27 03:39:22 PDT
The matching code for this pattern was translated from a different/existing pattern here:
https://reviews.llvm.org/D49636

But it looks like we need to make more adjustments - extract subvector, not truncate? Also possible that the inputs are shorter vectors than the output?
Comment 3 Craig Topper 2021-03-29 12:32:21 PDT
Is the "sizeof sizeof(r_)" in the for loop a mistake in your code that exposed the bug? It doesn't logically make sense.
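
To make the question concrete, the sketch below evaluates the loop bound as written and, for comparison, the number of 16-bit lanes in the vector. The type name `vec256_private` and both helper functions are hypothetical stand-ins. Which bound the original, non-reduced source used is not stated in this report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for simde__m256i_private from the test case.
 * Uses the GCC/clang vector_size extension, as the report does. */
typedef union {
  int16_t i16 __attribute__((__vector_size__(32)));
  int32_t i32 __attribute__((__vector_size__(32)));
} vec256_private;

/* Bound as written in the reduced test case: the outer sizeof is applied
 * to the size_t result of the inner sizeof, so it yields sizeof(size_t). */
static size_t bound_as_written(void) {
  vec256_private r_;
  return sizeof sizeof(r_);
}

/* For comparison: the number of 16-bit lanes in the 32-byte vector. */
static size_t bound_lane_count(void) {
  vec256_private r_;
  return sizeof(r_.i16) / sizeof(int16_t);
}
```

On x86_64, where `size_t` is 8 bytes, the bound as written is 8, so the loop in the test case runs for i = 0, 2, 4, 6 and touches only the first eight 16-bit lanes.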
Comment 4 Sanjay Patel 2021-03-30 04:48:12 PDT
More pmadd matching:
https://reviews.llvm.org/rGe694e19a7931
Comment 5 Sanjay Patel 2021-03-31 07:19:10 PDT
Resolving as fixed.

I added a test based on Craig's suggestion in https://reviews.llvm.org/D99531 that shows we could go even further to try to match pmadd, so if there's a real-world need for that, please file another bug.