49716 – [x86] crash and fail to match pmaddwd

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Summary: [x86] crash and fail to match pmaddwd

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86
Version:	trunk
Hardware:	PC Windows NT

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

Reported:	2021-03-24 14:45 PDT by Evan Nemerson
Modified:	2021-03-31 07:19 PDT
CC List:	7 users

Fixed By Commit(s):	a283d7258360 e694e19a7931

Attachments
Non-reduced, un-preprocessed source (128.25 KB, application/x-xz) 2021-03-24 14:45 PDT, Evan Nemerson

Description Evan Nemerson 2021-03-24 14:45:23 PDT

Created attachment 24690
Non-reduced, un-preprocessed source

Here is a reduced test case:


#include <stdlib.h>
typedef union {
  int16_t i16 __attribute__((__vector_size__(32)));
  int32_t i32 __attribute__((__vector_size__(32)));
} simde__m256i_private;
simde__m256i_private simde__m256i_to_private();
int simde_mm256_madd_epi16() {
  simde__m256i_private r_, a_ = simde__m256i_to_private(),
                           b_ = simde__m256i_to_private();
  for (size_t i = 0; i < sizeof sizeof(r_); i += 2)
    r_.i32[i] = a_.i16[i] * b_.i16[i] + a_.i16[i + 1] * b_.i16[i + 1];
  simde__m256i_from_private(r_);
}


Compile with -O2 using clang on x86_64 (compiling the same code as C++ with clang++ does not crash). Godbolt link: https://godbolt.org/z/71o5hdY4h

The problem only manifests in my codebase with clang 12, but this test case seems to reliably reproduce the issue in earlier versions as well (back to clang 7 on Godbolt).

I'm also attaching the original (non-reduced) source.  Please let me know if you need any additional information.

Comment 1 Sanjay Patel 2021-03-27 02:50:01 PDT

This should prevent the crashing:
https://reviews.llvm.org/rGa283d7258360

But we want the output to be a "pmaddwd" instruction?
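For reference, `pmaddwd` (the 256-bit AVX2 form is `vpmaddwd`, exposed as `_mm256_madd_epi16`, which `simde_mm256_madd_epi16` emulates) multiplies corresponding signed 16-bit lanes and adds each adjacent pair of 32-bit products. The following is a minimal scalar C sketch of the 256-bit form, assuming plain arrays in place of the vector union; the name `madd_epi16_ref` is hypothetical. It is presumably the computation the non-reduced loop is meant to express, with the output indexed by pair (`i / 2`) and the loop covering all 16 lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for 256-bit pmaddwd (_mm256_madd_epi16):
 * 16 signed 16-bit lanes per input, 8 signed 32-bit lanes out. */
static void madd_epi16_ref(const int16_t a[16], const int16_t b[16],
                           int32_t r[8]) {
  for (size_t i = 0; i < 16; i += 2) {
    /* Each 16x16 product fits in 32 bits, but the pair sum can reach 2^31
     * (all four inputs -32768), so add in 64 bits to avoid signed overflow. */
    int64_t sum = (int64_t)a[i] * b[i] + (int64_t)a[i + 1] * b[i + 1];
    /* The instruction wraps that single case to 0x80000000. Converting an
     * out-of-range value to int32_t is implementation-defined in C; GCC and
     * Clang reduce it modulo 2^32, which matches the hardware. */
    r[i / 2] = (int32_t)sum;
  }
}
```

For example, with `a` = 1, 2, ..., 16 and every `b` lane equal to 2, lane k of the result is 2(2k+1) + 2(2k+2) = 8k + 6.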

Comment 2 Sanjay Patel 2021-03-27 03:39:22 PDT

The matching code for this pattern was translated from a different/existing pattern here:
https://reviews.llvm.org/D49636

But it looks like we need to make more adjustments: extract a subvector rather than truncate? Is it also possible that the inputs are shorter vectors than the output?
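In SelectionDAG terms, a vector `TRUNCATE` narrows every element while keeping the element count, whereas `EXTRACT_SUBVECTOR` keeps the element type and takes a contiguous run of elements. To make that distinction concrete, here is a C model of the two operations on plain arrays; it is an illustration only, not LLVM code, and `truncate_v8i32_to_v8i16` and `extract_lo_v8i32` are hypothetical names.

```c
#include <stdint.h>

/* Element-wise truncation: 8 x i32 -> 8 x i16 (same element count, narrower
 * elements). Keeps the low 16 bits of each element; the out-of-range
 * conversion is implementation-defined in C, and GCC and Clang wrap. */
static void truncate_v8i32_to_v8i16(const int32_t src[8], int16_t dst[8]) {
  for (int i = 0; i < 8; i++)
    dst[i] = (int16_t)src[i];
}

/* Subvector extraction: 8 x i32 -> 4 x i32 starting at element 0 (same
 * element type, fewer elements). Values are copied unchanged. */
static void extract_lo_v8i32(const int32_t src[8], int32_t dst[4]) {
  for (int i = 0; i < 4; i++)
    dst[i] = src[i];
}
```

The first changes every element's value range; the second changes only how many elements are kept, which is why the two are not interchangeable when matching a pattern.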

Comment 3 Craig Topper 2021-03-29 12:32:21 PDT

Is the "sizeof sizeof(r_)" in the for loop a mistake in your code that exposed the bug? It doesn't logically make sense.
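`sizeof sizeof(r_)` parses as `sizeof (sizeof(r_))`: the inner `sizeof` yields a `size_t` value, so the outer one yields `sizeof(size_t)`, which is 8 on x86_64 rather than the 32-byte size of the union. A small C sketch compares the trip count of the reduced loop with a bound over all 16 `int16_t` lanes; it assumes a plain-array union as a stand-in for the `__vector_size__(32)` union in the test case, and `reduced_trip_count` and `full_trip_count` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for simde__m256i_private using plain arrays instead of
 * __vector_size__(32); both members are 32 bytes. */
typedef union {
  int16_t i16[16];
  int32_t i32[8];
} m256i_private;

/* The inner sizeof produces a size_t, so the outer sizeof is sizeof(size_t). */
_Static_assert(sizeof sizeof(m256i_private) == sizeof(size_t),
               "sizeof applied to a size_t expression");

/* Iterations of the reduced loop: i = 0, 2, 4, ... while i < sizeof(size_t). */
static size_t reduced_trip_count(void) {
  m256i_private r_;
  size_t n = 0;
  for (size_t i = 0; i < sizeof sizeof(r_); i += 2)
    n++;
  return n;
}

/* Iterations when the bound is the number of int16_t lanes (16). */
static size_t full_trip_count(void) {
  m256i_private r_;
  size_t n = 0;
  for (size_t i = 0; i < sizeof(r_.i16) / sizeof(r_.i16[0]); i += 2)
    n++;
  return n;
}
```

On x86_64 the reduced loop therefore runs 4 times instead of the 8 pair iterations a full 256-bit multiply-add would need.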

Comment 4 Sanjay Patel 2021-03-30 04:48:12 PDT

More pmadd matching:
https://reviews.llvm.org/rGe694e19a7931

Comment 5 Sanjay Patel 2021-03-31 07:19:10 PDT

Resolving as fixed.

I added a test based on Craig's suggestion in https://reviews.llvm.org/D99531 that shows we could go even further to try to match pmadd, so if there's a real-world need for that, please file another bug.