The problem only manifests in my codebase with clang 12, but this test case seems to reliably reproduce the issue in earlier versions as well (back to 7 on godbolt).
I'm also attaching the original (non-reduced) source. Please let me know if you need any additional information.
But it looks like we need to make more adjustments - extract subvector, not truncate? Also possible that the inputs are shorter vectors than the output?
I added a test based on Craig's suggestion in https://reviews.llvm.org/D99531 that shows we could go even further to try to match pmadd, so if there's a real-world need for that, please file another bug.
Extended Description
Here is a reduced test case:
#include <stdlib.h>
typedef union {
int16_t i16 __attribute__((vector_size(32)));
int32_t i32 __attribute__((vector_size(32)));
} simde__m256i_private;
simde__m256i_private simde__m256i_to_private();
int simde_mm256_madd_epi16() {
simde__m256i_private r_, a_ = simde__m256i_to_private(),
b_ = simde__m256i_to_private();
for (size_t i = 0; i < sizeof sizeof(r_); i += 2)
r_.i32[i] = a_.i16[i] * b_.i16[i] + a_.i16[i + 1] * b_.i16[i + 1];
simde__m256i_from_private(r_);
}
Compile with -O2 using clang (clang++ works) on x86_64. Godbolt link: https://godbolt.org/z/71o5hdY4h
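For context, the reduced loop above is a mangled form of the multiply-and-add-adjacent-pairs pattern that the x86 backend tries to match to pmaddwd. A minimal scalar sketch of what the non-reduced function presumably computes follows; the name `madd_epi16_ref` and the plain-array signature are hypothetical and are not taken from SIMDe or LLVM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (illustration only, not SIMDe's code):
   multiply 16 pairs of signed 16-bit lanes, then add each pair of
   adjacent 32-bit products into one of 8 signed 32-bit lanes. */
static void madd_epi16_ref(int32_t r[8], const int16_t a[16],
                           const int16_t b[16]) {
  for (size_t i = 0; i < 8; i++) {
    /* Widen before multiplying so each product is computed in 32 bits. */
    int32_t lo = (int32_t)a[2 * i] * (int32_t)b[2 * i];
    int32_t hi = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
    /* Add as unsigned so the single overflowing case (all four inputs
       equal to INT16_MIN) wraps instead of being undefined behaviour.
       The conversion back to int32_t is implementation-defined in C;
       clang and gcc wrap modulo 2^32. */
    r[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
  }
}
```

Note that here the output index advances by one while the input index advances by two, unlike the reduced loop, which uses the same index `i` for both.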