43574 – Vectorized fp min/max reduction loop uses minnum/maxnum for horizontal reduction, but not loop body

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Loop Optimizer
Version:	trunk
Hardware:	PC All

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

Reported:	2019-10-05 10:11 PDT by Craig Topper
Modified:	2019-10-07 20:03 PDT

Description Craig Topper 2019-10-05 10:11:09 PDT

For a min/max reduction loop, the vectorizer adds fast-math flags to the fcmp/select used for the horizontal reduction after the loop, but the selects inside the loop body do not get fast-math flags. As a result, InstCombine creates llvm.maxnum/llvm.minnum intrinsics for the horizontal reduction only, while the loop body keeps the plain fcmp/select pattern. X86 codegens both the loop body and the horizontal reduction to the FP min/max instructions, so the difference doesn't end up mattering in the generated code. It just looks inconsistent.

clang -Ofast -march=skylake-avx512

float a[1024];

float foo() {
  float min = 0.0;
  for (int i = 0; i != 1024; ++i)
    min = min < a[i] ? min : a[i];

  return min;
}

https://godbolt.org/z/AMWYWm