For a min/max reduction loop, the vectorizer adds fast-math flags to the fcmp/select for the horizontal reduction after the loop, but the selects inside the loop don't have fast-math flags. As a result, InstCombine creates fmaxnum/fminnum intrinsics for the horizontal reduction only, while the loop body stays as fcmp/select.

X86 codegens both the loop body and the horizontal reduction to the FP min/max instructions, so the difference didn't end up mattering. It just looked inconsistent.

clang -Ofast -march=skylake-avx512

    float a[1024];

    float foo() {
      float min = 0.0;
      for (int i = 0; i != 1024; ++i)
        min = min < a[i] ? min : a[i];
      return min;
    }

https://godbolt.org/z/AMWYWm
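
To make the two stages concrete, here is a rough scalar C model of a vectorized min reduction. It is only a sketch: the vector width VF of 4 is an assumption for illustration (the width actually chosen for skylake-avx512 may differ), and the names N, VF, foo_scalar and foo_vector_model are hypothetical, not taken from the compiler. The per-lane compare/select in the main loop corresponds to the selects without fast-math flags; the lane-combining step after the loop corresponds to the horizontal reduction that gets the flags.

```c
#define N 1024
#define VF 4 /* hypothetical vector width, for illustration only */

float a[N];

/* The scalar reduction from the report. */
float foo_scalar(void) {
  float min = 0.0f;
  for (int i = 0; i != N; ++i)
    min = min < a[i] ? min : a[i];
  return min;
}

/* Scalar model of the vectorized form (assumes N is a multiple of VF). */
float foo_vector_model(void) {
  float lanes[VF];
  for (int l = 0; l != VF; ++l)
    lanes[l] = 0.0f; /* every lane starts from the initial value */

  /* Loop body: one partial minimum per lane via compare/select. */
  for (int i = 0; i != N; i += VF)
    for (int l = 0; l != VF; ++l)
      lanes[l] = lanes[l] < a[i + l] ? lanes[l] : a[i + l];

  /* Horizontal reduction after the loop: combine the lanes. */
  float min = lanes[0];
  for (int l = 1; l != VF; ++l)
    min = min < lanes[l] ? min : lanes[l];
  return min;
}
```

For NaN-free input the two functions return equal values, which is the assumption -Ofast already makes.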