Created attachment 10654: Test case where we vectorize without considering the precision requirements. The attached simple loop is vectorized under the triple 'thumbv7-linux-gnueabi'. Because NEON does not provide IEEE 754 compatibility, we should not introduce its use on Linux unless the user has specifically allowed imprecise floating-point computations. http://llvm.org/PR16274 is about fixing the ARM target to only issue NEON instructions if the user (or the default compiler flags) set the precision requirements such that it is legal to do so. This bug is about making the vectorizer and its cost model introduce LLVM IR vector instructions only when we know the ARM target can actually translate them into NEON instructions. GCC had a similar issue and fixed it in this bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703
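For readers without access to the attachment, here is a minimal sketch of the kind of loop at issue. It is not the attached test case; the names `scale`, `count_subnormal` and `subnormal_results` are hypothetical and chosen only for illustration. The sketch assumes the relevant IEEE 754 gap is subnormal handling: ARMv7 NEON arithmetic flushes subnormals to zero, so a loop that yields subnormal results when executed with scalar VFP instructions can yield zeros once it is vectorized.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical stand-in for the attached test case: a plain float loop
   that a loop vectorizer would consider turning into vector multiplies. */
void scale(float *restrict out, const float *restrict in, float k, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}

/* Count the elements that are subnormal: nonzero, but smaller in
   magnitude than FLT_MIN, the smallest normal float. */
size_t count_subnormal(const float *a, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if ((a[i] > 0.0f && a[i] < FLT_MIN) ||
            (a[i] < 0.0f && a[i] > -FLT_MIN))
            ++count;
    }
    return count;
}

/* Halve FLT_MIN four times over. With IEEE 754 arithmetic every result
   is the subnormal 2^-127, so this returns 4. If the multiplies run in a
   flush-to-zero unit, the results become 0.0f and this returns 0. */
size_t subnormal_results(void)
{
    float in[4] = { FLT_MIN, FLT_MIN, FLT_MIN, FLT_MIN };
    float out[4];
    scale(out, in, 0.5f, 4);
    return count_subnormal(out, 4);
}
```

Calling `subnormal_results()` from a small driver and comparing the value between a scalar build and a NEON-vectorized build is one way to observe the difference this bug is concerned with.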
Still pertinent: we continue to vectorize using NEON without fast-math flags.
For the loop vectorizer, here's the review: http://reviews.llvm.org/D17141 The SLP vectorizer already seems to get it right, but I need to look into it more closely.
Bug #21778 is an example of the SLP vectorizer getting it wrong.
After discussion with James Greenhalgh, GCC seems to be doing what the original patch expected, so I just simplified it and rebased: http://reviews.llvm.org/D18701 For now, fast-math is required (exactly like GCC), but we don't have an -fsubnormal-maths flag, so we can't expand on that further. If there is enough interest in getting that flag (GCC seems to have ignored that for many years), we can create a new bug and work with them to find a common flag syntax.
Fixed in r266363