16275 – Loop Vectorizer: Only vectorize on ARM if precision requirements allow the use of NEON


Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Loop Optimizer
Version:	trunk
Hardware:	PC Linux

Importance:	P normal
Assignee:	Renato Golin

Reported:	2013-06-07 16:29 PDT by Tobias Grosser
Modified:	2016-04-14 15:45 PDT
CC List:	3 users

Attachments
Test case where we vectorize without considering the precision requirements (927 bytes, application/octet-stream) 2013-06-07 16:29 PDT, Tobias Grosser

Description Tobias Grosser 2013-06-07 16:29:24 PDT

Created attachment 10654
Test case where we vectorize without considering the precision requirements

The attached simple loop is vectorized under the triple 'thumbv7-linux-gnueabi'.

Because NEON does not provide IEEE 754 compatibility, we should not introduce its use on Linux unless the user has specifically allowed imprecise floating-point computations. http://llvm.org/PR16274 is about fixing the ARM target to only issue NEON instructions if the user (or the default compiler flags) set the precision requirements such that it is legal to do so.

This bug is about making the vectorizer and its cost model introduce LLVM-IR vector instructions only when we know the ARM target can actually translate them into NEON instructions.

GCC had a similar issue and fixed it in this bug report:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703
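
To illustrate the precision concern in the description: the report does not say which IEEE 754 behavior is at stake, so the sketch below assumes it is the handling of subnormal (denormal) values, which ARMv7 NEON flushes to zero. This is not the attached test case. The function names and the software `flush_to_zero` helper are hypothetical and only emulate on any host what a vectorized loop might compute.

```c
#include <float.h>   /* FLT_MIN: smallest positive normal float */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: emulate flush-to-zero by replacing any
 * subnormal value with a zero of the same sign. */
static float flush_to_zero(float x) {
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}

/* Scalar loop with ordinary IEEE 754 semantics (gradual underflow). */
void scale_ieee(const float *in, float *out, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}

/* Same loop, but with subnormal inputs and results flushed to zero.
 * Assumption: this approximates what the loop would compute if it
 * were vectorized onto a flush-to-zero SIMD unit. */
void scale_ftz(const float *in, float *out, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = flush_to_zero(flush_to_zero(in[i]) * flush_to_zero(k));
}
```

Calling both functions with `in[0] = FLT_MIN` and `k = 0.5f` gives a small nonzero subnormal from `scale_ieee` and zero from `scale_ftz`, while ordinary values such as `1.0f` scale identically. A difference of this kind is why vectorizing such a loop without fast-math can change results.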

Comment 1 Renato Golin 2016-01-18 04:34:50 PST

Still pertinent, as we still vectorize using NEON without fast-math flags.

Comment 2 Renato Golin 2016-02-12 08:22:32 PST

For the loop vectorizer, here's the review: http://reviews.llvm.org/D17141

The SLP vectorizer seems to get it right already, but I need to look into it more closely.

Comment 3 Renato Golin 2016-02-23 11:41:42 PST

Bug #21778 is an example of the SLP vectorizer getting it wrong.

Comment 4 Renato Golin 2016-04-01 12:37:24 PDT

After discussion with James Greenhalgh, GCC seems to be doing what the original patch expected, so I just simplified it and rebased:

http://reviews.llvm.org/D18701

For now, fast-math is required (exactly like GCC), but we don't have an -fsubnormal-maths flag, so we can't expand on that further.

If there is enough interest in getting that flag (GCC seems to have ignored that for many years), we can create a new bug and work with them to find a common flag syntax.

Comment 5 Renato Golin 2016-04-14 15:45:02 PDT

Fixed in r266363