Loop Vectorizer: Only vectorize on ARM if precision requirements allow the use of NEON #16649

Closed
tobiasgrosser opened this issue Jun 7, 2013 · 6 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla loopoptim

Comments

@tobiasgrosser
Contributor

Bugzilla Link 16275
Resolution FIXED
Resolved on Apr 14, 2016 15:45
Version trunk
OS Linux
Attachments Test case where we vectorize without considering the precision requirements
CC @rengolin

Extended Description

The attached simple loop is vectorized under the triple 'thumbv7-linux-gnueabi'.

Because NEON does not provide IEEE 754 compatibility, we should not introduce its use under Linux unless the user has specifically allowed imprecise floating-point computations. #16648 is about fixing the ARM target to only issue NEON instructions if the user (or the default compiler flags) set the precision requirements such that it is legal to do so.

This bug is about making the vectorizer and its cost model introduce LLVM-IR vector instructions only when we know the ARM target can actually translate them into NEON instructions.
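To make the concern concrete, here is a minimal sketch in C. It is not the attached test case: `scale` is a hypothetical stand-in for a simple vectorizable loop, and `flush_to_zero` is a hypothetical helper that emulates one well-known way ARMv7 NEON departs from IEEE 754, namely flushing subnormal values to zero. Whether this is the only relevant difference is not claimed here.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical stand-in for the attached test case: a simple
 * element-wise loop that a loop vectorizer would typically pick up. */
void scale(float *restrict out, const float *restrict in, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}

/* Hypothetical helper emulating flush-to-zero behaviour:
 * a subnormal value is replaced by a zero of the same sign,
 * every other value (including NaN and infinity) passes through. */
float flush_to_zero(float x) {
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}
```

With IEEE 754 semantics, `scale` applied to an input of `FLT_MIN` with `k = 0.5f` yields a nonzero subnormal. Passing that result through `flush_to_zero` gives zero, which is the kind of observable difference a user who did not ask for imprecise floating point should not see.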

GCC had a similar issue and fixed it in this bug report:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43703

@tobiasgrosser
Contributor Author

assigned to @rengolin

@rengolin
Member

Still pertinent, as we still vectorize using NEON without fast-math flags.

@rengolin
Member

For the loop vectorizer, here's the review: http://reviews.llvm.org/D17141

The SLP vectorizer seems to get it right already. I need to look into it more closely.

@rengolin
Member

Bug #21778 is an example of the SLP vectorizer getting it wrong.

@rengolin
Member

rengolin commented Apr 1, 2016

After discussion with James Greenhalgh, GCC seems to be doing what the original patch expected, so I just simplified it and rebased:

http://reviews.llvm.org/D18701

For now, fast-math is required (exactly like GCC), but we don't have an -fsubnormal-maths flag, so we can't expand on that further.

If there is enough interest in getting that flag (GCC seems to have ignored that for many years), we can create a new bug and work with them to find a common flag syntax.
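The rule described in this comment (fast-math is required before floating-point loops are vectorized for NEON) can be sketched roughly as follows. This is only an illustration of the decision, not the code from the patch: `loop_info`, `target_info`, and `may_vectorize` are hypothetical names, and the assumption that loops without floating-point arithmetic are unaffected is mine.

```c
#include <stdbool.h>

/* Hypothetical description of a loop; not an LLVM data structure. */
struct loop_info {
    bool has_fp_ops;        /* loop body contains floating-point arithmetic */
    bool fast_math_allowed; /* user allowed imprecise FP, e.g. via fast-math */
};

/* Hypothetical description of the target's vector unit. */
struct target_info {
    bool simd_is_ieee_compliant; /* assumed false for ARMv7 NEON */
};

/* Returns true if vectorizing the loop cannot violate the user's
 * floating-point precision requirements on this target. */
bool may_vectorize(const struct loop_info *loop,
                   const struct target_info *target) {
    if (!loop->has_fp_ops)
        return true;  /* assumption: integer-only loops are unaffected */
    if (target->simd_is_ieee_compliant)
        return true;  /* vector FP behaves like scalar FP */
    return loop->fast_math_allowed;  /* otherwise fast-math is required */
}
```

A finer-grained flag for subnormal handling, as mentioned above, would replace the single `fast_math_allowed` field in this sketch with a more specific permission.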

@rengolin
Member

Fixed in r266363

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
This issue was closed.