On ARM it is difficult to generate optimal code that matches certain floating point precision requirements. The only way that currently exists is to explicitly alter the feature flags of the targeted CPU. This causes problems: for example, it is not possible to take advantage of NEON for integer instructions while at the same time avoiding NEON for vector floating point operations.
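For context, the current feature-flag workaround looks roughly as follows (a sketch only; 'test.ll' is a placeholder and the exact lowering depends on the subtarget):

  llc -march=arm -mattr=+vfp3,+neon < test.ll
  #   -> vector float operations are lowered to NEON, so denormals are
  #      flushed to zero
  llc -march=arm -mattr=+vfp3,-neon < test.ll
  #   -> vector float operations are scalarized onto VFP and stay
  #      IEEE-compliant, but NEON integer instructions are lost as well

Neither invocation gives the desired mix of NEON integer code and IEEE-compliant vector floating point.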
The following test cases illustrate how I expect llc to behave:
; RUN: llc -march=arm -mattr=+vfp3,+neon,+neonfp < %s | FileCheck %s
; RUN: llc -march=arm -mattr=+vfp3,+neon,-neonfp -enable-flush-denormals-fp-math < %s | FileCheck %s -check-prefix=CHECK-ALLOW-FLUSH
; fooP() performs a vector floating point multiplication with full precision
; requirements. Even if a piece of hardware supports NEON (modeled with
; -mattr=+neon), NEON should not be used to implement this function, because
; NEON does not comply with the full precision requirements (NEON flushes
; denormals to zero).
;
; However, if the user specifies that flushing denormals to zero is legal, we
; should obviously use NEON.
define <4 x float> @fooP(<4 x float> %A, <4 x float> %B)
{
%C = fmul <4 x float> %A, %B
; CHECK: fooP
; CHECK: vmul.f32 s
; CHECK: vmul.f32 s
; CHECK: vmul.f32 s
; CHECK: vmul.f32 s
; CHECK-ALLOW-FLUSH: fooP
; CHECK-ALLOW-FLUSH: vmul.f32 q
ret <4 x float> %C
}
; fooR() performs a vector floating point multiplication with relaxed precision
; requirements. If the precision loss introduced by NEON is acceptable,
; we should generate NEON instructions.
;
; Relaxed precision requirements can be specified by using the 'fast' fast-math
; flag or by introducing a dedicated allow-flush-denormals flag in LLVM IR.
define <4 x float> @fooR(<4 x float> %A, <4 x float> %B)
{
%C = fmul fast <4 x float> %A, %B
; We may decide not to implement this immediately, as making decisions based on
; the floating point flags may require custom lowering of instructions.
; CHECK: fooR
; CHECK: vmul.f32 q
; CHECK-ALLOW-FLUSH: fooR
; CHECK-ALLOW-FLUSH: vmul.f32 q
ret <4 x float> %C
}
; fooS() performs a scalar floating point multiplication.
;
; On some ARM devices scalar floating point operations are faster when executed
; with NEON instructions. This is modeled by the '+neonfp' feature flag.
;
; Independently of which features the device supports (and which features are
; fast on a device) we should use NEON only if it matches the precision
; requirements of the target as provided by an -enable-flush-denormals-fp-math
; option. (The default for this option may differ on darwin and non-darwin
; systems).
define float @fooS(float %A, float %B)
{
%C = fmul fast float %A, %B
; CHECK: fooS
; CHECK: vmul.f32 s
; CHECK-ALLOW-FLUSH: fooS
; CHECK-ALLOW-FLUSH: vmul.f32 q
ret float %C
}
; bar() performs a vector integer multiplication. On an ARM NEON device, this
; code should always be executed as vector code, as floating point precision
; requirements do not apply.
define <4 x i32> @bar(<4 x i32> %A, <4 x i32> %B)
{
%C = mul <4 x i32> %A, %B
; CHECK: bar
; CHECK: vmul.i32 q
; CHECK-ALLOW-FLUSH: bar
; CHECK-ALLOW-FLUSH: vmul.i32 q
ret <4 x i32> %C
}
After a long discussion on the list, this approach can be very problematic for NEON intrinsics (which require that NEON instructions be generated regardless of the IEEE status or fast-math flags). Since IR doesn't differentiate between code produced by vectorizers and code written with NEON intrinsics, we can't apply any scalarization rule indiscriminately.
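For illustration, here is a minimal sketch of the ambiguity (assuming clang's usual lowering, where vmulq_f32 from arm_neon.h becomes a plain IR fmul rather than a target intrinsic; @ambiguous_mul is just an illustrative name):

; The IR below can come either from auto-vectorized scalar code or from C code
; using the NEON intrinsic vmulq_f32(); the two are indistinguishable here.
define <4 x float> @ambiguous_mul(<4 x float> %x, <4 x float> %y) {
  %r = fmul <4 x float> %x, %y
  ret <4 x float> %r
}
; A rule that scalarizes vector fmul for IEEE compliance would therefore also
; rewrite code that was explicitly written with NEON intrinsics.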
The only option left would be to have an extra command line option requesting IEEE compliance, and then it would be the user's responsibility to check for the existence of NEON intrinsics, hand-crafted IR, etc.
This is also too big a hammer to fix #16275, which already has its own fix.
All in all interesting, but too low on the priority list for me to work on it.