-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missed optimization: inefficient codegen for __builtin_addc #35591
Comments
I have also found this. Here is the code I am using:
ADD96_v1 was my original function, and ADD96_v2 is my attempt to rewrite it using __builtin_addc. While the second version works, I am surprised that it is less optimal than the first version: using the Godbolt Compiler Explorer, the generated ARM code contains an additional three seemingly unnecessary instructions; worse the PPC code utilises branch operations! My compiler options are: -target arm-unknown-linux-gnueabihf -mcpu=cortex-a9 -O2 and: -target powerpc-unknown-linux-gnu -mcpu=603e -O2 It seems that the __builtin_addc* functions don't really have an advantage; is that true? |
Current Codegen: https://godbolt.org/z/cVSccY |
rG257acbf6aee9 solves much of the poor codegen generically in DAGCombine x86/arm code works as expected ppc struggles as the target still needs to be updated to use ADDCARRY/SUBCARRY Andy - maybe open a PPC specific bug? |
Adding some PPC guys who might know whats best to do to improve PPC support and reduce the branching. Do we need to extend the ISD::ADD/SUBCARRY combines (where possible) to better support ISD::ADD/SUBE or can we move PPC away from ISD::ADD/SUBC + ISD::ADD/SUBE? |
We aren't currently working on moving away from the old carry setting/using nodes to the new ones, but we do not have any fundamental issue with doing so. I am not sure this statement is any help though. |
Extended Description
clang has __builtin_addc* functions, which are supposed to emit hardware add-with-carry instructions. However, there is no corresponding intrinsic on LLVM side, so clang emits a sequence of instructions that is only recognized and folded to a single hw instruction in two cases:
This means that any carry chains longer than 2 result in inefficient code:
Compiles to:
I suppose we're going to need a new target-independent generic intrinsic,
say { iX, i1 } @llvm.uadd.with.overflow.carry.iX(iX, iX, i1) (and a corresponding one for subtraction as well) and map it to ISD::ADDE / ISD::ADDCARRY.
The text was updated successfully, but these errors were encountered: