Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang asm chooses poorly for "=rm" #46874

Closed
llvmbot opened this issue Sep 14, 2020 · 7 comments
Closed

clang asm chooses poorly for "=rm" #46874

llvmbot opened this issue Sep 14, 2020 · 7 comments
Labels
bugzilla Issues migrated from bugzilla clang:codegen duplicate Resolved as duplicate

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 14, 2020

Bugzilla Link 47530
Resolution DUPLICATE
Resolved on Sep 15, 2020 16:20
Version trunk
OS Linux
Blocks #4440
Reporter LLVM Bugzilla Contributor
CC @topperc,@echristo,@efriedma-quic,@isanbard,@jyknight,@nickdesaulniers,@zygoloid

Extended Description

With this trivial input:

unsigned long mov_zero(void)
{
unsigned long ret;
asm ("movq $0, %0" : "=rm" (ret));
return ret;
}

clang -O2 generates this very suboptimal output:

mov_zero:
movq $0x0,-0x8(%rsp)
mov -0x8(%rsp),%rax
ret

gcc does much better:

mov_zero:
mov $0x0,%rax
retq

@topperc
Copy link
Collaborator

topperc commented Sep 14, 2020

I think we just blindly give priority to the memory contraint over the register constraint.

@isanbard
Copy link
Contributor

The major issue with supporting multiple constraints is how we model those constraints until register allocation is complete. (Thank you, Capt. Obvious!) The decision of which constraint to use is made during DAG selection. So there's no (easy) way to change this during register allocation.

Half-baked idea (all very hand-wavy):

Different constraints use different set ups (pre-asm code) and tear downs (post-asm code) for the inline asm. What if we created pseudo-instructions to represent inline asm set up and tear down? Something like this:

INLINEASM_SETUP <representing register/memory setup>
INLINEASM <...>
INLINEASM_TEARDOWN <representing copy of results into vregs/pregs>

The register allocator could then try different constraints (going from most restrictive to least restrictive) until it find one that works.

One drawback is that the RA needs to process INLINEASM before it can generate the correct code for INLINEASM_SETUP. That might be doable if the three instructions are treated as a single unit.

@efriedma-quic
Copy link
Collaborator

*** This bug has been marked as a duplicate of bug #20571 ***

@efriedma-quic
Copy link
Collaborator

INLINEASM_SETUP <representing register/memory setup>
INLINEASM <...>
INLINEASM_TEARDOWN <representing copy of results into vregs/pregs>

If we just care about "rm", specifically, there's a much simpler solution: we emit the operand as a register, but add flag to indicate it can be rewritten to a memory operand.

@isanbard
Copy link
Contributor

It might be fine to have that as the first step, but I think there are other unoptimized uses to consider, e.g.:

int test(int n, int idx) {
asm("add %1, %0"
: "=ro"(idx)
: "ro"(n)
: "cc");
return idx;
}

Though this involves teaching the reg alloc. about addressing modes.

@nickdesaulniers
Copy link
Member

mentioned in issue llvm/llvm-bugzilla-archive#47531

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@nickdesaulniers
Copy link
Member

this was duplicated to: #20571

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang:codegen duplicate Resolved as duplicate
Projects
None yet
Development

No branches or pull requests

5 participants