47530 – clang asm chooses poorly for "=rm"

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 47530 - clang asm chooses poorly for "=rm"

Summary: clang asm chooses poorly for "=rm"

Status:	RESOLVED DUPLICATE of bug 20197

Alias:	None

Product:	clang
Classification:	Unclassified
Component:	LLVM Codegen (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Unassigned Clang Bugs

URL:
Keywords:

Depends on:
Blocks:	4068
	Show dependency tree

Reported:	2020-09-14 14:55 PDT by Andy Lutomirski
Modified:	2020-09-15 16:20 PDT (History)
CC List:	9 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andy Lutomirski 2020-09-14 14:55:11 PDT

With this trivial input:

unsigned long mov_zero(void)
{
	unsigned long ret;
	asm ("movq $0, %0" : "=rm" (ret));
	return ret;
}

clang -O2 generates this very suboptimal output:

mov_zero:
 movq $0x0,-0x8(%rsp)
 mov -0x8(%rsp),%rax
 ret

gcc does much better:

mov_zero:
 mov $0x0,%rax
 retq

Comment 1 Craig Topper 2020-09-14 15:13:20 PDT

I think we just blindly give priority to the memory contraint over the register constraint.

Comment 2 Bill Wendling 2020-09-15 15:07:13 PDT

The major issue with supporting multiple constraints is how we model those constraints until register allocation is complete. (Thank you, Capt. Obvious!) The decision of which constraint to use is made during DAG selection. So there's no (easy) way to change this during register allocation.

Half-baked idea (all very hand-wavy):

Different constraints use different set ups (pre-asm code) and tear downs (post-asm code) for the inline asm. What if we created pseudo-instructions to represent inline asm set up and tear down? Something like this:

  INLINEASM_SETUP <representing register/memory setup>
  INLINEASM <...>
  INLINEASM_TEARDOWN <representing copy of results into vregs/pregs>

The register allocator could then try different constraints (going from most restrictive to least restrictive) until it find one that works.

One drawback is that the RA needs to process INLINEASM before it can generate the correct code for INLINEASM_SETUP. That might be doable if the three instructions are treated as a single unit.

Comment 3 Eli Friedman 2020-09-15 15:13:26 PDT


*** This bug has been marked as a duplicate of bug 20197 ***

Comment 4 Eli Friedman 2020-09-15 16:15:39 PDT

(In reply to Bill Wendling from comment #2)
>   INLINEASM_SETUP <representing register/memory setup>
>   INLINEASM <...>
>   INLINEASM_TEARDOWN <representing copy of results into vregs/pregs>

If we just care about "rm", specifically, there's a much simpler solution: we emit the operand as a register, but add flag to indicate it can be rewritten to a memory operand.

Comment 5 Bill Wendling 2020-09-15 16:20:46 PDT

It might be fine to have that as the first step, but I think there are other unoptimized uses to consider, e.g.:

int test(int n, int idx) {
 asm("add %1, %0"
        : "=ro"(idx)
        : "ro"(n)
        : "cc");      
 return idx;
}

Though this involves teaching the reg alloc. about addressing modes.