
inline asm "rm" constraint lowered "m" when "r" would be preferable #20571

Open
d0k opened this issue Jul 3, 2014 · 61 comments
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party inline-asm llvm:codegen

Comments

@d0k (Member) commented Jul 3, 2014

Bugzilla Link 20197
Version trunk
OS Linux
Blocks #4440
CC @legrosbuffle,@echristo,@isanbard,@josephcsible,@nickdesaulniers,@zygoloid,@tstellar

Extended Description

When multiple alternatives are given in an inline asm constraint, we ignore all but the most "general" one. This produces nasty artifacts in the code.

int bsr(unsigned v) {
  int ret;
  __asm__("bsr %1, %0" : "=&r"(ret) : "rm"(v) : "cc");
  return ret;
}

$ clang -O3 -S -o - t.c
bsr:
	movl	%edi, -4(%rsp)
	#APP
	bsrl	-4(%rsp), %eax
	#NO_APP
	retq

The spilling is totally unnecessary. GCC gets this one right. On 32 bit x86 it's even worse:

$ clang -O3 -S -o - t.c -m32
bsr:
	pushl	%eax
	movl	8(%esp), %eax
	movl	%eax, (%esp)
	#APP
	bsrl	(%esp), %eax
	#NO_APP
	popl	%edx
	retl

GCC knows a better way:

$ gcc-4.8 -O3 -S -o - t.c -m32
bsr:
#APP
	bsr 4(%esp), %eax
#NO_APP
	ret

The constraint "g" is just as bad, being translated into "imr" internally.

@echristo (Contributor) commented Jul 3, 2014

Agreed.

@llvmbot (Collaborator) commented Mar 20, 2018

*** Bug #34837 has been marked as a duplicate of this bug. ***

@llvmbot (Collaborator) commented Mar 20, 2018

Wouldn't it be better to at least treat 'mr' as 'r' rather than 'm'? That would probably yield better code in many cases.

@llvmbot (Collaborator) commented Jun 1, 2018

*** Bug #36931 has been marked as a duplicate of this bug. ***

@llvmbot (Collaborator) commented Jun 1, 2018

*** Bug #30873 has been marked as a duplicate of this bug. ***

@josephcsible commented Jun 6, 2020

Wouldn't it be better to at least treat 'mr' as 'r' rather than 'm'? That would probably yield better code in many cases.

"mr" isn't being treated as "m". Consider this C function:

void f(int x) {
    asm volatile("# x is in %0" :: "mr"(x));
}

With "clang -O3 -m32", it compiles into this mess:

f:
        pushl   %eax
        movl    8(%esp), %eax
        movl    %eax, (%esp)
        # x is in (%esp)
        popl    %eax
        retl

But if I use "m" instead of "mr", then it compiles into what I wanted:

f:
        # x is in 4(%esp)
        retl

So the presence of "r" is somehow making the codegen worse even though it's putting the value in memory anyway.

@efriedma-quic (Collaborator) commented Sep 15, 2020

*** Bug #46874 has been marked as a duplicate of this bug. ***

@isanbard (Contributor) commented

[Copy-n-paste of my harebrained idea here]

The major issue with supporting multiple constraints is how we model those constraints until register allocation is complete. (Thank you, Capt. Obvious!) The decision of which constraint to use is made during DAG selection. So there's no (easy) way to change this during register allocation.

Half-baked idea (all very hand-wavy):

Different constraints use different setup code (pre-asm) and teardown code (post-asm) for the inline asm. What if we created pseudo-instructions to represent inline asm setup and teardown? Something like this:

INLINEASM_SETUP <representing register/memory setup>
INLINEASM <...>
INLINEASM_TEARDOWN <representing copy of results into vregs/pregs>

The register allocator could then try different constraints (going from most restrictive to least restrictive) until it finds one that works.

One drawback is that the RA needs to process INLINEASM before it can generate the correct code for INLINEASM_SETUP. That might be doable if the three instructions are treated as a single unit.

@efriedma-quic (Collaborator) commented Mar 3, 2021

*** Bug #48750 has been marked as a duplicate of this bug. ***

@llvmbot (Collaborator) commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#31525

@llvmbot (Collaborator) commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#35489

@llvmbot (Collaborator) commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#37583

@edwintorok (Contributor) commented

mentioned in issue #4440

@efriedma-quic (Collaborator) commented

mentioned in issue llvm/llvm-bugzilla-archive#47530

@efriedma-quic (Collaborator) commented

mentioned in issue llvm/llvm-bugzilla-archive#49406

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
@nickdesaulniers (Member) commented Dec 13, 2021

I was thinking of this issue while reading ChooseConstraint in llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp.

@bwendling (Collaborator) commented

I was thinking of this issue while reading ChooseConstraint in llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp.

We need to talk about your reading habits. :-P

@llvmbot (Collaborator) commented Aug 2, 2022

@llvm/issue-subscribers-backend-x86

@nickdesaulniers nickdesaulniers changed the title Inline asm constraint alternatives ignored inline asm "rm" constraint lowered "m" when "r" would be preferable Aug 3, 2022
@nickdesaulniers (Member) commented Nov 28, 2022

At the llvm dev conf '22 @topperc or @compnerd or @arsenm mentioned that perhaps we could add a new register flag (akin to early-clobber, def, or kill) and then another pass ("memfold"? or possibly just regalloc) could try to DTRT.

EDIT: I think @topperc was referring to TargetInstrInfo::foldMemoryOperand or InlineSpiller::foldMemoryOperand.

@nickdesaulniers (Member) commented

Ah, looks like we haven't made any progress on this since my last report above 9mo ago. Still an issue though.
https://lore.kernel.org/lkml/CAHk-=whVvD05T0yD5DQj803uETLD6qDq-Vx-SiLPcrL=eO77LQ@mail.gmail.com/

@Endilll Endilll added the confirmed Verified by a second party label Aug 30, 2023
@nickdesaulniers (Member) commented Aug 30, 2023

So I have something hacked up that is partially working; it's not ready to be published today, but maybe by the end of next week I'll have something to demonstrate feasibility.

The idea is:

  1. have SelectionDAG's ChooseConstraint be more aggressive, but add a new InlineAsm::Kind_* value between Kind_Register and Kind_Mem called Kind_SpillableRegister (or whatever), which behaves as Kind_Register in all cases except the one described below; this is what SelectionDAG will set.
  2. have the greedy register allocator check for these kinds of operands when register exhaustion is imminent; if any exist, spill before the inline asm, reload after the inline asm, and update the inline asm to use the newly created stack slot.

Basically, it seems that RA doesn't know how to spill INLINEASM (and INLINEASM_BR) register operands. We need to be able to:

  1. signal to RA the difference between 3 cases:
  • "the constraint string for this operand allows the operand to be spilled" (i.e. "rm")
  • "the constraint string for this operand does not allow the operand to be spilled" (i.e. "r")
  • "the constraint string for this operand does not allow the operand to exist in a register" (i.e. "m")

Right now, SelectionDAG picks one of the last two, lacking the ability to express the first, since RA lacks the machinery from 2 below.

  2. the transformation machinery for INLINEASM:
  • insert spill before INLINEASM
  • insert reload after INLINEASM (or reloads after every indirect branch target of INLINEASM_BR)
  • transform operand from reguse to mem
  • transform operand from virtreg to stack slot

There's probably more I need to do for greedy RA to update the live range info (I don't fully understand LiveRangeEdit quite yet). Right now my hacky code is failing to clean up the previously used virtreg which leads to an assertion failure down the line; I hope to have that fixed by EOD but we'll see.
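The transformation machinery in step 2 above might look roughly like this, in MIR-flavored pseudocode (the operand names, stack slot, and exact syntax are illustrative, not real MIR output):

```
; before RA: %v carries an "rm" operand, but registers are exhausted here
INLINEASM &"bsr $1, $0" ..., def %out, reguse %v

; after the rewrite: spill around the asm, operand becomes a stack slot
ST %v -> %stack.0                 ; spill inserted before INLINEASM
INLINEASM &"bsr $1, $0" ..., def %out, mem %stack.0 :: (load (s32) from %stack.0)
%v = LD %stack.0                  ; reload inserted after INLINEASM
                                  ; (one reload per indirect target for INLINEASM_BR)
```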

cc @qcolombet

nickdesaulniers added a commit that referenced this issue Nov 29, 2023
This enables -regalloc=greedy to memfold spillable inline asm
MachineOperands.

Because no instruction selection framework marks MachineOperands as
spillable, no language frontend can observe functional changes from this
patch. That will change once instruction selection frameworks are
updated.

Link: #20571
@nickdesaulniers (Member) commented Dec 1, 2023

ok, the core parts have all landed for greedy.

(I noticed that peephole-opt will fold loads BEFORE reaching any register allocator, but not any stores).

Next up:

  • regallocfast
  • instruction selectors
  • m68k support (maybe)
  • asm goto
    • opt-phis bug related to asm goto

I noticed that peephole-opt folds loads; we might want to disable that for inline asm...I need to think more about that case.

nickdesaulniers added a commit to nickdesaulniers/llvm-project that referenced this issue Dec 1, 2023
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm
(llvm#70832)"), I had a last minute fix to update the memoperands.  I originally
did this in the parent foldInlineAsmMemOperand call, updated the mir test via
update_mir_test_checks.py, but then decided to move it to the child call of
foldInlineAsmMemOperand.

But I forgot to rerun update_mir_test_checks.py. That last minute change caused
the same memoperand to be added twice when recursion occurred (for tied
operands). I happened to get lucky that trailing content omitted from the
CHECK line doesn't result in test failure.

But rerunning update_mir_test_checks.py on the mir test added in that commit
produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating the
   test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't
   deduplicate memoperands when added). Example:

    INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)

Fix the bug, so that in child commits, we don't have additional unrelated test
changes (which would be wrong anyways) from simply running
update_mir_test_checks.py.

Link: llvm#20571
nickdesaulniers added a commit that referenced this issue Dec 4, 2023
In commit b053359 ("[X86InstrInfo] support memfold on spillable inline asm (#70832)"), I had a last minute fix to update the memoperands. I originally did this in the parent foldInlineAsmMemOperand call, updated the mir test via update_mir_test_checks.py, but then decided to move it to the child call of foldInlineAsmMemOperand.

But I forgot to rerun update_mir_test_checks.py. That last minute change caused the same memoperand to be added twice when recursion occurred (for tied operands). I happened to get lucky that trailing content omitted from the CHECK line doesn't result in test failure.

But rerunning update_mir_test_checks.py on the mir test added in that commit produces updated output. This is resulting in updates to the test that:
1. conflate additions to the test in child commits with simply updating the test as it should have been when first committed.
2. look wrong because the same memoperand is specified twice (we don't deduplicate memoperands when added). Example:

    INLINEASM ... :: (load (s32) from %stack.0) (load (s32) from %stack.0)

Fix the bug, so that in child commits, we don't have additional unrelated test changes (which would be wrong anyways) from simply running update_mir_test_checks.py.

Link: #20571
@nickdesaulniers nickdesaulniers removed their assignment Apr 9, 2024
@nickdesaulniers (Member) commented

Putting this down for now. If someone else wants to pick up the torch, I can push my WIP branch somewhere; it may be useful as a reference (but will probably bit-rot).

@bwendling (Collaborator) commented

I can push my WIP branch somewhere which may be useful as a reference (but will probably bit rot).

Please do.

nickdesaulniers added a commit to nickdesaulniers/llvm-project that referenced this issue Apr 10, 2024
As alluded to in llvm#20571, it would be nice if we could mutate operand
lists of MachineInstr's more safely.
@nickdesaulniers (Member) commented

Here are some of my branches (I have 6 locally; they're free), but those 3 appear to be the latest/most developed. Really, the issue I ran into with RegAllocFast was: do we try to reassign a register to a stack slot and mutate the inline asm upon discovering we're in a state of register exhaustion (RegAllocFast was not designed to do this; everything is a const reference, and I'd wind up with duplicated instructions inserted in a bunch of places), or do we try to spill inline asm "rm" variables proactively (shot down in code review)?

https://gist.github.com/nickdesaulniers/8f20bea3bcdd9fe97219428ab6e8bf8b were a handful of end-to-end tests I was using to test support for all architectures. You may find these useful.

@danilaml (Collaborator) commented

regallocfast can fail to allocate registers even without this patch. I think it's not supposed to be used in production, just like fast-isel (I might be wrong), so another possibility would be to just revert to the old (always in-memory) behavior when the fast register allocator is selected.

@nickdesaulniers (Member) commented

There is no "revert [to] the old behavior" here. ISEL needs to be changed for this to work. Then whichever regalloc framework runs afterwards needs to be able to handle what ISEL has decided. If we don't change ISEL, we don't fix this bug. If we don't fix regallocfast, then we may fail to compile code that we previously could.

@danilaml (Collaborator) commented

@nickdesaulniers is there an issue with making isel "decide" to treat "rm" as "m" (the old behavior) when the fast regalloc is selected?

@nickdesaulniers (Member) commented

I don't recall where I was told this, but I recall being told in no uncertain terms that all ISEL frameworks need to support all regalloc frameworks. This M:N ISEL frameworks to regalloc frameworks is what makes fixing this bug a PITA.

@danilaml (Collaborator) commented

This doesn't change the support, though. I agree that tying an isel-level option like "treat rm constraints as m" to the selected regalloc feels a bit icky, but that's the problem with the fast regalloc: it can fail, and it's best-effort. Any optimization that puts something that was previously in memory into registers can potentially trigger this. The question is whether running regallocfast on code with inline asm "rm" constraints is a common enough scenario to warrant so much effort on trying to make it better (it already works, to my understanding).

@bwendling (Collaborator) commented

I don't recall where I was told this, but I recall being told in no uncertain terms that all ISEL frameworks need to support all regalloc frameworks. This M:N ISEL frameworks to regalloc frameworks is what makes fixing this bug a PITA.

Maybe in this comment? #20571 (comment)

@bwendling (Collaborator) commented

I'm thinking of following this course of action:

  1. Have Clang generate a memory constraint for 'rm' (the current behavior) for non-greedy register allocators.
  2. Have Clang generate a foldable 'r' constraint for the greedy allocator.

This will take care of (almost) all uses of this feature.

What about code from another front-end or hand-written LLVM IR?

  1. Fuck that shit....
  2. Okay, if we MUST, we could modify the inline asm instruction before code-gen.

@nikic (Contributor) commented Apr 16, 2024

If supporting this in RegAllocFast is too hard, I think making sure that the problematic patterns don't reach it is okay, assuming this is done in a reliable manner (making this a clang frontend decision is obviously not acceptable). Having all isel work with all regalloc is a nice goal to have, but I don't think it needs to block this feature if it's too hard to achieve. At least that's my 2c.

@bwendling (Collaborator) commented

making this a clang frontend decision is obviously not acceptable

While I agree in general, this is how it's done today, and there aren't a lot of alternative methods of input we are desperate to support. The inline ASM syntax is specific to GCC (and only "mostly" supported in Clang). It requires supporting only those languages which either GCC supports or which use its inline ASM syntax (in which case, "What hath God wrought?"). Given that restrictive set, we're then left with IR as the main thing we need to support. It's possible to handle this while reading the IR, and it's important for testing. But I think focusing on the front-end first will give us a greater benefit for our work.

@arsenm (Contributor) commented Apr 17, 2024

I haven't followed the details here in this giant thread, but

Okay, if we MUST, we could modify the inline asm instruction before code-gen.

This doesn't sound too horrible to me

Are some of my branches (I have 6 locally; they're free) but those 3 appear to be the latest/most developed. Really, the issue I ran into with RegAllocFast was more so: do we try to reassign a register to a stack slot and mutate the inline asm upon discovering we're in a state of register exhaustion (Regalloc fast was not designed to do this, everything is a const reference, and I'd wind up with duplicated instructions inserted in a bunch of places), or try to spill inline asm "rm" variables proactively (shot down in code review).

I don't see the issue here? The FastRA philosophy is to just spill everything all the time to avoid doing anything hard

@nickdesaulniers (Member) commented

Are some of my branches (I have 6 locally; they're free) but those 3 appear to be the latest/most developed. Really, the issue I ran into with RegAllocFast was more so: do we try to reassign a register to a stack slot and mutate the inline asm upon discovering we're in a state of register exhaustion (Regalloc fast was not designed to do this, everything is a const reference, and I'd wind up with duplicated instructions inserted in a bunch of places), or try to spill inline asm "rm" variables proactively (shot down in code review).

I don't see the issue here? The FastRA philosophy is to just spill everything all the time to avoid doing anything hard

@MatzeB's feedback seemed to favor avoiding a preemptive scan over every instruction to find inline asm instructions (which probably don't exist in most programs).

Rereading his feedback though, it sounds like he wasn't necessarily against the approach I had proposed (as I had mis-remembered). I don't recall whether I tested his suggestion, and what the results were.

@arsenm (Contributor) commented Apr 17, 2024

@MatzeB 's feedback seemed to want to avoid doing a pre-emptive scan over every instruction to find inline asm instructions (which probably don't exist in most programs).

MachineFunction already has hasInlineAsm to skip the walk over the function. Could even refine this if really needed

qihangkong pushed a commit to rvgpu/llvm that referenced this issue Apr 18, 2024
When using the inline asm constraint string "rm" (or "g"), we generally
would like the compiler to choose "r", but it is permitted to choose "m"
if there's register pressure. This is distinct from "r" in which the
register is not permitted to be spilled to the stack.

The decision of which to use must be made at some point.  Currently, the
instruction selection frameworks (ISELs) make the choice, and the
register allocators had better be able to handle the result.

Steal a bit from Storage when using register operands to disambiguate
between the two cases.  Add helpers/getters/setters, and print in MIR
when such a register is foldable.

The getter will later be used by the register allocation frameworks (and
asserted by the ISELs) while the setters will be used by the instruction
selection frameworks.

Link: llvm/llvm-project#20571
qihangkong pushed a commit to rvgpu/llvm that referenced this issue Apr 18, 2024
… ops as having infinite spill weight (#70747)

This is necessary for RegAllocGreedy support for memory folding inline
asm that uses "rm" constraints.

Thanks to @qcolombet for the suggestion.

Link: llvm/llvm-project#20571
@bwendling (Collaborator) commented

There's a gross error that happens with the current WIP code:

long int __attribute__((noinline)) foo(long int y) {
  long int x, z;

  asm __volatile__ ("pushf\n\t"
                    "popq %0\n\t"
                    "pushq %2\n\t"
                    "popq %1"
                    : "=rm" (x), "=rm" (z)
                    : "rm" (y)
                    : "ax", "bx", "cx", "dx", "di", "si", "bp",
                      "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
                    );
  return x;
}

int main() {
  __builtin_printf("flags: 0x%08lx\n", foo(991278));
  return 0;
}

What should be generated is:

	movq	%rdi, -24(%rsp)
	#APP
	pushfq
	popq	-16(%rsp)
	pushq	-24(%rsp)
	popq	-8(%rsp)
	#NO_APP
	movq	-16(%rsp), %rax

However, this is generated:

	movq	%rdi, -32(%rsp)                 # 8-byte Spill
	#APP                                    # 8-byte Folded Reload
	pushfq
	popq	-32(%rsp)
	pushq	-32(%rsp)
	popq	-24(%rsp)
	#NO_APP
	movq	-32(%rsp), %rax                 # 8-byte Reload
