24529 – Add support for jumping between MSVC inline asm blob labels

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 24529 - Add support for jumping between MSVC inline asm blob labels

Summary: Add support for jumping between MSVC inline asm blob labels

Status:	RESOLVED WONTFIX

Alias:	None

Product:	clang
Classification:	Unclassified
Component:	Frontend (show other bugs)
Version:	trunk
Hardware:	PC All

Importance:	P normal
Assignee:	Unassigned Clang Bugs

URL:
Keywords:

Depends on:	9295
Blocks:
	Show dependency tree

Reported:	2015-08-20 19:38 PDT by Ehsan Akhgari [:ehsan]
Modified:	2017-03-26 10:54 PDT (History)
CC List:	8 users (show)

See Also:	9295
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Ehsan Akhgari [:ehsan] 2015-08-20 19:38:49 PDT

Apologies for the crazy test case.  Pay attention to the comments, and ignore the computations (they're just designed to use many registers).

$ cat test.cpp
void g(int);
void f(int x) {
    int k = x + x * (x-1);
    g(k/x);
    int l = (x+1) * x * (x-1);
    int l2 = (x+10) * x * (x-15);
    int l3 = (x+100) * x * (x-150);
    int m = (l2+3) * x * (x-3);
    // at this point, ebx has been written to.
    __asm {
        mov esi, k
    L2:
    inc eax
    }
    // this expression reads ebx.
    int foo = m * l2 / (x -4) * (m+ k/l* l3) * x;
    g(foo);
    if (m == 479594508 && k == 10000) {
        __asm {
            // clobber ebx.
            mov ebx, m
            inc k
            jmp L2
        }
    }
    g(foo);
}
$ cat driver.cpp
#include <stdio.h>
void g(int x) {
  printf("%d\n", x);
}
int main() {
  void f(int);
  f(100);
}
$ ./bin/clang -fasm-blocks -m32 test.cpp driver.cpp -o test.out && ./test.out
100
454085520
454085520
454085520
$ ./bin/clang -O2 -fasm-blocks -m32 test.cpp driver.cpp -o test.out && ./test.out
100
454085520
748924496
748924496

The problem here is that the jmp between the asm blocks invalidates the clobber registers that we compute for each individual asm block.  Basically, the compiler thinks that the first asm block doesn't clobber ebx so it will try to reuse ebx afterwards, which doesn't work the second time that leave the first block (after having jmp'ed to it from the second block.)

The generated code makes the problem obvious.  I'm including it below with comments inline:

$ ./bin/clang -O2 -fasm-blocks -m32 -c test.cpp && ./bin/llvm-objdump -disassemble -x86-asm-syntax=intel test.o

test.o:	file format Mach-O 32-bit i386

Disassembly of section __TEXT,__text:
__Z1fi:
       0:	55 	push	ebp
       1:	89 e5 	mov	ebp, esp
       3:	53 	push	ebx
       4:	57 	push	edi
       5:	56 	push	esi
       6:	83 ec 1c 	sub	esp, 28
       9:	8b 7d 08 	mov	edi, dword ptr [ebp + 8]
       c:	8d 77 ff 	lea	esi, [edi - 1]
       f:	89 f8 	mov	eax, edi
      11:	0f af c0 	imul	eax, eax
      14:	89 45 f0 	mov	dword ptr [ebp - 16], eax
      17:	99 	cdq
      18:	f7 ff 	idiv	edi
      1a:	89 04 24 	mov	dword ptr [esp], eax
      1d:	e8 de ff ff ff 	call	-34 <__Z1fi>
      22:	8d 47 01 	lea	eax, [edi + 1]
      25:	0f af f7 	imul	esi, edi
      28:	0f af f0 	imul	esi, eax
      2b:	89 75 e8 	mov	dword ptr [ebp - 24], esi
      2e:	8d 47 0a 	lea	eax, [edi + 10]
      31:	0f af c7 	imul	eax, edi
      34:	8d 4f f1 	lea	ecx, [edi - 15]
      37:	0f af c8 	imul	ecx, eax
      3a:	8d 41 03 	lea	eax, [ecx + 3]
      3d:	8d 5f fd 	lea	ebx, [edi - 3]
      40:	0f af df 	imul	ebx, edi

Write to ebx:

      43:	0f af d8 	imul	ebx, eax
      46:	89 5d ec 	mov	dword ptr [ebp - 20], ebx
      49:	8b 75 f0 	mov	esi, dword ptr [ebp - 16]

JMP target:

      4c:	40 	inc	eax

Read from ebx (oops!):

      4d:	0f af cb 	imul	ecx, ebx
      50:	8d 77 fc 	lea	esi, [edi - 4]
      53:	89 c8 	mov	eax, ecx
      55:	99 	cdq
      56:	f7 fe 	idiv	esi
      58:	89 c1 	mov	ecx, eax
      5a:	8b 45 f0 	mov	eax, dword ptr [ebp - 16]
      5d:	99 	cdq
      5e:	f7 7d e8 	idiv	dword ptr [ebp - 24]
      61:	89 c6 	mov	esi, eax
      63:	8d 87 6a ff ff ff 	lea	eax, [edi - 150]
      69:	0f af c7 	imul	eax, edi
      6c:	0f af f0 	imul	esi, eax
      6f:	8d 47 64 	lea	eax, [edi + 100]
      72:	0f af f0 	imul	esi, eax
      75:	01 de 	add	esi, ebx
      77:	0f af f7 	imul	esi, edi
      7a:	0f af f1 	imul	esi, ecx
      7d:	89 34 24 	mov	dword ptr [esp], esi
      80:	e8 7b ff ff ff 	call	-133 <__Z1fi>
      85:	81 fb 0c 08 96 1c 	cmp	ebx, 479594508
      8b:	75 11 	jne	17 <__Z1fi+9E>
      8d:	81 7d f0 10 27 00 00 	cmp	dword ptr [ebp - 16], 10000
      94:	75 08 	jne	8 <__Z1fi+9E>

Clobber ebx in the second block:

      96:	8b 5d ec 	mov	ebx, dword ptr [ebp - 20]
      99:	ff 45 f0 	inc	dword ptr [ebp - 16]
      9c:	eb ae 	jmp	-82 <__Z1fi+4C>
      9e:	89 34 24 	mov	dword ptr [esp], esi
      a1:	e8 5a ff ff ff 	call	-166 <__Z1fi>
      a6:	83 c4 1c 	add	esp, 28
      a9:	5e 	pop	esi
      aa:	5f 	pop	edi
      ab:	5b 	pop	ebx
      ac:	5d 	pop	ebp
      ad:	c3 	ret

MSVC deals with this problem by turning off all optimizations on the function as soon as it encounters a cross asm block jump.  This can be observed by comparing the output of MSVC with the jmp instruction commented out vs with it:

============================
With the jmp instruction:
============================

$ cl -O2 -c test.cpp && llvm-objdump -disassemble -x86-asm-syntax=intel test.obj
Microsoft (R) C/C++ Optimizing Compiler Version 18.00.40629 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

test.cpp

test.obj:       file format COFF-i386

Disassembly of section .text$mn:
?f@@YAXH@Z:
       0:       55      push    ebp
       1:       8b ec   mov     ebp, esp
       3:       83 ec 18        sub     esp, 24
       6:       53      push    ebx
       7:       56      push    esi
       8:       8b 45 08        mov     eax, dword ptr [ebp + 8]
       b:       83 e8 01        sub     eax, 1
       e:       0f af 45 08     imul    eax, dword ptr [ebp + 8]
      12:       03 45 08        add     eax, dword ptr [ebp + 8]
      15:       89 45 fc        mov     dword ptr [ebp - 4], eax
      18:       8b 45 fc        mov     eax, dword ptr [ebp - 4]
      1b:       99      cdq
      1c:       f7 7d 08        idiv    dword ptr [ebp + 8]
      1f:       50      push    eax
      20:       e8 00 00 00 00  call    0 <?f@@YAXH@Z+25>
      25:       83 c4 04        add     esp, 4
      28:       8b 4d 08        mov     ecx, dword ptr [ebp + 8]
      2b:       83 c1 01        add     ecx, 1
      2e:       0f af 4d 08     imul    ecx, dword ptr [ebp + 8]
      32:       8b 55 08        mov     edx, dword ptr [ebp + 8]
      35:       83 ea 01        sub     edx, 1
      38:       0f af ca        imul    ecx, edx
      3b:       89 4d ec        mov     dword ptr [ebp - 20], ecx
      3e:       8b 45 08        mov     eax, dword ptr [ebp + 8]
      41:       83 c0 0a        add     eax, 10
      44:       0f af 45 08     imul    eax, dword ptr [ebp + 8]
      48:       8b 4d 08        mov     ecx, dword ptr [ebp + 8]
      4b:       83 e9 0f        sub     ecx, 15
      4e:       0f af c1        imul    eax, ecx
      51:       89 45 f4        mov     dword ptr [ebp - 12], eax
      54:       8b 55 08        mov     edx, dword ptr [ebp + 8]
      57:       83 c2 64        add     edx, 100
      5a:       0f af 55 08     imul    edx, dword ptr [ebp + 8]
      5e:       8b 45 08        mov     eax, dword ptr [ebp + 8]
      61:       2d 96 00 00 00  sub     eax, 150
      66:       0f af d0        imul    edx, eax
      69:       89 55 e8        mov     dword ptr [ebp - 24], edx
      6c:       8b 4d f4        mov     ecx, dword ptr [ebp - 12]
      6f:       83 c1 03        add     ecx, 3
      72:       0f af 4d 08     imul    ecx, dword ptr [ebp + 8]
      76:       8b 55 08        mov     edx, dword ptr [ebp + 8]
      79:       83 ea 03        sub     edx, 3
      7c:       0f af ca        imul    ecx, edx
      7f:       89 4d f8        mov     dword ptr [ebp - 8], ecx
      82:       8b 75 fc        mov     esi, dword ptr [ebp - 4]
      85:       40      inc     eax
      86:       8b 45 f8        mov     eax, dword ptr [ebp - 8]
      89:       0f af 45 f4     imul    eax, dword ptr [ebp - 12]
      8d:       8b 4d 08        mov     ecx, dword ptr [ebp + 8]
      90:       83 e9 04        sub     ecx, 4
      93:       99      cdq
      94:       f7 f9   idiv    ecx
      96:       8b c8   mov     ecx, eax
      98:       8b 45 fc        mov     eax, dword ptr [ebp - 4]
      9b:       99      cdq
      9c:       f7 7d ec        idiv    dword ptr [ebp - 20]
      9f:       0f af 45 e8     imul    eax, dword ptr [ebp - 24]
      a3:       03 45 f8        add     eax, dword ptr [ebp - 8]
      a6:       0f af c8        imul    ecx, eax
      a9:       0f af 4d 08     imul    ecx, dword ptr [ebp + 8]
      ad:       89 4d f0        mov     dword ptr [ebp - 16], ecx
      b0:       8b 55 f0        mov     edx, dword ptr [ebp - 16]
      b3:       52      push    edx
      b4:       e8 00 00 00 00  call    0 <?f@@YAXH@Z+B9>
      b9:       83 c4 04        add     esp, 4
      bc:       81 7d f8 0c 08 96 1c    cmp     dword ptr [ebp - 8], 479594508
      c3:       75 11   jne     17 <?f@@YAXH@Z+D6>
      c5:       81 7d fc 10 27 00 00    cmp     dword ptr [ebp - 4], 10000
      cc:       75 08   jne     8 <?f@@YAXH@Z+D6>
      ce:       8b 5d f8        mov     ebx, dword ptr [ebp - 8]
      d1:       ff 45 fc        inc     dword ptr [ebp - 4]
      d4:       eb af   jmp     -81 <?f@@YAXH@Z+85>
      d6:       8b 45 f0        mov     eax, dword ptr [ebp - 16]
      d9:       50      push    eax
      da:       e8 00 00 00 00  call    0 <?f@@YAXH@Z+DF>
      df:       83 c4 04        add     esp, 4
      e2:       5e      pop     esi
      e3:       5b      pop     ebx
      e4:       8b e5   mov     esp, ebp
      e6:       5d      pop     ebp
      e7:       c3      ret

============================
Without the jmp instruction:
============================

$ cl -O2 -c test.cpp && llvm-objdump -disassemble -x86-asm-syntax=intel test.obj
Microsoft (R) C/C++ Optimizing Compiler Version 18.00.40629 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

test.cpp

test.obj:       file format COFF-i386

Disassembly of section .text$mn:
?f@@YAXH@Z:
       0:       51      push    ecx
       1:       53      push    ebx
       2:       55      push    ebp
       3:       8b 6c 24 10     mov     ebp, dword ptr [esp + 16]
       7:       8b c5   mov     eax, ebp
       9:       0f af c5        imul    eax, ebp
       c:       56      push    esi
       d:       57      push    edi
       e:       89 44 24 18     mov     dword ptr [esp + 24], eax
      12:       99      cdq
      13:       f7 fd   idiv    ebp
      15:       50      push    eax
      16:       e8 00 00 00 00  call    0 <?f@@YAXH@Z+1B>
      1b:       8d 45 0a        lea     eax, [ebp + 10]
      1e:       83 c4 04        add     esp, 4
      21:       8d 7d f1        lea     edi, [ebp - 15]
      24:       0f af f8        imul    edi, eax
      27:       8d 45 fd        lea     eax, [ebp - 3]
      2a:       0f af fd        imul    edi, ebp
      2d:       8d 5f 03        lea     ebx, [edi + 3]
      30:       0f af d8        imul    ebx, eax
      33:       0f af dd        imul    ebx, ebp
      36:       89 5c 24 10     mov     dword ptr [esp + 16], ebx
      3a:       8b 74 24 18     mov     esi, dword ptr [esp + 24]
      3e:       40      inc     eax
      3f:       8d 45 01        lea     eax, [ebp + 1]
      42:       8d 4d ff        lea     ecx, [ebp - 1]
      45:       0f af c8        imul    ecx, eax
      48:       8b 44 24 18     mov     eax, dword ptr [esp + 24]
      4c:       99      cdq
      4d:       0f af cd        imul    ecx, ebp
      50:       f7 f9   idiv    ecx
      52:       8d 8d 6a ff ff ff       lea     ecx, [ebp - 150]
      58:       8b f0   mov     esi, eax
      5a:       8b c3   mov     eax, ebx
      5c:       0f af f1        imul    esi, ecx
      5f:       8d 4d 64        lea     ecx, [ebp + 100]
      62:       0f af c7        imul    eax, edi
      65:       0f af f1        imul    esi, ecx
      68:       8d 4d fc        lea     ecx, [ebp - 4]
      6b:       99      cdq
      6c:       f7 f9   idiv    ecx
      6e:       0f af f5        imul    esi, ebp
      71:       03 f3   add     esi, ebx
      73:       0f af f0        imul    esi, eax
      76:       0f af f5        imul    esi, ebp
      79:       56      push    esi
      7a:       e8 00 00 00 00  call    0 <?f@@YAXH@Z+7F>
      7f:       83 c4 04        add     esp, 4
      82:       81 fb 0c 08 96 1c       cmp     ebx, 479594508
      88:       75 12   jne     18 <?f@@YAXH@Z+9C>
      8a:       81 7c 24 18 10 27 00 00         cmp     dword ptr [esp + 24], 10000
      92:       75 08   jne     8 <?f@@YAXH@Z+9C>
      94:       8b 5c 24 10     mov     ebx, dword ptr [esp + 16]
      98:       ff 44 24 18     inc     dword ptr [esp + 24]
      9c:       56      push    esi
      9d:       e8 00 00 00 00  call    0 <?f@@YAXH@Z+A2>
      a2:       83 c4 04        add     esp, 4
      a5:       5f      pop     edi
      a6:       5e      pop     esi
      a7:       5d      pop     ebp
      a8:       5b      pop     ebx
      a9:       59      pop     ecx
      aa:       c3      ret

Comment 1 Ehsan Akhgari [:ehsan] 2015-08-20 19:40:54 PDT

I suggest fixing this bug by clobbering all registers at the end of each inline asm block as soon as we detect a label being used inside a block that doesn't define it.

What do you think?

Comment 2 Reid Kleckner 2015-08-24 14:58:42 PDT

(In reply to comment #1)
> I suggest fixing this bug by clobbering all registers at the end of each
> inline asm block as soon as we detect a label being used inside a block that
> doesn't define it.
> 
> What do you think?

It's not enough. If we want to optimize, the CFG has to be correct, it can't be missing any edges. SSA requires it. This is the same problem that setjmp has, and why users have to add volatile near setjmp.

Here's a different example involving unreachable code:

int f() {
  __asm L1: jmp L2
  return 42;
// unreachable code?
  __asm L2: jmp L1
}

LLVM will see this:

define i32 @f() {
entry:
  call sideffect asm "..."()
  ret i32 42

after_ret:
  call sideffect asm "..."()
  unreachable
}

And CFG simplification will delete after_ret, causing errors at assembly time.

Comment 3 James Y Knight 2017-02-21 22:20:58 PST

ISTM that implementing the GCC "asm goto" (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html), bug 9295, is effectively a prerequisite for this.

Both seem to call for a block-terminator variant of inline asm, with a variable number of target labels tacked on. That'd allow jumping OUT of inline asm safely, and seems like it ought to be implementable without too much complication.

The first complication MSVC style asm adds to that is that you need to parse out the target labels from the assembly instructions, rather than having them listed out for you in one place. But that's just the usual complication for msvc inline-asm.

What seems like the real complication here is that MSVC lets you jump INTO the middle of an asm block (which is *not* allowed in GCC-style "asm goto").

It also allows you to use C labels and asm labels interchangeably: "Both assembly instructions and goto statements can jump to labels inside or outside the __asm block." (per https://msdn.microsoft.com/en-us/library/78cxesy1.aspx).

So, if you do have some (asm or C) code jumping into the middle of an asm statement, how could you annotate that in LLVM? You could create a new llvm block right before the target asm statement and pretend you'll be jumping to the beginning of the block (even though you're jumping to an asm label in the middle) -- after all, the content of the asm is opaque to LLVM, it doesn't care if you jump to the beginning or the middle of it.

But then what if some other instructions are added to the head of that block before the target asm statement? Skipping *those* would be bad...

So, do you split asm with a label in the middle into multiple LLVM blocks? But that seems like a bad idea too...presumably you're allowed to depend on the machine keeping the same state when crossing a label within an asm block. But if an asm block got translated into two separate llvm asm statements, in separate basic blocks, then that would not be guaranteed.

Comment 4 Reid Kleckner 2017-03-01 15:21:34 PST

I agree with everything in c#3. I'm pretty close to just marking this WONTFIX.

We *could* hypothetically implement this the way we implemented WinEH, where we have an inline asm instrution that is both a terminator and a "leader", meaning must be the first non-phi instruction like a landingpad or catchswitch. Such a block has no insertion point: passes must non insert any non-phi instructions into it. In theory, the existence of catchswitch requires that all passes handle these blocks gracefully. Any phis appearing in a leader-terminator block would be demoted to memory. This would ensure that no code is generated between two asm blobs that should be consistent. We would also need to communicate that the second block *must* be the layout successor, which is also pretty weird.

However, that's all quite challenging and probably not worth anyone's time. Let's go with wontfix.