44272 – [CLANG-CL] 64x inline assembler function call/jump miscompiled

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 44272 - [CLANG-CL] 64x inline assembler function call/jump miscompiled

Summary: [CLANG-CL] 64x inline assembler function call/jump miscompiled

Status:	RESOLVED FIXED

Alias:	None

Product:	new-bugs
Classification:	Unclassified
Component:	new bugs (show other bugs)
Version:	9.0
Hardware:	PC Windows NT

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:	miscompilation

Depends on:
Blocks:

Reported:	2019-12-11 02:38 PST by Martynas Skirmantas
Modified:	2020-01-11 13:42 PST (History)
CC List:	4 users (show)

See Also:	44503
Fixed By Commit(s):	dc5b614fa9a1, 1c545f6dbcbb

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Martynas Skirmantas 2019-12-11 02:38:00 PST

I've recently starting using Clang-CL in Visual Studio 2019 and I've discovered a fatal bug within the compiler, when attempting to call a function within inline assembly the function pointer is dereferenced, causing to crash due to memory access violation, the same issue exists by attempting to "jmp" to a function directly, I've tried to fix this in various ways, but I only found a way to get around the problem, this happens only with 64bit inline assembler and the fatal bug is clearly visible within assembly output.

The problem is replicated by doing something like this:

int main() {
      __asm {
           xor rcx, rcx
           call exit
      }
      printf("The application didn't quit!");
      return 0;
}

The generated inline assembly looks like this:

xor rcx, rcx
call qword ptr [exit]

The way I get around the problem is by using "lea" instruction to retrieve the function pointer, which looks like this:

__asm {
     xor rcx, rcx
     lea rax, exit
     call rax 
}

Results in generated inline assembly that looks like this:

xor rcx, rcx
lea rax, [exit]
call rax

This no longer results in a crash.

My Clang-CL installation (--version):

CLang Version: 9.0.0 (release-final)
Target: x86_64-pc-windows-msvc
InstalledDir: C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\Llvm\bin

The Clang-CL compiler is downloaded via visual studio 2019 installer, selecting the "C++ Clang tools for Windows (9.0.0 - x64/x86)"

I hope this problem can be addressed soon, this is my first time reporting a bug. Thank you.

Comment 1 Reid Kleckner 2019-12-11 11:38:57 PST

To my knowledge, Intel-style x64 inline asm isn't supported by any other compiler, so this feature has never truly been tested. I know that MSVC does not support this feature. LLVM just happens to support it because we didn't add logic to explicitly disable it. I am not surprised that it is full of bugs. Be wary.

Comment 2 Martynas Skirmantas 2019-12-11 12:02:28 PST

Makes sense, but considering it works its a great thing, I think this is should be supported to at least some extent, I am sure that its not just me that needs to use 64 bit inline assembly and masm or some other assembler is not a option, these days lack of 64bit support is unacceptable, nearly all desktop's run 64bit, from my testing so far this is the only bug that I've encountered.

Comment 3 Eric Astor 2019-12-12 07:33:02 PST

First, as Reid says, no other compiler supports this feature, and MSVC explicitly declared it banned on x86-64 in favor of compiler intrinsics. If you really need to use assembly code, you may want to use standalone assembly files as part of your build.

However:

For easy reference, a minimal example in Compiler Explorer:
https://godbolt.org/z/XwfM6w

A bit of experimentation suggests that there are problems at both the IR generation & assembly generation levels when using 64-bit inline assembly code, so this would need to be done in stages. First task would be to get LLVM IR working that parallels what works in AT&T style, then get the compiler to generate this.

Comment 4 Martynas Skirmantas 2019-12-12 11:26:12 PST

Yeah, understood, doesn't look like this will go anywhere but would be great, I wouldn't say that its banned, there were recent posts of Microsoft claiming they are looking into it but there's still no response from them regarding that, intrinsics don't really cut it when you need absolute control without the function call overhead or some other reason? truly a bummer for someone that works with the system internals all the time.

GCC/G++ (GNU) supports 64bit inline assembly which I have used in the past but moved on due the need to use windows libraries which has many problems when attempted to use with GNU. I've even managed to make a couple of kernel drivers with it, but that is not a option for someone that of focusing on only windows and needs to use very uncommon libraries.

Well, thank you for taking the time at taking a look at the problem, I thought it may be worth reporting regardless so it wouldn't go unnoticed.

Comment 5 Martynas Skirmantas 2019-12-12 14:31:18 PST

Hold up, I have reopened the bug report, I did way more digging and I found many people stating that Clang support 64bit inline assembly and some people are using it by looking at stackoverflow questions, here is old post: http://blog.llvm.org/2018/03/clang-is-now-used-to-build-chrome-for.html 

It states that chrome is using Clang-CL and they use 64 bit and 32 bit inline assembly to build a library. What is going on here? it just doesn't add up, there are many people that claim this if you google enough your self, surely a llvm blogging site wouldn't lie about such a thing.

This is what it says: "Clang supports 64-bit inline assembly. For example, in Chrome we built libyuv (a video format conversion library) with Clang long before we built all of Chrome with it. libyuv had highly-tuned 64-bit inline assembly with performance not reachable with intrinsics, and we could just use that code on Windows." 

There are many uses for a inline assembler as statement by the blog, this bug should be fixed, which has been as confirmed by "Eric Astor" using godbolt compiler explorer. I would like to hear a official statement that this bug can't be fixed or 64bit inline assembly is NOT supported by Clang.

Comment 6 Reid Kleckner 2019-12-12 14:42:31 PST

I agree, we should definitely keep the bug open and fix it.

I only mean to warn folks off of relying heavily on 64-bit MS inline asm. Because there was no pre-existing corpus of code to test this feature on, it has only been lightly tested.

For libyuv and Chromium, my understanding is that all the 64-bit inline asm they had was GCC-style inline asm, for which the support is better tested.

Maybe I misspoke when I said it only works because we haven't taken steps to disable it. Maybe there was an intentional decision to support 64-bit inline asm at one point, seeing as it is documented to work, but I wasn't around when it happened, and at this point I have done a fair amount of the MS inline asm implementation work. :)

Comment 7 Martynas Skirmantas 2019-12-12 14:54:42 PST

Thank you! That's some great news, I am looking forward to using Clang as my main compiler, I also discovered with godbolt compiler explorer that this problem doesn't exist on CLang 3.5.1 and certain older versions as you can see here: https://godbolt.org/z/59vTTs It uses lea like I have done to get around the problem, this bug appeared since Clang 3.6.

Comment 8 Eric Astor 2019-12-13 15:09:18 PST

Bit more research later... we need to fix two things to make this work.

LLVM IR inline assembly issue: We currently don't support operand modifiers in Intel-dialect inline assembly... but that's an easy fix.

C++ inline assembly issue: When taking the operand of `call exit` out of the inline assembly, it should be replaced by "${0:P}$, not just "$0". This is required to avoid adding the address prefix. We'll need to decide when this logic should apply, though... which could be tricky.

Comment 9 Eric Astor 2019-12-13 15:12:14 PST

Whoops, a third thing:

C++ inline assembly issue #2: When taking the operand of `call exit` out of the inline assembly, the operand should use the "i" constraint, not "*m"; it's an immediate, not a pointer. This is required to avoid having it treated as relative to %rip. This is similar to part of http://llvm.org/D71436, but will need some more work... even so, it should be relatively straightforward once we figure out how it's being handled.

Comment 10 Eric Astor 2019-12-18 13:48:04 PST

My proposed fix is on Phabricator at https://reviews.llvm.org/D71677.

Comment 11 Martynas Skirmantas 2020-01-08 05:36:30 PST

Hello, I've reopened the bug report once more because the bug has not been fully fixed!

The "jmp" instruction and other jump instructions such as "ccjmp" still have the same fatal issue, I've also noticed that the compiler is still atempting to dereference the function pointer on the "lea" instruction as well with the RIP (RIP + PTR), has anyone even tested if the problem has been fully fixed? This is very unprofessional.

Its even stated in the bug report title that its not just the "call" instruction, jump instructions are of high importance as well. Thank you.

Comment 12 Martynas Skirmantas 2020-01-08 06:03:54 PST

I've made a mistake in my last post, conditional jumps "jmpcc" do not work at all, which appeared since Clang 3.6 like this bug so this problem might be related. When attempting to use a conditional jump, this inline assembler error is given: "Instantiated into assembly here". This can be seen via godbolt compiler explorer.

Comment 13 Eric Astor 2020-01-08 07:51:11 PST

Thanks, Martynas. If you can share a concrete example, that would be very helpful for diagnostic purposes while fixing this.

However, yes, my fix only applied to the call instructions, since the technique I used was only documented as appropriate to that and we had no concrete example of a problem in another context. We should be able to use this technique more broadly, but we'll need to make a complete list of all affected instructions... We may then need to figure out a way to annotate them in the X86 target specification.

Comment 14 Eric Astor 2020-01-08 15:07:26 PST

I think it probably is worth supporting jmp to C functions, though I'm unclear on the use-case for jmp/jcc as opposed to call.

http://llvm.org/D72417 has my extended fix, which should hopefully apply to all branch target operands (including jmp and the jcc variants). If I've missed any contexts where we can pass a C function as a branch target, it should be fairly easy to add them.

Unfortunately, dealing with `lea` seems less trivial, and I haven't done so yet. We'll need to spend some more time testing MSVC to make sure we match its handling of PC-relative addressing for lea in inline assembly. I think I can find the time to do so this week, but I'm not certain it'll make the 10.0 cut, since Reid (the obvious reviewer) is on vacation at the moment.

(Also, Martynas, please refrain from personal insults if possible. See https://llvm.org/docs/CodeOfConduct.html)

Comment 15 Eric Astor 2020-01-08 15:19:52 PST

(In reply to Eric Astor from comment #14)
> We'll need to spend some more time testing MSVC to make sure we match
> its handling of PC-relative addressing for lea in inline assembly.

Of course, I completely forgot - MSVC doesn't support 64-bit inline assembler at all. We'll need to design this ourselves if we want to deal with it appropriately... but I suspect the correct issue is actually different (related to llvm.org/D71436), and just comes down to marking the operand as immediate rather than memory.

Comment 16 Eric Astor 2020-01-09 10:58:34 PST

Having looked into it - there are many cases remaining where MSVC and clang-cl differ while handling inline assembly that attempts to manipulate function pointers.

However, the specific issue here (inappropriately dereferencing a function pointer on call or jump) should be resolved with http://llvm.org/D72417.

I've spun off a different bug (http://llvm.org/PR44503) for the other cases I've found; if anyone has a concrete real-world use case where anything like the lea/mov behavior comes up, please comment on THAT bug, as it should make it much easier to prioritize this.

Comment 17 Martynas Skirmantas 2020-01-11 12:51:18 PST

Hello, sorry for the late response, there are a ton of uses for function pointers, especially jump instructions. 

Some of the uses: 

1. Shellcode (Not malicious).
2. Doing something that a compiler can't do.
3. Micro-Optimisation, can be quite important when dealing with system components, where it has to be as fast as posssible, writing a system component in pure assembly without C code would be a nightmare.
4. Self modifying code, used in a DRM for example.
5. Cleaner project structure, even though you can use a assembler and link it, having pure assembly functions in other files can get messy, having assembly and non-assembly functions in one file would be great.

All of the instructions that miscompile function pointers can be easily seen at the compiler explorer, https://godbolt.org/, just do something like this:

void test() { }

int main() {
   __asm {
      jmp test
   }
   return 0;
}

That will immediately show the problem.
Its true, modern compilers are capable of almost anything these days but when talking about native code, assembly is the root of it, you will encounter it no matter what, it doesn't make any sense of why people working with native code would be restricted from making certain changes at assembly level, especially of how MSVC doesn't support 64bit inline assembler at all, 32bit is obsolete, at this point its the same thing as being forced to use 16bit on a 64bit machine just to have that extra control over the code.

I can clearly see that this bug is most likely won't go anywhere, its true that not many people need that extra amount of control but there are still a lot of things out there where assembly is used and cannot be avoided.

Thank you for the time spent on this, I'll just have to use a assembler, even for functions when it comes to changing something very minor. Its basically, 1000 lines of code (pure assembly) > 30 lines of code (Assembly Mix).

Comment 18 Eric Astor 2020-01-11 13:15:37 PST

(In reply to Martynas Skirmantas from comment #17)
> I can clearly see that this bug is most likely won't go anywhere...

You're jumping to some interesting conclusions! I've actually submitted the fix I developed for jmp/jcc instructions, and in fact all branch targets... which is why I've resolved this bug as fixed. (Compiler Explorer's llvm(trunk) may not quite be caught up, but it does work at HEAD.)

If you'd like to help determine what else needs fixing, please join me on http://llvm.org/PR44503 - the bug I've split off to deal with other symbol reference troubles in MS-style inline assembly! If you can help list useful, concrete examples where things are still differing, and help give perspective on why you think MSVC is correct, it'll be much easier to actually fix them properly.