Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r7 not saved/restored around inline asm blocks that kill it #40920

Open
gburgessiv opened this issue Apr 24, 2019 · 7 comments
Open

r7 not saved/restored around inline asm blocks that kill it #40920

gburgessiv opened this issue Apr 24, 2019 · 7 comments
Labels
backend:ARM bugzilla Issues migrated from bugzilla confirmed Verified by a second party

Comments

@gburgessiv
Copy link
Member

Bugzilla Link 41575
Version trunk
OS Linux
CC @efriedma-quic,@kbeyls,@lalozano,@m-gupta,@nico,@smithp35,@TNorthover

Extended Description

With clang trunk:

$ cat /tmp/repro.c
void escape(void *);

void Bar(int p0, int p1, int p2, int p3, int p4, int p5) {
register unsigned magic asm("r7") = 0x000f0002;
asm volatile("svc 0\n" : : "r"(magic) : "memory");

int local = p5;
escape(&local);
}

$ clang --target=arm-linux-gnueabihf -march=armv7-a -mthumb /tmp/repro.c -S -o - -O2 -emit-llvm

<< snip >>

; Function Attrs: nounwind
define dso_local void @​Bar(i32 %p0, i32 %p1, i32 %p2, i32 %p3, i32 %p4, i32 %p5) local_unnamed_addr #​0 {
entry:
%local = alloca i32, align 4
tail call void asm sideeffect "svc 0\0A", "{r7},~{memory}"(i32 983042) #​3, !srcloc !​3
%0 = bitcast i32* %local to i8*
call void @​llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #​3
store i32 %p5, i32* %local, align 4, !tbaa !​4
call void @​escape(i8* nonnull %0) #​3
call void @​llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #​3
ret void
}

$ clang --target=arm-linux-gnueabihf -march=armv7-a -mthumb /tmp/repro.c -S -o - -O2

<< snip >>

Bar:
.fnstart
@ %bb.0: @ %entry
.save {r6, r7, lr}
push {r6, r7, lr}
.setfp r7, sp, #​4
add r7, sp, #​4
.pad #​4
sub sp, #​4
movs r7, #​2
movt r7, #​15
@​APP
svc #​0

    @&#8203;NO_APP
    ldr     r0, [r7, #&#8203;12]
    str     r0, [sp]
    mov     r0, sp
    bl      escape
    add     sp, #&#8203;4
    pop     {r6, r7, pc}

So, we set up r7 as our fp, it gets clobbered by magic for inline asm to do stuff with, and we proceed to use the magic value as though it was still our pc in a later ldr.

As one who's not super familiar with our rules around inline asm, I'd assume that tail call void asm sideeffect "svc 0\0A", "{r7},~{memory}"(i32 983042) #&#8203;3, !srcloc !&#8203;3 should emit code to save/restore r7, instead of leaving it in a clobbered state.

FWIW, we have code that looks very similar to the non-reduced repro in compiler-rt's clear_cache.c. That appears to work because r7 is a callee-saved register, so I presume we save/restore everything properly there (and don't use r7 as a fp).

@efriedma-quic
Copy link
Collaborator

If you have a "register" variable which refers to a register that isn't allocatable, the compiler assumes you know what you're doing when you read/modify it. This is generally useful in various situations: you can read the value of the stack pointer, access x18 on AArch64 with -ffixed-x18, etc.

There's maybe some argument that frame/base pointers in particular should be treated specially, since it isn't necessarily easy for the user to predict which registers will be reserved, especially on ARM. But then you get into weird edge cases. If the function contains dynamic stack allocation, and an inline asm is using the registers we would normally use for the frame/base pointers, what do we do? Temporarily reassign them to some other registers? I guess that's possible, if there are enough other registers available, but it would be really complicated to implement and test.

@TNorthover
Copy link
Contributor

We fairly recently added a warning when a reserved register was put into a clobber list of an asm block. This seems like it should do the same.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 25, 2019

Note that compiler-rt uses very similar inline assembly, so this might also be broken then:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/clear_cache.c#L122

@kbeyls
Copy link
Collaborator

kbeyls commented Apr 25, 2019

If you have a "register" variable which refers to a register that isn't
allocatable, the compiler assumes you know what you're doing when you
read/modify it. This is generally useful in various situations: you can
read the value of the stack pointer, access x18 on AArch64 with -ffixed-x18,
etc.

There's maybe some argument that frame/base pointers in particular should be
treated specially, since it isn't necessarily easy for the user to predict
which registers will be reserved, especially on ARM. But then you get into
weird edge cases. If the function contains dynamic stack allocation, and an
inline asm is using the registers we would normally use for the frame/base
pointers, what do we do? Temporarily reassign them to some other registers?
I guess that's possible, if there are enough other registers available, but
it would be really complicated to implement and test.

I think the only documentation for this feature is https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html.
There, it states that "The only supported use for this feature is to specify registers for input and output operands when calling Extended asm".
It seems to me that that documentation might imply that uses such as trying to read the value of the stack pointer or accessing x18 on AArch64 with -ffixed-x18 are not supported.

The documentation doesn't state anything about whether the compiler should aim to avoid conflicts when the register is already used for other reasons, beyond just passing in a value in a specific register to an extended assembly block.
If really the only supported use case for this feature is to pass in/out values from an extended assembly block, maybe it wouldn't be unreasonable to expect the compiler to save/restore other live values in that register around the inline asm block. That being said, I haven't look into how hard it would be to implement that.
Implementing this behaviour may indeed break existing code that relies on behaviour that according to the documentation isn't supported.

The work-around for the miscompile reported above seems to be to not use the "specified register for a local variable"-feature, but achieve getting the needed value into r7 in another way. Indeed, compiler-rt seems to have very similar code. libgcc seems to implement the entire function in assembly.

@nico
Copy link
Contributor

nico commented Apr 26, 2019

Does anyone have a suggestion what we can do to make __clear_cache in compiler-rt work? Since both clang and compiler-rt are part of the LLVM project, we really should try to come up with something that allows both parts to cooperate :)

@TNorthover
Copy link
Contributor

Three possible ideas on the compiler-rt side:

  1. Switch to assembly.
  2. Use -marm, where r11 is the frame pointer on Linux.
  3. -fomit-frame-pointer (though note iOS actively disallows that and will warn if the flag is applied for it).

None of them are entirely ideal.

@gburgessiv
Copy link
Member Author

In Chrome, we've settled on the workaround of marking __clear_cache with noinline for the moment, since the calling convention appears to produce the appropriate r7 save. To Nico's point, we should keep pieces of LLVM working together harmoniously, so is anyone against me doing that while we decide on/develop a more ideal fix?

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla confirmed Verified by a second party
Projects
None yet
Development

No branches or pull requests

6 participants