Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clang arm assembler does not support pseudo-instruction 'adrl' #24724

Open
llvmbot opened this issue Aug 4, 2015 · 26 comments
Open

clang arm assembler does not support pseudo-instruction 'adrl' #24724

llvmbot opened this issue Aug 4, 2015 · 26 comments
Assignees
Labels
backend:ARM bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 4, 2015

Bugzilla Link 24350
Version trunk
OS All
Blocks #4440 #24719
Reporter LLVM Bugzilla Contributor
CC @Arnaud-de-Grandmaison-ARM,@jcai19,@jmolloy,@kbeyls,@lalozano,@nickdesaulniers,@rengolin,@smithp35,@stephenhines

Extended Description

"adrl rd, a_label" calculates a_label's address using pc-relative addressing, which means the section containing the insn and a_label can be put to any address. Without the support of this pseudo insn, it's more cumbersome to do this. "MOV32" or "LDR rd,=a_label" doesn't work in this scenario because both of these use absolute addressing.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Aug 4, 2015

assigned to @rengolin

@llvmbot
Copy link
Collaborator Author

llvmbot commented Sep 11, 2015

FYI - we had a patch to disable adrl in chromeos openssl.

The CL is here -
https://chromium-review.googlesource.com/#/c/290457/

@rengolin
Copy link
Member

Can you provide an example assembly file and a command line that you use to reproduce?

Here's the documentation about ADRL:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489i/Cacecdga.html

@cmtice
Copy link
Contributor

cmtice commented Nov 20, 2015

first test case

@cmtice
Copy link
Contributor

cmtice commented Nov 20, 2015

second test case

@cmtice
Copy link
Contributor

cmtice commented Nov 20, 2015

I just attached two test case .s files. Use the command:

$ llvm-mc -triple=armv7-linux-gnu

$ ./bin/llvm-mc -triple=armv7-linux-gnu ~/adr-invalid.s
.text

start:
adr r0, var
adr r0, undefinedvar
/usr/local/google/home/cmtice/adr-invalid.s:5:6: error: invalid operand for instruction
adrl r1, var
^
/usr/local/google/home/cmtice/adr-invalid.s:6:6: error: invalid operand for instruction
adrl r1, undefinedvar
^

.data

.globl	var

var:
.long 0
$
$ ./bin/llvm-mc -triple=armv7-linux-gnu ~/adrl.s
.text

.globl	foo

foo:
.align 0
.Ltmp0:
.zero 8192
.Ltmp1:
/usr/local/google/home/cmtice/adrl.s:9:14: error: invalid operand for instruction
adrl r0, 1b
^
/usr/local/google/home/cmtice/adrl.s:10:14: error: invalid operand for instruction
adrl r0, 1f
^
/usr/local/google/home/cmtice/adrl.s:11:14: error: invalid operand for instruction
adrl r0, 2b
^
/usr/local/google/home/cmtice/adrl.s:12:14: error: invalid operand for instruction
adrl r0, 2f
^
/usr/local/google/home/cmtice/adrl.s:13:10: error: invalid instruction
adrEQl r0, 2f
^
.Ltmp3:
/usr/local/google/home/cmtice/adrl.s:15:14: error: invalid operand for instruction
adrl r0, foo
^
/usr/local/google/home/cmtice/adrl.s:16:14: error: invalid operand for instruction
adrl r0, X
^
.zero 8184
.Ltmp2:
adr lr, X
.zero 260

.globl	X

/usr/local/google/home/cmtice/adrl.s:22:58: error: unexpected token at start of statement
.globl X ;
^

X:
.align 5

@rengolin
Copy link
Member

I just attached two test case .s files. Use the command:

Thanks Carolina!

The examples were great, but after a while investigating how that would be actually implemented, it turns out it's not that simple. The assembler will need a lot of convincing to expand two instructions from an asm input stream.

There are ways to do it, of course, but all of them are too big for such a small and isolated feature. For that reason, I want to understand more why can't you just use ADR+ADD/SUB or even with macros. Can you give me an example where it would be far worse?

cheers,
--renato

PS: I see you're using "adrEQl". This is pre-unified syntax and is not supported by LLVM (nor will be). Can you make sure you have ".syntax unified" on all assembly files, and move the predicate to the end on all instructions (ex. "adrlEQ")? This will make it easier for the future, when we sort out the ADRL problems.

@cmtice
Copy link
Contributor

cmtice commented Nov 24, 2015

I am trying to compile code from someone else (OpenSSL, in the crypto part) which contains inline assembly, to optimize performance.

The inline assembly contains the adrl instruction.

The code compiles just fine with GCC (and the GNU as assembler). We would like to be able to use Clang/LLVM, but we need to be able to compile/link this third-party code.

I am working on trying to add this capability to LLVM, but since I am learning the compiler code as I go, it's taking me a bit of time.

The test cases I gave you came straight from the GNU assembler regression test suite.

@rengolin
Copy link
Member

I am trying to compile code from someone else (OpenSSL, in the crypto part)
which contains inline assembly, to optimize performance.

I know. Most IAS bug reports we get are from third-party optimised code.

The code compiles just fine with GCC (and the GNU as assembler). We would
like to be able to use Clang/LLVM, but we need to be able to compile/link
this third-party code.

I understand, but "compiling like GAS" is not our goal. We aim to produce a working assembler with modern and documented assembly constructs.

The ADRL case is borderline, because it's documented in ARMASM, but not in the ARM manual, and GAS doesn't seem to support ADRL in Thumb2, either.

I am working on trying to add this capability to LLVM, but since I am
learning the compiler code as I go, it's taking me a bit of time.

No worries, it does take time. :)

I'm still working on the implementation, but it is not going to be pretty, and that's why we're trying to avoid it.

If you could contact the original author and point out this bug report, I'm hoping he/she could have a look at that particular sequence and try and use something else.

At the very least, pre-UAL code will not be accepted by LLVM's ARM assembler, so that'll block you further, even if we do add ADRL. In the end, contacting the original author will be necessary to progress with IAL for OpenSSL.

The test cases I gave you came straight from the GNU assembler regression
test suite.

Right, then we may have a problem. GAS and GCC are GPLv3 and LLVM is not, so we can't reuse GNU code as-is due to licensing problems. Can you send the snippet in the OpenSSL source that you're having problems with, so we can deal with that first?

I'll keep in mind the examples you gave already when creating a test for that, but I can't re-use it.

@rengolin
Copy link
Member

After some investigation, I found a few problematic issues with implementing this in the LLVM assembler:

  1. The assembler parser has only limited access to the stream of instructions, and the vast majority of the transformations are for canonicalisation (changing encodings) rather than creating new instructions. For that reason, we'll have to change how the process is done to allow "expandInstruction" to run before "processInstruction". I've done that, doesn't disrupt much.

  2. The immediate comes as an MCExpr with the label. We need to transform it into a PC-relative symbol by creating a temporary symbol and adding it to the immediate. I've done that, but the end result is a non-constant expression that we can't know the real value until it's finally emitted by the MCStreamer (because distances can change, especially in our case, that we're adding new instructions, forward ranges can change for previously emitted ADRLs).

This second point will require that we add a multi-instruction fixup, because the immediate can be bigger than any possible encoding (the whole point of ADRL), so the relative result of the final expression will have to be split in two instructions, and encoded differently (depending on which opcode we use for each).

That calculation cannot be done in the assembler parser (it's definitely not the place), but it's not yet supported in the streamer because of the type of non-constant multi-instruction fixups.

I'm not aware of any such fixup type, so unless there is precedent and the streamer can cope with it, implementing ADRL would require us to ultimately change how we stream instructions into asm and obj.

One way to do this is to keep the ADRL instruction valid with virtually any immediate value plus a special fixup, and always add a NOP after it early on in the assembler. This can be done with step 1 above. Later, the MCStreamer would recognise the fixup and re-use the NOP as an extra instruction if needed. Adding a NOP simplifies range calculations and is what's expected in case the ADRL can turn into a single ADD/SUB.

Arnaud, James, do you guys know of anything like this being done elsewhere?

@cmtice
Copy link
Contributor

cmtice commented Nov 30, 2015

patch for processInstruction to return multiple instructions
The first step I was working on was to have processInstruction be able to return multiple instructions, since (as you pointed out) that is a necessary first step to adding 'adrl'. I based this implementation on what happens in the Mips AsmParser, which also has to be able to return multiple instructions from processInstruction. I have attached a patch that does this. The regression tests pass with this patch.

I'll try to find you the openssl code with the inline asm.

@cmtice
Copy link
Contributor

cmtice commented Nov 30, 2015

openssl file with 'adrl' instruction
The 'adrl' instruction is on/near line 598

@rengolin
Copy link
Member

Hi Caroline,

Thanks for the patch, it's similar to what I've done locally. But that's not enough to translate ADRL into ads and subs for the other reasons I outlined. This patch (or my local version), as it is, doesn't help anything.

Did you check with the original developer about re-writing some of their ARM assembly?

@rengolin
Copy link
Member

Created attachment 15369 [details]
openssl file with 'adrl' instruction

The 'adrl' instruction is on/near line 598

That single use of adrl could be removed by merging the Thumb2 and ARM codes together. I have the impression that the developer thought ADRL was a single instruction and was trying to "optimise" the code on ARM...

Can you try to remove the #ifdef / #else adrl / #endif and compile again?

--renato

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 30, 2015

openssl isn't the only user of adrl...
But probably the most relevant one.

The only other use of adrl in an AOSP 6.0.0_r26 checkout is in Tremolo (vorbis decoder), attaching the relevant file for reference.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 30, 2015

Tremolo file that makes use of adrl
That file is old style asm syntax -- fixing that is easy, but getting rid of adrl, a bit more difficult.

I wonder how many more users of adrl exist out there.

@rengolin
Copy link
Member

Hi Caroline,

Sorry I moved this to me, I was told to look at it and thought no one was working on it. Just to wrap up my comments...

I did some investigation and the changes in the assembler would be much bigger than the benefits would bring us, especially since there are only two uses we know of and one of them is easy to replace. Usually, when the instruction is not in the ISA documents, we tend to need a stronger reason to implement it.

For instance, in ADRL, GNU tools only implement it for ARM, while ARMCC does also for Thumb2, but not 1. On ARM, the complications are in the MCStreamer (as I described earlier), but in Thumb2 you'll additionally have to worry about IT blocks (I'm guessing this is why GAS didn't do for Thumb2).

Our rule of thumb for implementing things not documented in the ARM ARM is:

  • If there are no alternatives in UAL / ARM ARM.
  • If the mapping is simple and clear, documented elsewhere
  • Implementation won't break or make code harder to understand/maintain
  • If the usage in the wild is extremely extensive and well known

In the ADRL case, there is an alternative (used for Thumb2 in your attachment), the mapping is not clear ("whatever range two adds can cope with"), it will make code harder to understand, and is not extensively used in the wild.

So, there's a pretty strong argument against implementing it. :)

cheers,
--renato

@rengolin
Copy link
Member

Bero,

In the same asm file you uploaded there is a sequence of ADR+ADD on the same symbol that could be used on all examples, and all of them reading the symbol ahead of the instruction, so there's no doubt it'd be an ADD, not a SUB.

cheers,
--renato

@cmtice
Copy link
Contributor

cmtice commented Dec 1, 2015

We have patched the version of openssl that we use, to remove the 'adrl' instruction, and I have reached out to the original openssl developer to ask him if he could remove it from openssl (no response so far).

You have convinced me that this bug is not worth trying to fix. Should I close it as won't fix, or just leave it open and remove myself from the 'assigned' field?

@rengolin
Copy link
Member

rengolin commented Dec 1, 2015

We have patched the version of openssl that we use, to remove the 'adrl'
instruction, and I have reached out to the original openssl developer to ask
him if he could remove it from openssl (no response so far).

Thank you! Having a working patch that is as good as the original code is 99% of the work done to convince people to update their sources. :)

You have convinced me that this bug is not worth trying to fix. Should I
close it as won't fix, or just leave it open and remove myself from the
'assigned' field?

It's up to you, really.

There is a similar issue in Android (which is why I was looking at it, too), so feel free to assign the bug to me, and I'll close once we patch the Tremolo sources, or keep it for yourself until both accept the work upstream.

Whatever works. :)

@cmtice
Copy link
Contributor

cmtice commented Dec 1, 2015

Ok, it's all yours now! ;-)

@rengolin
Copy link
Member

rengolin commented Dec 2, 2015

Bero,

Can you confirm the changes on Tremolo's side? Should be as simple as in openssl.

cheers,
--renato

@rengolin
Copy link
Member

rengolin commented Dec 2, 2015

For the sake of completeness, the Tremolo case exhibits problems when the ADR range goes beyond 4096, so ADRL was used. There are two ways to overcome that without ADRL:

  1. Temp label

Change:

ADRL r7, .Label
...
ADRL r7, .Label
...
.Label
.word 0xf00

To:

ADR r7, .Temp1
ADD r7, r7, #(.Label-.Temp1)
.Temp1
...
ADR r7, .Temp2
ADD r7, r7, #(.Label-.Temp2)
.Temp2
...
.Label
.word 0xf00

This would only break if .Label move up past the ADR instructions, in which case you'd have to use a SUB instead, or if .Label-.TempX becomes too large for the ADD. Since ADD's range is a lot bigger than ADR, you could cope with a lot of new cases.

  1. Repeat pattern

If the above doesn't help, repeating the maths will:

ADR r7, .Label1
...
.Label1
.word (.Global-.Label1)
...
ADR r7, .Label2
...
.Label2
.word (.Global-.Label2)
...
etc

as many times as necessary. This is uglier, but works every time as long as you keep them close enough and often enough.

@rengolin
Copy link
Member

rengolin commented Jan 4, 2016

Alternatives proposed, worked in the wild, so closing this bug. Please, refer to these solutions for future cases and let me know if they don't work on your particular case.

@smithp35
Copy link
Collaborator

I've done some more investigation to see what would be needed to make ADRL work in the assembler. In summary I agree with Renato, making ADRL work to the same specification as the GNU assembler and armasm would not be easy, and would likely be rejected upstream as not worth the cost. In general it is quite difficult to get difficult to implement pseudo instructions accepted unless they are officially required by the architecture or ABI.

I've put some thoughts on implementation at the end [*] just in case anyone wants to build on them. I'm new to LLVM so it is possible that I've missed something.

It isn't possible to replicate the full generality of ADRL in a macro but it is possible to come close. In effect we fix the rotations that are used for each of the instructions, whereas the assembler can use different rotations that could potentially reach further addresses if those addresses were suitably aligned. We also need to choose ahead of time whether to use add (label is defined later) or sub (label has already been defined).

I won't claim that these are the optimal or clearest expressions of the macro, but it should be possible to build on them. For example if you know your labels are 4-byte aligned it should be possible to extend the range by altering the masks.

    .macro MYADRL reg:req, label:req
    add \reg, pc, #((\label - .) & 0xff00) 
    add \reg, \reg, #((\label - .) - ((\label - .) & 0xff00)) - 4        
    .endm

    .macro MYADRLSUB reg:req, label:req
    sub \reg, pc, #((. - \label) & 0xff00)
    sub \reg, \reg, #((. - \label) - ((. - \label) & 0xff00)) + 4
    .endm 

[*]
It has already been observed that the ARM assembler backend does not support expansion of pseudo instructions into multiple instructions. This can be fixed using a similar solution to the Mips backend. In ARM there is the additional complication of IT blocks. If ADRL is implemented like it is defined in armasm which supports ADRL in Thumb2 then we have to account for ADRL being the last member of an IT block. It is true that we could restrict ADRL to ARM state like in the GNU assembler so that this never happens, but we would have to be clear about the restrictions should anyone attempt to make further modifications.

A more difficult problem is that while the first instruction is in effect an ADR which can be transformed by a fixup into either an ADD or a SUB depending on the final value of the label post layout, the second instruction is either an ADD or SUB which cannot currently be transformed by a fixup post layout. The reason is that for ADD and SUB the opcode bits are defined by tablegen and a fixup can only OR extra bits and not clear existing ones. I think it could be possible to either:

  • reimplement ARM modified immediate data-processing instructions to take the opcode from a fixup. I don't think that this would be a trivial piece of work and could have implications for disassembly.
  • Add a pseudo-like ADRL_pt2 instruction similar to ADR that can be transformed into either an ADD or SUB via a fixup, but with customer parsing and decoding such that it couldn't be written in the assembler or disassembled.

Neither of the options fit well within the existing framework.

I did consider (ab)using relaxation to change an ADDri into a SUBri and vice-versa, depending on the value of the immediate at layout time. Unfortunately because ADDri and SUBri are inverses the layout algorithm won't terminate. For implementing ADRL it could be possible to fix this by just doing an ADDri to SUBri and always emitting an ADDri, however this would also affect vanilla ADDri instructions in an asymmetric way (When expr isn't known till layout time, you could change an "ADD r0, r0, expr" to "SUB r0, r0, -expr" if expr later turned out to be negative, but you couldn't change "SUB r0, r0, expr" back to "ADD r0, r0, -expr").

It could also be possible to heuristically guess early whether an ADD/SUB is needed based on the expression. However we can't guarantee we would guess correctly in all cases.

@edwintorok
Copy link
Contributor

mentioned in issue #4440

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

5 participants