LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 13241 - ARM Assembler problem with ADR instruction
Summary: ARM Assembler problem with ADR instruction
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: ARM
Version: trunk
Hardware: All All
Importance: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks: 18926
Reported: 2012-06-29 12:46 PDT by Richard Barton
Modified: 2016-01-18 11:16 PST

See Also:
Fixed By Commit(s):


Description Richard Barton 2012-06-29 12:46:48 PDT
The MCInst for the ARM ADR instruction, as created by the MC assembler, is re-encoded into an ADD/SUB instruction (reproducers below).

It seems to me that there has been some re-engineering of the ADD/SUB instructions and the ADR instruction has not been taken into account in this.

Reproduce with:

(A1 encoding)
echo 'ADR      r0,#0x40000000' | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding
        .section        __TEXT,__text,regular,pure_instructions
        adr     r0, #1073741824         @ encoding: [0x00,0x00,0x8f,0xe2]
                                        @ <MCInst #30 ADR
                                        @  <MCOperand Reg:60>
                                        @  <MCOperand Imm:1073741824>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>>

 echo 0x00 0x00 0x8f 0xe2 | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble 
        .section        __TEXT,__text,regular,pure_instructions
        add     r0, pc, #0              @ encoding: [0x00,0x00,0x8f,0xe2]
                                        @ <MCInst #24 ADDri
                                        @  <MCOperand Reg:60>
                                        @  <MCOperand Reg:43>
                                        @  <MCOperand Imm:0>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>
                                        @  <MCOperand Reg:0>>

(A2 encoding)
echo 'ADR      r0,#-0x0' | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding
        .section        __TEXT,__text,regular,pure_instructions
        adr     r0, #-2147483648        @ encoding: [0x00,0x00,0x4f,0xe2]
                                        @ <MCInst #30 ADR
                                        @  <MCOperand Reg:60>
                                        @  <MCOperand Imm:-2147483648>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>>

echo 0x00 0x00 0x4f 0xe2 | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
        .section        __TEXT,__text,regular,pure_instructions
        sub     r0, pc, #0              @ encoding: [0x00,0x00,0x4f,0xe2]
                                        @ <MCInst #456 SUBri
                                        @  <MCOperand Reg:60>
                                        @  <MCOperand Reg:43>
                                        @  <MCOperand Imm:0>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>
                                        @  <MCOperand Reg:0>>
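
To make the overlap visible, here is a minimal Python sketch that splits the encodings printed above into the fields of the A32 data-processing (immediate) format. The helper name decode_fields and the OPCODES table are made up for this illustration and are not part of llvm-mc; only the opcodes that appear in this report are listed.

```python
# Hypothetical helper for illustration; not part of LLVM.
# Bit layout: A32 data-processing (immediate) instruction.
OPCODES = {0b0010: "sub", 0b0100: "add", 0b0110: "sbc"}  # only opcodes seen in this report


def decode_fields(encoding_bytes):
    """Split a little-endian A32 word, as printed by -show-encoding, into fields."""
    word = int.from_bytes(bytes(encoding_bytes), "little")
    return {
        "cond": (word >> 28) & 0xF,                          # bits 31..28
        "opcode": OPCODES.get((word >> 21) & 0xF, "other"),  # bits 24..21
        "rn": (word >> 16) & 0xF,                            # bits 19..16 (15 is pc)
        "rd": (word >> 12) & 0xF,                            # bits 15..12
        "imm12": word & 0xFFF,                               # bits 11..0
    }


a1 = decode_fields([0x00, 0x00, 0x8F, 0xE2])  # bytes from the A1 reproducer
a2 = decode_fields([0x00, 0x00, 0x4F, 0xE2])  # bytes from the A2 reproducer
print(a1)
print(a2)
```

Both words have Rn = 15 (pc) and differ only in the opcode field, so the same bits can be read either as ADR or as ADD/SUB with pc as the base register.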
Comment 1 Jiangning Liu 2012-07-23 01:02:18 PDT
The ARM ARM says the following in A4.2.2, "Use of labels in UAL instruction syntax":

"When the assembler calculates an offset of 0 for the normal syntax of this instruction, it must assemble the encoding that adds 0 to the Align(PC,4) value of the instruction. The encoding that subtracts 0 from the Align(PC,4) value cannot be specified by the normal syntax."

So decoding "0x00 0x00 0x4f 0xe2" as "sub     r0, pc, #0" follows the rule.
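
The rule quoted above could be sketched in Python roughly as follows. The function names adr_a32_word and as_bytes are hypothetical, the condition is fixed to "always", and as a simplification that is not from the ARM ARM the sketch only accepts magnitudes 0..255 so that the immediate needs no rotation.

```python
# Hypothetical sketch of the add/sub choice for ADR; not LLVM code.
ADR_ADD_FORM = 0xE28F0000  # add rd, pc, #imm (encoding A1, condition "always")
ADR_SUB_FORM = 0xE24F0000  # sub rd, pc, #imm (encoding A2, condition "always")


def adr_a32_word(rd, offset):
    """Return the 32-bit word for 'adr rd, <label>' given a byte offset."""
    magnitude = abs(offset)
    if magnitude > 0xFF:
        # Simplification of this sketch: larger values need a rotated immediate.
        raise ValueError("offset outside the range handled by this sketch")
    # An offset of 0 takes the ADD form, per the rule quoted above.
    base = ADR_ADD_FORM if offset >= 0 else ADR_SUB_FORM
    return base | (rd << 12) | magnitude


def as_bytes(word):
    """Format a word the way -show-encoding prints it (little-endian bytes)."""
    return [f"0x{b:02x}" for b in word.to_bytes(4, "little")]


print(as_bytes(adr_a32_word(0, 0)))
print(as_bytes(adr_a32_word(0, -4)))
```

Under this rule an assembler never emits the SUB form for a zero offset, which is why "sub r0, pc, #0" has to be written with the alternative syntax.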
Comment 2 Jiangning Liu 2012-07-23 04:16:49 PDT
For Thumb-2, the ARM ARM says:

It is recommended that the alternative syntax forms are avoided where possible. However, the only possible syntax for encoding T2 with all immediate bits zero is
SUB<c><q> <Rd>,PC,#0.
Comment 3 Jiangning Liu 2012-07-23 04:31:57 PDT
Here is an even more general case:

$ echo 'adr     r0,#0x4' | llvm-mc -triple=armv7 -show-inst -show-encoding
	.section	__TEXT,__text,regular,pure_instructions
	adr	r0, #4                  @ encoding: [0x04,0x00,0x8f,0xe2]
                                        @ <MCInst #30 ADR
                                        @  <MCOperand Reg:60>
                                        @  <MCOperand Imm:4>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>>
$ echo 0x04 0x00 0x8f 0xe2 | llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
	.section	__TEXT,__text,regular,pure_instructions
	add	r0, pc, #4              @ encoding: [0x04,0x00,0x8f,0xe2]
                                        @ <MCInst #24 ADDri
                                        @  <MCOperand Reg:60>
                                        @  <MCOperand Reg:43>
                                        @  <MCOperand Imm:4>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>
                                        @  <MCOperand Reg:0>>
Comment 4 Jiangning Liu 2012-07-24 00:34:59 PDT
/* the following example is incorrect */
$ echo 'adr     r5,#0x1234' | llvm-mc -triple=armv7 -show-inst -show-encoding
	.section	__TEXT,__text,regular,pure_instructions
	adr	r5, #4660               @ encoding: [0x34,0x52,0xcf,0xe2]
                                        @ <MCInst #30 ADR
                                        @  <MCOperand Reg:65>
                                        @  <MCOperand Imm:4660>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>>
/* the following example is correct */
$ echo 0x34 0x52 0xcf 0xe2 | llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
	.section	__TEXT,__text,regular,pure_instructions
	sbc	r5, pc, #1073741827     @ encoding: [0x0d,0x51,0xcf,0xe2]
                                        @ <MCInst #330 SBCri
                                        @  <MCOperand Reg:65>
                                        @  <MCOperand Reg:43>
                                        @  <MCOperand Imm:1073741827>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>
                                        @  <MCOperand Reg:0>>

/* the following example is correct */
$ echo 'sbc r5, pc, #1073741827' | llvm-mc -triple=armv7 -show-inst -show-encoding
	.section	__TEXT,__text,regular,pure_instructions
	sbc	r5, pc, #1073741827     @ encoding: [0x0d,0x51,0xcf,0xe2]
                                        @ <MCInst #330 SBCri
                                        @  <MCOperand Reg:65>
                                        @  <MCOperand Reg:43>
                                        @  <MCOperand Imm:1073741827>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>
                                        @  <MCOperand Reg:0>>

This result is quite weird. There are two issues.

1) #0x1234 can't really be encoded into an adr instruction; the assembler silently keeps only the low 12 bits, i.e. 0x234. An error or warning message is needed.
2) 0x234 should be encoded as a modified constant rather than as a plain constant. So even 0x234 can't really be encoded into the adr instruction.
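
As a rough illustration of the modified-constant point, the Python sketch below expands a 12-bit A32 immediate field the way the architecture defines it: the low 8 bits rotated right by twice the top 4 bits. The function name arm_expand_imm is chosen for this example and is not an LLVM API.

```python
# Hypothetical helper for illustration; not part of LLVM.
def arm_expand_imm(imm12):
    """Expand an A32 modified immediate: imm8 rotated right by 2 * rot."""
    rot = (imm12 >> 8) & 0xF   # top 4 bits of the field
    imm8 = imm12 & 0xFF        # low 8 bits of the field
    shift = 2 * rot
    # 32-bit rotate right; the mask drops bits pushed past bit 31.
    return ((imm8 >> shift) | (imm8 << (32 - shift))) & 0xFFFFFFFF


# Field left in the instruction by the assembler for '#0x1234' (low 12 bits).
print(arm_expand_imm(0x234))
# Field in the re-encoded bytes [0x0d,0x51,0xcf,0xe2].
print(arm_expand_imm(0x10D))
```

Both fields expand to 1073741827 (0x40000003), which matches the immediate shown in the disassembly above: the raw bits 0x234 are read as a rotated constant, not as the plain value 0x234.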
Comment 5 Renato Golin 2016-01-18 11:16:13 PST
Hi Richard, is this still a problem?