An ARM ADR instruction, which the MC assembler represents as an ADR MCInst, comes back from the disassembler as an ADD/SUB instruction (ADDri/SUBri); see the reproducers below. It seems to me that the ADD/SUB instructions were re-engineered at some point and the ADR instruction was not taken into account.

Reproduce with:

(A1 encoding)
echo 'ADR r0,#0x40000000' | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding
  .section __TEXT,__text,regular,pure_instructions
  adr r0, #1073741824     @ encoding: [0x00,0x00,0x8f,0xe2]
                          @ <MCInst #30 ADR
                          @  <MCOperand Reg:60>
                          @  <MCOperand Imm:1073741824>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>>

echo 0x00 0x00 0x8f 0xe2 | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
  .section __TEXT,__text,regular,pure_instructions
  add r0, pc, #0          @ encoding: [0x00,0x00,0x8f,0xe2]
                          @ <MCInst #24 ADDri
                          @  <MCOperand Reg:60>
                          @  <MCOperand Reg:43>
                          @  <MCOperand Imm:0>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>
                          @  <MCOperand Reg:0>>

(A2 encoding)
echo 'ADR r0,#-0x0' | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding
  .section __TEXT,__text,regular,pure_instructions
  adr r0, #-2147483648    @ encoding: [0x00,0x00,0x4f,0xe2]
                          @ <MCInst #30 ADR
                          @  <MCOperand Reg:60>
                          @  <MCOperand Imm:-2147483648>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>>

echo 0x00 0x00 0x4f 0xe2 | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
  .section __TEXT,__text,regular,pure_instructions
  sub r0, pc, #0          @ encoding: [0x00,0x00,0x4f,0xe2]
                          @ <MCInst #456 SUBri
                          @  <MCOperand Reg:60>
                          @  <MCOperand Reg:43>
                          @  <MCOperand Imm:0>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>
                          @  <MCOperand Reg:0>>
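Why the disassembler lands on ADDri/SUBri is easier to see once the bytes are split into fields. In the ARM ARM, ADR encoding A1 is the same bit pattern as ADD (immediate) with Rn = 1111 (PC), and encoding A2 is SUB (immediate) with Rn = 1111, so a decoder that does not special-case Rn = PC will plausibly produce ADD/SUB. The sketch below uses a hypothetical helper, `decode_dp_imm` (not an LLVM API), to decode the little-endian byte lists that llvm-mc prints after "encoding:"; only the opcodes appearing in this report are named.

```python
# Hypothetical helper (not part of LLVM): split an A32 data-processing
# (immediate) instruction into its fields. Layout: cond[31:28] 001[27:25]
# opcode[24:21] S[20] Rn[19:16] Rd[15:12] imm12[11:0].
OPCODES = {0b0010: "SUB", 0b0100: "ADD", 0b0110: "SBC"}  # subset only


def decode_dp_imm(encoding_bytes):
    # llvm-mc prints the bytes in memory order, i.e. little-endian.
    word = int.from_bytes(bytes(encoding_bytes), "little")
    if (word >> 25) & 0b111 != 0b001:
        raise ValueError("not a data-processing (immediate) instruction")
    opcode = (word >> 21) & 0xF
    return {
        "cond": word >> 28,                     # 14 = AL (always)
        "op": OPCODES.get(opcode, f"opcode {opcode:04b}"),
        "S": (word >> 20) & 1,
        "Rn": (word >> 16) & 0xF,               # 15 = PC
        "Rd": (word >> 12) & 0xF,
        "imm12": word & 0xFFF,                  # still a *modified* immediate
    }


# The two byte sequences from the A1 and A2 reproducers:
for enc in ([0x00, 0x00, 0x8F, 0xE2], [0x00, 0x00, 0x4F, 0xE2]):
    print(decode_dp_imm(enc))
```

The two words differ only in the opcode field (ADD vs SUB); both have Rn = 15, Rd = 0 and imm12 = 0.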
The ARM ARM says the following in A4.2.2, "Use of labels in UAL instruction syntax": "When the assembler calculates an offset of 0 for the normal syntax of this instruction, it must assemble the encoding that adds 0 to the Align(PC,4) value of the instruction. The encoding that subtracts 0 from the Align(PC,4) value cannot be specified by the normal syntax." So decoding "0x00 0x00 0x4f 0xe2" as "sub r0, pc, #0" follows this rule.
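The quoted rule can be stated as a small selection function. This is only a sketch: `choose_adr_encoding` and `MINUS_ZERO` are hypothetical names, and treating -2147483648 as the "#-0" marker is an assumption taken from the llvm-mc log above, where `#-0x0` became `Imm:-2147483648` and was emitted as `sub r0, pc, #0`. The check that the magnitude is a valid modified immediate is deliberately left out.

```python
# Assumption: llvm-mc stores an explicit "#-0" as INT32_MIN, as seen in the
# A2 reproducer (Imm:-2147483648). This name is hypothetical.
MINUS_ZERO = -(2**31)


def choose_adr_encoding(offset):
    """Hypothetical sketch: return (encoding, magnitude) for an ADR offset.

    'A1' means ADD Rd, PC, #imm and 'A2' means SUB Rd, PC, #imm.
    The magnitude must still be a valid modified immediate (not checked).
    """
    if offset == MINUS_ZERO:
        # Mirrors the log: an explicit "#-0" selects SUB ... #0 (encoding A2).
        return ("A2", 0)
    if offset >= 0:
        # A computed offset of 0 ends up here, so the normal syntax
        # always assembles ADD ... #0 (encoding A1), as the ARM ARM requires.
        return ("A1", offset)
    return ("A2", -offset)


print(choose_adr_encoding(0))
print(choose_adr_encoding(-8))
print(choose_adr_encoding(MINUS_ZERO))
```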
For Thumb-2, the ARM ARM says: "It is recommended that the alternative syntax forms are avoided where possible. However, the only possible syntax for encoding T2 with all immediate bits zero is SUB<c><q> <Rd>,PC,#0."
Here is an even more general case:

$ echo 'adr r0,#0x4' | llvm-mc -triple=armv7 -show-inst -show-encoding
  .section __TEXT,__text,regular,pure_instructions
  adr r0, #4              @ encoding: [0x04,0x00,0x8f,0xe2]
                          @ <MCInst #30 ADR
                          @  <MCOperand Reg:60>
                          @  <MCOperand Imm:4>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>>

$ echo 0x04 0x00 0x8f 0xe2 | llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
  .section __TEXT,__text,regular,pure_instructions
  add r0, pc, #4          @ encoding: [0x04,0x00,0x8f,0xe2]
                          @ <MCInst #24 ADDri
                          @  <MCOperand Reg:60>
                          @  <MCOperand Reg:43>
                          @  <MCOperand Imm:4>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>
                          @  <MCOperand Reg:0>>
/* the following example is incorrect */
$ echo 'adr r5,#0x1234' | llvm-mc -triple=armv7 -show-inst -show-encoding
  .section __TEXT,__text,regular,pure_instructions
  adr r5, #4660           @ encoding: [0x34,0x52,0xcf,0xe2]
                          @ <MCInst #30 ADR
                          @  <MCOperand Reg:65>
                          @  <MCOperand Imm:4660>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>>

/* the following example is correct */
$ echo 0x34 0x52 0xcf 0xe2 | llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
  .section __TEXT,__text,regular,pure_instructions
  sbc r5, pc, #1073741827 @ encoding: [0x0d,0x51,0xcf,0xe2]
                          @ <MCInst #330 SBCri
                          @  <MCOperand Reg:65>
                          @  <MCOperand Reg:43>
                          @  <MCOperand Imm:1073741827>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>
                          @  <MCOperand Reg:0>>

/* the following example is correct */
$ echo 'sbc r5, pc, #1073741827' | llvm-mc -triple=armv7 -show-inst -show-encoding
  .section __TEXT,__text,regular,pure_instructions
  sbc r5, pc, #1073741827 @ encoding: [0x0d,0x51,0xcf,0xe2]
                          @ <MCInst #330 SBCri
                          @  <MCOperand Reg:65>
                          @  <MCOperand Reg:43>
                          @  <MCOperand Imm:1073741827>
                          @  <MCOperand Imm:14>
                          @  <MCOperand Reg:0>
                          @  <MCOperand Reg:0>>

This result is quite weird. There are two issues:
1) #0x1234 cannot actually be encoded in an ADR instruction; the assembler silently keeps only the low 12 bits, i.e. 0x234. This needs an error or warning message.
2) The offset should be encoded as a modified constant rather than a plain constant. So the emitted instruction does not even encode 0x234: read as a modified immediate, the imm12 field 0x234 means 0x34 ROR 4 = 0x40000003 (1073741827), which is exactly the immediate the disassembler prints.
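Issue 2 comes down to how A32 "modified immediate" constants work: the 12-bit field is rot:imm8, and the value is imm8 rotated right by 2*rot. The sketch below (hypothetical helper names, not LLVM's API) decodes and encodes such fields. It shows that the imm12 field 0x234 the assembler wrote expands to 0x40000003, that re-encoding 0x40000003 with the smallest rotation gives 0x10d (consistent with the [0x0d,0x51,...] bytes llvm-mc prints for the sbc), and that 0x1234 has no encoding at all. Choosing the smallest rotation is an assumption that happens to match the output in the log.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def expand_imm12(imm12):
    """Decode a 12-bit modified-immediate field (rot:imm8) to its value."""
    rot, imm8 = imm12 >> 8, imm12 & 0xFF
    return ror32(imm8, 2 * rot)


def encode_imm12(value):
    """Return the imm12 field with the smallest rotation, or None if the
    value cannot be represented as a modified immediate."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Rotating left by 2*rot undoes the decoder's right rotation.
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    return None


print(hex(expand_imm12(0x234)))       # 0x40000003
print(hex(encode_imm12(0x40000003)))  # 0x10d
print(encode_imm12(0x1234))           # None
```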
Hi Richard, is this still a problem?