ARM Assembler problem with ADR instruction #13613

RichBarton-Arm · 2012-06-29T19:46:48Z


Bugzilla Link	13241
Version	trunk
OS	All
Blocks	llvm/llvm-bugzilla-archive#18926
CC	@rengolin

Extended Description

The ARM ADR instruction does not round-trip through the MC layer. The assembler builds an ADR MCInst, but disassembling the resulting encoding yields an ADDri/SUBri MCInst instead (reproducers below).

It seems to me that there has been some re-engineering of the ADD/SUB instructions and the ADR instruction has not been taken into account in this.

Reproduce with:

(A1 encoding)
echo 'ADR r0,#0x40000000' | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding
.section __TEXT,__text,regular,pure_instructions
adr r0, #1073741824 @ encoding: [0x00,0x00,0x8f,0xe2]
@ <MCInst #30 ADR
@
@
@
@ >

echo 0x00 0x00 0x8f 0xe2 | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
.section __TEXT,__text,regular,pure_instructions
add r0, pc, #0 @ encoding: [0x00,0x00,0x8f,0xe2]
@ <MCInst #24 ADDri
@
@
@
@
@
@ >

(A2 encoding)
echo 'ADR r0,#-0x0' | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding
.section __TEXT,__text,regular,pure_instructions
adr r0, #-2147483648 @ encoding: [0x00,0x00,0x4f,0xe2]
@ <MCInst #30 ADR
@
@
@
@ >

echo 0x00 0x00 0x4f 0xe2 | ./llvm-oss/build-none/bin/llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
.section __TEXT,__text,regular,pure_instructions
sub r0, pc, #0 @ encoding: [0x00,0x00,0x4f,0xe2]
@ <MCInst #456 SUBri
@
@
@
@
@
@ >
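The reproducers come down to one fact about A32. ADR encoding A1 has exactly the same bits as ADD (immediate) with Rn = PC (opcode 0100, S = 0). Encoding A2 has exactly the same bits as SUB (immediate) with Rn = PC (opcode 0010, S = 0). A decoder that sees only the bits must therefore choose one spelling.

The sketch below is a hypothetical stand-alone decoder, not LLVM's. The function names `word_from_bytes` and `adr_form` are made up for illustration. It classifies a word as the ADD-form or SUB-form of ADR and expands its rotated immediate.

```c
#include <stdint.h>

/* Rotate right by n bits; n == 0 is special-cased to avoid a 32-bit shift. */
static uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Build the 32-bit word from the little-endian bytes llvm-mc prints. */
static uint32_t word_from_bytes(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Classify an A32 data-processing (immediate) word whose Rn is PC.
 * Returns +1 for ADD Rd, PC, #imm (same bits as ADR encoding A1),
 * -1 for SUB Rd, PC, #imm (same bits as ADR encoding A2), 0 otherwise.
 * On a match, *rd and *imm32 receive Rd and the expanded immediate. */
static int adr_form(uint32_t insn, unsigned *rd, uint32_t *imm32) {
    if (((insn >> 25) & 7) != 1) return 0;    /* bits 27..25 must be 001 */
    if ((insn >> 20) & 1) return 0;           /* S bit must be clear */
    if (((insn >> 16) & 0xF) != 15) return 0; /* Rn must be PC */
    unsigned op = (insn >> 21) & 0xF;         /* bits 24..21: opcode */
    if (op != 0x4 && op != 0x2) return 0;     /* only ADD (0100) or SUB (0010) */
    *rd = (insn >> 12) & 0xF;
    /* imm12 = rot:imm8; the value is imm8 rotated right by 2*rot */
    *imm32 = ror32(insn & 0xFF, 2 * ((insn >> 8) & 0xF));
    return op == 0x4 ? +1 : -1;
}
```

Fed the bytes from the A1 and A2 reproducers, this returns +1 and -1 respectively with Rd = r0 and an immediate of 0. The word 0xe2cf5234 from a later comment is rejected because its opcode field is 0110 (SBC).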

llvmbot · 2012-07-23T08:02:18Z

The ARM ARM says the following in A4.2.2, "Use of labels in UAL instruction syntax":

"When the assembler calculates an offset of 0 for the normal syntax of this instruction, it must assemble the encoding that adds 0 to the Align(PC,4) value of the instruction. The encoding that subtracts 0 from the Align(PC,4) value cannot be specified by the normal syntax."

So decoding "0x00 0x00 0x4f 0xe2" as "sub r0, pc, #0" follows this rule: the encoding that subtracts 0 has no normal ADR spelling.
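The rule quoted above can be expressed as a small selection step: a non-negative offset uses the ADD form (A1) and a negative offset uses the SUB form (A2) with the magnitude. An offset of 0 therefore always lands on A1.

This is a minimal sketch. `struct adr_choice` and `choose_adr_encoding` are hypothetical names, and the sketch does not encode the immediate itself. Note also that -2147483648, printed for `#-0x0` in the A2 reproducer, is INT32_MIN. That suggests the assembler tracks "#-0" as a special value, which this sketch does not model.

```c
#include <stdint.h>

/* Hypothetical result type: which A32 ADR encoding to use and the
 * non-negative magnitude that must then go into the immediate field. */
struct adr_choice {
    int is_sub;          /* 0: encoding A1 (ADD Rd, PC, #imm); 1: A2 (SUB) */
    uint32_t magnitude;  /* |offset| */
};

/* Apply the A4.2.2 rule: a computed offset of 0 must use the ADD form,
 * so "SUB Rd, PC, #0" is never produced from normal ADR syntax.
 * Negative offsets use the SUB form with the magnitude. */
static struct adr_choice choose_adr_encoding(int32_t offset) {
    struct adr_choice c;
    c.is_sub = offset < 0;
    /* Negate in unsigned arithmetic so INT32_MIN does not overflow. */
    c.magnitude = offset < 0 ? 0u - (uint32_t)offset : (uint32_t)offset;
    return c;
}
```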

llvmbot · 2012-07-23T11:16:49Z

For Thumb-2, the ARM ARM says:

"It is recommended that the alternative syntax forms are avoided where possible. However, the only possible syntax for encoding T2 with all immediate bits zero is `SUB <Rd>,PC,#0`."
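For comparison with A32, here is a sketch of the 32-bit Thumb-2 ADR encodings. It assumes the ARMv7 layouts: T3 is ADDW Rd, PC, #imm12 (first halfword 0xF20F) and T2 is SUBW Rd, PC, #imm12 (first halfword 0xF2AF). Unlike A32, the 12-bit value is a plain constant split as i:imm3:imm8.

`encode_adr_t32` is a hypothetical helper. It applies the same "offset 0 uses the ADD form" choice, so the all-zero T2 encoding is never produced.

```c
#include <stdint.h>

/* Encode ADR as two Thumb-2 halfwords (first halfword in hw[0]).
 * Returns 1 on success, 0 if the offset is out of range or Rd is
 * SP/PC (UNPREDICTABLE for these encodings). */
static int encode_adr_t32(unsigned rd, int32_t offset, uint16_t hw[2]) {
    int sub = offset < 0;                     /* offset 0 -> T3 (ADD form) */
    uint32_t mag = sub ? 0u - (uint32_t)offset : (uint32_t)offset;
    if (mag > 0xFFF) return 0;                /* imm12 range is 0..4095 */
    if (rd > 15 || rd == 13 || rd == 15) return 0;
    uint16_t i = (mag >> 11) & 1;             /* top bit of imm12 */
    uint16_t imm3 = (mag >> 8) & 7;           /* next three bits */
    uint16_t imm8 = mag & 0xFF;               /* low eight bits */
    hw[0] = (uint16_t)((sub ? 0xF2AF : 0xF20F) | (i << 10));
    hw[1] = (uint16_t)((imm3 << 12) | (rd << 8) | imm8);
    return 1;
}
```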

llvmbot · 2012-07-23T11:31:57Z

Here is an even more general case:

$ echo 'adr r0,#0x4' | llvm-mc -triple=armv7 -show-inst -show-encoding
.section __TEXT,__text,regular,pure_instructions
adr r0, #4 @ encoding: [0x04,0x00,0x8f,0xe2]
@ <MCInst #30 ADR
@
@
@
@ >
$ echo 0x04 0x00 0x8f 0xe2 | llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
.section __TEXT,__text,regular,pure_instructions
add r0, pc, #4 @ encoding: [0x04,0x00,0x8f,0xe2]
@ <MCInst #24 ADDri
@
@
@
@
@
@ >

llvmbot · 2012-07-24T07:34:59Z

/* the following example is incorrect */
$ echo 'adr r5,#0x1234' | llvm-mc -triple=armv7 -show-inst -show-encoding
.section __TEXT,__text,regular,pure_instructions
adr r5, #4660 @ encoding: [0x34,0x52,0xcf,0xe2]
@ <MCInst #30 ADR
@
@
@
@ >
/* the following example is correct */
$ echo 0x34 0x52 0xcf 0xe2 | llvm-mc -triple=armv7 -show-inst -show-encoding -disassemble
.section __TEXT,__text,regular,pure_instructions
sbc r5, pc, #1073741827 @ encoding: [0x0d,0x51,0xcf,0xe2]
@ <MCInst #330 SBCri
@
@
@
@
@
@ >

/* the following example is correct */
$ echo 'sbc r5, pc, #1073741827' | llvm-mc -triple=armv7 -show-inst -show-encoding
.section __TEXT,__text,regular,pure_instructions
sbc r5, pc, #1073741827 @ encoding: [0x0d,0x51,0xcf,0xe2]
@ <MCInst #330 SBCri
@
@
@
@
@
@ >

This result is quite weird. There are two issues.

First, #0x1234 cannot actually be encoded in an ADR instruction. The assembler silently keeps only the low 12 bits, i.e. 0x234, and an error or warning message is needed.

Second, the 12-bit field holds a modified immediate constant (an 8-bit value rotated right by an even amount), not a plain constant. So even 0x234 is not encoded correctly by copying it into that field verbatim: those bits decode as 0x40000003 (1073741827), as the disassembly above shows.
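The second point can be checked with the A32 modified-immediate rule (ARMExpandImm in the ARM ARM, ignoring carry-out): the value is imm8 rotated right by 2*rot. `expand_imm12` and `encode_imm12` are hypothetical names for this sketch, not LLVM functions. The encoder is a simple search over the 16 rotations.

```c
#include <stdint.h>

/* Expand an A32 imm12 field (rot:imm8): imm8 rotated right by 2*rot. */
static uint32_t expand_imm12(uint32_t imm12) {
    uint32_t imm8 = imm12 & 0xFF;
    unsigned n = 2 * ((imm12 >> 8) & 0xF);
    return n ? (imm8 >> n) | (imm8 << (32 - n)) : imm8;
}

/* Inverse search: find rot:imm8 with expand_imm12(result) == v.
 * Returns 1 and stores the field in *imm12, or 0 if v has no encoding.
 * Picks the smallest rotation that works. */
static int encode_imm12(uint32_t v, uint32_t *imm12) {
    for (unsigned rot = 0; rot < 16; rot++) {
        unsigned n = 2 * rot;
        /* Rotating left by n undoes a right rotation by n. */
        uint32_t imm8 = n ? (v << n) | (v >> (32 - n)) : v;
        if (imm8 <= 0xFF) {
            *imm12 = (rot << 8) | imm8;
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, the raw field 0x234 expands to 0x40000003, while the value 0x234 itself is representable as the field 0xF8D. The value 0x1234 has no encoding at all. Encoding 0x40000003 gives 0x10D, which matches the imm12 bits of the [0x0d,0x51,0xcf,0xe2] re-encoding printed above.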

rengolin · 2016-01-18T19:16:13Z

Hi Richard, is this still a problem?

rengolin · 2021-11-26T18:57:49Z

mentioned in issue llvm/llvm-bugzilla-archive#18926


llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
