LLVM 23.0.0git
X86Disassembler.cpp
//===-- X86Disassembler.cpp - Disassembler for x86 and x86_64 -------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file is part of the X86 Disassembler.
// It contains code to translate the data produced by the decoder into
// MCInsts.
//
//
// The X86 disassembler is a table-driven disassembler for the 16-, 32-, and
// 64-bit X86 instruction sets. The main decode sequence for an assembly
// instruction in this disassembler is:
//
// 1. Read the prefix bytes and determine the attributes of the instruction.
//    These attributes, recorded in enum attributeBits
//    (X86DisassemblerDecoderCommon.h), form a bitmask. The table CONTEXTS_SYM
//    provides a mapping from bitmasks to contexts, which are represented by
//    enum InstructionContext (ibid.).
//
// 2. Read the opcode, and determine what kind of opcode it is. The
//    disassembler distinguishes four kinds of opcodes, which are enumerated in
//    OpcodeType (X86DisassemblerDecoderCommon.h): one-byte (0xnn), two-byte
//    (0x0f 0xnn), three-byte-38 (0x0f 0x38 0xnn), or three-byte-3a
//    (0x0f 0x3a 0xnn). Mandatory prefixes are treated as part of the context.
//
// 3. Depending on the opcode type, look in one of four ClassDecision structures
//    (X86DisassemblerDecoderCommon.h). Use the opcode class to determine which
//    OpcodeDecision (ibid.) to look the opcode up in. Look up the opcode to get
//    a ModRMDecision (ibid.).
//
// 4. Some instructions, such as escape opcodes or extended opcodes, or even
//    instructions that have ModRM*Reg / ModRM*Mem forms in LLVM, need the
//    ModR/M byte to complete decode. The ModRMDecision's type is an entry from
//    ModRMDecisionType (X86DisassemblerDecoderCommon.h) that indicates whether
//    the ModR/M byte is required and how to interpret it.
//
// 5. After resolving the ModRMDecision, the disassembler has a unique ID
//    of type InstrUID (X86DisassemblerDecoderCommon.h). Looking this ID up in
//    INSTRUCTIONS_SYM yields the name of the instruction and the encodings and
//    meanings of its operands.
//
// 6. For each operand, its encoding is an entry from OperandEncoding
//    (X86DisassemblerDecoderCommon.h) and its type is an entry from
//    OperandType (ibid.). The encoding indicates how to read it from the
//    instruction; the type indicates how to interpret the value once it has
//    been read. For example, a register operand could be stored in the R/M
//    field of the ModR/M byte, the REG field of the ModR/M byte, or added to
//    the main opcode. This is orthogonal to its meaning (a GPR or an XMM
//    register, for instance). Given this information, the operands can be
//    extracted and interpreted.
//
// 7. As the last step, the disassembler translates the instruction information
//    and operands into a format understandable by the client - in this case, an
//    MCInst for use by the MC infrastructure.
//
// The disassembler is broken broadly into two parts: the table emitter that
// emits the instruction decode tables discussed above during compilation, and
// the disassembler itself. The table emitter is documented in more detail in
// utils/TableGen/X86DisassemblerEmitter.h.
//
// X86Disassembler.cpp contains the code responsible for step 7, and for
//   invoking the decoder to execute steps 1-6.
// X86DisassemblerDecoderCommon.h contains the definitions needed by both the
//   table emitter and the disassembler.
// X86DisassemblerDecoder.h contains the public interface of the decoder,
//   factored out into C for possible use by other projects.
// X86DisassemblerDecoder.c contains the source code of the decoder, which is
//   responsible for steps 1-6.
//
//===----------------------------------------------------------------------===//
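// Illustration of step 1 (not part of the LLVM sources): a minimal sketch of
// how a lookup table can collapse attribute bitmasks into a much smaller set
// of contexts. Every Toy* name and bit assignment below is an assumption made
// up for this example; the real enums live in X86DisassemblerDecoderCommon.h
// and the real table is generated at build time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical attribute bits, standing in for enum attributeBits.
enum ToyAttr : unsigned {
  TOY_ATTR_64BIT = 1u << 0,
  TOY_ATTR_OPSIZE = 1u << 1,
  TOY_ATTR_REXW = 1u << 2,
  TOY_ATTR_MAX = 1u << 3 // number of distinct masks
};

// Hypothetical contexts, standing in for enum InstructionContext.
enum ToyContext : uint8_t {
  TOY_IC,
  TOY_IC_64BIT,
  TOY_IC_OPSIZE,
  TOY_IC_64BIT_OPSIZE,
  TOY_IC_64BIT_REXW
};

// One entry per attribute mask; several masks share a context.
static const ToyContext toyContexts[TOY_ATTR_MAX] = {
    TOY_IC,              // no attributes
    TOY_IC_64BIT,        // 64BIT
    TOY_IC_OPSIZE,       // OPSIZE
    TOY_IC_64BIT_OPSIZE, // 64BIT | OPSIZE
    TOY_IC,              // REXW alone has no meaning outside 64-bit mode
    TOY_IC_64BIT_REXW,   // 64BIT | REXW
    TOY_IC_OPSIZE,       // OPSIZE | REXW (REXW ignored, as above)
    TOY_IC_64BIT_REXW    // all three: REX.W wins over the 0x66 prefix
};

inline ToyContext toyContextFor(unsigned attrMask) {
  return toyContexts[attrMask & (TOY_ATTR_MAX - 1)];
}
```

// In this toy table eight masks map to five contexts, which is the same
// space saving the per-context opcode tables rely on.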
75
80#include "llvm-c/Visibility.h"
81#include "llvm/MC/MCContext.h"
83#include "llvm/MC/MCExpr.h"
84#include "llvm/MC/MCInst.h"
85#include "llvm/MC/MCInstrInfo.h"
89#include "llvm/Support/Debug.h"
90#include "llvm/Support/Format.h"
92
93using namespace llvm;
94using namespace llvm::X86Disassembler;
95
96#define DEBUG_TYPE "x86-disassembler"
97
98#define debug(s) LLVM_DEBUG(dbgs() << __LINE__ << ": " << s);
99
100// Specifies whether a ModR/M byte is needed and (if so) which
101// instruction each possible value of the ModR/M byte corresponds to. Once
102// this information is known, we have narrowed down to a single instruction.
107
108// Specifies which set of ModR/M->instruction tables to look at
109// given a particular opcode.
113
114// Specifies which opcode->instruction tables to look at given
115// a particular context (set of attributes). Since there are many possible
116// contexts, the decoder first uses CONTEXTS_SYM to determine which context
117// applies given a specific set of attributes. Hence there are only IC_max
118// entries in this table, rather than 2^(ATTR_max).
122
123#include "X86GenDisassemblerTables.inc"
124
125static const ModRMDecision &
127 const OpcodeDecision *Decision;
128 switch (Type) {
129 case ONEBYTE:
130 Decision = &ONEBYTE_SYM.opcodeDecisions[Context];
131 break;
132 case TWOBYTE:
133 Decision = &TWOBYTE_SYM.opcodeDecisions[Context];
134 break;
135 case THREEBYTE_38:
136 Decision = &THREEBYTE38_SYM.opcodeDecisions[Context];
137 break;
138 case THREEBYTE_3A:
139 Decision = &THREEBYTE3A_SYM.opcodeDecisions[Context];
140 break;
141 default:
142 static_assert(XOP8_MAP == 4 && MAP7 == 11);
143 unsigned DecisionIndex =
145 Decision = &SPARSE_OPCODE_DECISIONS_SYM[DecisionIndex];
146 break;
147 }
148 return Decision->modRMDecisions[Opcode];
149}
150
152decodeModRM(const ModRMDecision &Decision, uint8_t ModRM) {
153 switch (Decision.modrm_type) {
154 default:
155 llvm_unreachable("Corrupt table! Unknown modrm_type");
156 return 0;
157 case MODRM_ONEENTRY:
158 llvm_unreachable("MODRM_ONEENTRY does not require a ModR/M byte");
159 case MODRM_SPLITRM:
160 if (modFromModRM(ModRM) == 0x3)
161 return modRMTable[Decision.instructionIDs + 1];
162 return modRMTable[Decision.instructionIDs];
163 case MODRM_SPLITREG:
164 if (modFromModRM(ModRM) == 0x3)
165 return modRMTable[Decision.instructionIDs + ((ModRM & 0x38) >> 3) + 8];
166 return modRMTable[Decision.instructionIDs + ((ModRM & 0x38) >> 3)];
167 case MODRM_SPLITMISC:
168 if (modFromModRM(ModRM) == 0x3)
169 return modRMTable[Decision.instructionIDs + (ModRM & 0x3f) + 8];
170 return modRMTable[Decision.instructionIDs + ((ModRM & 0x38) >> 3)];
171 case MODRM_FULL:
172 return modRMTable[Decision.instructionIDs + ModRM];
173 }
174}
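
// Illustration (not part of the LLVM sources): a minimal sketch of the
// ModR/M field split that the MODRM_SPLITREG case above relies on. The helper
// names are hypothetical; the file itself uses modFromModRM() and related
// helpers. The byte layout is mod (bits 7-6), reg (bits 5-3), rm (bits 2-0).

```cpp
#include <cassert>
#include <cstdint>

inline uint8_t modField(uint8_t modRM) { return (modRM >> 6) & 0x3; }
inline uint8_t regField(uint8_t modRM) { return (modRM >> 3) & 0x7; }
inline uint8_t rmField(uint8_t modRM) { return modRM & 0x7; }

// Offset into a 16-entry split-by-reg group: entries 0-7 are the memory
// forms indexed by reg, entries 8-15 are the register forms (mod == 3).
inline unsigned splitRegOffset(uint8_t modRM) {
  return regField(modRM) + (modField(modRM) == 0x3 ? 8u : 0u);
}
```

// For example, 0xd8 is mod = 3, reg = 3, rm = 0, so it selects entry 11 of
// the group, while 0x18 (mod = 0, reg = 3) selects entry 3.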
175
// Reads the byte at the reader cursor without advancing the cursor.
// Returns true on failure (no bytes left), false on success.
static bool peek(struct InternalInstruction *insn, uint8_t &byte) {
  uint64_t offset = insn->readerCursor - insn->startLocation;
  if (offset >= insn->bytes.size())
    return true;
  byte = insn->bytes[offset];
  return false;
}
183
184template <typename T> static bool consume(InternalInstruction *insn, T &ptr) {
185 auto r = insn->bytes;
186 uint64_t offset = insn->readerCursor - insn->startLocation;
187 if (offset + sizeof(T) > r.size())
188 return true;
190 insn->readerCursor += sizeof(T);
191 return false;
192}
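
// Illustration (not part of the LLVM sources): a minimal sketch of a
// bounds-checked little-endian read in the style of peek() and consume().
// ByteCursor and consumeLE are hypothetical names; the sketch assembles the
// value byte by byte instead of using LLVM's endian support library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the reader state kept in InternalInstruction.
struct ByteCursor {
  std::vector<uint8_t> bytes;
  size_t offset = 0;
};

// Returns true on failure, matching the convention used by peek().
template <typename T> bool consumeLE(ByteCursor &c, T &out) {
  static_assert(std::is_integral<T>::value, "integral types only");
  if (c.offset + sizeof(T) > c.bytes.size())
    return true; // not enough bytes left; the cursor is left unchanged
  using U = typename std::make_unsigned<T>::type;
  U value = 0;
  for (size_t i = 0; i < sizeof(T); ++i)
    value |= static_cast<U>(static_cast<U>(c.bytes[c.offset + i]) << (8 * i));
  out = static_cast<T>(value);
  c.offset += sizeof(T);
  return false;
}
```

// Reading a signed type this way yields a sign-carrying value, which is what
// a displacement reader needs for 8-bit and 16-bit displacements.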
193
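// A REX prefix is a single byte in the range 0x40-0x4f. It is recognized only
// in 64-bit mode.
static bool isREX(struct InternalInstruction *insn, uint8_t prefix) {
  return insn->mode == MODE_64BIT && prefix >= 0x40 && prefix <= 0x4f;
}

// A REX2 prefix starts with the byte 0xd5. It is recognized only in 64-bit
// mode.
static bool isREX2(struct InternalInstruction *insn, uint8_t prefix) {
  return insn->mode == MODE_64BIT && prefix == 0xd5;
}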
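
// Illustration (not part of the LLVM sources): a minimal sketch of how the
// four payload bits of a REX prefix can be unpacked. RexBits, isRexByte and
// decodeRex are hypothetical names; the file itself uses wFromREX(),
// rFromREX(), xFromREX() and bFromREX(). The mode check done by isREX() is
// left to the caller here.

```cpp
#include <cassert>
#include <cstdint>

struct RexBits {
  bool w; // 64-bit operand size
  bool r; // extends the ModR/M reg field
  bool x; // extends the SIB index field
  bool b; // extends the ModR/M rm or SIB base field
};

inline bool isRexByte(uint8_t byte) { return (byte & 0xf0) == 0x40; }

inline RexBits decodeRex(uint8_t rex) {
  RexBits bits;
  bits.w = (rex & 0x8) != 0;
  bits.r = (rex & 0x4) != 0;
  bits.x = (rex & 0x2) != 0;
  bits.b = (rex & 0x1) != 0;
  return bits;
}
```

// For example, 0x4d (binary 0100 1101) sets W, R and B and leaves X clear.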
201
202// Consumes all of an instruction's prefix bytes, and marks the
203// instruction as having them. Also sets the instruction's default operand,
204// address, and other relevant data sizes to report operands correctly.
205//
206// insn must not be empty.
207static int readPrefixes(struct InternalInstruction *insn) {
208 bool isPrefix = true;
209 uint8_t byte = 0;
211
212 LLVM_DEBUG(dbgs() << "readPrefixes()");
213
214 while (isPrefix) {
215 // If we fail reading prefixes, just stop here and let the opcode reader
216 // deal with it.
217 if (consume(insn, byte))
218 break;
219
220 // If the byte is a LOCK/REP/REPNE prefix and not a part of the opcode, then
221 // break and let it be disassembled as a normal "instruction".
222 if (insn->readerCursor - 1 == insn->startLocation && byte == 0xf0) // LOCK
223 break;
224
225 if ((byte == 0xf2 || byte == 0xf3) && !peek(insn, nextByte)) {
226 // If the byte is 0xf2 or 0xf3, and any of the following conditions are
227 // met:
228 // - it is followed by a LOCK (0xf0) prefix
229 // - it is followed by an xchg instruction
230 // then it should be disassembled as an xacquire/xrelease, not repne/rep.
231 if (((nextByte == 0xf0) ||
232 ((nextByte & 0xfe) == 0x86 || (nextByte & 0xf8) == 0x90))) {
233 insn->xAcquireRelease = true;
234 if (!(byte == 0xf3 && nextByte == 0x90)) // PAUSE instruction support
235 break;
236 }
237 // Also if the byte is 0xf3, and the following condition is met:
238 // - it is followed by a "mov mem, reg" (opcode 0x88/0x89) or
239 // "mov mem, imm" (opcode 0xc6/0xc7) instructions.
240 // then it should be disassembled as an xrelease not rep.
241 if (byte == 0xf3 && (nextByte == 0x88 || nextByte == 0x89 ||
242 nextByte == 0xc6 || nextByte == 0xc7)) {
243 insn->xAcquireRelease = true;
244 break;
245 }
246 if (isREX(insn, nextByte)) {
247 uint8_t nnextByte;
248 // Go to REX prefix after the current one
249 if (consume(insn, nnextByte))
250 return -1;
251 // We should be able to read next byte after REX prefix
252 if (peek(insn, nnextByte))
253 return -1;
254 --insn->readerCursor;
255 }
256 }
257
258 switch (byte) {
259 case 0xf0: // LOCK
260 insn->hasLockPrefix = true;
261 break;
262 case 0xf2: // REPNE/REPNZ
263 case 0xf3: { // REP or REPE/REPZ
265 if (peek(insn, nextByte))
266 break;
267 // TODO:
268 // 1. There could be several 0x66
269 // 2. if (nextByte == 0x66) and nextNextByte != 0x0f then
270 // it's not mandatory prefix
271 // 3. if (nextByte >= 0x40 && nextByte <= 0x4f) it's REX and we need
272 // 0x0f exactly after it to be mandatory prefix
273 // 4. if (nextByte == 0xd5) it's REX2 and we need
274 // 0x0f exactly after it to be mandatory prefix
275 if (isREX(insn, nextByte) || isREX2(insn, nextByte) || nextByte == 0x0f ||
276 nextByte == 0x66)
277 // The last 0xf2/0xf3 byte is the mandatory prefix
278 insn->mandatoryPrefix = byte;
279 insn->repeatPrefix = byte;
280 break;
281 }
282 case 0x2e: // CS segment override -OR- Branch not taken
284 break;
285 case 0x36: // SS segment override -OR- Branch taken
287 break;
288 case 0x3e: // DS segment override
290 break;
291 case 0x26: // ES segment override
293 break;
294 case 0x64: // FS segment override
296 break;
297 case 0x65: // GS segment override
299 break;
300 case 0x66: { // Operand-size override
302 insn->hasOpSize = true;
303 if (peek(insn, nextByte))
304 break;
305 // 0x66 can't overwrite existing mandatory prefix and should be ignored
306 if (!insn->mandatoryPrefix && (nextByte == 0x0f || isREX(insn, nextByte)))
307 insn->mandatoryPrefix = byte;
308 break;
309 }
310 case 0x67: // Address-size override
311 insn->hasAdSize = true;
312 break;
313 default: // Not a prefix byte
314 isPrefix = false;
315 break;
316 }
317
318 if (isREX(insn, byte)) {
319 insn->rexPrefix = byte;
320 isPrefix = true;
321 LLVM_DEBUG(dbgs() << format("Found REX prefix 0x%hhx", byte));
322 } else if (isPrefix) {
323 insn->rexPrefix = 0;
324 }
325
326 if (isPrefix)
327 LLVM_DEBUG(dbgs() << format("Found prefix 0x%hhx", byte));
328 }
329
331
332 if (byte == 0x62) {
333 uint8_t byte1, byte2;
334 if (consume(insn, byte1)) {
335 LLVM_DEBUG(dbgs() << "Couldn't read second byte of EVEX prefix");
336 return -1;
337 }
338
339 if (peek(insn, byte2)) {
340 LLVM_DEBUG(dbgs() << "Couldn't read third byte of EVEX prefix");
341 return -1;
342 }
343
344 if ((insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)) {
346 } else {
347 --insn->readerCursor; // unconsume byte1
348 --insn->readerCursor; // unconsume byte
349 }
350
351 if (insn->vectorExtensionType == TYPE_EVEX) {
352 insn->vectorExtensionPrefix[0] = byte;
353 insn->vectorExtensionPrefix[1] = byte1;
354 if (consume(insn, insn->vectorExtensionPrefix[2])) {
355 LLVM_DEBUG(dbgs() << "Couldn't read third byte of EVEX prefix");
356 return -1;
357 }
358 if (consume(insn, insn->vectorExtensionPrefix[3])) {
359 LLVM_DEBUG(dbgs() << "Couldn't read fourth byte of EVEX prefix");
360 return -1;
361 }
362
363 if (insn->mode == MODE_64BIT) {
364 // We simulate the REX prefix for simplicity's sake
365 insn->rexPrefix = 0x40 |
366 (wFromEVEX3of4(insn->vectorExtensionPrefix[2]) << 3) |
367 (rFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 2) |
368 (xFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 1) |
369 (bFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 0);
370
371 // We simulate the REX2 prefix for simplicity's sake
372 insn->rex2ExtensionPrefix[1] =
373 (r2FromEVEX2of4(insn->vectorExtensionPrefix[1]) << 6) |
374 (uFromEVEX3of4(insn->vectorExtensionPrefix[2]) << 5) |
375 (b2FromEVEX2of4(insn->vectorExtensionPrefix[1]) << 4);
376 }
377
379 dbgs() << format(
380 "Found EVEX prefix 0x%hhx 0x%hhx 0x%hhx 0x%hhx",
382 insn->vectorExtensionPrefix[2], insn->vectorExtensionPrefix[3]));
383 }
384 } else if (byte == 0xc4) {
385 uint8_t byte1;
386 if (peek(insn, byte1)) {
387 LLVM_DEBUG(dbgs() << "Couldn't read second byte of VEX");
388 return -1;
389 }
390
391 if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)
393 else
394 --insn->readerCursor;
395
396 if (insn->vectorExtensionType == TYPE_VEX_3B) {
397 insn->vectorExtensionPrefix[0] = byte;
398 consume(insn, insn->vectorExtensionPrefix[1]);
399 consume(insn, insn->vectorExtensionPrefix[2]);
400
401 // We simulate the REX prefix for simplicity's sake
402
403 if (insn->mode == MODE_64BIT)
404 insn->rexPrefix = 0x40 |
405 (wFromVEX3of3(insn->vectorExtensionPrefix[2]) << 3) |
406 (rFromVEX2of3(insn->vectorExtensionPrefix[1]) << 2) |
407 (xFromVEX2of3(insn->vectorExtensionPrefix[1]) << 1) |
408 (bFromVEX2of3(insn->vectorExtensionPrefix[1]) << 0);
409
410 LLVM_DEBUG(dbgs() << format("Found VEX prefix 0x%hhx 0x%hhx 0x%hhx",
411 insn->vectorExtensionPrefix[0],
412 insn->vectorExtensionPrefix[1],
413 insn->vectorExtensionPrefix[2]));
414 }
415 } else if (byte == 0xc5) {
416 uint8_t byte1;
417 if (peek(insn, byte1)) {
418 LLVM_DEBUG(dbgs() << "Couldn't read second byte of VEX");
419 return -1;
420 }
421
422 if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)
424 else
425 --insn->readerCursor;
426
427 if (insn->vectorExtensionType == TYPE_VEX_2B) {
428 insn->vectorExtensionPrefix[0] = byte;
429 consume(insn, insn->vectorExtensionPrefix[1]);
430
431 if (insn->mode == MODE_64BIT)
432 insn->rexPrefix =
433 0x40 | (rFromVEX2of2(insn->vectorExtensionPrefix[1]) << 2);
434
435 switch (ppFromVEX2of2(insn->vectorExtensionPrefix[1])) {
436 default:
437 break;
438 case VEX_PREFIX_66:
439 insn->hasOpSize = true;
440 break;
441 }
442
443 LLVM_DEBUG(dbgs() << format("Found VEX prefix 0x%hhx 0x%hhx",
444 insn->vectorExtensionPrefix[0],
445 insn->vectorExtensionPrefix[1]));
446 }
447 } else if (byte == 0x8f) {
448 uint8_t byte1;
449 if (peek(insn, byte1)) {
450 LLVM_DEBUG(dbgs() << "Couldn't read second byte of XOP");
451 return -1;
452 }
453
454 if ((byte1 & 0x38) != 0x0) // 0 in these 3 bits is a POP instruction.
456 else
457 --insn->readerCursor;
458
459 if (insn->vectorExtensionType == TYPE_XOP) {
460 insn->vectorExtensionPrefix[0] = byte;
461 consume(insn, insn->vectorExtensionPrefix[1]);
462 consume(insn, insn->vectorExtensionPrefix[2]);
463
464 // We simulate the REX prefix for simplicity's sake
465
466 if (insn->mode == MODE_64BIT)
467 insn->rexPrefix = 0x40 |
468 (wFromXOP3of3(insn->vectorExtensionPrefix[2]) << 3) |
469 (rFromXOP2of3(insn->vectorExtensionPrefix[1]) << 2) |
470 (xFromXOP2of3(insn->vectorExtensionPrefix[1]) << 1) |
471 (bFromXOP2of3(insn->vectorExtensionPrefix[1]) << 0);
472
473 switch (ppFromXOP3of3(insn->vectorExtensionPrefix[2])) {
474 default:
475 break;
476 case VEX_PREFIX_66:
477 insn->hasOpSize = true;
478 break;
479 }
480
481 LLVM_DEBUG(dbgs() << format("Found XOP prefix 0x%hhx 0x%hhx 0x%hhx",
482 insn->vectorExtensionPrefix[0],
483 insn->vectorExtensionPrefix[1],
484 insn->vectorExtensionPrefix[2]));
485 }
486 } else if (isREX2(insn, byte)) {
487 uint8_t byte1;
488 if (peek(insn, byte1)) {
489 LLVM_DEBUG(dbgs() << "Couldn't read second byte of REX2");
490 return -1;
491 }
492 insn->rex2ExtensionPrefix[0] = byte;
493 consume(insn, insn->rex2ExtensionPrefix[1]);
494
495 // We simulate the REX prefix for simplicity's sake
496 insn->rexPrefix = 0x40 | (wFromREX2(insn->rex2ExtensionPrefix[1]) << 3) |
497 (rFromREX2(insn->rex2ExtensionPrefix[1]) << 2) |
498 (xFromREX2(insn->rex2ExtensionPrefix[1]) << 1) |
499 (bFromREX2(insn->rex2ExtensionPrefix[1]) << 0);
500 LLVM_DEBUG(dbgs() << format("Found REX2 prefix 0x%hhx 0x%hhx",
501 insn->rex2ExtensionPrefix[0],
502 insn->rex2ExtensionPrefix[1]));
503 } else
504 --insn->readerCursor;
505
506 if (insn->mode == MODE_16BIT) {
507 insn->registerSize = (insn->hasOpSize ? 4 : 2);
508 insn->addressSize = (insn->hasAdSize ? 4 : 2);
509 insn->displacementSize = (insn->hasAdSize ? 4 : 2);
510 insn->immediateSize = (insn->hasOpSize ? 4 : 2);
511 } else if (insn->mode == MODE_32BIT) {
512 insn->registerSize = (insn->hasOpSize ? 2 : 4);
513 insn->addressSize = (insn->hasAdSize ? 2 : 4);
514 insn->displacementSize = (insn->hasAdSize ? 2 : 4);
515 insn->immediateSize = (insn->hasOpSize ? 2 : 4);
516 } else if (insn->mode == MODE_64BIT) {
517 insn->displacementSize = 4;
518 if (insn->rexPrefix && wFromREX(insn->rexPrefix)) {
519 insn->registerSize = 8;
520 insn->addressSize = (insn->hasAdSize ? 4 : 8);
521 insn->immediateSize = 4;
522 insn->hasOpSize = false;
523 } else {
524 insn->registerSize = (insn->hasOpSize ? 2 : 4);
525 insn->addressSize = (insn->hasAdSize ? 4 : 8);
526 insn->immediateSize = (insn->hasOpSize ? 2 : 4);
527 }
528 }
529
530 return 0;
531}
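
// Illustration (not part of the LLVM sources): a minimal sketch of the
// register-size rule applied at the end of readPrefixes(). ToyMode and
// operandSizeInBytes are hypothetical names; the booleans stand in for
// insn->hasOpSize and the W bit of the (real or simulated) REX prefix.

```cpp
#include <cassert>

enum class ToyMode { Bits16, Bits32, Bits64 };

inline unsigned operandSizeInBytes(ToyMode mode, bool hasOpSize, bool rexW) {
  switch (mode) {
  case ToyMode::Bits16:
    return hasOpSize ? 4 : 2; // 0x66 switches 16-bit code to 32-bit operands
  case ToyMode::Bits32:
    return hasOpSize ? 2 : 4; // 0x66 switches 32-bit code to 16-bit operands
  case ToyMode::Bits64:
    return rexW ? 8 : (hasOpSize ? 2 : 4); // REX.W overrides 0x66
  }
  return 0; // not reached for the three modes above
}
```

// Note that in 64-bit mode the default operand size stays at 4 bytes; only
// REX.W selects 8-byte operands, and it takes precedence over 0x66.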
532
533// Consumes the SIB byte to determine addressing information.
534static int readSIB(struct InternalInstruction *insn) {
535 SIBBase sibBaseBase = SIB_BASE_NONE;
536 uint8_t index, base;
537
538 LLVM_DEBUG(dbgs() << "readSIB()");
539 switch (insn->addressSize) {
540 case 2:
541 default:
542 llvm_unreachable("SIB-based addressing doesn't work in 16-bit mode");
543 case 4:
544 insn->sibIndexBase = SIB_INDEX_EAX;
545 sibBaseBase = SIB_BASE_EAX;
546 break;
547 case 8:
548 insn->sibIndexBase = SIB_INDEX_RAX;
549 sibBaseBase = SIB_BASE_RAX;
550 break;
551 }
552
553 if (consume(insn, insn->sib))
554 return -1;
555
556 index = indexFromSIB(insn->sib) | (xFromREX(insn->rexPrefix) << 3) |
557 (x2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
558
559 if (index == 0x4) {
560 insn->sibIndex = SIB_INDEX_NONE;
561 } else {
562 insn->sibIndex = (SIBIndex)(insn->sibIndexBase + index);
563 }
564
565 insn->sibScale = 1 << scaleFromSIB(insn->sib);
566
567 base = baseFromSIB(insn->sib) | (bFromREX(insn->rexPrefix) << 3) |
568 (b2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
569
570 switch (base) {
571 case 0x5:
572 case 0xd:
573 switch (modFromModRM(insn->modRM)) {
574 case 0x0:
576 insn->sibBase = SIB_BASE_NONE;
577 break;
578 case 0x1:
580 insn->sibBase = (SIBBase)(sibBaseBase + base);
581 break;
582 case 0x2:
584 insn->sibBase = (SIBBase)(sibBaseBase + base);
585 break;
586 default:
587 llvm_unreachable("Cannot have Mod = 0b11 and a SIB byte");
588 }
589 break;
590 default:
591 insn->sibBase = (SIBBase)(sibBaseBase + base);
592 break;
593 }
594
595 return 0;
596}
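
// Illustration (not part of the LLVM sources): a minimal sketch of the SIB
// field split performed by readSIB(). SibFields and decodeSib are
// hypothetical names, and the REX2 extension bits handled above are left out
// to keep the example short. The byte layout is scale (bits 7-6), index
// (bits 5-3), base (bits 2-0).

```cpp
#include <cassert>
#include <cstdint>

struct SibFields {
  unsigned scale; // 1, 2, 4, or 8
  unsigned index; // register number, including the REX.X extension bit
  unsigned base;  // register number, including the REX.B extension bit
  bool hasIndex;  // false when the index field encodes "no index"
};

inline SibFields decodeSib(uint8_t sib, bool rexX, bool rexB) {
  SibFields f;
  f.scale = 1u << (sib >> 6);
  f.index = ((sib >> 3) & 0x7u) | (rexX ? 8u : 0u);
  f.base = (sib & 0x7u) | (rexB ? 8u : 0u);
  // As in readSIB(), the test is made after the extension bit is merged, so
  // index 12 (with REX.X set) is a real register while index 4 is not.
  f.hasIndex = f.index != 4;
  return f;
}
```

// For example, 0x88 decodes to scale 4, index 1, base 0, which in 32-bit
// addressing corresponds to [eax + ecx*4].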
597
// Reads the displacement selected by insn->eaDisplacement as a signed integer
// of that width and stores it in insn->displacement. Returns 0 on success and
// -1 if the instruction bytes run out.
static int readDisplacement(struct InternalInstruction *insn) {
  int8_t d8;
  int16_t d16;
  int32_t d32;
  LLVM_DEBUG(dbgs() << "readDisplacement()");

  insn->displacementOffset = insn->readerCursor - insn->startLocation;
  switch (insn->eaDisplacement) {
  case EA_DISP_NONE:
    break;
  case EA_DISP_8:
    if (consume(insn, d8))
      return -1;
    insn->displacement = d8;
    break;
  case EA_DISP_16:
    if (consume(insn, d16))
      return -1;
    insn->displacement = d16;
    break;
  case EA_DISP_32:
    if (consume(insn, d32))
      return -1;
    insn->displacement = d32;
    break;
  }

  return 0;
}
627
// Consumes all addressing information (ModR/M byte, SIB byte, and displacement).
629static int readModRM(struct InternalInstruction *insn) {
630 uint8_t mod, rm, reg;
631 LLVM_DEBUG(dbgs() << "readModRM()");
632
633 if (insn->consumedModRM)
634 return 0;
635
636 if (consume(insn, insn->modRM))
637 return -1;
638 insn->consumedModRM = true;
639
640 mod = modFromModRM(insn->modRM);
641 rm = rmFromModRM(insn->modRM);
642 reg = regFromModRM(insn->modRM);
643
644 // This goes by insn->registerSize to pick the correct register, which messes
645 // up if we're using (say) XMM or 8-bit register operands. That gets fixed in
646 // fixupReg().
647 switch (insn->registerSize) {
648 case 2:
649 insn->regBase = MODRM_REG_AX;
650 insn->eaRegBase = EA_REG_AX;
651 break;
652 case 4:
653 insn->regBase = MODRM_REG_EAX;
654 insn->eaRegBase = EA_REG_EAX;
655 break;
656 case 8:
657 insn->regBase = MODRM_REG_RAX;
658 insn->eaRegBase = EA_REG_RAX;
659 break;
660 }
661
662 reg |= (rFromREX(insn->rexPrefix) << 3) |
663 (r2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
664 rm |= (bFromREX(insn->rexPrefix) << 3) |
665 (b2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
666
667 if (insn->vectorExtensionType == TYPE_EVEX && insn->mode == MODE_64BIT)
668 reg |= r2FromEVEX2of4(insn->vectorExtensionPrefix[1]) << 4;
669
670 insn->reg = (Reg)(insn->regBase + reg);
671
672 switch (insn->addressSize) {
673 case 2: {
674 EABase eaBaseBase = EA_BASE_BX_SI;
675
676 switch (mod) {
677 case 0x0:
678 if (rm == 0x6) {
679 insn->eaBase = EA_BASE_NONE;
681 if (readDisplacement(insn))
682 return -1;
683 } else {
684 insn->eaBase = (EABase)(eaBaseBase + rm);
686 }
687 break;
688 case 0x1:
689 insn->eaBase = (EABase)(eaBaseBase + rm);
691 insn->displacementSize = 1;
692 if (readDisplacement(insn))
693 return -1;
694 break;
695 case 0x2:
696 insn->eaBase = (EABase)(eaBaseBase + rm);
698 if (readDisplacement(insn))
699 return -1;
700 break;
701 case 0x3:
702 insn->eaBase = (EABase)(insn->eaRegBase + rm);
703 if (readDisplacement(insn))
704 return -1;
705 break;
706 }
707 break;
708 }
709 case 4:
710 case 8: {
711 EABase eaBaseBase = (insn->addressSize == 4 ? EA_BASE_EAX : EA_BASE_RAX);
712
713 switch (mod) {
714 case 0x0:
715 insn->eaDisplacement = EA_DISP_NONE; // readSIB may override this
716 // In determining whether RIP-relative mode is used (rm=5),
717 // or whether a SIB byte is present (rm=4),
718 // the extension bits (REX.b and EVEX.x) are ignored.
719 switch (rm & 7) {
720 case 0x4: // SIB byte is present
721 insn->eaBase = (insn->addressSize == 4 ? EA_BASE_sib : EA_BASE_sib64);
722 if (readSIB(insn) || readDisplacement(insn))
723 return -1;
724 break;
725 case 0x5: // RIP-relative
726 insn->eaBase = EA_BASE_NONE;
728 if (readDisplacement(insn))
729 return -1;
730 break;
731 default:
732 insn->eaBase = (EABase)(eaBaseBase + rm);
733 break;
734 }
735 break;
736 case 0x1:
737 insn->displacementSize = 1;
738 [[fallthrough]];
739 case 0x2:
740 insn->eaDisplacement = (mod == 0x1 ? EA_DISP_8 : EA_DISP_32);
741 switch (rm & 7) {
742 case 0x4: // SIB byte is present
743 insn->eaBase = EA_BASE_sib;
744 if (readSIB(insn) || readDisplacement(insn))
745 return -1;
746 break;
747 default:
748 insn->eaBase = (EABase)(eaBaseBase + rm);
749 if (readDisplacement(insn))
750 return -1;
751 break;
752 }
753 break;
754 case 0x3:
756 insn->eaBase = (EABase)(insn->eaRegBase + rm);
757 break;
758 }
759 break;
760 }
761 } // switch (insn->addressSize)
762
763 return 0;
764}
765
766#define GENERIC_FIXUP_FUNC(name, base, prefix) \
767 static uint16_t name(struct InternalInstruction *insn, OperandType type, \
768 uint8_t index, uint8_t *valid) { \
769 *valid = 1; \
770 switch (type) { \
771 default: \
772 debug("Unhandled register type"); \
773 *valid = 0; \
774 return 0; \
775 case TYPE_Rv: \
776 return base + index; \
777 case TYPE_R8: \
778 if (insn->rexPrefix && index >= 4 && index <= 7) \
779 return prefix##_SPL + (index - 4); \
780 else \
781 return prefix##_AL + index; \
782 case TYPE_R16: \
783 return prefix##_AX + index; \
784 case TYPE_R32: \
785 return prefix##_EAX + index; \
786 case TYPE_R64: \
787 return prefix##_RAX + index; \
788 case TYPE_ZMM: \
789 return prefix##_ZMM0 + index; \
790 case TYPE_YMM: \
791 return prefix##_YMM0 + index; \
792 case TYPE_XMM: \
793 return prefix##_XMM0 + index; \
794 case TYPE_TMM: \
795 if (index > 7) \
796 *valid = 0; \
797 return prefix##_TMM0 + index; \
798 case TYPE_VK: \
799 index &= 0xf; \
800 if (index > 7) \
801 *valid = 0; \
802 return prefix##_K0 + index; \
803 case TYPE_VK_PAIR: \
804 if (index > 7) \
805 *valid = 0; \
806 return prefix##_K0_K1 + (index / 2); \
807 case TYPE_MM64: \
808 return prefix##_MM0 + (index & 0x7); \
809 case TYPE_SEGMENTREG: \
810 if ((index & 7) > 5) \
811 *valid = 0; \
812 return prefix##_ES + (index & 7); \
813 case TYPE_DEBUGREG: \
814 if (index > 15) \
815 *valid = 0; \
816 return prefix##_DR0 + index; \
817 case TYPE_CONTROLREG: \
818 if (index > 15) \
819 *valid = 0; \
820 return prefix##_CR0 + index; \
821 case TYPE_MVSIBX: \
822 return prefix##_XMM0 + index; \
823 case TYPE_MVSIBY: \
824 return prefix##_YMM0 + index; \
825 case TYPE_MVSIBZ: \
826 return prefix##_ZMM0 + index; \
827 } \
828 }
829
// Consult an operand type to determine the meaning of the reg or R/M field. If
// the operand is an XMM operand, for example, index 0 means XMM0 rather than
// AX, which is how readModRM() would otherwise have interpreted it.
//
// @param insn  - The instruction containing the operand.
// @param type  - The operand type.
// @param index - The existing value of the field as reported by readModRM().
// @param valid - The address of a uint8_t. The target is set to 1 if the
//                field is valid for the register class; 0 if not.
// @return      - The proper value.
840GENERIC_FIXUP_FUNC(fixupRegValue, insn->regBase, MODRM_REG)
841GENERIC_FIXUP_FUNC(fixupRMValue, insn->eaRegBase, EA_REG)
842
// Consult an operand specifier to determine which of the fixup*Value functions
// to use in correcting readModRM()'s interpretation.
//
// @param insn - See fixup*Value().
// @param op   - The operand specifier.
// @return     - 0 if fixup was successful; -1 if the register returned was
//               invalid for its class.
850static int fixupReg(struct InternalInstruction *insn,
851 const struct OperandSpecifier *op) {
852 uint8_t valid;
853 LLVM_DEBUG(dbgs() << "fixupReg()");
854
855 switch ((OperandEncoding)op->encoding) {
856 default:
857 debug("Expected a REG or R/M encoding in fixupReg");
858 return -1;
859 case ENCODING_VVVV:
860 insn->vvvv =
861 (Reg)fixupRegValue(insn, (OperandType)op->type, insn->vvvv, &valid);
862 if (!valid)
863 return -1;
864 break;
865 case ENCODING_REG:
866 insn->reg = (Reg)fixupRegValue(insn, (OperandType)op->type,
867 insn->reg - insn->regBase, &valid);
868 if (!valid)
869 return -1;
870 break;
872 if (insn->vectorExtensionType == TYPE_EVEX && insn->mode == MODE_64BIT &&
873 modFromModRM(insn->modRM) == 3) {
874 // EVEX_X can extend the register id to 32 for a non-GPR register that is
875 // encoded in RM.
876 // mode : MODE_64BIT
877 // Only 8 vector registers are available in 32 bit mode
878 // mod : 3
879 // RM encodes a register
880 switch (op->type) {
881 case TYPE_Rv:
882 case TYPE_R8:
883 case TYPE_R16:
884 case TYPE_R32:
885 case TYPE_R64:
886 break;
887 default:
888 insn->eaBase =
889 (EABase)(insn->eaBase +
890 (xFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 4));
891 break;
892 }
893 }
894 [[fallthrough]];
895 case ENCODING_SIB:
896 if (insn->eaBase >= insn->eaRegBase) {
897 insn->eaBase = (EABase)fixupRMValue(
898 insn, (OperandType)op->type, insn->eaBase - insn->eaRegBase, &valid);
899 if (!valid)
900 return -1;
901 }
902 break;
903 }
904
905 return 0;
906}
907
908// Read the opcode (except the ModR/M byte in the case of extended or escape
909// opcodes).
910static bool readOpcode(struct InternalInstruction *insn) {
911 uint8_t current;
912 LLVM_DEBUG(dbgs() << "readOpcode()");
913
914 insn->opcodeType = ONEBYTE;
915 if (insn->vectorExtensionType == TYPE_EVEX) {
916 switch (mmmFromEVEX2of4(insn->vectorExtensionPrefix[1])) {
917 default:
919 dbgs() << format("Unhandled mmm field for instruction (0x%hhx)",
921 return true;
922 case VEX_LOB_0F:
923 insn->opcodeType = TWOBYTE;
924 return consume(insn, insn->opcode);
925 case VEX_LOB_0F38:
926 insn->opcodeType = THREEBYTE_38;
927 return consume(insn, insn->opcode);
928 case VEX_LOB_0F3A:
929 insn->opcodeType = THREEBYTE_3A;
930 return consume(insn, insn->opcode);
931 case VEX_LOB_MAP4:
932 insn->opcodeType = MAP4;
933 return consume(insn, insn->opcode);
934 case VEX_LOB_MAP5:
935 insn->opcodeType = MAP5;
936 return consume(insn, insn->opcode);
937 case VEX_LOB_MAP6:
938 insn->opcodeType = MAP6;
939 return consume(insn, insn->opcode);
940 case VEX_LOB_MAP7:
941 insn->opcodeType = MAP7;
942 return consume(insn, insn->opcode);
943 }
944 } else if (insn->vectorExtensionType == TYPE_VEX_3B) {
945 switch (mmmmmFromVEX2of3(insn->vectorExtensionPrefix[1])) {
946 default:
948 dbgs() << format("Unhandled m-mmmm field for instruction (0x%hhx)",
950 return true;
951 case VEX_LOB_0F:
952 insn->opcodeType = TWOBYTE;
953 return consume(insn, insn->opcode);
954 case VEX_LOB_0F38:
955 insn->opcodeType = THREEBYTE_38;
956 return consume(insn, insn->opcode);
957 case VEX_LOB_0F3A:
958 insn->opcodeType = THREEBYTE_3A;
959 return consume(insn, insn->opcode);
960 case VEX_LOB_MAP5:
961 insn->opcodeType = MAP5;
962 return consume(insn, insn->opcode);
963 case VEX_LOB_MAP6:
964 insn->opcodeType = MAP6;
965 return consume(insn, insn->opcode);
966 case VEX_LOB_MAP7:
967 insn->opcodeType = MAP7;
968 return consume(insn, insn->opcode);
969 }
970 } else if (insn->vectorExtensionType == TYPE_VEX_2B) {
971 insn->opcodeType = TWOBYTE;
972 return consume(insn, insn->opcode);
973 } else if (insn->vectorExtensionType == TYPE_XOP) {
974 switch (mmmmmFromXOP2of3(insn->vectorExtensionPrefix[1])) {
975 default:
977 dbgs() << format("Unhandled m-mmmm field for instruction (0x%hhx)",
979 return true;
980 case XOP_MAP_SELECT_8:
981 insn->opcodeType = XOP8_MAP;
982 return consume(insn, insn->opcode);
983 case XOP_MAP_SELECT_9:
984 insn->opcodeType = XOP9_MAP;
985 return consume(insn, insn->opcode);
986 case XOP_MAP_SELECT_A:
987 insn->opcodeType = XOPA_MAP;
988 return consume(insn, insn->opcode);
989 }
990 } else if (mFromREX2(insn->rex2ExtensionPrefix[1])) {
991 // m bit indicates opcode map 1
992 insn->opcodeType = TWOBYTE;
993 return consume(insn, insn->opcode);
994 }
995
996 if (consume(insn, current))
997 return true;
998
999 if (current == 0x0f) {
1000 LLVM_DEBUG(
1001 dbgs() << format("Found a two-byte escape prefix (0x%hhx)", current));
1002 if (consume(insn, current))
1003 return true;
1004
1005 if (current == 0x38) {
1006 LLVM_DEBUG(dbgs() << format("Found a three-byte escape prefix (0x%hhx)",
1007 current));
1008 if (consume(insn, current))
1009 return true;
1010
1011 insn->opcodeType = THREEBYTE_38;
1012 } else if (current == 0x3a) {
1013 LLVM_DEBUG(dbgs() << format("Found a three-byte escape prefix (0x%hhx)",
1014 current));
1015 if (consume(insn, current))
1016 return true;
1017
1018 insn->opcodeType = THREEBYTE_3A;
1019 } else if (current == 0x0f) {
1020 LLVM_DEBUG(
1021 dbgs() << format("Found a 3dnow escape prefix (0x%hhx)", current));
1022
1023 // Consume operands before the opcode to comply with the 3DNow encoding
1024 if (readModRM(insn))
1025 return true;
1026
1027 if (consume(insn, current))
1028 return true;
1029
1030 insn->opcodeType = THREEDNOW_MAP;
1031 } else {
1032 LLVM_DEBUG(dbgs() << "Didn't find a three-byte escape prefix");
1033 insn->opcodeType = TWOBYTE;
1034 }
1035 } else if (insn->mandatoryPrefix)
1036 // An opcode with a mandatory prefix must start with an opcode escape.
1037 // If it does not, the prefix is a legacy repeat prefix.
1038 insn->mandatoryPrefix = 0;
1039
1040 // At this point we have consumed the full opcode.
1041 // Anything we consume from here on must be unconsumed.
1042 insn->opcode = current;
1043
1044 return false;
1045}
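
// Illustration only (not part of the LLVM sources): a minimal, standalone
// sketch of the legacy escape-byte classification performed above. The names
// EscapeKind and classifyEscape are hypothetical; the real decoder records the
// result in insn->opcodeType and reads bytes through consume().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the subset of OpcodeType handled by the
// legacy (non-VEX/EVEX/XOP/REX2) path.
enum class EscapeKind {
  OneByte,
  TwoByte,
  ThreeByte38,
  ThreeByte3A,
  ThreeDNow,
  Truncated
};

// Classifies the escape sequence at the start of `bytes`. On success, `next`
// is the index of the first byte after the escape bytes: the opcode byte for
// every kind except ThreeDNow, where the ModR/M operands come first and the
// opcode byte trails them.
inline EscapeKind classifyEscape(const uint8_t *bytes, size_t size,
                                 size_t &next) {
  next = 0;
  if (size < 1)
    return EscapeKind::Truncated;
  if (bytes[0] != 0x0f)
    return EscapeKind::OneByte;
  if (size < 2)
    return EscapeKind::Truncated;
  switch (bytes[1]) {
  case 0x38:
  case 0x3a:
    if (size < 3)
      return EscapeKind::Truncated;
    next = 2;
    return bytes[1] == 0x38 ? EscapeKind::ThreeByte38
                            : EscapeKind::ThreeByte3A;
  case 0x0f:
    next = 2;
    return EscapeKind::ThreeDNow;
  default:
    next = 1;
    return EscapeKind::TwoByte;
  }
}
```

// For example, the byte sequence 0f 38 f0 classifies as ThreeByte38 with the
// opcode at index 2, while 0f a2 classifies as TwoByte with the opcode at
// index 1.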
1046
1047// Determine whether equiv is the 16-bit equivalent of orig (32-bit or 64-bit).
1048static bool is16BitEquivalent(const char *orig, const char *equiv) {
1049 for (int i = 0;; i++) {
1050 if (orig[i] == '\0' && equiv[i] == '\0')
1051 return true;
1052 if (orig[i] == '\0' || equiv[i] == '\0')
1053 return false;
1054 if (orig[i] != equiv[i]) {
1055 if ((orig[i] == 'Q' || orig[i] == 'L') && equiv[i] == 'W')
1056 continue;
1057 if ((orig[i] == '6' || orig[i] == '3') && equiv[i] == '1')
1058 continue;
1059 if ((orig[i] == '4' || orig[i] == '2') && equiv[i] == '6')
1060 continue;
1061 return false;
1062 }
1063 }
1064}
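
// Illustration only: a standalone copy of the matching rule above, so its
// behavior can be checked in isolation. The instruction names in the usage
// note are examples chosen for the sketch; the disassembler obtains real names
// from MCInstrInfo::getName().

```cpp
#include <cassert>

// Returns true if `equiv` equals `orig` after mapping the width markers
// Q/L -> W and 64/32 -> 16, character by character.
inline bool is16BitEquivalent(const char *orig, const char *equiv) {
  for (int i = 0;; i++) {
    if (orig[i] == '\0' && equiv[i] == '\0')
      return true;
    if (orig[i] == '\0' || equiv[i] == '\0')
      return false;
    if (orig[i] != equiv[i]) {
      if ((orig[i] == 'Q' || orig[i] == 'L') && equiv[i] == 'W')
        continue;
      if ((orig[i] == '6' || orig[i] == '3') && equiv[i] == '1')
        continue;
      if ((orig[i] == '4' || orig[i] == '2') && equiv[i] == '6')
        continue;
      return false;
    }
  }
}
```

// With this rule, "ADD32rr" and "MOV64rr" are matched by "ADD16rr" and
// "MOV16rr" respectively, whereas "ADD32rr" is not matched by "SUB16rr" or by
// "ADD16ri". Names of different length never match.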
1065
1066// Determine whether this instruction is a 64-bit instruction.
1067static bool is64Bit(const char *name) {
1068 for (int i = 0;; ++i) {
1069 if (name[i] == '\0')
1070 return false;
1071 if (name[i] == '6' && name[i + 1] == '4')
1072 return true;
1073 }
1074}
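
// Illustration only: for NUL-terminated names, the loop above is a substring
// search for "64". A sketch of an equivalent formulation, assuming C++17 and
// using the hypothetical name nameContains64:

```cpp
#include <cassert>
#include <string_view>

// Returns true if the instruction name contains the substring "64".
inline bool nameContains64(std::string_view name) {
  return name.find("64") != std::string_view::npos;
}
```

// Note that this is a purely textual test: any name containing "64" is
// reported as 64-bit, regardless of where the digits occur.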
1075
1076// Determine the ID of an instruction, consuming the ModR/M byte as appropriate
1077// for extended and escape opcodes, and using a supplied attribute mask.
1078static int getInstructionIDWithAttrMask(uint16_t *instructionID,
1079 struct InternalInstruction *insn,
1080 uint16_t attrMask) {
1081 auto insnCtx = InstructionContext(x86DisassemblerContexts[attrMask]);
1082 const ModRMDecision &Decision =
1083 getDecision(insn->opcodeType, insnCtx, insn->opcode);
1084
1085 if (Decision.modrm_type != MODRM_ONEENTRY) {
1086 if (readModRM(insn))
1087 return -1;
1088 *instructionID = decodeModRM(Decision, insn->modRM);
1089 } else {
1090 *instructionID = modRMTable[Decision.instructionIDs];
1091 }
1092
1093 return 0;
1094}
1095
1096 static bool isCCMPOrCTEST(InternalInstruction *insn) {
1097 if (insn->opcodeType != MAP4)
1098 return false;
1099 if (insn->opcode == 0x83 && regFromModRM(insn->modRM) == 7)
1100 return true;
1101 switch (insn->opcode & 0xfe) {
1102 default:
1103 return false;
1104 case 0x38:
1105 case 0x3a:
1106 case 0x84:
1107 return true;
1108 case 0x80:
1109 return regFromModRM(insn->modRM) == 7;
1110 case 0xf6:
1111 return regFromModRM(insn->modRM) == 0;
1112 }
1113}
1114
1115static bool isNF(InternalInstruction *insn) {
1117 return false;
1118 if (insn->opcodeType == MAP4)
1119 return true;
1120 // Below NF instructions are not in map4.
1121 if (insn->opcodeType == THREEBYTE_38 &&
1123 switch (insn->opcode) {
1124 case 0xf2: // ANDN
1125 case 0xf3: // BLSI, BLSR, BLSMSK
1126 case 0xf5: // BZHI
1127 case 0xf7: // BEXTR
1128 return true;
1129 default:
1130 break;
1131 }
1132 }
1133 return false;
1134}
1135
1136// Determine the ID of an instruction, consuming the ModR/M byte as appropriate
1137// for extended and escape opcodes. Determines the attributes and context for
1138// the instruction before doing so.
1139 static int getInstructionID(struct InternalInstruction *insn,
1140 const MCInstrInfo *mii) {
1141 uint16_t attrMask;
1142 uint16_t instructionID;
1143
1144 LLVM_DEBUG(dbgs() << "getID()");
1145
1146 attrMask = ATTR_NONE;
1147
1148 if (insn->mode == MODE_64BIT)
1149 attrMask |= ATTR_64BIT;
1150
1151 if (insn->vectorExtensionType != TYPE_NO_VEX_XOP) {
1152 attrMask |= (insn->vectorExtensionType == TYPE_EVEX) ? ATTR_EVEX : ATTR_VEX;
1153
1154 if (insn->vectorExtensionType == TYPE_EVEX) {
1155 switch (ppFromEVEX3of4(insn->vectorExtensionPrefix[2])) {
1156 case VEX_PREFIX_66:
1157 attrMask |= ATTR_OPSIZE;
1158 break;
1159 case VEX_PREFIX_F3:
1160 attrMask |= ATTR_XS;
1161 break;
1162 case VEX_PREFIX_F2:
1163 attrMask |= ATTR_XD;
1164 break;
1165 }
1166
1168 attrMask |= ATTR_EVEXKZ;
1169 if (isNF(insn) && !readModRM(insn) &&
1170 !isCCMPOrCTEST(insn)) // NF bit is the MSB of aaa.
1171 attrMask |= ATTR_EVEXNF;
1172 // aaa is not used as an opmask in MAP4
1173 else if (aaaFromEVEX4of4(insn->vectorExtensionPrefix[3]) &&
1174 (insn->opcodeType != MAP4))
1175 attrMask |= ATTR_EVEXK;
1176 if (bFromEVEX4of4(insn->vectorExtensionPrefix[3])) {
1177 attrMask |= ATTR_EVEXB;
1178 if (uFromEVEX3of4(insn->vectorExtensionPrefix[2]) && !readModRM(insn) &&
1179 modFromModRM(insn->modRM) == 3)
1180 attrMask |= ATTR_EVEXU;
1181 }
1183 attrMask |= ATTR_VEXL;
1185 attrMask |= ATTR_EVEXL2;
1186 } else if (insn->vectorExtensionType == TYPE_VEX_3B) {
1187 switch (ppFromVEX3of3(insn->vectorExtensionPrefix[2])) {
1188 case VEX_PREFIX_66:
1189 attrMask |= ATTR_OPSIZE;
1190 break;
1191 case VEX_PREFIX_F3:
1192 attrMask |= ATTR_XS;
1193 break;
1194 case VEX_PREFIX_F2:
1195 attrMask |= ATTR_XD;
1196 break;
1197 }
1198
1199 if (lFromVEX3of3(insn->vectorExtensionPrefix[2]))
1200 attrMask |= ATTR_VEXL;
1201 } else if (insn->vectorExtensionType == TYPE_VEX_2B) {
1202 switch (ppFromVEX2of2(insn->vectorExtensionPrefix[1])) {
1203 case VEX_PREFIX_66:
1204 attrMask |= ATTR_OPSIZE;
1205 if (insn->hasAdSize)
1206 attrMask |= ATTR_ADSIZE;
1207 break;
1208 case VEX_PREFIX_F3:
1209 attrMask |= ATTR_XS;
1210 break;
1211 case VEX_PREFIX_F2:
1212 attrMask |= ATTR_XD;
1213 break;
1214 }
1215
1216 if (lFromVEX2of2(insn->vectorExtensionPrefix[1]))
1217 attrMask |= ATTR_VEXL;
1218 } else if (insn->vectorExtensionType == TYPE_XOP) {
1219 switch (ppFromXOP3of3(insn->vectorExtensionPrefix[2])) {
1220 case VEX_PREFIX_66:
1221 attrMask |= ATTR_OPSIZE;
1222 break;
1223 case VEX_PREFIX_F3:
1224 attrMask |= ATTR_XS;
1225 break;
1226 case VEX_PREFIX_F2:
1227 attrMask |= ATTR_XD;
1228 break;
1229 }
1230
1231 if (lFromXOP3of3(insn->vectorExtensionPrefix[2]))
1232 attrMask |= ATTR_VEXL;
1233 } else {
1234 return -1;
1235 }
1236 } else if (!insn->mandatoryPrefix) {
1237 // If we don't have mandatory prefix we should use legacy prefixes here
1238 if (insn->hasOpSize && (insn->mode != MODE_16BIT))
1239 attrMask |= ATTR_OPSIZE;
1240 if (insn->hasAdSize)
1241 attrMask |= ATTR_ADSIZE;
1242 if (insn->opcodeType == ONEBYTE) {
1243 if (insn->repeatPrefix == 0xf3 && (insn->opcode == 0x90))
1244 // Special support for PAUSE
1245 attrMask |= ATTR_XS;
1246 } else {
1247 if (insn->repeatPrefix == 0xf2)
1248 attrMask |= ATTR_XD;
1249 else if (insn->repeatPrefix == 0xf3)
1250 attrMask |= ATTR_XS;
1251 }
1252 } else {
1253 switch (insn->mandatoryPrefix) {
1254 case 0xf2:
1255 attrMask |= ATTR_XD;
1256 break;
1257 case 0xf3:
1258 attrMask |= ATTR_XS;
1259 break;
1260 case 0x66:
1261 if (insn->mode != MODE_16BIT)
1262 attrMask |= ATTR_OPSIZE;
1263 if (insn->hasAdSize)
1264 attrMask |= ATTR_ADSIZE;
1265 break;
1266 case 0x67:
1267 attrMask |= ATTR_ADSIZE;
1268 break;
1269 }
1270 }
1271
1272 if (insn->rexPrefix & 0x08) {
1273 attrMask |= ATTR_REXW;
1274 attrMask &= ~ATTR_ADSIZE;
1275 }
1276
1277 // Absolute jump and pushp/popp need special handling
1278 if (insn->rex2ExtensionPrefix[0] == 0xd5 && insn->opcodeType == ONEBYTE &&
1279 (insn->opcode == 0xA1 || (insn->opcode & 0xf0) == 0x50))
1280 attrMask |= ATTR_REX2;
1281
1282 if (insn->mode == MODE_16BIT) {
1283 // JCXZ/JECXZ need special handling for 16-bit mode because the meaning
1284 // of the AdSize prefix is inverted w.r.t. 32-bit mode.
1285 if (insn->opcodeType == ONEBYTE && insn->opcode == 0xE3)
1286 attrMask ^= ATTR_ADSIZE;
1287 // If we're in 16-bit mode and this is one of the relative jumps and opsize
1288 // prefix isn't present, we need to force the opsize attribute since the
1289 // prefix is inverted relative to 32-bit mode.
1290 if (!insn->hasOpSize && insn->opcodeType == ONEBYTE &&
1291 (insn->opcode == 0xE8 || insn->opcode == 0xE9))
1292 attrMask |= ATTR_OPSIZE;
1293
1294 if (!insn->hasOpSize && insn->opcodeType == TWOBYTE &&
1295 insn->opcode >= 0x80 && insn->opcode <= 0x8F)
1296 attrMask |= ATTR_OPSIZE;
1297 }
1298
1299
1300 if (getInstructionIDWithAttrMask(&instructionID, insn, attrMask))
1301 return -1;
1302
1303 // The following clauses compensate for limitations of the tables.
1304
1305 if (insn->mode != MODE_64BIT &&
1307 // The tables can't distinguish between cases where the W-bit is used to
1308 // select register size and cases where it's a required part of the opcode.
1309 if ((insn->vectorExtensionType == TYPE_EVEX &&
1311 (insn->vectorExtensionType == TYPE_VEX_3B &&
1313 (insn->vectorExtensionType == TYPE_XOP &&
1315
1316 uint16_t instructionIDWithREXW;
1317 if (getInstructionIDWithAttrMask(&instructionIDWithREXW, insn,
1318 attrMask | ATTR_REXW)) {
1319 insn->instructionID = instructionID;
1320 insn->spec = &INSTRUCTIONS_SYM[instructionID];
1321 return 0;
1322 }
1323
1324 auto SpecName = mii->getName(instructionIDWithREXW);
1325 // If this is not a 64-bit instruction, switch the opcode.
1326 if (!is64Bit(SpecName.data())) {
1327 insn->instructionID = instructionIDWithREXW;
1328 insn->spec = &INSTRUCTIONS_SYM[instructionIDWithREXW];
1329 return 0;
1330 }
1331 }
1332 }
1333
1334 // Absolute moves, umonitor, and movdir64b need special handling.
1335 // -For 16-bit mode because the meaning of the AdSize and OpSize prefixes is
1336 // inverted w.r.t. 32-bit mode.
1337 // -For 32-bit mode we need to ensure the ADSIZE prefix is observed in
1338 // any position.
1339 if ((insn->opcodeType == ONEBYTE && ((insn->opcode & 0xFC) == 0xA0)) ||
1340 (insn->opcodeType == TWOBYTE && (insn->opcode == 0xAE)) ||
1341 (insn->opcodeType == THREEBYTE_38 && insn->opcode == 0xF8) ||
1342 (insn->opcodeType == MAP4 && insn->opcode == 0xF8)) {
1343 // Make sure we observed the prefixes in any position.
1344 if (insn->hasAdSize)
1345 attrMask |= ATTR_ADSIZE;
1346 if (insn->hasOpSize)
1347 attrMask |= ATTR_OPSIZE;
1348
1349 // In 16-bit, invert the attributes.
1350 if (insn->mode == MODE_16BIT) {
1351 attrMask ^= ATTR_ADSIZE;
1352
1353 // The OpSize attribute is only valid with the absolute moves.
1354 if (insn->opcodeType == ONEBYTE && ((insn->opcode & 0xFC) == 0xA0))
1355 attrMask ^= ATTR_OPSIZE;
1356 }
1357
1358 if (getInstructionIDWithAttrMask(&instructionID, insn, attrMask))
1359 return -1;
1360
1361 insn->instructionID = instructionID;
1362 insn->spec = &INSTRUCTIONS_SYM[instructionID];
1363 return 0;
1364 }
1365
1366 if ((insn->mode == MODE_16BIT || insn->hasOpSize) &&
1367 !(attrMask & ATTR_OPSIZE)) {
1368 // The instruction tables make no distinction between instructions that
1369 // allow OpSize anywhere (i.e., 16-bit operations) and that need it in a
1370 // particular spot (i.e., many MMX operations). In general we're
1371 // conservative, but in the specific case where OpSize is present but not in
1372 // the right place we check if there's a 16-bit operation.
1373 const struct InstructionSpecifier *spec;
1374 uint16_t instructionIDWithOpsize;
1375 llvm::StringRef specName, specWithOpSizeName;
1376
1377 spec = &INSTRUCTIONS_SYM[instructionID];
1378
1379 if (getInstructionIDWithAttrMask(&instructionIDWithOpsize, insn,
1380 attrMask | ATTR_OPSIZE)) {
1381 // ModRM required with OpSize but not present. Give up and return the
1382 // version without OpSize set.
1383 insn->instructionID = instructionID;
1384 insn->spec = spec;
1385 return 0;
1386 }
1387
1388 specName = mii->getName(instructionID);
1389 specWithOpSizeName = mii->getName(instructionIDWithOpsize);
1390
1391 if (is16BitEquivalent(specName.data(), specWithOpSizeName.data()) &&
1392 (insn->mode == MODE_16BIT) ^ insn->hasOpSize) {
1393 insn->instructionID = instructionIDWithOpsize;
1394 insn->spec = &INSTRUCTIONS_SYM[instructionIDWithOpsize];
1395 } else {
1396 insn->instructionID = instructionID;
1397 insn->spec = spec;
1398 }
1399 return 0;
1400 }
1401
1402 if (insn->opcodeType == ONEBYTE && insn->opcode == 0x90 &&
1403 insn->rexPrefix & 0x01) {
1404 // NOOP shouldn't decode as NOOP if REX.b is set. Instead it should decode
1405 // as XCHG %r8, %eax.
1406 const struct InstructionSpecifier *spec;
1407 uint16_t instructionIDWithNewOpcode;
1408 const struct InstructionSpecifier *specWithNewOpcode;
1409
1410 spec = &INSTRUCTIONS_SYM[instructionID];
1411
1412 // Borrow opcode from one of the other XCHGar opcodes
1413 insn->opcode = 0x91;
1414
1415 if (getInstructionIDWithAttrMask(&instructionIDWithNewOpcode, insn,
1416 attrMask)) {
1417 insn->opcode = 0x90;
1418
1419 insn->instructionID = instructionID;
1420 insn->spec = spec;
1421 return 0;
1422 }
1423
1424 specWithNewOpcode = &INSTRUCTIONS_SYM[instructionIDWithNewOpcode];
1425
1426 // Change back
1427 insn->opcode = 0x90;
1428
1429 insn->instructionID = instructionIDWithNewOpcode;
1430 insn->spec = specWithNewOpcode;
1431
1432 return 0;
1433 }
1434
1435 insn->instructionID = instructionID;
1436 insn->spec = &INSTRUCTIONS_SYM[insn->instructionID];
1437
1438 return 0;
1439}
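
// Illustration only: the VEX, EVEX, and XOP branches above all translate the
// two-bit pp field into the same three attributes. A minimal sketch of that
// mapping, assuming the architectural pp encoding (0 = none, 1 = 66,
// 2 = F3, 3 = F2). The Attr enumerators and their numeric values are
// placeholders for this sketch, not the values of attributeBits.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder attribute bits (hypothetical values).
enum Attr : uint16_t {
  AttrNone = 0,
  AttrOpSize = 1 << 0, // implied 66 prefix
  AttrXS = 1 << 1,     // implied F3 prefix
  AttrXD = 1 << 2      // implied F2 prefix
};

// Maps the low two bits of `pp` to the attribute it implies.
inline uint16_t attrFromPP(uint8_t pp) {
  switch (pp & 0x3) {
  case 1:
    return AttrOpSize;
  case 2:
    return AttrXS;
  case 3:
    return AttrXD;
  default:
    return AttrNone;
  }
}
```

// The result would be OR-ed into the attribute mask in the same way the
// switch statements above do with ATTR_OPSIZE, ATTR_XS, and ATTR_XD.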
1440
1441 // Reads an operand from the opcode field of an instruction and interprets it
1442// appropriately given the operand width. Handles AddRegFrm instructions.
1443//
1444// @param insn - the instruction whose opcode field is to be read.
1445// @param size - The width (in bytes) of the register being specified.
1446// 1 means AL and friends, 2 means AX, 4 means EAX, and 8 means
1447// RAX.
1448// @return - 0 on success; nonzero otherwise.
1449 static int readOpcodeRegister(struct InternalInstruction *insn, uint8_t size) {
1450 LLVM_DEBUG(dbgs() << "readOpcodeRegister()");
1451
1452 if (size == 0)
1453 size = insn->registerSize;
1454
1455 auto setOpcodeRegister = [&](unsigned base) {
1456 insn->opcodeRegister =
1457 (Reg)(base + ((bFromREX(insn->rexPrefix) << 3) |
1458 (b2FromREX2(insn->rex2ExtensionPrefix[1]) << 4) |
1459 (insn->opcode & 7)));
1460 };
1461
1462 switch (size) {
1463 case 1:
1464 setOpcodeRegister(MODRM_REG_AL);
1465 if (insn->rexPrefix && insn->opcodeRegister >= MODRM_REG_AL + 0x4 &&
1466 insn->opcodeRegister < MODRM_REG_AL + 0x8) {
1467 insn->opcodeRegister =
1468 (Reg)(MODRM_REG_SPL + (insn->opcodeRegister - MODRM_REG_AL - 4));
1469 }
1470
1471 break;
1472 case 2:
1473 setOpcodeRegister(MODRM_REG_AX);
1474 break;
1475 case 4:
1476 setOpcodeRegister(MODRM_REG_EAX);
1477 break;
1478 case 8:
1479 setOpcodeRegister(MODRM_REG_RAX);
1480 break;
1481 }
1482
1483 return 0;
1484}
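
// Illustration only: a standalone sketch of how the register number is
// assembled from the low three opcode bits, REX.B (bit 3), and REX2.B4
// (bit 4), and of the byte-register remapping applied when a REX prefix is
// present. The helper names and the string tables are hypothetical; the real
// code offsets into the Reg enum instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Combines opcode[2:0] with the REX.B and REX2.B4 extension bits.
inline unsigned opcodeRegisterIndex(uint8_t opcode, bool rexB, bool rex2B4) {
  return (static_cast<unsigned>(rex2B4) << 4) |
         (static_cast<unsigned>(rexB) << 3) | (opcode & 7u);
}

// Names the 8-bit register for indices 0..7. With any REX prefix, indices
// 4..7 select SPL/BPL/SIL/DIL instead of AH/CH/DH/BH. Indices of 8 and above
// are outside this sketch and yield nullptr.
inline const char *byteRegisterName(unsigned index, bool hasRex) {
  static const char *const legacy[8] = {"al", "cl", "dl", "bl",
                                        "ah", "ch", "dh", "bh"};
  static const char *const withRex[8] = {"al",  "cl",  "dl",  "bl",
                                         "spl", "bpl", "sil", "dil"};
  if (index >= 8)
    return nullptr;
  return hasRex ? withRex[index] : legacy[index];
}
```

// For instance, opcode 0x53 with no extension bits selects register 3, the
// same opcode with REX.B set selects register 11, and index 4 names AH
// without a REX prefix but SPL with one.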
1485
1486// Consume an immediate operand from an instruction, given the desired operand
1487// size.
1488//
1489// @param insn - The instruction whose operand is to be read.
1490// @param size - The width (in bytes) of the operand.
1491// @return - 0 if the immediate was successfully consumed; nonzero
1492// otherwise.
1493 static int readImmediate(struct InternalInstruction *insn, uint8_t size) {
1494 uint8_t imm8;
1495 uint16_t imm16;
1496 uint32_t imm32;
1497 uint64_t imm64;
1498
1499 LLVM_DEBUG(dbgs() << "readImmediate()");
1500
1501 assert(insn->numImmediatesConsumed < 2 && "Already consumed two immediates");
1502
1503 insn->immediateSize = size;
1504 insn->immediateOffset = insn->readerCursor - insn->startLocation;
1505
1506 switch (size) {
1507 case 1:
1508 if (consume(insn, imm8))
1509 return -1;
1510 insn->immediates[insn->numImmediatesConsumed] = imm8;
1511 break;
1512 case 2:
1513 if (consume(insn, imm16))
1514 return -1;
1515 insn->immediates[insn->numImmediatesConsumed] = imm16;
1516 break;
1517 case 4:
1518 if (consume(insn, imm32))
1519 return -1;
1520 insn->immediates[insn->numImmediatesConsumed] = imm32;
1521 break;
1522 case 8:
1523 if (consume(insn, imm64))
1524 return -1;
1525 insn->immediates[insn->numImmediatesConsumed] = imm64;
1526 break;
1527 default:
1528 llvm_unreachable("invalid size");
1529 }
1530
1531 insn->numImmediatesConsumed++;
1532
1533 return 0;
1534}
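
// Illustration only: readImmediate() relies on consume() to fetch a
// fixed-width value and advance the cursor. The definition of consume() is
// not shown in this part of the file, so the following is only a sketch of
// the idea under the assumption that immediates are stored little-endian (as
// they are on x86). The name readLittleEndian is hypothetical; like
// consume(), it returns true on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reads sizeof(T) bytes at `cursor` as a little-endian unsigned value.
// Returns true (and leaves `cursor` and `out` untouched) if too few bytes
// remain; otherwise stores the value, advances `cursor`, and returns false.
template <typename T>
bool readLittleEndian(const uint8_t *bytes, size_t size, size_t &cursor,
                      T &out) {
  static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer");
  if (cursor > size || size - cursor < sizeof(T))
    return true;
  uint64_t value = 0;
  for (size_t i = 0; i < sizeof(T); ++i)
    value |= static_cast<uint64_t>(bytes[cursor + i]) << (8 * i);
  out = static_cast<T>(value);
  cursor += sizeof(T);
  return false;
}
```

// Reading a uint32_t from the bytes 78 56 34 12 yields 0x12345678 and moves
// the cursor from 0 to 4; a further read of any width then fails.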
1535
1536// Consume vvvv from an instruction if it has a VEX prefix.
1537static int readVVVV(struct InternalInstruction *insn) {
1538 LLVM_DEBUG(dbgs() << "readVVVV()");
1539
1540 int vvvv;
1541 if (insn->vectorExtensionType == TYPE_EVEX)
1542 vvvv = (v2FromEVEX4of4(insn->vectorExtensionPrefix[3]) << 4 |
1544 else if (insn->vectorExtensionType == TYPE_VEX_3B)
1545 vvvv = vvvvFromVEX3of3(insn->vectorExtensionPrefix[2]);
1546 else if (insn->vectorExtensionType == TYPE_VEX_2B)
1547 vvvv = vvvvFromVEX2of2(insn->vectorExtensionPrefix[1]);
1548 else if (insn->vectorExtensionType == TYPE_XOP)
1549 vvvv = vvvvFromXOP3of3(insn->vectorExtensionPrefix[2]);
1550 else
1551 return -1;
1552
1553 if (insn->mode != MODE_64BIT)
1554 vvvv &= 0xf; // Can only clear bit 4. Bit 3 must be cleared later.
1555
1556 insn->vvvv = static_cast<Reg>(vvvv);
1557 return 0;
1558}
1559
1560 // Read a mask register from the opcode field of an instruction.
1561//
1562// @param insn - The instruction whose opcode field is to be read.
1563// @return - 0 on success; nonzero otherwise.
1564static int readMaskRegister(struct InternalInstruction *insn) {
1565 LLVM_DEBUG(dbgs() << "readMaskRegister()");
1566
1567 if (insn->vectorExtensionType != TYPE_EVEX)
1568 return -1;
1569
1570 insn->writemask =
1571 static_cast<Reg>(aaaFromEVEX4of4(insn->vectorExtensionPrefix[3]));
1572 return 0;
1573}
1574
1575// Consults the specifier for an instruction and consumes all
1576// operands for that instruction, interpreting them as it goes.
1577static int readOperands(struct InternalInstruction *insn) {
1578 int hasVVVV, needVVVV;
1579 int sawRegImm = 0;
1580
1581 LLVM_DEBUG(dbgs() << "readOperands()");
1582
1583 // If non-zero vvvv specified, make sure one of the operands uses it.
1584 hasVVVV = !readVVVV(insn);
1585 needVVVV = hasVVVV && (insn->vvvv != 0);
1586
1587 for (const auto &Op : x86OperandSets[insn->spec->operands]) {
1588 switch (Op.encoding) {
1589 case ENCODING_NONE:
1590 case ENCODING_SI:
1591 case ENCODING_DI:
1592 break;
1594 // VSIB can use the V2 bit so check only the other bits.
1595 if (needVVVV)
1596 needVVVV = hasVVVV & ((insn->vvvv & 0xf) != 0);
1597 if (readModRM(insn))
1598 return -1;
1599
1600 // Reject if SIB wasn't used.
1601 if (insn->eaBase != EA_BASE_sib && insn->eaBase != EA_BASE_sib64)
1602 return -1;
1603
1604 // If sibIndex was set to SIB_INDEX_NONE, index offset is 4.
1605 if (insn->sibIndex == SIB_INDEX_NONE)
1606 insn->sibIndex = (SIBIndex)(insn->sibIndexBase + 4);
1607
1608 // If EVEX.v2 is set this is one of the 16-31 registers.
1609 if (insn->vectorExtensionType == TYPE_EVEX && insn->mode == MODE_64BIT &&
1611 insn->sibIndex = (SIBIndex)(insn->sibIndex + 16);
1612
1613 // Adjust the index register to the correct size.
1614 switch ((OperandType)Op.type) {
1615 default:
1616 debug("Unhandled VSIB index type");
1617 return -1;
1618 case TYPE_MVSIBX:
1619 insn->sibIndex =
1620 (SIBIndex)(SIB_INDEX_XMM0 + (insn->sibIndex - insn->sibIndexBase));
1621 break;
1622 case TYPE_MVSIBY:
1623 insn->sibIndex =
1624 (SIBIndex)(SIB_INDEX_YMM0 + (insn->sibIndex - insn->sibIndexBase));
1625 break;
1626 case TYPE_MVSIBZ:
1627 insn->sibIndex =
1628 (SIBIndex)(SIB_INDEX_ZMM0 + (insn->sibIndex - insn->sibIndexBase));
1629 break;
1630 }
1631
1632 // Apply the AVX512 compressed displacement scaling factor.
1633 if (Op.encoding != ENCODING_REG && insn->eaDisplacement == EA_DISP_8)
1634 insn->displacement *= 1 << (Op.encoding - ENCODING_VSIB);
1635 break;
1636 case ENCODING_SIB:
1637 // Reject if SIB wasn't used.
1638 if (insn->eaBase != EA_BASE_sib && insn->eaBase != EA_BASE_sib64)
1639 return -1;
1640 if (readModRM(insn))
1641 return -1;
1642 if (fixupReg(insn, &Op))
1643 return -1;
1644 break;
1645 case ENCODING_REG:
1647 if (readModRM(insn))
1648 return -1;
1649 if (fixupReg(insn, &Op))
1650 return -1;
1651 // Apply the AVX512 compressed displacement scaling factor.
1652 if (Op.encoding != ENCODING_REG && insn->eaDisplacement == EA_DISP_8)
1653 insn->displacement *= 1 << (Op.encoding - ENCODING_RM);
1654 break;
1655 case ENCODING_IB:
1656 if (sawRegImm) {
1657 // Saw a register immediate so don't read again and instead split the
1658 // previous immediate. FIXME: This is a hack.
1659 insn->immediates[insn->numImmediatesConsumed] =
1660 insn->immediates[insn->numImmediatesConsumed - 1] & 0xf;
1661 ++insn->numImmediatesConsumed;
1662 break;
1663 }
1664 if (readImmediate(insn, 1))
1665 return -1;
1666 if (Op.type == TYPE_XMM || Op.type == TYPE_YMM)
1667 sawRegImm = 1;
1668 break;
1669 case ENCODING_IW:
1670 if (readImmediate(insn, 2))
1671 return -1;
1672 break;
1673 case ENCODING_ID:
1674 if (readImmediate(insn, 4))
1675 return -1;
1676 break;
1677 case ENCODING_IO:
1678 if (readImmediate(insn, 8))
1679 return -1;
1680 break;
1681 case ENCODING_Iv:
1682 if (readImmediate(insn, insn->immediateSize))
1683 return -1;
1684 break;
1685 case ENCODING_Ia:
1686 if (readImmediate(insn, insn->addressSize))
1687 return -1;
1688 break;
1689 case ENCODING_IRC:
1690 insn->RC = (l2FromEVEX4of4(insn->vectorExtensionPrefix[3]) << 1) |
1692 break;
1693 case ENCODING_RB:
1694 if (readOpcodeRegister(insn, 1))
1695 return -1;
1696 break;
1697 case ENCODING_RW:
1698 if (readOpcodeRegister(insn, 2))
1699 return -1;
1700 break;
1701 case ENCODING_RD:
1702 if (readOpcodeRegister(insn, 4))
1703 return -1;
1704 break;
1705 case ENCODING_RO:
1706 if (readOpcodeRegister(insn, 8))
1707 return -1;
1708 break;
1709 case ENCODING_Rv:
1710 if (readOpcodeRegister(insn, 0))
1711 return -1;
1712 break;
1713 case ENCODING_CF:
1715 needVVVV = false; // oszc shares the same bits with VVVV
1716 break;
1717 case ENCODING_CC:
1718 if (isCCMPOrCTEST(insn))
1719 insn->immediates[2] = scFromEVEX4of4(insn->vectorExtensionPrefix[3]);
1720 else
1721 insn->immediates[1] = insn->opcode & 0xf;
1722 break;
1723 case ENCODING_FP:
1724 break;
1725 case ENCODING_VVVV:
1726 needVVVV = 0; // Mark that we have found a VVVV operand.
1727 if (!hasVVVV)
1728 return -1;
1729 if (insn->mode != MODE_64BIT)
1730 insn->vvvv = static_cast<Reg>(insn->vvvv & 0x7);
1731 if (fixupReg(insn, &Op))
1732 return -1;
1733 break;
1734 case ENCODING_WRITEMASK:
1735 if (readMaskRegister(insn))
1736 return -1;
1737 break;
1738 case ENCODING_DUP:
1739 break;
1740 default:
1741 LLVM_DEBUG(dbgs() << "Encountered an operand with an unknown encoding.");
1742 return -1;
1743 }
1744 }
1745
1746 // If we didn't find an ENCODING_VVVV operand but a non-zero vvvv is present, fail.
1747 if (needVVVV)
1748 return -1;
1749
1750 return 0;
1751}
1752
1753namespace llvm {
1754
1755// Fill-ins to make the compiler happy. These constants are never actually
1756// assigned; they are just filler to make an automatically-generated switch
1757// statement work.
1758namespace X86 {
1759 enum {
1760 BX_SI = 500,
1761 BX_DI = 501,
1762 BP_SI = 502,
1763 BP_DI = 503,
1764 sib = 504,
1765 sib64 = 505
1766 };
1767} // namespace X86
1768
1769} // namespace llvm
1770
1771static bool translateInstruction(MCInst &target,
1772 InternalInstruction &source,
1773 const MCDisassembler *Dis);
1774
1775namespace {
1776
1777 /// Generic disassembler for all X86 platforms. Each platform class should
1778 /// only need to supply its own constructor and provide a different
1779 /// disassemblerMode value.
1780class X86GenericDisassembler : public MCDisassembler {
1781 std::unique_ptr<const MCInstrInfo> MII;
1782public:
1783 X86GenericDisassembler(const MCSubtargetInfo &STI, MCContext &Ctx,
1784 std::unique_ptr<const MCInstrInfo> MII);
1785public:
1786 DecodeStatus getInstruction(MCInst &instr, uint64_t &size,
1787 ArrayRef<uint8_t> Bytes, uint64_t Address,
1788 raw_ostream &cStream) const override;
1789
1790private:
1791 DisassemblerMode fMode;
1792};
1793
1794} // namespace
1795
1796X86GenericDisassembler::X86GenericDisassembler(
1797 const MCSubtargetInfo &STI,
1798 MCContext &Ctx,
1799 std::unique_ptr<const MCInstrInfo> MII)
1800 : MCDisassembler(STI, Ctx), MII(std::move(MII)) {
1801 const FeatureBitset &FB = STI.getFeatureBits();
1802 if (FB[X86::Is16Bit]) {
1803 fMode = MODE_16BIT;
1804 return;
1805 } else if (FB[X86::Is32Bit]) {
1806 fMode = MODE_32BIT;
1807 return;
1808 } else if (FB[X86::Is64Bit]) {
1809 fMode = MODE_64BIT;
1810 return;
1811 }
1812
1813 llvm_unreachable("Invalid CPU mode");
1814}
1815
1816MCDisassembler::DecodeStatus X86GenericDisassembler::getInstruction(
1817 MCInst &Instr, uint64_t &Size, ArrayRef<uint8_t> Bytes, uint64_t Address,
1818 raw_ostream &CStream) const {
1819 CommentStream = &CStream;
1820
1821 InternalInstruction Insn;
1822 memset(&Insn, 0, sizeof(InternalInstruction));
1823 Insn.bytes = Bytes;
1824 Insn.startLocation = Address;
1825 Insn.readerCursor = Address;
1826 Insn.mode = fMode;
1827
1828 if (Bytes.empty() || readPrefixes(&Insn) || readOpcode(&Insn) ||
1829 getInstructionID(&Insn, MII.get()) || Insn.instructionID == 0 ||
1830 readOperands(&Insn)) {
1831 Size = Insn.readerCursor - Address;
1832 return Fail;
1833 }
1834
1835 Insn.operands = x86OperandSets[Insn.spec->operands];
1836 Insn.length = Insn.readerCursor - Insn.startLocation;
1837 Size = Insn.length;
1838 if (Size > 15)
1839 LLVM_DEBUG(dbgs() << "Instruction exceeds 15-byte limit");
1840
1841 bool Ret = translateInstruction(Instr, Insn, this);
1842 if (!Ret) {
1843 unsigned Flags = X86::IP_NO_PREFIX;
1844 if (Insn.hasAdSize)
1846 if (!Insn.mandatoryPrefix) {
1847 if (Insn.hasOpSize)
1849 if (Insn.repeatPrefix == 0xf2)
1851 else if (Insn.repeatPrefix == 0xf3 &&
1852 // It should not be 'pause' f3 90
1853 Insn.opcode != 0x90)
1855 if (Insn.hasLockPrefix)
1857 }
1858 Instr.setFlags(Flags);
1859 }
1860 return (!Ret) ? Success : Fail;
1861}
1862
1863//
1864// Private code that translates from struct InternalInstructions to MCInsts.
1865//
1866
1867/// translateRegister - Translates an internal register to the appropriate LLVM
1868/// register, and appends it as an operand to an MCInst.
1869///
1870/// @param mcInst - The MCInst to append to.
1871/// @param reg - The Reg to append.
1872static void translateRegister(MCInst &mcInst, Reg reg) {
1873#define ENTRY(x) X86::x,
1874 static constexpr MCPhysReg llvmRegnums[] = {ALL_REGS};
1875#undef ENTRY
1876
1877 MCPhysReg llvmRegnum = llvmRegnums[reg];
1878 mcInst.addOperand(MCOperand::createReg(llvmRegnum));
1879}
1880
1882 0, // SEG_OVERRIDE_NONE
1883 X86::CS,
1884 X86::SS,
1885 X86::DS,
1886 X86::ES,
1887 X86::FS,
1888 X86::GS
1889};
1890
1891/// translateSrcIndex - Appends a source index operand to an MCInst.
1892///
1893/// @param mcInst - The MCInst to append to.
1894/// @param insn - The internal instruction.
1895static bool translateSrcIndex(MCInst &mcInst, InternalInstruction &insn) {
1896 unsigned baseRegNo;
1897
1898 if (insn.mode == MODE_64BIT)
1899 baseRegNo = insn.hasAdSize ? X86::ESI : X86::RSI;
1900 else if (insn.mode == MODE_32BIT)
1901 baseRegNo = insn.hasAdSize ? X86::SI : X86::ESI;
1902 else {
1903 assert(insn.mode == MODE_16BIT);
1904 baseRegNo = insn.hasAdSize ? X86::ESI : X86::SI;
1905 }
1906 MCOperand baseReg = MCOperand::createReg(baseRegNo);
1907 mcInst.addOperand(baseReg);
1908
1909 MCOperand segmentReg;
1911 mcInst.addOperand(segmentReg);
1912 return false;
1913}
1914
1915/// translateDstIndex - Appends a destination index operand to an MCInst.
1916///
1917/// @param mcInst - The MCInst to append to.
1918/// @param insn - The internal instruction.
1919
1920static bool translateDstIndex(MCInst &mcInst, InternalInstruction &insn) {
1921 unsigned baseRegNo;
1922
1923 if (insn.mode == MODE_64BIT)
1924 baseRegNo = insn.hasAdSize ? X86::EDI : X86::RDI;
1925 else if (insn.mode == MODE_32BIT)
1926 baseRegNo = insn.hasAdSize ? X86::DI : X86::EDI;
1927 else {
1928 assert(insn.mode == MODE_16BIT);
1929 baseRegNo = insn.hasAdSize ? X86::EDI : X86::DI;
1930 }
1931 MCOperand baseReg = MCOperand::createReg(baseRegNo);
1932 mcInst.addOperand(baseReg);
1933 return false;
1934}
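
// Illustration only: translateSrcIndex() and translateDstIndex() pick the
// index register width from the decoding mode and the address-size prefix in
// the same way. A minimal sketch of that selection, returning the width in
// bits; the Mode enum and the function name are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for DisassemblerMode.
enum class Mode { Bits16, Bits32, Bits64 };

// Width in bits of SI/DI as used by a string instruction. The address-size
// prefix toggles between the mode's default width and its alternative.
inline unsigned stringIndexWidth(Mode mode, bool hasAdSize) {
  switch (mode) {
  case Mode::Bits64:
    return hasAdSize ? 32 : 64;
  case Mode::Bits32:
    return hasAdSize ? 16 : 32;
  case Mode::Bits16:
    return hasAdSize ? 32 : 16;
  }
  return 0; // not reached for valid modes
}
```

// So a string instruction in 64-bit mode uses RSI/RDI by default and ESI/EDI
// with the prefix, while in 16-bit mode the prefix selects the 32-bit
// registers rather than the 16-bit ones.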
1935
1936/// translateImmediate - Appends an immediate operand to an MCInst.
1937///
1938/// @param mcInst - The MCInst to append to.
1939/// @param immediate - The immediate value to append.
1940/// @param operand - The operand, as stored in the descriptor table.
1941/// @param insn - The internal instruction.
1942static void translateImmediate(MCInst &mcInst, uint64_t immediate,
1943 const OperandSpecifier &operand,
1944 InternalInstruction &insn,
1945 const MCDisassembler *Dis) {
1946 // Sign-extend the immediate if necessary.
1947
1948 OperandType type = (OperandType)operand.type;
1949
1950 bool isBranch = false;
1951 uint64_t pcrel = 0;
1952 if (type == TYPE_REL) {
1953 isBranch = true;
1954 pcrel = insn.startLocation + insn.length;
1955 switch (operand.encoding) {
1956 default:
1957 break;
1958 case ENCODING_Iv:
1959 switch (insn.displacementSize) {
1960 default:
1961 break;
1962 case 1:
1963 if(immediate & 0x80)
1964 immediate |= ~(0xffull);
1965 break;
1966 case 2:
1967 if(immediate & 0x8000)
1968 immediate |= ~(0xffffull);
1969 break;
1970 case 4:
1971 if(immediate & 0x80000000)
1972 immediate |= ~(0xffffffffull);
1973 break;
1974 case 8:
1975 break;
1976 }
1977 break;
1978 case ENCODING_IB:
1979 if(immediate & 0x80)
1980 immediate |= ~(0xffull);
1981 break;
1982 case ENCODING_IW:
1983 if(immediate & 0x8000)
1984 immediate |= ~(0xffffull);
1985 break;
1986 case ENCODING_ID:
1987 if(immediate & 0x80000000)
1988 immediate |= ~(0xffffffffull);
1989 break;
1990 }
1991 }
1992 // By default sign-extend all X86 immediates based on their encoding.
1993 else if (type == TYPE_IMM) {
1994 switch (operand.encoding) {
1995 default:
1996 break;
1997 case ENCODING_IB:
1998 if(immediate & 0x80)
1999 immediate |= ~(0xffull);
2000 break;
2001 case ENCODING_IW:
2002 if(immediate & 0x8000)
2003 immediate |= ~(0xffffull);
2004 break;
2005 case ENCODING_ID:
2006 if(immediate & 0x80000000)
2007 immediate |= ~(0xffffffffull);
2008 break;
2009 case ENCODING_IO:
2010 break;
2011 }
2012 }
2013
2014 switch (type) {
2015 case TYPE_XMM:
2016 mcInst.addOperand(MCOperand::createReg(X86::XMM0 + (immediate >> 4)));
2017 return;
2018 case TYPE_YMM:
2019 mcInst.addOperand(MCOperand::createReg(X86::YMM0 + (immediate >> 4)));
2020 return;
2021 case TYPE_ZMM:
2022 mcInst.addOperand(MCOperand::createReg(X86::ZMM0 + (immediate >> 4)));
2023 return;
2024 default:
2025 // operand is 64 bits wide. Do nothing.
2026 break;
2027 }
2028
2029 if (!Dis->tryAddingSymbolicOperand(
2030 mcInst, immediate + pcrel, insn.startLocation, isBranch,
2031 insn.immediateOffset, insn.immediateSize, insn.length))
2032 mcInst.addOperand(MCOperand::createImm(immediate));
2033
2034 if (type == TYPE_MOFFS) {
2035 MCOperand segmentReg;
2037 mcInst.addOperand(segmentReg);
2038 }
2039}
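
// Illustration only: the sign-extension pattern used throughout
// translateImmediate(), collected into one hypothetical helper so that it can
// be checked in isolation. The size is given in bytes; sizes other than 1, 2,
// and 4 leave the value unchanged, as the 8-byte cases above do.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extends the low `sizeInBytes` bytes of `value` to 64 bits.
inline uint64_t signExtendImmediate(uint64_t value, unsigned sizeInBytes) {
  switch (sizeInBytes) {
  case 1:
    if (value & 0x80)
      value |= ~(0xffull);
    break;
  case 2:
    if (value & 0x8000)
      value |= ~(0xffffull);
    break;
  case 4:
    if (value & 0x80000000ull)
      value |= ~(0xffffffffull);
    break;
  default:
    break;
  }
  return value;
}
```

// For a relative branch, the extended value is added to the address of the
// next instruction with ordinary 64-bit wraparound: an 8-bit displacement of
// 0xfe (that is, -2) relative to 0x1000 gives 0xffe.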
2040
2041/// translateRMRegister - Translates a register stored in the R/M field of the
2042/// ModR/M byte to its LLVM equivalent and appends it to an MCInst.
2043/// @param mcInst - The MCInst to append to.
2044/// @param insn - The internal instruction to extract the R/M field
2045/// from.
2046 /// @return - false on success; true otherwise
2047static bool translateRMRegister(MCInst &mcInst,
2048 InternalInstruction &insn) {
2049 if (insn.eaBase == EA_BASE_sib || insn.eaBase == EA_BASE_sib64) {
2050 debug("A R/M register operand may not have a SIB byte");
2051 return true;
2052 }
2053
2054 switch (insn.eaBase) {
2055 default:
2056 debug("Unexpected EA base register");
2057 return true;
2058 case EA_BASE_NONE:
2059 debug("EA_BASE_NONE for ModR/M base");
2060 return true;
2061#define ENTRY(x) case EA_BASE_##x:
2063#undef ENTRY
2064 debug("A R/M register operand may not have a base; "
2065 "the operand must be a register.");
2066 return true;
2067#define ENTRY(x) \
2068 case EA_REG_##x: \
2069 mcInst.addOperand(MCOperand::createReg(X86::x)); break;
2070 ALL_REGS
2071#undef ENTRY
2072 }
2073
2074 return false;
2075}
2076
2077/// translateRMMemory - Translates a memory operand stored in the Mod and R/M
2078/// fields of an internal instruction (and possibly its SIB byte) to a memory
2079/// operand in LLVM's format, and appends it to an MCInst.
2080///
2081/// @param mcInst - The MCInst to append to.
2082/// @param insn - The instruction to extract Mod, R/M, and SIB fields
2083/// from.
2084/// @param ForceSIB - The instruction must use SIB.
2085 /// @return - false on success; true otherwise
2086 static bool translateRMMemory(MCInst &mcInst, InternalInstruction &insn,
2087 const MCDisassembler *Dis,
2088 bool ForceSIB = false) {
2089 // Addresses in an MCInst are represented as five operands:
2090 // 1. basereg (register) The R/M base, or (if there is a SIB) the
2091 // SIB base
2092 // 2. scaleamount (immediate) 1, or (if there is a SIB) the specified
2093 // scale amount
2094 // 3. indexreg (register) x86_registerNONE, or (if there is a SIB)
2095 // the index (which is multiplied by the
2096 // scale amount)
2097 // 4. displacement (immediate) 0, or the displacement if there is one
2098 // 5. segmentreg (register) X86::NoRegister for now, but could be set
2099 // if we have segment overrides
2100
2101 MCOperand baseReg;
2102 MCOperand scaleAmount;
2103 MCOperand indexReg;
2104 MCOperand displacement;
2105 MCOperand segmentReg;
2106 uint64_t pcrel = 0;
2107
2108 if (insn.eaBase == EA_BASE_sib || insn.eaBase == EA_BASE_sib64) {
2109 if (insn.sibBase != SIB_BASE_NONE) {
2110 switch (insn.sibBase) {
2111 default:
2112 debug("Unexpected sibBase");
2113 return true;
2114#define ENTRY(x) \
2115 case SIB_BASE_##x: \
2116 baseReg = MCOperand::createReg(X86::x); break;
2117 ALL_SIB_BASES
2118#undef ENTRY
2119 }
2120 } else {
2121 baseReg = MCOperand::createReg(X86::NoRegister);
2122 }
2123
2124 if (insn.sibIndex != SIB_INDEX_NONE) {
2125 switch (insn.sibIndex) {
2126 default:
2127 debug("Unexpected sibIndex");
2128 return true;
2129#define ENTRY(x) \
2130 case SIB_INDEX_##x: \
2131 indexReg = MCOperand::createReg(X86::x); break;
2132 EA_BASES_32BIT
2133 EA_BASES_64BIT
2134 REGS_XMM
2135 REGS_YMM
2136 REGS_ZMM
2137#undef ENTRY
2138 }
2139 } else {
2140 // Use EIZ/RIZ for a few ambiguous cases where the SIB byte is present,
2141 // but no index is used and modrm alone should have been enough.
2142 // -No base register in 32-bit mode. In 64-bit mode this is used to
2143 // avoid rip-relative addressing.
2144 // -Any base register used other than ESP/RSP/R12D/R12. Using these as a
2145 // base always requires a SIB byte.
2146 // -A scale other than 1 is used.
2147 if (!ForceSIB &&
2148 (insn.sibScale != 1 ||
2149 (insn.sibBase == SIB_BASE_NONE && insn.mode != MODE_64BIT) ||
2150 (insn.sibBase != SIB_BASE_NONE &&
2151 insn.sibBase != SIB_BASE_ESP && insn.sibBase != SIB_BASE_RSP &&
2152 insn.sibBase != SIB_BASE_R12D && insn.sibBase != SIB_BASE_R12))) {
2153 indexReg = MCOperand::createReg(insn.addressSize == 4 ? X86::EIZ :
2154 X86::RIZ);
2155 } else
2156 indexReg = MCOperand::createReg(X86::NoRegister);
2157 }
2158
2159 scaleAmount = MCOperand::createImm(insn.sibScale);
2160 } else {
2161 switch (insn.eaBase) {
2162 case EA_BASE_NONE:
2163 if (insn.eaDisplacement == EA_DISP_NONE) {
2164 debug("EA_BASE_NONE and EA_DISP_NONE for ModR/M base");
2165 return true;
2166 }
2167 if (insn.mode == MODE_64BIT) {
2168 pcrel = insn.startLocation + insn.length;
2169 Dis->tryAddingPcLoadReferenceComment(insn.displacement + pcrel,
2170 insn.startLocation +
2171 insn.displacementOffset);
2172 // Intel SDM Vol. 2A, Section 2.2.1.6 (RIP-Relative Addressing)
2173 baseReg = MCOperand::createReg(insn.addressSize == 4 ? X86::EIP :
2174 X86::RIP);
2175 }
2176 else
2177 baseReg = MCOperand::createReg(X86::NoRegister);
2178
2179 indexReg = MCOperand::createReg(X86::NoRegister);
2180 break;
2181 case EA_BASE_BX_SI:
2182 baseReg = MCOperand::createReg(X86::BX);
2183 indexReg = MCOperand::createReg(X86::SI);
2184 break;
2185 case EA_BASE_BX_DI:
2186 baseReg = MCOperand::createReg(X86::BX);
2187 indexReg = MCOperand::createReg(X86::DI);
2188 break;
2189 case EA_BASE_BP_SI:
2190 baseReg = MCOperand::createReg(X86::BP);
2191 indexReg = MCOperand::createReg(X86::SI);
2192 break;
2193 case EA_BASE_BP_DI:
2194 baseReg = MCOperand::createReg(X86::BP);
2195 indexReg = MCOperand::createReg(X86::DI);
2196 break;
2197 default:
2198 indexReg = MCOperand::createReg(X86::NoRegister);
2199 switch (insn.eaBase) {
2200 default:
2201 debug("Unexpected eaBase");
2202 return true;
2203 // Here, we will use the fill-ins defined above. However,
2204 // BX_SI, BX_DI, BP_SI, and BP_DI are all handled above and
2205 // sib and sib64 were handled in the top-level if, so they're only
2206 // placeholders to keep the compiler happy.
2207#define ENTRY(x) \
2208 case EA_BASE_##x: \
2209 baseReg = MCOperand::createReg(X86::x); break;
2210 ALL_EA_BASES
2211#undef ENTRY
2212#define ENTRY(x) case EA_REG_##x:
2213 ALL_REGS
2214#undef ENTRY
2215 debug("A R/M memory operand may not be a register; "
2216 "the base field must be a base.");
2217 return true;
2218 }
2219 }
2220
2221 scaleAmount = MCOperand::createImm(1);
2222 }
2223
2224 displacement = MCOperand::createImm(insn.displacement);
2225
2226 segmentReg = MCOperand::createReg(segmentRegnums[insn.segmentOverride]);
2227
2228 mcInst.addOperand(baseReg);
2229 mcInst.addOperand(scaleAmount);
2230 mcInst.addOperand(indexReg);
2231
2232 const uint8_t dispSize =
2233 (insn.eaDisplacement == EA_DISP_NONE) ? 0 : insn.displacementSize;
2234
2235 if (!Dis->tryAddingSymbolicOperand(
2236 mcInst, insn.displacement + pcrel, insn.startLocation, false,
2237 insn.displacementOffset, dispSize, insn.length))
2238 mcInst.addOperand(displacement);
2239 mcInst.addOperand(segmentReg);
2240 return false;
2241}
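The EIZ/RIZ condition in translateRMMemory is easier to check in isolation. The following is a minimal standalone sketch of that predicate, not the LLVM code itself: `Mode`, `SibBase`, and `needsPseudoIndex` are hypothetical stand-ins introduced here for illustration, and the enum lists only the few values the condition mentions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the decoder's enums; the real ones have many
// more members and different names.
enum class Mode { Bit16, Bit32, Bit64 };
enum class SibBase { None, EAX, ESP, RSP, R12D, R12 };

// Returns true when a SIB byte that encodes no index register should be
// represented with the EIZ/RIZ pseudo-index, following the condition used
// in translateRMMemory above.
bool needsPseudoIndex(bool forceSIB, uint8_t sibScale, SibBase base,
                      Mode mode) {
  if (forceSIB)
    return false; // The instruction requires a SIB byte, so nothing is odd.

  // A scale other than 1 with no index is redundant.
  const bool scaled = sibScale != 1;
  // No base outside 64-bit mode: ModR/M alone could have encoded this.
  const bool noBaseOutside64 = base == SibBase::None && mode != Mode::Bit64;
  // Bases other than ESP/RSP/R12D/R12 do not need a SIB byte at all.
  const bool baseWithoutSibNeed =
      base != SibBase::None && base != SibBase::ESP && base != SibBase::RSP &&
      base != SibBase::R12D && base != SibBase::R12;

  return scaled || noBaseOutside64 || baseWithoutSibNeed;
}
```

For instance, under these assumptions `needsPseudoIndex(false, 1, SibBase::EAX, Mode::Bit32)` is true because EAX never needs a SIB byte, while the same call with `SibBase::ESP` is false because ESP as a base always requires one.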
2242
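The RIP-relative branch of translateRMMemory computes `pcrel` as the address of the next instruction and adds the displacement to it when asking for a symbolic operand. A small sketch of that arithmetic, assuming plain integer inputs (the function name and parameter types are hypothetical, chosen for illustration only):

```cpp
#include <cstdint>

// Computes the absolute address referenced by a RIP-relative memory operand.
// startLocation: address of the instruction's first byte.
// length:        total encoded length of the instruction in bytes.
// displacement:  signed displacement taken from the encoding.
uint64_t ripRelativeTarget(uint64_t startLocation, uint64_t length,
                           int64_t displacement) {
  // RIP-relative displacements are relative to the *next* instruction.
  const uint64_t pcrel = startLocation + length;
  // Unsigned wraparound makes negative displacements subtract correctly.
  return pcrel + static_cast<uint64_t>(displacement);
}
```

With a 7-byte instruction at 0x1000 and a displacement of 0x10, the sketch yields 0x1017; a displacement of -0x20 yields 0xFE7.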
2243/// translateRM - Translates an operand stored in the R/M (and possibly SIB)
2244/// byte of an instruction to LLVM form, and appends it to an MCInst.
2245///
2246/// @param mcInst - The MCInst to append to.
2247/// @param operand - The operand, as stored in the descriptor table.
2248/// @param insn - The instruction to extract Mod, R/M, and SIB fields
2249/// from.
2250/// @return - false on success; true otherwise.
2251static bool translateRM(MCInst &mcInst, const OperandSpecifier &operand,
2252 InternalInstruction &insn, const MCDisassembler *Dis) {
2253 switch (operand.type) {
2254 default:
2255 debug("Unexpected type for a R/M operand");
2256 return true;
2257 case TYPE_R8:
2258 case TYPE_R16:
2259 case TYPE_R32:
2260 case TYPE_R64:
2261 case TYPE_Rv:
2262 case TYPE_MM64:
2263 case TYPE_XMM:
2264 case TYPE_YMM:
2265 case TYPE_ZMM:
2266 case TYPE_TMM:
2267 case TYPE_VK_PAIR:
2268 case TYPE_VK:
2269 case TYPE_DEBUGREG:
2270 case TYPE_CONTROLREG:
2271 case TYPE_BNDR:
2272 return translateRMRegister(mcInst, insn);
2273 case TYPE_M:
2274 case TYPE_MVSIBX:
2275 case TYPE_MVSIBY:
2276 case TYPE_MVSIBZ:
2277 return translateRMMemory(mcInst, insn, Dis);
2278 case TYPE_MSIB:
2279 return translateRMMemory(mcInst, insn, Dis, true);
2280 }
2281}
2282
2283/// translateFPRegister - Translates a stack position on the FPU stack to its
2284/// LLVM form, and appends it to an MCInst.
2285///
2286/// @param mcInst - The MCInst to append to.
2287/// @param stackPos - The stack position to translate.
2288static void translateFPRegister(MCInst &mcInst,
2289 uint8_t stackPos) {
2290 mcInst.addOperand(MCOperand::createReg(X86::ST0 + stackPos));
2291}
2292
2293/// translateMaskRegister - Translates a 3-bit mask register number to
2294/// LLVM form, and appends it to an MCInst.
2295///
2296/// @param mcInst - The MCInst to append to.
2297/// @param maskRegNum - Number of mask register from 0 to 7.
2298/// @return - false on success; true otherwise.
2299static bool translateMaskRegister(MCInst &mcInst,
2300 uint8_t maskRegNum) {
2301 if (maskRegNum >= 8) {
2302 debug("Invalid mask register number");
2303 return true;
2304 }
2305
2306 mcInst.addOperand(MCOperand::createReg(X86::K0 + maskRegNum));
2307 return false;
2308}
2309
2310/// translateOperand - Translates an operand stored in an internal instruction
2311/// to LLVM's format and appends it to an MCInst.
2312///
2313/// @param mcInst - The MCInst to append to.
2314/// @param operand - The operand, as stored in the descriptor table.
2315/// @param insn - The internal instruction.
2316/// @return - false on success; true otherwise.
2317static bool translateOperand(MCInst &mcInst, const OperandSpecifier &operand,
2318 InternalInstruction &insn,
2319 const MCDisassembler *Dis) {
2320 switch (operand.encoding) {
2321 default:
2322 debug("Unhandled operand encoding during translation");
2323 return true;
2324 case ENCODING_REG:
2325 translateRegister(mcInst, insn.reg);
2326 return false;
2327 case ENCODING_WRITEMASK:
2328 return translateMaskRegister(mcInst, insn.writemask);
2329 case ENCODING_SIB:
2330 CASE_ENCODING_RM:
2331 CASE_ENCODING_VSIB:
2332 return translateRM(mcInst, operand, insn, Dis);
2333 case ENCODING_IB:
2334 case ENCODING_IW:
2335 case ENCODING_ID:
2336 case ENCODING_IO:
2337 case ENCODING_Iv:
2338 case ENCODING_Ia:
2339 translateImmediate(mcInst,
2340 insn.immediates[insn.numImmediatesTranslated++],
2341 operand,
2342 insn,
2343 Dis);
2344 return false;
2345 case ENCODING_IRC:
2346 mcInst.addOperand(MCOperand::createImm(insn.RC));
2347 return false;
2348 case ENCODING_SI:
2349 return translateSrcIndex(mcInst, insn);
2350 case ENCODING_DI:
2351 return translateDstIndex(mcInst, insn);
2352 case ENCODING_RB:
2353 case ENCODING_RW:
2354 case ENCODING_RD:
2355 case ENCODING_RO:
2356 case ENCODING_Rv:
2357 translateRegister(mcInst, insn.opcodeRegister);
2358 return false;
2359 case ENCODING_CF:
2361 return false;
2362 case ENCODING_CC:
2363 if (isCCMPOrCTEST(&insn))
2365 else
2367 return false;
2368 case ENCODING_FP:
2369 translateFPRegister(mcInst, insn.modRM & 7);
2370 return false;
2371 case ENCODING_VVVV:
2372 translateRegister(mcInst, insn.vvvv);
2373 return false;
2374 case ENCODING_DUP:
2375 return translateOperand(mcInst, insn.operands[operand.type - TYPE_DUP0],
2376 insn, Dis);
2377 }
2378}
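The ENCODING_DUP case in translateOperand redirects to an earlier operand slot, selected by `operand.type - TYPE_DUP0`. The sketch below illustrates that indirection on a simplified operand table. `Enc`, `Spec`, `kTypeDup0`, `resolveOperand`, and `exampleOperands` are hypothetical names introduced for this example; it assumes a DUP entry never refers to itself, and it uses a loop where the original recurses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified operand specifier.
enum class Enc { Reg, Imm, Dup };
constexpr int kTypeDup0 = 100; // stand-in for TYPE_DUP0
struct Spec {
  Enc encoding;
  int type; // for Enc::Dup: kTypeDup0 + index of the operand to repeat
};

// Follows DUP entries until a concrete operand is reached and returns its
// index in the table.
std::size_t resolveOperand(const std::vector<Spec> &operands,
                           std::size_t index) {
  while (operands[index].encoding == Enc::Dup)
    index = static_cast<std::size_t>(operands[index].type - kTypeDup0);
  return index;
}

// Example table: operand 2 duplicates operand 0.
std::vector<Spec> exampleOperands() {
  return {{Enc::Reg, 0}, {Enc::Imm, 0}, {Enc::Dup, kTypeDup0 + 0}};
}
```

Here `resolveOperand(exampleOperands(), 2)` returns 0, so the third operand would be translated exactly like the first.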
2379
2380/// translateInstruction - Translates an internal instruction and all its
2381/// operands to an MCInst.
2382///
2383/// @param mcInst - The MCInst to populate with the instruction's data.
2384/// @param insn - The internal instruction.
2385/// @return - false on success; true otherwise.
2386static bool translateInstruction(MCInst &mcInst,
2387 InternalInstruction &insn,
2388 const MCDisassembler *Dis) {
2389 if (!insn.spec) {
2390 debug("Instruction has no specification");
2391 return true;
2392 }
2393
2394 mcInst.clear();
2395 mcInst.setOpcode(insn.instructionID);
2396 // If, while reading the prefix bytes, we determined that an overlapping
2397 // 0xf2 or 0xf3 prefix should be disassembled as xacquire or xrelease, set
2398 // the opcode to that instead of the repne or rep opcode.
2399 if (insn.xAcquireRelease) {
2400 if (mcInst.getOpcode() == X86::REP_PREFIX)
2401 mcInst.setOpcode(X86::XRELEASE_PREFIX);
2402 else if (mcInst.getOpcode() == X86::REPNE_PREFIX)
2403 mcInst.setOpcode(X86::XACQUIRE_PREFIX);
2404 }
2405
2406 insn.numImmediatesTranslated = 0;
2407
2408 for (const auto &Op : insn.operands) {
2409 if (Op.encoding != ENCODING_NONE) {
2410 if (translateOperand(mcInst, Op, insn, Dis)) {
2411 return true;
2412 }
2413 }
2414 }
2415
2416 return false;
2417}
2418
2419static MCDisassembler *createX86Disassembler(const Target &T,
2420 const MCSubtargetInfo &STI,
2421 MCContext &Ctx) {
2422 std::unique_ptr<const MCInstrInfo> MII(T.createMCInstrInfo());
2423 return new X86GenericDisassembler(STI, Ctx, std::move(MII));
2424}
2425