LLVM 23.0.0git
MemorySanitizer.cpp
Go to the documentation of this file.
1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow,
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instruction (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
21///
22/// But there are differences too. The first and the major one:
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
56///
57/// Every 4 aligned, consecutive bytes of application memory have one origin
58/// value associated with them. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needless overwriting origin of the 4-byte region on
66/// a short (i.e. 1 byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, atomic store
72/// of two disjoint locations can not be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It can be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics can be only visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach generating calls to
101/// __msan_instrument_asm_store(ptr, size)
102/// , which defer the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and returns the pointers to shadow and origin memory.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by the these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
227
228// These constants must be kept in sync with the ones in msan.h.
229// TODO: increase size to match SVE/SVE2/SME/SME2 limits
230static const unsigned kParamTLSSize = 800;
231static const unsigned kRetvalTLSSize = 800;
232
233// Accesses sizes are powers of two: 1, 2, 4, 8.
234static const size_t kNumberOfAccessSizes = 4;
235
236/// Track origins of uninitialized values.
237///
238/// Adds a section to MemorySanitizer report that points to the allocation
239/// (stack or heap) the uninitialized bits came from originally.
241 "msan-track-origins",
242 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
243 cl::init(0));
244
245static cl::opt<bool> ClKeepGoing("msan-keep-going",
246 cl::desc("keep going after reporting a UMR"),
247 cl::Hidden, cl::init(false));
248
249static cl::opt<bool>
250 ClPoisonStack("msan-poison-stack",
251 cl::desc("poison uninitialized stack variables"), cl::Hidden,
252 cl::init(true));
253
255 "msan-poison-stack-with-call",
256 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
257 cl::init(false));
258
260 "msan-poison-stack-pattern",
261 cl::desc("poison uninitialized stack variables with the given pattern"),
262 cl::Hidden, cl::init(0xff));
263
264static cl::opt<bool>
265 ClPrintStackNames("msan-print-stack-names",
266 cl::desc("Print name of local stack variable"),
267 cl::Hidden, cl::init(true));
268
269static cl::opt<bool>
270 ClPoisonUndef("msan-poison-undef",
271 cl::desc("Poison fully undef temporary values. "
272 "Partially undefined constant vectors "
273 "are unaffected by this flag (see "
274 "-msan-poison-undef-vectors)."),
275 cl::Hidden, cl::init(true));
276
278 "msan-poison-undef-vectors",
279 cl::desc("Precisely poison partially undefined constant vectors. "
280 "If false (legacy behavior), the entire vector is "
281 "considered fully initialized, which may lead to false "
282 "negatives. Fully undefined constant vectors are "
283 "unaffected by this flag (see -msan-poison-undef)."),
284 cl::Hidden, cl::init(false));
285
287 "msan-precise-disjoint-or",
288 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
289 "disjointedness is ignored (i.e., 1|1 is initialized)."),
290 cl::Hidden, cl::init(false));
291
292static cl::opt<bool>
293 ClHandleICmp("msan-handle-icmp",
294 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
295 cl::Hidden, cl::init(true));
296
297static cl::opt<bool>
298 ClHandleICmpExact("msan-handle-icmp-exact",
299 cl::desc("exact handling of relational integer ICmp"),
300 cl::Hidden, cl::init(true));
301
303 "msan-switch-precision",
304 cl::desc("Controls the number of cases considered by MSan for LLVM switch "
305 "instructions. 0 means no UUMs detected. Higher values lead to "
306 "fewer false negatives but may impact compiler and/or "
307 "application performance. N.B. LLVM switch instructions do not "
308 "correspond exactly to C++ switch statements."),
309 cl::Hidden, cl::init(99));
310
312 "msan-handle-lifetime-intrinsics",
313 cl::desc(
314 "when possible, poison scoped variables at the beginning of the scope "
315 "(slower, but more precise)"),
316 cl::Hidden, cl::init(true));
317
318// When compiling the Linux kernel, we sometimes see false positives related to
319// MSan being unable to understand that inline assembly calls may initialize
320// local variables.
321// This flag makes the compiler conservatively unpoison every memory location
322// passed into an assembly call. Note that this may cause false positives.
323// Because it's impossible to figure out the array sizes, we can only unpoison
324// the first sizeof(type) bytes for each type* pointer.
326 "msan-handle-asm-conservative",
327 cl::desc("conservative handling of inline assembly"), cl::Hidden,
328 cl::init(true));
329
330// This flag controls whether we check the shadow of the address
331// operand of load or store. Such bugs are very rare, since load from
332// a garbage address typically results in SEGV, but still happen
333// (e.g. only lower bits of address are garbage, or the access happens
334// early at program startup where malloc-ed memory is more likely to
335// be zeroed. As of 2012-08-28 this flag adds 20% slowdown.
337 "msan-check-access-address",
338 cl::desc("report accesses through a pointer which has poisoned shadow"),
339 cl::Hidden, cl::init(true));
340
342 "msan-eager-checks",
343 cl::desc("check arguments and return values at function call boundaries"),
344 cl::Hidden, cl::init(false));
345
347 "msan-dump-strict-instructions",
348 cl::desc("print out instructions with default strict semantics i.e.,"
349 "check that all the inputs are fully initialized, and mark "
350 "the output as fully initialized. These semantics are applied "
351 "to instructions that could not be handled explicitly nor "
352 "heuristically."),
353 cl::Hidden, cl::init(false));
354
355// Currently, all the heuristically handled instructions are specifically
356// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
357// to parallel 'msan-dump-strict-instructions', and to keep the door open to
358// handling non-intrinsic instructions heuristically.
360 "msan-dump-heuristic-instructions",
361 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
362 "Use -msan-dump-strict-instructions to print instructions that "
363 "could not be handled explicitly nor heuristically."),
364 cl::Hidden, cl::init(false));
365
367 "msan-instrumentation-with-call-threshold",
368 cl::desc(
369 "If the function being instrumented requires more than "
370 "this number of checks and origin stores, use callbacks instead of "
371 "inline checks (-1 means never use callbacks)."),
372 cl::Hidden, cl::init(3500));
373
374static cl::opt<bool>
375 ClEnableKmsan("msan-kernel",
376 cl::desc("Enable KernelMemorySanitizer instrumentation"),
377 cl::Hidden, cl::init(false));
378
379static cl::opt<bool>
380 ClDisableChecks("msan-disable-checks",
381 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
382 cl::init(false));
383
384static cl::opt<bool>
385 ClCheckConstantShadow("msan-check-constant-shadow",
386 cl::desc("Insert checks for constant shadow values"),
387 cl::Hidden, cl::init(true));
388
389// This is off by default because of a bug in gold:
390// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
391static cl::opt<bool>
392 ClWithComdat("msan-with-comdat",
393 cl::desc("Place MSan constructors in comdat sections"),
394 cl::Hidden, cl::init(false));
395
396// These options allow to specify custom memory map parameters
397// See MemoryMapParams for details.
398static cl::opt<uint64_t> ClAndMask("msan-and-mask",
399 cl::desc("Define custom MSan AndMask"),
400 cl::Hidden, cl::init(0));
401
402static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
403 cl::desc("Define custom MSan XorMask"),
404 cl::Hidden, cl::init(0));
405
406static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
407 cl::desc("Define custom MSan ShadowBase"),
408 cl::Hidden, cl::init(0));
409
410static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
411 cl::desc("Define custom MSan OriginBase"),
412 cl::Hidden, cl::init(0));
413
414static cl::opt<int>
415 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
416 cl::desc("Define threshold for number of checks per "
417 "debug location to force origin update."),
418 cl::Hidden, cl::init(3));
419
420const char kMsanModuleCtorName[] = "msan.module_ctor";
421const char kMsanInitName[] = "__msan_init";
422
423namespace {
424
425// Memory map parameters used in application-to-shadow address calculation.
426// Offset = (Addr & ~AndMask) ^ XorMask
427// Shadow = ShadowBase + Offset
428// Origin = OriginBase + Offset
429struct MemoryMapParams {
430 uint64_t AndMask;
431 uint64_t XorMask;
432 uint64_t ShadowBase;
433 uint64_t OriginBase;
434};
435
436struct PlatformMemoryMapParams {
437 const MemoryMapParams *bits32;
438 const MemoryMapParams *bits64;
439};
440
441} // end anonymous namespace
442
443// i386 Linux
444static const MemoryMapParams Linux_I386_MemoryMapParams = {
445 0x000080000000, // AndMask
446 0, // XorMask (not used)
447 0, // ShadowBase (not used)
448 0x000040000000, // OriginBase
449};
450
451// x86_64 Linux
452static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
453 0, // AndMask (not used)
454 0x500000000000, // XorMask
455 0, // ShadowBase (not used)
456 0x100000000000, // OriginBase
457};
458
459// mips32 Linux
460// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
461// after picking good constants
462
463// mips64 Linux
464static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
465 0, // AndMask (not used)
466 0x008000000000, // XorMask
467 0, // ShadowBase (not used)
468 0x002000000000, // OriginBase
469};
470
471// ppc32 Linux
472// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
473// after picking good constants
474
475// ppc64 Linux
476static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
477 0xE00000000000, // AndMask
478 0x100000000000, // XorMask
479 0x080000000000, // ShadowBase
480 0x1C0000000000, // OriginBase
481};
482
483// s390x Linux
484static const MemoryMapParams Linux_S390X_MemoryMapParams = {
485 0xC00000000000, // AndMask
486 0, // XorMask (not used)
487 0x080000000000, // ShadowBase
488 0x1C0000000000, // OriginBase
489};
490
491// arm32 Linux
492// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
493// after picking good constants
494
495// aarch64 Linux
496static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
497 0, // AndMask (not used)
498 0x0B00000000000, // XorMask
499 0, // ShadowBase (not used)
500 0x0200000000000, // OriginBase
501};
502
503// loongarch64 Linux
504static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
505 0, // AndMask (not used)
506 0x500000000000, // XorMask
507 0, // ShadowBase (not used)
508 0x100000000000, // OriginBase
509};
510
511// riscv32 Linux
512// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
513// after picking good constants
514
515// aarch64 FreeBSD
516static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
517 0x1800000000000, // AndMask
518 0x0400000000000, // XorMask
519 0x0200000000000, // ShadowBase
520 0x0700000000000, // OriginBase
521};
522
523// i386 FreeBSD
524static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
525 0x000180000000, // AndMask
526 0x000040000000, // XorMask
527 0x000020000000, // ShadowBase
528 0x000700000000, // OriginBase
529};
530
531// x86_64 FreeBSD
532static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
533 0xc00000000000, // AndMask
534 0x200000000000, // XorMask
535 0x100000000000, // ShadowBase
536 0x380000000000, // OriginBase
537};
538
539// x86_64 NetBSD
540static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
541 0, // AndMask
542 0x500000000000, // XorMask
543 0, // ShadowBase
544 0x100000000000, // OriginBase
545};
546
547static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
550};
551
552static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
553 nullptr,
555};
556
557static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
558 nullptr,
560};
561
562static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
563 nullptr,
565};
566
567static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
568 nullptr,
570};
571
572static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
573 nullptr,
575};
576
577static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
578 nullptr,
580};
581
582static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
585};
586
587static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
588 nullptr,
590};
591
593
594namespace {
595
596/// Instrument functions of a module to detect uninitialized reads.
597///
598/// Instantiating MemorySanitizer inserts the msan runtime library API function
599/// declarations into the module if they don't exist already. Instantiating
600/// ensures the __msan_init function is in the list of global constructors for
601/// the module.
602class MemorySanitizer {
603public:
604 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
605 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
606 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
607 initializeModule(M);
608 }
609
610 // MSan cannot be moved or copied because of MapParams.
611 MemorySanitizer(MemorySanitizer &&) = delete;
612 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
613 MemorySanitizer(const MemorySanitizer &) = delete;
614 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
615
616 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
617
618private:
619 friend struct MemorySanitizerVisitor;
620 friend struct VarArgHelperBase;
621 friend struct VarArgAMD64Helper;
622 friend struct VarArgAArch64Helper;
623 friend struct VarArgPowerPC64Helper;
624 friend struct VarArgPowerPC32Helper;
625 friend struct VarArgSystemZHelper;
626 friend struct VarArgI386Helper;
627 friend struct VarArgGenericHelper;
628
629 void initializeModule(Module &M);
630 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
631 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
632 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
633
634 template <typename... ArgsTy>
635 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
636 ArgsTy... Args);
637
638 /// True if we're compiling the Linux kernel.
639 bool CompileKernel;
640 /// Track origins (allocation points) of uninitialized values.
641 int TrackOrigins;
642 bool Recover;
643 bool EagerChecks;
644
645 Triple TargetTriple;
646 LLVMContext *C;
647 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
648 Type *OriginTy;
649 PointerType *PtrTy; ///< Integer type with the size of a ptr in default AS.
650
651 // XxxTLS variables represent the per-thread state in MSan and per-task state
652 // in KMSAN.
653 // For the userspace these point to thread-local globals. In the kernel land
654 // they point to the members of a per-task struct obtained via a call to
655 // __msan_get_context_state().
656
657 /// Thread-local shadow storage for function parameters.
658 Value *ParamTLS;
659
660 /// Thread-local origin storage for function parameters.
661 Value *ParamOriginTLS;
662
663 /// Thread-local shadow storage for function return value.
664 Value *RetvalTLS;
665
666 /// Thread-local origin storage for function return value.
667 Value *RetvalOriginTLS;
668
669 /// Thread-local shadow storage for in-register va_arg function.
670 Value *VAArgTLS;
671
672 /// Thread-local shadow storage for in-register va_arg function.
673 Value *VAArgOriginTLS;
674
675 /// Thread-local shadow storage for va_arg overflow area.
676 Value *VAArgOverflowSizeTLS;
677
678 /// Are the instrumentation callbacks set up?
679 bool CallbacksInitialized = false;
680
681 /// The run-time callback to print a warning.
682 FunctionCallee WarningFn;
683
684 // These arrays are indexed by log2(AccessSize).
685 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
686 FunctionCallee MaybeWarningVarSizeFn;
687 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
688
689 /// Run-time helper that generates a new origin value for a stack
690 /// allocation.
691 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
692 // No description version
693 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
694
695 /// Run-time helper that poisons stack on function entry.
696 FunctionCallee MsanPoisonStackFn;
697
698 /// Run-time helper that records a store (or any event) of an
699 /// uninitialized value and returns an updated origin id encoding this info.
700 FunctionCallee MsanChainOriginFn;
701
702 /// Run-time helper that paints an origin over a region.
703 FunctionCallee MsanSetOriginFn;
704
705 /// MSan runtime replacements for memmove, memcpy and memset.
706 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
707
708 /// KMSAN callback for task-local function argument shadow.
709 StructType *MsanContextStateTy;
710 FunctionCallee MsanGetContextStateFn;
711
712 /// Functions for poisoning/unpoisoning local variables
713 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
714
715 /// Pair of shadow/origin pointers.
716 Type *MsanMetadata;
717
718 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
719 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
720 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
721 FunctionCallee MsanMetadataPtrForStore_1_8[4];
722 FunctionCallee MsanInstrumentAsmStoreFn;
723
724 /// Storage for return values of the MsanMetadataPtrXxx functions.
725 Value *MsanMetadataAlloca;
726
727 /// Helper to choose between different MsanMetadataPtrXxx().
728 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
729
730 /// Memory map parameters used in application-to-shadow calculation.
731 const MemoryMapParams *MapParams;
732
733 /// Custom memory map parameters used when -msan-shadow-base or
734 // -msan-origin-base is provided.
735 MemoryMapParams CustomMapParams;
736
737 MDNode *ColdCallWeights;
738
739 /// Branch weights for origin store.
740 MDNode *OriginStoreWeights;
741};
742
743void insertModuleCtor(Module &M) {
746 /*InitArgTypes=*/{},
747 /*InitArgs=*/{},
748 // This callback is invoked when the functions are created the first
749 // time. Hook them into the global ctors list in that case:
750 [&](Function *Ctor, FunctionCallee) {
751 if (!ClWithComdat) {
752 appendToGlobalCtors(M, Ctor, 0);
753 return;
754 }
755 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
756 Ctor->setComdat(MsanCtorComdat);
757 appendToGlobalCtors(M, Ctor, 0, Ctor);
758 });
759}
760
761template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
762 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
763}
764
765} // end anonymous namespace
766
768 bool EagerChecks)
769 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
770 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
771 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
772 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
773
776 // Return early if nosanitize_memory module flag is present for the module.
777 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
778 return PreservedAnalyses::all();
779 bool Modified = false;
780 if (!Options.Kernel) {
781 insertModuleCtor(M);
782 Modified = true;
783 }
784
785 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
786 for (Function &F : M) {
787 if (F.empty())
788 continue;
789 MemorySanitizer Msan(*F.getParent(), Options);
790 Modified |=
791 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
792 }
793
794 if (!Modified)
795 return PreservedAnalyses::all();
796
798 // GlobalsAA is considered stateless and does not get invalidated unless
799 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
800 // make changes that require GlobalsAA to be invalidated.
801 PA.abandon<GlobalsAA>();
802 return PA;
803}
804
806 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
808 OS, MapClassName2PassName);
809 OS << '<';
810 if (Options.Recover)
811 OS << "recover;";
812 if (Options.Kernel)
813 OS << "kernel;";
814 if (Options.EagerChecks)
815 OS << "eager-checks;";
816 OS << "track-origins=" << Options.TrackOrigins;
817 OS << '>';
818}
819
820/// Create a non-const global initialized with the given string.
821///
822/// Creates a writable global for Str so that we can pass it to the
823/// run-time lib. Runtime uses first 4 bytes of the string to store the
824/// frame ID, so the string needs to be mutable.
826 StringRef Str) {
827 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
828 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
829 GlobalValue::PrivateLinkage, StrConst, "");
830}
831
832template <typename... ArgsTy>
834MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
835 ArgsTy... Args) {
836 if (TargetTriple.getArch() == Triple::systemz) {
837 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
838 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
839 std::forward<ArgsTy>(Args)...);
840 }
841
842 return M.getOrInsertFunction(Name, MsanMetadata,
843 std::forward<ArgsTy>(Args)...);
844}
845
846/// Create KMSAN API callbacks.
847void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
848 IRBuilder<> IRB(*C);
849
850 // These will be initialized in insertKmsanPrologue().
851 RetvalTLS = nullptr;
852 RetvalOriginTLS = nullptr;
853 ParamTLS = nullptr;
854 ParamOriginTLS = nullptr;
855 VAArgTLS = nullptr;
856 VAArgOriginTLS = nullptr;
857 VAArgOverflowSizeTLS = nullptr;
858
859 WarningFn = M.getOrInsertFunction("__msan_warning",
860 TLI.getAttrList(C, {0}, /*Signed=*/false),
861 IRB.getVoidTy(), IRB.getInt32Ty());
862
863 // Requests the per-task context state (kmsan_context_state*) from the
864 // runtime library.
865 MsanContextStateTy = StructType::get(
866 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
867 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
868 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
869 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
870 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
871 OriginTy);
872 MsanGetContextStateFn =
873 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
874
875 MsanMetadata = StructType::get(PtrTy, PtrTy);
876
877 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
878 std::string name_load =
879 "__msan_metadata_ptr_for_load_" + std::to_string(size);
880 std::string name_store =
881 "__msan_metadata_ptr_for_store_" + std::to_string(size);
882 MsanMetadataPtrForLoad_1_8[ind] =
883 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
884 MsanMetadataPtrForStore_1_8[ind] =
885 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
886 }
887
888 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
889 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
890 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
891 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
892
893 // Functions for poisoning and unpoisoning memory.
894 MsanPoisonAllocaFn = M.getOrInsertFunction(
895 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
896 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
897 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
898}
899
901 return M.getOrInsertGlobal(Name, Ty, [&] {
902 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
903 nullptr, Name, nullptr,
905 });
906}
907
908/// Insert declarations for userspace-specific functions and globals.
909void MemorySanitizer::createUserspaceApi(Module &M,
910 const TargetLibraryInfo &TLI) {
911 IRBuilder<> IRB(*C);
912
913 // Create the callback.
914 // FIXME: this function should have "Cold" calling conv,
915 // which is not yet implemented.
916 if (TrackOrigins) {
917 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
918 : "__msan_warning_with_origin_noreturn";
919 WarningFn = M.getOrInsertFunction(WarningFnName,
920 TLI.getAttrList(C, {0}, /*Signed=*/false),
921 IRB.getVoidTy(), IRB.getInt32Ty());
922 } else {
923 StringRef WarningFnName =
924 Recover ? "__msan_warning" : "__msan_warning_noreturn";
925 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
926 }
927
928 // Create the global TLS variables.
929 RetvalTLS =
930 getOrInsertGlobal(M, "__msan_retval_tls",
931 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
932
933 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
934
935 ParamTLS =
936 getOrInsertGlobal(M, "__msan_param_tls",
937 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
938
939 ParamOriginTLS =
940 getOrInsertGlobal(M, "__msan_param_origin_tls",
941 ArrayType::get(OriginTy, kParamTLSSize / 4));
942
943 VAArgTLS =
944 getOrInsertGlobal(M, "__msan_va_arg_tls",
945 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
946
947 VAArgOriginTLS =
948 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
949 ArrayType::get(OriginTy, kParamTLSSize / 4));
950
951 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
952 IRB.getIntPtrTy(M.getDataLayout()));
953
954 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
955 AccessSizeIndex++) {
956 unsigned AccessSize = 1 << AccessSizeIndex;
957 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
958 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
959 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
960 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
961 MaybeWarningVarSizeFn = M.getOrInsertFunction(
962 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
963 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
964 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
965 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
966 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
967 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
968 IRB.getInt32Ty());
969 }
970
971 MsanSetAllocaOriginWithDescriptionFn =
972 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
973 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
974 MsanSetAllocaOriginNoDescriptionFn =
975 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
976 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
977 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
978 IRB.getVoidTy(), PtrTy, IntptrTy);
979}
980
981/// Insert extern declaration of runtime-provided functions and globals.
982void MemorySanitizer::initializeCallbacks(Module &M,
983 const TargetLibraryInfo &TLI) {
984 // Only do this once.
985 if (CallbacksInitialized)
986 return;
987
988 IRBuilder<> IRB(*C);
989 // Initialize callbacks that are common for kernel and userspace
990 // instrumentation.
991 MsanChainOriginFn = M.getOrInsertFunction(
992 "__msan_chain_origin",
993 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
994 IRB.getInt32Ty());
995 MsanSetOriginFn = M.getOrInsertFunction(
996 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
997 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
998 MemmoveFn =
999 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
1000 MemcpyFn =
1001 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
1002 MemsetFn = M.getOrInsertFunction("__msan_memset",
1003 TLI.getAttrList(C, {1}, /*Signed=*/true),
1004 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
1005
1006 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
1007 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
1008
1009 if (CompileKernel) {
1010 createKernelApi(M, TLI);
1011 } else {
1012 createUserspaceApi(M, TLI);
1013 }
1014 CallbacksInitialized = true;
1015}
1016
1017FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1018 int size) {
1019 FunctionCallee *Fns =
1020 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1021 switch (size) {
1022 case 1:
1023 return Fns[0];
1024 case 2:
1025 return Fns[1];
1026 case 4:
1027 return Fns[2];
1028 case 8:
1029 return Fns[3];
1030 default:
1031 return nullptr;
1032 }
1033}
1034
1035/// Module-level initialization.
1036///
1037/// inserts a call to __msan_init to the module's constructor list.
1038void MemorySanitizer::initializeModule(Module &M) {
1039 auto &DL = M.getDataLayout();
1040
1041 TargetTriple = M.getTargetTriple();
1042
1043 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1044 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1045 // Check the overrides first
1046 if (ShadowPassed || OriginPassed) {
1047 CustomMapParams.AndMask = ClAndMask;
1048 CustomMapParams.XorMask = ClXorMask;
1049 CustomMapParams.ShadowBase = ClShadowBase;
1050 CustomMapParams.OriginBase = ClOriginBase;
1051 MapParams = &CustomMapParams;
1052 } else {
1053 switch (TargetTriple.getOS()) {
1054 case Triple::FreeBSD:
1055 switch (TargetTriple.getArch()) {
1056 case Triple::aarch64:
1057 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1058 break;
1059 case Triple::x86_64:
1060 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1061 break;
1062 case Triple::x86:
1063 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1064 break;
1065 default:
1066 report_fatal_error("unsupported architecture");
1067 }
1068 break;
1069 case Triple::NetBSD:
1070 switch (TargetTriple.getArch()) {
1071 case Triple::x86_64:
1072 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1073 break;
1074 default:
1075 report_fatal_error("unsupported architecture");
1076 }
1077 break;
1078 case Triple::Linux:
1079 switch (TargetTriple.getArch()) {
1080 case Triple::x86_64:
1081 MapParams = Linux_X86_MemoryMapParams.bits64;
1082 break;
1083 case Triple::x86:
1084 MapParams = Linux_X86_MemoryMapParams.bits32;
1085 break;
1086 case Triple::mips64:
1087 case Triple::mips64el:
1088 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1089 break;
1090 case Triple::ppc64:
1091 case Triple::ppc64le:
1092 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1093 break;
1094 case Triple::systemz:
1095 MapParams = Linux_S390_MemoryMapParams.bits64;
1096 break;
1097 case Triple::aarch64:
1098 case Triple::aarch64_be:
1099 MapParams = Linux_ARM_MemoryMapParams.bits64;
1100 break;
1102 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1103 break;
1104 default:
1105 report_fatal_error("unsupported architecture");
1106 }
1107 break;
1108 default:
1109 report_fatal_error("unsupported operating system");
1110 }
1111 }
1112
1113 C = &(M.getContext());
1114 IRBuilder<> IRB(*C);
1115 IntptrTy = IRB.getIntPtrTy(DL);
1116 OriginTy = IRB.getInt32Ty();
1117 PtrTy = IRB.getPtrTy();
1118
1119 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1120 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1121
1122 if (!CompileKernel) {
1123 if (TrackOrigins)
1124 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1125 return new GlobalVariable(
1126 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1127 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1128 });
1129
1130 if (Recover)
1131 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1132 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1133 GlobalValue::WeakODRLinkage,
1134 IRB.getInt32(Recover), "__msan_keep_going");
1135 });
1136 }
1137}
1138
1139namespace {
1140
1141/// A helper class that handles instrumentation of VarArg
1142/// functions on a particular platform.
1143///
1144/// Implementations are expected to insert the instrumentation
1145/// necessary to propagate argument shadow through VarArg function
1146/// calls. Visit* methods are called during an InstVisitor pass over
1147/// the function, and should avoid creating new basic blocks. A new
1148/// instance of this class is created for each instrumented function.
1149struct VarArgHelper {
1150 virtual ~VarArgHelper() = default;
1151
1152 /// Visit a CallBase.
1153 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1154
1155 /// Visit a va_start call.
1156 virtual void visitVAStartInst(VAStartInst &I) = 0;
1157
1158 /// Visit a va_copy call.
1159 virtual void visitVACopyInst(VACopyInst &I) = 0;
1160
1161 /// Finalize function instrumentation.
1162 ///
1163 /// This method is called after visiting all interesting (see above)
1164 /// instructions in a function.
1165 virtual void finalizeInstrumentation() = 0;
1166};
1167
1168struct MemorySanitizerVisitor;
1169
1170} // end anonymous namespace
1171
1172static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1173 MemorySanitizerVisitor &Visitor);
1174
1175static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1176 if (TS.isScalable())
1177 // Scalable types unconditionally take slowpaths.
1178 return kNumberOfAccessSizes;
1179 unsigned TypeSizeFixed = TS.getFixedValue();
1180 if (TypeSizeFixed <= 8)
1181 return 0;
1182 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1183}
1184
1185namespace {
1186
1187/// Helper class to attach debug information of the given instruction onto new
1188/// instructions inserted after.
1189class NextNodeIRBuilder : public IRBuilder<> {
1190public:
1191 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1192 SetCurrentDebugLocation(IP->getDebugLoc());
1193 }
1194};
1195
1196/// This class does all the work for a given function. Store and Load
1197/// instructions store and load corresponding shadow and origin
1198/// values. Most instructions propagate shadow from arguments to their
1199/// return values. Certain instructions (most importantly, BranchInst)
1200/// test their argument shadow and print reports (with a runtime call) if it's
1201/// non-zero.
1202struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1203 Function &F;
1204 MemorySanitizer &MS;
1205 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1206 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1207 std::unique_ptr<VarArgHelper> VAHelper;
1208 const TargetLibraryInfo *TLI;
1209 Instruction *FnPrologueEnd;
1210 SmallVector<Instruction *, 16> Instructions;
1211
1212 // The following flags disable parts of MSan instrumentation based on
1213 // exclusion list contents and command-line options.
1214 bool InsertChecks;
1215 bool PropagateShadow;
1216 bool PoisonStack;
1217 bool PoisonUndef;
1218 bool PoisonUndefVectors;
1219
1220 struct ShadowOriginAndInsertPoint {
1221 Value *Shadow;
1222 Value *Origin;
1223 Instruction *OrigIns;
1224
1225 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1226 : Shadow(S), Origin(O), OrigIns(I) {}
1227 };
1229 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1230 SmallSetVector<AllocaInst *, 16> AllocaSet;
1233 int64_t SplittableBlocksCount = 0;
1234
1235 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1236 const TargetLibraryInfo &TLI)
1237 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1238 bool SanitizeFunction =
1239 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1240 InsertChecks = SanitizeFunction;
1241 PropagateShadow = SanitizeFunction;
1242 PoisonStack = SanitizeFunction && ClPoisonStack;
1243 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1244 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1245
1246 // In the presence of unreachable blocks, we may see Phi nodes with
1247 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1248 // blocks, such nodes will not have any shadow value associated with them.
1249 // It's easier to remove unreachable blocks than deal with missing shadow.
1251
1252 MS.initializeCallbacks(*F.getParent(), TLI);
1253 FnPrologueEnd =
1254 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1255 .CreateIntrinsic(Intrinsic::donothing, {});
1256
1257 if (MS.CompileKernel) {
1258 IRBuilder<> IRB(FnPrologueEnd);
1259 insertKmsanPrologue(IRB);
1260 }
1261
1262 LLVM_DEBUG(if (!InsertChecks) dbgs()
1263 << "MemorySanitizer is not inserting checks into '"
1264 << F.getName() << "'\n");
1265 }
1266
1267 bool instrumentWithCalls(Value *V) {
1268 // Constants likely will be eliminated by follow-up passes.
1269 if (isa<Constant>(V))
1270 return false;
1271 ++SplittableBlocksCount;
1273 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1274 }
1275
1276 bool isInPrologue(Instruction &I) {
1277 return I.getParent() == FnPrologueEnd->getParent() &&
1278 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1279 }
1280
1281 // Creates a new origin and records the stack trace. In general we can call
1282 // this function for any origin manipulation we like. However it will cost
1283 // runtime resources. So use this wisely only if it can provide additional
1284 // information helpful to a user.
1285 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1286 if (MS.TrackOrigins <= 1)
1287 return V;
1288 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1289 }
1290
1291 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1292 const DataLayout &DL = F.getDataLayout();
1293 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1294 if (IntptrSize == kOriginSize)
1295 return Origin;
1296 assert(IntptrSize == kOriginSize * 2);
1297 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1298 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1299 }
1300
1301 /// Fill memory range with the given origin value.
1302 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1303 TypeSize TS, Align Alignment) {
1304 const DataLayout &DL = F.getDataLayout();
1305 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1306 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1307 assert(IntptrAlignment >= kMinOriginAlignment);
1308 assert(IntptrSize >= kOriginSize);
1309
1310 // Note: The loop based formation works for fixed length vectors too,
1311 // however we prefer to unroll and specialize alignment below.
1312 if (TS.isScalable()) {
1313 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1314 Value *RoundUp =
1315 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1316 Value *End =
1317 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1318 auto [InsertPt, Index] =
1320 IRB.SetInsertPoint(InsertPt);
1321
1322 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1324 return;
1325 }
1326
1327 unsigned Size = TS.getFixedValue();
1328
1329 unsigned Ofs = 0;
1330 Align CurrentAlignment = Alignment;
1331 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1332 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1333 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1334 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1335 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1336 : IntptrOriginPtr;
1337 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1338 Ofs += IntptrSize / kOriginSize;
1339 CurrentAlignment = IntptrAlignment;
1340 }
1341 }
1342
1343 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1344 Value *GEP =
1345 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1346 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1347 CurrentAlignment = kMinOriginAlignment;
1348 }
1349 }
1350
1351 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1352 Value *OriginPtr, Align Alignment) {
1353 const DataLayout &DL = F.getDataLayout();
1354 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1355 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1356 // ZExt cannot convert between vector and scalar
1357 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1358 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1359 if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
1360 // Origin is not needed: value is initialized or const shadow is
1361 // ignored.
1362 return;
1363 }
1364 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1365 // Copy origin as the value is definitely uninitialized.
1366 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1367 OriginAlignment);
1368 return;
1369 }
1370 // Fallback to runtime check, which still can be optimized out later.
1371 }
1372
1373 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1374 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1375 if (instrumentWithCalls(ConvertedShadow) &&
1376 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1377 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1378 Value *ConvertedShadow2 =
1379 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1380 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1381 CB->addParamAttr(0, Attribute::ZExt);
1382 CB->addParamAttr(2, Attribute::ZExt);
1383 } else {
1384 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1386 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1387 IRBuilder<> IRBNew(CheckTerm);
1388 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1389 OriginAlignment);
1390 }
1391 }
1392
1393 void materializeStores() {
1394 for (StoreInst *SI : StoreList) {
1395 IRBuilder<> IRB(SI);
1396 Value *Val = SI->getValueOperand();
1397 Value *Addr = SI->getPointerOperand();
1398 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1399 Value *ShadowPtr, *OriginPtr;
1400 Type *ShadowTy = Shadow->getType();
1401 const Align Alignment = SI->getAlign();
1402 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1403 std::tie(ShadowPtr, OriginPtr) =
1404 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1405
1406 [[maybe_unused]] StoreInst *NewSI =
1407 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1408 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1409
1410 if (SI->isAtomic())
1411 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1412
1413 if (MS.TrackOrigins && !SI->isAtomic())
1414 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1415 OriginAlignment);
1416 }
1417 }
1418
1419 // Returns true if Debug Location corresponds to multiple warnings.
1420 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1421 if (MS.TrackOrigins < 2)
1422 return false;
1423
1424 if (LazyWarningDebugLocationCount.empty())
1425 for (const auto &I : InstrumentationList)
1426 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1427
1428 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1429 }
1430
1431 /// Helper function to insert a warning at IRB's current insert point.
1432 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1433 if (!Origin)
1434 Origin = (Value *)IRB.getInt32(0);
1435 assert(Origin->getType()->isIntegerTy());
1436
1437 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1438 // Try to create additional origin with debug info of the last origin
1439 // instruction. It may provide additional information to the user.
1440 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1441 assert(MS.TrackOrigins);
1442 auto NewDebugLoc = OI->getDebugLoc();
1443 // Origin update with missing or the same debug location provides no
1444 // additional value.
1445 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1446 // Insert update just before the check, so we call runtime only just
1447 // before the report.
1448 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1449 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1450 Origin = updateOrigin(Origin, IRBOrigin);
1451 }
1452 }
1453 }
1454
1455 if (MS.CompileKernel || MS.TrackOrigins)
1456 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1457 else
1458 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1459 // FIXME: Insert UnreachableInst if !MS.Recover?
1460 // This may invalidate some of the following checks and needs to be done
1461 // at the very end.
1462 }
1463
1464 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1465 Value *Origin) {
1466 const DataLayout &DL = F.getDataLayout();
1467 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1468 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1469 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1470 // ZExt cannot convert between vector and scalar
1471 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1472 Value *ConvertedShadow2 =
1473 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1474
1475 if (SizeIndex < kNumberOfAccessSizes) {
1476 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1477 CallBase *CB = IRB.CreateCall(
1478 Fn,
1479 {ConvertedShadow2,
1480 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1481 CB->addParamAttr(0, Attribute::ZExt);
1482 CB->addParamAttr(1, Attribute::ZExt);
1483 } else {
1484 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1485 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1486 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1487 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1488 CallBase *CB = IRB.CreateCall(
1489 Fn,
1490 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1491 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1492 CB->addParamAttr(1, Attribute::ZExt);
1493 CB->addParamAttr(2, Attribute::ZExt);
1494 }
1495 } else {
1496 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1498 Cmp, &*IRB.GetInsertPoint(),
1499 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1500
1501 IRB.SetInsertPoint(CheckTerm);
1502 insertWarningFn(IRB, Origin);
1503 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1504 }
1505 }
1506
1507 void materializeInstructionChecks(
1508 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1509 const DataLayout &DL = F.getDataLayout();
1510 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1511 // correct origin.
1512 bool Combine = !MS.TrackOrigins;
1513 Instruction *Instruction = InstructionChecks.front().OrigIns;
1514 Value *Shadow = nullptr;
1515 for (const auto &ShadowData : InstructionChecks) {
1516 assert(ShadowData.OrigIns == Instruction);
1517 IRBuilder<> IRB(Instruction);
1518
1519 Value *ConvertedShadow = ShadowData.Shadow;
1520
1521 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1522 if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
1523 // Skip, value is initialized or const shadow is ignored.
1524 continue;
1525 }
1526 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1527 // Report as the value is definitely uninitialized.
1528 insertWarningFn(IRB, ShadowData.Origin);
1529 if (!MS.Recover)
1530 return; // Always fail and stop here, not need to check the rest.
1531 // Skip entire instruction,
1532 continue;
1533 }
1534 // Fallback to runtime check, which still can be optimized out later.
1535 }
1536
1537 if (!Combine) {
1538 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1539 continue;
1540 }
1541
1542 if (!Shadow) {
1543 Shadow = ConvertedShadow;
1544 continue;
1545 }
1546
1547 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1548 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1549 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1550 }
1551
1552 if (Shadow) {
1553 assert(Combine);
1554 IRBuilder<> IRB(Instruction);
1555 materializeOneCheck(IRB, Shadow, nullptr);
1556 }
1557 }
1558
1559 static bool isAArch64SVCount(Type *Ty) {
1560 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1561 return TTy->getName() == "aarch64.svcount";
1562 return false;
1563 }
1564
1565 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1566 // 'target("aarch64.svcount")', but not e.g., <vscale x 4 x i32>.
1567 static bool isScalableNonVectorType(Type *Ty) {
1568 if (!isAArch64SVCount(Ty))
1569 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1570 << "\n");
1571
1572 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1573 }
1574
1575 void materializeChecks() {
1576#ifndef NDEBUG
1577 // For assert below.
1578 SmallPtrSet<Instruction *, 16> Done;
1579#endif
1580
1581 for (auto I = InstrumentationList.begin();
1582 I != InstrumentationList.end();) {
1583 auto OrigIns = I->OrigIns;
1584 // Checks are grouped by the original instruction. We call all
1585 // `insertShadowCheck` for an instruction at once.
1586 assert(Done.insert(OrigIns).second);
1587 auto J = std::find_if(I + 1, InstrumentationList.end(),
1588 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1589 return OrigIns != R.OrigIns;
1590 });
1591 // Process all checks of instruction at once.
1592 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1593 I = J;
1594 }
1595
1596 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1597 }
1598
1599 // Returns the last instruction in the new prologue
1600 void insertKmsanPrologue(IRBuilder<> &IRB) {
1601 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1602 Constant *Zero = IRB.getInt32(0);
1603 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1604 {Zero, IRB.getInt32(0)}, "param_shadow");
1605 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1606 {Zero, IRB.getInt32(1)}, "retval_shadow");
1607 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1608 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1609 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1610 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1611 MS.VAArgOverflowSizeTLS =
1612 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1613 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1614 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1615 {Zero, IRB.getInt32(5)}, "param_origin");
1616 MS.RetvalOriginTLS =
1617 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1618 {Zero, IRB.getInt32(6)}, "retval_origin");
1619 if (MS.TargetTriple.getArch() == Triple::systemz)
1620 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1621 }
1622
1623 /// Add MemorySanitizer instrumentation to a function.
1624 bool runOnFunction() {
1625 // Iterate all BBs in depth-first order and create shadow instructions
1626 // for all instructions (where applicable).
1627 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1628 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1629 visit(*BB);
1630
1631 // `visit` above only collects instructions. Process them after iterating
1632 // CFG to avoid requirement on CFG transformations.
1633 for (Instruction *I : Instructions)
1635
1636 // Finalize PHI nodes.
1637 for (PHINode *PN : ShadowPHINodes) {
1638 PHINode *PNS = cast<PHINode>(getShadow(PN));
1639 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1640 size_t NumValues = PN->getNumIncomingValues();
1641 for (size_t v = 0; v < NumValues; v++) {
1642 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1643 if (PNO)
1644 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1645 }
1646 }
1647
1648 VAHelper->finalizeInstrumentation();
1649
1650 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1651 // instrumenting only allocas.
1653 for (auto Item : LifetimeStartList) {
1654 instrumentAlloca(*Item.second, Item.first);
1655 AllocaSet.remove(Item.second);
1656 }
1657 }
1658 // Poison the allocas for which we didn't instrument the corresponding
1659 // lifetime intrinsics.
1660 for (AllocaInst *AI : AllocaSet)
1661 instrumentAlloca(*AI);
1662
1663 // Insert shadow value checks.
1664 materializeChecks();
1665
1666 // Delayed instrumentation of StoreInst.
1667 // This may not add new address checks.
1668 materializeStores();
1669
1670 return true;
1671 }
1672
1673 /// Compute the shadow type that corresponds to a given Value.
1674 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1675
1676 /// Compute the shadow type that corresponds to a given Type.
1677 Type *getShadowTy(Type *OrigTy) {
1678 if (!OrigTy->isSized()) {
1679 return nullptr;
1680 }
1681 // For integer type, shadow is the same as the original type.
1682 // This may return weird-sized types like i1.
1683 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1684 return IT;
1685 const DataLayout &DL = F.getDataLayout();
1686 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1687 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1688 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1689 VT->getElementCount());
1690 }
1691 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1692 return ArrayType::get(getShadowTy(AT->getElementType()),
1693 AT->getNumElements());
1694 }
1695 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1697 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1698 Elements.push_back(getShadowTy(ST->getElementType(i)));
1699 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1700 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1701 return Res;
1702 }
1703 if (isScalableNonVectorType(OrigTy)) {
1704 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1705 << "\n");
1706 return OrigTy;
1707 }
1708
1709 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1710 return IntegerType::get(*MS.C, TypeSize);
1711 }
1712
1713 /// Extract combined shadow of struct elements as a bool
1714 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1715 IRBuilder<> &IRB) {
1716 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1717 Value *Aggregator = FalseVal;
1718
1719 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1720 // Combine by ORing together each element's bool shadow
1721 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1722 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1723
1724 if (Aggregator != FalseVal)
1725 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1726 else
1727 Aggregator = ShadowBool;
1728 }
1729
1730 return Aggregator;
1731 }
1732
1733 // Extract combined shadow of array elements
1734 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1735 IRBuilder<> &IRB) {
1736 if (!Array->getNumElements())
1737 return IRB.getIntN(/* width */ 1, /* value */ 0);
1738
1739 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1740 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1741
1742 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1743 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1744 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1745 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1746 }
1747 return Aggregator;
1748 }
1749
1750 /// Convert a shadow value to it's flattened variant. The resulting
1751 /// shadow may not necessarily have the same bit width as the input
1752 /// value, but it will always be comparable to zero.
1753 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1754 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1755 return collapseStructShadow(Struct, V, IRB);
1756 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1757 return collapseArrayShadow(Array, V, IRB);
1758 if (isa<VectorType>(V->getType())) {
1759 if (isa<ScalableVectorType>(V->getType()))
1760 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1761 unsigned BitWidth =
1762 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1763 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1764 }
1765 return V;
1766 }
1767
1768 // Convert a scalar value to an i1 by comparing with 0
1769 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1770 Type *VTy = V->getType();
1771 if (!VTy->isIntegerTy())
1772 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1773 if (VTy->getIntegerBitWidth() == 1)
1774 // Just converting a bool to a bool, so do nothing.
1775 return V;
1776 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1777 }
1778
1779 Type *ptrToIntPtrType(Type *PtrTy) const {
1780 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1781 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1782 VectTy->getElementCount());
1783 }
1784 assert(PtrTy->isIntOrPtrTy());
1785 return MS.IntptrTy;
1786 }
1787
1788 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1789 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1790 return VectorType::get(
1791 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1792 VectTy->getElementCount());
1793 }
1794 assert(IntPtrTy == MS.IntptrTy);
1795 return MS.PtrTy;
1796 }
1797
1798 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1799 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1801 VectTy->getElementCount(),
1802 constToIntPtr(VectTy->getElementType(), C));
1803 }
1804 assert(IntPtrTy == MS.IntptrTy);
1805 // TODO: Avoid implicit trunc?
1806 // See https://github.com/llvm/llvm-project/issues/112510.
1807 return ConstantInt::get(MS.IntptrTy, C, /*IsSigned=*/false,
1808 /*ImplicitTrunc=*/true);
1809 }
1810
1811 /// Returns the integer shadow offset that corresponds to a given
1812 /// application address, whereby:
1813 ///
1814 /// Offset = (Addr & ~AndMask) ^ XorMask
1815 /// Shadow = ShadowBase + Offset
1816 /// Origin = (OriginBase + Offset) & ~Alignment
1817 ///
1818 /// Note: for efficiency, many shadow mappings only require use the XorMask
1819 /// and OriginBase; the AndMask and ShadowBase are often zero.
1820 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1821 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1822 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1823
1824 if (uint64_t AndMask = MS.MapParams->AndMask)
1825 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1826
1827 if (uint64_t XorMask = MS.MapParams->XorMask)
1828 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1829 return OffsetLong;
1830 }
1831
1832 /// Compute the shadow and origin addresses corresponding to a given
1833 /// application address.
1834 ///
1835 /// Shadow = ShadowBase + Offset
1836 /// Origin = (OriginBase + Offset) & ~3ULL
1837 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy the shadow type of
1838 /// a single pointee.
1839 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1840 std::pair<Value *, Value *>
1841 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1842 MaybeAlign Alignment) {
1843 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1844 if (!VectTy) {
1845 assert(Addr->getType()->isPointerTy());
1846 } else {
1847 assert(VectTy->getElementType()->isPointerTy());
1848 }
1849 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1850 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1851 Value *ShadowLong = ShadowOffset;
1852 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1853 ShadowLong =
1854 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1855 }
1856 Value *ShadowPtr = IRB.CreateIntToPtr(
1857 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1858
1859 Value *OriginPtr = nullptr;
1860 if (MS.TrackOrigins) {
1861 Value *OriginLong = ShadowOffset;
1862 uint64_t OriginBase = MS.MapParams->OriginBase;
1863 if (OriginBase != 0)
1864 OriginLong =
1865 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1866 if (!Alignment || *Alignment < kMinOriginAlignment) {
1867 uint64_t Mask = kMinOriginAlignment.value() - 1;
1868 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1869 }
1870 OriginPtr = IRB.CreateIntToPtr(
1871 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1872 }
1873 return std::make_pair(ShadowPtr, OriginPtr);
1874 }
1875
1876 template <typename... ArgsTy>
1877 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1878 ArgsTy... Args) {
1879 if (MS.TargetTriple.getArch() == Triple::systemz) {
1880 IRB.CreateCall(Callee,
1881 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1882 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1883 }
1884
1885 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1886 }
1887
1888 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1889 IRBuilder<> &IRB,
1890 Type *ShadowTy,
1891 bool isStore) {
1892 Value *ShadowOriginPtrs;
1893 const DataLayout &DL = F.getDataLayout();
1894 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1895
1896 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1897 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1898 if (Getter) {
1899 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1900 } else {
1901 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1902 ShadowOriginPtrs = createMetadataCall(
1903 IRB,
1904 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1905 AddrCast, SizeVal);
1906 }
1907 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1908 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1909 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1910
1911 return std::make_pair(ShadowPtr, OriginPtr);
1912 }
1913
1914 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy the shadow type of
1915 /// a single pointee.
1916 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1917 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1918 IRBuilder<> &IRB,
1919 Type *ShadowTy,
1920 bool isStore) {
1921 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1922 if (!VectTy) {
1923 assert(Addr->getType()->isPointerTy());
1924 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1925 }
1926
1927 // TODO: Support callbacs with vectors of addresses.
1928 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1929 Value *ShadowPtrs = ConstantInt::getNullValue(
1930 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1931 Value *OriginPtrs = nullptr;
1932 if (MS.TrackOrigins)
1933 OriginPtrs = ConstantInt::getNullValue(
1934 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1935 for (unsigned i = 0; i < NumElements; ++i) {
1936 Value *OneAddr =
1937 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1938 auto [ShadowPtr, OriginPtr] =
1939 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1940
1941 ShadowPtrs = IRB.CreateInsertElement(
1942 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1943 if (MS.TrackOrigins)
1944 OriginPtrs = IRB.CreateInsertElement(
1945 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1946 }
1947 return {ShadowPtrs, OriginPtrs};
1948 }
1949
1950 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1951 Type *ShadowTy,
1952 MaybeAlign Alignment,
1953 bool isStore) {
1954 if (MS.CompileKernel)
1955 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1956 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1957 }
1958
1959 /// Compute the shadow address for a given function argument.
1960 ///
1961 /// Shadow = ParamTLS+ArgOffset.
1962 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1963 return IRB.CreatePtrAdd(MS.ParamTLS,
1964 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1965 }
1966
1967 /// Compute the origin address for a given function argument.
1968 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1969 if (!MS.TrackOrigins)
1970 return nullptr;
1971 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1972 ConstantInt::get(MS.IntptrTy, ArgOffset),
1973 "_msarg_o");
1974 }
1975
1976 /// Compute the shadow address for a retval.
1977 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1978 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1979 }
1980
1981 /// Compute the origin address for a retval.
1982 Value *getOriginPtrForRetval() {
1983 // We keep a single origin for the entire retval. Might be too optimistic.
1984 return MS.RetvalOriginTLS;
1985 }
1986
1987 /// Set SV to be the shadow value for V.
1988 void setShadow(Value *V, Value *SV) {
1989 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1990 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1991 }
1992
1993 /// Set Origin to be the origin value for V.
1994 void setOrigin(Value *V, Value *Origin) {
1995 if (!MS.TrackOrigins)
1996 return;
1997 assert(!OriginMap.count(V) && "Values may only have one origin");
1998 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1999 OriginMap[V] = Origin;
2000 }
2001
2002 Constant *getCleanShadow(Type *OrigTy) {
2003 Type *ShadowTy = getShadowTy(OrigTy);
2004 if (!ShadowTy)
2005 return nullptr;
2006 return Constant::getNullValue(ShadowTy);
2007 }
2008
2009 /// Create a clean shadow value for a given value.
2010 ///
2011 /// Clean shadow (all zeroes) means all bits of the value are defined
2012 /// (initialized).
2013 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2014
2015 /// Create a dirty shadow of a given shadow type.
2016 Constant *getPoisonedShadow(Type *ShadowTy) {
2017 assert(ShadowTy);
2018 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2019 return Constant::getAllOnesValue(ShadowTy);
2020 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2021 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2022 getPoisonedShadow(AT->getElementType()));
2023 return ConstantArray::get(AT, Vals);
2024 }
2025 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2026 SmallVector<Constant *, 4> Vals;
2027 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2028 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2029 return ConstantStruct::get(ST, Vals);
2030 }
2031 llvm_unreachable("Unexpected shadow type");
2032 }
2033
2034 /// Create a dirty shadow for a given value.
2035 Constant *getPoisonedShadow(Value *V) {
2036 Type *ShadowTy = getShadowTy(V);
2037 if (!ShadowTy)
2038 return nullptr;
2039 return getPoisonedShadow(ShadowTy);
2040 }
2041
2042 /// Create a clean (zero) origin.
2043 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2044
2045 /// Get the shadow value for a given Value.
2046 ///
2047 /// This function either returns the value set earlier with setShadow,
2048 /// or extracts if from ParamTLS (for function arguments).
2049 Value *getShadow(Value *V) {
2050 if (Instruction *I = dyn_cast<Instruction>(V)) {
2051 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2052 return getCleanShadow(V);
2053 // For instructions the shadow is already stored in the map.
2054 Value *Shadow = ShadowMap[V];
2055 if (!Shadow) {
2056 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2057 assert(Shadow && "No shadow for a value");
2058 }
2059 return Shadow;
2060 }
2061 // Handle fully undefined values
2062 // (partially undefined constant vectors are handled later)
2063 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2064 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2065 : getCleanShadow(V);
2066 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2067 return AllOnes;
2068 }
2069 if (Argument *A = dyn_cast<Argument>(V)) {
2070 // For arguments we compute the shadow on demand and store it in the map.
2071 Value *&ShadowPtr = ShadowMap[V];
2072 if (ShadowPtr)
2073 return ShadowPtr;
2074 Function *F = A->getParent();
2075 IRBuilder<> EntryIRB(FnPrologueEnd);
2076 unsigned ArgOffset = 0;
2077 const DataLayout &DL = F->getDataLayout();
2078 for (auto &FArg : F->args()) {
2079 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2080 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2081 ? "vscale not fully supported\n"
2082 : "Arg is not sized\n"));
2083 if (A == &FArg) {
2084 ShadowPtr = getCleanShadow(V);
2085 setOrigin(A, getCleanOrigin());
2086 break;
2087 }
2088 continue;
2089 }
2090
2091 unsigned Size = FArg.hasByValAttr()
2092 ? DL.getTypeAllocSize(FArg.getParamByValType())
2093 : DL.getTypeAllocSize(FArg.getType());
2094
2095 if (A == &FArg) {
2096 bool Overflow = ArgOffset + Size > kParamTLSSize;
2097 if (FArg.hasByValAttr()) {
2098 // ByVal pointer itself has clean shadow. We copy the actual
2099 // argument shadow to the underlying memory.
2100 // Figure out maximal valid memcpy alignment.
2101 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2102 FArg.getParamAlign(), FArg.getParamByValType());
2103 Value *CpShadowPtr, *CpOriginPtr;
2104 std::tie(CpShadowPtr, CpOriginPtr) =
2105 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2106 /*isStore*/ true);
2107 if (!PropagateShadow || Overflow) {
2108 // ParamTLS overflow.
2109 EntryIRB.CreateMemSet(
2110 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2111 Size, ArgAlign);
2112 } else {
2113 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2114 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2115 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2116 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2117 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2118
2119 if (MS.TrackOrigins) {
2120 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2121 // FIXME: OriginSize should be:
2122 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2123 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2124 EntryIRB.CreateMemCpy(
2125 CpOriginPtr,
2126 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2127 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2128 OriginSize);
2129 }
2130 }
2131 }
2132
2133 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2134 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2135 ShadowPtr = getCleanShadow(V);
2136 setOrigin(A, getCleanOrigin());
2137 } else {
2138 // Shadow over TLS
2139 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2140 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2142 if (MS.TrackOrigins) {
2143 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2144 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2145 }
2146 }
2148 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2149 break;
2150 }
2151
2152 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2153 }
2154 assert(ShadowPtr && "Could not find shadow for an argument");
2155 return ShadowPtr;
2156 }
2157
2158 // Check for partially-undefined constant vectors
2159 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2160 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2161 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2162 PoisonUndefVectors) {
2163 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2164 SmallVector<Constant *, 32> ShadowVector(NumElems);
2165 for (unsigned i = 0; i != NumElems; ++i) {
2166 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2167 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2168 : getCleanShadow(Elem);
2169 }
2170
2171 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2172 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2173 << *ShadowConstant << "\n");
2174
2175 return ShadowConstant;
2176 }
2177
2178 // TODO: partially-undefined constant arrays, structures, and nested types
2179
2180 // For everything else the shadow is zero.
2181 return getCleanShadow(V);
2182 }
2183
2184 /// Get the shadow for i-th argument of the instruction I.
2185 Value *getShadow(Instruction *I, int i) {
2186 return getShadow(I->getOperand(i));
2187 }
2188
2189 /// Get the origin for a value.
2190 Value *getOrigin(Value *V) {
2191 if (!MS.TrackOrigins)
2192 return nullptr;
2193 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2194 return getCleanOrigin();
2196 "Unexpected value type in getOrigin()");
2197 if (Instruction *I = dyn_cast<Instruction>(V)) {
2198 if (I->getMetadata(LLVMContext::MD_nosanitize))
2199 return getCleanOrigin();
2200 }
2201 Value *Origin = OriginMap[V];
2202 assert(Origin && "Missing origin");
2203 return Origin;
2204 }
2205
2206 /// Get the origin for i-th argument of the instruction I.
2207 Value *getOrigin(Instruction *I, int i) {
2208 return getOrigin(I->getOperand(i));
2209 }
2210
2211 /// Remember the place where a shadow check should be inserted.
2212 ///
2213 /// This location will be later instrumented with a check that will print a
2214 /// UMR warning in runtime if the shadow value is not 0.
2215 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2216 assert(Shadow);
2217 if (!InsertChecks)
2218 return;
2219
2220 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2221 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2222 << *OrigIns << "\n");
2223 return;
2224 }
2225
2226 Type *ShadowTy = Shadow->getType();
2227 if (isScalableNonVectorType(ShadowTy)) {
2228 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2229 << " before " << *OrigIns << "\n");
2230 return;
2231 }
2232#ifndef NDEBUG
2233 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2234 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2235 "Can only insert checks for integer, vector, and aggregate shadow "
2236 "types");
2237#endif
2238 InstrumentationList.push_back(
2239 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2240 }
2241
2242 /// Get shadow for value, and remember the place where a shadow check should
2243 /// be inserted.
2244 ///
2245 /// This location will be later instrumented with a check that will print a
2246 /// UMR warning in runtime if the value is not fully defined.
2247 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2248 assert(Val);
2249 Value *Shadow, *Origin;
2251 Shadow = getShadow(Val);
2252 if (!Shadow)
2253 return;
2254 Origin = getOrigin(Val);
2255 } else {
2256 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2257 if (!Shadow)
2258 return;
2259 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2260 }
2261 insertCheckShadow(Shadow, Origin, OrigIns);
2262 }
2263
2265 switch (a) {
2266 case AtomicOrdering::NotAtomic:
2267 return AtomicOrdering::NotAtomic;
2268 case AtomicOrdering::Unordered:
2269 case AtomicOrdering::Monotonic:
2270 case AtomicOrdering::Release:
2271 return AtomicOrdering::Release;
2272 case AtomicOrdering::Acquire:
2273 case AtomicOrdering::AcquireRelease:
2274 return AtomicOrdering::AcquireRelease;
2275 case AtomicOrdering::SequentiallyConsistent:
2276 return AtomicOrdering::SequentiallyConsistent;
2277 }
2278 llvm_unreachable("Unknown ordering");
2279 }
2280
2281 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2282 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2283 uint32_t OrderingTable[NumOrderings] = {};
2284
2285 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2286 OrderingTable[(int)AtomicOrderingCABI::release] =
2287 (int)AtomicOrderingCABI::release;
2288 OrderingTable[(int)AtomicOrderingCABI::consume] =
2289 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2290 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2291 (int)AtomicOrderingCABI::acq_rel;
2292 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2293 (int)AtomicOrderingCABI::seq_cst;
2294
2295 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2296 }
2297
2299 switch (a) {
2300 case AtomicOrdering::NotAtomic:
2301 return AtomicOrdering::NotAtomic;
2302 case AtomicOrdering::Unordered:
2303 case AtomicOrdering::Monotonic:
2304 case AtomicOrdering::Acquire:
2305 return AtomicOrdering::Acquire;
2306 case AtomicOrdering::Release:
2307 case AtomicOrdering::AcquireRelease:
2308 return AtomicOrdering::AcquireRelease;
2309 case AtomicOrdering::SequentiallyConsistent:
2310 return AtomicOrdering::SequentiallyConsistent;
2311 }
2312 llvm_unreachable("Unknown ordering");
2313 }
2314
2315 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2316 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2317 uint32_t OrderingTable[NumOrderings] = {};
2318
2319 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2320 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2321 OrderingTable[(int)AtomicOrderingCABI::consume] =
2322 (int)AtomicOrderingCABI::acquire;
2323 OrderingTable[(int)AtomicOrderingCABI::release] =
2324 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2325 (int)AtomicOrderingCABI::acq_rel;
2326 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2327 (int)AtomicOrderingCABI::seq_cst;
2328
2329 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2330 }
2331
2332 // ------------------- Visitors.
2333 using InstVisitor<MemorySanitizerVisitor>::visit;
2334 void visit(Instruction &I) {
2335 if (I.getMetadata(LLVMContext::MD_nosanitize))
2336 return;
2337 // Don't want to visit if we're in the prologue
2338 if (isInPrologue(I))
2339 return;
2340 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2341 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2342 // We still need to set the shadow and origin to clean values.
2343 setShadow(&I, getCleanShadow(&I));
2344 setOrigin(&I, getCleanOrigin());
2345 return;
2346 }
2347
2348 Instructions.push_back(&I);
2349 }
2350
2351 /// Instrument LoadInst
2352 ///
2353 /// Loads the corresponding shadow and (optionally) origin.
2354 /// Optionally, checks that the load address is fully defined.
2355 void visitLoadInst(LoadInst &I) {
2356 assert(I.getType()->isSized() && "Load type must have size");
2357 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2358 NextNodeIRBuilder IRB(&I);
2359 Type *ShadowTy = getShadowTy(&I);
2360 Value *Addr = I.getPointerOperand();
2361 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2362 const Align Alignment = I.getAlign();
2363 if (PropagateShadow) {
2364 std::tie(ShadowPtr, OriginPtr) =
2365 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2366 setShadow(&I,
2367 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2368 } else {
2369 setShadow(&I, getCleanShadow(&I));
2370 }
2371
2373 insertCheckShadowOf(I.getPointerOperand(), &I);
2374
2375 if (I.isAtomic())
2376 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2377
2378 if (MS.TrackOrigins) {
2379 if (PropagateShadow) {
2380 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2381 setOrigin(
2382 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2383 } else {
2384 setOrigin(&I, getCleanOrigin());
2385 }
2386 }
2387 }
2388
2389 /// Instrument StoreInst
2390 ///
2391 /// Stores the corresponding shadow and (optionally) origin.
2392 /// Optionally, checks that the store address is fully defined.
2393 void visitStoreInst(StoreInst &I) {
2394 StoreList.push_back(&I);
2396 insertCheckShadowOf(I.getPointerOperand(), &I);
2397 }
2398
2399 void handleCASOrRMW(Instruction &I) {
2401
2402 IRBuilder<> IRB(&I);
2403 Value *Addr = I.getOperand(0);
2404 Value *Val = I.getOperand(1);
2405 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2406 /*isStore*/ true)
2407 .first;
2408
2410 insertCheckShadowOf(Addr, &I);
2411
2412 // Only test the conditional argument of cmpxchg instruction.
2413 // The other argument can potentially be uninitialized, but we can not
2414 // detect this situation reliably without possible false positives.
2416 insertCheckShadowOf(Val, &I);
2417
2418 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2419
2420 setShadow(&I, getCleanShadow(&I));
2421 setOrigin(&I, getCleanOrigin());
2422 }
2423
2424 void visitAtomicRMWInst(AtomicRMWInst &I) {
2425 handleCASOrRMW(I);
2426 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2427 }
2428
2429 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2430 handleCASOrRMW(I);
2431 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2432 }
2433
2434 /// Generic handler to compute shadow for == and != comparisons.
2435 ///
2436 /// This function is used by handleEqualityComparison and visitSwitchInst.
2437 ///
2438 /// Sometimes the comparison result is known even if some of the bits of the
2439 /// arguments are not.
2440 Value *propagateEqualityComparison(IRBuilder<> &IRB, Value *A, Value *B,
2441 Value *Sa, Value *Sb) {
2442 assert(getShadowTy(A) == Sa->getType());
2443 assert(getShadowTy(B) == Sb->getType());
2444
2445 // Get rid of pointers and vectors of pointers.
2446 // For ints (and vectors of ints), types of A and Sa match,
2447 // and this is a no-op.
2448 A = IRB.CreatePointerCast(A, Sa->getType());
2449 B = IRB.CreatePointerCast(B, Sb->getType());
2450
2451 // A == B <==> (C = A^B) == 0
2452 // A != B <==> (C = A^B) != 0
2453 // Sc = Sa | Sb
2454 Value *C = IRB.CreateXor(A, B);
2455 Value *Sc = IRB.CreateOr(Sa, Sb);
2456 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2457 // Result is defined if one of the following is true
2458 // * there is a defined 1 bit in C
2459 // * C is fully defined
2460 // Si = !(C & ~Sc) && Sc
2462 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2463 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2464 Value *RHS =
2465 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2466 Value *Si = IRB.CreateAnd(LHS, RHS);
2467 Si->setName("_msprop_icmp");
2468
2469 return Si;
2470 }
2471
2472 // Instrument:
2473 // switch i32 %Val, label %else [ i32 0, label %A
2474 // i32 1, label %B
2475 // i32 2, label %C ]
2476 //
2477 // Typically, the switch input value (%Val) is fully initialized.
2478 //
2479 // Sometimes the compiler may convert (icmp + br) into a switch statement.
2480 // MSan allows icmp eq/ne with partly initialized inputs to still result in a
2481 // fully initialized output, if there exists a bit that is initialized in
2482 // both inputs with a differing value. For compatibility, we support this in
2483 // the switch instrumentation as well. Note that this edge case only applies
2484 // if the switch input value does not match *any* of the cases (matching any
2485 // of the cases requires an exact, fully initialized match).
2486 //
2487 // ShadowCases = 0
2488 // | propagateEqualityComparison(Val, 0)
2489 // | propagateEqualityComparison(Val, 1)
2490 // | propagateEqualityComparison(Val, 2))
2491 void visitSwitchInst(SwitchInst &SI) {
2492 IRBuilder<> IRB(&SI);
2493
2494 Value *Val = SI.getCondition();
2495 Value *ShadowVal = getShadow(Val);
2496 // TODO: add fast path - if the condition is fully initialized, we know
2497 // there is no UUM, without needing to consider the case values below.
2498
2499 // Some code (e.g., AMDGPUGenMCCodeEmitter.inc) has tens of thousands of
2500 // cases. This results in an extremely long chained expression for MSan's
2501 // switch instrumentation, which can cause the JumpThreadingPass to have a
2502 // stack overflow or excessive runtime. We limit the number of cases
2503 // considered, with the tradeoff of niche false negatives.
2504 // TODO: figure out a better solution.
2505 int casesToConsider = ClSwitchPrecision;
2506
2507 Value *ShadowCases = nullptr;
2508 for (auto Case : SI.cases()) {
2509 if (casesToConsider <= 0)
2510 break;
2511
2512 Value *Comparator = Case.getCaseValue();
2513 // TODO: some simplification is possible when comparing multiple cases
2514 // simultaneously.
2515 Value *ComparisonShadow = propagateEqualityComparison(
2516 IRB, Val, Comparator, ShadowVal, getShadow(Comparator));
2517
2518 if (ShadowCases)
2519 ShadowCases = IRB.CreateOr(ShadowCases, ComparisonShadow);
2520 else
2521 ShadowCases = ComparisonShadow;
2522
2523 casesToConsider--;
2524 }
2525
2526 if (ShadowCases)
2527 insertCheckShadow(ShadowCases, getOrigin(Val), &SI);
2528 }
2529
2530 // Vector manipulation.
2531 void visitExtractElementInst(ExtractElementInst &I) {
2532 insertCheckShadowOf(I.getOperand(1), &I);
2533 IRBuilder<> IRB(&I);
2534 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2535 "_msprop"));
2536 setOrigin(&I, getOrigin(&I, 0));
2537 }
2538
2539 void visitInsertElementInst(InsertElementInst &I) {
2540 insertCheckShadowOf(I.getOperand(2), &I);
2541 IRBuilder<> IRB(&I);
2542 auto *Shadow0 = getShadow(&I, 0);
2543 auto *Shadow1 = getShadow(&I, 1);
2544 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2545 "_msprop"));
2546 setOriginForNaryOp(I);
2547 }
2548
2549 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2550 IRBuilder<> IRB(&I);
2551 auto *Shadow0 = getShadow(&I, 0);
2552 auto *Shadow1 = getShadow(&I, 1);
2553 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2554 "_msprop"));
2555 setOriginForNaryOp(I);
2556 }
2557
2558 // Casts.
2559 void visitSExtInst(SExtInst &I) {
2560 IRBuilder<> IRB(&I);
2561 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2562 setOrigin(&I, getOrigin(&I, 0));
2563 }
2564
2565 void visitZExtInst(ZExtInst &I) {
2566 IRBuilder<> IRB(&I);
2567 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2568 setOrigin(&I, getOrigin(&I, 0));
2569 }
2570
2571 void visitTruncInst(TruncInst &I) {
2572 IRBuilder<> IRB(&I);
2573 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2574 setOrigin(&I, getOrigin(&I, 0));
2575 }
2576
2577 void visitBitCastInst(BitCastInst &I) {
2578 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2579 // a musttail call and a ret, don't instrument. New instructions are not
2580 // allowed after a musttail call.
2581 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2582 if (CI->isMustTailCall())
2583 return;
2584 IRBuilder<> IRB(&I);
2585 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2586 setOrigin(&I, getOrigin(&I, 0));
2587 }
2588
2589 void visitPtrToIntInst(PtrToIntInst &I) {
2590 IRBuilder<> IRB(&I);
2591 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2592 "_msprop_ptrtoint"));
2593 setOrigin(&I, getOrigin(&I, 0));
2594 }
2595
2596 void visitIntToPtrInst(IntToPtrInst &I) {
2597 IRBuilder<> IRB(&I);
2598 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2599 "_msprop_inttoptr"));
2600 setOrigin(&I, getOrigin(&I, 0));
2601 }
2602
2603 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2604 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2605 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2606 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2607 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2608 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2609
2610 /// Generic handler to compute shadow for bitwise AND.
2611 ///
2612 /// This is used by 'visitAnd' but also as a primitive for other handlers.
2613 ///
2614 /// This code is precise: it implements the rule that "And" of an initialized
2615 /// zero bit always results in an initialized value:
2616 // 1&1 => 1; 0&1 => 0; p&1 => p;
2617 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2618 // 1&p => p; 0&p => 0; p&p => p;
2619 //
2620 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2621 Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
2622 Value *S2) {
2623 Value *S1S2 = IRB.CreateAnd(S1, S2);
2624 Value *V1S2 = IRB.CreateAnd(V1, S2);
2625 Value *S1V2 = IRB.CreateAnd(S1, V2);
2626
2627 if (V1->getType() != S1->getType()) {
2628 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2629 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2630 }
2631
2632 return IRB.CreateOr({S1S2, V1S2, S1V2});
2633 }
2634
2635 /// Handler for bitwise AND operator.
2636 void visitAnd(BinaryOperator &I) {
2637 IRBuilder<> IRB(&I);
2638 Value *V1 = I.getOperand(0);
2639 Value *V2 = I.getOperand(1);
2640 Value *S1 = getShadow(&I, 0);
2641 Value *S2 = getShadow(&I, 1);
2642
2643 Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);
2644
2645 setShadow(&I, OutShadow);
2646 setOriginForNaryOp(I);
2647 }
2648
2649 void visitOr(BinaryOperator &I) {
2650 IRBuilder<> IRB(&I);
2651 // "Or" of 1 and a poisoned value results in unpoisoned value:
2652 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2653 // 1|0 => 1; 0|0 => 0; p|0 => p;
2654 // 1|p => 1; 0|p => p; p|p => p;
2655 //
2656 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2657 //
2658 // If the "disjoint OR" property is violated, the result is poison, and
2659 // hence the entire shadow is uninitialized:
2660 // S = S | SignExt(V1 & V2 != 0)
2661 Value *S1 = getShadow(&I, 0);
2662 Value *S2 = getShadow(&I, 1);
2663 Value *V1 = I.getOperand(0);
2664 Value *V2 = I.getOperand(1);
2665 if (V1->getType() != S1->getType()) {
2666 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2667 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2668 }
2669
2670 Value *NotV1 = IRB.CreateNot(V1);
2671 Value *NotV2 = IRB.CreateNot(V2);
2672
2673 Value *S1S2 = IRB.CreateAnd(S1, S2);
2674 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2675 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2676
2677 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2678
2679 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2680 Value *V1V2 = IRB.CreateAnd(V1, V2);
2681 Value *DisjointOrShadow = IRB.CreateSExt(
2682 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2683 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2684 }
2685
2686 setShadow(&I, S);
2687 setOriginForNaryOp(I);
2688 }
2689
2690 /// Default propagation of shadow and/or origin.
2691 ///
2692 /// This class implements the general case of shadow propagation, used in all
2693 /// cases where we don't know and/or don't care about what the operation
2694 /// actually does. It converts all input shadow values to a common type
2695 /// (extending or truncating as necessary), and bitwise OR's them.
2696 ///
2697 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2698 /// fully initialized), and less prone to false positives.
2699 ///
2700 /// This class also implements the general case of origin propagation. For a
2701 /// Nary operation, result origin is set to the origin of an argument that is
2702 /// not entirely initialized. If there is more than one such arguments, the
2703 /// rightmost of them is picked. It does not matter which one is picked if all
2704 /// arguments are initialized.
2705 template <bool CombineShadow> class Combiner {
2706 Value *Shadow = nullptr;
2707 Value *Origin = nullptr;
2708 IRBuilder<> &IRB;
2709 MemorySanitizerVisitor *MSV;
2710
2711 public:
2712 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2713 : IRB(IRB), MSV(MSV) {}
2714
2715 /// Add a pair of shadow and origin values to the mix.
2716 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2717 if (CombineShadow) {
2718 assert(OpShadow);
2719 if (!Shadow)
2720 Shadow = OpShadow;
2721 else {
2722 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2723 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2724 }
2725 }
2726
2727 if (MSV->MS.TrackOrigins) {
2728 assert(OpOrigin);
2729 if (!Origin) {
2730 Origin = OpOrigin;
2731 } else {
2732 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2733 // No point in adding something that might result in 0 origin value.
2734 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2735 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2736 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2737 }
2738 }
2739 }
2740 return *this;
2741 }
2742
2743 /// Add an application value to the mix.
2744 Combiner &Add(Value *V) {
2745 Value *OpShadow = MSV->getShadow(V);
2746 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2747 return Add(OpShadow, OpOrigin);
2748 }
2749
2750 /// Set the current combined values as the given instruction's shadow
2751 /// and origin.
2752 void Done(Instruction *I) {
2753 if (CombineShadow) {
2754 assert(Shadow);
2755 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2756 MSV->setShadow(I, Shadow);
2757 }
2758 if (MSV->MS.TrackOrigins) {
2759 assert(Origin);
2760 MSV->setOrigin(I, Origin);
2761 }
2762 }
2763
2764 /// Store the current combined value at the specified origin
2765 /// location.
2766 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2767 if (MSV->MS.TrackOrigins) {
2768 assert(Origin);
2769 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2770 }
2771 }
2772 };
2773
2774 using ShadowAndOriginCombiner = Combiner<true>;
2775 using OriginCombiner = Combiner<false>;
2776
2777 /// Propagate origin for arbitrary operation.
2778 void setOriginForNaryOp(Instruction &I) {
2779 if (!MS.TrackOrigins)
2780 return;
2781 IRBuilder<> IRB(&I);
2782 OriginCombiner OC(this, IRB);
2783 for (Use &Op : I.operands())
2784 OC.Add(Op.get());
2785 OC.Done(&I);
2786 }
2787
2788 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2789 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2790 "Vector of pointers is not a valid shadow type");
2791 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2793 : Ty->getPrimitiveSizeInBits();
2794 }
2795
2796 /// Cast between two shadow types, extending or truncating as
2797 /// necessary.
2798 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2799 bool Signed = false) {
2800 Type *srcTy = V->getType();
2801 if (srcTy == dstTy)
2802 return V;
2803 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2804 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2805 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2806 return IRB.CreateICmpNE(V, getCleanShadow(V));
2807
2808 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2809 return IRB.CreateIntCast(V, dstTy, Signed);
2810 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2811 cast<VectorType>(dstTy)->getElementCount() ==
2812 cast<VectorType>(srcTy)->getElementCount())
2813 return IRB.CreateIntCast(V, dstTy, Signed);
2814 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2815 Value *V2 =
2816 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2817 return IRB.CreateBitCast(V2, dstTy);
2818 // TODO: handle struct types.
2819 }
2820
2821 /// Cast an application value to the type of its own shadow.
2822 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2823 Type *ShadowTy = getShadowTy(V);
2824 if (V->getType() == ShadowTy)
2825 return V;
2826 if (V->getType()->isPtrOrPtrVectorTy())
2827 return IRB.CreatePtrToInt(V, ShadowTy);
2828 else
2829 return IRB.CreateBitCast(V, ShadowTy);
2830 }
2831
2832 /// Propagate shadow for arbitrary operation.
2833 void handleShadowOr(Instruction &I) {
2834 IRBuilder<> IRB(&I);
2835 ShadowAndOriginCombiner SC(this, IRB);
2836 for (Use &Op : I.operands())
2837 SC.Add(Op.get());
2838 SC.Done(&I);
2839 }
2840
2841 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2842 // of elements.
2843 //
2844 // For example, suppose we have:
2845 // VectorA: <a0, a1, a2, a3, a4, a5>
2846 // VectorB: <b0, b1, b2, b3, b4, b5>
2847 // ReductionFactor: 3
2848 // Shards: 1
2849 // The output would be:
2850 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2851 //
2852 // If we have:
2853 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2854 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2855 // ReductionFactor: 2
2856 // Shards: 2
2857 // then a and be each have 2 "shards", resulting in the output being
2858 // interleaved:
2859 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2860 //
2861 // This is convenient for instrumenting horizontal add/sub.
2862 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2863 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2864 unsigned Shards, Value *VectorA, Value *VectorB) {
2865 assert(isa<FixedVectorType>(VectorA->getType()));
2866 unsigned NumElems =
2867 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2868
2869 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2870 if (VectorB) {
2871 assert(VectorA->getType() == VectorB->getType());
2872 TotalNumElems *= 2;
2873 }
2874
2875 assert(NumElems % (ReductionFactor * Shards) == 0);
2876
2877 Value *Or = nullptr;
2878
2879 IRBuilder<> IRB(&I);
2880 for (unsigned i = 0; i < ReductionFactor; i++) {
2881 SmallVector<int, 16> Mask;
2882
2883 for (unsigned j = 0; j < Shards; j++) {
2884 unsigned Offset = NumElems / Shards * j;
2885
2886 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2887 Mask.push_back(Offset + X + i);
2888
2889 if (VectorB) {
2890 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2891 Mask.push_back(NumElems + Offset + X + i);
2892 }
2893 }
2894
2895 Value *Masked;
2896 if (VectorB)
2897 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2898 else
2899 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2900
2901 if (Or)
2902 Or = IRB.CreateOr(Or, Masked);
2903 else
2904 Or = Masked;
2905 }
2906
2907 return Or;
2908 }
2909
2910 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2911 /// fields.
2912 ///
2913 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2914 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2915 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2916 assert(I.arg_size() == 1 || I.arg_size() == 2);
2917
2918 assert(I.getType()->isVectorTy());
2919 assert(I.getArgOperand(0)->getType()->isVectorTy());
2920
2921 [[maybe_unused]] FixedVectorType *ParamType =
2922 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2923 assert((I.arg_size() != 2) ||
2924 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2925 [[maybe_unused]] FixedVectorType *ReturnType =
2926 cast<FixedVectorType>(I.getType());
2927 assert(ParamType->getNumElements() * I.arg_size() ==
2928 2 * ReturnType->getNumElements());
2929
2930 IRBuilder<> IRB(&I);
2931
2932 // Horizontal OR of shadow
2933 Value *FirstArgShadow = getShadow(&I, 0);
2934 Value *SecondArgShadow = nullptr;
2935 if (I.arg_size() == 2)
2936 SecondArgShadow = getShadow(&I, 1);
2937
2938 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2939 FirstArgShadow, SecondArgShadow);
2940
2941 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2942
2943 setShadow(&I, OrShadow);
2944 setOriginForNaryOp(I);
2945 }
2946
2947 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2948 /// fields, with the parameters reinterpreted to have elements of a specified
2949 /// width. For example:
2950 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2951 /// conceptually operates on
2952 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2953 /// and can be handled with ReinterpretElemWidth == 16.
2954 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
2955 int ReinterpretElemWidth) {
2956 assert(I.arg_size() == 1 || I.arg_size() == 2);
2957
2958 assert(I.getType()->isVectorTy());
2959 assert(I.getArgOperand(0)->getType()->isVectorTy());
2960
2961 FixedVectorType *ParamType =
2962 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2963 assert((I.arg_size() != 2) ||
2964 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2965
2966 [[maybe_unused]] FixedVectorType *ReturnType =
2967 cast<FixedVectorType>(I.getType());
2968 assert(ParamType->getNumElements() * I.arg_size() ==
2969 2 * ReturnType->getNumElements());
2970
2971 IRBuilder<> IRB(&I);
2972
2973 FixedVectorType *ReinterpretShadowTy = nullptr;
2974 assert(isAligned(Align(ReinterpretElemWidth),
2975 ParamType->getPrimitiveSizeInBits()));
2976 ReinterpretShadowTy = FixedVectorType::get(
2977 IRB.getIntNTy(ReinterpretElemWidth),
2978 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2979
2980 // Horizontal OR of shadow
2981 Value *FirstArgShadow = getShadow(&I, 0);
2982 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2983
2984 // If we had two parameters each with an odd number of elements, the total
2985 // number of elements is even, but we have never seen this in extant
2986 // instruction sets, so we enforce that each parameter must have an even
2987 // number of elements.
2989 Align(2),
2990 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2991
2992 Value *SecondArgShadow = nullptr;
2993 if (I.arg_size() == 2) {
2994 SecondArgShadow = getShadow(&I, 1);
2995 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2996 }
2997
2998 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2999 FirstArgShadow, SecondArgShadow);
3000
3001 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
3002
3003 setShadow(&I, OrShadow);
3004 setOriginForNaryOp(I);
3005 }
3006
3007 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
3008
3009 // Handle multiplication by constant.
3010 //
3011 // Handle a special case of multiplication by constant that may have one or
3012 // more zeros in the lower bits. This makes corresponding number of lower bits
3013 // of the result zero as well. We model it by shifting the other operand
3014 // shadow left by the required number of bits. Effectively, we transform
3015 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
3016 // We use multiplication by 2**N instead of shift to cover the case of
3017 // multiplication by 0, which may occur in some elements of a vector operand.
3018 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
3019 Value *OtherArg) {
3020 Constant *ShadowMul;
3021 Type *Ty = ConstArg->getType();
3022 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
3023 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
3024 Type *EltTy = VTy->getElementType();
3026 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
3027 if (ConstantInt *Elt =
3029 const APInt &V = Elt->getValue();
3030 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
3031 Elements.push_back(ConstantInt::get(EltTy, V2));
3032 } else {
3033 Elements.push_back(ConstantInt::get(EltTy, 1));
3034 }
3035 }
3036 ShadowMul = ConstantVector::get(Elements);
3037 } else {
3038 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
3039 const APInt &V = Elt->getValue();
3040 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
3041 ShadowMul = ConstantInt::get(Ty, V2);
3042 } else {
3043 ShadowMul = ConstantInt::get(Ty, 1);
3044 }
3045 }
3046
3047 IRBuilder<> IRB(&I);
3048 setShadow(&I,
3049 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
3050 setOrigin(&I, getOrigin(OtherArg));
3051 }
3052
3053 void visitMul(BinaryOperator &I) {
3054 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
3055 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
3056 if (constOp0 && !constOp1)
3057 handleMulByConstant(I, constOp0, I.getOperand(1));
3058 else if (constOp1 && !constOp0)
3059 handleMulByConstant(I, constOp1, I.getOperand(0));
3060 else
3061 handleShadowOr(I);
3062 }
3063
3064 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
3065 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
3066 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
3067 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
3068 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
3069 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
3070
3071 void handleIntegerDiv(Instruction &I) {
3072 IRBuilder<> IRB(&I);
3073 // Strict on the second argument.
3074 insertCheckShadowOf(I.getOperand(1), &I);
3075 setShadow(&I, getShadow(&I, 0));
3076 setOrigin(&I, getOrigin(&I, 0));
3077 }
3078
3079 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
3080 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
3081 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
3082 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
3083
3084 // Floating point division is side-effect free. We can not require that the
3085 // divisor is fully initialized and must propagate shadow. See PR37523.
3086 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
3087 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
3088
3089 /// Instrument == and != comparisons.
3090 ///
3091 /// Sometimes the comparison result is known even if some of the bits of the
3092 /// arguments are not.
3093 void handleEqualityComparison(ICmpInst &I) {
3094 IRBuilder<> IRB(&I);
3095 Value *A = I.getOperand(0);
3096 Value *B = I.getOperand(1);
3097 Value *Sa = getShadow(A);
3098 Value *Sb = getShadow(B);
3099
3100 Value *Si = propagateEqualityComparison(IRB, A, B, Sa, Sb);
3101
3102 setShadow(&I, Si);
3103 setOriginForNaryOp(I);
3104 }
3105
3106 /// Instrument relational comparisons.
3107 ///
3108 /// This function does exact shadow propagation for all relational
3109 /// comparisons of integers, pointers and vectors of those.
3110 /// FIXME: output seems suboptimal when one of the operands is a constant
3111 void handleRelationalComparisonExact(ICmpInst &I) {
3112 IRBuilder<> IRB(&I);
3113 Value *A = I.getOperand(0);
3114 Value *B = I.getOperand(1);
3115 Value *Sa = getShadow(A);
3116 Value *Sb = getShadow(B);
3117
3118 // Get rid of pointers and vectors of pointers.
3119 // For ints (and vectors of ints), types of A and Sa match,
3120 // and this is a no-op.
3121 A = IRB.CreatePointerCast(A, Sa->getType());
3122 B = IRB.CreatePointerCast(B, Sb->getType());
3123
3124 // Let [a0, a1] be the interval of possible values of A, taking into account
3125 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3126 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
3127 bool IsSigned = I.isSigned();
3128
3129 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3130 if (IsSigned) {
3131 // Sign-flip to map from signed range to unsigned range. Relation A vs B
3132 // should be preserved, if checked with `getUnsignedPredicate()`.
3133 // Relationship between Amin, Amax, Bmin, Bmax also will not be
3134 // affected, as they are created by effectively adding/substructing from
3135 // A (or B) a value, derived from shadow, with no overflow, either
3136 // before or after sign flip.
3137 APInt MinVal =
3138 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3139 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3140 }
3141 // Minimize undefined bits.
3142 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3143 Value *Max = IRB.CreateOr(V, S);
3144 return std::make_pair(Min, Max);
3145 };
3146
3147 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3148 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3149 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3150 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3151
3152 Value *Si = IRB.CreateXor(S1, S2);
3153 setShadow(&I, Si);
3154 setOriginForNaryOp(I);
3155 }
3156
3157 /// Instrument signed relational comparisons.
3158 ///
3159 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3160 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3161 void handleSignedRelationalComparison(ICmpInst &I) {
3162 Constant *constOp;
3163 Value *op = nullptr;
3165 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3166 op = I.getOperand(0);
3167 pre = I.getPredicate();
3168 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3169 op = I.getOperand(1);
3170 pre = I.getSwappedPredicate();
3171 } else {
3172 handleShadowOr(I);
3173 return;
3174 }
3175
3176 if ((constOp->isNullValue() &&
3177 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3178 (constOp->isAllOnesValue() &&
3179 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3180 IRBuilder<> IRB(&I);
3181 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3182 "_msprop_icmp_s");
3183 setShadow(&I, Shadow);
3184 setOrigin(&I, getOrigin(op));
3185 } else {
3186 handleShadowOr(I);
3187 }
3188 }
3189
3190 void visitICmpInst(ICmpInst &I) {
3191 if (!ClHandleICmp) {
3192 handleShadowOr(I);
3193 return;
3194 }
3195 if (I.isEquality()) {
3196 handleEqualityComparison(I);
3197 return;
3198 }
3199
3200 assert(I.isRelational());
3201 if (ClHandleICmpExact) {
3202 handleRelationalComparisonExact(I);
3203 return;
3204 }
3205 if (I.isSigned()) {
3206 handleSignedRelationalComparison(I);
3207 return;
3208 }
3209
3210 assert(I.isUnsigned());
3211 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3212 handleRelationalComparisonExact(I);
3213 return;
3214 }
3215
3216 handleShadowOr(I);
3217 }
3218
3219 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3220
3221 void handleShift(BinaryOperator &I) {
3222 IRBuilder<> IRB(&I);
3223 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3224 // Otherwise perform the same shift on S1.
3225 Value *S1 = getShadow(&I, 0);
3226 Value *S2 = getShadow(&I, 1);
3227 Value *S2Conv =
3228 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3229 Value *V2 = I.getOperand(1);
3230 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3231 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3232 setOriginForNaryOp(I);
3233 }
3234
3235 void visitShl(BinaryOperator &I) { handleShift(I); }
3236 void visitAShr(BinaryOperator &I) { handleShift(I); }
3237 void visitLShr(BinaryOperator &I) { handleShift(I); }
3238
3239 void handleFunnelShift(IntrinsicInst &I) {
3240 IRBuilder<> IRB(&I);
3241 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3242 // Otherwise perform the same shift on S0 and S1.
3243 Value *S0 = getShadow(&I, 0);
3244 Value *S1 = getShadow(&I, 1);
3245 Value *S2 = getShadow(&I, 2);
3246 Value *S2Conv =
3247 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3248 Value *V2 = I.getOperand(2);
3249 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3250 {S0, S1, V2});
3251 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3252 setOriginForNaryOp(I);
3253 }
3254
3255 /// Instrument llvm.memmove
3256 ///
3257 /// At this point we don't know if llvm.memmove will be inlined or not.
3258 /// If we don't instrument it and it gets inlined,
3259 /// our interceptor will not kick in and we will lose the memmove.
3260 /// If we instrument the call here, but it does not get inlined,
3261 /// we will memmove the shadow twice: which is bad in case
3262 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3263 ///
3264 /// Similar situation exists for memcpy and memset.
3265 void visitMemMoveInst(MemMoveInst &I) {
3266 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3267 IRBuilder<> IRB(&I);
3268 IRB.CreateCall(MS.MemmoveFn,
3269 {I.getArgOperand(0), I.getArgOperand(1),
3270 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3272 }
3273
3274 /// Instrument memcpy
3275 ///
3276 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3277 /// unfortunate as it may slowdown small constant memcpys.
3278 /// FIXME: consider doing manual inline for small constant sizes and proper
3279 /// alignment.
3280 ///
3281 /// Note: This also handles memcpy.inline, which promises no calls to external
3282 /// functions as an optimization. However, with instrumentation enabled this
3283 /// is difficult to promise; additionally, we know that the MSan runtime
3284 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3285 /// instrumentation it's safe to turn memcpy.inline into a call to
3286 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3287 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3288 void visitMemCpyInst(MemCpyInst &I) {
3289 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3290 IRBuilder<> IRB(&I);
3291 IRB.CreateCall(MS.MemcpyFn,
3292 {I.getArgOperand(0), I.getArgOperand(1),
3293 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3295 }
3296
3297 // Same as memcpy.
3298 void visitMemSetInst(MemSetInst &I) {
3299 IRBuilder<> IRB(&I);
3300 IRB.CreateCall(
3301 MS.MemsetFn,
3302 {I.getArgOperand(0),
3303 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3304 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3306 }
3307
3308 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3309
3310 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3311
3312 /// Handle vector store-like intrinsics.
3313 ///
3314 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3315 /// has 1 pointer argument and 1 vector argument, returns void.
3316 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3317 assert(I.arg_size() == 2);
3318
3319 IRBuilder<> IRB(&I);
3320 Value *Addr = I.getArgOperand(0);
3321 Value *Shadow = getShadow(&I, 1);
3322 Value *ShadowPtr, *OriginPtr;
3323
3324 // We don't know the pointer alignment (could be unaligned SSE store!).
3325 // Have to assume to worst case.
3326 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3327 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3328 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3329
3331 insertCheckShadowOf(Addr, &I);
3332
3333 // FIXME: factor out common code from materializeStores
3334 if (MS.TrackOrigins)
3335 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3336 return true;
3337 }
3338
3339 /// Handle vector load-like intrinsics.
3340 ///
3341 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3342 /// has 1 pointer argument, returns a vector.
3343 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3344 assert(I.arg_size() == 1);
3345
3346 IRBuilder<> IRB(&I);
3347 Value *Addr = I.getArgOperand(0);
3348
3349 Type *ShadowTy = getShadowTy(&I);
3350 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3351 if (PropagateShadow) {
3352 // We don't know the pointer alignment (could be unaligned SSE load!).
3353 // Have to assume to worst case.
3354 const Align Alignment = Align(1);
3355 std::tie(ShadowPtr, OriginPtr) =
3356 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3357 setShadow(&I,
3358 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3359 } else {
3360 setShadow(&I, getCleanShadow(&I));
3361 }
3362
3364 insertCheckShadowOf(Addr, &I);
3365
3366 if (MS.TrackOrigins) {
3367 if (PropagateShadow)
3368 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3369 else
3370 setOrigin(&I, getCleanOrigin());
3371 }
3372 return true;
3373 }
3374
3375 /// Handle (SIMD arithmetic)-like intrinsics.
3376 ///
3377 /// Instrument intrinsics with any number of arguments of the same type [*],
3378 /// equal to the return type, plus a specified number of trailing flags of
3379 /// any type.
3380 ///
3381 /// [*] The type should be simple (no aggregates or pointers; vectors are
3382 /// fine).
3383 ///
3384 /// Caller guarantees that this intrinsic does not access memory.
3385 ///
3386 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched by
3387 /// by this handler. See horizontalReduce().
3388 ///
3389 /// TODO: permutation intrinsics are also often incorrectly matched.
3390 [[maybe_unused]] bool
3391 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3392 unsigned int trailingFlags) {
3393 Type *RetTy = I.getType();
3394 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3395 return false;
3396
3397 unsigned NumArgOperands = I.arg_size();
3398 assert(NumArgOperands >= trailingFlags);
3399 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3400 Type *Ty = I.getArgOperand(i)->getType();
3401 if (Ty != RetTy)
3402 return false;
3403 }
3404
3405 IRBuilder<> IRB(&I);
3406 ShadowAndOriginCombiner SC(this, IRB);
3407 for (unsigned i = 0; i < NumArgOperands; ++i)
3408 SC.Add(I.getArgOperand(i));
3409 SC.Done(&I);
3410
3411 return true;
3412 }
3413
3414 /// Returns whether it was able to heuristically instrument unknown
3415 /// intrinsics.
3416 ///
3417 /// The main purpose of this code is to do something reasonable with all
3418 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3419 /// We recognize several classes of intrinsics by their argument types and
3420 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3421 /// sure that we know what the intrinsic does.
3422 ///
3423 /// We special-case intrinsics where this approach fails. See llvm.bswap
3424 /// handling as an example of that.
3425 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3426 unsigned NumArgOperands = I.arg_size();
3427 if (NumArgOperands == 0)
3428 return false;
3429
3430 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3431 I.getArgOperand(1)->getType()->isVectorTy() &&
3432 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3433 // This looks like a vector store.
3434 return handleVectorStoreIntrinsic(I);
3435 }
3436
3437 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3438 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3439 // This looks like a vector load.
3440 return handleVectorLoadIntrinsic(I);
3441 }
3442
3443 if (I.doesNotAccessMemory())
3444 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3445 return true;
3446
3447 // FIXME: detect and handle SSE maskstore/maskload?
3448 // Some cases are now handled in handleAVXMasked{Load,Store}.
3449 return false;
3450 }
3451
3452 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3453 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3455 dumpInst(I);
3456
3457 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3458 << "\n");
3459 return true;
3460 } else
3461 return false;
3462 }
3463
3464 void handleInvariantGroup(IntrinsicInst &I) {
3465 setShadow(&I, getShadow(&I, 0));
3466 setOrigin(&I, getOrigin(&I, 0));
3467 }
3468
3469 void handleLifetimeStart(IntrinsicInst &I) {
3470 if (!PoisonStack)
3471 return;
3472 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3473 if (AI)
3474 LifetimeStartList.push_back(std::make_pair(&I, AI));
3475 }
3476
3477 void handleBswap(IntrinsicInst &I) {
3478 IRBuilder<> IRB(&I);
3479 Value *Op = I.getArgOperand(0);
3480 Type *OpType = Op->getType();
3481 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3482 getShadow(Op)));
3483 setOrigin(&I, getOrigin(Op));
3484 }
3485
3486 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3487 // and a 1. If the input is all zero, it is fully initialized iff
3488 // !is_zero_poison.
3489 //
3490 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3491 // concrete value 0/1, and ? is an uninitialized bit:
3492 // - 0001 0??? is fully initialized
3493 // - 000? ???? is fully uninitialized (*)
3494 // - ???? ???? is fully uninitialized
3495 // - 0000 0000 is fully uninitialized if is_zero_poison,
3496 // fully initialized otherwise
3497 //
3498 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3499 // only need to poison 4 bits.
3500 //
3501 // OutputShadow =
3502 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3503 // || (is_zero_poison && AllZeroSrc)
3504 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3505 IRBuilder<> IRB(&I);
3506 Value *Src = I.getArgOperand(0);
3507 Value *SrcShadow = getShadow(Src);
3508
3509 Value *False = IRB.getInt1(false);
3510 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3511 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3512 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3513 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3514
3515 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3516 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3517
3518 Value *NotAllZeroShadow =
3519 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3520 Value *OutputShadow =
3521 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3522
3523 // If zero poison is requested, mix in with the shadow
3524 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3525 if (!IsZeroPoison->isNullValue()) {
3526 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3527 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3528 }
3529
3530 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3531
3532 setShadow(&I, OutputShadow);
3533 setOriginForNaryOp(I);
3534 }
3535
3536 /// Handle Arm NEON vector convert intrinsics.
3537 ///
3538 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3539 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64 (double)
3540 ///
3541 /// For conversions to or from fixed-point, there is a trailing argument to
3542 /// indicate the fixed-point precision:
3543 /// - <4 x float> llvm.aarch64.neon.vcvtfxs2fp.v4f32.v4i32(<4 x i32>, i32)
3544 /// - <4 x i32> llvm.aarch64.neon.vcvtfp2fxu.v4i32.v4f32(<4 x float>, i32)
3545 ///
3546 /// For x86 SSE vector convert intrinsics, see
3547 /// handleSSEVectorConvertIntrinsic().
3548 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I, bool FixedPoint) {
3549 if (FixedPoint)
3550 assert(I.arg_size() == 2);
3551 else
3552 assert(I.arg_size() == 1);
3553
3554 IRBuilder<> IRB(&I);
3555 Value *S0 = getShadow(&I, 0);
3556
3557 if (FixedPoint) {
3558 Value *Precision = I.getOperand(1);
3559 insertCheckShadowOf(Precision, &I);
3560 }
3561
3562 /// For scalars:
3563 /// Since they are converting from floating-point to integer, the output is
3564 /// - fully uninitialized if *any* bit of the input is uninitialized
3565 /// - fully ininitialized if all bits of the input are ininitialized
3566 /// We apply the same principle on a per-field basis for vectors.
3567 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3568 getShadowTy(&I));
3569 setShadow(&I, OutShadow);
3570 setOriginForNaryOp(I);
3571 }
3572
3573 /// Some instructions have additional zero-elements in the return type
3574 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3575 ///
3576 /// This function will return a vector type with the same number of elements
3577 /// as the input, but same per-element width as the return value e.g.,
3578 /// <8 x i8>.
3579 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3580 assert(isa<FixedVectorType>(getShadowTy(&I)));
3581 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3582
3583 // TODO: generalize beyond 2x?
3584 if (ShadowType->getElementCount() ==
3585 cast<VectorType>(Src->getType())->getElementCount() * 2)
3586 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3587
3588 assert(ShadowType->getElementCount() ==
3589 cast<VectorType>(Src->getType())->getElementCount());
3590
3591 return ShadowType;
3592 }
3593
3594 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3595 /// to match the length of the shadow for the instruction.
3596 /// If scalar types of the vectors are different, it will use the type of the
3597 /// input vector.
3598 /// This is more type-safe than CreateShadowCast().
3599 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3600 IRBuilder<> IRB(&I);
3602 assert(isa<FixedVectorType>(I.getType()));
3603
3604 Value *FullShadow = getCleanShadow(&I);
3605 unsigned ShadowNumElems =
3606 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3607 unsigned FullShadowNumElems =
3608 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3609
3610 assert((ShadowNumElems == FullShadowNumElems) ||
3611 (ShadowNumElems * 2 == FullShadowNumElems));
3612
3613 if (ShadowNumElems == FullShadowNumElems) {
3614 FullShadow = Shadow;
3615 } else {
3616 // TODO: generalize beyond 2x?
3617 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3618 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3619
3620 // Append zeros
3621 FullShadow =
3622 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3623 }
3624
3625 return FullShadow;
3626 }
3627
3628 /// Handle x86 SSE vector conversion.
3629 ///
3630 /// e.g., single-precision to half-precision conversion:
3631 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3632 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3633 ///
3634 /// floating-point to integer:
3635 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3636 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3637 ///
3638 /// Note: if the output has more elements, they are zero-initialized (and
3639 /// therefore the shadow will also be initialized).
3640 ///
3641 /// This differs from handleSSEVectorConvertIntrinsic() because it
3642 /// propagates uninitialized shadow (instead of checking the shadow).
3643 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3644 bool HasRoundingMode) {
3645 if (HasRoundingMode) {
3646 assert(I.arg_size() == 2);
3647 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3648 assert(RoundingMode->getType()->isIntegerTy());
3649 } else {
3650 assert(I.arg_size() == 1);
3651 }
3652
3653 Value *Src = I.getArgOperand(0);
3654 assert(Src->getType()->isVectorTy());
3655
3656 // The return type might have more elements than the input.
3657 // Temporarily shrink the return type's number of elements.
3658 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3659
3660 IRBuilder<> IRB(&I);
3661 Value *S0 = getShadow(&I, 0);
3662
3663 /// For scalars:
3664 /// Since they are converting to and/or from floating-point, the output is:
3665 /// - fully uninitialized if *any* bit of the input is uninitialized
3666 /// - fully ininitialized if all bits of the input are ininitialized
3667 /// We apply the same principle on a per-field basis for vectors.
3668 Value *Shadow =
3669 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3670
3671 // The return type might have more elements than the input.
3672 // Extend the return type back to its original width if necessary.
3673 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3674
3675 setShadow(&I, FullShadow);
3676 setOriginForNaryOp(I);
3677 }
3678
3679 // Instrument x86 SSE vector convert intrinsic.
3680 //
3681 // This function instruments intrinsics like cvtsi2ss:
3682 // %Out = int_xxx_cvtyyy(%ConvertOp)
3683 // or
3684 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3685 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3686 // number \p Out elements, and (if has 2 arguments) copies the rest of the
3687 // elements from \p CopyOp.
3688 // In most cases conversion involves floating-point value which may trigger a
3689 // hardware exception when not fully initialized. For this reason we require
3690 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3691 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3692 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3693 // return a fully initialized value.
3694 //
3695 // For Arm NEON vector convert intrinsics, see
3696 // handleNEONVectorConvertIntrinsic().
3697 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3698 bool HasRoundingMode = false) {
3699 IRBuilder<> IRB(&I);
3700 Value *CopyOp, *ConvertOp;
3701
3702 assert((!HasRoundingMode ||
3703 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3704 "Invalid rounding mode");
3705
3706 switch (I.arg_size() - HasRoundingMode) {
3707 case 2:
3708 CopyOp = I.getArgOperand(0);
3709 ConvertOp = I.getArgOperand(1);
3710 break;
3711 case 1:
3712 ConvertOp = I.getArgOperand(0);
3713 CopyOp = nullptr;
3714 break;
3715 default:
3716 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3717 }
3718
3719 // The first *NumUsedElements* elements of ConvertOp are converted to the
3720 // same number of output elements. The rest of the output is copied from
3721 // CopyOp, or (if not available) filled with zeroes.
3722 // Combine shadow for elements of ConvertOp that are used in this operation,
3723 // and insert a check.
3724 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3725 // int->any conversion.
3726 Value *ConvertShadow = getShadow(ConvertOp);
3727 Value *AggShadow = nullptr;
3728 if (ConvertOp->getType()->isVectorTy()) {
3729 AggShadow = IRB.CreateExtractElement(
3730 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3731 for (int i = 1; i < NumUsedElements; ++i) {
3732 Value *MoreShadow = IRB.CreateExtractElement(
3733 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3734 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3735 }
3736 } else {
3737 AggShadow = ConvertShadow;
3738 }
3739 assert(AggShadow->getType()->isIntegerTy());
3740 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3741
3742 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3743 // ConvertOp.
3744 if (CopyOp) {
3745 assert(CopyOp->getType() == I.getType());
3746 assert(CopyOp->getType()->isVectorTy());
3747 Value *ResultShadow = getShadow(CopyOp);
3748 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3749 for (int i = 0; i < NumUsedElements; ++i) {
3750 ResultShadow = IRB.CreateInsertElement(
3751 ResultShadow, ConstantInt::getNullValue(EltTy),
3752 ConstantInt::get(IRB.getInt32Ty(), i));
3753 }
3754 setShadow(&I, ResultShadow);
3755 setOrigin(&I, getOrigin(CopyOp));
3756 } else {
3757 setShadow(&I, getCleanShadow(&I));
3758 setOrigin(&I, getCleanOrigin());
3759 }
3760 }
3761
3762 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3763 // zeroes if it is zero, and all ones otherwise.
3764 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3765 if (S->getType()->isVectorTy())
3766 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3767 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3768 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3769 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3770 }
3771
3772 // Given a vector, extract its first element, and return all
3773 // zeroes if it is zero, and all ones otherwise.
3774 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3775 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3776 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3777 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3778 }
3779
3780 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3781 Type *T = S->getType();
3782 assert(T->isVectorTy());
3783 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3784 return IRB.CreateSExt(S2, T);
3785 }
3786
3787 // Instrument vector shift intrinsic.
3788 //
3789 // This function instruments intrinsics like int_x86_avx2_psll_w.
3790 // Intrinsic shifts %In by %ShiftSize bits.
3791 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3792 // size, and the rest is ignored. Behavior is defined even if shift size is
3793 // greater than register (or field) width.
3794 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3795 assert(I.arg_size() == 2);
3796 IRBuilder<> IRB(&I);
3797 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3798 // Otherwise perform the same shift on S1.
3799 Value *S1 = getShadow(&I, 0);
3800 Value *S2 = getShadow(&I, 1);
3801 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3802 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3803 Value *V1 = I.getOperand(0);
3804 Value *V2 = I.getOperand(1);
3805 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3806 {IRB.CreateBitCast(S1, V1->getType()), V2});
3807 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3808 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3809 setOriginForNaryOp(I);
3810 }
3811
3812 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3813 // vectors.
3814 Type *getMMXVectorTy(unsigned EltSizeInBits,
3815 unsigned X86_MMXSizeInBits = 64) {
3816 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3817 "Illegal MMX vector element size");
3818 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3819 X86_MMXSizeInBits / EltSizeInBits);
3820 }
3821
3822 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3823 // intrinsic.
3824 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3825 switch (id) {
3826 case Intrinsic::x86_sse2_packsswb_128:
3827 case Intrinsic::x86_sse2_packuswb_128:
3828 return Intrinsic::x86_sse2_packsswb_128;
3829
3830 case Intrinsic::x86_sse2_packssdw_128:
3831 case Intrinsic::x86_sse41_packusdw:
3832 return Intrinsic::x86_sse2_packssdw_128;
3833
3834 case Intrinsic::x86_avx2_packsswb:
3835 case Intrinsic::x86_avx2_packuswb:
3836 return Intrinsic::x86_avx2_packsswb;
3837
3838 case Intrinsic::x86_avx2_packssdw:
3839 case Intrinsic::x86_avx2_packusdw:
3840 return Intrinsic::x86_avx2_packssdw;
3841
3842 case Intrinsic::x86_mmx_packsswb:
3843 case Intrinsic::x86_mmx_packuswb:
3844 return Intrinsic::x86_mmx_packsswb;
3845
3846 case Intrinsic::x86_mmx_packssdw:
3847 return Intrinsic::x86_mmx_packssdw;
3848
3849 case Intrinsic::x86_avx512_packssdw_512:
3850 case Intrinsic::x86_avx512_packusdw_512:
3851 return Intrinsic::x86_avx512_packssdw_512;
3852
3853 case Intrinsic::x86_avx512_packsswb_512:
3854 case Intrinsic::x86_avx512_packuswb_512:
3855 return Intrinsic::x86_avx512_packsswb_512;
3856
3857 default:
3858 llvm_unreachable("unexpected intrinsic id");
3859 }
3860 }
3861
3862 // Instrument vector pack intrinsic.
3863 //
3864 // This function instruments intrinsics like x86_mmx_packsswb, that
3865 // packs elements of 2 input vectors into half as many bits with saturation.
3866 // Shadow is propagated with the signed variant of the same intrinsic applied
3867 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3868 // MMXEltSizeInBits is used only for x86mmx arguments.
3869 //
3870 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
3871 void handleVectorPackIntrinsic(IntrinsicInst &I,
3872 unsigned MMXEltSizeInBits = 0) {
3873 assert(I.arg_size() == 2);
3874 IRBuilder<> IRB(&I);
3875 Value *S1 = getShadow(&I, 0);
3876 Value *S2 = getShadow(&I, 1);
3877 assert(S1->getType()->isVectorTy());
3878
3879 // SExt and ICmpNE below must apply to individual elements of input vectors.
3880 // In case of x86mmx arguments, cast them to appropriate vector types and
3881 // back.
3882 Type *T =
3883 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3884 if (MMXEltSizeInBits) {
3885 S1 = IRB.CreateBitCast(S1, T);
3886 S2 = IRB.CreateBitCast(S2, T);
3887 }
3888 Value *S1_ext =
3890 Value *S2_ext =
3892 if (MMXEltSizeInBits) {
3893 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3894 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3895 }
3896
3897 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3898 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3899 "_msprop_vector_pack");
3900 if (MMXEltSizeInBits)
3901 S = IRB.CreateBitCast(S, getShadowTy(&I));
3902 setShadow(&I, S);
3903 setOriginForNaryOp(I);
3904 }
3905
3906 // Convert `Mask` into `<n x i1>`.
3907 Constant *createDppMask(unsigned Width, unsigned Mask) {
3908 SmallVector<Constant *, 4> R(Width);
3909 for (auto &M : R) {
3910 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3911 Mask >>= 1;
3912 }
3913 return ConstantVector::get(R);
3914 }
3915
3916 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3917 // arg is poisoned, entire dot product is poisoned.
3918 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3919 unsigned DstMask) {
3920 const unsigned Width =
3921 cast<FixedVectorType>(S->getType())->getNumElements();
3922
3923 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3925 Value *SElem = IRB.CreateOrReduce(S);
3926 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3927 Value *DstMaskV = createDppMask(Width, DstMask);
3928
3929 return IRB.CreateSelect(
3930 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3931 }
3932
3933 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3934 //
3935 // 2 and 4 element versions produce single scalar of dot product, and then
3936 // puts it into elements of output vector, selected by 4 lowest bits of the
3937 // mask. Top 4 bits of the mask control which elements of input to use for dot
3938 // product.
3939 //
3940 // 8 element version mask still has only 4 bit for input, and 4 bit for output
3941 // mask. According to the spec it just operates as 4 element version on first
3942 // 4 elements of inputs and output, and then on last 4 elements of inputs and
3943 // output.
3944 void handleDppIntrinsic(IntrinsicInst &I) {
3945 IRBuilder<> IRB(&I);
3946
3947 Value *S0 = getShadow(&I, 0);
3948 Value *S1 = getShadow(&I, 1);
3949 Value *S = IRB.CreateOr(S0, S1);
3950
3951 const unsigned Width =
3952 cast<FixedVectorType>(S->getType())->getNumElements();
3953 assert(Width == 2 || Width == 4 || Width == 8);
3954
3955 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3956 const unsigned SrcMask = Mask >> 4;
3957 const unsigned DstMask = Mask & 0xf;
3958
3959 // Calculate shadow as `<n x i1>`.
3960 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3961 if (Width == 8) {
3962 // First 4 elements of shadow are already calculated. `makeDppShadow`
3963 // operats on 32 bit masks, so we can just shift masks, and repeat.
3964 SI1 = IRB.CreateOr(
3965 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3966 }
3967 // Extend to real size of shadow, poisoning either all or none bits of an
3968 // element.
3969 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3970
3971 setShadow(&I, S);
3972 setOriginForNaryOp(I);
3973 }
3974
3975 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3976 C = CreateAppToShadowCast(IRB, C);
3977 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3978 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3979 C = IRB.CreateAShr(C, ElSize - 1);
3980 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3981 return IRB.CreateTrunc(C, FVT);
3982 }
3983
3984 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3985 void handleBlendvIntrinsic(IntrinsicInst &I) {
3986 Value *C = I.getOperand(2);
3987 Value *T = I.getOperand(1);
3988 Value *F = I.getOperand(0);
3989
3990 Value *Sc = getShadow(&I, 2);
3991 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3992
3993 {
3994 IRBuilder<> IRB(&I);
3995 // Extract top bit from condition and its shadow.
3996 C = convertBlendvToSelectMask(IRB, C);
3997 Sc = convertBlendvToSelectMask(IRB, Sc);
3998
3999 setShadow(C, Sc);
4000 setOrigin(C, Oc);
4001 }
4002
4003 handleSelectLikeInst(I, C, T, F);
4004 }
4005
4006 // Instrument sum-of-absolute-differences intrinsic.
4007 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
4008 const unsigned SignificantBitsPerResultElement = 16;
4009 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
4010 unsigned ZeroBitsPerResultElement =
4011 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
4012
4013 IRBuilder<> IRB(&I);
4014 auto *Shadow0 = getShadow(&I, 0);
4015 auto *Shadow1 = getShadow(&I, 1);
4016 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4017 S = IRB.CreateBitCast(S, ResTy);
4018 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
4019 ResTy);
4020 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
4021 S = IRB.CreateBitCast(S, getShadowTy(&I));
4022 setShadow(&I, S);
4023 setOriginForNaryOp(I);
4024 }
4025
4026 // Instrument dot-product / multiply-add(-accumulate)? intrinsics.
4027 //
4028 // e.g., Two operands:
4029 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
4030 //
4031 // Two operands which require an EltSizeInBits override:
4032 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
4033 //
4034 // Three operands:
4035 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
4036 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
4037 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
4038 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
4039 // (these are equivalent to multiply-add on %a and %b, followed by
4040 // adding/"accumulating" %s. "Accumulation" stores the result in one
4041 // of the source registers, but this accumulate vs. add distinction
4042 // is lost when dealing with LLVM intrinsics.)
4043 //
4044 // ZeroPurifies means that multiplying a known-zero with an uninitialized
4045 // value results in an initialized value. This is applicable for integer
4046 // multiplication, but not floating-point (counter-example: NaN).
4047 void handleVectorDotProductIntrinsic(IntrinsicInst &I,
4048 unsigned ReductionFactor,
4049 bool ZeroPurifies,
4050 unsigned EltSizeInBits,
4051 enum OddOrEvenLanes Lanes) {
4052 IRBuilder<> IRB(&I);
4053
4054 [[maybe_unused]] FixedVectorType *ReturnType =
4055 cast<FixedVectorType>(I.getType());
4056 assert(isa<FixedVectorType>(ReturnType));
4057
4058 // Vectors A and B, and shadows
4059 Value *Va = nullptr;
4060 Value *Vb = nullptr;
4061 Value *Sa = nullptr;
4062 Value *Sb = nullptr;
4063
4064 assert(I.arg_size() == 2 || I.arg_size() == 3);
4065 if (I.arg_size() == 2) {
4066 assert(Lanes == kBothLanes);
4067
4068 Va = I.getOperand(0);
4069 Vb = I.getOperand(1);
4070
4071 Sa = getShadow(&I, 0);
4072 Sb = getShadow(&I, 1);
4073 } else if (I.arg_size() == 3) {
4074 // Operand 0 is the accumulator. We will deal with that below.
4075 Va = I.getOperand(1);
4076 Vb = I.getOperand(2);
4077
4078 Sa = getShadow(&I, 1);
4079 Sb = getShadow(&I, 2);
4080
4081 if (Lanes == kEvenLanes || Lanes == kOddLanes) {
4082 // Convert < S0, S1, S2, S3, S4, S5, S6, S7 >
4083 // to < S0, S0, S2, S2, S4, S4, S6, S6 > (if even)
4084 // to < S1, S1, S3, S3, S5, S5, S7, S7 > (if odd)
4085 //
4086 // Note: for aarch64.neon.bfmlalb/t, the odd/even-indexed values are
4087 // zeroed, not duplicated. However, for shadow propagation, this
4088 // distinction is unimportant because Step 1 below will squeeze
4089 // each pair of elements (e.g., [S0, S0]) into a single bit, and
4090 // we only care if it is fully initialized.
4091
4092 FixedVectorType *InputShadowType = cast<FixedVectorType>(Sa->getType());
4093 unsigned Width = InputShadowType->getNumElements();
4094
4095 Sa = IRB.CreateShuffleVector(
4096 Sa, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4097 Sb = IRB.CreateShuffleVector(
4098 Sb, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4099 }
4100 }
4101
4102 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
4103 assert(ParamType == Vb->getType());
4104
4105 assert(ParamType->getPrimitiveSizeInBits() ==
4106 ReturnType->getPrimitiveSizeInBits());
4107
4108 if (I.arg_size() == 3) {
4109 [[maybe_unused]] auto *AccumulatorType =
4110 cast<FixedVectorType>(I.getOperand(0)->getType());
4111 assert(AccumulatorType == ReturnType);
4112 }
4113
4114 FixedVectorType *ImplicitReturnType =
4115 cast<FixedVectorType>(getShadowTy(ReturnType));
4116 // Step 1: instrument multiplication of corresponding vector elements
4117 if (EltSizeInBits) {
4118 ImplicitReturnType = cast<FixedVectorType>(
4119 getMMXVectorTy(EltSizeInBits * ReductionFactor,
4120 ParamType->getPrimitiveSizeInBits()));
4121 ParamType = cast<FixedVectorType>(
4122 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
4123
4124 Va = IRB.CreateBitCast(Va, ParamType);
4125 Vb = IRB.CreateBitCast(Vb, ParamType);
4126
4127 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
4128 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
4129 } else {
4130 assert(ParamType->getNumElements() ==
4131 ReturnType->getNumElements() * ReductionFactor);
4132 }
4133
4134 // Each element of the vector is represented by a single bit (poisoned or
4135 // not) e.g., <8 x i1>.
4136 Value *SaNonZero = IRB.CreateIsNotNull(Sa);
4137 Value *SbNonZero = IRB.CreateIsNotNull(Sb);
4138 Value *And;
4139 if (ZeroPurifies) {
4140 // Multiplying an *initialized* zero by an uninitialized element results
4141 // in an initialized zero element.
4142 //
4143 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4144 // results in an unpoisoned value.
4145 Value *VaInt = Va;
4146 Value *VbInt = Vb;
4147 if (!Va->getType()->isIntegerTy()) {
4148 VaInt = CreateAppToShadowCast(IRB, Va);
4149 VbInt = CreateAppToShadowCast(IRB, Vb);
4150 }
4151
4152 // We check for non-zero on a per-element basis, not per-bit.
4153 Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
4154 Value *VbNonZero = IRB.CreateIsNotNull(VbInt);
4155
4156 And = handleBitwiseAnd(IRB, VaNonZero, VbNonZero, SaNonZero, SbNonZero);
4157 } else {
4158 And = IRB.CreateOr({SaNonZero, SbNonZero});
4159 }
4160
4161 // Extend <8 x i1> to <8 x i16>.
4162 // (The real pmadd intrinsic would have computed intermediate values of
4163 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4164 // consider each element to be either fully initialized or fully
4165 // uninitialized.)
4166 And = IRB.CreateSExt(And, Sa->getType());
4167
4168 // Step 2: instrument horizontal add
4169 // We don't need bit-precise horizontalReduce because we only want to check
4170 // if each pair/quad of elements is fully zero.
4171 // Cast to <4 x i32>.
4172 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4173
4174 // Compute <4 x i1>, then extend back to <4 x i32>.
4175 Value *OutShadow = IRB.CreateSExt(
4176 IRB.CreateICmpNE(Horizontal,
4177 Constant::getNullValue(Horizontal->getType())),
4178 ImplicitReturnType);
4179
4180 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4181 // AVX, it is already correct).
4182 if (EltSizeInBits)
4183 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4184
4185 // Step 3 (if applicable): instrument accumulator
4186 if (I.arg_size() == 3)
4187 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4188
4189 setShadow(&I, OutShadow);
4190 setOriginForNaryOp(I);
4191 }
4192
4193 // Instrument compare-packed intrinsic.
4194 //
4195 // x86 has the predicate as the third operand, which is ImmArg e.g.,
4196 // - <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double>, <4 x double>, i8)
4197 // - <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double>, <2 x double>, i8)
4198 //
4199 // while Arm has separate intrinsics for >= and > e.g.,
4200 // - <2 x i32> @llvm.aarch64.neon.facge.v2i32.v2f32
4201 // (<2 x float> %A, <2 x float>)
4202 // - <2 x i32> @llvm.aarch64.neon.facgt.v2i32.v2f32
4203 // (<2 x float> %A, <2 x float>)
4204 //
4205 // Bonus: this also handles scalar cases e.g.,
4206 // - i32 @llvm.aarch64.neon.facgt.i32.f32(float %A, float %B)
4207 void handleVectorComparePackedIntrinsic(IntrinsicInst &I,
4208 bool PredicateAsOperand) {
4209 if (PredicateAsOperand) {
4210 assert(I.arg_size() == 3);
4211 assert(I.paramHasAttr(2, Attribute::ImmArg));
4212 } else
4213 assert(I.arg_size() == 2);
4214
4215 IRBuilder<> IRB(&I);
4216
4217 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4218 // all-ones shadow.
4219 Type *ResTy = getShadowTy(&I);
4220 auto *Shadow0 = getShadow(&I, 0);
4221 auto *Shadow1 = getShadow(&I, 1);
4222 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4223 Value *S = IRB.CreateSExt(
4224 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4225 setShadow(&I, S);
4226 setOriginForNaryOp(I);
4227 }
4228
4229 // Instrument compare-scalar intrinsic.
4230 // This handles both cmp* intrinsics which return the result in the first
4231 // element of a vector, and comi* which return the result as i32.
4232 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4233 IRBuilder<> IRB(&I);
4234 auto *Shadow0 = getShadow(&I, 0);
4235 auto *Shadow1 = getShadow(&I, 1);
4236 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4237 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4238 setShadow(&I, S);
4239 setOriginForNaryOp(I);
4240 }
4241
4242 // Instrument generic vector reduction intrinsics
4243 // by ORing together all their fields.
4244 //
4245 // If AllowShadowCast is true, the return type does not need to be the same
4246 // type as the fields
4247 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4248 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4249 assert(I.arg_size() == 1);
4250
4251 IRBuilder<> IRB(&I);
4252 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4253 if (AllowShadowCast)
4254 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4255 else
4256 assert(S->getType() == getShadowTy(&I));
4257 setShadow(&I, S);
4258 setOriginForNaryOp(I);
4259 }
4260
4261 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4262 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4263 // %a1)
4264 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4265 //
4266 // The type of the return value, initial starting value, and elements of the
4267 // vector must be identical.
4268 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4269 assert(I.arg_size() == 2);
4270
4271 IRBuilder<> IRB(&I);
4272 Value *Shadow0 = getShadow(&I, 0);
4273 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4274 assert(Shadow0->getType() == Shadow1->getType());
4275 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4276 assert(S->getType() == getShadowTy(&I));
4277 setShadow(&I, S);
4278 setOriginForNaryOp(I);
4279 }
4280
4281 // Instrument vector.reduce.or intrinsic.
4282 // Valid (non-poisoned) set bits in the operand pull low the
4283 // corresponding shadow bits.
4284 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4285 assert(I.arg_size() == 1);
4286
4287 IRBuilder<> IRB(&I);
4288 Value *OperandShadow = getShadow(&I, 0);
4289 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4290 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4291 // Bit N is clean if any field's bit N is 1 and unpoison
4292 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4293 // Otherwise, it is clean if every field's bit N is unpoison
4294 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4295 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4296
4297 setShadow(&I, S);
4298 setOrigin(&I, getOrigin(&I, 0));
4299 }
4300
4301 // Instrument vector.reduce.and intrinsic.
4302 // Valid (non-poisoned) unset bits in the operand pull down the
4303 // corresponding shadow bits.
4304 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4305 assert(I.arg_size() == 1);
4306
4307 IRBuilder<> IRB(&I);
4308 Value *OperandShadow = getShadow(&I, 0);
4309 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4310 // Bit N is clean if any field's bit N is 0 and unpoison
4311 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4312 // Otherwise, it is clean if every field's bit N is unpoison
4313 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4314 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4315
4316 setShadow(&I, S);
4317 setOrigin(&I, getOrigin(&I, 0));
4318 }
4319
4320 void handleStmxcsr(IntrinsicInst &I) {
4321 IRBuilder<> IRB(&I);
4322 Value *Addr = I.getArgOperand(0);
4323 Type *Ty = IRB.getInt32Ty();
4324 Value *ShadowPtr =
4325 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4326
4327 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4328
4330 insertCheckShadowOf(Addr, &I);
4331 }
4332
4333 void handleLdmxcsr(IntrinsicInst &I) {
4334 if (!InsertChecks)
4335 return;
4336
4337 IRBuilder<> IRB(&I);
4338 Value *Addr = I.getArgOperand(0);
4339 Type *Ty = IRB.getInt32Ty();
4340 const Align Alignment = Align(1);
4341 Value *ShadowPtr, *OriginPtr;
4342 std::tie(ShadowPtr, OriginPtr) =
4343 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4344
4346 insertCheckShadowOf(Addr, &I);
4347
4348 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4349 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4350 : getCleanOrigin();
4351 insertCheckShadow(Shadow, Origin, &I);
4352 }
4353
4354 void handleMaskedExpandLoad(IntrinsicInst &I) {
4355 IRBuilder<> IRB(&I);
4356 Value *Ptr = I.getArgOperand(0);
4357 MaybeAlign Align = I.getParamAlign(0);
4358 Value *Mask = I.getArgOperand(1);
4359 Value *PassThru = I.getArgOperand(2);
4360
4362 insertCheckShadowOf(Ptr, &I);
4363 insertCheckShadowOf(Mask, &I);
4364 }
4365
4366 if (!PropagateShadow) {
4367 setShadow(&I, getCleanShadow(&I));
4368 setOrigin(&I, getCleanOrigin());
4369 return;
4370 }
4371
4372 Type *ShadowTy = getShadowTy(&I);
4373 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4374 auto [ShadowPtr, OriginPtr] =
4375 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4376
4377 Value *Shadow =
4378 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4379 getShadow(PassThru), "_msmaskedexpload");
4380
4381 setShadow(&I, Shadow);
4382
4383 // TODO: Store origins.
4384 setOrigin(&I, getCleanOrigin());
4385 }
4386
4387 void handleMaskedCompressStore(IntrinsicInst &I) {
4388 IRBuilder<> IRB(&I);
4389 Value *Values = I.getArgOperand(0);
4390 Value *Ptr = I.getArgOperand(1);
4391 MaybeAlign Align = I.getParamAlign(1);
4392 Value *Mask = I.getArgOperand(2);
4393
4395 insertCheckShadowOf(Ptr, &I);
4396 insertCheckShadowOf(Mask, &I);
4397 }
4398
4399 Value *Shadow = getShadow(Values);
4400 Type *ElementShadowTy =
4401 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4402 auto [ShadowPtr, OriginPtrs] =
4403 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4404
4405 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4406
4407 // TODO: Store origins.
4408 }
4409
4410 void handleMaskedGather(IntrinsicInst &I) {
4411 IRBuilder<> IRB(&I);
4412 Value *Ptrs = I.getArgOperand(0);
4413 const Align Alignment = I.getParamAlign(0).valueOrOne();
4414 Value *Mask = I.getArgOperand(1);
4415 Value *PassThru = I.getArgOperand(2);
4416
4417 Type *PtrsShadowTy = getShadowTy(Ptrs);
4419 insertCheckShadowOf(Mask, &I);
4420 Value *MaskedPtrShadow = IRB.CreateSelect(
4421 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4422 "_msmaskedptrs");
4423 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4424 }
4425
4426 if (!PropagateShadow) {
4427 setShadow(&I, getCleanShadow(&I));
4428 setOrigin(&I, getCleanOrigin());
4429 return;
4430 }
4431
4432 Type *ShadowTy = getShadowTy(&I);
4433 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4434 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4435 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4436
4437 Value *Shadow =
4438 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4439 getShadow(PassThru), "_msmaskedgather");
4440
4441 setShadow(&I, Shadow);
4442
4443 // TODO: Store origins.
4444 setOrigin(&I, getCleanOrigin());
4445 }
4446
4447 void handleMaskedScatter(IntrinsicInst &I) {
4448 IRBuilder<> IRB(&I);
4449 Value *Values = I.getArgOperand(0);
4450 Value *Ptrs = I.getArgOperand(1);
4451 const Align Alignment = I.getParamAlign(1).valueOrOne();
4452 Value *Mask = I.getArgOperand(2);
4453
4454 Type *PtrsShadowTy = getShadowTy(Ptrs);
4456 insertCheckShadowOf(Mask, &I);
4457 Value *MaskedPtrShadow = IRB.CreateSelect(
4458 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4459 "_msmaskedptrs");
4460 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4461 }
4462
4463 Value *Shadow = getShadow(Values);
4464 Type *ElementShadowTy =
4465 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4466 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4467 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4468
4469 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4470
4471 // TODO: Store origin.
4472 }
4473
4474 // Intrinsic::masked_store
4475 //
4476 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4477 // stores are lowered to Intrinsic::masked_store.
4478 void handleMaskedStore(IntrinsicInst &I) {
4479 IRBuilder<> IRB(&I);
4480 Value *V = I.getArgOperand(0);
4481 Value *Ptr = I.getArgOperand(1);
4482 const Align Alignment = I.getParamAlign(1).valueOrOne();
4483 Value *Mask = I.getArgOperand(2);
4484 Value *Shadow = getShadow(V);
4485
4487 insertCheckShadowOf(Ptr, &I);
4488 insertCheckShadowOf(Mask, &I);
4489 }
4490
4491 Value *ShadowPtr;
4492 Value *OriginPtr;
4493 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4494 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4495
4496 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4497
4498 if (!MS.TrackOrigins)
4499 return;
4500
4501 auto &DL = F.getDataLayout();
4502 paintOrigin(IRB, getOrigin(V), OriginPtr,
4503 DL.getTypeStoreSize(Shadow->getType()),
4504 std::max(Alignment, kMinOriginAlignment));
4505 }
4506
4507 // Intrinsic::masked_load
4508 //
4509 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4510 // loads are lowered to Intrinsic::masked_load.
4511 void handleMaskedLoad(IntrinsicInst &I) {
4512 IRBuilder<> IRB(&I);
4513 Value *Ptr = I.getArgOperand(0);
4514 const Align Alignment = I.getParamAlign(0).valueOrOne();
4515 Value *Mask = I.getArgOperand(1);
4516 Value *PassThru = I.getArgOperand(2);
4517
4519 insertCheckShadowOf(Ptr, &I);
4520 insertCheckShadowOf(Mask, &I);
4521 }
4522
4523 if (!PropagateShadow) {
4524 setShadow(&I, getCleanShadow(&I));
4525 setOrigin(&I, getCleanOrigin());
4526 return;
4527 }
4528
4529 Type *ShadowTy = getShadowTy(&I);
4530 Value *ShadowPtr, *OriginPtr;
4531 std::tie(ShadowPtr, OriginPtr) =
4532 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4533 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4534 getShadow(PassThru), "_msmaskedld"));
4535
4536 if (!MS.TrackOrigins)
4537 return;
4538
4539 // Choose between PassThru's and the loaded value's origins.
4540 Value *MaskedPassThruShadow = IRB.CreateAnd(
4541 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4542
4543 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4544
4545 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4546 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4547
4548 setOrigin(&I, Origin);
4549 }
4550
4551 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4552 // dst mask src
4553 //
4554 // AVX512 masked stores are lowered to Intrinsic::masked_load and are handled
4555 // by handleMaskedStore.
4556 //
4557 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4558 // vector of integers, unlike the LLVM masked intrinsics, which require a
4559 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4560 // mentions that the x86 backend does not know how to efficiently convert
4561 // from a vector of booleans back into the AVX mask format; therefore, they
4562 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4563 // intrinsics.
4564 void handleAVXMaskedStore(IntrinsicInst &I) {
4565 assert(I.arg_size() == 3);
4566
4567 IRBuilder<> IRB(&I);
4568
4569 Value *Dst = I.getArgOperand(0);
4570 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4571
4572 Value *Mask = I.getArgOperand(1);
4573 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4574
4575 Value *Src = I.getArgOperand(2);
4576 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4577
4578 const Align Alignment = Align(1);
4579
4580 Value *SrcShadow = getShadow(Src);
4581
4583 insertCheckShadowOf(Dst, &I);
4584 insertCheckShadowOf(Mask, &I);
4585 }
4586
4587 Value *DstShadowPtr;
4588 Value *DstOriginPtr;
4589 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4590 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4591
4592 SmallVector<Value *, 2> ShadowArgs;
4593 ShadowArgs.append(1, DstShadowPtr);
4594 ShadowArgs.append(1, Mask);
4595 // The intrinsic may require floating-point but shadows can be arbitrary
4596 // bit patterns, of which some would be interpreted as "invalid"
4597 // floating-point values (NaN etc.); we assume the intrinsic will happily
4598 // copy them.
4599 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4600
4601 CallInst *CI =
4602 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4603 setShadow(&I, CI);
4604
4605 if (!MS.TrackOrigins)
4606 return;
4607
4608 // Approximation only
4609 auto &DL = F.getDataLayout();
4610 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4611 DL.getTypeStoreSize(SrcShadow->getType()),
4612 std::max(Alignment, kMinOriginAlignment));
4613 }
4614
4615 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4616 // return src mask
4617 //
4618 // Masked-off values are replaced with 0, which conveniently also represents
4619 // initialized memory.
4620 //
4621 // AVX512 masked stores are lowered to Intrinsic::masked_load and are handled
4622 // by handleMaskedStore.
4623 //
4624 // We do not combine this with handleMaskedLoad; see comment in
4625 // handleAVXMaskedStore for the rationale.
4626 //
4627 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4628 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4629 // parameter.
4630 void handleAVXMaskedLoad(IntrinsicInst &I) {
4631 assert(I.arg_size() == 2);
4632
4633 IRBuilder<> IRB(&I);
4634
4635 Value *Src = I.getArgOperand(0);
4636 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4637
4638 Value *Mask = I.getArgOperand(1);
4639 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4640
4641 const Align Alignment = Align(1);
4642
4644 insertCheckShadowOf(Mask, &I);
4645 }
4646
4647 Type *SrcShadowTy = getShadowTy(Src);
4648 Value *SrcShadowPtr, *SrcOriginPtr;
4649 std::tie(SrcShadowPtr, SrcOriginPtr) =
4650 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4651
4652 SmallVector<Value *, 2> ShadowArgs;
4653 ShadowArgs.append(1, SrcShadowPtr);
4654 ShadowArgs.append(1, Mask);
4655
4656 CallInst *CI =
4657 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4658 // The AVX masked load intrinsics do not have integer variants. We use the
4659 // floating-point variants, which will happily copy the shadows even if
4660 // they are interpreted as "invalid" floating-point values (NaN etc.).
4661 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4662
4663 if (!MS.TrackOrigins)
4664 return;
4665
4666 // The "pass-through" value is always zero (initialized). To the extent
4667 // that that results in initialized aligned 4-byte chunks, the origin value
4668 // is ignored. It is therefore correct to simply copy the origin from src.
4669 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4670 setOrigin(&I, PtrSrcOrigin);
4671 }
4672
4673 // Test whether the mask indices are initialized, only checking the bits that
4674 // are actually used.
4675 //
4676 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4677 // used/checked.
4678 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4679 assert(isFixedIntVector(Idx));
4680 auto IdxVectorSize =
4681 cast<FixedVectorType>(Idx->getType())->getNumElements();
4682 assert(isPowerOf2_64(IdxVectorSize));
4683
4684 // Compiler isn't smart enough, let's help it
4685 if (isa<Constant>(Idx))
4686 return;
4687
4688 auto *IdxShadow = getShadow(Idx);
4689 Value *Truncated = IRB.CreateTrunc(
4690 IdxShadow,
4691 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4692 IdxVectorSize));
4693 insertCheckShadow(Truncated, getOrigin(Idx), I);
4694 }
4695
4696 // Instrument AVX permutation intrinsic.
4697 // We apply the same permutation (argument index 1) to the shadow.
4698 void handleAVXVpermilvar(IntrinsicInst &I) {
4699 IRBuilder<> IRB(&I);
4700 Value *Shadow = getShadow(&I, 0);
4701 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4702
4703 // Shadows are integer-ish types but some intrinsics require a
4704 // different (e.g., floating-point) type.
4705 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4706 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4707 {Shadow, I.getArgOperand(1)});
4708
4709 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4710 setOriginForNaryOp(I);
4711 }
4712
4713 // Instrument AVX permutation intrinsic.
4714 // We apply the same permutation (argument index 1) to the shadows.
4715 void handleAVXVpermi2var(IntrinsicInst &I) {
4716 assert(I.arg_size() == 3);
4717 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4718 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4719 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4720 [[maybe_unused]] auto ArgVectorSize =
4721 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4722 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4723 ->getNumElements() == ArgVectorSize);
4724 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4725 ->getNumElements() == ArgVectorSize);
4726 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4727 assert(I.getType() == I.getArgOperand(0)->getType());
4728 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4729 IRBuilder<> IRB(&I);
4730 Value *AShadow = getShadow(&I, 0);
4731 Value *Idx = I.getArgOperand(1);
4732 Value *BShadow = getShadow(&I, 2);
4733
4734 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4735
4736 // Shadows are integer-ish types but some intrinsics require a
4737 // different (e.g., floating-point) type.
4738 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4739 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4740 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4741 {AShadow, Idx, BShadow});
4742 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4743 setOriginForNaryOp(I);
4744 }
4745
4746 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4747 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4748 }
4749
4750 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4751 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4752 }
4753
4754 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4755 return isFixedIntVectorTy(V->getType());
4756 }
4757
4758 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4759 return isFixedFPVectorTy(V->getType());
4760 }
4761
4762 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4763 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4764 // i32 rounding)
4765 //
4766 // Inconveniently, some similar intrinsics have a different operand order:
4767 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4768 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4769 // i16 mask)
4770 //
4771 // If the return type has more elements than A, the excess elements are
4772 // zeroed (and the corresponding shadow is initialized).
4773 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4774 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4775 // i8 mask)
4776 //
4777 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4778 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4779 // where all_or_nothing(x) is fully uninitialized if x has any
4780 // uninitialized bits
4781 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4782 IRBuilder<> IRB(&I);
4783
4784 assert(I.arg_size() == 4);
4785 Value *A = I.getOperand(0);
4786 Value *WriteThrough;
4787 Value *Mask;
4789 if (LastMask) {
4790 WriteThrough = I.getOperand(2);
4791 Mask = I.getOperand(3);
4792 RoundingMode = I.getOperand(1);
4793 } else {
4794 WriteThrough = I.getOperand(1);
4795 Mask = I.getOperand(2);
4796 RoundingMode = I.getOperand(3);
4797 }
4798
4799 assert(isFixedFPVector(A));
4800 assert(isFixedIntVector(WriteThrough));
4801
4802 unsigned ANumElements =
4803 cast<FixedVectorType>(A->getType())->getNumElements();
4804 [[maybe_unused]] unsigned WriteThruNumElements =
4805 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4806 assert(ANumElements == WriteThruNumElements ||
4807 ANumElements * 2 == WriteThruNumElements);
4808
4809 assert(Mask->getType()->isIntegerTy());
4810 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4811 assert(ANumElements == MaskNumElements ||
4812 ANumElements * 2 == MaskNumElements);
4813
4814 assert(WriteThruNumElements == MaskNumElements);
4815
4816 // Some bits of the mask may be unused, though it's unusual to have partly
4817 // uninitialized bits.
4818 insertCheckShadowOf(Mask, &I);
4819
4820 assert(RoundingMode->getType()->isIntegerTy());
4821 // Only some bits of the rounding mode are used, though it's very
4822 // unusual to have uninitialized bits there (more commonly, it's a
4823 // constant).
4824 insertCheckShadowOf(RoundingMode, &I);
4825
4826 assert(I.getType() == WriteThrough->getType());
4827
4828 Value *AShadow = getShadow(A);
4829 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4830
4831 if (ANumElements * 2 == MaskNumElements) {
4832 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4833 // from the zeroed shadow instead of the writethrough's shadow.
4834 Mask =
4835 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4836 Mask =
4837 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4838 }
4839
4840 // Convert i16 mask to <16 x i1>
4841 Mask = IRB.CreateBitCast(
4842 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4843 "_ms_mask_bitcast");
4844
4845 /// For floating-point to integer conversion, the output is:
4846 /// - fully uninitialized if *any* bit of the input is uninitialized
4847 /// - fully ininitialized if all bits of the input are ininitialized
4848 /// We apply the same principle on a per-element basis for vectors.
4849 ///
4850 /// We use the scalar width of the return type instead of A's.
4851 AShadow = IRB.CreateSExt(
4852 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4853 getShadowTy(&I), "_ms_a_shadow");
4854
4855 Value *WriteThroughShadow = getShadow(WriteThrough);
4856 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4857 "_ms_writethru_select");
4858
4859 setShadow(&I, Shadow);
4860 setOriginForNaryOp(I);
4861 }
4862
4863 // Instrument BMI / BMI2 intrinsics.
4864 // All of these intrinsics are Z = I(X, Y)
4865 // where the types of all operands and the result match, and are either i32 or
4866 // i64. The following instrumentation happens to work for all of them:
4867 // Sz = I(Sx, Y) | (sext (Sy != 0))
4868 void handleBmiIntrinsic(IntrinsicInst &I) {
4869 IRBuilder<> IRB(&I);
4870 Type *ShadowTy = getShadowTy(&I);
4871
4872 // If any bit of the mask operand is poisoned, then the whole thing is.
4873 Value *SMask = getShadow(&I, 1);
4874 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4875 ShadowTy);
4876 // Apply the same intrinsic to the shadow of the first operand.
4877 Value *S = IRB.CreateCall(I.getCalledFunction(),
4878 {getShadow(&I, 0), I.getOperand(1)});
4879 S = IRB.CreateOr(SMask, S);
4880 setShadow(&I, S);
4881 setOriginForNaryOp(I);
4882 }
4883
4884 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4885 SmallVector<int, 8> Mask;
4886 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4887 Mask.append(2, X);
4888 }
4889 return Mask;
4890 }
4891
4892 // Instrument pclmul intrinsics.
4893 // These intrinsics operate either on odd or on even elements of the input
4894 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4895 // Replace the unused elements with copies of the used ones, ex:
4896 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4897 // or
4898 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4899 // and then apply the usual shadow combining logic.
4900 void handlePclmulIntrinsic(IntrinsicInst &I) {
4901 IRBuilder<> IRB(&I);
4902 unsigned Width =
4903 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4904 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4905 "pclmul 3rd operand must be a constant");
4906 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4907 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4908 getPclmulMask(Width, Imm & 0x01));
4909 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4910 getPclmulMask(Width, Imm & 0x10));
4911 ShadowAndOriginCombiner SOC(this, IRB);
4912 SOC.Add(Shuf0, getOrigin(&I, 0));
4913 SOC.Add(Shuf1, getOrigin(&I, 1));
4914 SOC.Done(&I);
4915 }
4916
4917 // Instrument _mm_*_sd|ss intrinsics
4918 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4919 IRBuilder<> IRB(&I);
4920 unsigned Width =
4921 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4922 Value *First = getShadow(&I, 0);
4923 Value *Second = getShadow(&I, 1);
4924 // First element of second operand, remaining elements of first operand
4925 SmallVector<int, 16> Mask;
4926 Mask.push_back(Width);
4927 for (unsigned i = 1; i < Width; i++)
4928 Mask.push_back(i);
4929 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4930
4931 setShadow(&I, Shadow);
4932 setOriginForNaryOp(I);
4933 }
4934
4935 void handleVtestIntrinsic(IntrinsicInst &I) {
4936 IRBuilder<> IRB(&I);
4937 Value *Shadow0 = getShadow(&I, 0);
4938 Value *Shadow1 = getShadow(&I, 1);
4939 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4940 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4941 Value *Scalar = convertShadowToScalar(NZ, IRB);
4942 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4943
4944 setShadow(&I, Shadow);
4945 setOriginForNaryOp(I);
4946 }
4947
4948 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4949 IRBuilder<> IRB(&I);
4950 unsigned Width =
4951 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4952 Value *First = getShadow(&I, 0);
4953 Value *Second = getShadow(&I, 1);
4954 Value *OrShadow = IRB.CreateOr(First, Second);
4955 // First element of both OR'd together, remaining elements of first operand
4956 SmallVector<int, 16> Mask;
4957 Mask.push_back(Width);
4958 for (unsigned i = 1; i < Width; i++)
4959 Mask.push_back(i);
4960 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4961
4962 setShadow(&I, Shadow);
4963 setOriginForNaryOp(I);
4964 }
4965
4966 // _mm_round_ps / _mm_round_ps.
4967 // Similar to maybeHandleSimpleNomemIntrinsic except
4968 // the second argument is guaranteed to be a constant integer.
4969 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4970 assert(I.getArgOperand(0)->getType() == I.getType());
4971 assert(I.arg_size() == 2);
4972 assert(isa<ConstantInt>(I.getArgOperand(1)));
4973
4974 IRBuilder<> IRB(&I);
4975 ShadowAndOriginCombiner SC(this, IRB);
4976 SC.Add(I.getArgOperand(0));
4977 SC.Done(&I);
4978 }
4979
4980 // Instrument @llvm.abs intrinsic.
4981 //
4982 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4983 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
4984 void handleAbsIntrinsic(IntrinsicInst &I) {
4985 assert(I.arg_size() == 2);
4986 Value *Src = I.getArgOperand(0);
4987 Value *IsIntMinPoison = I.getArgOperand(1);
4988
4989 assert(I.getType()->isIntOrIntVectorTy());
4990
4991 assert(Src->getType() == I.getType());
4992
4993 assert(IsIntMinPoison->getType()->isIntegerTy());
4994 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4995
4996 IRBuilder<> IRB(&I);
4997 Value *SrcShadow = getShadow(Src);
4998
4999 APInt MinVal =
5000 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
5001 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
5002 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
5003
5004 Value *PoisonedShadow = getPoisonedShadow(Src);
5005 Value *PoisonedIfIntMinShadow =
5006 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
5007 Value *Shadow =
5008 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
5009
5010 setShadow(&I, Shadow);
5011 setOrigin(&I, getOrigin(&I, 0));
5012 }
5013
5014 void handleIsFpClass(IntrinsicInst &I) {
5015 IRBuilder<> IRB(&I);
5016 Value *Shadow = getShadow(&I, 0);
5017 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
5018 setOrigin(&I, getOrigin(&I, 0));
5019 }
5020
5021 void handleArithmeticWithOverflow(IntrinsicInst &I) {
5022 IRBuilder<> IRB(&I);
5023 Value *Shadow0 = getShadow(&I, 0);
5024 Value *Shadow1 = getShadow(&I, 1);
5025 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
5026 Value *ShadowElt1 =
5027 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
5028
5029 Value *Shadow = PoisonValue::get(getShadowTy(&I));
5030 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
5031 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
5032
5033 setShadow(&I, Shadow);
5034 setOriginForNaryOp(I);
5035 }
5036
5037 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
5038 assert(isa<FixedVectorType>(V->getType()));
5039 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
5040 Value *Shadow = getShadow(V);
5041 return IRB.CreateExtractElement(Shadow,
5042 ConstantInt::get(IRB.getInt32Ty(), 0));
5043 }
5044
5045 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
5046 //
5047 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
5048 // (<8 x i64>, <16 x i8>, i8)
5049 // A WriteThru Mask
5050 //
5051 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
5052 // (<16 x i32>, <16 x i8>, i16)
5053 //
5054 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
5055 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
5056 //
5057 // If Dst has more elements than A, the excess elements are zeroed (and the
5058 // corresponding shadow is initialized).
5059 //
5060 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
5061 // and is much faster than this handler.
5062 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
5063 IRBuilder<> IRB(&I);
5064
5065 assert(I.arg_size() == 3);
5066 Value *A = I.getOperand(0);
5067 Value *WriteThrough = I.getOperand(1);
5068 Value *Mask = I.getOperand(2);
5069
5070 assert(isFixedIntVector(A));
5071 assert(isFixedIntVector(WriteThrough));
5072
5073 unsigned ANumElements =
5074 cast<FixedVectorType>(A->getType())->getNumElements();
5075 unsigned OutputNumElements =
5076 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
5077 assert(ANumElements == OutputNumElements ||
5078 ANumElements * 2 == OutputNumElements);
5079
5080 assert(Mask->getType()->isIntegerTy());
5081 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5082 insertCheckShadowOf(Mask, &I);
5083
5084 assert(I.getType() == WriteThrough->getType());
5085
5086 // Widen the mask, if necessary, to have one bit per element of the output
5087 // vector.
5088 // We want the extra bits to have '1's, so that the CreateSelect will
5089 // select the values from AShadow instead of WriteThroughShadow ("maskless"
5090 // versions of the intrinsics are sometimes implemented using an all-1's
5091 // mask and an undefined value for WriteThroughShadow). We accomplish this
5092 // by using bitwise NOT before and after the ZExt.
5093 if (ANumElements != OutputNumElements) {
5094 Mask = IRB.CreateNot(Mask);
5095 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
5096 "_ms_widen_mask");
5097 Mask = IRB.CreateNot(Mask);
5098 }
5099 Mask = IRB.CreateBitCast(
5100 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5101
5102 Value *AShadow = getShadow(A);
5103
5104 // The return type might have more elements than the input.
5105 // Temporarily shrink the return type's number of elements.
5106 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
5107
5108 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
5109 // This handler treats them all as truncation, which leads to some rare
5110 // false positives in the cases where the truncated bytes could
5111 // unambiguously saturate the value e.g., if A = ??????10 ????????
5112 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
5113 // fully defined, but the truncated byte is ????????.
5114 //
5115 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
5116 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
5117 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
5118
5119 Value *WriteThroughShadow = getShadow(WriteThrough);
5120
5121 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
5122 setShadow(&I, Shadow);
5123 setOriginForNaryOp(I);
5124 }
5125
5126 // Handle llvm.x86.avx512.* instructions that take vector(s) of floating-point
5127 // values and perform an operation whose shadow propagation should be handled
5128 // as all-or-nothing [*], with masking provided by a vector and a mask
5129 // supplied as an integer.
5130 //
5131 // [*] if all bits of a vector element are initialized, the output is fully
5132 // initialized; otherwise, the output is fully uninitialized
5133 //
5134 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
5135 // (<16 x float>, <16 x float>, i16)
5136 // A WriteThru Mask
5137 //
5138 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
5139 // (<2 x double>, <2 x double>, i8)
5140 // A WriteThru Mask
5141 //
5142 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
5143 // (<8 x double>, i32, <8 x double>, i8, i32)
5144 // A Imm WriteThru Mask Rounding
5145 //
5146 // <16 x float> @llvm.x86.avx512.mask.scalef.ps.512
5147 // (<16 x float>, <16 x float>, <16 x float>, i16, i32)
5148 // WriteThru A B Mask Rnd
5149 //
5150 // All operands other than A, B, ..., and WriteThru (e.g., Mask, Imm,
5151 // Rounding) must be fully initialized.
5152 //
5153 // Dst[i] = Mask[i] ? some_op(A[i], B[i], ...)
5154 // : WriteThru[i]
5155 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i] | B_shadow[i] | ...)
5156 // : WriteThru_shadow[i]
5157 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I,
5158 SmallVector<unsigned, 4> DataIndices,
5159 unsigned WriteThruIndex,
5160 unsigned MaskIndex) {
5161 IRBuilder<> IRB(&I);
5162
5163 unsigned NumArgs = I.arg_size();
5164
5165 assert(WriteThruIndex < NumArgs);
5166 assert(MaskIndex < NumArgs);
5167 assert(WriteThruIndex != MaskIndex);
5168 Value *WriteThru = I.getOperand(WriteThruIndex);
5169
5170 unsigned OutputNumElements =
5171 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
5172
5173 assert(DataIndices.size() > 0);
5174
5175 bool isData[16] = {false};
5176 assert(NumArgs <= 16);
5177 for (unsigned i : DataIndices) {
5178 assert(i < NumArgs);
5179 assert(i != WriteThruIndex);
5180 assert(i != MaskIndex);
5181
5182 isData[i] = true;
5183
5184 Value *A = I.getOperand(i);
5185 assert(isFixedFPVector(A));
5186 [[maybe_unused]] unsigned ANumElements =
5187 cast<FixedVectorType>(A->getType())->getNumElements();
5188 assert(ANumElements == OutputNumElements);
5189 }
5190
5191 Value *Mask = I.getOperand(MaskIndex);
5192
5193 assert(isFixedFPVector(WriteThru));
5194
5195 for (unsigned i = 0; i < NumArgs; ++i) {
5196 if (!isData[i] && i != WriteThruIndex) {
5197 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5198 // they be fully initialized.
5199 assert(I.getOperand(i)->getType()->isIntegerTy());
5200 insertCheckShadowOf(I.getOperand(i), &I);
5201 }
5202 }
5203
5204 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5205 if (Mask->getType()->getScalarSizeInBits() == 8 && OutputNumElements < 8)
5206 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, OutputNumElements));
5207 assert(Mask->getType()->getScalarSizeInBits() == OutputNumElements);
5208
5209 assert(I.getType() == WriteThru->getType());
5210
5211 Mask = IRB.CreateBitCast(
5212 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5213
5214 Value *DataShadow = nullptr;
5215 for (unsigned i : DataIndices) {
5216 Value *A = I.getOperand(i);
5217 if (DataShadow)
5218 DataShadow = IRB.CreateOr(DataShadow, getShadow(A));
5219 else
5220 DataShadow = getShadow(A);
5221 }
5222
5223 // All-or-nothing shadow
5224 DataShadow =
5225 IRB.CreateSExt(IRB.CreateICmpNE(DataShadow, getCleanShadow(DataShadow)),
5226 DataShadow->getType());
5227
5228 Value *WriteThruShadow = getShadow(WriteThru);
5229
5230 Value *Shadow = IRB.CreateSelect(Mask, DataShadow, WriteThruShadow);
5231 setShadow(&I, Shadow);
5232
5233 setOriginForNaryOp(I);
5234 }
5235
5236 // For sh.* compiler intrinsics:
5237 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5238 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5239 // A B WriteThru Mask RoundingMode
5240 //
5241 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5242 // DstShadow[1..7] = AShadow[1..7]
5243 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5244 IRBuilder<> IRB(&I);
5245
5246 assert(I.arg_size() == 5);
5247 Value *A = I.getOperand(0);
5248 Value *B = I.getOperand(1);
5249 Value *WriteThrough = I.getOperand(2);
5250 Value *Mask = I.getOperand(3);
5251 Value *RoundingMode = I.getOperand(4);
5252
5253 // Technically, we could probably just check whether the LSB is
5254 // initialized, but intuitively it feels like a partly uninitialized mask
5255 // is unintended, and we should warn the user immediately.
5256 insertCheckShadowOf(Mask, &I);
5257 insertCheckShadowOf(RoundingMode, &I);
5258
5259 assert(isa<FixedVectorType>(A->getType()));
5260 unsigned NumElements =
5261 cast<FixedVectorType>(A->getType())->getNumElements();
5262 assert(NumElements == 8);
5263 assert(A->getType() == B->getType());
5264 assert(B->getType() == WriteThrough->getType());
5265 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5266 assert(RoundingMode->getType()->isIntegerTy());
5267
5268 Value *ALowerShadow = extractLowerShadow(IRB, A);
5269 Value *BLowerShadow = extractLowerShadow(IRB, B);
5270
5271 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5272
5273 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5274
5275 Mask = IRB.CreateBitCast(
5276 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5277 Value *MaskLower =
5278 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5279
5280 Value *AShadow = getShadow(A);
5281 Value *DstLowerShadow =
5282 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5283 Value *DstShadow = IRB.CreateInsertElement(
5284 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5285 "_msprop");
5286
5287 setShadow(&I, DstShadow);
5288 setOriginForNaryOp(I);
5289 }
5290
5291 // Approximately handle AVX Galois Field Affine Transformation
5292 //
5293 // e.g.,
5294 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5295 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5296 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5297 // Out A x b
5298 // where A and x are packed matrices, b is a vector,
5299 // Out = A * x + b in GF(2)
5300 //
5301 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5302 // computation also includes a parity calculation.
5303 //
5304 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5305 // Out_Shadow = (V1_Shadow & V2_Shadow)
5306 // | (V1 & V2_Shadow)
5307 // | (V1_Shadow & V2 )
5308 //
5309 // We approximate the shadow of gf2p8affineqb using:
5310 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5311 // | gf2p8affineqb(x, A_shadow, 0)
5312 // | gf2p8affineqb(x_Shadow, A, 0)
5313 // | set1_epi8(b_Shadow)
5314 //
5315 // This approximation has false negatives: if an intermediate dot-product
5316 // contains an even number of 1's, the parity is 0.
5317 // It has no false positives.
5318 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5319 IRBuilder<> IRB(&I);
5320
5321 assert(I.arg_size() == 3);
5322 Value *A = I.getOperand(0);
5323 Value *X = I.getOperand(1);
5324 Value *B = I.getOperand(2);
5325
5326 assert(isFixedIntVector(A));
5327 assert(cast<VectorType>(A->getType())
5328 ->getElementType()
5329 ->getScalarSizeInBits() == 8);
5330
5331 assert(A->getType() == X->getType());
5332
5333 assert(B->getType()->isIntegerTy());
5334 assert(B->getType()->getScalarSizeInBits() == 8);
5335
5336 assert(I.getType() == A->getType());
5337
5338 Value *AShadow = getShadow(A);
5339 Value *XShadow = getShadow(X);
5340 Value *BZeroShadow = getCleanShadow(B);
5341
5342 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5343 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5344 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5345 {X, AShadow, BZeroShadow});
5346 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5347 {XShadow, A, BZeroShadow});
5348
5349 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5350 Value *BShadow = getShadow(B);
5351 Value *BBroadcastShadow = getCleanShadow(AShadow);
5352 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5353 // This loop generates a lot of LLVM IR, which we expect that CodeGen will
5354 // lower appropriately (e.g., VPBROADCASTB).
5355 // Besides, b is often a constant, in which case it is fully initialized.
5356 for (unsigned i = 0; i < NumElements; i++)
5357 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5358
5359 setShadow(&I, IRB.CreateOr(
5360 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5361 setOriginForNaryOp(I);
5362 }
5363
5364 // Handle Arm NEON vector load intrinsics (vld*).
5365 //
5366 // The WithLane instructions (ld[234]lane) are similar to:
5367 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5368 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5369 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5370 // %A)
5371 //
5372 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5373 // to:
5374 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5375 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5376 unsigned int numArgs = I.arg_size();
5377
5378 // Return type is a struct of vectors of integers or floating-point
5379 assert(I.getType()->isStructTy());
5380 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5381 assert(RetTy->getNumElements() > 0);
5383 RetTy->getElementType(0)->isFPOrFPVectorTy());
5384 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5385 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5386
5387 if (WithLane) {
5388 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5389 assert(4 <= numArgs && numArgs <= 6);
5390
5391 // Return type is a struct of the input vectors
5392 assert(RetTy->getNumElements() + 2 == numArgs);
5393 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5394 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5395 } else {
5396 assert(numArgs == 1);
5397 }
5398
5399 IRBuilder<> IRB(&I);
5400
5401 SmallVector<Value *, 6> ShadowArgs;
5402 if (WithLane) {
5403 for (unsigned int i = 0; i < numArgs - 2; i++)
5404 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5405
5406 // Lane number, passed verbatim
5407 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5408 ShadowArgs.push_back(LaneNumber);
5409
5410 // TODO: blend shadow of lane number into output shadow?
5411 insertCheckShadowOf(LaneNumber, &I);
5412 }
5413
5414 Value *Src = I.getArgOperand(numArgs - 1);
5415 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5416
5417 Type *SrcShadowTy = getShadowTy(Src);
5418 auto [SrcShadowPtr, SrcOriginPtr] =
5419 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5420 ShadowArgs.push_back(SrcShadowPtr);
5421
5422 // The NEON vector load instructions handled by this function all have
5423 // integer variants. It is easier to use those rather than trying to cast
5424 // a struct of vectors of floats into a struct of vectors of integers.
5425 CallInst *CI =
5426 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5427 setShadow(&I, CI);
5428
5429 if (!MS.TrackOrigins)
5430 return;
5431
5432 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5433 setOrigin(&I, PtrSrcOrigin);
5434 }
5435
5436 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5437 /// and vst{2,3,4}lane).
5438 ///
5439 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5440 /// last argument, with the initial arguments being the inputs (and lane
5441 /// number for vst{2,3,4}lane). They return void.
5442 ///
5443 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5444 /// abcdabcdabcdabcd... into *outP
5445 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5446 /// writes aaaa...bbbb...cccc...dddd... into *outP
5447 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5448 /// These instructions can all be instrumented with essentially the same
5449 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5450 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5451 IRBuilder<> IRB(&I);
5452
5453 // Don't use getNumOperands() because it includes the callee
5454 int numArgOperands = I.arg_size();
5455
5456 // The last arg operand is the output (pointer)
5457 assert(numArgOperands >= 1);
5458 Value *Addr = I.getArgOperand(numArgOperands - 1);
5459 assert(Addr->getType()->isPointerTy());
5460 int skipTrailingOperands = 1;
5461
5463 insertCheckShadowOf(Addr, &I);
5464
5465 // Second-last operand is the lane number (for vst{2,3,4}lane)
5466 if (useLane) {
5467 skipTrailingOperands++;
5468 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5470 I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5471 }
5472
5473 SmallVector<Value *, 8> ShadowArgs;
5474 // All the initial operands are the inputs
5475 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5476 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5477 Value *Shadow = getShadow(&I, i);
5478 ShadowArgs.append(1, Shadow);
5479 }
5480
5481 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5482 // e.g., for:
5483 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5484 // we know the type of the output (and its shadow) is <16 x i8>.
5485 //
5486 // Arm NEON VST is unusual because the last argument is the output address:
5487 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5488 // call void @llvm.aarch64.neon.st2.v16i8.p0
5489 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5490 // and we have no type information about P's operand. We must manually
5491 // compute the type (<16 x i8> x 2).
5492 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5493 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5494 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5495 (numArgOperands - skipTrailingOperands));
5496 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5497
5498 if (useLane)
5499 ShadowArgs.append(1,
5500 I.getArgOperand(numArgOperands - skipTrailingOperands));
5501
5502 Value *OutputShadowPtr, *OutputOriginPtr;
5503 // AArch64 NEON does not need alignment (unless OS requires it)
5504 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5505 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5506 ShadowArgs.append(1, OutputShadowPtr);
5507
5508 CallInst *CI =
5509 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5510 setShadow(&I, CI);
5511
5512 if (MS.TrackOrigins) {
5513 // TODO: if we modelled the vst* instruction more precisely, we could
5514 // more accurately track the origins (e.g., if both inputs are
5515 // uninitialized for vst2, we currently blame the second input, even
5516 // though part of the output depends only on the first input).
5517 //
5518 // This is particularly imprecise for vst{2,3,4}lane, since only one
5519 // lane of each input is actually copied to the output.
5520 OriginCombiner OC(this, IRB);
5521 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5522 OC.Add(I.getArgOperand(i));
5523
5524 const DataLayout &DL = F.getDataLayout();
5525 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5526 OutputOriginPtr);
5527 }
5528 }
5529
5530 // Integer matrix multiplication:
5531 // - <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8
5532 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5533 // - <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
5534 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5535 // - <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8
5536 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5537 //
5538 // Note:
5539 // - <4 x i32> is a 2x2 matrix
5540 // - <16 x i8> %X and %Y are 2x8 and 8x2 matrices respectively
5541 //
5542 // 2x8 %X 8x2 %Y
5543 // [ X01 X02 X03 X04 X05 X06 X07 X08 ] [ Y01 Y09 ]
5544 // [ X09 X10 X11 X12 X13 X14 X15 X16 ] x [ Y02 Y10 ]
5545 // [ Y03 Y11 ]
5546 // [ Y04 Y12 ]
5547 // [ Y05 Y13 ]
5548 // [ Y06 Y14 ]
5549 // [ Y07 Y15 ]
5550 // [ Y08 Y16 ]
5551 //
5552 // The general shadow propagation approach is:
5553 // 1) get the shadows of the input matrices %X and %Y
5554 // 2) change the shadow values to 0x1 if the corresponding value is fully
5555 // initialized, and 0x0 otherwise
5556 // 3) perform a matrix multiplication on the shadows of %X and %Y. The output
5557 // will be a 2x2 matrix; for each element, a value of 0x8 means all the
5558 // corresponding inputs were clean.
5559 // 4) blend in the shadow of %R
5560 //
5561 // TODO: consider allowing multiplication of zero with an uninitialized value
5562 // to result in an initialized value.
5563 //
5564 // Floating-point matrix multiplication:
5565 // - <4 x float> @llvm.aarch64.neon.bfmmla
5566 // (<4 x float> %R, <8 x bfloat> %X, <8 x bfloat> %Y)
5567 // %X and %Y are 2x4 and 4x2 matrices respectively
5568 //
5569 // Although there are half as many elements of %X and %Y compared to the
5570 // integer case, each element is twice the bit-width. Thus, we can reuse the
5571 // shadow propagation logic if we cast the shadows to the same type as the
5572 // integer case, and apply ummla to the shadows:
5573 //
5574 // 2x4 %X 4x2 %Y
5575 // [ A01:A02 A03:A04 A05:A06 A07:A08 ] [ B01:B02 B09:B10 ]
5576 // [ A09:A10 A11:A12 A13:A14 A15:A16 ] x [ B03:B04 B11:B12 ]
5577 // [ B05:B06 B13:B14 ]
5578 // [ B07:B08 B15:B16 ]
5579 //
5580 // For example, consider multiplying the first row of %X with the first
5581 // column of Y. We want to know if
5582 // A01:A02*B01:B02 + A03:A04*B03:B04 + A05:A06*B06:B06 + A07:A08*B07:B08 is
5583 // fully initialized, which will be true if and only if (A01, A02, ..., A08)
5584 // and (B01, B02, ..., B08) are each fully initialized. This latter condition
5585 // is equivalent to what is tested by the instrumentation for the integer
5586 // form.
5587 void handleNEONMatrixMultiply(IntrinsicInst &I) {
5588 IRBuilder<> IRB(&I);
5589
5590 assert(I.arg_size() == 3);
5591 Value *R = I.getArgOperand(0);
5592 Value *A = I.getArgOperand(1);
5593 Value *B = I.getArgOperand(2);
5594
5595 assert(I.getType() == R->getType());
5596
5597 assert(isa<FixedVectorType>(R->getType()));
5598 assert(isa<FixedVectorType>(A->getType()));
5599 assert(isa<FixedVectorType>(B->getType()));
5600
5601 [[maybe_unused]] FixedVectorType *RTy = cast<FixedVectorType>(R->getType());
5602 [[maybe_unused]] FixedVectorType *ATy = cast<FixedVectorType>(A->getType());
5603 [[maybe_unused]] FixedVectorType *BTy = cast<FixedVectorType>(B->getType());
5604
5605 Value *ShadowR = getShadow(&I, 0);
5606 Value *ShadowA = getShadow(&I, 1);
5607 Value *ShadowB = getShadow(&I, 2);
5608
5609 // We will use ummla to compute the shadow. These are the types it expects.
5610 // These are also the types of the corresponding shadows.
5611 FixedVectorType *ExpectedRTy =
5613 FixedVectorType *ExpectedATy =
5615 FixedVectorType *ExpectedBTy =
5617
5618 if (RTy->getElementType()->isIntegerTy()) {
5619 // Types of R and A/B are not identical e.g., <4 x i32> %R, <16 x i8> %A
5621
5622 assert(RTy == ExpectedRTy);
5623 assert(ATy == ExpectedATy);
5624 assert(BTy == ExpectedBTy);
5625 } else {
5628
5629 // Technically, what we care about is that:
5630 // getShadowTy(RTy)->canLosslesslyBitCastTo(ExpectedRTy)) etc.
5631 // but that is equivalent.
5632 assert(RTy->canLosslesslyBitCastTo(ExpectedRTy));
5633 assert(ATy->canLosslesslyBitCastTo(ExpectedATy));
5634 assert(BTy->canLosslesslyBitCastTo(ExpectedBTy));
5635
5636 ShadowA = IRB.CreateBitCast(ShadowA, getShadowTy(ExpectedATy));
5637 ShadowB = IRB.CreateBitCast(ShadowB, getShadowTy(ExpectedBTy));
5638 }
5639 assert(ATy->getElementType() == BTy->getElementType());
5640
5641 // From this point on, use Expected{R,A,B}Type.
5642
5643 // If the value is fully initialized, the shadow will be 000...001.
5644 // Otherwise, the shadow will be all zero.
5645 // (This is the opposite of how we typically handle shadows.)
5646 ShadowA =
5647 IRB.CreateZExt(IRB.CreateICmpEQ(ShadowA, getCleanShadow(ExpectedATy)),
5648 getShadowTy(ExpectedATy));
5649 ShadowB =
5650 IRB.CreateZExt(IRB.CreateICmpEQ(ShadowB, getCleanShadow(ExpectedBTy)),
5651 getShadowTy(ExpectedBTy));
5652
5653 Value *ShadowAB =
5654 IRB.CreateIntrinsic(ExpectedRTy, Intrinsic::aarch64_neon_ummla,
5655 {getCleanShadow(ExpectedRTy), ShadowA, ShadowB});
5656
5657 // ummla multiplies a 2x8 matrix with an 8x2 matrix. If all entries of the
5658 // input matrices are equal to 0x1, all entries of the output matrix will
5659 // be 0x8.
5660 Value *FullyInit = ConstantVector::getSplat(
5661 ExpectedRTy->getElementCount(),
5662 ConstantInt::get(ExpectedRTy->getElementType(), 0x8));
5663
5664 ShadowAB = IRB.CreateSExt(IRB.CreateICmpNE(ShadowAB, FullyInit),
5665 ShadowAB->getType());
5666
5667 ShadowR = IRB.CreateSExt(
5668 IRB.CreateICmpNE(ShadowR, getCleanShadow(ExpectedRTy)), ExpectedRTy);
5669
5670 setShadow(&I, IRB.CreateOr(ShadowAB, ShadowR));
5671 setOriginForNaryOp(I);
5672 }
5673
5674 /// Handle intrinsics by applying the intrinsic to the shadows.
5675 ///
5676 /// The trailing arguments are passed verbatim to the intrinsic, though any
5677 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5678 /// intrinsic with one trailing verbatim argument:
5679 /// out = intrinsic(var1, var2, opType)
5680 /// we compute:
5681 /// shadow[out] =
5682 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5683 ///
5684 /// Typically, shadowIntrinsicID will be specified by the caller to be
5685 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5686 /// intrinsic of the same type.
5687 ///
5688 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5689 /// bit-patterns (for example, if the intrinsic accepts floats for
5690 /// var1, we require that it doesn't care if inputs are NaNs).
5691 ///
5692 /// For example, this can be applied to the Arm NEON vector table intrinsics
5693 /// (tbl{1,2,3,4}).
5694 ///
5695 /// The origin is approximated using setOriginForNaryOp.
5696 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5697 Intrinsic::ID shadowIntrinsicID,
5698 unsigned int trailingVerbatimArgs) {
5699 IRBuilder<> IRB(&I);
5700
5701 assert(trailingVerbatimArgs < I.arg_size());
5702
5703 SmallVector<Value *, 8> ShadowArgs;
5704 // Don't use getNumOperands() because it includes the callee
5705 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5706 Value *Shadow = getShadow(&I, i);
5707
5708 // Shadows are integer-ish types but some intrinsics require a
5709 // different (e.g., floating-point) type.
5710 ShadowArgs.push_back(
5711 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5712 }
5713
5714 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5715 i++) {
5716 Value *Arg = I.getArgOperand(i);
5717 ShadowArgs.push_back(Arg);
5718 }
5719
5720 CallInst *CI =
5721 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5722 Value *CombinedShadow = CI;
5723
5724 // Combine the computed shadow with the shadow of trailing args
5725 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5726 i++) {
5727 Value *Shadow =
5728 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5729 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5730 }
5731
5732 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5733
5734 setOriginForNaryOp(I);
5735 }
5736
5737 // Approximation only
5738 //
5739 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5740 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5741 assert(I.arg_size() == 2);
5742
5743 handleShadowOr(I);
5744 }
5745
5746 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5747 switch (I.getIntrinsicID()) {
5748 case Intrinsic::uadd_with_overflow:
5749 case Intrinsic::sadd_with_overflow:
5750 case Intrinsic::usub_with_overflow:
5751 case Intrinsic::ssub_with_overflow:
5752 case Intrinsic::umul_with_overflow:
5753 case Intrinsic::smul_with_overflow:
5754 handleArithmeticWithOverflow(I);
5755 break;
5756 case Intrinsic::abs:
5757 handleAbsIntrinsic(I);
5758 break;
5759 case Intrinsic::bitreverse:
5760 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5761 /*trailingVerbatimArgs*/ 0);
5762 break;
5763 case Intrinsic::is_fpclass:
5764 handleIsFpClass(I);
5765 break;
5766 case Intrinsic::lifetime_start:
5767 handleLifetimeStart(I);
5768 break;
5769 case Intrinsic::launder_invariant_group:
5770 case Intrinsic::strip_invariant_group:
5771 handleInvariantGroup(I);
5772 break;
5773 case Intrinsic::bswap:
5774 handleBswap(I);
5775 break;
5776 case Intrinsic::ctlz:
5777 case Intrinsic::cttz:
5778 handleCountLeadingTrailingZeros(I);
5779 break;
5780 case Intrinsic::masked_compressstore:
5781 handleMaskedCompressStore(I);
5782 break;
5783 case Intrinsic::masked_expandload:
5784 handleMaskedExpandLoad(I);
5785 break;
5786 case Intrinsic::masked_gather:
5787 handleMaskedGather(I);
5788 break;
5789 case Intrinsic::masked_scatter:
5790 handleMaskedScatter(I);
5791 break;
5792 case Intrinsic::masked_store:
5793 handleMaskedStore(I);
5794 break;
5795 case Intrinsic::masked_load:
5796 handleMaskedLoad(I);
5797 break;
5798 case Intrinsic::vector_reduce_and:
5799 handleVectorReduceAndIntrinsic(I);
5800 break;
5801 case Intrinsic::vector_reduce_or:
5802 handleVectorReduceOrIntrinsic(I);
5803 break;
5804
5805 case Intrinsic::vector_reduce_add:
5806 case Intrinsic::vector_reduce_xor:
5807 case Intrinsic::vector_reduce_mul:
5808 // Signed/Unsigned Min/Max
5809 // TODO: handling similarly to AND/OR may be more precise.
5810 case Intrinsic::vector_reduce_smax:
5811 case Intrinsic::vector_reduce_smin:
5812 case Intrinsic::vector_reduce_umax:
5813 case Intrinsic::vector_reduce_umin:
5814 // TODO: this has no false positives, but arguably we should check that all
5815 // the bits are initialized.
5816 case Intrinsic::vector_reduce_fmax:
5817 case Intrinsic::vector_reduce_fmin:
5818 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5819 break;
5820
5821 case Intrinsic::vector_reduce_fadd:
5822 case Intrinsic::vector_reduce_fmul:
5823 handleVectorReduceWithStarterIntrinsic(I);
5824 break;
5825
5826 case Intrinsic::scmp:
5827 case Intrinsic::ucmp: {
5828 handleShadowOr(I);
5829 break;
5830 }
5831
5832 case Intrinsic::fshl:
5833 case Intrinsic::fshr:
5834 handleFunnelShift(I);
5835 break;
5836
5837 case Intrinsic::is_constant:
5838 // The result of llvm.is.constant() is always defined.
5839 setShadow(&I, getCleanShadow(&I));
5840 setOrigin(&I, getCleanOrigin());
5841 break;
5842
5843 default:
5844 return false;
5845 }
5846
5847 return true;
5848 }
5849
5850 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5851 switch (I.getIntrinsicID()) {
5852 case Intrinsic::x86_sse_stmxcsr:
5853 handleStmxcsr(I);
5854 break;
5855 case Intrinsic::x86_sse_ldmxcsr:
5856 handleLdmxcsr(I);
5857 break;
5858
5859 // Convert Scalar Double Precision Floating-Point Value
5860 // to Unsigned Doubleword Integer
5861 // etc.
5862 case Intrinsic::x86_avx512_vcvtsd2usi64:
5863 case Intrinsic::x86_avx512_vcvtsd2usi32:
5864 case Intrinsic::x86_avx512_vcvtss2usi64:
5865 case Intrinsic::x86_avx512_vcvtss2usi32:
5866 case Intrinsic::x86_avx512_cvttss2usi64:
5867 case Intrinsic::x86_avx512_cvttss2usi:
5868 case Intrinsic::x86_avx512_cvttsd2usi64:
5869 case Intrinsic::x86_avx512_cvttsd2usi:
5870 case Intrinsic::x86_avx512_cvtusi2ss:
5871 case Intrinsic::x86_avx512_cvtusi642sd:
5872 case Intrinsic::x86_avx512_cvtusi642ss:
5873 handleSSEVectorConvertIntrinsic(I, 1, true);
5874 break;
5875 case Intrinsic::x86_sse2_cvtsd2si64:
5876 case Intrinsic::x86_sse2_cvtsd2si:
5877 case Intrinsic::x86_sse2_cvtsd2ss:
5878 case Intrinsic::x86_sse2_cvttsd2si64:
5879 case Intrinsic::x86_sse2_cvttsd2si:
5880 case Intrinsic::x86_sse_cvtss2si64:
5881 case Intrinsic::x86_sse_cvtss2si:
5882 case Intrinsic::x86_sse_cvttss2si64:
5883 case Intrinsic::x86_sse_cvttss2si:
5884 handleSSEVectorConvertIntrinsic(I, 1);
5885 break;
5886 case Intrinsic::x86_sse_cvtps2pi:
5887 case Intrinsic::x86_sse_cvttps2pi:
5888 handleSSEVectorConvertIntrinsic(I, 2);
5889 break;
5890
5891 // TODO:
5892 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5893 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5894 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5895
5896 case Intrinsic::x86_vcvtps2ph_128:
5897 case Intrinsic::x86_vcvtps2ph_256: {
5898 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5899 break;
5900 }
5901
5902 // Convert Packed Single Precision Floating-Point Values
5903 // to Packed Signed Doubleword Integer Values
5904 //
5905 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5906 // (<16 x float>, <16 x i32>, i16, i32)
5907 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5908 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5909 break;
5910
5911 // Convert Packed Double Precision Floating-Point Values
5912 // to Packed Single Precision Floating-Point Values
5913 case Intrinsic::x86_sse2_cvtpd2ps:
5914 case Intrinsic::x86_sse2_cvtps2dq:
5915 case Intrinsic::x86_sse2_cvtpd2dq:
5916 case Intrinsic::x86_sse2_cvttps2dq:
5917 case Intrinsic::x86_sse2_cvttpd2dq:
5918 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5919 case Intrinsic::x86_avx_cvt_ps2dq_256:
5920 case Intrinsic::x86_avx_cvt_pd2dq_256:
5921 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5922 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5923 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5924 break;
5925 }
5926
5927 // Convert Single-Precision FP Value to 16-bit FP Value
5928 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5929 // (<16 x float>, i32, <16 x i16>, i16)
5930 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5931 // (<4 x float>, i32, <8 x i16>, i8)
5932 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5933 // (<8 x float>, i32, <8 x i16>, i8)
5934 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5935 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5936 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5937 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5938 break;
5939
5940 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5941 case Intrinsic::x86_avx512_psll_w_512:
5942 case Intrinsic::x86_avx512_psll_d_512:
5943 case Intrinsic::x86_avx512_psll_q_512:
5944 case Intrinsic::x86_avx512_pslli_w_512:
5945 case Intrinsic::x86_avx512_pslli_d_512:
5946 case Intrinsic::x86_avx512_pslli_q_512:
5947 case Intrinsic::x86_avx512_psrl_w_512:
5948 case Intrinsic::x86_avx512_psrl_d_512:
5949 case Intrinsic::x86_avx512_psrl_q_512:
5950 case Intrinsic::x86_avx512_psra_w_512:
5951 case Intrinsic::x86_avx512_psra_d_512:
5952 case Intrinsic::x86_avx512_psra_q_512:
5953 case Intrinsic::x86_avx512_psrli_w_512:
5954 case Intrinsic::x86_avx512_psrli_d_512:
5955 case Intrinsic::x86_avx512_psrli_q_512:
5956 case Intrinsic::x86_avx512_psrai_w_512:
5957 case Intrinsic::x86_avx512_psrai_d_512:
5958 case Intrinsic::x86_avx512_psrai_q_512:
5959 case Intrinsic::x86_avx512_psra_q_256:
5960 case Intrinsic::x86_avx512_psra_q_128:
5961 case Intrinsic::x86_avx512_psrai_q_256:
5962 case Intrinsic::x86_avx512_psrai_q_128:
5963 case Intrinsic::x86_avx2_psll_w:
5964 case Intrinsic::x86_avx2_psll_d:
5965 case Intrinsic::x86_avx2_psll_q:
5966 case Intrinsic::x86_avx2_pslli_w:
5967 case Intrinsic::x86_avx2_pslli_d:
5968 case Intrinsic::x86_avx2_pslli_q:
5969 case Intrinsic::x86_avx2_psrl_w:
5970 case Intrinsic::x86_avx2_psrl_d:
5971 case Intrinsic::x86_avx2_psrl_q:
5972 case Intrinsic::x86_avx2_psra_w:
5973 case Intrinsic::x86_avx2_psra_d:
5974 case Intrinsic::x86_avx2_psrli_w:
5975 case Intrinsic::x86_avx2_psrli_d:
5976 case Intrinsic::x86_avx2_psrli_q:
5977 case Intrinsic::x86_avx2_psrai_w:
5978 case Intrinsic::x86_avx2_psrai_d:
5979 case Intrinsic::x86_sse2_psll_w:
5980 case Intrinsic::x86_sse2_psll_d:
5981 case Intrinsic::x86_sse2_psll_q:
5982 case Intrinsic::x86_sse2_pslli_w:
5983 case Intrinsic::x86_sse2_pslli_d:
5984 case Intrinsic::x86_sse2_pslli_q:
5985 case Intrinsic::x86_sse2_psrl_w:
5986 case Intrinsic::x86_sse2_psrl_d:
5987 case Intrinsic::x86_sse2_psrl_q:
5988 case Intrinsic::x86_sse2_psra_w:
5989 case Intrinsic::x86_sse2_psra_d:
5990 case Intrinsic::x86_sse2_psrli_w:
5991 case Intrinsic::x86_sse2_psrli_d:
5992 case Intrinsic::x86_sse2_psrli_q:
5993 case Intrinsic::x86_sse2_psrai_w:
5994 case Intrinsic::x86_sse2_psrai_d:
5995 case Intrinsic::x86_mmx_psll_w:
5996 case Intrinsic::x86_mmx_psll_d:
5997 case Intrinsic::x86_mmx_psll_q:
5998 case Intrinsic::x86_mmx_pslli_w:
5999 case Intrinsic::x86_mmx_pslli_d:
6000 case Intrinsic::x86_mmx_pslli_q:
6001 case Intrinsic::x86_mmx_psrl_w:
6002 case Intrinsic::x86_mmx_psrl_d:
6003 case Intrinsic::x86_mmx_psrl_q:
6004 case Intrinsic::x86_mmx_psra_w:
6005 case Intrinsic::x86_mmx_psra_d:
6006 case Intrinsic::x86_mmx_psrli_w:
6007 case Intrinsic::x86_mmx_psrli_d:
6008 case Intrinsic::x86_mmx_psrli_q:
6009 case Intrinsic::x86_mmx_psrai_w:
6010 case Intrinsic::x86_mmx_psrai_d:
6011 handleVectorShiftIntrinsic(I, /* Variable */ false);
6012 break;
6013 case Intrinsic::x86_avx2_psllv_d:
6014 case Intrinsic::x86_avx2_psllv_d_256:
6015 case Intrinsic::x86_avx512_psllv_d_512:
6016 case Intrinsic::x86_avx2_psllv_q:
6017 case Intrinsic::x86_avx2_psllv_q_256:
6018 case Intrinsic::x86_avx512_psllv_q_512:
6019 case Intrinsic::x86_avx2_psrlv_d:
6020 case Intrinsic::x86_avx2_psrlv_d_256:
6021 case Intrinsic::x86_avx512_psrlv_d_512:
6022 case Intrinsic::x86_avx2_psrlv_q:
6023 case Intrinsic::x86_avx2_psrlv_q_256:
6024 case Intrinsic::x86_avx512_psrlv_q_512:
6025 case Intrinsic::x86_avx2_psrav_d:
6026 case Intrinsic::x86_avx2_psrav_d_256:
6027 case Intrinsic::x86_avx512_psrav_d_512:
6028 case Intrinsic::x86_avx512_psrav_q_128:
6029 case Intrinsic::x86_avx512_psrav_q_256:
6030 case Intrinsic::x86_avx512_psrav_q_512:
6031 handleVectorShiftIntrinsic(I, /* Variable */ true);
6032 break;
6033
6034 // Pack with Signed/Unsigned Saturation
6035 case Intrinsic::x86_sse2_packsswb_128:
6036 case Intrinsic::x86_sse2_packssdw_128:
6037 case Intrinsic::x86_sse2_packuswb_128:
6038 case Intrinsic::x86_sse41_packusdw:
6039 case Intrinsic::x86_avx2_packsswb:
6040 case Intrinsic::x86_avx2_packssdw:
6041 case Intrinsic::x86_avx2_packuswb:
6042 case Intrinsic::x86_avx2_packusdw:
6043 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
6044 // (<32 x i16> %a, <32 x i16> %b)
6045 // <32 x i16> @llvm.x86.avx512.packssdw.512
6046 // (<16 x i32> %a, <16 x i32> %b)
6047 // Note: AVX512 masked variants are auto-upgraded by LLVM.
6048 case Intrinsic::x86_avx512_packsswb_512:
6049 case Intrinsic::x86_avx512_packssdw_512:
6050 case Intrinsic::x86_avx512_packuswb_512:
6051 case Intrinsic::x86_avx512_packusdw_512:
6052 handleVectorPackIntrinsic(I);
6053 break;
6054
6055 case Intrinsic::x86_sse41_pblendvb:
6056 case Intrinsic::x86_sse41_blendvpd:
6057 case Intrinsic::x86_sse41_blendvps:
6058 case Intrinsic::x86_avx_blendv_pd_256:
6059 case Intrinsic::x86_avx_blendv_ps_256:
6060 case Intrinsic::x86_avx2_pblendvb:
6061 handleBlendvIntrinsic(I);
6062 break;
6063
6064 case Intrinsic::x86_avx_dp_ps_256:
6065 case Intrinsic::x86_sse41_dppd:
6066 case Intrinsic::x86_sse41_dpps:
6067 handleDppIntrinsic(I);
6068 break;
6069
6070 case Intrinsic::x86_mmx_packsswb:
6071 case Intrinsic::x86_mmx_packuswb:
6072 handleVectorPackIntrinsic(I, 16);
6073 break;
6074
6075 case Intrinsic::x86_mmx_packssdw:
6076 handleVectorPackIntrinsic(I, 32);
6077 break;
6078
6079 case Intrinsic::x86_mmx_psad_bw:
6080 handleVectorSadIntrinsic(I, true);
6081 break;
6082 case Intrinsic::x86_sse2_psad_bw:
6083 case Intrinsic::x86_avx2_psad_bw:
6084 handleVectorSadIntrinsic(I);
6085 break;
6086
6087 // Multiply and Add Packed Words
6088 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
6089 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
6090 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
6091 //
6092 // Multiply and Add Packed Signed and Unsigned Bytes
6093 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
6094 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
6095 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
6096 //
6097 // These intrinsics are auto-upgraded into non-masked forms:
6098 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
6099 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
6100 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
6101 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
6102 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
6103 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
6104 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
6105 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
6106 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
6107 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
6108 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
6109 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
6110 case Intrinsic::x86_sse2_pmadd_wd:
6111 case Intrinsic::x86_avx2_pmadd_wd:
6112 case Intrinsic::x86_avx512_pmaddw_d_512:
6113 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
6114 case Intrinsic::x86_avx2_pmadd_ub_sw:
6115 case Intrinsic::x86_avx512_pmaddubs_w_512:
6116 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6117 /*ZeroPurifies=*/true,
6118 /*EltSizeInBits=*/0,
6119 /*Lanes=*/kBothLanes);
6120 break;
6121
6122 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
6123 case Intrinsic::x86_ssse3_pmadd_ub_sw:
6124 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6125 /*ZeroPurifies=*/true,
6126 /*EltSizeInBits=*/8,
6127 /*Lanes=*/kBothLanes);
6128 break;
6129
6130 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
6131 case Intrinsic::x86_mmx_pmadd_wd:
6132 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6133 /*ZeroPurifies=*/true,
6134 /*EltSizeInBits=*/16,
6135 /*Lanes=*/kBothLanes);
6136 break;
6137
6138 // BFloat16 multiply-add to single-precision
6139 // <4 x float> llvm.aarch64.neon.bfmlalt
6140 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6141 case Intrinsic::aarch64_neon_bfmlalt:
6142 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6143 /*ZeroPurifies=*/false,
6144 /*EltSizeInBits=*/0,
6145 /*Lanes=*/kOddLanes);
6146 break;
6147
6148 // <4 x float> llvm.aarch64.neon.bfmlalb
6149 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6150 case Intrinsic::aarch64_neon_bfmlalb:
6151 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6152 /*ZeroPurifies=*/false,
6153 /*EltSizeInBits=*/0,
6154 /*Lanes=*/kEvenLanes);
6155 break;
6156
6157 // AVX Vector Neural Network Instructions: bytes
6158 //
6159 // Multiply and Add Signed Bytes
6160 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
6161 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6162 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
6163 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6164 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
6165 // (<16 x i32>, <64 x i8>, <64 x i8>)
6166 //
6167 // Multiply and Add Signed Bytes With Saturation
6168 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
6169 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6170 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
6171 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6172 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
6173 // (<16 x i32>, <64 x i8>, <64 x i8>)
6174 //
6175 // Multiply and Add Signed and Unsigned Bytes
6176 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
6177 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6178 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
6179 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6180 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
6181 // (<16 x i32>, <64 x i8>, <64 x i8>)
6182 //
6183 // Multiply and Add Signed and Unsigned Bytes With Saturation
6184 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
6185 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6186 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
6187 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6188 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
6189 // (<16 x i32>, <64 x i8>, <64 x i8>)
6190 //
6191 // Multiply and Add Unsigned and Signed Bytes
6192 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
6193 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6194 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
6195 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6196 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
6197 // (<16 x i32>, <64 x i8>, <64 x i8>)
6198 //
6199 // Multiply and Add Unsigned and Signed Bytes With Saturation
6200 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
6201 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6202 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
6203 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6204 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
6205 // (<16 x i32>, <64 x i8>, <64 x i8>)
6206 //
6207 // Multiply and Add Unsigned Bytes
6208 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
6209 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6210 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
6211 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6212 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
6213 // (<16 x i32>, <64 x i8>, <64 x i8>)
6214 //
6215 // Multiply and Add Unsigned Bytes With Saturation
6216 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
6217 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6218 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
6219 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6220 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
6221 // (<16 x i32>, <64 x i8>, <64 x i8>)
6222 //
6223 // These intrinsics are auto-upgraded into non-masked forms:
6224 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
6225 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6226 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
6227 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6228 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
6229 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6230 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
6231 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6232 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
6233 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6234 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
6235 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6236 //
6237 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
6238 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6239 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
6240 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6241 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
6242 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6243 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
6244 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6245 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
6246 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6247 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
6248 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6249 case Intrinsic::x86_avx512_vpdpbusd_128:
6250 case Intrinsic::x86_avx512_vpdpbusd_256:
6251 case Intrinsic::x86_avx512_vpdpbusd_512:
6252 case Intrinsic::x86_avx512_vpdpbusds_128:
6253 case Intrinsic::x86_avx512_vpdpbusds_256:
6254 case Intrinsic::x86_avx512_vpdpbusds_512:
6255 case Intrinsic::x86_avx2_vpdpbssd_128:
6256 case Intrinsic::x86_avx2_vpdpbssd_256:
6257 case Intrinsic::x86_avx10_vpdpbssd_512:
6258 case Intrinsic::x86_avx2_vpdpbssds_128:
6259 case Intrinsic::x86_avx2_vpdpbssds_256:
6260 case Intrinsic::x86_avx10_vpdpbssds_512:
6261 case Intrinsic::x86_avx2_vpdpbsud_128:
6262 case Intrinsic::x86_avx2_vpdpbsud_256:
6263 case Intrinsic::x86_avx10_vpdpbsud_512:
6264 case Intrinsic::x86_avx2_vpdpbsuds_128:
6265 case Intrinsic::x86_avx2_vpdpbsuds_256:
6266 case Intrinsic::x86_avx10_vpdpbsuds_512:
6267 case Intrinsic::x86_avx2_vpdpbuud_128:
6268 case Intrinsic::x86_avx2_vpdpbuud_256:
6269 case Intrinsic::x86_avx10_vpdpbuud_512:
6270 case Intrinsic::x86_avx2_vpdpbuuds_128:
6271 case Intrinsic::x86_avx2_vpdpbuuds_256:
6272 case Intrinsic::x86_avx10_vpdpbuuds_512:
6273 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6274 /*ZeroPurifies=*/true,
6275 /*EltSizeInBits=*/0,
6276 /*Lanes=*/kBothLanes);
6277 break;
6278
6279 // AVX Vector Neural Network Instructions: words
6280 //
6281 // Multiply and Add Signed Word Integers
6282 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
6283 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6284 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
6285 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6286 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
6287 // (<16 x i32>, <32 x i16>, <32 x i16>)
6288 //
6289 // Multiply and Add Signed Word Integers With Saturation
6290 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
6291 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6292 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
6293 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6294 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
6295 // (<16 x i32>, <32 x i16>, <32 x i16>)
6296 //
6297 // Multiply and Add Signed and Unsigned Word Integers
6298 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
6299 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6300 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
6301 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6302 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
6303 // (<16 x i32>, <32 x i16>, <32 x i16>)
6304 //
6305 // Multiply and Add Signed and Unsigned Word Integers With Saturation
6306 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
6307 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6308 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
6309 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6310 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
6311 // (<16 x i32>, <32 x i16>, <32 x i16>)
6312 //
6313 // Multiply and Add Unsigned and Signed Word Integers
6314 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
6315 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6316 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
6317 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6318 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
6319 // (<16 x i32>, <32 x i16>, <32 x i16>)
6320 //
6321 // Multiply and Add Unsigned and Signed Word Integers With Saturation
6322 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
6323 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6324 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
6325 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6326 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
6327 // (<16 x i32>, <32 x i16>, <32 x i16>)
6328 //
6329 // Multiply and Add Unsigned and Unsigned Word Integers
6330 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
6331 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6332 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
6333 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6334 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
6335 // (<16 x i32>, <32 x i16>, <32 x i16>)
6336 //
6337 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
6338 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
6339 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6340 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
6341 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6342 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
6343 // (<16 x i32>, <32 x i16>, <32 x i16>)
6344 //
6345 // These intrinsics are auto-upgraded into non-masked forms:
6346 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
6347 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6348 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
6349 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6350 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6351 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6352 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6353 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6354 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6355 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6356 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6357 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6358 //
6359 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6360 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6361 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6362 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6363 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6364 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6365 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6366 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6367 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6368 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6369 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6370 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6371 case Intrinsic::x86_avx512_vpdpwssd_128:
6372 case Intrinsic::x86_avx512_vpdpwssd_256:
6373 case Intrinsic::x86_avx512_vpdpwssd_512:
6374 case Intrinsic::x86_avx512_vpdpwssds_128:
6375 case Intrinsic::x86_avx512_vpdpwssds_256:
6376 case Intrinsic::x86_avx512_vpdpwssds_512:
6377 case Intrinsic::x86_avx2_vpdpwsud_128:
6378 case Intrinsic::x86_avx2_vpdpwsud_256:
6379 case Intrinsic::x86_avx10_vpdpwsud_512:
6380 case Intrinsic::x86_avx2_vpdpwsuds_128:
6381 case Intrinsic::x86_avx2_vpdpwsuds_256:
6382 case Intrinsic::x86_avx10_vpdpwsuds_512:
6383 case Intrinsic::x86_avx2_vpdpwusd_128:
6384 case Intrinsic::x86_avx2_vpdpwusd_256:
6385 case Intrinsic::x86_avx10_vpdpwusd_512:
6386 case Intrinsic::x86_avx2_vpdpwusds_128:
6387 case Intrinsic::x86_avx2_vpdpwusds_256:
6388 case Intrinsic::x86_avx10_vpdpwusds_512:
6389 case Intrinsic::x86_avx2_vpdpwuud_128:
6390 case Intrinsic::x86_avx2_vpdpwuud_256:
6391 case Intrinsic::x86_avx10_vpdpwuud_512:
6392 case Intrinsic::x86_avx2_vpdpwuuds_128:
6393 case Intrinsic::x86_avx2_vpdpwuuds_256:
6394 case Intrinsic::x86_avx10_vpdpwuuds_512:
6395 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6396 /*ZeroPurifies=*/true,
6397 /*EltSizeInBits=*/0,
6398 /*Lanes=*/kBothLanes);
6399 break;
6400
6401 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6402 // Precision
6403 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6404 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6405 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6406 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6407 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6408 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
6409 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6410 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6411 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6412 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6413 /*ZeroPurifies=*/false,
6414 /*EltSizeInBits=*/0,
6415 /*Lanes=*/kBothLanes);
6416 break;
6417
6418 case Intrinsic::x86_sse_cmp_ss:
6419 case Intrinsic::x86_sse2_cmp_sd:
6420 case Intrinsic::x86_sse_comieq_ss:
6421 case Intrinsic::x86_sse_comilt_ss:
6422 case Intrinsic::x86_sse_comile_ss:
6423 case Intrinsic::x86_sse_comigt_ss:
6424 case Intrinsic::x86_sse_comige_ss:
6425 case Intrinsic::x86_sse_comineq_ss:
6426 case Intrinsic::x86_sse_ucomieq_ss:
6427 case Intrinsic::x86_sse_ucomilt_ss:
6428 case Intrinsic::x86_sse_ucomile_ss:
6429 case Intrinsic::x86_sse_ucomigt_ss:
6430 case Intrinsic::x86_sse_ucomige_ss:
6431 case Intrinsic::x86_sse_ucomineq_ss:
6432 case Intrinsic::x86_sse2_comieq_sd:
6433 case Intrinsic::x86_sse2_comilt_sd:
6434 case Intrinsic::x86_sse2_comile_sd:
6435 case Intrinsic::x86_sse2_comigt_sd:
6436 case Intrinsic::x86_sse2_comige_sd:
6437 case Intrinsic::x86_sse2_comineq_sd:
6438 case Intrinsic::x86_sse2_ucomieq_sd:
6439 case Intrinsic::x86_sse2_ucomilt_sd:
6440 case Intrinsic::x86_sse2_ucomile_sd:
6441 case Intrinsic::x86_sse2_ucomigt_sd:
6442 case Intrinsic::x86_sse2_ucomige_sd:
6443 case Intrinsic::x86_sse2_ucomineq_sd:
6444 handleVectorCompareScalarIntrinsic(I);
6445 break;
6446
6447 case Intrinsic::x86_avx_cmp_pd_256:
6448 case Intrinsic::x86_avx_cmp_ps_256:
6449 case Intrinsic::x86_sse2_cmp_pd:
6450 case Intrinsic::x86_sse_cmp_ps:
6451 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/true);
6452 break;
6453
6454 case Intrinsic::x86_bmi_bextr_32:
6455 case Intrinsic::x86_bmi_bextr_64:
6456 case Intrinsic::x86_bmi_bzhi_32:
6457 case Intrinsic::x86_bmi_bzhi_64:
6458 case Intrinsic::x86_bmi_pdep_32:
6459 case Intrinsic::x86_bmi_pdep_64:
6460 case Intrinsic::x86_bmi_pext_32:
6461 case Intrinsic::x86_bmi_pext_64:
6462 handleBmiIntrinsic(I);
6463 break;
6464
6465 case Intrinsic::x86_pclmulqdq:
6466 case Intrinsic::x86_pclmulqdq_256:
6467 case Intrinsic::x86_pclmulqdq_512:
6468 handlePclmulIntrinsic(I);
6469 break;
6470
6471 case Intrinsic::x86_avx_round_pd_256:
6472 case Intrinsic::x86_avx_round_ps_256:
6473 case Intrinsic::x86_sse41_round_pd:
6474 case Intrinsic::x86_sse41_round_ps:
6475 handleRoundPdPsIntrinsic(I);
6476 break;
6477
6478 case Intrinsic::x86_sse41_round_sd:
6479 case Intrinsic::x86_sse41_round_ss:
6480 handleUnarySdSsIntrinsic(I);
6481 break;
6482
6483 case Intrinsic::x86_sse2_max_sd:
6484 case Intrinsic::x86_sse_max_ss:
6485 case Intrinsic::x86_sse2_min_sd:
6486 case Intrinsic::x86_sse_min_ss:
6487 handleBinarySdSsIntrinsic(I);
6488 break;
6489
6490 case Intrinsic::x86_avx_vtestc_pd:
6491 case Intrinsic::x86_avx_vtestc_pd_256:
6492 case Intrinsic::x86_avx_vtestc_ps:
6493 case Intrinsic::x86_avx_vtestc_ps_256:
6494 case Intrinsic::x86_avx_vtestnzc_pd:
6495 case Intrinsic::x86_avx_vtestnzc_pd_256:
6496 case Intrinsic::x86_avx_vtestnzc_ps:
6497 case Intrinsic::x86_avx_vtestnzc_ps_256:
6498 case Intrinsic::x86_avx_vtestz_pd:
6499 case Intrinsic::x86_avx_vtestz_pd_256:
6500 case Intrinsic::x86_avx_vtestz_ps:
6501 case Intrinsic::x86_avx_vtestz_ps_256:
6502 case Intrinsic::x86_avx_ptestc_256:
6503 case Intrinsic::x86_avx_ptestnzc_256:
6504 case Intrinsic::x86_avx_ptestz_256:
6505 case Intrinsic::x86_sse41_ptestc:
6506 case Intrinsic::x86_sse41_ptestnzc:
6507 case Intrinsic::x86_sse41_ptestz:
6508 handleVtestIntrinsic(I);
6509 break;
6510
6511 // Packed Horizontal Add/Subtract
6512 case Intrinsic::x86_ssse3_phadd_w:
6513 case Intrinsic::x86_ssse3_phadd_w_128:
6514 case Intrinsic::x86_ssse3_phsub_w:
6515 case Intrinsic::x86_ssse3_phsub_w_128:
6516 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6517 /*ReinterpretElemWidth=*/16);
6518 break;
6519
6520 case Intrinsic::x86_avx2_phadd_w:
6521 case Intrinsic::x86_avx2_phsub_w:
6522 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6523 /*ReinterpretElemWidth=*/16);
6524 break;
6525
6526 // Packed Horizontal Add/Subtract
6527 case Intrinsic::x86_ssse3_phadd_d:
6528 case Intrinsic::x86_ssse3_phadd_d_128:
6529 case Intrinsic::x86_ssse3_phsub_d:
6530 case Intrinsic::x86_ssse3_phsub_d_128:
6531 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6532 /*ReinterpretElemWidth=*/32);
6533 break;
6534
6535 case Intrinsic::x86_avx2_phadd_d:
6536 case Intrinsic::x86_avx2_phsub_d:
6537 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6538 /*ReinterpretElemWidth=*/32);
6539 break;
6540
6541 // Packed Horizontal Add/Subtract and Saturate
6542 case Intrinsic::x86_ssse3_phadd_sw:
6543 case Intrinsic::x86_ssse3_phadd_sw_128:
6544 case Intrinsic::x86_ssse3_phsub_sw:
6545 case Intrinsic::x86_ssse3_phsub_sw_128:
6546 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6547 /*ReinterpretElemWidth=*/16);
6548 break;
6549
6550 case Intrinsic::x86_avx2_phadd_sw:
6551 case Intrinsic::x86_avx2_phsub_sw:
6552 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6553 /*ReinterpretElemWidth=*/16);
6554 break;
6555
6556 // Packed Single/Double Precision Floating-Point Horizontal Add
6557 case Intrinsic::x86_sse3_hadd_ps:
6558 case Intrinsic::x86_sse3_hadd_pd:
6559 case Intrinsic::x86_sse3_hsub_ps:
6560 case Intrinsic::x86_sse3_hsub_pd:
6561 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6562 break;
6563
6564 case Intrinsic::x86_avx_hadd_pd_256:
6565 case Intrinsic::x86_avx_hadd_ps_256:
6566 case Intrinsic::x86_avx_hsub_pd_256:
6567 case Intrinsic::x86_avx_hsub_ps_256:
6568 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6569 break;
6570
6571 case Intrinsic::x86_avx_maskstore_ps:
6572 case Intrinsic::x86_avx_maskstore_pd:
6573 case Intrinsic::x86_avx_maskstore_ps_256:
6574 case Intrinsic::x86_avx_maskstore_pd_256:
6575 case Intrinsic::x86_avx2_maskstore_d:
6576 case Intrinsic::x86_avx2_maskstore_q:
6577 case Intrinsic::x86_avx2_maskstore_d_256:
6578 case Intrinsic::x86_avx2_maskstore_q_256: {
6579 handleAVXMaskedStore(I);
6580 break;
6581 }
6582
6583 case Intrinsic::x86_avx_maskload_ps:
6584 case Intrinsic::x86_avx_maskload_pd:
6585 case Intrinsic::x86_avx_maskload_ps_256:
6586 case Intrinsic::x86_avx_maskload_pd_256:
6587 case Intrinsic::x86_avx2_maskload_d:
6588 case Intrinsic::x86_avx2_maskload_q:
6589 case Intrinsic::x86_avx2_maskload_d_256:
6590 case Intrinsic::x86_avx2_maskload_q_256: {
6591 handleAVXMaskedLoad(I);
6592 break;
6593 }
6594
6595 // Packed
6596 case Intrinsic::x86_avx512fp16_add_ph_512:
6597 case Intrinsic::x86_avx512fp16_sub_ph_512:
6598 case Intrinsic::x86_avx512fp16_mul_ph_512:
6599 case Intrinsic::x86_avx512fp16_div_ph_512:
6600 case Intrinsic::x86_avx512fp16_max_ph_512:
6601 case Intrinsic::x86_avx512fp16_min_ph_512:
6602 case Intrinsic::x86_avx512_min_ps_512:
6603 case Intrinsic::x86_avx512_min_pd_512:
6604 case Intrinsic::x86_avx512_max_ps_512:
6605 case Intrinsic::x86_avx512_max_pd_512: {
6606 // These AVX512 variants contain the rounding mode as a trailing flag.
6607 // Earlier variants do not have a trailing flag and are already handled
6608 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6609 // maybeHandleUnknownIntrinsic.
6610 [[maybe_unused]] bool Success =
6611 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6612 assert(Success);
6613 break;
6614 }
6615
6616 case Intrinsic::x86_avx_vpermilvar_pd:
6617 case Intrinsic::x86_avx_vpermilvar_pd_256:
6618 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6619 case Intrinsic::x86_avx_vpermilvar_ps:
6620 case Intrinsic::x86_avx_vpermilvar_ps_256:
6621 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6622 handleAVXVpermilvar(I);
6623 break;
6624 }
6625
6626 case Intrinsic::x86_avx512_vpermi2var_d_128:
6627 case Intrinsic::x86_avx512_vpermi2var_d_256:
6628 case Intrinsic::x86_avx512_vpermi2var_d_512:
6629 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6630 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6631 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6632 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6633 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6634 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6635 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6636 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6637 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6638 case Intrinsic::x86_avx512_vpermi2var_q_128:
6639 case Intrinsic::x86_avx512_vpermi2var_q_256:
6640 case Intrinsic::x86_avx512_vpermi2var_q_512:
6641 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6642 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6643 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6644 handleAVXVpermi2var(I);
6645 break;
6646
6647 // Packed Shuffle
6648 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6649 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6650 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6651 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6652 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6653 //
6654 // The following intrinsics are auto-upgraded:
6655 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6656 // llvm.x86.sse2.gpshufh.w(<8 x i16>, i8)
6657 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6658 case Intrinsic::x86_avx2_pshuf_b:
6659 case Intrinsic::x86_sse_pshuf_w:
6660 case Intrinsic::x86_ssse3_pshuf_b_128:
6661 case Intrinsic::x86_ssse3_pshuf_b:
6662 case Intrinsic::x86_avx512_pshuf_b_512:
6663 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6664 /*trailingVerbatimArgs=*/1);
6665 break;
6666
6667 // AVX512 PMOV: Packed MOV, with truncation
6668 // Precisely handled by applying the same intrinsic to the shadow
6669 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6670 case Intrinsic::x86_avx512_mask_pmov_db_512:
6671 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6672 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6673 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6674 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6675 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6676 /*trailingVerbatimArgs=*/1);
6677 break;
6678 }
6679
6680 // AVX512 PMVOV{S,US}: Packed MOV, with signed/unsigned saturation
6681 // Approximately handled using the corresponding truncation intrinsic
6682 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6683 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6684 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6685 handleIntrinsicByApplyingToShadow(I,
6686 Intrinsic::x86_avx512_mask_pmov_dw_512,
6687 /* trailingVerbatimArgs=*/1);
6688 break;
6689 }
6690
6691 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6692 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6693 handleIntrinsicByApplyingToShadow(I,
6694 Intrinsic::x86_avx512_mask_pmov_db_512,
6695 /* trailingVerbatimArgs=*/1);
6696 break;
6697 }
6698
6699 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6700 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6701 handleIntrinsicByApplyingToShadow(I,
6702 Intrinsic::x86_avx512_mask_pmov_qb_512,
6703 /* trailingVerbatimArgs=*/1);
6704 break;
6705 }
6706
6707 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6708 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6709 handleIntrinsicByApplyingToShadow(I,
6710 Intrinsic::x86_avx512_mask_pmov_qw_512,
6711 /* trailingVerbatimArgs=*/1);
6712 break;
6713 }
6714
6715 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6716 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6717 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6718 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6719 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6720 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6721 // slow-path handler.
6722 handleAVX512VectorDownConvert(I);
6723 break;
6724 }
6725
6726 // AVX512/AVX10 Reciprocal
6727 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6728 // (<16 x float>, <16 x float>, i16)
6729 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6730 // (<8 x float>, <8 x float>, i8)
6731 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6732 // (<4 x float>, <4 x float>, i8)
6733 //
6734 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6735 // (<8 x double>, <8 x double>, i8)
6736 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6737 // (<4 x double>, <4 x double>, i8)
6738 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6739 // (<2 x double>, <2 x double>, i8)
6740 //
6741 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6742 // (<32 x bfloat>, <32 x bfloat>, i32)
6743 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6744 // (<16 x bfloat>, <16 x bfloat>, i16)
6745 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6746 // (<8 x bfloat>, <8 x bfloat>, i8)
6747 //
6748 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6749 // (<32 x half>, <32 x half>, i32)
6750 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6751 // (<16 x half>, <16 x half>, i16)
6752 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6753 // (<8 x half>, <8 x half>, i8)
6754 //
6755 // TODO: 3-operand variants are not handled:
6756 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6757 // (<2 x double>, <2 x double>, <2 x double>, i8)
6758 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6759 // (<4 x float>, <4 x float>, <4 x float>, i8)
6760 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6761 // (<8 x half>, <8 x half>, <8 x half>, i8)
6762 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6763 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6764 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6765 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6766 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6767 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6768 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6769 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6770 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6771 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6772 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6773 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6774 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6775 /*WriteThruIndex=*/1,
6776 /*MaskIndex=*/2);
6777 break;
6778
6779 // AVX512/AVX10 Reciprocal Square Root
6780 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6781 // (<16 x float>, <16 x float>, i16)
6782 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6783 // (<8 x float>, <8 x float>, i8)
6784 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6785 // (<4 x float>, <4 x float>, i8)
6786 //
6787 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6788 // (<8 x double>, <8 x double>, i8)
6789 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6790 // (<4 x double>, <4 x double>, i8)
6791 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6792 // (<2 x double>, <2 x double>, i8)
6793 //
6794 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6795 // (<32 x bfloat>, <32 x bfloat>, i32)
6796 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6797 // (<16 x bfloat>, <16 x bfloat>, i16)
6798 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6799 // (<8 x bfloat>, <8 x bfloat>, i8)
6800 //
6801 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6802 // (<32 x half>, <32 x half>, i32)
6803 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6804 // (<16 x half>, <16 x half>, i16)
6805 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6806 // (<8 x half>, <8 x half>, i8)
6807 //
6808 // TODO: 3-operand variants are not handled:
6809 // <2 x double> @llvm.x86.avx512.rcp14.sd
6810 // (<2 x double>, <2 x double>, <2 x double>, i8)
6811 // <4 x float> @llvm.x86.avx512.rcp14.ss
6812 // (<4 x float>, <4 x float>, <4 x float>, i8)
6813 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6814 // (<8 x half>, <8 x half>, <8 x half>, i8)
6815 case Intrinsic::x86_avx512_rcp14_ps_512:
6816 case Intrinsic::x86_avx512_rcp14_ps_256:
6817 case Intrinsic::x86_avx512_rcp14_ps_128:
6818 case Intrinsic::x86_avx512_rcp14_pd_512:
6819 case Intrinsic::x86_avx512_rcp14_pd_256:
6820 case Intrinsic::x86_avx512_rcp14_pd_128:
6821 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6822 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6823 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6824 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6825 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6826 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6827 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6828 /*WriteThruIndex=*/1,
6829 /*MaskIndex=*/2);
6830 break;
6831
6832 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6833 // (<32 x half>, i32, <32 x half>, i32, i32)
6834 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6835 // (<16 x half>, i32, <16 x half>, i32, i16)
6836 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6837 // (<8 x half>, i32, <8 x half>, i32, i8)
6838 //
6839 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6840 // (<16 x float>, i32, <16 x float>, i16, i32)
6841 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6842 // (<8 x float>, i32, <8 x float>, i8)
6843 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6844 // (<4 x float>, i32, <4 x float>, i8)
6845 //
6846 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6847 // (<8 x double>, i32, <8 x double>, i8, i32)
6848 // A Imm WriteThru Mask Rounding
6849 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6850 // (<4 x double>, i32, <4 x double>, i8)
6851 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6852 // (<2 x double>, i32, <2 x double>, i8)
6853 // A Imm WriteThru Mask
6854 //
6855 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6856 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6857 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6858 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6859 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6860 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6861 //
6862 // Not supported: three vectors
6863 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6864 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6865 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6866 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6867 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6868 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6869 // i32)
6870 // A B WriteThru Mask Imm
6871 // Rounding
6872 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6873 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6874 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6875 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6876 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6877 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6878 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6879 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6880 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6881 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6882 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6883 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6884 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6885 /*WriteThruIndex=*/2,
6886 /*MaskIndex=*/3);
6887 break;
6888
6889 // AVX512 FP16 Arithmetic
6890 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6891 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6892 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6893 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6894 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6895 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6896 visitGenericScalarHalfwordInst(I);
6897 break;
6898 }
6899
6900 // AVX Galois Field New Instructions
6901 case Intrinsic::x86_vgf2p8affineqb_128:
6902 case Intrinsic::x86_vgf2p8affineqb_256:
6903 case Intrinsic::x86_vgf2p8affineqb_512:
6904 handleAVXGF2P8Affine(I);
6905 break;
6906
6907 default:
6908 return false;
6909 }
6910
6911 return true;
6912 }
6913
6914 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6915 switch (I.getIntrinsicID()) {
6916 // Two operands e.g.,
6917 // - <8 x i8> @llvm.aarch64.neon.rshrn.v8i8 (<8 x i16>, i32)
6918 // - <4 x i16> @llvm.aarch64.neon.uqrshl.v4i16(<4 x i16>, <4 x i16>)
6919 case Intrinsic::aarch64_neon_rshrn:
6920 case Intrinsic::aarch64_neon_sqrshl:
6921 case Intrinsic::aarch64_neon_sqrshrn:
6922 case Intrinsic::aarch64_neon_sqrshrun:
6923 case Intrinsic::aarch64_neon_sqshl:
6924 case Intrinsic::aarch64_neon_sqshlu:
6925 case Intrinsic::aarch64_neon_sqshrn:
6926 case Intrinsic::aarch64_neon_sqshrun:
6927 case Intrinsic::aarch64_neon_srshl:
6928 case Intrinsic::aarch64_neon_sshl:
6929 case Intrinsic::aarch64_neon_uqrshl:
6930 case Intrinsic::aarch64_neon_uqrshrn:
6931 case Intrinsic::aarch64_neon_uqshl:
6932 case Intrinsic::aarch64_neon_uqshrn:
6933 case Intrinsic::aarch64_neon_urshl:
6934 case Intrinsic::aarch64_neon_ushl:
6935 handleVectorShiftIntrinsic(I, /* Variable */ false);
6936 break;
6937
6938 // Vector Shift Left/Right and Insert
6939 //
6940 // Three operands e.g.,
6941 // - <4 x i16> @llvm.aarch64.neon.vsli.v4i16
6942 // (<4 x i16> %a, <4 x i16> %b, i32 %n)
6943 // - <16 x i8> @llvm.aarch64.neon.vsri.v16i8
6944 // (<16 x i8> %a, <16 x i8> %b, i32 %n)
6945 //
6946 // %b is shifted by %n bits, and the "missing" bits are filled in with %a
6947 // (instead of zero-extending/sign-extending).
6948 case Intrinsic::aarch64_neon_vsli:
6949 case Intrinsic::aarch64_neon_vsri:
6950 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6951 /*trailingVerbatimArgs=*/1);
6952 break;
6953
6954 // TODO: handling max/min similarly to AND/OR may be more precise
6955 // Floating-Point Maximum/Minimum Pairwise
6956 case Intrinsic::aarch64_neon_fmaxp:
6957 case Intrinsic::aarch64_neon_fminp:
6958 // Floating-Point Maximum/Minimum Number Pairwise
6959 case Intrinsic::aarch64_neon_fmaxnmp:
6960 case Intrinsic::aarch64_neon_fminnmp:
6961 // Signed/Unsigned Maximum/Minimum Pairwise
6962 case Intrinsic::aarch64_neon_smaxp:
6963 case Intrinsic::aarch64_neon_sminp:
6964 case Intrinsic::aarch64_neon_umaxp:
6965 case Intrinsic::aarch64_neon_uminp:
6966 // Add Pairwise
6967 case Intrinsic::aarch64_neon_addp:
6968 // Floating-point Add Pairwise
6969 case Intrinsic::aarch64_neon_faddp:
6970 // Add Long Pairwise
6971 case Intrinsic::aarch64_neon_saddlp:
6972 case Intrinsic::aarch64_neon_uaddlp: {
6973 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6974 break;
6975 }
6976
6977 // Floating-point Convert to integer, rounding to nearest with ties to Away
6978 case Intrinsic::aarch64_neon_fcvtas:
6979 case Intrinsic::aarch64_neon_fcvtau:
6980 // Floating-point convert to integer, rounding toward minus infinity
6981 case Intrinsic::aarch64_neon_fcvtms:
6982 case Intrinsic::aarch64_neon_fcvtmu:
6983 // Floating-point convert to integer, rounding to nearest with ties to even
6984 case Intrinsic::aarch64_neon_fcvtns:
6985 case Intrinsic::aarch64_neon_fcvtnu:
6986 // Floating-point convert to integer, rounding toward plus infinity
6987 case Intrinsic::aarch64_neon_fcvtps:
6988 case Intrinsic::aarch64_neon_fcvtpu:
6989 // Floating-point Convert to integer, rounding toward Zero
6990 case Intrinsic::aarch64_neon_fcvtzs:
6991 case Intrinsic::aarch64_neon_fcvtzu:
6992 // Floating-point convert to lower precision narrow, rounding to odd
6993 case Intrinsic::aarch64_neon_fcvtxn:
6994 // Vector Conversions Between Half-Precision and Single-Precision
6995 case Intrinsic::aarch64_neon_vcvthf2fp:
6996 case Intrinsic::aarch64_neon_vcvtfp2hf:
6997 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/false);
6998 break;
6999
7000 // Vector Conversions Between Fixed-Point and Floating-Point
7001 case Intrinsic::aarch64_neon_vcvtfxs2fp:
7002 case Intrinsic::aarch64_neon_vcvtfp2fxs:
7003 case Intrinsic::aarch64_neon_vcvtfxu2fp:
7004 case Intrinsic::aarch64_neon_vcvtfp2fxu:
7005 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/true);
7006 break;
7007
7008 // TODO: bfloat conversions
7009 // - bfloat @llvm.aarch64.neon.bfcvt(float)
7010 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn(<4 x float>)
7011 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn2(<8 x bfloat>, <4 x float>)
7012
7013 // Add reduction to scalar
7014 case Intrinsic::aarch64_neon_faddv:
7015 case Intrinsic::aarch64_neon_saddv:
7016 case Intrinsic::aarch64_neon_uaddv:
7017 // Signed/Unsigned min/max (Vector)
7018 // TODO: handling similarly to AND/OR may be more precise.
7019 case Intrinsic::aarch64_neon_smaxv:
7020 case Intrinsic::aarch64_neon_sminv:
7021 case Intrinsic::aarch64_neon_umaxv:
7022 case Intrinsic::aarch64_neon_uminv:
7023 // Floating-point min/max (vector)
7024 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
7025 // but our shadow propagation is the same.
7026 case Intrinsic::aarch64_neon_fmaxv:
7027 case Intrinsic::aarch64_neon_fminv:
7028 case Intrinsic::aarch64_neon_fmaxnmv:
7029 case Intrinsic::aarch64_neon_fminnmv:
7030 // Sum long across vector
7031 case Intrinsic::aarch64_neon_saddlv:
7032 case Intrinsic::aarch64_neon_uaddlv:
7033 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
7034 break;
7035
7036 case Intrinsic::aarch64_neon_ld1x2:
7037 case Intrinsic::aarch64_neon_ld1x3:
7038 case Intrinsic::aarch64_neon_ld1x4:
7039 case Intrinsic::aarch64_neon_ld2:
7040 case Intrinsic::aarch64_neon_ld3:
7041 case Intrinsic::aarch64_neon_ld4:
7042 case Intrinsic::aarch64_neon_ld2r:
7043 case Intrinsic::aarch64_neon_ld3r:
7044 case Intrinsic::aarch64_neon_ld4r: {
7045 handleNEONVectorLoad(I, /*WithLane=*/false);
7046 break;
7047 }
7048
7049 case Intrinsic::aarch64_neon_ld2lane:
7050 case Intrinsic::aarch64_neon_ld3lane:
7051 case Intrinsic::aarch64_neon_ld4lane: {
7052 handleNEONVectorLoad(I, /*WithLane=*/true);
7053 break;
7054 }
7055
7056 // Saturating extract narrow
7057 case Intrinsic::aarch64_neon_sqxtn:
7058 case Intrinsic::aarch64_neon_sqxtun:
7059 case Intrinsic::aarch64_neon_uqxtn:
7060 // These only have one argument, but we (ab)use handleShadowOr because it
7061 // does work on single argument intrinsics and will typecast the shadow
7062 // (and update the origin).
7063 handleShadowOr(I);
7064 break;
7065
7066 case Intrinsic::aarch64_neon_st1x2:
7067 case Intrinsic::aarch64_neon_st1x3:
7068 case Intrinsic::aarch64_neon_st1x4:
7069 case Intrinsic::aarch64_neon_st2:
7070 case Intrinsic::aarch64_neon_st3:
7071 case Intrinsic::aarch64_neon_st4: {
7072 handleNEONVectorStoreIntrinsic(I, false);
7073 break;
7074 }
7075
7076 case Intrinsic::aarch64_neon_st2lane:
7077 case Intrinsic::aarch64_neon_st3lane:
7078 case Intrinsic::aarch64_neon_st4lane: {
7079 handleNEONVectorStoreIntrinsic(I, true);
7080 break;
7081 }
7082
7083 // Arm NEON vector table intrinsics have the source/table register(s) as
7084 // arguments, followed by the index register. They return the output.
7085 //
7086 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
7087 // original value unchanged in the destination register.'
7088 // Conveniently, zero denotes a clean shadow, which means out-of-range
7089 // indices for TBL will initialize the user data with zero and also clean
7090 // the shadow. (For TBX, neither the user data nor the shadow will be
7091 // updated, which is also correct.)
7092 case Intrinsic::aarch64_neon_tbl1:
7093 case Intrinsic::aarch64_neon_tbl2:
7094 case Intrinsic::aarch64_neon_tbl3:
7095 case Intrinsic::aarch64_neon_tbl4:
7096 case Intrinsic::aarch64_neon_tbx1:
7097 case Intrinsic::aarch64_neon_tbx2:
7098 case Intrinsic::aarch64_neon_tbx3:
7099 case Intrinsic::aarch64_neon_tbx4: {
7100 // The last trailing argument (index register) should be handled verbatim
7101 handleIntrinsicByApplyingToShadow(
7102 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
7103 /*trailingVerbatimArgs*/ 1);
7104 break;
7105 }
7106
7107 case Intrinsic::aarch64_neon_fmulx:
7108 case Intrinsic::aarch64_neon_pmul:
7109 case Intrinsic::aarch64_neon_pmull:
7110 case Intrinsic::aarch64_neon_smull:
7111 case Intrinsic::aarch64_neon_pmull64:
7112 case Intrinsic::aarch64_neon_umull: {
7113 handleNEONVectorMultiplyIntrinsic(I);
7114 break;
7115 }
7116
7117 case Intrinsic::aarch64_neon_smmla:
7118 case Intrinsic::aarch64_neon_ummla:
7119 case Intrinsic::aarch64_neon_usmmla:
7120 case Intrinsic::aarch64_neon_bfmmla:
7121 handleNEONMatrixMultiply(I);
7122 break;
7123
7124 // <2 x i32> @llvm.aarch64.neon.{u,s,us}dot.v2i32.v8i8
7125 // (<2 x i32> %acc, <8 x i8> %a, <8 x i8> %b)
7126 // <4 x i32> @llvm.aarch64.neon.{u,s,us}dot.v4i32.v16i8
7127 // (<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b)
7128 case Intrinsic::aarch64_neon_sdot:
7129 case Intrinsic::aarch64_neon_udot:
7130 case Intrinsic::aarch64_neon_usdot:
7131 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
7132 /*ZeroPurifies=*/true,
7133 /*EltSizeInBits=*/0,
7134 /*Lanes=*/kBothLanes);
7135 break;
7136
7137 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
7138 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
7139 // <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16
7140 // (<4 x float> %acc, <8 x bfloat> %a, <8 x bfloat> %b)
7141 case Intrinsic::aarch64_neon_bfdot:
7142 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
7143 /*ZeroPurifies=*/false,
7144 /*EltSizeInBits=*/0,
7145 /*Lanes=*/kBothLanes);
7146 break;
7147
7148 // Floating-Point Absolute Compare Greater Than/Equal
7149 case Intrinsic::aarch64_neon_facge:
7150 case Intrinsic::aarch64_neon_facgt:
7151 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/false);
7152 break;
7153
7154 default:
7155 return false;
7156 }
7157
7158 return true;
7159 }
7160
7161 void visitIntrinsicInst(IntrinsicInst &I) {
7162 if (maybeHandleCrossPlatformIntrinsic(I))
7163 return;
7164
7165 if (maybeHandleX86SIMDIntrinsic(I))
7166 return;
7167
7168 if (maybeHandleArmSIMDIntrinsic(I))
7169 return;
7170
7171 if (maybeHandleUnknownIntrinsic(I))
7172 return;
7173
7174 visitInstruction(I);
7175 }
7176
7177 void visitLibAtomicLoad(CallBase &CB) {
7178 // Since we use getNextNode here, we can't have CB terminate the BB.
7179 assert(isa<CallInst>(CB));
7180
7181 IRBuilder<> IRB(&CB);
7182 Value *Size = CB.getArgOperand(0);
7183 Value *SrcPtr = CB.getArgOperand(1);
7184 Value *DstPtr = CB.getArgOperand(2);
7185 Value *Ordering = CB.getArgOperand(3);
7186 // Convert the call to have at least Acquire ordering to make sure
7187 // the shadow operations aren't reordered before it.
7188 Value *NewOrdering =
7189 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
7190 CB.setArgOperand(3, NewOrdering);
7191
7192 NextNodeIRBuilder NextIRB(&CB);
7193 Value *SrcShadowPtr, *SrcOriginPtr;
7194 std::tie(SrcShadowPtr, SrcOriginPtr) =
7195 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7196 /*isStore*/ false);
7197 Value *DstShadowPtr =
7198 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7199 /*isStore*/ true)
7200 .first;
7201
7202 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
7203 if (MS.TrackOrigins) {
7204 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
7206 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
7207 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
7208 }
7209 }
7210
7211 void visitLibAtomicStore(CallBase &CB) {
7212 IRBuilder<> IRB(&CB);
7213 Value *Size = CB.getArgOperand(0);
7214 Value *DstPtr = CB.getArgOperand(2);
7215 Value *Ordering = CB.getArgOperand(3);
7216 // Convert the call to have at least Release ordering to make sure
7217 // the shadow operations aren't reordered after it.
7218 Value *NewOrdering =
7219 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
7220 CB.setArgOperand(3, NewOrdering);
7221
7222 Value *DstShadowPtr =
7223 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
7224 /*isStore*/ true)
7225 .first;
7226
7227 // Atomic store always paints clean shadow/origin. See file header.
7228 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
7229 Align(1));
7230 }
7231
7232 void visitCallBase(CallBase &CB) {
7233 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
7234 if (CB.isInlineAsm()) {
7235 // For inline asm (either a call to asm function, or callbr instruction),
7236 // do the usual thing: check argument shadow and mark all outputs as
7237 // clean. Note that any side effects of the inline asm that are not
7238 // immediately visible in its constraints are not handled.
7240 visitAsmInstruction(CB);
7241 else
7242 visitInstruction(CB);
7243 return;
7244 }
7245 LibFunc LF;
7246 if (TLI->getLibFunc(CB, LF)) {
7247 // libatomic.a functions need to have special handling because there isn't
7248 // a good way to intercept them or compile the library with
7249 // instrumentation.
7250 switch (LF) {
7251 case LibFunc_atomic_load:
7252 if (!isa<CallInst>(CB)) {
7253 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load."
7254 "Ignoring!\n";
7255 break;
7256 }
7257 visitLibAtomicLoad(CB);
7258 return;
7259 case LibFunc_atomic_store:
7260 visitLibAtomicStore(CB);
7261 return;
7262 default:
7263 break;
7264 }
7265 }
7266
7267 if (auto *Call = dyn_cast<CallInst>(&CB)) {
7268 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
7269
7270 // We are going to insert code that relies on the fact that the callee
7271 // will become a non-readonly function after it is instrumented by us. To
7272 // prevent this code from being optimized out, mark that function
7273 // non-readonly in advance.
7274 // TODO: We can likely do better than dropping memory() completely here.
7275 AttributeMask B;
7276 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
7277
7279 if (Function *Func = Call->getCalledFunction()) {
7280 Func->removeFnAttrs(B);
7281 }
7282
7284 }
7285 IRBuilder<> IRB(&CB);
7286 bool MayCheckCall = MS.EagerChecks;
7287 if (Function *Func = CB.getCalledFunction()) {
7288 // __sanitizer_unaligned_{load,store} functions may be called by users
7289 // and always expects shadows in the TLS. So don't check them.
7290 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
7291 }
7292
7293 unsigned ArgOffset = 0;
7294 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
7295 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
7296 if (!A->getType()->isSized()) {
7297 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
7298 continue;
7299 }
7300
7301 if (A->getType()->isScalableTy()) {
7302 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
7303 // Handle as noundef, but don't reserve tls slots.
7304 insertCheckShadowOf(A, &CB);
7305 continue;
7306 }
7307
7308 unsigned Size = 0;
7309 const DataLayout &DL = F.getDataLayout();
7310
7311 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
7312 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
7313 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
7314
7315 if (EagerCheck) {
7316 insertCheckShadowOf(A, &CB);
7317 Size = DL.getTypeAllocSize(A->getType());
7318 } else {
7319 [[maybe_unused]] Value *Store = nullptr;
7320 // Compute the Shadow for arg even if it is ByVal, because
7321 // in that case getShadow() will copy the actual arg shadow to
7322 // __msan_param_tls.
7323 Value *ArgShadow = getShadow(A);
7324 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
7325 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
7326 << " Shadow: " << *ArgShadow << "\n");
7327 if (ByVal) {
7328 // ByVal requires some special handling as it's too big for a single
7329 // load
7330 assert(A->getType()->isPointerTy() &&
7331 "ByVal argument is not a pointer!");
7332 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
7333 if (ArgOffset + Size > kParamTLSSize)
7334 break;
7335 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
7336 MaybeAlign Alignment = std::nullopt;
7337 if (ParamAlignment)
7338 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
7339 Value *AShadowPtr, *AOriginPtr;
7340 std::tie(AShadowPtr, AOriginPtr) =
7341 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
7342 /*isStore*/ false);
7343 if (!PropagateShadow) {
7344 Store = IRB.CreateMemSet(ArgShadowBase,
7346 Size, Alignment);
7347 } else {
7348 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
7349 Alignment, Size);
7350 if (MS.TrackOrigins) {
7351 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
7352 // FIXME: OriginSize should be:
7353 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
7354 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
7355 IRB.CreateMemCpy(
7356 ArgOriginBase,
7357 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
7358 AOriginPtr,
7359 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
7360 }
7361 }
7362 } else {
7363 // Any other parameters mean we need bit-grained tracking of uninit
7364 // data
7365 Size = DL.getTypeAllocSize(A->getType());
7366 if (ArgOffset + Size > kParamTLSSize)
7367 break;
7368 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
7370 Constant *Cst = dyn_cast<Constant>(ArgShadow);
7371 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
7372 IRB.CreateStore(getOrigin(A),
7373 getOriginPtrForArgument(IRB, ArgOffset));
7374 }
7375 }
7376 assert(Store != nullptr);
7377 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
7378 }
7379 assert(Size != 0);
7380 ArgOffset += alignTo(Size, kShadowTLSAlignment);
7381 }
7382 LLVM_DEBUG(dbgs() << " done with call args\n");
7383
7384 FunctionType *FT = CB.getFunctionType();
7385 if (FT->isVarArg()) {
7386 VAHelper->visitCallBase(CB, IRB);
7387 }
7388
7389 // Now, get the shadow for the RetVal.
7390 if (!CB.getType()->isSized())
7391 return;
7392 // Don't emit the epilogue for musttail call returns.
7393 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
7394 return;
7395
7396 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
7397 setShadow(&CB, getCleanShadow(&CB));
7398 setOrigin(&CB, getCleanOrigin());
7399 return;
7400 }
7401
7402 IRBuilder<> IRBBefore(&CB);
7403 // Until we have full dynamic coverage, make sure the retval shadow is 0.
7404 Value *Base = getShadowPtrForRetval(IRBBefore);
7405 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
7407 BasicBlock::iterator NextInsn;
7408 if (isa<CallInst>(CB)) {
7409 NextInsn = ++CB.getIterator();
7410 assert(NextInsn != CB.getParent()->end());
7411 } else {
7412 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
7413 if (!NormalDest->getSinglePredecessor()) {
7414 // FIXME: this case is tricky, so we are just conservative here.
7415 // Perhaps we need to split the edge between this BB and NormalDest,
7416 // but a naive attempt to use SplitEdge leads to a crash.
7417 setShadow(&CB, getCleanShadow(&CB));
7418 setOrigin(&CB, getCleanOrigin());
7419 return;
7420 }
7421 // FIXME: NextInsn is likely in a basic block that has not been visited
7422 // yet. Anything inserted there will be instrumented by MSan later!
7423 NextInsn = NormalDest->getFirstInsertionPt();
7424 assert(NextInsn != NormalDest->end() &&
7425 "Could not find insertion point for retval shadow load");
7426 }
7427 IRBuilder<> IRBAfter(&*NextInsn);
7428 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7429 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
7430 "_msret");
7431 setShadow(&CB, RetvalShadow);
7432 if (MS.TrackOrigins)
7433 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
7434 }
7435
7436 bool isAMustTailRetVal(Value *RetVal) {
7437 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
7438 RetVal = I->getOperand(0);
7439 }
7440 if (auto *I = dyn_cast<CallInst>(RetVal)) {
7441 return I->isMustTailCall();
7442 }
7443 return false;
7444 }
7445
7446 void visitReturnInst(ReturnInst &I) {
7447 IRBuilder<> IRB(&I);
7448 Value *RetVal = I.getReturnValue();
7449 if (!RetVal)
7450 return;
7451 // Don't emit the epilogue for musttail call returns.
7452 if (isAMustTailRetVal(RetVal))
7453 return;
7454 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7455 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
7456 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7457 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7458 // must always return fully initialized values. For now, we hardcode "main".
7459 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7460
7461 Value *Shadow = getShadow(RetVal);
7462 bool StoreOrigin = true;
7463 if (EagerCheck) {
7464 insertCheckShadowOf(RetVal, &I);
7465 Shadow = getCleanShadow(RetVal);
7466 StoreOrigin = false;
7467 }
7468
7469 // The caller may still expect information passed over TLS if we pass our
7470 // check
7471 if (StoreShadow) {
7472 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
7473 if (MS.TrackOrigins && StoreOrigin)
7474 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
7475 }
7476 }
7477
7478 void visitPHINode(PHINode &I) {
7479 IRBuilder<> IRB(&I);
7480 if (!PropagateShadow) {
7481 setShadow(&I, getCleanShadow(&I));
7482 setOrigin(&I, getCleanOrigin());
7483 return;
7484 }
7485
7486 ShadowPHINodes.push_back(&I);
7487 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
7488 "_msphi_s"));
7489 if (MS.TrackOrigins)
7490 setOrigin(
7491 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
7492 }
7493
7494 Value *getLocalVarIdptr(AllocaInst &I) {
7495 ConstantInt *IntConst =
7496 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
7497 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7498 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7499 IntConst);
7500 }
7501
7502 Value *getLocalVarDescription(AllocaInst &I) {
7503 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
7504 }
7505
7506 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7507 if (PoisonStack && ClPoisonStackWithCall) {
7508 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
7509 } else {
7510 Value *ShadowBase, *OriginBase;
7511 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
7512 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
7513
7514 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
7515 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
7516 }
7517
7518 if (PoisonStack && MS.TrackOrigins) {
7519 Value *Idptr = getLocalVarIdptr(I);
7520 if (ClPrintStackNames) {
7521 Value *Descr = getLocalVarDescription(I);
7522 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
7523 {&I, Len, Idptr, Descr});
7524 } else {
7525 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
7526 }
7527 }
7528 }
7529
7530 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7531 Value *Descr = getLocalVarDescription(I);
7532 if (PoisonStack) {
7533 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
7534 } else {
7535 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
7536 }
7537 }
7538
7539 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7540 if (!InsPoint)
7541 InsPoint = &I;
7542 NextNodeIRBuilder IRB(InsPoint);
7543 Value *Len = IRB.CreateAllocationSize(MS.IntptrTy, &I);
7544
7545 if (MS.CompileKernel)
7546 poisonAllocaKmsan(I, IRB, Len);
7547 else
7548 poisonAllocaUserspace(I, IRB, Len);
7549 }
7550
7551 void visitAllocaInst(AllocaInst &I) {
7552 setShadow(&I, getCleanShadow(&I));
7553 setOrigin(&I, getCleanOrigin());
7554 // We'll get to this alloca later unless it's poisoned at the corresponding
7555 // llvm.lifetime.start.
7556 AllocaSet.insert(&I);
7557 }
7558
7559 void visitSelectInst(SelectInst &I) {
7560 // a = select b, c, d
7561 Value *B = I.getCondition();
7562 Value *C = I.getTrueValue();
7563 Value *D = I.getFalseValue();
7564
7565 handleSelectLikeInst(I, B, C, D);
7566 }
7567
7568 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7569 IRBuilder<> IRB(&I);
7570
7571 Value *Sb = getShadow(B);
7572 Value *Sc = getShadow(C);
7573 Value *Sd = getShadow(D);
7574
7575 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
7576 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
7577 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
7578
7579 // Result shadow if condition shadow is 0.
7580 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
7581 Value *Sa1;
7582 if (I.getType()->isAggregateType()) {
7583 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7584 // an extra "select". This results in much more compact IR.
7585 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7586 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7587 } else if (isScalableNonVectorType(I.getType())) {
7588 // This is intended to handle target("aarch64.svcount"), which can't be
7589 // handled in the else branch because of incompatibility with CreateXor
7590 // ("The supported LLVM operations on this type are limited to load,
7591 // store, phi, select and alloca instructions").
7592
7593 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7594 // branch as needed instead.
7595 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7596 } else {
7597 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7598 // If Sb (condition is poisoned), look for bits in c and d that are equal
7599 // and both unpoisoned.
7600 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
7601
7602 // Cast arguments to shadow-compatible type.
7603 C = CreateAppToShadowCast(IRB, C);
7604 D = CreateAppToShadowCast(IRB, D);
7605
7606 // Result shadow if condition shadow is 1.
7607 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7608 }
7609 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7610 setShadow(&I, Sa);
7611 if (MS.TrackOrigins) {
7612 // Origins are always i32, so any vector conditions must be flattened.
7613 // FIXME: consider tracking vector origins for app vectors?
7614 if (B->getType()->isVectorTy()) {
7615 B = convertToBool(B, IRB);
7616 Sb = convertToBool(Sb, IRB);
7617 }
7618 // a = select b, c, d
7619 // Oa = Sb ? Ob : (b ? Oc : Od)
7620 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7621 }
7622 }
7623
7624 void visitLandingPadInst(LandingPadInst &I) {
7625 // Do nothing.
7626 // See https://github.com/google/sanitizers/issues/504
7627 setShadow(&I, getCleanShadow(&I));
7628 setOrigin(&I, getCleanOrigin());
7629 }
7630
7631 void visitCatchSwitchInst(CatchSwitchInst &I) {
7632 setShadow(&I, getCleanShadow(&I));
7633 setOrigin(&I, getCleanOrigin());
7634 }
7635
7636 void visitFuncletPadInst(FuncletPadInst &I) {
7637 setShadow(&I, getCleanShadow(&I));
7638 setOrigin(&I, getCleanOrigin());
7639 }
7640
7641 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7642
7643 void visitExtractValueInst(ExtractValueInst &I) {
7644 IRBuilder<> IRB(&I);
7645 Value *Agg = I.getAggregateOperand();
7646 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7647 Value *AggShadow = getShadow(Agg);
7648 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7649 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7650 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7651 setShadow(&I, ResShadow);
7652 setOriginForNaryOp(I);
7653 }
7654
7655 void visitInsertValueInst(InsertValueInst &I) {
7656 IRBuilder<> IRB(&I);
7657 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7658 Value *AggShadow = getShadow(I.getAggregateOperand());
7659 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7660 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7661 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7662 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7663 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7664 setShadow(&I, Res);
7665 setOriginForNaryOp(I);
7666 }
7667
7668 void dumpInst(Instruction &I) {
7669 // Instruction name only
7670 // For intrinsics, the full/overloaded name is used
7671 //
7672 // e.g., "call llvm.aarch64.neon.uqsub.v16i8"
7673 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7674 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7675 } else {
7676 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7677 }
7678
7679 // Instruction prototype (including return type and parameter types)
7680 // For intrinsics, we use the base/non-overloaded name
7681 //
7682 // e.g., "call <16 x i8> @llvm.aarch64.neon.uqsub(<16 x i8>, <16 x i8>)"
7683 unsigned NumOperands = I.getNumOperands();
7684 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7685 errs() << "YYY call " << *I.getType() << " @";
7686
7687 if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI))
7688 errs() << Intrinsic::getBaseName(II->getIntrinsicID());
7689 else
7690 errs() << CI->getCalledFunction()->getName();
7691
7692 errs() << "(";
7693
7694 // The last operand of a CallInst is the function itself.
7695 NumOperands--;
7696 } else
7697 errs() << "YYY " << *I.getType() << " " << I.getOpcodeName() << "(";
7698
7699 for (size_t i = 0; i < NumOperands; i++) {
7700 if (i > 0)
7701 errs() << ", ";
7702
7703 errs() << *(I.getOperand(i)->getType());
7704 }
7705
7706 errs() << ")\n";
7707
7708 // Full instruction, including types and operand values
7709 // For intrinsics, the full/overloaded name is used
7710 //
7711 // e.g., "%vqsubq_v.i15 = call noundef <16 x i8>
7712 // @llvm.aarch64.neon.uqsub.v16i8(<16 x i8> %vext21.i,
7713 // <16 x i8> splat (i8 1)), !dbg !66"
7714 errs() << "QQQ " << I << "\n";
7715 }
7716
7717 void visitResumeInst(ResumeInst &I) {
7718 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7719 // Nothing to do here.
7720 }
7721
7722 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7723 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7724 // Nothing to do here.
7725 }
7726
7727 void visitCatchReturnInst(CatchReturnInst &CRI) {
7728 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7729 // Nothing to do here.
7730 }
7731
7732 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7733 IRBuilder<> &IRB, const DataLayout &DL,
7734 bool isOutput) {
7735 // For each assembly argument, we check its value for being initialized.
7736 // If the argument is a pointer, we assume it points to a single element
7737 // of the corresponding type (or to a 8-byte word, if the type is unsized).
7738 // Each such pointer is instrumented with a call to the runtime library.
7739 Type *OpType = Operand->getType();
7740 // Check the operand value itself.
7741 insertCheckShadowOf(Operand, &I);
7742 if (!OpType->isPointerTy() || !isOutput) {
7743 assert(!isOutput);
7744 return;
7745 }
7746 if (!ElemTy->isSized())
7747 return;
7748 auto Size = DL.getTypeStoreSize(ElemTy);
7749 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7750 if (MS.CompileKernel) {
7751 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7752 } else {
7753 // ElemTy, derived from elementtype(), does not encode the alignment of
7754 // the pointer. Conservatively assume that the shadow memory is unaligned.
7755 // When Size is large, avoid StoreInst as it would expand to many
7756 // instructions.
7757 auto [ShadowPtr, _] =
7758 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7759 if (Size <= 32)
7760 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7761 else
7762 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7763 SizeVal, Align(1));
7764 }
7765 }
7766
7767 /// Get the number of output arguments returned by pointers.
7768 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7769 int NumRetOutputs = 0;
7770 int NumOutputs = 0;
7771 Type *RetTy = cast<Value>(CB)->getType();
7772 if (!RetTy->isVoidTy()) {
7773 // Register outputs are returned via the CallInst return value.
7774 auto *ST = dyn_cast<StructType>(RetTy);
7775 if (ST)
7776 NumRetOutputs = ST->getNumElements();
7777 else
7778 NumRetOutputs = 1;
7779 }
7780 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7781 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7782 switch (Info.Type) {
7784 NumOutputs++;
7785 break;
7786 default:
7787 break;
7788 }
7789 }
7790 return NumOutputs - NumRetOutputs;
7791 }
7792
7793 void visitAsmInstruction(Instruction &I) {
7794 // Conservative inline assembly handling: check for poisoned shadow of
7795 // asm() arguments, then unpoison the result and all the memory locations
7796 // pointed to by those arguments.
7797 // An inline asm() statement in C++ contains lists of input and output
7798 // arguments used by the assembly code. These are mapped to operands of the
7799 // CallInst as follows:
7800 // - nR register outputs ("=r) are returned by value in a single structure
7801 // (SSA value of the CallInst);
7802 // - nO other outputs ("=m" and others) are returned by pointer as first
7803 // nO operands of the CallInst;
7804 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7805 // remaining nI operands.
7806 // The total number of asm() arguments in the source is nR+nO+nI, and the
7807 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7808 // function to be called).
7809 const DataLayout &DL = F.getDataLayout();
7810 CallBase *CB = cast<CallBase>(&I);
7811 IRBuilder<> IRB(&I);
7812 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7813 int OutputArgs = getNumOutputArgs(IA, CB);
7814 // The last operand of a CallInst is the function itself.
7815 int NumOperands = CB->getNumOperands() - 1;
7816
7817 // Check input arguments. Doing so before unpoisoning output arguments, so
7818 // that we won't overwrite uninit values before checking them.
7819 for (int i = OutputArgs; i < NumOperands; i++) {
7820 Value *Operand = CB->getOperand(i);
7821 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7822 /*isOutput*/ false);
7823 }
7824 // Unpoison output arguments. This must happen before the actual InlineAsm
7825 // call, so that the shadow for memory published in the asm() statement
7826 // remains valid.
7827 for (int i = 0; i < OutputArgs; i++) {
7828 Value *Operand = CB->getOperand(i);
7829 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7830 /*isOutput*/ true);
7831 }
7832
7833 setShadow(&I, getCleanShadow(&I));
7834 setOrigin(&I, getCleanOrigin());
7835 }
7836
7837 void visitFreezeInst(FreezeInst &I) {
7838 // Freeze always returns a fully defined value.
7839 setShadow(&I, getCleanShadow(&I));
7840 setOrigin(&I, getCleanOrigin());
7841 }
7842
7843 void visitInstruction(Instruction &I) {
7844 // Everything else: stop propagating and check for poisoned shadow.
7846 dumpInst(I);
7847 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7848 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7849 Value *Operand = I.getOperand(i);
7850 if (Operand->getType()->isSized())
7851 insertCheckShadowOf(Operand, &I);
7852 }
7853 setShadow(&I, getCleanShadow(&I));
7854 setOrigin(&I, getCleanOrigin());
7855 }
7856};
7857
7858struct VarArgHelperBase : public VarArgHelper {
7859 Function &F;
7860 MemorySanitizer &MS;
7861 MemorySanitizerVisitor &MSV;
7862 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7863 const unsigned VAListTagSize;
7864
7865 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7866 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7867 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7868
7869 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7870 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7871 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7872 }
7873
7874 /// Compute the shadow address for a given va_arg.
7875 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7876 return IRB.CreatePtrAdd(
7877 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7878 }
7879
7880 /// Compute the shadow address for a given va_arg.
7881 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7882 unsigned ArgSize) {
7883 // Make sure we don't overflow __msan_va_arg_tls.
7884 if (ArgOffset + ArgSize > kParamTLSSize)
7885 return nullptr;
7886 return getShadowPtrForVAArgument(IRB, ArgOffset);
7887 }
7888
7889 /// Compute the origin address for a given va_arg.
7890 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7891 // getOriginPtrForVAArgument() is always called after
7892 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7893 // overflow.
7894 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7895 ConstantInt::get(MS.IntptrTy, ArgOffset),
7896 "_msarg_va_o");
7897 }
7898
7899 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7900 unsigned BaseOffset) {
7901 // The tails of __msan_va_arg_tls is not large enough to fit full
7902 // value shadow, but it will be copied to backup anyway. Make it
7903 // clean.
7904 if (BaseOffset >= kParamTLSSize)
7905 return;
7906 Value *TailSize =
7907 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7908 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7909 TailSize, Align(8));
7910 }
7911
7912 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7913 IRBuilder<> IRB(&I);
7914 Value *VAListTag = I.getArgOperand(0);
7915 const Align Alignment = Align(8);
7916 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7917 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7918 // Unpoison the whole __va_list_tag.
7919 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7920 VAListTagSize, Alignment, false);
7921 }
7922
7923 void visitVAStartInst(VAStartInst &I) override {
7924 if (F.getCallingConv() == CallingConv::Win64)
7925 return;
7926 VAStartInstrumentationList.push_back(&I);
7927 unpoisonVAListTagForInst(I);
7928 }
7929
7930 void visitVACopyInst(VACopyInst &I) override {
7931 if (F.getCallingConv() == CallingConv::Win64)
7932 return;
7933 unpoisonVAListTagForInst(I);
7934 }
7935};
7936
7937/// AMD64-specific implementation of VarArgHelper.
7938struct VarArgAMD64Helper : public VarArgHelperBase {
7939 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7940 // See a comment in visitCallBase for more details.
7941 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7942 static const unsigned AMD64FpEndOffsetSSE = 176;
7943 // If SSE is disabled, fp_offset in va_list is zero.
7944 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7945
7946 unsigned AMD64FpEndOffset;
7947 AllocaInst *VAArgTLSCopy = nullptr;
7948 AllocaInst *VAArgTLSOriginCopy = nullptr;
7949 Value *VAArgOverflowSize = nullptr;
7950
7951 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7952
7953 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7954 MemorySanitizerVisitor &MSV)
7955 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7956 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7957 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7958 if (Attr.isStringAttribute() &&
7959 (Attr.getKindAsString() == "target-features")) {
7960 if (Attr.getValueAsString().contains("-sse"))
7961 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7962 break;
7963 }
7964 }
7965 }
7966
7967 ArgKind classifyArgument(Value *arg) {
7968 // A very rough approximation of X86_64 argument classification rules.
7969 Type *T = arg->getType();
7970 if (T->isX86_FP80Ty())
7971 return AK_Memory;
7972 if (T->isFPOrFPVectorTy())
7973 return AK_FloatingPoint;
7974 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7975 return AK_GeneralPurpose;
7976 if (T->isPointerTy())
7977 return AK_GeneralPurpose;
7978 return AK_Memory;
7979 }
7980
7981 // For VarArg functions, store the argument shadow in an ABI-specific format
7982 // that corresponds to va_list layout.
7983 // We do this because Clang lowers va_arg in the frontend, and this pass
7984 // only sees the low level code that deals with va_list internals.
7985 // A much easier alternative (provided that Clang emits va_arg instructions)
7986 // would have been to associate each live instance of va_list with a copy of
7987 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7988 // order.
7989 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7990 unsigned GpOffset = 0;
7991 unsigned FpOffset = AMD64GpEndOffset;
7992 unsigned OverflowOffset = AMD64FpEndOffset;
7993 const DataLayout &DL = F.getDataLayout();
7994
7995 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7996 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7997 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7998 if (IsByVal) {
7999 // ByVal arguments always go to the overflow area.
8000 // Fixed arguments passed through the overflow area will be stepped
8001 // over by va_start, so don't count them towards the offset.
8002 if (IsFixed)
8003 continue;
8004 assert(A->getType()->isPointerTy());
8005 Type *RealTy = CB.getParamByValType(ArgNo);
8006 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8007 uint64_t AlignedSize = alignTo(ArgSize, 8);
8008 unsigned BaseOffset = OverflowOffset;
8009 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
8010 Value *OriginBase = nullptr;
8011 if (MS.TrackOrigins)
8012 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
8013 OverflowOffset += AlignedSize;
8014
8015 if (OverflowOffset > kParamTLSSize) {
8016 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
8017 continue; // We have no space to copy shadow there.
8018 }
8019
8020 Value *ShadowPtr, *OriginPtr;
8021 std::tie(ShadowPtr, OriginPtr) =
8022 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
8023 /*isStore*/ false);
8024 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
8025 kShadowTLSAlignment, ArgSize);
8026 if (MS.TrackOrigins)
8027 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
8028 kShadowTLSAlignment, ArgSize);
8029 } else {
8030 ArgKind AK = classifyArgument(A);
8031 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
8032 AK = AK_Memory;
8033 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
8034 AK = AK_Memory;
8035 Value *ShadowBase, *OriginBase = nullptr;
8036 switch (AK) {
8037 case AK_GeneralPurpose:
8038 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
8039 if (MS.TrackOrigins)
8040 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
8041 GpOffset += 8;
8042 assert(GpOffset <= kParamTLSSize);
8043 break;
8044 case AK_FloatingPoint:
8045 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
8046 if (MS.TrackOrigins)
8047 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8048 FpOffset += 16;
8049 assert(FpOffset <= kParamTLSSize);
8050 break;
8051 case AK_Memory:
8052 if (IsFixed)
8053 continue;
8054 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8055 uint64_t AlignedSize = alignTo(ArgSize, 8);
8056 unsigned BaseOffset = OverflowOffset;
8057 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
8058 if (MS.TrackOrigins) {
8059 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
8060 }
8061 OverflowOffset += AlignedSize;
8062 if (OverflowOffset > kParamTLSSize) {
8063 // We have no space to copy shadow there.
8064 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
8065 continue;
8066 }
8067 }
8068 // Take fixed arguments into account for GpOffset and FpOffset,
8069 // but don't actually store shadows for them.
8070 // TODO(glider): don't call get*PtrForVAArgument() for them.
8071 if (IsFixed)
8072 continue;
8073 Value *Shadow = MSV.getShadow(A);
8074 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
8075 if (MS.TrackOrigins) {
8076 Value *Origin = MSV.getOrigin(A);
8077 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8078 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8080 }
8081 }
8082 }
8083 Constant *OverflowSize =
8084 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
8085 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8086 }
8087
8088 void finalizeInstrumentation() override {
8089 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8090 "finalizeInstrumentation called twice");
8091 if (!VAStartInstrumentationList.empty()) {
8092 // If there is a va_start in this function, make a backup copy of
8093 // va_arg_tls somewhere in the function entry block.
8094 IRBuilder<> IRB(MSV.FnPrologueEnd);
8095 VAArgOverflowSize =
8096 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8097 Value *CopySize = IRB.CreateAdd(
8098 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
8099 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8100 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8101 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8102 CopySize, kShadowTLSAlignment, false);
8103
8104 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8105 Intrinsic::umin, CopySize,
8106 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8107 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8108 kShadowTLSAlignment, SrcSize);
8109 if (MS.TrackOrigins) {
8110 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8111 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8112 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8113 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8114 }
8115 }
8116
8117 // Instrument va_start.
8118 // Copy va_list shadow from the backup copy of the TLS contents.
8119 for (CallInst *OrigInst : VAStartInstrumentationList) {
8120 NextNodeIRBuilder IRB(OrigInst);
8121 Value *VAListTag = OrigInst->getArgOperand(0);
8122
8123 Value *RegSaveAreaPtrPtr =
8124 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
8125 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8126 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8127 const Align Alignment = Align(16);
8128 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8129 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8130 Alignment, /*isStore*/ true);
8131 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8132 AMD64FpEndOffset);
8133 if (MS.TrackOrigins)
8134 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8135 Alignment, AMD64FpEndOffset);
8136 Value *OverflowArgAreaPtrPtr =
8137 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
8138 Value *OverflowArgAreaPtr =
8139 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8140 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8141 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8142 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8143 Alignment, /*isStore*/ true);
8144 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8145 AMD64FpEndOffset);
8146 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8147 VAArgOverflowSize);
8148 if (MS.TrackOrigins) {
8149 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8150 AMD64FpEndOffset);
8151 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8152 VAArgOverflowSize);
8153 }
8154 }
8155 }
8156};
8157
8158/// AArch64-specific implementation of VarArgHelper.
8159struct VarArgAArch64Helper : public VarArgHelperBase {
8160 static const unsigned kAArch64GrArgSize = 64;
8161 static const unsigned kAArch64VrArgSize = 128;
8162
8163 static const unsigned AArch64GrBegOffset = 0;
8164 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
8165 // Make VR space aligned to 16 bytes.
8166 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
8167 static const unsigned AArch64VrEndOffset =
8168 AArch64VrBegOffset + kAArch64VrArgSize;
8169 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
8170
8171 AllocaInst *VAArgTLSCopy = nullptr;
8172 Value *VAArgOverflowSize = nullptr;
8173
8174 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
8175
8176 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
8177 MemorySanitizerVisitor &MSV)
8178 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
8179
8180 // A very rough approximation of aarch64 argument classification rules.
8181 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
8182 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
8183 return {AK_GeneralPurpose, 1};
8184 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
8185 return {AK_FloatingPoint, 1};
8186
8187 if (T->isArrayTy()) {
8188 auto R = classifyArgument(T->getArrayElementType());
8189 R.second *= T->getScalarType()->getArrayNumElements();
8190 return R;
8191 }
8192
8193 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
8194 auto R = classifyArgument(FV->getScalarType());
8195 R.second *= FV->getNumElements();
8196 return R;
8197 }
8198
8199 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
8200 return {AK_Memory, 0};
8201 }
8202
8203 // The instrumentation stores the argument shadow in a non ABI-specific
8204 // format because it does not know which argument is named (since Clang,
8205 // like x86_64 case, lowers the va_args in the frontend and this pass only
8206 // sees the low level code that deals with va_list internals).
8207 // The first seven GR registers are saved in the first 56 bytes of the
8208 // va_arg tls arra, followed by the first 8 FP/SIMD registers, and then
8209 // the remaining arguments.
8210 // Using constant offset within the va_arg TLS array allows fast copy
8211 // in the finalize instrumentation.
8212 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8213 unsigned GrOffset = AArch64GrBegOffset;
8214 unsigned VrOffset = AArch64VrBegOffset;
8215 unsigned OverflowOffset = AArch64VAEndOffset;
8216
8217 const DataLayout &DL = F.getDataLayout();
8218 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8219 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8220 auto [AK, RegNum] = classifyArgument(A->getType());
8221 if (AK == AK_GeneralPurpose &&
8222 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
8223 AK = AK_Memory;
8224 if (AK == AK_FloatingPoint &&
8225 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
8226 AK = AK_Memory;
8227 Value *Base;
8228 switch (AK) {
8229 case AK_GeneralPurpose:
8230 Base = getShadowPtrForVAArgument(IRB, GrOffset);
8231 GrOffset += 8 * RegNum;
8232 break;
8233 case AK_FloatingPoint:
8234 Base = getShadowPtrForVAArgument(IRB, VrOffset);
8235 VrOffset += 16 * RegNum;
8236 break;
8237 case AK_Memory:
8238 // Don't count fixed arguments in the overflow area - va_start will
8239 // skip right over them.
8240 if (IsFixed)
8241 continue;
8242 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8243 uint64_t AlignedSize = alignTo(ArgSize, 8);
8244 unsigned BaseOffset = OverflowOffset;
8245 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
8246 OverflowOffset += AlignedSize;
8247 if (OverflowOffset > kParamTLSSize) {
8248 // We have no space to copy shadow there.
8249 CleanUnusedTLS(IRB, Base, BaseOffset);
8250 continue;
8251 }
8252 break;
8253 }
8254 // Count Gp/Vr fixed arguments to their respective offsets, but don't
8255 // bother to actually store a shadow.
8256 if (IsFixed)
8257 continue;
8258 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8259 }
8260 Constant *OverflowSize =
8261 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
8262 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8263 }
8264
8265 // Retrieve a va_list field of 'void*' size.
8266 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8267 Value *SaveAreaPtrPtr =
8268 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8269 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
8270 }
8271
8272 // Retrieve a va_list field of 'int' size.
8273 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8274 Value *SaveAreaPtr =
8275 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8276 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
8277 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
8278 }
8279
8280 void finalizeInstrumentation() override {
8281 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8282 "finalizeInstrumentation called twice");
8283 if (!VAStartInstrumentationList.empty()) {
8284 // If there is a va_start in this function, make a backup copy of
8285 // va_arg_tls somewhere in the function entry block.
8286 IRBuilder<> IRB(MSV.FnPrologueEnd);
8287 VAArgOverflowSize =
8288 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8289 Value *CopySize = IRB.CreateAdd(
8290 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
8291 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8292 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8293 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8294 CopySize, kShadowTLSAlignment, false);
8295
8296 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8297 Intrinsic::umin, CopySize,
8298 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8299 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8300 kShadowTLSAlignment, SrcSize);
8301 }
8302
8303 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
8304 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
8305
8306 // Instrument va_start, copy va_list shadow from the backup copy of
8307 // the TLS contents.
8308 for (CallInst *OrigInst : VAStartInstrumentationList) {
8309 NextNodeIRBuilder IRB(OrigInst);
8310
8311 Value *VAListTag = OrigInst->getArgOperand(0);
8312
8313 // The variadic ABI for AArch64 creates two areas to save the incoming
8314 // argument registers (one for 64-bit general register xn-x7 and another
8315 // for 128-bit FP/SIMD vn-v7).
8316 // We need then to propagate the shadow arguments on both regions
8317 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
8318 // The remaining arguments are saved on shadow for 'va::stack'.
8319 // One caveat is it requires only to propagate the non-named arguments,
8320 // however on the call site instrumentation 'all' the arguments are
8321 // saved. So to copy the shadow values from the va_arg TLS array
8322 // we need to adjust the offset for both GR and VR fields based on
8323 // the __{gr,vr}_offs value (since they are stores based on incoming
8324 // named arguments).
8325 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
8326
8327 // Read the stack pointer from the va_list.
8328 Value *StackSaveAreaPtr =
8329 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
8330
8331 // Read both the __gr_top and __gr_off and add them up.
8332 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
8333 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
8334
8335 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
8336 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
8337
8338 // Read both the __vr_top and __vr_off and add them up.
8339 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
8340 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
8341
8342 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
8343 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
8344
8345 // It does not know how many named arguments is being used and, on the
8346 // callsite all the arguments were saved. Since __gr_off is defined as
8347 // '0 - ((8 - named_gr) * 8)', the idea is to just propagate the variadic
8348 // argument by ignoring the bytes of shadow from named arguments.
8349 Value *GrRegSaveAreaShadowPtrOff =
8350 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
8351
8352 Value *GrRegSaveAreaShadowPtr =
8353 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8354 Align(8), /*isStore*/ true)
8355 .first;
8356
8357 Value *GrSrcPtr =
8358 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
8359 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
8360
8361 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
8362 GrCopySize);
8363
8364 // Again, but for FP/SIMD values.
8365 Value *VrRegSaveAreaShadowPtrOff =
8366 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
8367
8368 Value *VrRegSaveAreaShadowPtr =
8369 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8370 Align(8), /*isStore*/ true)
8371 .first;
8372
8373 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
8374 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
8375 IRB.getInt32(AArch64VrBegOffset)),
8376 VrRegSaveAreaShadowPtrOff);
8377 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
8378
8379 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
8380 VrCopySize);
8381
8382 // And finally for remaining arguments.
8383 Value *StackSaveAreaShadowPtr =
8384 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
8385 Align(16), /*isStore*/ true)
8386 .first;
8387
8388 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
8389 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
8390
8391 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
8392 Align(16), VAArgOverflowSize);
8393 }
8394 }
8395};
8396
8397/// PowerPC64-specific implementation of VarArgHelper.
8398struct VarArgPowerPC64Helper : public VarArgHelperBase {
8399 AllocaInst *VAArgTLSCopy = nullptr;
8400 Value *VAArgSize = nullptr;
8401
8402 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
8403 MemorySanitizerVisitor &MSV)
8404 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
8405
8406 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8407 // For PowerPC, we need to deal with alignment of stack arguments -
8408 // they are mostly aligned to 8 bytes, but vectors and i128 arrays
8409 // are aligned to 16 bytes, byvals can be aligned to 8 or 16 bytes,
8410 // For that reason, we compute current offset from stack pointer (which is
8411 // always properly aligned), and offset for the first vararg, then subtract
8412 // them.
8413 unsigned VAArgBase;
8414 Triple TargetTriple(F.getParent()->getTargetTriple());
8415 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
8416 // and 32 bytes for ABIv2. This is usually determined by target
8417 // endianness, but in theory could be overridden by function attribute.
8418 if (TargetTriple.isPPC64ELFv2ABI())
8419 VAArgBase = 32;
8420 else
8421 VAArgBase = 48;
8422 unsigned VAArgOffset = VAArgBase;
8423 const DataLayout &DL = F.getDataLayout();
8424 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8425 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8426 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8427 if (IsByVal) {
8428 assert(A->getType()->isPointerTy());
8429 Type *RealTy = CB.getParamByValType(ArgNo);
8430 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8431 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
8432 if (ArgAlign < 8)
8433 ArgAlign = Align(8);
8434 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8435 if (!IsFixed) {
8436 Value *Base =
8437 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8438 if (Base) {
8439 Value *AShadowPtr, *AOriginPtr;
8440 std::tie(AShadowPtr, AOriginPtr) =
8441 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8442 kShadowTLSAlignment, /*isStore*/ false);
8443
8444 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8445 kShadowTLSAlignment, ArgSize);
8446 }
8447 }
8448 VAArgOffset += alignTo(ArgSize, Align(8));
8449 } else {
8450 Value *Base;
8451 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8452 Align ArgAlign = Align(8);
8453 if (A->getType()->isArrayTy()) {
8454 // Arrays are aligned to element size, except for long double
8455 // arrays, which are aligned to 8 bytes.
8456 Type *ElementTy = A->getType()->getArrayElementType();
8457 if (!ElementTy->isPPC_FP128Ty())
8458 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8459 } else if (A->getType()->isVectorTy()) {
8460 // Vectors are naturally aligned.
8461 ArgAlign = Align(ArgSize);
8462 }
8463 if (ArgAlign < 8)
8464 ArgAlign = Align(8);
8465 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8466 if (DL.isBigEndian()) {
8467 // Adjusting the shadow for argument with size < 8 to match the
8468 // placement of bits in big endian system
8469 if (ArgSize < 8)
8470 VAArgOffset += (8 - ArgSize);
8471 }
8472 if (!IsFixed) {
8473 Base =
8474 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8475 if (Base)
8476 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8477 }
8478 VAArgOffset += ArgSize;
8479 VAArgOffset = alignTo(VAArgOffset, Align(8));
8480 }
8481 if (IsFixed)
8482 VAArgBase = VAArgOffset;
8483 }
8484
8485 Constant *TotalVAArgSize =
8486 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8487 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
8488 // a new class member i.e. it is the total size of all VarArgs.
8489 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8490 }
8491
8492 void finalizeInstrumentation() override {
8493 assert(!VAArgSize && !VAArgTLSCopy &&
8494 "finalizeInstrumentation called twice");
8495 IRBuilder<> IRB(MSV.FnPrologueEnd);
8496 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8497 Value *CopySize = VAArgSize;
8498
8499 if (!VAStartInstrumentationList.empty()) {
8500 // If there is a va_start in this function, make a backup copy of
8501 // va_arg_tls somewhere in the function entry block.
8502
8503 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8504 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8505 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8506 CopySize, kShadowTLSAlignment, false);
8507
8508 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8509 Intrinsic::umin, CopySize,
8510 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
8511 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8512 kShadowTLSAlignment, SrcSize);
8513 }
8514
8515 // Instrument va_start.
8516 // Copy va_list shadow from the backup copy of the TLS contents.
8517 for (CallInst *OrigInst : VAStartInstrumentationList) {
8518 NextNodeIRBuilder IRB(OrigInst);
8519 Value *VAListTag = OrigInst->getArgOperand(0);
8520 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8521
8522 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8523
8524 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8525 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8526 const DataLayout &DL = F.getDataLayout();
8527 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8528 const Align Alignment = Align(IntptrSize);
8529 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8530 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8531 Alignment, /*isStore*/ true);
8532 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8533 CopySize);
8534 }
8535 }
8536};
8537
8538/// PowerPC32-specific implementation of VarArgHelper.
8539struct VarArgPowerPC32Helper : public VarArgHelperBase {
8540 AllocaInst *VAArgTLSCopy = nullptr;
8541 Value *VAArgSize = nullptr;
8542
8543 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8544 MemorySanitizerVisitor &MSV)
8545 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8546
8547 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8548 unsigned VAArgBase;
8549 // Parameter save area is 8 bytes from frame pointer in PPC32
8550 VAArgBase = 8;
8551 unsigned VAArgOffset = VAArgBase;
8552 const DataLayout &DL = F.getDataLayout();
8553 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8554 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8555 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8556 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8557 if (IsByVal) {
8558 assert(A->getType()->isPointerTy());
8559 Type *RealTy = CB.getParamByValType(ArgNo);
8560 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8561 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8562 if (ArgAlign < IntptrSize)
8563 ArgAlign = Align(IntptrSize);
8564 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8565 if (!IsFixed) {
8566 Value *Base =
8567 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8568 if (Base) {
8569 Value *AShadowPtr, *AOriginPtr;
8570 std::tie(AShadowPtr, AOriginPtr) =
8571 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8572 kShadowTLSAlignment, /*isStore*/ false);
8573
8574 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8575 kShadowTLSAlignment, ArgSize);
8576 }
8577 }
8578 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8579 } else {
8580 Value *Base;
8581 Type *ArgTy = A->getType();
8582
8583 // On PPC 32 floating point variable arguments are stored in separate
8584 // area: fp_save_area = reg_save_area + 4*8. We do not copy shaodow for
8585 // them as they will be found when checking call arguments.
8586 if (!ArgTy->isFloatingPointTy()) {
8587 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
8588 Align ArgAlign = Align(IntptrSize);
8589 if (ArgTy->isArrayTy()) {
8590 // Arrays are aligned to element size, except for long double
8591 // arrays, which are aligned to 8 bytes.
8592 Type *ElementTy = ArgTy->getArrayElementType();
8593 if (!ElementTy->isPPC_FP128Ty())
8594 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8595 } else if (ArgTy->isVectorTy()) {
8596 // Vectors are naturally aligned.
8597 ArgAlign = Align(ArgSize);
8598 }
8599 if (ArgAlign < IntptrSize)
8600 ArgAlign = Align(IntptrSize);
8601 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8602 if (DL.isBigEndian()) {
8603 // Adjusting the shadow for argument with size < IntptrSize to match
8604 // the placement of bits in big endian system
8605 if (ArgSize < IntptrSize)
8606 VAArgOffset += (IntptrSize - ArgSize);
8607 }
8608 if (!IsFixed) {
8609 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
8610 ArgSize);
8611 if (Base)
8612 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
8614 }
8615 VAArgOffset += ArgSize;
8616 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8617 }
8618 }
8619 }
8620
8621 Constant *TotalVAArgSize =
8622 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8623 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
8624 // a new class member i.e. it is the total size of all VarArgs.
8625 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8626 }
8627
8628 void finalizeInstrumentation() override {
8629 assert(!VAArgSize && !VAArgTLSCopy &&
8630 "finalizeInstrumentation called twice");
8631 IRBuilder<> IRB(MSV.FnPrologueEnd);
8632 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8633 Value *CopySize = VAArgSize;
8634
8635 if (!VAStartInstrumentationList.empty()) {
8636 // If there is a va_start in this function, make a backup copy of
8637 // va_arg_tls somewhere in the function entry block.
8638
8639 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8640 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8641 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8642 CopySize, kShadowTLSAlignment, false);
8643
8644 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8645 Intrinsic::umin, CopySize,
8646 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8647 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8648 kShadowTLSAlignment, SrcSize);
8649 }
8650
8651 // Instrument va_start.
8652 // Copy va_list shadow from the backup copy of the TLS contents.
8653 for (CallInst *OrigInst : VAStartInstrumentationList) {
8654 NextNodeIRBuilder IRB(OrigInst);
8655 Value *VAListTag = OrigInst->getArgOperand(0);
8656 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8657 Value *RegSaveAreaSize = CopySize;
8658
8659 // In PPC32 va_list_tag is a struct
8660 RegSaveAreaPtrPtr =
8661 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8662
8663 // On PPC 32 reg_save_area can only hold 32 bytes of data
8664 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8665 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8666
8667 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8668 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8669
8670 const DataLayout &DL = F.getDataLayout();
8671 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8672 const Align Alignment = Align(IntptrSize);
8673
8674 { // Copy reg save area
8675 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8676 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8677 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8678 Alignment, /*isStore*/ true);
8679 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8680 Alignment, RegSaveAreaSize);
8681
8682 RegSaveAreaShadowPtr =
8683 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8684 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8685 ConstantInt::get(MS.IntptrTy, 32));
8686 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8687 // We fill fp shadow with zeroes as uninitialized fp args should have
8688 // been found during call base check
8689 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8690 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8691 }
8692
8693 { // Copy overflow area
8694 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8695 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8696
8697 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8698 OverflowAreaPtrPtr =
8699 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8700 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8701
8702 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8703
8704 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8705 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8706 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8707 Alignment, /*isStore*/ true);
8708
8709 Value *OverflowVAArgTLSCopyPtr =
8710 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8711 OverflowVAArgTLSCopyPtr =
8712 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8713
8714 OverflowVAArgTLSCopyPtr =
8715 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8716 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8717 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8718 }
8719 }
8720 }
8721};
8722
8723/// SystemZ-specific implementation of VarArgHelper.
8724struct VarArgSystemZHelper : public VarArgHelperBase {
8725 static const unsigned SystemZGpOffset = 16;
8726 static const unsigned SystemZGpEndOffset = 56;
8727 static const unsigned SystemZFpOffset = 128;
8728 static const unsigned SystemZFpEndOffset = 160;
8729 static const unsigned SystemZMaxVrArgs = 8;
8730 static const unsigned SystemZRegSaveAreaSize = 160;
8731 static const unsigned SystemZOverflowOffset = 160;
8732 static const unsigned SystemZVAListTagSize = 32;
8733 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8734 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
8735
8736 bool IsSoftFloatABI;
8737 AllocaInst *VAArgTLSCopy = nullptr;
8738 AllocaInst *VAArgTLSOriginCopy = nullptr;
8739 Value *VAArgOverflowSize = nullptr;
8740
8741 enum class ArgKind {
8742 GeneralPurpose,
8743 FloatingPoint,
8744 Vector,
8745 Memory,
8746 Indirect,
8747 };
8748
8749 enum class ShadowExtension { None, Zero, Sign };
8750
8751 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8752 MemorySanitizerVisitor &MSV)
8753 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8754 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8755
8756 ArgKind classifyArgument(Type *T) {
8757 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8758 // only a few possibilities of what it can be. In particular, enums, single
8759 // element structs and large types have already been taken care of.
8760
8761 // Some i128 and fp128 arguments are converted to pointers only in the
8762 // back end.
8763 if (T->isIntegerTy(128) || T->isFP128Ty())
8764 return ArgKind::Indirect;
8765 if (T->isFloatingPointTy())
8766 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8767 if (T->isIntegerTy() || T->isPointerTy())
8768 return ArgKind::GeneralPurpose;
8769 if (T->isVectorTy())
8770 return ArgKind::Vector;
8771 return ArgKind::Memory;
8772 }
8773
8774 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8775 // ABI says: "One of the simple integer types no more than 64 bits wide.
8776 // ... If such an argument is shorter than 64 bits, replace it by a full
8777 // 64-bit integer representing the same number, using sign or zero
8778 // extension". Shadow for an integer argument has the same type as the
8779 // argument itself, so it can be sign or zero extended as well.
8780 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8781 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8782 if (ZExt) {
8783 assert(!SExt);
8784 return ShadowExtension::Zero;
8785 }
8786 if (SExt) {
8787 assert(!ZExt);
8788 return ShadowExtension::Sign;
8789 }
8790 return ShadowExtension::None;
8791 }
8792
8793 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8794 unsigned GpOffset = SystemZGpOffset;
8795 unsigned FpOffset = SystemZFpOffset;
8796 unsigned VrIndex = 0;
8797 unsigned OverflowOffset = SystemZOverflowOffset;
8798 const DataLayout &DL = F.getDataLayout();
8799 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8800 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8801 // SystemZABIInfo does not produce ByVal parameters.
8802 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8803 Type *T = A->getType();
8804 ArgKind AK = classifyArgument(T);
8805 if (AK == ArgKind::Indirect) {
8806 T = MS.PtrTy;
8807 AK = ArgKind::GeneralPurpose;
8808 }
8809 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8810 AK = ArgKind::Memory;
8811 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8812 AK = ArgKind::Memory;
8813 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8814 AK = ArgKind::Memory;
8815 Value *ShadowBase = nullptr;
8816 Value *OriginBase = nullptr;
8817 ShadowExtension SE = ShadowExtension::None;
8818 switch (AK) {
8819 case ArgKind::GeneralPurpose: {
8820 // Always keep track of GpOffset, but store shadow only for varargs.
8821 uint64_t ArgSize = 8;
8822 if (GpOffset + ArgSize <= kParamTLSSize) {
8823 if (!IsFixed) {
8824 SE = getShadowExtension(CB, ArgNo);
8825 uint64_t GapSize = 0;
8826 if (SE == ShadowExtension::None) {
8827 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8828 assert(ArgAllocSize <= ArgSize);
8829 GapSize = ArgSize - ArgAllocSize;
8830 }
8831 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8832 if (MS.TrackOrigins)
8833 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8834 }
8835 GpOffset += ArgSize;
8836 } else {
8837 GpOffset = kParamTLSSize;
8838 }
8839 break;
8840 }
8841 case ArgKind::FloatingPoint: {
8842 // Always keep track of FpOffset, but store shadow only for varargs.
8843 uint64_t ArgSize = 8;
8844 if (FpOffset + ArgSize <= kParamTLSSize) {
8845 if (!IsFixed) {
8846 // PoP says: "A short floating-point datum requires only the
8847 // left-most 32 bit positions of a floating-point register".
8848 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8849 // don't extend shadow and don't mind the gap.
8850 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8851 if (MS.TrackOrigins)
8852 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8853 }
8854 FpOffset += ArgSize;
8855 } else {
8856 FpOffset = kParamTLSSize;
8857 }
8858 break;
8859 }
8860 case ArgKind::Vector: {
8861 // Keep track of VrIndex. No need to store shadow, since vector varargs
8862 // go through AK_Memory.
8863 assert(IsFixed);
8864 VrIndex++;
8865 break;
8866 }
8867 case ArgKind::Memory: {
8868 // Keep track of OverflowOffset and store shadow only for varargs.
8869 // Ignore fixed args, since we need to copy only the vararg portion of
8870 // the overflow area shadow.
8871 if (!IsFixed) {
8872 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8873 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8874 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8875 SE = getShadowExtension(CB, ArgNo);
8876 uint64_t GapSize =
8877 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8878 ShadowBase =
8879 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8880 if (MS.TrackOrigins)
8881 OriginBase =
8882 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8883 OverflowOffset += ArgSize;
8884 } else {
8885 OverflowOffset = kParamTLSSize;
8886 }
8887 }
8888 break;
8889 }
8890 case ArgKind::Indirect:
8891 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8892 }
8893 if (ShadowBase == nullptr)
8894 continue;
8895 Value *Shadow = MSV.getShadow(A);
8896 if (SE != ShadowExtension::None)
8897 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8898 /*Signed*/ SE == ShadowExtension::Sign);
8899 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8900 IRB.CreateStore(Shadow, ShadowBase);
8901 if (MS.TrackOrigins) {
8902 Value *Origin = MSV.getOrigin(A);
8903 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8904 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8906 }
8907 }
8908 Constant *OverflowSize = ConstantInt::get(
8909 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8910 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8911 }
8912
8913 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8914 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8915 IRB.CreateAdd(
8916 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8917 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8918 MS.PtrTy);
8919 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8920 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8921 const Align Alignment = Align(8);
8922 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8923 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8924 /*isStore*/ true);
8925 // TODO(iii): copy only fragments filled by visitCallBase()
8926 // TODO(iii): support packed-stack && !use-soft-float
8927 // For use-soft-float functions, it is enough to copy just the GPRs.
8928 unsigned RegSaveAreaSize =
8929 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8930 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8931 RegSaveAreaSize);
8932 if (MS.TrackOrigins)
8933 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8934 Alignment, RegSaveAreaSize);
8935 }
8936
8937 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8938 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8939 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8940 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8941 IRB.CreateAdd(
8942 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8943 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8944 MS.PtrTy);
8945 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8946 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8947 const Align Alignment = Align(8);
8948 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8949 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8950 Alignment, /*isStore*/ true);
8951 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8952 SystemZOverflowOffset);
8953 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8954 VAArgOverflowSize);
8955 if (MS.TrackOrigins) {
8956 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8957 SystemZOverflowOffset);
8958 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8959 VAArgOverflowSize);
8960 }
8961 }
8962
8963 void finalizeInstrumentation() override {
8964 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8965 "finalizeInstrumentation called twice");
8966 if (!VAStartInstrumentationList.empty()) {
8967 // If there is a va_start in this function, make a backup copy of
8968 // va_arg_tls somewhere in the function entry block.
8969 IRBuilder<> IRB(MSV.FnPrologueEnd);
8970 VAArgOverflowSize =
8971 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8972 Value *CopySize =
8973 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8974 VAArgOverflowSize);
8975 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8976 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8977 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8978 CopySize, kShadowTLSAlignment, false);
8979
8980 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8981 Intrinsic::umin, CopySize,
8982 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8983 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8984 kShadowTLSAlignment, SrcSize);
8985 if (MS.TrackOrigins) {
8986 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8987 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8988 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8989 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8990 }
8991 }
8992
8993 // Instrument va_start.
8994 // Copy va_list shadow from the backup copy of the TLS contents.
8995 for (CallInst *OrigInst : VAStartInstrumentationList) {
8996 NextNodeIRBuilder IRB(OrigInst);
8997 Value *VAListTag = OrigInst->getArgOperand(0);
8998 copyRegSaveArea(IRB, VAListTag);
8999 copyOverflowArea(IRB, VAListTag);
9000 }
9001 }
9002};
9003
9004/// i386-specific implementation of VarArgHelper.
9005struct VarArgI386Helper : public VarArgHelperBase {
9006 AllocaInst *VAArgTLSCopy = nullptr;
9007 Value *VAArgSize = nullptr;
9008
9009 VarArgI386Helper(Function &F, MemorySanitizer &MS,
9010 MemorySanitizerVisitor &MSV)
9011 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
9012
9013 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
9014 const DataLayout &DL = F.getDataLayout();
9015 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9016 unsigned VAArgOffset = 0;
9017 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
9018 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
9019 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
9020 if (IsByVal) {
9021 assert(A->getType()->isPointerTy());
9022 Type *RealTy = CB.getParamByValType(ArgNo);
9023 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
9024 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
9025 if (ArgAlign < IntptrSize)
9026 ArgAlign = Align(IntptrSize);
9027 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
9028 if (!IsFixed) {
9029 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9030 if (Base) {
9031 Value *AShadowPtr, *AOriginPtr;
9032 std::tie(AShadowPtr, AOriginPtr) =
9033 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
9034 kShadowTLSAlignment, /*isStore*/ false);
9035
9036 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
9037 kShadowTLSAlignment, ArgSize);
9038 }
9039 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
9040 }
9041 } else {
9042 Value *Base;
9043 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
9044 Align ArgAlign = Align(IntptrSize);
9045 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
9046 if (DL.isBigEndian()) {
9047 // Adjusting the shadow for argument with size < IntptrSize to match
9048 // the placement of bits in big endian system
9049 if (ArgSize < IntptrSize)
9050 VAArgOffset += (IntptrSize - ArgSize);
9051 }
9052 if (!IsFixed) {
9053 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9054 if (Base)
9055 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
9056 VAArgOffset += ArgSize;
9057 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
9058 }
9059 }
9060 }
9061
9062 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
9063 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
9064 // a new class member i.e. it is the total size of all VarArgs.
9065 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
9066 }
9067
9068 void finalizeInstrumentation() override {
9069 assert(!VAArgSize && !VAArgTLSCopy &&
9070 "finalizeInstrumentation called twice");
9071 IRBuilder<> IRB(MSV.FnPrologueEnd);
9072 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
9073 Value *CopySize = VAArgSize;
9074
9075 if (!VAStartInstrumentationList.empty()) {
9076 // If there is a va_start in this function, make a backup copy of
9077 // va_arg_tls somewhere in the function entry block.
9078 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9079 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9080 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9081 CopySize, kShadowTLSAlignment, false);
9082
9083 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9084 Intrinsic::umin, CopySize,
9085 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9086 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9087 kShadowTLSAlignment, SrcSize);
9088 }
9089
9090 // Instrument va_start.
9091 // Copy va_list shadow from the backup copy of the TLS contents.
9092 for (CallInst *OrigInst : VAStartInstrumentationList) {
9093 NextNodeIRBuilder IRB(OrigInst);
9094 Value *VAListTag = OrigInst->getArgOperand(0);
9095 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
9096 Value *RegSaveAreaPtrPtr =
9097 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9098 PointerType::get(*MS.C, 0));
9099 Value *RegSaveAreaPtr =
9100 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
9101 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
9102 const DataLayout &DL = F.getDataLayout();
9103 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9104 const Align Alignment = Align(IntptrSize);
9105 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9106 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
9107 Alignment, /*isStore*/ true);
9108 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9109 CopySize);
9110 }
9111 }
9112};
9113
9114/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
9115/// LoongArch64.
9116struct VarArgGenericHelper : public VarArgHelperBase {
9117 AllocaInst *VAArgTLSCopy = nullptr;
9118 Value *VAArgSize = nullptr;
9119
9120 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
9121 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
9122 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
9123
9124 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
9125 unsigned VAArgOffset = 0;
9126 const DataLayout &DL = F.getDataLayout();
9127 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9128 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
9129 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
9130 if (IsFixed)
9131 continue;
9132 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
9133 if (DL.isBigEndian()) {
9134 // Adjusting the shadow for argument with size < IntptrSize to match the
9135 // placement of bits in big endian system
9136 if (ArgSize < IntptrSize)
9137 VAArgOffset += (IntptrSize - ArgSize);
9138 }
9139 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9140 VAArgOffset += ArgSize;
9141 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
9142 if (!Base)
9143 continue;
9144 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
9145 }
9146
9147 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
9148 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
9149 // a new class member i.e. it is the total size of all VarArgs.
9150 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
9151 }
9152
9153 void finalizeInstrumentation() override {
9154 assert(!VAArgSize && !VAArgTLSCopy &&
9155 "finalizeInstrumentation called twice");
9156 IRBuilder<> IRB(MSV.FnPrologueEnd);
9157 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
9158 Value *CopySize = VAArgSize;
9159
9160 if (!VAStartInstrumentationList.empty()) {
9161 // If there is a va_start in this function, make a backup copy of
9162 // va_arg_tls somewhere in the function entry block.
9163 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9164 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9165 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9166 CopySize, kShadowTLSAlignment, false);
9167
9168 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9169 Intrinsic::umin, CopySize,
9170 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9171 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9172 kShadowTLSAlignment, SrcSize);
9173 }
9174
9175 // Instrument va_start.
9176 // Copy va_list shadow from the backup copy of the TLS contents.
9177 for (CallInst *OrigInst : VAStartInstrumentationList) {
9178 NextNodeIRBuilder IRB(OrigInst);
9179 Value *VAListTag = OrigInst->getArgOperand(0);
9180 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
9181 Value *RegSaveAreaPtrPtr =
9182 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9183 PointerType::get(*MS.C, 0));
9184 Value *RegSaveAreaPtr =
9185 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
9186 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
9187 const DataLayout &DL = F.getDataLayout();
9188 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9189 const Align Alignment = Align(IntptrSize);
9190 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9191 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
9192 Alignment, /*isStore*/ true);
9193 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9194 CopySize);
9195 }
9196 }
9197};
9198
9199// ARM32, Loongarch64, MIPS and RISCV share the same calling conventions
9200// regarding VAArgs.
9201using VarArgARM32Helper = VarArgGenericHelper;
9202using VarArgRISCVHelper = VarArgGenericHelper;
9203using VarArgMIPSHelper = VarArgGenericHelper;
9204using VarArgLoongArch64Helper = VarArgGenericHelper;
9205
9206/// A no-op implementation of VarArgHelper.
9207struct VarArgNoOpHelper : public VarArgHelper {
9208 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
9209 MemorySanitizerVisitor &MSV) {}
9210
9211 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
9212
9213 void visitVAStartInst(VAStartInst &I) override {}
9214
9215 void visitVACopyInst(VACopyInst &I) override {}
9216
9217 void finalizeInstrumentation() override {}
9218};
9219
9220} // end anonymous namespace
9221
9222static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
9223 MemorySanitizerVisitor &Visitor) {
9224 // VarArg handling is only implemented on AMD64. False positives are possible
9225 // on other platforms.
9226 Triple TargetTriple(Func.getParent()->getTargetTriple());
9227
9228 if (TargetTriple.getArch() == Triple::x86)
9229 return new VarArgI386Helper(Func, Msan, Visitor);
9230
9231 if (TargetTriple.getArch() == Triple::x86_64)
9232 return new VarArgAMD64Helper(Func, Msan, Visitor);
9233
9234 if (TargetTriple.isARM())
9235 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9236
9237 if (TargetTriple.isAArch64())
9238 return new VarArgAArch64Helper(Func, Msan, Visitor);
9239
9240 if (TargetTriple.isSystemZ())
9241 return new VarArgSystemZHelper(Func, Msan, Visitor);
9242
9243 // On PowerPC32 VAListTag is a struct
9244 // {char, char, i16 padding, char *, char *}
9245 if (TargetTriple.isPPC32())
9246 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
9247
9248 if (TargetTriple.isPPC64())
9249 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
9250
9251 if (TargetTriple.isRISCV32())
9252 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9253
9254 if (TargetTriple.isRISCV64())
9255 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9256
9257 if (TargetTriple.isMIPS32())
9258 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9259
9260 if (TargetTriple.isMIPS64())
9261 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9262
9263 if (TargetTriple.isLoongArch64())
9264 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
9265 /*VAListTagSize=*/8);
9266
9267 return new VarArgNoOpHelper(Func, Msan, Visitor);
9268}
9269
9270bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
9271 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
9272 return false;
9273
9274 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
9275 return false;
9276
9277 MemorySanitizerVisitor Visitor(F, *this, TLI);
9278
9279 // Clear out memory attributes.
9281 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
9282 F.removeFnAttrs(B);
9283
9284 return Visitor.runOnFunction();
9285}
#define Success
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
constexpr LLT S1
AMDGPU Uniform Intrinsic Combine
This file implements a class to represent arbitrary precision integral constant values and operations...
static bool isStore(int Opcode)
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
static cl::opt< ITMode > IT(cl::desc("IT block support"), cl::Hidden, cl::init(DefaultIT), cl::values(clEnumValN(DefaultIT, "arm-default-it", "Generate any type of IT block"), clEnumValN(RestrictedIT, "arm-restrict-it", "Disallow complex IT blocks")))
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClWithComdat("asan-with-comdat", cl::desc("Place ASan constructors in comdat sections"), cl::Hidden, cl::init(true))
VarLocInsertPt getNextNode(const DbgRecord *DVR)
Atomic ordering constants.
This file contains the simple types necessary to represent the attributes associated with functions a...
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< StatepointGC > D("statepoint-example", "an example strategy for statepoint")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
This file contains the declarations for the subclasses of Constant, which represent the different fla...
const MemoryMapParams Linux_LoongArch64_MemoryMapParams
const MemoryMapParams Linux_X86_64_MemoryMapParams
static cl::opt< int > ClTrackOrigins("dfsan-track-origins", cl::desc("Track origins of labels"), cl::Hidden, cl::init(0))
static AtomicOrdering addReleaseOrdering(AtomicOrdering AO)
const MemoryMapParams Linux_S390X_MemoryMapParams
static AtomicOrdering addAcquireOrdering(AtomicOrdering AO)
const MemoryMapParams Linux_AArch64_MemoryMapParams
static bool isAMustTailRetVal(Value *RetVal)
This file provides an implementation of debug counters.
#define DEBUG_COUNTER(VARNAME, COUNTERNAME, DESC)
This file defines the DenseMap class.
This file builds on the ADT/GraphTraits.h file to build generic depth first graph iterator.
@ Default
static bool runOnFunction(Function &F, bool PostInlining)
This is the interface for a simple mod/ref and alias analysis over globals.
static size_t TypeSizeToSizeIndex(uint32_t TypeSize)
#define op(i)
Hexagon Common GEP
#define _
Module.h This file contains the declarations for the Module class.
static LVOptions Options
Definition LVOptions.cpp:25
#define F(x, y, z)
Definition MD5.cpp:54
#define I(x, y, z)
Definition MD5.cpp:57
Machine Check Debug Module
static const PlatformMemoryMapParams Linux_S390_MemoryMapParams
static const Align kMinOriginAlignment
static cl::opt< uint64_t > ClShadowBase("msan-shadow-base", cl::desc("Define custom MSan ShadowBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClPoisonUndef("msan-poison-undef", cl::desc("Poison fully undef temporary values. " "Partially undefined constant vectors " "are unaffected by this flag (see " "-msan-poison-undef-vectors)."), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_X86_MemoryMapParams
static cl::opt< uint64_t > ClOriginBase("msan-origin-base", cl::desc("Define custom MSan OriginBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClCheckConstantShadow("msan-check-constant-shadow", cl::desc("Insert checks for constant shadow values"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams
static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams
static const unsigned kOriginSize
static cl::opt< bool > ClWithComdat("msan-with-comdat", cl::desc("Place MSan constructors in comdat sections"), cl::Hidden, cl::init(false))
static cl::opt< int > ClTrackOrigins("msan-track-origins", cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden, cl::init(0))
Track origins of uninitialized values.
static cl::opt< int > ClInstrumentationWithCallThreshold("msan-instrumentation-with-call-threshold", cl::desc("If the function being instrumented requires more than " "this number of checks and origin stores, use callbacks instead of " "inline checks (-1 means never use callbacks)."), cl::Hidden, cl::init(3500))
static cl::opt< int > ClPoisonStackPattern("msan-poison-stack-pattern", cl::desc("poison uninitialized stack variables with the given pattern"), cl::Hidden, cl::init(0xff))
static const Align kShadowTLSAlignment
static cl::opt< bool > ClHandleICmpExact("msan-handle-icmp-exact", cl::desc("exact handling of relational integer ICmp"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams
static cl::opt< bool > ClDumpStrictInstructions("msan-dump-strict-instructions", cl::desc("print out instructions with default strict semantics i.e.," "check that all the inputs are fully initialized, and mark " "the output as fully initialized. These semantics are applied " "to instructions that could not be handled explicitly nor " "heuristically."), cl::Hidden, cl::init(false))
static Constant * getOrInsertGlobal(Module &M, StringRef Name, Type *Ty)
static cl::opt< bool > ClPreciseDisjointOr("msan-precise-disjoint-or", cl::desc("Precisely poison disjoint OR. If false (legacy behavior), " "disjointedness is ignored (i.e., 1|1 is initialized)."), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStack("msan-poison-stack", cl::desc("poison uninitialized stack variables"), cl::Hidden, cl::init(true))
static const MemoryMapParams Linux_I386_MemoryMapParams
const char kMsanInitName[]
static cl::opt< bool > ClPoisonUndefVectors("msan-poison-undef-vectors", cl::desc("Precisely poison partially undefined constant vectors. " "If false (legacy behavior), the entire vector is " "considered fully initialized, which may lead to false " "negatives. Fully undefined constant vectors are " "unaffected by this flag (see -msan-poison-undef)."), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPrintStackNames("msan-print-stack-names", cl::desc("Print name of local stack variable"), cl::Hidden, cl::init(true))
OddOrEvenLanes
@ kOddLanes
@ kEvenLanes
@ kBothLanes
static cl::opt< uint64_t > ClAndMask("msan-and-mask", cl::desc("Define custom MSan AndMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleLifetimeIntrinsics("msan-handle-lifetime-intrinsics", cl::desc("when possible, poison scoped variables at the beginning of the scope " "(slower, but more precise)"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClKeepGoing("msan-keep-going", cl::desc("keep going after reporting a UMR"), cl::Hidden, cl::init(false))
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams
static GlobalVariable * createPrivateConstGlobalForString(Module &M, StringRef Str)
Create a non-const global initialized with the given string.
static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClEagerChecks("msan-eager-checks", cl::desc("check arguments and return values at function call boundaries"), cl::Hidden, cl::init(false))
static cl::opt< int > ClDisambiguateWarning("msan-disambiguate-warning-threshold", cl::desc("Define threshold for number of checks per " "debug location to force origin update."), cl::Hidden, cl::init(3))
static VarArgHelper * CreateVarArgHelper(Function &Func, MemorySanitizer &Msan, MemorySanitizerVisitor &Visitor)
static const MemoryMapParams Linux_MIPS64_MemoryMapParams
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams
static cl::opt< int > ClSwitchPrecision("msan-switch-precision", cl::desc("Controls the number of cases considered by MSan for LLVM switch " "instructions. 0 means no UUMs detected. Higher values lead to " "fewer false negatives but may impact compiler and/or " "application performance. N.B. LLVM switch instructions do not " "correspond exactly to C++ switch statements."), cl::Hidden, cl::init(99))
static cl::opt< uint64_t > ClXorMask("msan-xor-mask", cl::desc("Define custom MSan XorMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleAsmConservative("msan-handle-asm-conservative", cl::desc("conservative handling of inline assembly"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams
static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams
static const unsigned kParamTLSSize
static cl::opt< bool > ClHandleICmp("msan-handle-icmp", cl::desc("propagate shadow through ICmpEQ and ICmpNE"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClEnableKmsan("msan-kernel", cl::desc("Enable KernelMemorySanitizer instrumentation"), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStackWithCall("msan-poison-stack-with-call", cl::desc("poison uninitialized stack variables with a call"), cl::Hidden, cl::init(false))
static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams
static cl::opt< bool > ClDumpHeuristicInstructions("msan-dump-heuristic-instructions", cl::desc("Prints 'unknown' instructions that were handled heuristically. " "Use -msan-dump-strict-instructions to print instructions that " "could not be handled explicitly nor heuristically."), cl::Hidden, cl::init(false))
static const unsigned kRetvalTLSSize
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams
const char kMsanModuleCtorName[]
static const MemoryMapParams FreeBSD_I386_MemoryMapParams
static cl::opt< bool > ClCheckAccessAddress("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClDisableChecks("msan-disable-checks", cl::desc("Apply no_sanitize to the whole file"), cl::Hidden, cl::init(false))
#define T
uint64_t IntrinsicInst * II
FunctionAnalysisManager FAM
if(PassOpts->AAPipeline)
const SmallVectorImpl< MachineOperand > & Cond
static const char * name
void visit(MachineFunction &MF, MachineBasicBlock &Start, std::function< void(MachineBasicBlock *)> op)
This file implements a set that has insertion order iteration characteristics.
This file defines the SmallPtrSet class.
This file defines the SmallVector class.
This file contains some functions that are useful when dealing with strings.
#define LLVM_DEBUG(...)
Definition Debug.h:114
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
static SymbolRef::Type getType(const Symbol *Sym)
Definition TapiFile.cpp:39
Value * RHS
Value * LHS
static APInt getSignedMinValue(unsigned numBits)
Gets minimum signed value of APInt for a specific bit width.
Definition APInt.h:220
void setAlignment(Align Align)
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
const T & front() const
front - Get the first element.
Definition ArrayRef.h:145
static LLVM_ABI ArrayType * get(Type *ElementType, uint64_t NumElements)
This static method is the primary way to construct an ArrayType.
This class stores enough information to efficiently remove some attributes from an existing AttrBuild...
AttributeMask & addAttribute(Attribute::AttrKind Val)
Add an attribute to the mask.
iterator end()
Definition BasicBlock.h:483
LLVM_ABI const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
LLVM_ABI const BasicBlock * getSinglePredecessor() const
Return the predecessor of this block if it has a single predecessor block.
InstListType::iterator iterator
Instruction iterators...
Definition BasicBlock.h:170
bool isInlineAsm() const
Check if this call is an inline asm statement.
Function * getCalledFunction() const
Returns the function called, or null if this is an indirect function invocation or the function signa...
bool hasRetAttr(Attribute::AttrKind Kind) const
Determine whether the return value has the given attribute.
LLVM_ABI bool paramHasAttr(unsigned ArgNo, Attribute::AttrKind Kind) const
Determine whether the argument or parameter has the given attribute.
void removeFnAttrs(const AttributeMask &AttrsToRemove)
Removes the attributes from the function.
void setCannotMerge()
MaybeAlign getParamAlign(unsigned ArgNo) const
Extract the alignment for a call or parameter (0=unknown).
Type * getParamByValType(unsigned ArgNo) const
Extract the byval type for a call or parameter.
Value * getCalledOperand() const
Type * getParamElementType(unsigned ArgNo) const
Extract the elementtype type for a parameter.
Value * getArgOperand(unsigned i) const
void setArgOperand(unsigned i, Value *v)
FunctionType * getFunctionType() const
iterator_range< User::op_iterator > args()
Iteration adapter for range-for loops.
void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind)
Adds the attribute to the indicated argument.
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:676
@ ICMP_SLT
signed less than
Definition InstrTypes.h:705
@ ICMP_SLE
signed less or equal
Definition InstrTypes.h:706
@ ICMP_SGT
signed greater than
Definition InstrTypes.h:703
@ ICMP_SGE
signed greater or equal
Definition InstrTypes.h:704
static LLVM_ABI Constant * get(ArrayType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getString(LLVMContext &Context, StringRef Initializer, bool AddNull=true)
This method constructs a CDS and initializes it with a text string.
static LLVM_ABI Constant * get(LLVMContext &Context, ArrayRef< uint8_t > Elts)
get() constructors - Return a constant with vector type with an element count and element type matchi...
static ConstantInt * getSigned(IntegerType *Ty, int64_t V, bool ImplicitTrunc=false)
Return a ConstantInt with the specified value for the specified type.
Definition Constants.h:135
static LLVM_ABI ConstantInt * getBool(LLVMContext &Context, bool V)
static LLVM_ABI Constant * get(StructType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getSplat(ElementCount EC, Constant *Elt)
Return a ConstantVector with the specified constant in each element.
static LLVM_ABI Constant * get(ArrayRef< Constant * > V)
This is an important base class in LLVM.
Definition Constant.h:43
static LLVM_ABI Constant * getAllOnesValue(Type *Ty)
LLVM_ABI bool isAllOnesValue() const
Return true if this is the value that would be returned by getAllOnesValue.
Definition Constants.cpp:91
static LLVM_ABI Constant * getNullValue(Type *Ty)
Constructor to create a '0' constant of arbitrary type.
LLVM_ABI Constant * getAggregateElement(unsigned Elt) const
For aggregates (struct/array/vector) return the constant that corresponds to the specified element if...
LLVM_ABI bool isNullValue() const
Return true if this is the value that would be returned by getNullValue.
Definition Constants.cpp:74
static bool shouldExecute(CounterInfo &Counter)
bool empty() const
Definition DenseMap.h:109
unsigned getNumElements() const
static LLVM_ABI FixedVectorType * get(Type *ElementType, unsigned NumElts)
Definition Type.cpp:802
static FixedVectorType * getHalfElementsVectorType(FixedVectorType *VTy)
A handy container for a FunctionType+Callee-pointer pair, which can be passed around as a single enti...
unsigned getNumParams() const
Return the number of fixed parameters this function type requires.
LLVM_ABI void setComdat(Comdat *C)
Definition Globals.cpp:215
@ PrivateLinkage
Like Internal, but omit from symbol table.
Definition GlobalValue.h:61
@ ExternalLinkage
Externally visible function.
Definition GlobalValue.h:53
Analysis pass providing a never-invalidated alias analysis result.
ConstantInt * getInt1(bool V)
Get a constant value representing either true or false.
Definition IRBuilder.h:497
Value * CreateInsertElement(Type *VecTy, Value *NewElt, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2561
Value * CreateConstGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name="")
Definition IRBuilder.h:1957
AllocaInst * CreateAlloca(Type *Ty, unsigned AddrSpace, Value *ArraySize=nullptr, const Twine &Name="")
Definition IRBuilder.h:1837
IntegerType * getInt1Ty()
Fetch the type representing a single bit.
Definition IRBuilder.h:546
LLVM_ABI CallInst * CreateMaskedCompressStore(Value *Val, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr)
Create a call to Masked Compress Store intrinsic.
Value * CreateInsertValue(Value *Agg, Value *Val, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2615
LLVM_ABI Value * CreateAllocationSize(Type *DestTy, AllocaInst *AI)
Get allocation size of an alloca as a runtime Value* (handles both static and dynamic allocas and vsc...
Value * CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2549
IntegerType * getIntNTy(unsigned N)
Fetch the type representing an N-bit integer.
Definition IRBuilder.h:574
LoadInst * CreateAlignedLoad(Type *Ty, Value *Ptr, MaybeAlign Align, const char *Name)
Definition IRBuilder.h:1871
CallInst * CreateMemCpy(Value *Dst, MaybeAlign DstAlign, Value *Src, MaybeAlign SrcAlign, uint64_t Size, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memcpy between the specified pointers.
Definition IRBuilder.h:686
LLVM_ABI CallInst * CreateAndReduce(Value *Src)
Create a vector int AND reduction intrinsic of the source vector.
Value * CreatePointerCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2223
Value * CreateExtractValue(Value *Agg, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2608
LLVM_ABI CallInst * CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Load intrinsic.
LLVM_ABI Value * CreateSelect(Value *C, Value *True, Value *False, const Twine &Name="", Instruction *MDFrom=nullptr)
BasicBlock::iterator GetInsertPoint() const
Definition IRBuilder.h:202
Value * CreateSExt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2066
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2171
Value * CreateLShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1516
IntegerType * getInt32Ty()
Fetch the type representing a 32-bit integer.
Definition IRBuilder.h:561
ConstantInt * getInt8(uint8_t C)
Get a constant 8-bit value.
Definition IRBuilder.h:512
Value * CreatePtrAdd(Value *Ptr, Value *Offset, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:2025
IntegerType * getInt64Ty()
Fetch the type representing a 64-bit integer.
Definition IRBuilder.h:566
Value * CreateUDiv(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1457
Value * CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2312
Value * CreateGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:1944
Value * CreateNeg(Value *V, const Twine &Name="", bool HasNSW=false)
Definition IRBuilder.h:1788
LLVM_ABI CallInst * CreateOrReduce(Value *Src)
Create a vector int OR reduction intrinsic of the source vector.
LLVM_ABI Value * CreateBinaryIntrinsic(Intrinsic::ID ID, Value *LHS, Value *RHS, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with 2 operands which is mangled on the first type.
LLVM_ABI CallInst * CreateIntrinsic(Intrinsic::ID ID, ArrayRef< Type * > Types, ArrayRef< Value * > Args, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with Args, mangled using Types.
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition IRBuilder.h:522
PHINode * CreatePHI(Type *Ty, unsigned NumReservedValues, const Twine &Name="")
Definition IRBuilder.h:2473
Value * CreateNot(Value *V, const Twine &Name="")
Definition IRBuilder.h:1812
Value * CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2308
LLVM_ABI DebugLoc getCurrentDebugLocation() const
Get location information used by debugging information.
Definition IRBuilder.cpp:64
Value * CreateSub(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1423
Value * CreateBitCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2176
ConstantInt * getIntN(unsigned N, uint64_t C)
Get a constant N-bit value, zero extended from a 64-bit value.
Definition IRBuilder.h:532
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool...
Definition IRBuilder.h:1854
Value * CreateShl(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1495
CallInst * CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, MaybeAlign Align, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memset to the specified pointer and the specified value.
Definition IRBuilder.h:629
Value * CreateZExt(Value *V, Type *DestTy, const Twine &Name="", bool IsNonNeg=false)
Definition IRBuilder.h:2054
Value * CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name="")
Definition IRBuilder.h:2583
LLVMContext & getContext() const
Definition IRBuilder.h:203
Value * CreateAnd(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1554
StoreInst * CreateStore(Value *Val, Value *Ptr, bool isVolatile=false)
Definition IRBuilder.h:1867
LLVM_ABI CallInst * CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment, Value *Mask)
Create a call to Masked Store intrinsic.
Value * CreateAdd(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1406
Value * CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2166
Value * CreateIsNotNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg != 0.
Definition IRBuilder.h:2641
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args={}, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:2487
Value * CreateTrunc(Value *V, Type *DestTy, const Twine &Name="", bool IsNUW=false, bool IsNSW=false)
Definition IRBuilder.h:2040
PointerType * getPtrTy(unsigned AddrSpace=0)
Fetch the type representing a pointer.
Definition IRBuilder.h:604
Value * CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:1711
Value * CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2340
LLVM_ABI Value * CreateTypeSize(Type *Ty, TypeSize Size)
Create an expression which evaluates to the number of units in Size at runtime.
Value * CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2320
Value * CreateIntCast(Value *V, Type *DestTy, bool isSigned, const Twine &Name="")
Definition IRBuilder.h:2249
Value * CreateIsNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg == 0.
Definition IRBuilder.h:2636
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition IRBuilder.h:207
Type * getVoidTy()
Fetch the type representing void.
Definition IRBuilder.h:599
StoreInst * CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align, bool isVolatile=false)
Definition IRBuilder.h:1890
LLVM_ABI CallInst * CreateMaskedExpandLoad(Type *Ty, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Expand Load intrinsic.
Value * CreateInBoundsPtrAdd(Value *Ptr, Value *Offset, const Twine &Name="")
Definition IRBuilder.h:2030
Value * CreateAShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1535
Value * CreateXor(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1602
Value * CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2418
Value * CreateOr(Value *LHS, Value *RHS, const Twine &Name="", bool IsDisjoint=false)
Definition IRBuilder.h:1576
IntegerType * getInt8Ty()
Fetch the type representing an 8-bit integer.
Definition IRBuilder.h:551
Value * CreateMul(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1440
LLVM_ABI CallInst * CreateMaskedScatter(Value *Val, Value *Ptrs, Align Alignment, Value *Mask=nullptr)
Create a call to Masked Scatter intrinsic.
LLVM_ABI CallInst * CreateMaskedGather(Type *Ty, Value *Ptrs, Align Alignment, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Gather intrinsic.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition IRBuilder.h:2787
std::vector< ConstraintInfo > ConstraintInfoVector
Definition InlineAsm.h:123
void visit(Iterator Start, Iterator End)
Definition InstVisitor.h:87
const DebugLoc & getDebugLoc() const
Return the debug location for this node as a DebugLoc.
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
LLVM_ABI bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instructi...
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition Type.cpp:318
LLVM_ABI MDNode * createUnlikelyBranchWeights()
Return metadata containing two branch weights, with significant bias towards false destination.
Definition MDBuilder.cpp:48
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return an 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
PreservedAnalyses & abandon()
Mark an analysis as abandoned.
Definition Analysis.h:171
bool remove(const value_type &X)
Remove an item from the set vector.
Definition SetVector.h:181
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
StringRef - Represent a constant reference to a string, i.e.
Definition StringRef.h:55
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition Type.cpp:413
unsigned getNumElements() const
Random access to the elements.
Type * getElementType(unsigned N) const
Analysis pass providing the TargetLibraryInfo.
Provides information about what library functions are available for the current target.
AttributeList getAttrList(LLVMContext *C, ArrayRef< unsigned > ArgNos, bool Signed, bool Ret=false, AttributeList AL=AttributeList()) const
bool getLibFunc(StringRef funcName, LibFunc &F) const
Searches for a particular function name.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isMIPS64() const
Tests whether the target is MIPS 64-bit (little and big endian).
Definition Triple.h:1082
@ loongarch64
Definition Triple.h:65
bool isRISCV32() const
Tests whether the target is 32-bit RISC-V.
Definition Triple.h:1125
bool isPPC32() const
Tests whether the target is 32-bit PowerPC (little and big endian).
Definition Triple.h:1098
ArchType getArch() const
Get the parsed architecture type of this triple.
Definition Triple.h:420
bool isRISCV64() const
Tests whether the target is 64-bit RISC-V.
Definition Triple.h:1130
bool isLoongArch64() const
Tests whether the target is 64-bit LoongArch.
Definition Triple.h:1071
bool isMIPS32() const
Tests whether the target is MIPS 32-bit (little and big endian).
Definition Triple.h:1077
bool isARM() const
Tests whether the target is ARM (little and big endian).
Definition Triple.h:957
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1103
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:1048
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1149
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:45
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:273
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:264
LLVM_ABI bool isScalableTy(SmallPtrSetImpl< const Type * > &Visited) const
Return true if this is a type whose size is a known multiple of vscale.
Definition Type.cpp:61
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:246
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:267
Type * getArrayElementType() const
Definition Type.h:408
LLVM_ABI bool canLosslesslyBitCastTo(Type *Ty) const
Return true if this type could be converted with a lossless BitCast to type 'Ty'.
Definition Type.cpp:153
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:165
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:280
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:352
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:197
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:311
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:230
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:184
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:255
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:240
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:225
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:139
Value * getOperand(unsigned i) const
Definition User.h:207
unsigned getNumOperands() const
Definition User.h:229
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:397
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:200
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:168
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
CallInst * Call
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ BasicBlock
Various leaf nodes.
Definition ISDOpcodes.h:81
LLVM_ABI StringRef getBaseName(ID id)
Return the LLVM name for an intrinsic, without encoded types for overloading, such as "llvm....
initializer< Ty > init(const Ty &Val)
Function * Kernel
Summary of a kernel (=entry point for target offloading).
Definition OpenMPOpt.h:21
NodeAddr< FuncNode * > Func
Definition RDFGraph.h:393
friend class Instruction
Iterator for Instructions in a `BasicBlock.
Definition BasicBlock.h:73
This is an optimization pass for GlobalISel generic memory operations.
Definition Types.h:26
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:344
@ Offset
Definition DWP.cpp:532
FunctionAddr VTableAddr Value
Definition InstrProf.h:137
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1669
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B,...
Definition STLExtras.h:2554
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ Done
Definition Threading.h:60
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0,...
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:284
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:337
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:163
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
DWARFExpression::Operation Op
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the ...
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if...
Definition Local.cpp:3895
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that can not be reached from the function's entry.
Definition Local.cpp:2901
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if module has flag attached, if not add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:70