1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
 16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
 17/// bits on every memory read, propagate the shadow bits through some of the
 18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
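///
/// As an illustration only (a simplified sketch in C; this pass actually
/// works on LLVM IR values, not on single bytes), the propagation rules
/// above amount to:
/// \code
///   #include <stdint.h>
///   void __msan_warning(void); // provided by the MSan runtime
///
///   // Shadow convention: 0 == fully initialized; any set bit == poisoned.
///   uint8_t shadow_of_copy(uint8_t s_src) { return s_src; }      // "MOV"
///   uint8_t shadow_of_add(uint8_t s_a, uint8_t s_b) {            // arithmetic
///     // Approximation: any poisoned input poisons the whole result.
///     return s_a | s_b;
///   }
///   void shadow_check(uint8_t s_cond) {                          // "JMP"
///     if (s_cond != 0)
///       __msan_warning(); // report a use of an uninitialized value
///   }
/// \endcode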
21///
22/// But there are differences too. The first and the major one:
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
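///
/// For illustration, a conceptual sketch (not literal output of this pass) of
/// how those TLS slots are used for "int f(int x)" called as "f(y)":
///   caller, before the call:   __msan_param_tls[arg 0 slot] = shadow(y)
///   callee, on entry:          shadow(x) = __msan_param_tls[arg 0 slot]
///   callee, before returning:  __msan_retval_tls[0] = shadow(return value)
///   caller, after the call:    shadow(f(y)) = __msan_retval_tls[0]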
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
56///
57/// Every 4 aligned, consecutive bytes of application memory have one origin
58/// value associated with them. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needless overwriting origin of the 4-byte region on
66/// a short (i.e. 1 byte) clean store, and it is also good for performance.
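///
/// For illustration (a sketch; shadow_of(), origin_of() and align_down() are
/// conceptual helpers, not real runtime functions), a store of value V with
/// shadow S and origin O to address p is roughly instrumented as:
/// \code
///   *shadow_of(p) = S;
///   if (S != 0)                          // only if something is uninitialized
///     *origin_of(align_down(p, 4)) = O;  // one 4-byte origin per 4-byte granule
///   *p = V;                              // the original application store
/// \endcode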
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, atomic store
 72/// of two disjoint locations cannot be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
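///
/// For illustration, the resulting ordering (conceptual; store_shadow,
/// load_shadow and the atomic helpers are placeholders):
/// \code
///   // Atomic store of V to p:
///   store_shadow(p, /*clean*/ 0);     // shadow store first ...
///   atomic_store_release(p, V);       // ... then the application store
///
///   // Atomic load from p:
///   V = atomic_load_acquire(p);       // application load first ...
///   S = load_shadow(p);               // ... then the shadow load
/// \endcode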
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
 95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics can be only visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach generating calls to
 101///     __msan_instrument_asm_store(ptr, size),
 102/// which defers the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
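///
/// For illustration (conceptual, with a hypothetical memory output "out"):
/// \code
///   __msan_instrument_asm_store(&out, sizeof(out)); // emitted by this pass
///   asm volatile("..." : "=m"(out) : "r"(in));      // original inline asm
/// \endcode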
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
 120/// functions. The corresponding functions check that the X-byte accesses
 121/// are possible and return the shadow and origin pointers (sketch below).
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
 126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
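///
/// For illustration, a conceptual sketch of a KMSAN-instrumented 4-byte load
/// on x86_64 (struct and variable names here are illustrative only; the pair
/// really comes back in RDX:RAX):
/// \code
///   struct msan_metadata { void *shadow; void *origin; };
///   struct msan_metadata meta = __msan_metadata_ptr_for_load_4(p);
///   uint32_t value  = *(uint32_t *)p;            // original load
///   uint32_t shadow = *(uint32_t *)meta.shadow;  // shadow of the loaded value
///   uint32_t origin = *(uint32_t *)meta.origin;  // its origin
/// \endcode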
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
225static const Align kMinOriginAlignment = Align(4);
227
228// These constants must be kept in sync with the ones in msan.h.
229// TODO: increase size to match SVE/SVE2/SME/SME2 limits
230static const unsigned kParamTLSSize = 800;
231static const unsigned kRetvalTLSSize = 800;
232
233// Access sizes are powers of two: 1, 2, 4, 8.
234static const size_t kNumberOfAccessSizes = 4;
235
236/// Track origins of uninitialized values.
237///
238/// Adds a section to MemorySanitizer report that points to the allocation
239/// (stack or heap) the uninitialized bits came from originally.
240static cl::opt<int> ClTrackOrigins(
241    "msan-track-origins",
242 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
243 cl::init(0));
244
245static cl::opt<bool> ClKeepGoing("msan-keep-going",
246 cl::desc("keep going after reporting a UMR"),
247 cl::Hidden, cl::init(false));
248
249static cl::opt<bool>
250 ClPoisonStack("msan-poison-stack",
251 cl::desc("poison uninitialized stack variables"), cl::Hidden,
252 cl::init(true));
253
254static cl::opt<bool> ClPoisonStackWithCall(
255    "msan-poison-stack-with-call",
256 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
257 cl::init(false));
258
259static cl::opt<int> ClPoisonStackPattern(
260    "msan-poison-stack-pattern",
261 cl::desc("poison uninitialized stack variables with the given pattern"),
262 cl::Hidden, cl::init(0xff));
263
264static cl::opt<bool>
265 ClPrintStackNames("msan-print-stack-names",
266 cl::desc("Print name of local stack variable"),
267 cl::Hidden, cl::init(true));
268
269static cl::opt<bool>
270 ClPoisonUndef("msan-poison-undef",
271 cl::desc("Poison fully undef temporary values. "
272 "Partially undefined constant vectors "
273 "are unaffected by this flag (see "
274 "-msan-poison-undef-vectors)."),
275 cl::Hidden, cl::init(true));
276
277static cl::opt<bool> ClPoisonUndefVectors(
278    "msan-poison-undef-vectors",
279 cl::desc("Precisely poison partially undefined constant vectors. "
280 "If false (legacy behavior), the entire vector is "
281 "considered fully initialized, which may lead to false "
282 "negatives. Fully undefined constant vectors are "
283 "unaffected by this flag (see -msan-poison-undef)."),
284 cl::Hidden, cl::init(false));
285
286static cl::opt<bool> ClPreciseDisjointOr(
287    "msan-precise-disjoint-or",
288 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
289 "disjointedness is ignored (i.e., 1|1 is initialized)."),
290 cl::Hidden, cl::init(false));
291
292static cl::opt<bool>
293 ClHandleICmp("msan-handle-icmp",
294 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
295 cl::Hidden, cl::init(true));
296
297static cl::opt<bool>
298 ClHandleICmpExact("msan-handle-icmp-exact",
299 cl::desc("exact handling of relational integer ICmp"),
300 cl::Hidden, cl::init(true));
301
302static cl::opt<bool> ClHandleLifetimeIntrinsics(
303    "msan-handle-lifetime-intrinsics",
304 cl::desc(
305 "when possible, poison scoped variables at the beginning of the scope "
306 "(slower, but more precise)"),
307 cl::Hidden, cl::init(true));
308
309// When compiling the Linux kernel, we sometimes see false positives related to
310// MSan being unable to understand that inline assembly calls may initialize
311// local variables.
312// This flag makes the compiler conservatively unpoison every memory location
313// passed into an assembly call. Note that this may cause false positives.
314// Because it's impossible to figure out the array sizes, we can only unpoison
315// the first sizeof(type) bytes for each type* pointer.
316static cl::opt<bool> ClHandleAsmConservative(
317    "msan-handle-asm-conservative",
318 cl::desc("conservative handling of inline assembly"), cl::Hidden,
319 cl::init(true));
320
321// This flag controls whether we check the shadow of the address
322// operand of load or store. Such bugs are very rare, since load from
323// a garbage address typically results in SEGV, but still happen
324// (e.g. only lower bits of address are garbage, or the access happens
325// early at program startup where malloc-ed memory is more likely to
326// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
327static cl::opt<bool> ClCheckAccessAddress(
328    "msan-check-access-address",
329 cl::desc("report accesses through a pointer which has poisoned shadow"),
330 cl::Hidden, cl::init(true));
331
332static cl::opt<bool> ClEagerChecks(
333    "msan-eager-checks",
334 cl::desc("check arguments and return values at function call boundaries"),
335 cl::Hidden, cl::init(false));
336
337static cl::opt<bool> ClDumpStrictInstructions(
338    "msan-dump-strict-instructions",
339    cl::desc("print out instructions with default strict semantics, i.e., "
340 "check that all the inputs are fully initialized, and mark "
341 "the output as fully initialized. These semantics are applied "
342 "to instructions that could not be handled explicitly nor "
343 "heuristically."),
344 cl::Hidden, cl::init(false));
345
346// Currently, all the heuristically handled instructions are specifically
347// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
348// to parallel 'msan-dump-strict-instructions', and to keep the door open to
349// handling non-intrinsic instructions heuristically.
350static cl::opt<bool> ClDumpHeuristicInstructions(
351    "msan-dump-heuristic-instructions",
352 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
353 "Use -msan-dump-strict-instructions to print instructions that "
354 "could not be handled explicitly nor heuristically."),
355 cl::Hidden, cl::init(false));
356
357static cl::opt<int> ClInstrumentationWithCallThreshold(
358    "msan-instrumentation-with-call-threshold",
359 cl::desc(
360 "If the function being instrumented requires more than "
361 "this number of checks and origin stores, use callbacks instead of "
362 "inline checks (-1 means never use callbacks)."),
363 cl::Hidden, cl::init(3500));
364
365static cl::opt<bool>
366 ClEnableKmsan("msan-kernel",
367 cl::desc("Enable KernelMemorySanitizer instrumentation"),
368 cl::Hidden, cl::init(false));
369
370static cl::opt<bool>
371 ClDisableChecks("msan-disable-checks",
372 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
373 cl::init(false));
374
375static cl::opt<bool>
376 ClCheckConstantShadow("msan-check-constant-shadow",
377 cl::desc("Insert checks for constant shadow values"),
378 cl::Hidden, cl::init(true));
379
380// This is off by default because of a bug in gold:
381// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
382static cl::opt<bool>
383 ClWithComdat("msan-with-comdat",
384 cl::desc("Place MSan constructors in comdat sections"),
385 cl::Hidden, cl::init(false));
386
387// These options allow specifying custom memory map parameters.
388// See MemoryMapParams for details.
389static cl::opt<uint64_t> ClAndMask("msan-and-mask",
390 cl::desc("Define custom MSan AndMask"),
391 cl::Hidden, cl::init(0));
392
393static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
394 cl::desc("Define custom MSan XorMask"),
395 cl::Hidden, cl::init(0));
396
397static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
398 cl::desc("Define custom MSan ShadowBase"),
399 cl::Hidden, cl::init(0));
400
401static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
402 cl::desc("Define custom MSan OriginBase"),
403 cl::Hidden, cl::init(0));
404
405static cl::opt<int>
406 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
407 cl::desc("Define threshold for number of checks per "
408 "debug location to force origin update."),
409 cl::Hidden, cl::init(3));
410
411const char kMsanModuleCtorName[] = "msan.module_ctor";
412const char kMsanInitName[] = "__msan_init";
413
414namespace {
415
416// Memory map parameters used in application-to-shadow address calculation.
417// Offset = (Addr & ~AndMask) ^ XorMask
418// Shadow = ShadowBase + Offset
419// Origin = OriginBase + Offset
420struct MemoryMapParams {
421 uint64_t AndMask;
422 uint64_t XorMask;
423 uint64_t ShadowBase;
424 uint64_t OriginBase;
425};
426
427struct PlatformMemoryMapParams {
428 const MemoryMapParams *bits32;
429 const MemoryMapParams *bits64;
430};
431
432} // end anonymous namespace
433
434// i386 Linux
435static const MemoryMapParams Linux_I386_MemoryMapParams = {
436 0x000080000000, // AndMask
437 0, // XorMask (not used)
438 0, // ShadowBase (not used)
439 0x000040000000, // OriginBase
440};
441
442// x86_64 Linux
443static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
444 0, // AndMask (not used)
445 0x500000000000, // XorMask
446 0, // ShadowBase (not used)
447 0x100000000000, // OriginBase
448};
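
// For illustration, plugging the x86_64 Linux constants above into the
// mapping formulas: AndMask is unused, so
//   Offset = Addr ^ 0x500000000000
//   Shadow = Offset                    (ShadowBase == 0)
//   Origin = 0x100000000000 + Offset
// e.g. Addr 0x700000001000 -> Shadow 0x200000001000, Origin 0x300000001000.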
449
450// mips32 Linux
451// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
452// after picking good constants
453
454// mips64 Linux
455static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
456 0, // AndMask (not used)
457 0x008000000000, // XorMask
458 0, // ShadowBase (not used)
459 0x002000000000, // OriginBase
460};
461
462// ppc32 Linux
463// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
464// after picking good constants
465
466// ppc64 Linux
467static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
468 0xE00000000000, // AndMask
469 0x100000000000, // XorMask
470 0x080000000000, // ShadowBase
471 0x1C0000000000, // OriginBase
472};
473
474// s390x Linux
475static const MemoryMapParams Linux_S390X_MemoryMapParams = {
476 0xC00000000000, // AndMask
477 0, // XorMask (not used)
478 0x080000000000, // ShadowBase
479 0x1C0000000000, // OriginBase
480};
481
482// arm32 Linux
483// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
484// after picking good constants
485
486// aarch64 Linux
487static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
488 0, // AndMask (not used)
489 0x0B00000000000, // XorMask
490 0, // ShadowBase (not used)
491 0x0200000000000, // OriginBase
492};
493
494// loongarch64 Linux
495static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
496 0, // AndMask (not used)
497 0x500000000000, // XorMask
498 0, // ShadowBase (not used)
499 0x100000000000, // OriginBase
500};
501
502// riscv32 Linux
503// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
504// after picking good constants
505
506// aarch64 FreeBSD
507static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
508 0x1800000000000, // AndMask
509 0x0400000000000, // XorMask
510 0x0200000000000, // ShadowBase
511 0x0700000000000, // OriginBase
512};
513
514// i386 FreeBSD
515static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
516 0x000180000000, // AndMask
517 0x000040000000, // XorMask
518 0x000020000000, // ShadowBase
519 0x000700000000, // OriginBase
520};
521
522// x86_64 FreeBSD
523static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
524 0xc00000000000, // AndMask
525 0x200000000000, // XorMask
526 0x100000000000, // ShadowBase
527 0x380000000000, // OriginBase
528};
529
530// x86_64 NetBSD
531static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
532 0, // AndMask
533 0x500000000000, // XorMask
534 0, // ShadowBase
535 0x100000000000, // OriginBase
536};
537
538static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
539    &Linux_I386_MemoryMapParams,
540    &Linux_X86_64_MemoryMapParams,
541};
542
543static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
544 nullptr,
545    &Linux_MIPS64_MemoryMapParams,
546};
547
548static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
549 nullptr,
550    &Linux_PowerPC64_MemoryMapParams,
551};
552
553static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
554 nullptr,
555    &Linux_S390X_MemoryMapParams,
556};
557
558static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
559 nullptr,
560    &Linux_AArch64_MemoryMapParams,
561};
562
563static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
564 nullptr,
565    &Linux_LoongArch64_MemoryMapParams,
566};
567
568static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
569 nullptr,
570    &FreeBSD_AArch64_MemoryMapParams,
571};
572
573static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
574    &FreeBSD_I386_MemoryMapParams,
575    &FreeBSD_X86_64_MemoryMapParams,
576};
577
578static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
579 nullptr,
580    &NetBSD_X86_64_MemoryMapParams,
581};
582
583namespace {
584
585/// Instrument functions of a module to detect uninitialized reads.
586///
587/// Instantiating MemorySanitizer inserts the msan runtime library API function
588/// declarations into the module if they don't exist already. Instantiating
589/// ensures the __msan_init function is in the list of global constructors for
590/// the module.
591class MemorySanitizer {
592public:
593 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
594 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
595 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
596 initializeModule(M);
597 }
598
599 // MSan cannot be moved or copied because of MapParams.
600 MemorySanitizer(MemorySanitizer &&) = delete;
601 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
602 MemorySanitizer(const MemorySanitizer &) = delete;
603 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
604
605 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
606
607private:
608 friend struct MemorySanitizerVisitor;
609 friend struct VarArgHelperBase;
610 friend struct VarArgAMD64Helper;
611 friend struct VarArgAArch64Helper;
612 friend struct VarArgPowerPC64Helper;
613 friend struct VarArgPowerPC32Helper;
614 friend struct VarArgSystemZHelper;
615 friend struct VarArgI386Helper;
616 friend struct VarArgGenericHelper;
617
618 void initializeModule(Module &M);
619 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
620 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
621 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
622
623 template <typename... ArgsTy>
624 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
625 ArgsTy... Args);
626
627 /// True if we're compiling the Linux kernel.
628 bool CompileKernel;
629 /// Track origins (allocation points) of uninitialized values.
630 int TrackOrigins;
631 bool Recover;
632 bool EagerChecks;
633
634 Triple TargetTriple;
635 LLVMContext *C;
636 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
637 Type *OriginTy;
638  PointerType *PtrTy; ///< Pointer type in the default address space.
639
640 // XxxTLS variables represent the per-thread state in MSan and per-task state
641 // in KMSAN.
642 // For the userspace these point to thread-local globals. In the kernel land
643 // they point to the members of a per-task struct obtained via a call to
644 // __msan_get_context_state().
645
646 /// Thread-local shadow storage for function parameters.
647 Value *ParamTLS;
648
649 /// Thread-local origin storage for function parameters.
650 Value *ParamOriginTLS;
651
652 /// Thread-local shadow storage for function return value.
653 Value *RetvalTLS;
654
655 /// Thread-local origin storage for function return value.
656 Value *RetvalOriginTLS;
657
658 /// Thread-local shadow storage for in-register va_arg function.
659 Value *VAArgTLS;
660
661  /// Thread-local origin storage for in-register va_arg function.
662 Value *VAArgOriginTLS;
663
664  /// Thread-local storage for the size of the va_arg overflow area.
665 Value *VAArgOverflowSizeTLS;
666
667 /// Are the instrumentation callbacks set up?
668 bool CallbacksInitialized = false;
669
670 /// The run-time callback to print a warning.
671 FunctionCallee WarningFn;
672
673 // These arrays are indexed by log2(AccessSize).
674 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
675 FunctionCallee MaybeWarningVarSizeFn;
676 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
677
678 /// Run-time helper that generates a new origin value for a stack
679 /// allocation.
680 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
681 // No description version
682 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
683
684 /// Run-time helper that poisons stack on function entry.
685 FunctionCallee MsanPoisonStackFn;
686
687 /// Run-time helper that records a store (or any event) of an
688 /// uninitialized value and returns an updated origin id encoding this info.
689 FunctionCallee MsanChainOriginFn;
690
691 /// Run-time helper that paints an origin over a region.
692 FunctionCallee MsanSetOriginFn;
693
694 /// MSan runtime replacements for memmove, memcpy and memset.
695 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
696
697 /// KMSAN callback for task-local function argument shadow.
698 StructType *MsanContextStateTy;
699 FunctionCallee MsanGetContextStateFn;
700
701 /// Functions for poisoning/unpoisoning local variables
702 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
703
704 /// Pair of shadow/origin pointers.
705 Type *MsanMetadata;
706
707 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
708 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
709 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
710 FunctionCallee MsanMetadataPtrForStore_1_8[4];
711 FunctionCallee MsanInstrumentAsmStoreFn;
712
713 /// Storage for return values of the MsanMetadataPtrXxx functions.
714 Value *MsanMetadataAlloca;
715
716 /// Helper to choose between different MsanMetadataPtrXxx().
717 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
718
719 /// Memory map parameters used in application-to-shadow calculation.
720 const MemoryMapParams *MapParams;
721
722 /// Custom memory map parameters used when -msan-shadow-base or
723  /// -msan-origin-base is provided.
724 MemoryMapParams CustomMapParams;
725
726 MDNode *ColdCallWeights;
727
728 /// Branch weights for origin store.
729 MDNode *OriginStoreWeights;
730};
731
732void insertModuleCtor(Module &M) {
733  getOrCreateSanitizerCtorAndInitFunctions(
734      M, kMsanModuleCtorName, kMsanInitName,
735      /*InitArgTypes=*/{},
736 /*InitArgs=*/{},
737 // This callback is invoked when the functions are created the first
738 // time. Hook them into the global ctors list in that case:
739 [&](Function *Ctor, FunctionCallee) {
740 if (!ClWithComdat) {
741 appendToGlobalCtors(M, Ctor, 0);
742 return;
743 }
744 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
745 Ctor->setComdat(MsanCtorComdat);
746 appendToGlobalCtors(M, Ctor, 0, Ctor);
747 });
748}
749
750template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
751 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
752}
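
// For example (illustrative): getOptOrDefault(ClTrackOrigins, TO) yields the
// -msan-track-origins value if that flag was passed on the command line, and
// the programmatic default TO otherwise (see MemorySanitizerOptions below).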
753
754} // end anonymous namespace
755
756MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
757                                               bool EagerChecks)
758 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
759 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
760 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
761 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
762
763PreservedAnalyses MemorySanitizerPass::run(Module &M,
764                                           ModuleAnalysisManager &AM) {
765  // Return early if nosanitize_memory module flag is present for the module.
766 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
767 return PreservedAnalyses::all();
768 bool Modified = false;
769 if (!Options.Kernel) {
770 insertModuleCtor(M);
771 Modified = true;
772 }
773
774 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
775 for (Function &F : M) {
776 if (F.empty())
777 continue;
778 MemorySanitizer Msan(*F.getParent(), Options);
779 Modified |=
780 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
781 }
782
783 if (!Modified)
784 return PreservedAnalyses::all();
785
786  PreservedAnalyses PA = PreservedAnalyses::none();
787  // GlobalsAA is considered stateless and does not get invalidated unless
788 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
789 // make changes that require GlobalsAA to be invalidated.
790 PA.abandon<GlobalsAA>();
791 return PA;
792}
793
794void MemorySanitizerPass::printPipeline(
795    raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
796  static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
797      OS, MapClassName2PassName);
798 OS << '<';
799 if (Options.Recover)
800 OS << "recover;";
801 if (Options.Kernel)
802 OS << "kernel;";
803 if (Options.EagerChecks)
804 OS << "eager-checks;";
805 OS << "track-origins=" << Options.TrackOrigins;
806 OS << '>';
807}
808
809/// Create a non-const global initialized with the given string.
810///
811/// Creates a writable global for Str so that we can pass it to the
812/// run-time lib. Runtime uses first 4 bytes of the string to store the
813/// frame ID, so the string needs to be mutable.
814static GlobalVariable *createPrivateConstGlobalForString(Module &M,
815                                                         StringRef Str) {
816 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
817 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
818 GlobalValue::PrivateLinkage, StrConst, "");
819}
820
821template <typename... ArgsTy>
822FunctionCallee
823MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
824 ArgsTy... Args) {
825 if (TargetTriple.getArch() == Triple::systemz) {
826 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
827 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
828 std::forward<ArgsTy>(Args)...);
829 }
830
831 return M.getOrInsertFunction(Name, MsanMetadata,
832 std::forward<ArgsTy>(Args)...);
833}
834
835/// Create KMSAN API callbacks.
836void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
837 IRBuilder<> IRB(*C);
838
839 // These will be initialized in insertKmsanPrologue().
840 RetvalTLS = nullptr;
841 RetvalOriginTLS = nullptr;
842 ParamTLS = nullptr;
843 ParamOriginTLS = nullptr;
844 VAArgTLS = nullptr;
845 VAArgOriginTLS = nullptr;
846 VAArgOverflowSizeTLS = nullptr;
847
848 WarningFn = M.getOrInsertFunction("__msan_warning",
849 TLI.getAttrList(C, {0}, /*Signed=*/false),
850 IRB.getVoidTy(), IRB.getInt32Ty());
851
852 // Requests the per-task context state (kmsan_context_state*) from the
853 // runtime library.
854 MsanContextStateTy = StructType::get(
855 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
856 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
858 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
859 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
860 OriginTy);
861 MsanGetContextStateFn =
862 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
863
864 MsanMetadata = StructType::get(PtrTy, PtrTy);
865
866 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
867 std::string name_load =
868 "__msan_metadata_ptr_for_load_" + std::to_string(size);
869 std::string name_store =
870 "__msan_metadata_ptr_for_store_" + std::to_string(size);
871 MsanMetadataPtrForLoad_1_8[ind] =
872 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
873 MsanMetadataPtrForStore_1_8[ind] =
874 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
875 }
876
877 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
878 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
879 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
880 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
881
882 // Functions for poisoning and unpoisoning memory.
883 MsanPoisonAllocaFn = M.getOrInsertFunction(
884 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
885 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
886 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
887}
888
889static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
890  return M.getOrInsertGlobal(Name, Ty, [&] {
891 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
892 nullptr, Name, nullptr,
893                              GlobalVariable::InitialExecTLSModel);
894  });
895}
896
897/// Insert declarations for userspace-specific functions and globals.
898void MemorySanitizer::createUserspaceApi(Module &M,
899 const TargetLibraryInfo &TLI) {
900 IRBuilder<> IRB(*C);
901
902 // Create the callback.
903 // FIXME: this function should have "Cold" calling conv,
904 // which is not yet implemented.
905 if (TrackOrigins) {
906 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
907 : "__msan_warning_with_origin_noreturn";
908 WarningFn = M.getOrInsertFunction(WarningFnName,
909 TLI.getAttrList(C, {0}, /*Signed=*/false),
910 IRB.getVoidTy(), IRB.getInt32Ty());
911 } else {
912 StringRef WarningFnName =
913 Recover ? "__msan_warning" : "__msan_warning_noreturn";
914 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
915 }
916
917 // Create the global TLS variables.
918 RetvalTLS =
919 getOrInsertGlobal(M, "__msan_retval_tls",
920 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
921
922 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
923
924 ParamTLS =
925 getOrInsertGlobal(M, "__msan_param_tls",
926 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
927
928 ParamOriginTLS =
929 getOrInsertGlobal(M, "__msan_param_origin_tls",
930 ArrayType::get(OriginTy, kParamTLSSize / 4));
931
932 VAArgTLS =
933 getOrInsertGlobal(M, "__msan_va_arg_tls",
934 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
935
936 VAArgOriginTLS =
937 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
938 ArrayType::get(OriginTy, kParamTLSSize / 4));
939
940 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
941 IRB.getIntPtrTy(M.getDataLayout()));
942
943 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
944 AccessSizeIndex++) {
945 unsigned AccessSize = 1 << AccessSizeIndex;
946 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
947 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
948 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
949 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
950 MaybeWarningVarSizeFn = M.getOrInsertFunction(
951 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
952 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
953 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
954 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
955 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
956 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
957 IRB.getInt32Ty());
958 }
959
960 MsanSetAllocaOriginWithDescriptionFn =
961 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
962 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
963 MsanSetAllocaOriginNoDescriptionFn =
964 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
965 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
966 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
967 IRB.getVoidTy(), PtrTy, IntptrTy);
968}
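
// For illustration, the runtime-side counterparts of the declarations above
// look roughly like this (a sketch in C; the authoritative definitions live
// in compiler-rt, with kParamTLSSize == kRetvalTLSSize == 800 bytes):
//
//   #include <stdint.h>
//   __thread uint64_t __msan_param_tls[100];         // 800 / 8
//   __thread uint64_t __msan_retval_tls[100];        // 800 / 8
//   __thread uint32_t __msan_param_origin_tls[200];  // 800 / 4
//   __thread uint32_t __msan_retval_origin_tls;
//   void __msan_warning_noreturn(void);
//   void __msan_maybe_warning_4(uint32_t shadow, uint32_t origin);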
969
970/// Insert extern declaration of runtime-provided functions and globals.
971void MemorySanitizer::initializeCallbacks(Module &M,
972 const TargetLibraryInfo &TLI) {
973 // Only do this once.
974 if (CallbacksInitialized)
975 return;
976
977 IRBuilder<> IRB(*C);
978 // Initialize callbacks that are common for kernel and userspace
979 // instrumentation.
980 MsanChainOriginFn = M.getOrInsertFunction(
981 "__msan_chain_origin",
982 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
983 IRB.getInt32Ty());
984 MsanSetOriginFn = M.getOrInsertFunction(
985 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
986 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
987 MemmoveFn =
988 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
989 MemcpyFn =
990 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
991 MemsetFn = M.getOrInsertFunction("__msan_memset",
992 TLI.getAttrList(C, {1}, /*Signed=*/true),
993 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
994
995 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
996 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
997
998 if (CompileKernel) {
999 createKernelApi(M, TLI);
1000 } else {
1001 createUserspaceApi(M, TLI);
1002 }
1003 CallbacksInitialized = true;
1004}
1005
1006FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1007 int size) {
1008 FunctionCallee *Fns =
1009 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1010 switch (size) {
1011 case 1:
1012 return Fns[0];
1013 case 2:
1014 return Fns[1];
1015 case 4:
1016 return Fns[2];
1017 case 8:
1018 return Fns[3];
1019 default:
1020 return nullptr;
1021 }
1022}
1023
1024/// Module-level initialization.
1025///
1026/// inserts a call to __msan_init to the module's constructor list.
1027void MemorySanitizer::initializeModule(Module &M) {
1028 auto &DL = M.getDataLayout();
1029
1030 TargetTriple = M.getTargetTriple();
1031
1032 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1033 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1034 // Check the overrides first
1035 if (ShadowPassed || OriginPassed) {
1036 CustomMapParams.AndMask = ClAndMask;
1037 CustomMapParams.XorMask = ClXorMask;
1038 CustomMapParams.ShadowBase = ClShadowBase;
1039 CustomMapParams.OriginBase = ClOriginBase;
1040 MapParams = &CustomMapParams;
1041 } else {
1042 switch (TargetTriple.getOS()) {
1043 case Triple::FreeBSD:
1044 switch (TargetTriple.getArch()) {
1045 case Triple::aarch64:
1046 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1047 break;
1048 case Triple::x86_64:
1049 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1050 break;
1051 case Triple::x86:
1052 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1053 break;
1054 default:
1055 report_fatal_error("unsupported architecture");
1056 }
1057 break;
1058 case Triple::NetBSD:
1059 switch (TargetTriple.getArch()) {
1060 case Triple::x86_64:
1061 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1062 break;
1063 default:
1064 report_fatal_error("unsupported architecture");
1065 }
1066 break;
1067 case Triple::Linux:
1068 switch (TargetTriple.getArch()) {
1069 case Triple::x86_64:
1070 MapParams = Linux_X86_MemoryMapParams.bits64;
1071 break;
1072 case Triple::x86:
1073 MapParams = Linux_X86_MemoryMapParams.bits32;
1074 break;
1075 case Triple::mips64:
1076 case Triple::mips64el:
1077 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1078 break;
1079 case Triple::ppc64:
1080 case Triple::ppc64le:
1081 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1082 break;
1083 case Triple::systemz:
1084 MapParams = Linux_S390_MemoryMapParams.bits64;
1085 break;
1086 case Triple::aarch64:
1087 case Triple::aarch64_be:
1088 MapParams = Linux_ARM_MemoryMapParams.bits64;
1089 break;
1090    case Triple::loongarch64:
1091      MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1092 break;
1093 default:
1094 report_fatal_error("unsupported architecture");
1095 }
1096 break;
1097 default:
1098 report_fatal_error("unsupported operating system");
1099 }
1100 }
1101
1102 C = &(M.getContext());
1103 IRBuilder<> IRB(*C);
1104 IntptrTy = IRB.getIntPtrTy(DL);
1105 OriginTy = IRB.getInt32Ty();
1106 PtrTy = IRB.getPtrTy();
1107
1108 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1109 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1110
1111 if (!CompileKernel) {
1112 if (TrackOrigins)
1113 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1114 return new GlobalVariable(
1115 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1116 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1117 });
1118
1119 if (Recover)
1120 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1121 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1122 GlobalValue::WeakODRLinkage,
1123 IRB.getInt32(Recover), "__msan_keep_going");
1124 });
1125 }
1126}
1127
1128namespace {
1129
1130/// A helper class that handles instrumentation of VarArg
1131/// functions on a particular platform.
1132///
1133/// Implementations are expected to insert the instrumentation
1134/// necessary to propagate argument shadow through VarArg function
1135/// calls. Visit* methods are called during an InstVisitor pass over
1136/// the function, and should avoid creating new basic blocks. A new
1137/// instance of this class is created for each instrumented function.
1138struct VarArgHelper {
1139 virtual ~VarArgHelper() = default;
1140
1141 /// Visit a CallBase.
1142 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1143
1144 /// Visit a va_start call.
1145 virtual void visitVAStartInst(VAStartInst &I) = 0;
1146
1147 /// Visit a va_copy call.
1148 virtual void visitVACopyInst(VACopyInst &I) = 0;
1149
1150 /// Finalize function instrumentation.
1151 ///
1152 /// This method is called after visiting all interesting (see above)
1153 /// instructions in a function.
1154 virtual void finalizeInstrumentation() = 0;
1155};
1156
1157struct MemorySanitizerVisitor;
1158
1159} // end anonymous namespace
1160
1161static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1162 MemorySanitizerVisitor &Visitor);
1163
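// Maps a (fixed) type size in bits to an index into the per-size callback
// tables (MaybeWarningFn / MaybeStoreOriginFn): <= 8 bits -> 0, 16 -> 1,
// 32 -> 2, 64 -> 3. Larger or scalable sizes yield kNumberOfAccessSizes,
// which callers treat as "no fitting fixed-size callback".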
1164static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1165 if (TS.isScalable())
1166 // Scalable types unconditionally take slowpaths.
1167 return kNumberOfAccessSizes;
1168 unsigned TypeSizeFixed = TS.getFixedValue();
1169 if (TypeSizeFixed <= 8)
1170 return 0;
1171 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1172}
1173
1174namespace {
1175
1176/// Helper class to attach debug information of the given instruction onto new
1177/// instructions inserted after.
1178class NextNodeIRBuilder : public IRBuilder<> {
1179public:
1180 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1181 SetCurrentDebugLocation(IP->getDebugLoc());
1182 }
1183};
1184
1185/// This class does all the work for a given function. Store and Load
1186/// instructions store and load corresponding shadow and origin
1187/// values. Most instructions propagate shadow from arguments to their
1188/// return values. Certain instructions (most importantly, BranchInst)
1189/// test their argument shadow and print reports (with a runtime call) if it's
1190/// non-zero.
1191struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1192 Function &F;
1193 MemorySanitizer &MS;
1194 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1195 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1196 std::unique_ptr<VarArgHelper> VAHelper;
1197 const TargetLibraryInfo *TLI;
1198 Instruction *FnPrologueEnd;
1199 SmallVector<Instruction *, 16> Instructions;
1200
1201 // The following flags disable parts of MSan instrumentation based on
1202 // exclusion list contents and command-line options.
1203 bool InsertChecks;
1204 bool PropagateShadow;
1205 bool PoisonStack;
1206 bool PoisonUndef;
1207 bool PoisonUndefVectors;
1208
1209 struct ShadowOriginAndInsertPoint {
1210 Value *Shadow;
1211 Value *Origin;
1212 Instruction *OrigIns;
1213
1214 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1215 : Shadow(S), Origin(O), OrigIns(I) {}
1216 };
1217  SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1218  DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1219 SmallSetVector<AllocaInst *, 16> AllocaSet;
1221  SmallVector<StoreInst *, 16> StoreList;
1222  int64_t SplittableBlocksCount = 0;
1223
1224 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1225 const TargetLibraryInfo &TLI)
1226 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1227 bool SanitizeFunction =
1228 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1229 InsertChecks = SanitizeFunction;
1230 PropagateShadow = SanitizeFunction;
1231 PoisonStack = SanitizeFunction && ClPoisonStack;
1232 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1233 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1234
1235 // In the presence of unreachable blocks, we may see Phi nodes with
1236 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1237 // blocks, such nodes will not have any shadow value associated with them.
1238 // It's easier to remove unreachable blocks than deal with missing shadow.
1239    removeUnreachableBlocks(F);
1240
1241 MS.initializeCallbacks(*F.getParent(), TLI);
1242 FnPrologueEnd =
1243 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1244 .CreateIntrinsic(Intrinsic::donothing, {});
1245
1246 if (MS.CompileKernel) {
1247 IRBuilder<> IRB(FnPrologueEnd);
1248 insertKmsanPrologue(IRB);
1249 }
1250
1251 LLVM_DEBUG(if (!InsertChecks) dbgs()
1252 << "MemorySanitizer is not inserting checks into '"
1253 << F.getName() << "'\n");
1254 }
1255
1256 bool instrumentWithCalls(Value *V) {
1257 // Constants likely will be eliminated by follow-up passes.
1258 if (isa<Constant>(V))
1259 return false;
1260 ++SplittableBlocksCount;
1261    return ClInstrumentationWithCallThreshold >= 0 &&
1262           SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1263 }
1264
1265 bool isInPrologue(Instruction &I) {
1266 return I.getParent() == FnPrologueEnd->getParent() &&
1267 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1268 }
1269
1270 // Creates a new origin and records the stack trace. In general we can call
1271 // this function for any origin manipulation we like. However it will cost
1272 // runtime resources. So use this wisely only if it can provide additional
1273 // information helpful to a user.
1274 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1275 if (MS.TrackOrigins <= 1)
1276 return V;
1277 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1278 }
1279
1280 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1281 const DataLayout &DL = F.getDataLayout();
1282 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1283 if (IntptrSize == kOriginSize)
1284 return Origin;
1285 assert(IntptrSize == kOriginSize * 2);
1286 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1287 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1288 }
1289
1290 /// Fill memory range with the given origin value.
1291 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1292 TypeSize TS, Align Alignment) {
1293 const DataLayout &DL = F.getDataLayout();
1294 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1295 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1296 assert(IntptrAlignment >= kMinOriginAlignment);
1297 assert(IntptrSize >= kOriginSize);
1298
1299    // Note: The loop-based formulation works for fixed-length vectors too,
1300 // however we prefer to unroll and specialize alignment below.
1301 if (TS.isScalable()) {
1302 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1303 Value *RoundUp =
1304 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1305 Value *End =
1306 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1307 auto [InsertPt, Index] =
1308          SplitBlockAndInsertSimpleForLoop(End, &*IRB.GetInsertPoint());
1309      IRB.SetInsertPoint(InsertPt);
1310
1311 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1312      IRB.CreateAlignedStore(Origin, GEP, kMinOriginAlignment);
1313      return;
1314 }
1315
1316 unsigned Size = TS.getFixedValue();
1317
1318 unsigned Ofs = 0;
1319 Align CurrentAlignment = Alignment;
1320 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1321 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1322 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1323 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1324 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1325 : IntptrOriginPtr;
1326 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1327 Ofs += IntptrSize / kOriginSize;
1328 CurrentAlignment = IntptrAlignment;
1329 }
1330 }
1331
1332 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1333 Value *GEP =
1334 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1335 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1336 CurrentAlignment = kMinOriginAlignment;
1337 }
1338 }
1339
1340 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1341 Value *OriginPtr, Align Alignment) {
1342 const DataLayout &DL = F.getDataLayout();
1343 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1344 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1345 // ZExt cannot convert between vector and scalar
1346 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1347 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1348 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1349 // Origin is not needed: value is initialized or const shadow is
1350 // ignored.
1351 return;
1352 }
1353 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1354 // Copy origin as the value is definitely uninitialized.
1355 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1356 OriginAlignment);
1357 return;
1358 }
1359 // Fallback to runtime check, which still can be optimized out later.
1360 }
1361
1362 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1363 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1364 if (instrumentWithCalls(ConvertedShadow) &&
1365 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1366 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1367 Value *ConvertedShadow2 =
1368 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1369 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1370 CB->addParamAttr(0, Attribute::ZExt);
1371 CB->addParamAttr(2, Attribute::ZExt);
1372 } else {
1373 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1374      Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1375          Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1376 IRBuilder<> IRBNew(CheckTerm);
1377 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1378 OriginAlignment);
1379 }
1380 }
1381
1382 void materializeStores() {
1383 for (StoreInst *SI : StoreList) {
1384 IRBuilder<> IRB(SI);
1385 Value *Val = SI->getValueOperand();
1386 Value *Addr = SI->getPointerOperand();
1387 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1388 Value *ShadowPtr, *OriginPtr;
1389 Type *ShadowTy = Shadow->getType();
1390 const Align Alignment = SI->getAlign();
1391 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1392 std::tie(ShadowPtr, OriginPtr) =
1393 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1394
1395 [[maybe_unused]] StoreInst *NewSI =
1396 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1397 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1398
1399 if (SI->isAtomic())
1400 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1401
1402 if (MS.TrackOrigins && !SI->isAtomic())
1403 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1404 OriginAlignment);
1405 }
1406 }
1407
1408 // Returns true if Debug Location corresponds to multiple warnings.
1409 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1410 if (MS.TrackOrigins < 2)
1411 return false;
1412
1413 if (LazyWarningDebugLocationCount.empty())
1414 for (const auto &I : InstrumentationList)
1415 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1416
1417 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1418 }
1419
1420 /// Helper function to insert a warning at IRB's current insert point.
1421 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1422 if (!Origin)
1423 Origin = (Value *)IRB.getInt32(0);
1424 assert(Origin->getType()->isIntegerTy());
1425
1426 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1427 // Try to create additional origin with debug info of the last origin
1428 // instruction. It may provide additional information to the user.
1429 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1430 assert(MS.TrackOrigins);
1431 auto NewDebugLoc = OI->getDebugLoc();
1432 // Origin update with missing or the same debug location provides no
1433 // additional value.
1434 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1435 // Insert update just before the check, so we call runtime only just
1436 // before the report.
1437 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1438 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1439 Origin = updateOrigin(Origin, IRBOrigin);
1440 }
1441 }
1442 }
1443
1444 if (MS.CompileKernel || MS.TrackOrigins)
1445 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1446 else
1447 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1448 // FIXME: Insert UnreachableInst if !MS.Recover?
1449 // This may invalidate some of the following checks and needs to be done
1450 // at the very end.
1451 }
1452
1453 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1454 Value *Origin) {
1455 const DataLayout &DL = F.getDataLayout();
1456 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1457 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1458 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1459 // ZExt cannot convert between vector and scalar
1460 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1461 Value *ConvertedShadow2 =
1462 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1463
1464 if (SizeIndex < kNumberOfAccessSizes) {
1465 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1466 CallBase *CB = IRB.CreateCall(
1467 Fn,
1468 {ConvertedShadow2,
1469 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1470 CB->addParamAttr(0, Attribute::ZExt);
1471 CB->addParamAttr(1, Attribute::ZExt);
1472 } else {
1473 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1474 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1475 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1476 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1477 CallBase *CB = IRB.CreateCall(
1478 Fn,
1479 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1480 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1481 CB->addParamAttr(1, Attribute::ZExt);
1482 CB->addParamAttr(2, Attribute::ZExt);
1483 }
1484 } else {
1485 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1486      Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1487          Cmp, &*IRB.GetInsertPoint(),
1488 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1489
1490 IRB.SetInsertPoint(CheckTerm);
1491 insertWarningFn(IRB, Origin);
1492 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1493 }
1494 }
1495
1496 void materializeInstructionChecks(
1497 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1498 const DataLayout &DL = F.getDataLayout();
1499 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1500 // correct origin.
1501 bool Combine = !MS.TrackOrigins;
1502 Instruction *Instruction = InstructionChecks.front().OrigIns;
1503 Value *Shadow = nullptr;
1504 for (const auto &ShadowData : InstructionChecks) {
1505 assert(ShadowData.OrigIns == Instruction);
1506 IRBuilder<> IRB(Instruction);
1507
1508 Value *ConvertedShadow = ShadowData.Shadow;
1509
1510 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1511 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1512 // Skip, value is initialized or const shadow is ignored.
1513 continue;
1514 }
1515 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1516 // Report as the value is definitely uninitialized.
1517 insertWarningFn(IRB, ShadowData.Origin);
1518 if (!MS.Recover)
1519            return; // Always fail and stop here, no need to check the rest.
1520          // Skip the entire instruction.
1521 continue;
1522 }
1523 // Fallback to runtime check, which still can be optimized out later.
1524 }
1525
1526 if (!Combine) {
1527 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1528 continue;
1529 }
1530
1531 if (!Shadow) {
1532 Shadow = ConvertedShadow;
1533 continue;
1534 }
1535
1536 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1537 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1538 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1539 }
1540
1541 if (Shadow) {
1542 assert(Combine);
1543 IRBuilder<> IRB(Instruction);
1544 materializeOneCheck(IRB, Shadow, nullptr);
1545 }
1546 }
1547
1548 static bool isAArch64SVCount(Type *Ty) {
1549 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1550 return TTy->getName() == "aarch64.svcount";
1551 return false;
1552 }
1553
1554 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1555 // 'target("aarch64.svcount")'), but not e.g. <vscale x 4 x i32>.
1556 static bool isScalableNonVectorType(Type *Ty) {
1557 if (!isAArch64SVCount(Ty))
1558 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1559 << "\n");
1560
1561 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1562 }
1563
1564 void materializeChecks() {
1565#ifndef NDEBUG
1566 // For assert below.
1567 SmallPtrSet<Instruction *, 16> Done;
1568#endif
1569
1570 for (auto I = InstrumentationList.begin();
1571 I != InstrumentationList.end();) {
1572 auto OrigIns = I->OrigIns;
1573 // Checks are grouped by the original instruction; all checks queued for
1574 // an instruction via `insertCheckShadow` are materialized at once.
1575 assert(Done.insert(OrigIns).second);
1576 auto J = std::find_if(I + 1, InstrumentationList.end(),
1577 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1578 return OrigIns != R.OrigIns;
1579 });
1580 // Process all checks of instruction at once.
1581 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1582 I = J;
1583 }
1584
1585 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1586 }
1587
1588 // Inserts the KMSAN prologue that loads the per-task context state.
1589 void insertKmsanPrologue(IRBuilder<> &IRB) {
1590 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1591 Constant *Zero = IRB.getInt32(0);
1592 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1593 {Zero, IRB.getInt32(0)}, "param_shadow");
1594 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1595 {Zero, IRB.getInt32(1)}, "retval_shadow");
1596 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1597 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1598 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1599 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1600 MS.VAArgOverflowSizeTLS =
1601 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1602 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1603 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1604 {Zero, IRB.getInt32(5)}, "param_origin");
1605 MS.RetvalOriginTLS =
1606 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1607 {Zero, IRB.getInt32(6)}, "retval_origin");
1608 if (MS.TargetTriple.getArch() == Triple::systemz)
1609 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1610 }
1611
1612 /// Add MemorySanitizer instrumentation to a function.
1613 bool runOnFunction() {
1614 // Iterate all BBs in depth-first order and create shadow instructions
1615 // for all instructions (where applicable).
1616 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1617 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1618 visit(*BB);
1619
1620 // `visit` above only collects instructions. Process them after iterating the
1621 // CFG to avoid depending on CFG transformations.
1622 for (Instruction *I : Instructions)
1623 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1624
1625 // Finalize PHI nodes.
1626 for (PHINode *PN : ShadowPHINodes) {
1627 PHINode *PNS = cast<PHINode>(getShadow(PN));
1628 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1629 size_t NumValues = PN->getNumIncomingValues();
1630 for (size_t v = 0; v < NumValues; v++) {
1631 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1632 if (PNO)
1633 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1634 }
1635 }
1636
1637 VAHelper->finalizeInstrumentation();
1638
1639 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1640 // instrumenting only allocas.
1641 if (InstrumentLifetimeStart) {
1642 for (auto Item : LifetimeStartList) {
1643 instrumentAlloca(*Item.second, Item.first);
1644 AllocaSet.remove(Item.second);
1645 }
1646 }
1647 // Poison the allocas for which we didn't instrument the corresponding
1648 // lifetime intrinsics.
1649 for (AllocaInst *AI : AllocaSet)
1650 instrumentAlloca(*AI);
1651
1652 // Insert shadow value checks.
1653 materializeChecks();
1654
1655 // Delayed instrumentation of StoreInst.
1656 // This may not add new address checks.
1657 materializeStores();
1658
1659 return true;
1660 }
1661
1662 /// Compute the shadow type that corresponds to a given Value.
1663 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1664
1665 /// Compute the shadow type that corresponds to a given Type.
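/// Illustrative examples of the mapping below (added for clarity; not part of
/// the original comment): i32 -> i32, <4 x float> -> <4 x i32>,
/// [2 x i64] -> [2 x i64], {i64, float} -> {i64, i32}, and ptr -> i64 on a
/// 64-bit target. Unsized types get a null shadow type.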
1666 Type *getShadowTy(Type *OrigTy) {
1667 if (!OrigTy->isSized()) {
1668 return nullptr;
1669 }
1670 // For integer type, shadow is the same as the original type.
1671 // This may return weird-sized types like i1.
1672 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1673 return IT;
1674 const DataLayout &DL = F.getDataLayout();
1675 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1676 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1677 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1678 VT->getElementCount());
1679 }
1680 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1681 return ArrayType::get(getShadowTy(AT->getElementType()),
1682 AT->getNumElements());
1683 }
1684 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1685 SmallVector<Type *, 4> Elements;
1686 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1687 Elements.push_back(getShadowTy(ST->getElementType(i)));
1688 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1689 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1690 return Res;
1691 }
1692 if (isScalableNonVectorType(OrigTy)) {
1693 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1694 << "\n");
1695 return OrigTy;
1696 }
1697
1698 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1699 return IntegerType::get(*MS.C, TypeSize);
1700 }
1701
1702 /// Extract combined shadow of struct elements as a bool
1703 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1704 IRBuilder<> &IRB) {
1705 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1706 Value *Aggregator = FalseVal;
1707
1708 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1709 // Combine by ORing together each element's bool shadow
1710 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1711 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1712
1713 if (Aggregator != FalseVal)
1714 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1715 else
1716 Aggregator = ShadowBool;
1717 }
1718
1719 return Aggregator;
1720 }
1721
1722 // Extract combined shadow of array elements
1723 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1724 IRBuilder<> &IRB) {
1725 if (!Array->getNumElements())
1726 return IRB.getIntN(/* width */ 1, /* value */ 0);
1727
1728 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1729 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1730
1731 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1732 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1733 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1734 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1735 }
1736 return Aggregator;
1737 }
1738
1739 /// Convert a shadow value to its flattened variant. The resulting
1740 /// shadow may not necessarily have the same bit width as the input
1741 /// value, but it will always be comparable to zero.
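/// For example (illustrative, not from the original comment): a <4 x i32>
/// shadow is bitcast to i128, a [3 x i32] shadow is OR-reduced element by
/// element to an i32, and a scalable-vector shadow is first collapsed with an
/// OR reduction (llvm.vector.reduce.or).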
1742 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1743 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1744 return collapseStructShadow(Struct, V, IRB);
1745 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1746 return collapseArrayShadow(Array, V, IRB);
1747 if (isa<VectorType>(V->getType())) {
1748 if (isa<ScalableVectorType>(V->getType()))
1749 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1750 unsigned BitWidth =
1751 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1752 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1753 }
1754 return V;
1755 }
1756
1757 // Convert a scalar value to an i1 by comparing with 0
1758 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1759 Type *VTy = V->getType();
1760 if (!VTy->isIntegerTy())
1761 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1762 if (VTy->getIntegerBitWidth() == 1)
1763 // Just converting a bool to a bool, so do nothing.
1764 return V;
1765 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1766 }
1767
1768 Type *ptrToIntPtrType(Type *PtrTy) const {
1769 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1770 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1771 VectTy->getElementCount());
1772 }
1773 assert(PtrTy->isIntOrPtrTy());
1774 return MS.IntptrTy;
1775 }
1776
1777 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1778 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1779 return VectorType::get(
1780 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1781 VectTy->getElementCount());
1782 }
1783 assert(IntPtrTy == MS.IntptrTy);
1784 return MS.PtrTy;
1785 }
1786
1787 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1788 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1789 return ConstantVector::getSplat(
1790 VectTy->getElementCount(),
1791 constToIntPtr(VectTy->getElementType(), C));
1792 }
1793 assert(IntPtrTy == MS.IntptrTy);
1794 // TODO: Avoid implicit trunc?
1795 // See https://github.com/llvm/llvm-project/issues/112510.
1796 return ConstantInt::get(MS.IntptrTy, C, /*IsSigned=*/false,
1797 /*ImplicitTrunc=*/true);
1798 }
1799
1800 /// Returns the integer shadow offset that corresponds to a given
1801 /// application address, whereby:
1802 ///
1803 /// Offset = (Addr & ~AndMask) ^ XorMask
1804 /// Shadow = ShadowBase + Offset
1805 /// Origin = (OriginBase + Offset) & ~Alignment
1806 ///
1807 /// Note: for efficiency, many shadow mappings only require the XorMask
1808 /// and OriginBase; the AndMask and ShadowBase are often zero.
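/// Illustrative worked example (made-up address and mask, not tied to any
/// particular platform configuration): with AndMask == 0 and
/// XorMask == 0x500000000000, the application address 0x7fff80001000 maps to
/// Offset = 0x7fff80001000 ^ 0x500000000000 = 0x2fff80001000; with
/// ShadowBase == 0 that offset is also the shadow address.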
1809 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1810 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1811 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1812
1813 if (uint64_t AndMask = MS.MapParams->AndMask)
1814 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1815
1816 if (uint64_t XorMask = MS.MapParams->XorMask)
1817 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1818 return OffsetLong;
1819 }
1820
1821 /// Compute the shadow and origin addresses corresponding to a given
1822 /// application address.
1823 ///
1824 /// Shadow = ShadowBase + Offset
1825 /// Origin = (OriginBase + Offset) & ~3ULL
1826 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1827 /// a single pointee.
1828 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1829 std::pair<Value *, Value *>
1830 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1831 MaybeAlign Alignment) {
1832 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1833 if (!VectTy) {
1834 assert(Addr->getType()->isPointerTy());
1835 } else {
1836 assert(VectTy->getElementType()->isPointerTy());
1837 }
1838 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1839 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1840 Value *ShadowLong = ShadowOffset;
1841 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1842 ShadowLong =
1843 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1844 }
1845 Value *ShadowPtr = IRB.CreateIntToPtr(
1846 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1847
1848 Value *OriginPtr = nullptr;
1849 if (MS.TrackOrigins) {
1850 Value *OriginLong = ShadowOffset;
1851 uint64_t OriginBase = MS.MapParams->OriginBase;
1852 if (OriginBase != 0)
1853 OriginLong =
1854 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1855 if (!Alignment || *Alignment < kMinOriginAlignment) {
1856 uint64_t Mask = kMinOriginAlignment.value() - 1;
1857 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1858 }
1859 OriginPtr = IRB.CreateIntToPtr(
1860 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1861 }
1862 return std::make_pair(ShadowPtr, OriginPtr);
1863 }
1864
1865 template <typename... ArgsTy>
1866 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1867 ArgsTy... Args) {
1868 if (MS.TargetTriple.getArch() == Triple::systemz) {
1869 IRB.CreateCall(Callee,
1870 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1871 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1872 }
1873
1874 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1875 }
1876
1877 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1878 IRBuilder<> &IRB,
1879 Type *ShadowTy,
1880 bool isStore) {
1881 Value *ShadowOriginPtrs;
1882 const DataLayout &DL = F.getDataLayout();
1883 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1884
1885 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1886 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1887 if (Getter) {
1888 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1889 } else {
1890 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1891 ShadowOriginPtrs = createMetadataCall(
1892 IRB,
1893 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1894 AddrCast, SizeVal);
1895 }
1896 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1897 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1898 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1899
1900 return std::make_pair(ShadowPtr, OriginPtr);
1901 }
1902
1903 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1904 /// a single pointee.
1905 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1906 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1907 IRBuilder<> &IRB,
1908 Type *ShadowTy,
1909 bool isStore) {
1910 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1911 if (!VectTy) {
1912 assert(Addr->getType()->isPointerTy());
1913 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1914 }
1915
1916 // TODO: Support callbacks with vectors of addresses.
1917 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1918 Value *ShadowPtrs = ConstantInt::getNullValue(
1919 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1920 Value *OriginPtrs = nullptr;
1921 if (MS.TrackOrigins)
1922 OriginPtrs = ConstantInt::getNullValue(
1923 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1924 for (unsigned i = 0; i < NumElements; ++i) {
1925 Value *OneAddr =
1926 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1927 auto [ShadowPtr, OriginPtr] =
1928 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1929
1930 ShadowPtrs = IRB.CreateInsertElement(
1931 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1932 if (MS.TrackOrigins)
1933 OriginPtrs = IRB.CreateInsertElement(
1934 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1935 }
1936 return {ShadowPtrs, OriginPtrs};
1937 }
1938
1939 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1940 Type *ShadowTy,
1941 MaybeAlign Alignment,
1942 bool isStore) {
1943 if (MS.CompileKernel)
1944 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1945 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1946 }
1947
1948 /// Compute the shadow address for a given function argument.
1949 ///
1950 /// Shadow = ParamTLS+ArgOffset.
1951 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1952 return IRB.CreatePtrAdd(MS.ParamTLS,
1953 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1954 }
1955
1956 /// Compute the origin address for a given function argument.
1957 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1958 if (!MS.TrackOrigins)
1959 return nullptr;
1960 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1961 ConstantInt::get(MS.IntptrTy, ArgOffset),
1962 "_msarg_o");
1963 }
1964
1965 /// Compute the shadow address for a retval.
1966 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1967 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1968 }
1969
1970 /// Compute the origin address for a retval.
1971 Value *getOriginPtrForRetval() {
1972 // We keep a single origin for the entire retval. Might be too optimistic.
1973 return MS.RetvalOriginTLS;
1974 }
1975
1976 /// Set SV to be the shadow value for V.
1977 void setShadow(Value *V, Value *SV) {
1978 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1979 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1980 }
1981
1982 /// Set Origin to be the origin value for V.
1983 void setOrigin(Value *V, Value *Origin) {
1984 if (!MS.TrackOrigins)
1985 return;
1986 assert(!OriginMap.count(V) && "Values may only have one origin");
1987 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1988 OriginMap[V] = Origin;
1989 }
1990
1991 Constant *getCleanShadow(Type *OrigTy) {
1992 Type *ShadowTy = getShadowTy(OrigTy);
1993 if (!ShadowTy)
1994 return nullptr;
1995 return Constant::getNullValue(ShadowTy);
1996 }
1997
1998 /// Create a clean shadow value for a given value.
1999 ///
2000 /// Clean shadow (all zeroes) means all bits of the value are defined
2001 /// (initialized).
2002 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2003
2004 /// Create a dirty shadow of a given shadow type.
2005 Constant *getPoisonedShadow(Type *ShadowTy) {
2006 assert(ShadowTy);
2007 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2008 return Constant::getAllOnesValue(ShadowTy);
2009 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2010 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2011 getPoisonedShadow(AT->getElementType()));
2012 return ConstantArray::get(AT, Vals);
2013 }
2014 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2015 SmallVector<Constant *, 4> Vals;
2016 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2017 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2018 return ConstantStruct::get(ST, Vals);
2019 }
2020 llvm_unreachable("Unexpected shadow type");
2021 }
2022
2023 /// Create a dirty shadow for a given value.
2024 Constant *getPoisonedShadow(Value *V) {
2025 Type *ShadowTy = getShadowTy(V);
2026 if (!ShadowTy)
2027 return nullptr;
2028 return getPoisonedShadow(ShadowTy);
2029 }
2030
2031 /// Create a clean (zero) origin.
2032 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2033
2034 /// Get the shadow value for a given Value.
2035 ///
2036 /// This function either returns the value set earlier with setShadow,
2037 /// or extracts it from ParamTLS (for function arguments).
2038 Value *getShadow(Value *V) {
2039 if (Instruction *I = dyn_cast<Instruction>(V)) {
2040 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2041 return getCleanShadow(V);
2042 // For instructions the shadow is already stored in the map.
2043 Value *Shadow = ShadowMap[V];
2044 if (!Shadow) {
2045 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2046 assert(Shadow && "No shadow for a value");
2047 }
2048 return Shadow;
2049 }
2050 // Handle fully undefined values
2051 // (partially undefined constant vectors are handled later)
2052 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2053 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2054 : getCleanShadow(V);
2055 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2056 return AllOnes;
2057 }
2058 if (Argument *A = dyn_cast<Argument>(V)) {
2059 // For arguments we compute the shadow on demand and store it in the map.
2060 Value *&ShadowPtr = ShadowMap[V];
2061 if (ShadowPtr)
2062 return ShadowPtr;
2063 Function *F = A->getParent();
2064 IRBuilder<> EntryIRB(FnPrologueEnd);
2065 unsigned ArgOffset = 0;
2066 const DataLayout &DL = F->getDataLayout();
2067 for (auto &FArg : F->args()) {
2068 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2069 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2070 ? "vscale not fully supported\n"
2071 : "Arg is not sized\n"));
2072 if (A == &FArg) {
2073 ShadowPtr = getCleanShadow(V);
2074 setOrigin(A, getCleanOrigin());
2075 break;
2076 }
2077 continue;
2078 }
2079
2080 unsigned Size = FArg.hasByValAttr()
2081 ? DL.getTypeAllocSize(FArg.getParamByValType())
2082 : DL.getTypeAllocSize(FArg.getType());
2083
2084 if (A == &FArg) {
2085 bool Overflow = ArgOffset + Size > kParamTLSSize;
2086 if (FArg.hasByValAttr()) {
2087 // ByVal pointer itself has clean shadow. We copy the actual
2088 // argument shadow to the underlying memory.
2089 // Figure out maximal valid memcpy alignment.
2090 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2091 FArg.getParamAlign(), FArg.getParamByValType());
2092 Value *CpShadowPtr, *CpOriginPtr;
2093 std::tie(CpShadowPtr, CpOriginPtr) =
2094 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2095 /*isStore*/ true);
2096 if (!PropagateShadow || Overflow) {
2097 // ParamTLS overflow.
2098 EntryIRB.CreateMemSet(
2099 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2100 Size, ArgAlign);
2101 } else {
2102 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2103 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2104 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2105 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2106 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2107
2108 if (MS.TrackOrigins) {
2109 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2110 // FIXME: OriginSize should be:
2111 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2112 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2113 EntryIRB.CreateMemCpy(
2114 CpOriginPtr,
2115 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2116 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2117 OriginSize);
2118 }
2119 }
2120 }
2121
2122 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2123 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2124 ShadowPtr = getCleanShadow(V);
2125 setOrigin(A, getCleanOrigin());
2126 } else {
2127 // Shadow over TLS
2128 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2129 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2130 kShadowTLSAlignment);
2131 if (MS.TrackOrigins) {
2132 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2133 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2134 }
2135 }
2137 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2138 break;
2139 }
2140
2141 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2142 }
2143 assert(ShadowPtr && "Could not find shadow for an argument");
2144 return ShadowPtr;
2145 }
2146
2147 // Check for partially-undefined constant vectors
2148 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2149 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2150 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2151 PoisonUndefVectors) {
2152 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2153 SmallVector<Constant *, 32> ShadowVector(NumElems);
2154 for (unsigned i = 0; i != NumElems; ++i) {
2155 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2156 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2157 : getCleanShadow(Elem);
2158 }
2159
2160 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2161 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2162 << *ShadowConstant << "\n");
2163
2164 return ShadowConstant;
2165 }
2166
2167 // TODO: partially-undefined constant arrays, structures, and nested types
2168
2169 // For everything else the shadow is zero.
2170 return getCleanShadow(V);
2171 }
2172
2173 /// Get the shadow for i-th argument of the instruction I.
2174 Value *getShadow(Instruction *I, int i) {
2175 return getShadow(I->getOperand(i));
2176 }
2177
2178 /// Get the origin for a value.
2179 Value *getOrigin(Value *V) {
2180 if (!MS.TrackOrigins)
2181 return nullptr;
2182 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2183 return getCleanOrigin();
2185 "Unexpected value type in getOrigin()");
2186 if (Instruction *I = dyn_cast<Instruction>(V)) {
2187 if (I->getMetadata(LLVMContext::MD_nosanitize))
2188 return getCleanOrigin();
2189 }
2190 Value *Origin = OriginMap[V];
2191 assert(Origin && "Missing origin");
2192 return Origin;
2193 }
2194
2195 /// Get the origin for i-th argument of the instruction I.
2196 Value *getOrigin(Instruction *I, int i) {
2197 return getOrigin(I->getOperand(i));
2198 }
2199
2200 /// Remember the place where a shadow check should be inserted.
2201 ///
2202 /// This location will be later instrumented with a check that will print a
2203 /// UMR warning at runtime if the shadow value is not 0.
2204 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2205 assert(Shadow);
2206 if (!InsertChecks)
2207 return;
2208
2209 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2210 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2211 << *OrigIns << "\n");
2212 return;
2213 }
2214
2215 Type *ShadowTy = Shadow->getType();
2216 if (isScalableNonVectorType(ShadowTy)) {
2217 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2218 << " before " << *OrigIns << "\n");
2219 return;
2220 }
2221#ifndef NDEBUG
2222 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2223 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2224 "Can only insert checks for integer, vector, and aggregate shadow "
2225 "types");
2226#endif
2227 InstrumentationList.push_back(
2228 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2229 }
2230
2231 /// Get shadow for value, and remember the place where a shadow check should
2232 /// be inserted.
2233 ///
2234 /// This location will be later instrumented with a check that will print a
2235 /// UMR warning at runtime if the value is not fully defined.
2236 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2237 assert(Val);
2238 Value *Shadow, *Origin;
2239 if (ClCheckConstantShadow) {
2240 Shadow = getShadow(Val);
2241 if (!Shadow)
2242 return;
2243 Origin = getOrigin(Val);
2244 } else {
2245 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2246 if (!Shadow)
2247 return;
2248 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2249 }
2250 insertCheckShadow(Shadow, Origin, OrigIns);
2251 }
2252
2253 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2254 switch (a) {
2255 case AtomicOrdering::NotAtomic:
2256 return AtomicOrdering::NotAtomic;
2257 case AtomicOrdering::Unordered:
2258 case AtomicOrdering::Monotonic:
2259 case AtomicOrdering::Release:
2260 return AtomicOrdering::Release;
2261 case AtomicOrdering::Acquire:
2262 case AtomicOrdering::AcquireRelease:
2263 return AtomicOrdering::AcquireRelease;
2264 case AtomicOrdering::SequentiallyConsistent:
2265 return AtomicOrdering::SequentiallyConsistent;
2266 }
2267 llvm_unreachable("Unknown ordering");
2268 }
2269
2270 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2271 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2272 uint32_t OrderingTable[NumOrderings] = {};
2273
2274 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2275 OrderingTable[(int)AtomicOrderingCABI::release] =
2276 (int)AtomicOrderingCABI::release;
2277 OrderingTable[(int)AtomicOrderingCABI::consume] =
2278 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2279 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2280 (int)AtomicOrderingCABI::acq_rel;
2281 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2282 (int)AtomicOrderingCABI::seq_cst;
2283
2284 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2285 }
2286
2287 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2288 switch (a) {
2289 case AtomicOrdering::NotAtomic:
2290 return AtomicOrdering::NotAtomic;
2291 case AtomicOrdering::Unordered:
2292 case AtomicOrdering::Monotonic:
2293 case AtomicOrdering::Acquire:
2294 return AtomicOrdering::Acquire;
2295 case AtomicOrdering::Release:
2296 case AtomicOrdering::AcquireRelease:
2297 return AtomicOrdering::AcquireRelease;
2298 case AtomicOrdering::SequentiallyConsistent:
2299 return AtomicOrdering::SequentiallyConsistent;
2300 }
2301 llvm_unreachable("Unknown ordering");
2302 }
2303
2304 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2305 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2306 uint32_t OrderingTable[NumOrderings] = {};
2307
2308 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2309 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2310 OrderingTable[(int)AtomicOrderingCABI::consume] =
2311 (int)AtomicOrderingCABI::acquire;
2312 OrderingTable[(int)AtomicOrderingCABI::release] =
2313 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2314 (int)AtomicOrderingCABI::acq_rel;
2315 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2316 (int)AtomicOrderingCABI::seq_cst;
2317
2318 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2319 }
2320
2321 // ------------------- Visitors.
2322 using InstVisitor<MemorySanitizerVisitor>::visit;
2323 void visit(Instruction &I) {
2324 if (I.getMetadata(LLVMContext::MD_nosanitize))
2325 return;
2326 // Don't want to visit if we're in the prologue
2327 if (isInPrologue(I))
2328 return;
2329 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2330 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2331 // We still need to set the shadow and origin to clean values.
2332 setShadow(&I, getCleanShadow(&I));
2333 setOrigin(&I, getCleanOrigin());
2334 return;
2335 }
2336
2337 Instructions.push_back(&I);
2338 }
2339
2340 /// Instrument LoadInst
2341 ///
2342 /// Loads the corresponding shadow and (optionally) origin.
2343 /// Optionally, checks that the load address is fully defined.
2344 void visitLoadInst(LoadInst &I) {
2345 assert(I.getType()->isSized() && "Load type must have size");
2346 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2347 NextNodeIRBuilder IRB(&I);
2348 Type *ShadowTy = getShadowTy(&I);
2349 Value *Addr = I.getPointerOperand();
2350 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2351 const Align Alignment = I.getAlign();
2352 if (PropagateShadow) {
2353 std::tie(ShadowPtr, OriginPtr) =
2354 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2355 setShadow(&I,
2356 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2357 } else {
2358 setShadow(&I, getCleanShadow(&I));
2359 }
2360
2361 if (ClCheckAccessAddress)
2362 insertCheckShadowOf(I.getPointerOperand(), &I);
2363
2364 if (I.isAtomic())
2365 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2366
2367 if (MS.TrackOrigins) {
2368 if (PropagateShadow) {
2369 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2370 setOrigin(
2371 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2372 } else {
2373 setOrigin(&I, getCleanOrigin());
2374 }
2375 }
2376 }
2377
2378 /// Instrument StoreInst
2379 ///
2380 /// Stores the corresponding shadow and (optionally) origin.
2381 /// Optionally, checks that the store address is fully defined.
2382 void visitStoreInst(StoreInst &I) {
2383 StoreList.push_back(&I);
2384 if (ClCheckAccessAddress)
2385 insertCheckShadowOf(I.getPointerOperand(), &I);
2386 }
2387
2388 void handleCASOrRMW(Instruction &I) {
2389 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2390
2391 IRBuilder<> IRB(&I);
2392 Value *Addr = I.getOperand(0);
2393 Value *Val = I.getOperand(1);
2394 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2395 /*isStore*/ true)
2396 .first;
2397
2398 if (ClCheckAccessAddress)
2399 insertCheckShadowOf(Addr, &I);
2400
2401 // Only test the conditional argument of cmpxchg instruction.
2402 // The other argument can potentially be uninitialized, but we cannot
2403 // detect this situation reliably without possible false positives.
2404 if (isa<AtomicCmpXchgInst>(I))
2405 insertCheckShadowOf(Val, &I);
2406
2407 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2408
2409 setShadow(&I, getCleanShadow(&I));
2410 setOrigin(&I, getCleanOrigin());
2411 }
2412
2413 void visitAtomicRMWInst(AtomicRMWInst &I) {
2414 handleCASOrRMW(I);
2415 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2416 }
2417
2418 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2419 handleCASOrRMW(I);
2420 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2421 }
2422
2423 // Vector manipulation.
2424 void visitExtractElementInst(ExtractElementInst &I) {
2425 insertCheckShadowOf(I.getOperand(1), &I);
2426 IRBuilder<> IRB(&I);
2427 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2428 "_msprop"));
2429 setOrigin(&I, getOrigin(&I, 0));
2430 }
2431
2432 void visitInsertElementInst(InsertElementInst &I) {
2433 insertCheckShadowOf(I.getOperand(2), &I);
2434 IRBuilder<> IRB(&I);
2435 auto *Shadow0 = getShadow(&I, 0);
2436 auto *Shadow1 = getShadow(&I, 1);
2437 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2438 "_msprop"));
2439 setOriginForNaryOp(I);
2440 }
2441
2442 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2443 IRBuilder<> IRB(&I);
2444 auto *Shadow0 = getShadow(&I, 0);
2445 auto *Shadow1 = getShadow(&I, 1);
2446 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2447 "_msprop"));
2448 setOriginForNaryOp(I);
2449 }
2450
2451 // Casts.
2452 void visitSExtInst(SExtInst &I) {
2453 IRBuilder<> IRB(&I);
2454 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2455 setOrigin(&I, getOrigin(&I, 0));
2456 }
2457
2458 void visitZExtInst(ZExtInst &I) {
2459 IRBuilder<> IRB(&I);
2460 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2461 setOrigin(&I, getOrigin(&I, 0));
2462 }
2463
2464 void visitTruncInst(TruncInst &I) {
2465 IRBuilder<> IRB(&I);
2466 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2467 setOrigin(&I, getOrigin(&I, 0));
2468 }
2469
2470 void visitBitCastInst(BitCastInst &I) {
2471 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2472 // a musttail call and a ret, don't instrument. New instructions are not
2473 // allowed after a musttail call.
2474 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2475 if (CI->isMustTailCall())
2476 return;
2477 IRBuilder<> IRB(&I);
2478 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2479 setOrigin(&I, getOrigin(&I, 0));
2480 }
2481
2482 void visitPtrToIntInst(PtrToIntInst &I) {
2483 IRBuilder<> IRB(&I);
2484 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2485 "_msprop_ptrtoint"));
2486 setOrigin(&I, getOrigin(&I, 0));
2487 }
2488
2489 void visitIntToPtrInst(IntToPtrInst &I) {
2490 IRBuilder<> IRB(&I);
2491 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2492 "_msprop_inttoptr"));
2493 setOrigin(&I, getOrigin(&I, 0));
2494 }
2495
2496 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2497 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2498 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2499 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2500 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2501 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2502
2503 /// Generic handler to compute shadow for bitwise AND.
2504 ///
2505 /// This is used by 'visitAnd' but also as a primitive for other handlers.
2506 ///
2507 /// This code is precise: it implements the rule that "And" of an initialized
2508 /// zero bit always results in an initialized value:
2509 // 1&1 => 1; 0&1 => 0; p&1 => p;
2510 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2511 // 1&p => p; 0&p => 0; p&p => p;
2512 //
2513 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
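//
// Illustrative worked example (added for clarity): with 4-bit values
// V1 = 0b1010, S1 = 0b0001 (the low bit of V1 is unknown) and V2 = 0b0010,
// S2 = 0b0000 (V2 fully initialized):
//   S = (0b0001 & 0b0000) | (0b1010 & 0b0000) | (0b0001 & 0b0010) = 0b0000,
// i.e. the result is fully initialized because the only unknown bit of V1 is
// ANDed with a known-zero bit of V2.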
2514 Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
2515 Value *S2) {
2516 Value *S1S2 = IRB.CreateAnd(S1, S2);
2517 Value *V1S2 = IRB.CreateAnd(V1, S2);
2518 Value *S1V2 = IRB.CreateAnd(S1, V2);
2519
2520 if (V1->getType() != S1->getType()) {
2521 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2522 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2523 }
2524
2525 return IRB.CreateOr({S1S2, V1S2, S1V2});
2526 }
2527
2528 /// Handler for bitwise AND operator.
2529 void visitAnd(BinaryOperator &I) {
2530 IRBuilder<> IRB(&I);
2531 Value *V1 = I.getOperand(0);
2532 Value *V2 = I.getOperand(1);
2533 Value *S1 = getShadow(&I, 0);
2534 Value *S2 = getShadow(&I, 1);
2535
2536 Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);
2537
2538 setShadow(&I, OutShadow);
2539 setOriginForNaryOp(I);
2540 }
2541
2542 void visitOr(BinaryOperator &I) {
2543 IRBuilder<> IRB(&I);
2544 // "Or" of 1 and a poisoned value results in unpoisoned value:
2545 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2546 // 1|0 => 1; 0|0 => 0; p|0 => p;
2547 // 1|p => 1; 0|p => p; p|p => p;
2548 //
2549 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2550 //
2551 // If the "disjoint OR" property is violated, the result is poison, and
2552 // hence the entire shadow is uninitialized:
2553 // S = S | SignExt(V1 & V2 != 0)
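//
// Illustrative worked example (added for clarity): with 4-bit values
// V1 = 0b1100, S1 = 0b0000 and V2 = 0b0110, S2 = 0b0011 (the two low bits of
// V2 are unknown):
//   S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2) = 0 | (0b0011 & 0b0011) | 0 = 0b0011,
// i.e. the unknown bits of V2 stay unknown wherever V1 has a known 0, and
// would be masked out wherever V1 has a known 1.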
2554 Value *S1 = getShadow(&I, 0);
2555 Value *S2 = getShadow(&I, 1);
2556 Value *V1 = I.getOperand(0);
2557 Value *V2 = I.getOperand(1);
2558 if (V1->getType() != S1->getType()) {
2559 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2560 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2561 }
2562
2563 Value *NotV1 = IRB.CreateNot(V1);
2564 Value *NotV2 = IRB.CreateNot(V2);
2565
2566 Value *S1S2 = IRB.CreateAnd(S1, S2);
2567 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2568 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2569
2570 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2571
2572 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2573 Value *V1V2 = IRB.CreateAnd(V1, V2);
2574 Value *DisjointOrShadow = IRB.CreateSExt(
2575 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2576 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2577 }
2578
2579 setShadow(&I, S);
2580 setOriginForNaryOp(I);
2581 }
2582
2583 /// Default propagation of shadow and/or origin.
2584 ///
2585 /// This class implements the general case of shadow propagation, used in all
2586 /// cases where we don't know and/or don't care about what the operation
2587 /// actually does. It converts all input shadow values to a common type
2588 /// (extending or truncating as necessary), and bitwise OR's them.
2589 ///
2590 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2591 /// fully initialized), and less prone to false positives.
2592 ///
2593 /// This class also implements the general case of origin propagation. For a
2594 /// Nary operation, result origin is set to the origin of an argument that is
2595 /// not entirely initialized. If there is more than one such arguments, the
2596 /// rightmost of them is picked. It does not matter which one is picked if all
2597 /// arguments are initialized.
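///
/// A minimal sketch of the effect (illustrative, not from the original
/// comment): for %r = fadd float %a, %b, the combined shadow is roughly
/// Sr = Sa | Sb after casting both to a common shadow type, and with origin
/// tracking the origin becomes Or = (Sb != 0) ? Ob : Oa, i.e. the rightmost
/// not-fully-initialized argument wins.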
2598 template <bool CombineShadow> class Combiner {
2599 Value *Shadow = nullptr;
2600 Value *Origin = nullptr;
2601 IRBuilder<> &IRB;
2602 MemorySanitizerVisitor *MSV;
2603
2604 public:
2605 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2606 : IRB(IRB), MSV(MSV) {}
2607
2608 /// Add a pair of shadow and origin values to the mix.
2609 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2610 if (CombineShadow) {
2611 assert(OpShadow);
2612 if (!Shadow)
2613 Shadow = OpShadow;
2614 else {
2615 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2616 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2617 }
2618 }
2619
2620 if (MSV->MS.TrackOrigins) {
2621 assert(OpOrigin);
2622 if (!Origin) {
2623 Origin = OpOrigin;
2624 } else {
2625 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2626 // No point in adding something that might result in 0 origin value.
2627 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2628 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2629 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2630 }
2631 }
2632 }
2633 return *this;
2634 }
2635
2636 /// Add an application value to the mix.
2637 Combiner &Add(Value *V) {
2638 Value *OpShadow = MSV->getShadow(V);
2639 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2640 return Add(OpShadow, OpOrigin);
2641 }
2642
2643 /// Set the current combined values as the given instruction's shadow
2644 /// and origin.
2645 void Done(Instruction *I) {
2646 if (CombineShadow) {
2647 assert(Shadow);
2648 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2649 MSV->setShadow(I, Shadow);
2650 }
2651 if (MSV->MS.TrackOrigins) {
2652 assert(Origin);
2653 MSV->setOrigin(I, Origin);
2654 }
2655 }
2656
2657 /// Store the current combined value at the specified origin
2658 /// location.
2659 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2660 if (MSV->MS.TrackOrigins) {
2661 assert(Origin);
2662 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2663 }
2664 }
2665 };
2666
2667 using ShadowAndOriginCombiner = Combiner<true>;
2668 using OriginCombiner = Combiner<false>;
2669
2670 /// Propagate origin for arbitrary operation.
2671 void setOriginForNaryOp(Instruction &I) {
2672 if (!MS.TrackOrigins)
2673 return;
2674 IRBuilder<> IRB(&I);
2675 OriginCombiner OC(this, IRB);
2676 for (Use &Op : I.operands())
2677 OC.Add(Op.get());
2678 OC.Done(&I);
2679 }
2680
2681 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2682 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2683 "Vector of pointers is not a valid shadow type");
2684 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2685 Ty->getScalarSizeInBits()
2686 : Ty->getPrimitiveSizeInBits();
2687 }
2688
2689 /// Cast between two shadow types, extending or truncating as
2690 /// necessary.
2691 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2692 bool Signed = false) {
2693 Type *srcTy = V->getType();
2694 if (srcTy == dstTy)
2695 return V;
2696 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2697 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2698 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2699 return IRB.CreateICmpNE(V, getCleanShadow(V));
2700
2701 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2702 return IRB.CreateIntCast(V, dstTy, Signed);
2703 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2704 cast<VectorType>(dstTy)->getElementCount() ==
2705 cast<VectorType>(srcTy)->getElementCount())
2706 return IRB.CreateIntCast(V, dstTy, Signed);
2707 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2708 Value *V2 =
2709 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2710 return IRB.CreateBitCast(V2, dstTy);
2711 // TODO: handle struct types.
2712 }
2713
2714 /// Cast an application value to the type of its own shadow.
2715 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2716 Type *ShadowTy = getShadowTy(V);
2717 if (V->getType() == ShadowTy)
2718 return V;
2719 if (V->getType()->isPtrOrPtrVectorTy())
2720 return IRB.CreatePtrToInt(V, ShadowTy);
2721 else
2722 return IRB.CreateBitCast(V, ShadowTy);
2723 }
2724
2725 /// Propagate shadow for arbitrary operation.
2726 void handleShadowOr(Instruction &I) {
2727 IRBuilder<> IRB(&I);
2728 ShadowAndOriginCombiner SC(this, IRB);
2729 for (Use &Op : I.operands())
2730 SC.Add(Op.get());
2731 SC.Done(&I);
2732 }
2733
2734 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2735 // of elements.
2736 //
2737 // For example, suppose we have:
2738 // VectorA: <a0, a1, a2, a3, a4, a5>
2739 // VectorB: <b0, b1, b2, b3, b4, b5>
2740 // ReductionFactor: 3
2741 // Shards: 1
2742 // The output would be:
2743 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2744 //
2745 // If we have:
2746 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2747 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2748 // ReductionFactor: 2
2749 // Shards: 2
2750 // then a and b each have 2 "shards", resulting in the output being
2751 // interleaved:
2752 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2753 //
2754 // This is convenient for instrumenting horizontal add/sub.
2755 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2756 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2757 unsigned Shards, Value *VectorA, Value *VectorB) {
2758 assert(isa<FixedVectorType>(VectorA->getType()));
2759 unsigned NumElems =
2760 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2761
2762 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2763 if (VectorB) {
2764 assert(VectorA->getType() == VectorB->getType());
2765 TotalNumElems *= 2;
2766 }
2767
2768 assert(NumElems % (ReductionFactor * Shards) == 0);
2769
2770 Value *Or = nullptr;
2771
2772 IRBuilder<> IRB(&I);
2773 for (unsigned i = 0; i < ReductionFactor; i++) {
2774 SmallVector<int, 16> Mask;
2775
2776 for (unsigned j = 0; j < Shards; j++) {
2777 unsigned Offset = NumElems / Shards * j;
2778
2779 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2780 Mask.push_back(Offset + X + i);
2781
2782 if (VectorB) {
2783 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2784 Mask.push_back(NumElems + Offset + X + i);
2785 }
2786 }
2787
2788 Value *Masked;
2789 if (VectorB)
2790 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2791 else
2792 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2793
2794 if (Or)
2795 Or = IRB.CreateOr(Or, Masked);
2796 else
2797 Or = Masked;
2798 }
2799
2800 return Or;
2801 }
2802
2803 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2804 /// fields.
2805 ///
2806 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2807 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2808 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2809 assert(I.arg_size() == 1 || I.arg_size() == 2);
2810
2811 assert(I.getType()->isVectorTy());
2812 assert(I.getArgOperand(0)->getType()->isVectorTy());
2813
2814 [[maybe_unused]] FixedVectorType *ParamType =
2815 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2816 assert((I.arg_size() != 2) ||
2817 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2818 [[maybe_unused]] FixedVectorType *ReturnType =
2819 cast<FixedVectorType>(I.getType());
2820 assert(ParamType->getNumElements() * I.arg_size() ==
2821 2 * ReturnType->getNumElements());
2822
2823 IRBuilder<> IRB(&I);
2824
2825 // Horizontal OR of shadow
2826 Value *FirstArgShadow = getShadow(&I, 0);
2827 Value *SecondArgShadow = nullptr;
2828 if (I.arg_size() == 2)
2829 SecondArgShadow = getShadow(&I, 1);
2830
2831 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2832 FirstArgShadow, SecondArgShadow);
2833
2834 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2835
2836 setShadow(&I, OrShadow);
2837 setOriginForNaryOp(I);
2838 }
2839
2840 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2841 /// fields, with the parameters reinterpreted to have elements of a specified
2842 /// width. For example:
2843 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2844 /// conceptually operates on
2845 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2846 /// and can be handled with ReinterpretElemWidth == 16.
2847 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
2848 int ReinterpretElemWidth) {
2849 assert(I.arg_size() == 1 || I.arg_size() == 2);
2850
2851 assert(I.getType()->isVectorTy());
2852 assert(I.getArgOperand(0)->getType()->isVectorTy());
2853
2854 FixedVectorType *ParamType =
2855 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2856 assert((I.arg_size() != 2) ||
2857 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2858
2859 [[maybe_unused]] FixedVectorType *ReturnType =
2860 cast<FixedVectorType>(I.getType());
2861 assert(ParamType->getNumElements() * I.arg_size() ==
2862 2 * ReturnType->getNumElements());
2863
2864 IRBuilder<> IRB(&I);
2865
2866 FixedVectorType *ReinterpretShadowTy = nullptr;
2867 assert(isAligned(Align(ReinterpretElemWidth),
2868 ParamType->getPrimitiveSizeInBits()));
2869 ReinterpretShadowTy = FixedVectorType::get(
2870 IRB.getIntNTy(ReinterpretElemWidth),
2871 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2872
2873 // Horizontal OR of shadow
2874 Value *FirstArgShadow = getShadow(&I, 0);
2875 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2876
2877 // If we had two parameters each with an odd number of elements, the total
2878 // number of elements is even, but we have never seen this in extant
2879 // instruction sets, so we enforce that each parameter must have an even
2880 // number of elements.
2881 assert(isAligned(
2882 Align(2),
2883 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2884
2885 Value *SecondArgShadow = nullptr;
2886 if (I.arg_size() == 2) {
2887 SecondArgShadow = getShadow(&I, 1);
2888 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2889 }
2890
2891 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2892 FirstArgShadow, SecondArgShadow);
2893
2894 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2895
2896 setShadow(&I, OrShadow);
2897 setOriginForNaryOp(I);
2898 }
2899
2900 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2901
2902 // Handle multiplication by constant.
2903 //
2904 // Handle a special case of multiplication by constant that may have one or
2905 // more zeros in the lower bits. This makes the corresponding number of lower bits
2906 // of the result zero as well. We model it by shifting the other operand
2907 // shadow left by the required number of bits. Effectively, we transform
2908 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2909 // We use multiplication by 2**N instead of shift to cover the case of
2910 // multiplication by 0, which may occur in some elements of a vector operand.
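//
// Illustrative worked example (added for clarity): for X * 12 we have
// 12 = 3 * 2**2, so the computed multiplier below is 4 (1 << countr_zero(12))
// and the result shadow is Sx * 4, i.e. Sx shifted left by two bits; the two
// low bits of the product are always zero and therefore initialized.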
2911 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2912 Value *OtherArg) {
2913 Constant *ShadowMul;
2914 Type *Ty = ConstArg->getType();
2915 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2916 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2917 Type *EltTy = VTy->getElementType();
2918 SmallVector<Constant *, 16> Elements;
2919 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2920 if (ConstantInt *Elt =
2921 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2922 const APInt &V = Elt->getValue();
2923 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2924 Elements.push_back(ConstantInt::get(EltTy, V2));
2925 } else {
2926 Elements.push_back(ConstantInt::get(EltTy, 1));
2927 }
2928 }
2929 ShadowMul = ConstantVector::get(Elements);
2930 } else {
2931 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2932 const APInt &V = Elt->getValue();
2933 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2934 ShadowMul = ConstantInt::get(Ty, V2);
2935 } else {
2936 ShadowMul = ConstantInt::get(Ty, 1);
2937 }
2938 }
2939
2940 IRBuilder<> IRB(&I);
2941 setShadow(&I,
2942 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2943 setOrigin(&I, getOrigin(OtherArg));
2944 }
2945
2946 void visitMul(BinaryOperator &I) {
2947 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2948 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2949 if (constOp0 && !constOp1)
2950 handleMulByConstant(I, constOp0, I.getOperand(1));
2951 else if (constOp1 && !constOp0)
2952 handleMulByConstant(I, constOp1, I.getOperand(0));
2953 else
2954 handleShadowOr(I);
2955 }
2956
2957 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2958 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2959 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2960 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2961 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2962 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2963
2964 void handleIntegerDiv(Instruction &I) {
2965 IRBuilder<> IRB(&I);
2966 // Strict on the second argument.
2967 insertCheckShadowOf(I.getOperand(1), &I);
2968 setShadow(&I, getShadow(&I, 0));
2969 setOrigin(&I, getOrigin(&I, 0));
2970 }
2971
2972 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2973 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2974 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2975 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2976
2977 // Floating point division is side-effect free. We cannot require that the
2978 // divisor is fully initialized, so we propagate shadow instead. See PR37523.
2979 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2980 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2981
2982 /// Instrument == and != comparisons.
2983 ///
2984 /// Sometimes the comparison result is known even if some of the bits of the
2985 /// arguments are not.
2986 void handleEqualityComparison(ICmpInst &I) {
2987 IRBuilder<> IRB(&I);
2988 Value *A = I.getOperand(0);
2989 Value *B = I.getOperand(1);
2990 Value *Sa = getShadow(A);
2991 Value *Sb = getShadow(B);
2992
2993 // Get rid of pointers and vectors of pointers.
2994 // For ints (and vectors of ints), types of A and Sa match,
2995 // and this is a no-op.
2996 A = IRB.CreatePointerCast(A, Sa->getType());
2997 B = IRB.CreatePointerCast(B, Sb->getType());
2998
2999 // A == B <==> (C = A^B) == 0
3000 // A != B <==> (C = A^B) != 0
3001 // Sc = Sa | Sb
3002 Value *C = IRB.CreateXor(A, B);
3003 Value *Sc = IRB.CreateOr(Sa, Sb);
3004 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
3005 // Result is defined if one of the following is true
3006 // * there is a defined 1 bit in C
3007 // * C is fully defined
3008 // Si = !(C & ~Sc) && Sc
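// Illustrative worked example (added for clarity): for a 4-bit A == B with
// C = A ^ B = 0b0100 and Sc = Sa | Sb = 0b0011, the defined bit 2 of C is 1,
// so the comparison is known to be false whatever the undefined bits are:
// C & ~Sc = 0b0100 != 0, hence Si = 0 and the result is treated as defined.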
3009 Value *Zero = Constant::getNullValue(Sc->getType());
3010 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
3011 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
3012 Value *RHS =
3013 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
3014 Value *Si = IRB.CreateAnd(LHS, RHS);
3015 Si->setName("_msprop_icmp");
3016 setShadow(&I, Si);
3017 setOriginForNaryOp(I);
3018 }
3019
3020 /// Instrument relational comparisons.
3021 ///
3022 /// This function does exact shadow propagation for all relational
3023 /// comparisons of integers, pointers and vectors of those.
3024 /// FIXME: output seems suboptimal when one of the operands is a constant
3025 void handleRelationalComparisonExact(ICmpInst &I) {
3026 IRBuilder<> IRB(&I);
3027 Value *A = I.getOperand(0);
3028 Value *B = I.getOperand(1);
3029 Value *Sa = getShadow(A);
3030 Value *Sb = getShadow(B);
3031
3032 // Get rid of pointers and vectors of pointers.
3033 // For ints (and vectors of ints), types of A and Sa match,
3034 // and this is a no-op.
3035 A = IRB.CreatePointerCast(A, Sa->getType());
3036 B = IRB.CreatePointerCast(B, Sb->getType());
3037
3038 // Let [a0, a1] be the interval of possible values of A, taking into account
3039 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3040 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
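// Illustrative worked example (added for clarity): for an unsigned A < B with
// A = 0b10?? (Sa = 0b0011, so A lies in [8, 11]) and a fully defined
// B = 0b1100 (12), both Amin < Bmax and Amax < Bmin hold, so S1 == S2 and the
// comparison result is considered defined despite the undefined bits of A.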
3041 bool IsSigned = I.isSigned();
3042
3043 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3044 if (IsSigned) {
3045 // Sign-flip to map from signed range to unsigned range. Relation A vs B
3046 // should be preserved, if checked with `getUnsignedPredicate()`.
3047 // Relationship between Amin, Amax, Bmin, Bmax also will not be
3048 // affected, as they are created by effectively adding/subtracting from
3049 // A (or B) a value, derived from shadow, with no overflow, either
3050 // before or after sign flip.
3051 APInt MinVal =
3052 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3053 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3054 }
3055 // Minimize undefined bits.
3056 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3057 Value *Max = IRB.CreateOr(V, S);
3058 return std::make_pair(Min, Max);
3059 };
3060
3061 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3062 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3063 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3064 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3065
3066 Value *Si = IRB.CreateXor(S1, S2);
3067 setShadow(&I, Si);
3068 setOriginForNaryOp(I);
3069 }
3070
3071 /// Instrument signed relational comparisons.
3072 ///
3073 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3074 /// bit of the shadow. Everything else is delegated to handleShadowOr().
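/// For example (illustrative, not from the original comment): for x < 0 on an
/// i32, the result depends only on the sign bit of x, so the shadow computed
/// below is simply (Sx <s 0), i.e. poisoned iff the sign bit of x is poisoned.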
3075 void handleSignedRelationalComparison(ICmpInst &I) {
3076 Constant *constOp;
3077 Value *op = nullptr;
3078 CmpInst::Predicate pre;
3079 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3080 op = I.getOperand(0);
3081 pre = I.getPredicate();
3082 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3083 op = I.getOperand(1);
3084 pre = I.getSwappedPredicate();
3085 } else {
3086 handleShadowOr(I);
3087 return;
3088 }
3089
3090 if ((constOp->isNullValue() &&
3091 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3092 (constOp->isAllOnesValue() &&
3093 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3094 IRBuilder<> IRB(&I);
3095 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3096 "_msprop_icmp_s");
3097 setShadow(&I, Shadow);
3098 setOrigin(&I, getOrigin(op));
3099 } else {
3100 handleShadowOr(I);
3101 }
3102 }
3103
3104 void visitICmpInst(ICmpInst &I) {
3105 if (!ClHandleICmp) {
3106 handleShadowOr(I);
3107 return;
3108 }
3109 if (I.isEquality()) {
3110 handleEqualityComparison(I);
3111 return;
3112 }
3113
3114 assert(I.isRelational());
3115 if (ClHandleICmpExact) {
3116 handleRelationalComparisonExact(I);
3117 return;
3118 }
3119 if (I.isSigned()) {
3120 handleSignedRelationalComparison(I);
3121 return;
3122 }
3123
3124 assert(I.isUnsigned());
3125 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3126 handleRelationalComparisonExact(I);
3127 return;
3128 }
3129
3130 handleShadowOr(I);
3131 }
3132
3133 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3134
3135 void handleShift(BinaryOperator &I) {
3136 IRBuilder<> IRB(&I);
3137 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3138 // Otherwise perform the same shift on S1.
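// Sketch of the emitted IR for `%r = shl i32 %a, %b` (hypothetical names,
// not verbatim output):
//   %s2ne  = icmp ne i32 %sb, 0
//   %s2all = sext i1 %s2ne to i32        ; all-ones if %b is poisoned
//   %sh    = shl i32 %sa, %b             ; same shift applied to %a's shadow
//   %sr    = or i32 %sh, %s2all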
3139 Value *S1 = getShadow(&I, 0);
3140 Value *S2 = getShadow(&I, 1);
3141 Value *S2Conv =
3142 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3143 Value *V2 = I.getOperand(1);
3144 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3145 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3146 setOriginForNaryOp(I);
3147 }
3148
3149 void visitShl(BinaryOperator &I) { handleShift(I); }
3150 void visitAShr(BinaryOperator &I) { handleShift(I); }
3151 void visitLShr(BinaryOperator &I) { handleShift(I); }
3152
3153 void handleFunnelShift(IntrinsicInst &I) {
3154 IRBuilder<> IRB(&I);
3155 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3156 // Otherwise perform the same shift on S0 and S1.
3157 Value *S0 = getShadow(&I, 0);
3158 Value *S1 = getShadow(&I, 1);
3159 Value *S2 = getShadow(&I, 2);
3160 Value *S2Conv =
3161 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3162 Value *V2 = I.getOperand(2);
3163 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3164 {S0, S1, V2});
3165 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3166 setOriginForNaryOp(I);
3167 }
3168
3169 /// Instrument llvm.memmove
3170 ///
3171 /// At this point we don't know if llvm.memmove will be inlined or not.
3172 /// If we don't instrument it and it gets inlined,
3173 /// our interceptor will not kick in and we will lose the memmove.
3174 /// If we instrument the call here, but it does not get inlined,
3175 /// we will memmove the shadow twice: which is bad in case
3176 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3177 ///
3178 /// Similar situation exists for memcpy and memset.
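///
/// Concretely (a sketch), the intrinsic is replaced by a call to the MSan
/// runtime helper `__msan_memmove(dst, src, n)` and the original instruction
/// is erased; the runtime then moves both the data and its shadow exactly
/// once.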
3179 void visitMemMoveInst(MemMoveInst &I) {
3180 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3181 IRBuilder<> IRB(&I);
3182 IRB.CreateCall(MS.MemmoveFn,
3183 {I.getArgOperand(0), I.getArgOperand(1),
3184 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3185 I.eraseFromParent();
3186 }
3187
3188 /// Instrument memcpy
3189 ///
3190 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3191 /// unfortunate as it may slow down small constant memcpys.
3192 /// FIXME: consider doing manual inline for small constant sizes and proper
3193 /// alignment.
3194 ///
3195 /// Note: This also handles memcpy.inline, which promises no calls to external
3196 /// functions as an optimization. However, with instrumentation enabled this
3197 /// is difficult to promise; additionally, we know that the MSan runtime
3198 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3199 /// instrumentation it's safe to turn memcpy.inline into a call to
3200 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3201 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3202 void visitMemCpyInst(MemCpyInst &I) {
3203 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3204 IRBuilder<> IRB(&I);
3205 IRB.CreateCall(MS.MemcpyFn,
3206 {I.getArgOperand(0), I.getArgOperand(1),
3207 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3208 I.eraseFromParent();
3209 }
3210
3211 // Same as memcpy.
3212 void visitMemSetInst(MemSetInst &I) {
3213 IRBuilder<> IRB(&I);
3214 IRB.CreateCall(
3215 MS.MemsetFn,
3216 {I.getArgOperand(0),
3217 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3218 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3219 I.eraseFromParent();
3220 }
3221
3222 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3223
3224 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3225
3226 /// Handle vector store-like intrinsics.
3227 ///
3228 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3229 /// has 1 pointer argument and 1 vector argument, returns void.
3230 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3231 assert(I.arg_size() == 2);
3232
3233 IRBuilder<> IRB(&I);
3234 Value *Addr = I.getArgOperand(0);
3235 Value *Shadow = getShadow(&I, 1);
3236 Value *ShadowPtr, *OriginPtr;
3237
3238 // We don't know the pointer alignment (could be unaligned SSE store!).
3239 // Have to assume the worst case.
3240 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3241 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3242 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3243
3244 if (ClCheckAccessAddress)
3245 insertCheckShadowOf(Addr, &I);
3246
3247 // FIXME: factor out common code from materializeStores
3248 if (MS.TrackOrigins)
3249 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3250 return true;
3251 }
3252
3253 /// Handle vector load-like intrinsics.
3254 ///
3255 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3256 /// has 1 pointer argument, returns a vector.
3257 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3258 assert(I.arg_size() == 1);
3259
3260 IRBuilder<> IRB(&I);
3261 Value *Addr = I.getArgOperand(0);
3262
3263 Type *ShadowTy = getShadowTy(&I);
3264 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3265 if (PropagateShadow) {
3266 // We don't know the pointer alignment (could be unaligned SSE load!).
3267 // Have to assume the worst case.
3268 const Align Alignment = Align(1);
3269 std::tie(ShadowPtr, OriginPtr) =
3270 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3271 setShadow(&I,
3272 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3273 } else {
3274 setShadow(&I, getCleanShadow(&I));
3275 }
3276
3277 if (ClCheckAccessAddress)
3278 insertCheckShadowOf(Addr, &I);
3279
3280 if (MS.TrackOrigins) {
3281 if (PropagateShadow)
3282 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3283 else
3284 setOrigin(&I, getCleanOrigin());
3285 }
3286 return true;
3287 }
3288
3289 /// Handle (SIMD arithmetic)-like intrinsics.
3290 ///
3291 /// Instrument intrinsics with any number of arguments of the same type [*],
3292 /// equal to the return type, plus a specified number of trailing flags of
3293 /// any type.
3294 ///
3295 /// [*] The type should be simple (no aggregates or pointers; vectors are
3296 /// fine).
3297 ///
3298 /// Caller guarantees that this intrinsic does not access memory.
3299 ///
3300 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3301 /// by this handler. See horizontalReduce().
3302 ///
3303 /// TODO: permutation intrinsics are also often incorrectly matched.
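///
/// For example (illustrative only), `<4 x float> @llvm.x86.sse.min.ps(<4 x
/// float>, <4 x float>)` matches this shape, so its shadow is computed by
/// OR-combining the shadows of the two arguments.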
3304 [[maybe_unused]] bool
3305 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3306 unsigned int trailingFlags) {
3307 Type *RetTy = I.getType();
3308 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3309 return false;
3310
3311 unsigned NumArgOperands = I.arg_size();
3312 assert(NumArgOperands >= trailingFlags);
3313 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3314 Type *Ty = I.getArgOperand(i)->getType();
3315 if (Ty != RetTy)
3316 return false;
3317 }
3318
3319 IRBuilder<> IRB(&I);
3320 ShadowAndOriginCombiner SC(this, IRB);
3321 for (unsigned i = 0; i < NumArgOperands; ++i)
3322 SC.Add(I.getArgOperand(i));
3323 SC.Done(&I);
3324
3325 return true;
3326 }
3327
3328 /// Returns whether it was able to heuristically instrument unknown
3329 /// intrinsics.
3330 ///
3331 /// The main purpose of this code is to do something reasonable with all
3332 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3333 /// We recognize several classes of intrinsics by their argument types and
3334 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3335 /// sure that we know what the intrinsic does.
3336 ///
3337 /// We special-case intrinsics where this approach fails. See llvm.bswap
3338 /// handling as an example of that.
3339 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3340 unsigned NumArgOperands = I.arg_size();
3341 if (NumArgOperands == 0)
3342 return false;
3343
3344 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3345 I.getArgOperand(1)->getType()->isVectorTy() &&
3346 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3347 // This looks like a vector store.
3348 return handleVectorStoreIntrinsic(I);
3349 }
3350
3351 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3352 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3353 // This looks like a vector load.
3354 return handleVectorLoadIntrinsic(I);
3355 }
3356
3357 if (I.doesNotAccessMemory())
3358 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3359 return true;
3360
3361 // FIXME: detect and handle SSE maskstore/maskload?
3362 // Some cases are now handled in handleAVXMasked{Load,Store}.
3363 return false;
3364 }
3365
3366 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3367 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3368 if (ClDumpStrictIntrinsics)
3369 dumpInst(I);
3370
3371 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3372 << "\n");
3373 return true;
3374 } else
3375 return false;
3376 }
3377
3378 void handleInvariantGroup(IntrinsicInst &I) {
3379 setShadow(&I, getShadow(&I, 0));
3380 setOrigin(&I, getOrigin(&I, 0));
3381 }
3382
3383 void handleLifetimeStart(IntrinsicInst &I) {
3384 if (!PoisonStack)
3385 return;
3386 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3387 if (AI)
3388 LifetimeStartList.push_back(std::make_pair(&I, AI));
3389 }
3390
3391 void handleBswap(IntrinsicInst &I) {
3392 IRBuilder<> IRB(&I);
3393 Value *Op = I.getArgOperand(0);
3394 Type *OpType = Op->getType();
3395 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3396 getShadow(Op)));
3397 setOrigin(&I, getOrigin(Op));
3398 }
3399
3400 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3401 // and a 1. If the input is all zero, it is fully initialized iff
3402 // !is_zero_poison.
3403 //
3404 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3405 // concrete value 0/1, and ? is an uninitialized bit:
3406 // - 0001 0??? is fully initialized
3407 // - 000? ???? is fully uninitialized (*)
3408 // - ???? ???? is fully uninitialized
3409 // - 0000 0000 is fully uninitialized if is_zero_poison,
3410 // fully initialized otherwise
3411 //
3412 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3413 // only need to poison 4 bits.
3414 //
3415 // OutputShadow =
3416 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3417 // || (is_zero_poison && AllZeroSrc)
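//
// Worked example of the first case (hypothetical 8-bit input, ctlz):
//   Src = 0001 0???  =>  SrcShadow = 0000 0111
//   ConcreteZerosCount = ctlz(Src)       = 3
//   ShadowZerosCount   = ctlz(SrcShadow) = 5
// 3 >= 5 is false, so the output is clean: the count depends only on the
// defined leading bits. For 000? ???? the comparison holds and the output is
// poisoned.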
3418 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3419 IRBuilder<> IRB(&I);
3420 Value *Src = I.getArgOperand(0);
3421 Value *SrcShadow = getShadow(Src);
3422
3423 Value *False = IRB.getInt1(false);
3424 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3425 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3426 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3427 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3428
3429 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3430 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3431
3432 Value *NotAllZeroShadow =
3433 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3434 Value *OutputShadow =
3435 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3436
3437 // If zero poison is requested, mix in with the shadow
3438 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3439 if (!IsZeroPoison->isZeroValue()) {
3440 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3441 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3442 }
3443
3444 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3445
3446 setShadow(&I, OutputShadow);
3447 setOriginForNaryOp(I);
3448 }
3449
3450 /// Handle Arm NEON vector convert intrinsics.
3451 ///
3452 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3453 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64 (double)
3454 ///
3455 /// For conversions to or from fixed-point, there is a trailing argument to
3456 /// indicate the fixed-point precision:
3457 /// - <4 x float> llvm.aarch64.neon.vcvtfxs2fp.v4f32.v4i32(<4 x i32>, i32)
3458 /// - <4 x i32> llvm.aarch64.neon.vcvtfp2fxu.v4i32.v4f32(<4 x float>, i32)
3459 ///
3460 /// For x86 SSE vector convert intrinsics, see
3461 /// handleSSEVectorConvertIntrinsic().
3462 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I, bool FixedPoint) {
3463 if (FixedPoint)
3464 assert(I.arg_size() == 2);
3465 else
3466 assert(I.arg_size() == 1);
3467
3468 IRBuilder<> IRB(&I);
3469 Value *S0 = getShadow(&I, 0);
3470
3471 if (FixedPoint) {
3472 Value *Precision = I.getOperand(1);
3473 insertCheckShadowOf(Precision, &I);
3474 }
3475
3476 /// For scalars:
3477 /// Since they are converting from floating-point to integer, the output is
3478 /// - fully uninitialized if *any* bit of the input is uninitialized
3479 /// - fully initialized if all bits of the input are initialized
3480 /// We apply the same principle on a per-field basis for vectors.
3481 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3482 getShadowTy(&I));
3483 setShadow(&I, OutShadow);
3484 setOriginForNaryOp(I);
3485 }
3486
3487 /// Some instructions have additional zero-elements in the return type
3488 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3489 ///
3490 /// This function will return a vector type with the same number of elements
3491 /// as the input, but same per-element width as the return value e.g.,
3492 /// <8 x i8>.
3493 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3494 assert(isa<FixedVectorType>(getShadowTy(&I)));
3495 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3496
3497 // TODO: generalize beyond 2x?
3498 if (ShadowType->getElementCount() ==
3499 cast<VectorType>(Src->getType())->getElementCount() * 2)
3500 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3501
3502 assert(ShadowType->getElementCount() ==
3503 cast<VectorType>(Src->getType())->getElementCount());
3504
3505 return ShadowType;
3506 }
3507
3508 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3509 /// to match the length of the shadow for the instruction.
3510 /// If scalar types of the vectors are different, it will use the type of the
3511 /// input vector.
3512 /// This is more type-safe than CreateShadowCast().
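///
/// Sketch (hypothetical types): extending an <8 x i16> shadow to <16 x i16>
/// emits roughly
///   shufflevector <8 x i16> %shadow, <8 x i16> zeroinitializer,
///                 <16 x i32> <i32 0, ..., i32 15>
/// where indices 8..15 select from the second (all-zero) operand, i.e. the
/// appended elements are clean shadow.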
3513 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3514 IRBuilder<> IRB(&I);
3515 assert(isa<FixedVectorType>(Shadow->getType()));
3516 assert(isa<FixedVectorType>(I.getType()));
3517
3518 Value *FullShadow = getCleanShadow(&I);
3519 unsigned ShadowNumElems =
3520 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3521 unsigned FullShadowNumElems =
3522 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3523
3524 assert((ShadowNumElems == FullShadowNumElems) ||
3525 (ShadowNumElems * 2 == FullShadowNumElems));
3526
3527 if (ShadowNumElems == FullShadowNumElems) {
3528 FullShadow = Shadow;
3529 } else {
3530 // TODO: generalize beyond 2x?
3531 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3532 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3533
3534 // Append zeros
3535 FullShadow =
3536 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3537 }
3538
3539 return FullShadow;
3540 }
3541
3542 /// Handle x86 SSE vector conversion.
3543 ///
3544 /// e.g., single-precision to half-precision conversion:
3545 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3546 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3547 ///
3548 /// floating-point to integer:
3549 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3550 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3551 ///
3552 /// Note: if the output has more elements, they are zero-initialized (and
3553 /// therefore the shadow will also be initialized).
3554 ///
3555 /// This differs from handleSSEVectorConvertIntrinsic() because it
3556 /// propagates uninitialized shadow (instead of checking the shadow).
3557 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3558 bool HasRoundingMode) {
3559 if (HasRoundingMode) {
3560 assert(I.arg_size() == 2);
3561 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3562 assert(RoundingMode->getType()->isIntegerTy());
3563 } else {
3564 assert(I.arg_size() == 1);
3565 }
3566
3567 Value *Src = I.getArgOperand(0);
3568 assert(Src->getType()->isVectorTy());
3569
3570 // The return type might have more elements than the input.
3571 // Temporarily shrink the return type's number of elements.
3572 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3573
3574 IRBuilder<> IRB(&I);
3575 Value *S0 = getShadow(&I, 0);
3576
3577 /// For scalars:
3578 /// Since they are converting to and/or from floating-point, the output is:
3579 /// - fully uninitialized if *any* bit of the input is uninitialized
3580 /// - fully initialized if all bits of the input are initialized
3581 /// We apply the same principle on a per-field basis for vectors.
3582 Value *Shadow =
3583 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3584
3585 // The return type might have more elements than the input.
3586 // Extend the return type back to its original width if necessary.
3587 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3588
3589 setShadow(&I, FullShadow);
3590 setOriginForNaryOp(I);
3591 }
3592
3593 // Instrument x86 SSE vector convert intrinsic.
3594 //
3595 // This function instruments intrinsics like cvtsi2ss:
3596 // %Out = int_xxx_cvtyyy(%ConvertOp)
3597 // or
3598 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3599 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3600 // number of \p Out elements, and (if it has 2 arguments) copies the rest of the
3601 // elements from \p CopyOp.
3602 // In most cases conversion involves floating-point value which may trigger a
3603 // hardware exception when not fully initialized. For this reason we require
3604 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3605 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3606 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3607 // return a fully initialized value.
3608 //
3609 // For Arm NEON vector convert intrinsics, see
3610 // handleNEONVectorConvertIntrinsic().
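//
// Illustrative example (hypothetical operands): for
//   %out = call <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float> %copy, i32 %x)
// the shadow of %x is checked (reporting if it is poisoned), and the shadow
// of %out is the shadow of %copy with element 0 zeroed out.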
3611 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3612 bool HasRoundingMode = false) {
3613 IRBuilder<> IRB(&I);
3614 Value *CopyOp, *ConvertOp;
3615
3616 assert((!HasRoundingMode ||
3617 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3618 "Invalid rounding mode");
3619
3620 switch (I.arg_size() - HasRoundingMode) {
3621 case 2:
3622 CopyOp = I.getArgOperand(0);
3623 ConvertOp = I.getArgOperand(1);
3624 break;
3625 case 1:
3626 ConvertOp = I.getArgOperand(0);
3627 CopyOp = nullptr;
3628 break;
3629 default:
3630 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3631 }
3632
3633 // The first *NumUsedElements* elements of ConvertOp are converted to the
3634 // same number of output elements. The rest of the output is copied from
3635 // CopyOp, or (if not available) filled with zeroes.
3636 // Combine shadow for elements of ConvertOp that are used in this operation,
3637 // and insert a check.
3638 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3639 // int->any conversion.
3640 Value *ConvertShadow = getShadow(ConvertOp);
3641 Value *AggShadow = nullptr;
3642 if (ConvertOp->getType()->isVectorTy()) {
3643 AggShadow = IRB.CreateExtractElement(
3644 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3645 for (int i = 1; i < NumUsedElements; ++i) {
3646 Value *MoreShadow = IRB.CreateExtractElement(
3647 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3648 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3649 }
3650 } else {
3651 AggShadow = ConvertShadow;
3652 }
3653 assert(AggShadow->getType()->isIntegerTy());
3654 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3655
3656 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3657 // ConvertOp.
3658 if (CopyOp) {
3659 assert(CopyOp->getType() == I.getType());
3660 assert(CopyOp->getType()->isVectorTy());
3661 Value *ResultShadow = getShadow(CopyOp);
3662 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3663 for (int i = 0; i < NumUsedElements; ++i) {
3664 ResultShadow = IRB.CreateInsertElement(
3665 ResultShadow, ConstantInt::getNullValue(EltTy),
3666 ConstantInt::get(IRB.getInt32Ty(), i));
3667 }
3668 setShadow(&I, ResultShadow);
3669 setOrigin(&I, getOrigin(CopyOp));
3670 } else {
3671 setShadow(&I, getCleanShadow(&I));
3672 setOrigin(&I, getCleanOrigin());
3673 }
3674 }
3675
3676 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3677 // zeroes if it is zero, and all ones otherwise.
3678 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3679 if (S->getType()->isVectorTy())
3680 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3681 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3682 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3683 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3684 }
3685
3686 // Given a vector, extract its first element, and return all
3687 // zeroes if it is zero, and all ones otherwise.
3688 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3689 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3690 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3691 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3692 }
3693
3694 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3695 Type *T = S->getType();
3696 assert(T->isVectorTy());
3697 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3698 return IRB.CreateSExt(S2, T);
3699 }
3700
3701 // Instrument vector shift intrinsic.
3702 //
3703 // This function instruments intrinsics like int_x86_avx2_psll_w.
3704 // Intrinsic shifts %In by %ShiftSize bits.
3705 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3706 // size, and the rest is ignored. Behavior is defined even if shift size is
3707 // greater than register (or field) width.
3708 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3709 assert(I.arg_size() == 2);
3710 IRBuilder<> IRB(&I);
3711 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3712 // Otherwise perform the same shift on S1.
3713 Value *S1 = getShadow(&I, 0);
3714 Value *S2 = getShadow(&I, 1);
3715 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3716 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3717 Value *V1 = I.getOperand(0);
3718 Value *V2 = I.getOperand(1);
3719 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3720 {IRB.CreateBitCast(S1, V1->getType()), V2});
3721 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3722 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3723 setOriginForNaryOp(I);
3724 }
3725
3726 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3727 // vectors.
3728 Type *getMMXVectorTy(unsigned EltSizeInBits,
3729 unsigned X86_MMXSizeInBits = 64) {
3730 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3731 "Illegal MMX vector element size");
3732 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3733 X86_MMXSizeInBits / EltSizeInBits);
3734 }
3735
3736 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3737 // intrinsic.
3738 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3739 switch (id) {
3740 case Intrinsic::x86_sse2_packsswb_128:
3741 case Intrinsic::x86_sse2_packuswb_128:
3742 return Intrinsic::x86_sse2_packsswb_128;
3743
3744 case Intrinsic::x86_sse2_packssdw_128:
3745 case Intrinsic::x86_sse41_packusdw:
3746 return Intrinsic::x86_sse2_packssdw_128;
3747
3748 case Intrinsic::x86_avx2_packsswb:
3749 case Intrinsic::x86_avx2_packuswb:
3750 return Intrinsic::x86_avx2_packsswb;
3751
3752 case Intrinsic::x86_avx2_packssdw:
3753 case Intrinsic::x86_avx2_packusdw:
3754 return Intrinsic::x86_avx2_packssdw;
3755
3756 case Intrinsic::x86_mmx_packsswb:
3757 case Intrinsic::x86_mmx_packuswb:
3758 return Intrinsic::x86_mmx_packsswb;
3759
3760 case Intrinsic::x86_mmx_packssdw:
3761 return Intrinsic::x86_mmx_packssdw;
3762
3763 case Intrinsic::x86_avx512_packssdw_512:
3764 case Intrinsic::x86_avx512_packusdw_512:
3765 return Intrinsic::x86_avx512_packssdw_512;
3766
3767 case Intrinsic::x86_avx512_packsswb_512:
3768 case Intrinsic::x86_avx512_packuswb_512:
3769 return Intrinsic::x86_avx512_packsswb_512;
3770
3771 default:
3772 llvm_unreachable("unexpected intrinsic id");
3773 }
3774 }
3775
3776 // Instrument vector pack intrinsic.
3777 //
3778 // This function instruments intrinsics like x86_mmx_packsswb, that
3779 // packs elements of 2 input vectors into half as many bits with saturation.
3780 // Shadow is propagated with the signed variant of the same intrinsic applied
3781 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3782 // MMXEltSizeInBits is used only for x86mmx arguments.
3783 //
3784 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
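//
// Sketch: for packsswb(%a, %b) the shadow is roughly
//   packsswb(sext(%sa != 0), sext(%sb != 0))
// A poisoned (all-ones, i.e. -1) input lane saturates to an all-ones output
// lane, so per-lane poison survives the narrowing.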
3785 void handleVectorPackIntrinsic(IntrinsicInst &I,
3786 unsigned MMXEltSizeInBits = 0) {
3787 assert(I.arg_size() == 2);
3788 IRBuilder<> IRB(&I);
3789 Value *S1 = getShadow(&I, 0);
3790 Value *S2 = getShadow(&I, 1);
3791 assert(S1->getType()->isVectorTy());
3792
3793 // SExt and ICmpNE below must apply to individual elements of input vectors.
3794 // In case of x86mmx arguments, cast them to appropriate vector types and
3795 // back.
3796 Type *T =
3797 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3798 if (MMXEltSizeInBits) {
3799 S1 = IRB.CreateBitCast(S1, T);
3800 S2 = IRB.CreateBitCast(S2, T);
3801 }
3802 Value *S1_ext =
3803 IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3804 Value *S2_ext =
3805 IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3806 if (MMXEltSizeInBits) {
3807 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3808 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3809 }
3810
3811 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3812 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3813 "_msprop_vector_pack");
3814 if (MMXEltSizeInBits)
3815 S = IRB.CreateBitCast(S, getShadowTy(&I));
3816 setShadow(&I, S);
3817 setOriginForNaryOp(I);
3818 }
3819
3820 // Convert `Mask` into `<n x i1>`.
3821 Constant *createDppMask(unsigned Width, unsigned Mask) {
3822 SmallVector<Constant *, 4> R(Width);
3823 for (auto &M : R) {
3824 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3825 Mask >>= 1;
3826 }
3827 return ConstantVector::get(R);
3828 }
3829
3830 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3831 // arg is poisoned, entire dot product is poisoned.
3832 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3833 unsigned DstMask) {
3834 const unsigned Width =
3835 cast<FixedVectorType>(S->getType())->getNumElements();
3836
3837 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3838 Constant::getNullValue(S->getType()));
3839 Value *SElem = IRB.CreateOrReduce(S);
3840 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3841 Value *DstMaskV = createDppMask(Width, DstMask);
3842
3843 return IRB.CreateSelect(
3844 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3845 }
3846
3847 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3848 //
3849 // The 2- and 4-element versions produce a single scalar dot product and then
3850 // put it into the elements of the output vector selected by the 4 lowest bits
3851 // of the mask. The top 4 bits of the mask control which elements of the input
3852 // to use for the dot product.
3853 //
3854 // The 8-element version's mask still has only 4 bits for the input and 4 bits
3855 // for the output mask. According to the spec, it simply operates as the
3856 // 4-element version on the first 4 elements of inputs and output, and then on
3857 // the last 4 elements of inputs and output.
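//
// Hedged example: for a 4-element dpps with mask 0x71 (SrcMask = 0b0111,
// DstMask = 0b0001), the dot product is poisoned if any of elements 0..2 of
// either operand is poisoned, and only output element 0 (selected by
// DstMask) receives that poisoned shadow.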
3858 void handleDppIntrinsic(IntrinsicInst &I) {
3859 IRBuilder<> IRB(&I);
3860
3861 Value *S0 = getShadow(&I, 0);
3862 Value *S1 = getShadow(&I, 1);
3863 Value *S = IRB.CreateOr(S0, S1);
3864
3865 const unsigned Width =
3866 cast<FixedVectorType>(S->getType())->getNumElements();
3867 assert(Width == 2 || Width == 4 || Width == 8);
3868
3869 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3870 const unsigned SrcMask = Mask >> 4;
3871 const unsigned DstMask = Mask & 0xf;
3872
3873 // Calculate shadow as `<n x i1>`.
3874 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3875 if (Width == 8) {
3876 // First 4 elements of shadow are already calculated. `findDppPoisonedOutput`
3877 // operates on 32-bit masks, so we can just shift masks, and repeat.
3878 SI1 = IRB.CreateOr(
3879 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3880 }
3881 // Extend to real size of shadow, poisoning either all or none bits of an
3882 // element.
3883 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3884
3885 setShadow(&I, S);
3886 setOriginForNaryOp(I);
3887 }
3888
3889 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3890 C = CreateAppToShadowCast(IRB, C);
3891 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3892 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3893 C = IRB.CreateAShr(C, ElSize - 1);
3894 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3895 return IRB.CreateTrunc(C, FVT);
3896 }
3897
3898 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3899 void handleBlendvIntrinsic(IntrinsicInst &I) {
3900 Value *C = I.getOperand(2);
3901 Value *T = I.getOperand(1);
3902 Value *F = I.getOperand(0);
3903
3904 Value *Sc = getShadow(&I, 2);
3905 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3906
3907 {
3908 IRBuilder<> IRB(&I);
3909 // Extract top bit from condition and its shadow.
3910 C = convertBlendvToSelectMask(IRB, C);
3911 Sc = convertBlendvToSelectMask(IRB, Sc);
3912
3913 setShadow(C, Sc);
3914 setOrigin(C, Oc);
3915 }
3916
3917 handleSelectLikeInst(I, C, T, F);
3918 }
3919
3920 // Instrument sum-of-absolute-differences intrinsic.
3921 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3922 const unsigned SignificantBitsPerResultElement = 16;
3923 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3924 unsigned ZeroBitsPerResultElement =
3925 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3926
3927 IRBuilder<> IRB(&I);
3928 auto *Shadow0 = getShadow(&I, 0);
3929 auto *Shadow1 = getShadow(&I, 1);
3930 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3931 S = IRB.CreateBitCast(S, ResTy);
3932 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3933 ResTy);
3934 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3935 S = IRB.CreateBitCast(S, getShadowTy(&I));
3936 setShadow(&I, S);
3937 setOriginForNaryOp(I);
3938 }
3939
3940 // Instrument dot-product / multiply-add(-accumulate)? intrinsics.
3941 //
3942 // e.g., Two operands:
3943 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3944 //
3945 // Two operands which require an EltSizeInBits override:
3946 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3947 //
3948 // Three operands:
3949 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3950 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3951 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
3952 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
3953 // (these are equivalent to multiply-add on %a and %b, followed by
3954 // adding/"accumulating" %s. "Accumulation" stores the result in one
3955 // of the source registers, but this accumulate vs. add distinction
3956 // is lost when dealing with LLVM intrinsics.)
3957 //
3958 // ZeroPurifies means that multiplying a known-zero with an uninitialized
3959 // value results in an initialized value. This is applicable for integer
3960 // multiplication, but not floating-point (counter-example: NaN).
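//
// Worked example (hypothetical lanes): for pmadd.wd with ReductionFactor 2,
// if %a[1] is poisoned but %b[1] is an initialized zero, that product lane is
// treated as initialized (zero purifies); output element 0 is poisoned only
// if the per-lane poison for lane 0 or lane 1 remains set after the pairwise
// reduction.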
3961 void handleVectorDotProductIntrinsic(IntrinsicInst &I,
3962 unsigned ReductionFactor,
3963 bool ZeroPurifies,
3964 unsigned EltSizeInBits = 0) {
3965 IRBuilder<> IRB(&I);
3966
3967 [[maybe_unused]] FixedVectorType *ReturnType =
3968 cast<FixedVectorType>(I.getType());
3969 assert(isa<FixedVectorType>(ReturnType));
3970
3971 // Vectors A and B, and shadows
3972 Value *Va = nullptr;
3973 Value *Vb = nullptr;
3974 Value *Sa = nullptr;
3975 Value *Sb = nullptr;
3976
3977 assert(I.arg_size() == 2 || I.arg_size() == 3);
3978 if (I.arg_size() == 2) {
3979 Va = I.getOperand(0);
3980 Vb = I.getOperand(1);
3981
3982 Sa = getShadow(&I, 0);
3983 Sb = getShadow(&I, 1);
3984 } else if (I.arg_size() == 3) {
3985 // Operand 0 is the accumulator. We will deal with that below.
3986 Va = I.getOperand(1);
3987 Vb = I.getOperand(2);
3988
3989 Sa = getShadow(&I, 1);
3990 Sb = getShadow(&I, 2);
3991 }
3992
3993 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
3994 assert(ParamType == Vb->getType());
3995
3996 assert(ParamType->getPrimitiveSizeInBits() ==
3997 ReturnType->getPrimitiveSizeInBits());
3998
3999 if (I.arg_size() == 3) {
4000 [[maybe_unused]] auto *AccumulatorType =
4001 cast<FixedVectorType>(I.getOperand(0)->getType());
4002 assert(AccumulatorType == ReturnType);
4003 }
4004
4005 FixedVectorType *ImplicitReturnType =
4006 cast<FixedVectorType>(getShadowTy(ReturnType));
4007 // Step 1: instrument multiplication of corresponding vector elements
4008 if (EltSizeInBits) {
4009 ImplicitReturnType = cast<FixedVectorType>(
4010 getMMXVectorTy(EltSizeInBits * ReductionFactor,
4011 ParamType->getPrimitiveSizeInBits()));
4012 ParamType = cast<FixedVectorType>(
4013 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
4014
4015 Va = IRB.CreateBitCast(Va, ParamType);
4016 Vb = IRB.CreateBitCast(Vb, ParamType);
4017
4018 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
4019 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
4020 } else {
4021 assert(ParamType->getNumElements() ==
4022 ReturnType->getNumElements() * ReductionFactor);
4023 }
4024
4025 // Each element of the vector is represented by a single bit (poisoned or
4026 // not) e.g., <8 x i1>.
4027 Value *SaNonZero = IRB.CreateIsNotNull(Sa);
4028 Value *SbNonZero = IRB.CreateIsNotNull(Sb);
4029 Value *And;
4030 if (ZeroPurifies) {
4031 // Multiplying an *initialized* zero by an uninitialized element results
4032 // in an initialized zero element.
4033 //
4034 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4035 // results in an unpoisoned value.
4036 Value *VaInt = Va;
4037 Value *VbInt = Vb;
4038 if (!Va->getType()->isIntegerTy()) {
4039 VaInt = CreateAppToShadowCast(IRB, Va);
4040 VbInt = CreateAppToShadowCast(IRB, Vb);
4041 }
4042
4043 // We check for non-zero on a per-element basis, not per-bit.
4044 Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
4045 Value *VbNonZero = IRB.CreateIsNotNull(VbInt);
4046
4047 And = handleBitwiseAnd(IRB, VaNonZero, VbNonZero, SaNonZero, SbNonZero);
4048 } else {
4049 And = IRB.CreateOr({SaNonZero, SbNonZero});
4050 }
4051
4052 // Extend <8 x i1> to <8 x i16>.
4053 // (The real pmadd intrinsic would have computed intermediate values of
4054 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4055 // consider each element to be either fully initialized or fully
4056 // uninitialized.)
4057 And = IRB.CreateSExt(And, Sa->getType());
4058
4059 // Step 2: instrument horizontal add
4060 // We don't need bit-precise horizontalReduce because we only want to check
4061 // if each pair/quad of elements is fully zero.
4062 // Cast to <4 x i32>.
4063 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4064
4065 // Compute <4 x i1>, then extend back to <4 x i32>.
4066 Value *OutShadow = IRB.CreateSExt(
4067 IRB.CreateICmpNE(Horizontal,
4068 Constant::getNullValue(Horizontal->getType())),
4069 ImplicitReturnType);
4070
4071 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4072 // AVX, it is already correct).
4073 if (EltSizeInBits)
4074 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4075
4076 // Step 3 (if applicable): instrument accumulator
4077 if (I.arg_size() == 3)
4078 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4079
4080 setShadow(&I, OutShadow);
4081 setOriginForNaryOp(I);
4082 }
4083
4084 // Instrument compare-packed intrinsic.
4085 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4086 // all-ones shadow.
4087 void handleVectorComparePackedIntrinsic(IntrinsicInst &I) {
4088 IRBuilder<> IRB(&I);
4089 Type *ResTy = getShadowTy(&I);
4090 auto *Shadow0 = getShadow(&I, 0);
4091 auto *Shadow1 = getShadow(&I, 1);
4092 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4093 Value *S = IRB.CreateSExt(
4094 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4095 setShadow(&I, S);
4096 setOriginForNaryOp(I);
4097 }
4098
4099 // Instrument compare-scalar intrinsic.
4100 // This handles both cmp* intrinsics which return the result in the first
4101 // element of a vector, and comi* which return the result as i32.
4102 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4103 IRBuilder<> IRB(&I);
4104 auto *Shadow0 = getShadow(&I, 0);
4105 auto *Shadow1 = getShadow(&I, 1);
4106 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4107 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4108 setShadow(&I, S);
4109 setOriginForNaryOp(I);
4110 }
4111
4112 // Instrument generic vector reduction intrinsics
4113 // by ORing together all their fields.
4114 //
4115 // If AllowShadowCast is true, the return type does not need to be the same
4116 // type as the fields
4117 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4118 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4119 assert(I.arg_size() == 1);
4120
4121 IRBuilder<> IRB(&I);
4122 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4123 if (AllowShadowCast)
4124 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4125 else
4126 assert(S->getType() == getShadowTy(&I));
4127 setShadow(&I, S);
4128 setOriginForNaryOp(I);
4129 }
4130
4131 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4132 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4133 // %a1)
4134 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4135 //
4136 // The type of the return value, initial starting value, and elements of the
4137 // vector must be identical.
4138 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4139 assert(I.arg_size() == 2);
4140
4141 IRBuilder<> IRB(&I);
4142 Value *Shadow0 = getShadow(&I, 0);
4143 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4144 assert(Shadow0->getType() == Shadow1->getType());
4145 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4146 assert(S->getType() == getShadowTy(&I));
4147 setShadow(&I, S);
4148 setOriginForNaryOp(I);
4149 }
4150
4151 // Instrument vector.reduce.or intrinsic.
4152 // Valid (non-poisoned) set bits in the operand pull low the
4153 // corresponding shadow bits.
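//
// Hedged example (<2 x i8> operand): if field 0 is a fully initialized 0xFF,
// every result bit is 1 regardless of field 1, so the result is clean even
// if field 1 is fully poisoned:
//   OutShadowMask = and-reduce(~op | shadow) = and(0x00, 0xFF) = 0x00
//   S             = OutShadowMask & or-reduce(shadow) = 0x00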
4154 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4155 assert(I.arg_size() == 1);
4156
4157 IRBuilder<> IRB(&I);
4158 Value *OperandShadow = getShadow(&I, 0);
4159 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4160 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4161 // Bit N is clean if any field's bit N is 1 and unpoisoned
4162 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4163 // Otherwise, it is clean if every field's bit N is unpoisoned
4164 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4165 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4166
4167 setShadow(&I, S);
4168 setOrigin(&I, getOrigin(&I, 0));
4169 }
4170
4171 // Instrument vector.reduce.and intrinsic.
4172 // Valid (non-poisoned) unset bits in the operand pull down the
4173 // corresponding shadow bits.
4174 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4175 assert(I.arg_size() == 1);
4176
4177 IRBuilder<> IRB(&I);
4178 Value *OperandShadow = getShadow(&I, 0);
4179 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4180 // Bit N is clean if any field's bit N is 0 and unpoisoned
4181 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4182 // Otherwise, it is clean if every field's bit N is unpoisoned
4183 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4184 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4185
4186 setShadow(&I, S);
4187 setOrigin(&I, getOrigin(&I, 0));
4188 }
4189
4190 void handleStmxcsr(IntrinsicInst &I) {
4191 IRBuilder<> IRB(&I);
4192 Value *Addr = I.getArgOperand(0);
4193 Type *Ty = IRB.getInt32Ty();
4194 Value *ShadowPtr =
4195 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4196
4197 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4198
4199 if (ClCheckAccessAddress)
4200 insertCheckShadowOf(Addr, &I);
4201 }
4202
4203 void handleLdmxcsr(IntrinsicInst &I) {
4204 if (!InsertChecks)
4205 return;
4206
4207 IRBuilder<> IRB(&I);
4208 Value *Addr = I.getArgOperand(0);
4209 Type *Ty = IRB.getInt32Ty();
4210 const Align Alignment = Align(1);
4211 Value *ShadowPtr, *OriginPtr;
4212 std::tie(ShadowPtr, OriginPtr) =
4213 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4214
4215 if (ClCheckAccessAddress)
4216 insertCheckShadowOf(Addr, &I);
4217
4218 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4219 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4220 : getCleanOrigin();
4221 insertCheckShadow(Shadow, Origin, &I);
4222 }
4223
4224 void handleMaskedExpandLoad(IntrinsicInst &I) {
4225 IRBuilder<> IRB(&I);
4226 Value *Ptr = I.getArgOperand(0);
4227 MaybeAlign Align = I.getParamAlign(0);
4228 Value *Mask = I.getArgOperand(1);
4229 Value *PassThru = I.getArgOperand(2);
4230
4231 if (ClCheckAccessAddress) {
4232 insertCheckShadowOf(Ptr, &I);
4233 insertCheckShadowOf(Mask, &I);
4234 }
4235
4236 if (!PropagateShadow) {
4237 setShadow(&I, getCleanShadow(&I));
4238 setOrigin(&I, getCleanOrigin());
4239 return;
4240 }
4241
4242 Type *ShadowTy = getShadowTy(&I);
4243 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4244 auto [ShadowPtr, OriginPtr] =
4245 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4246
4247 Value *Shadow =
4248 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4249 getShadow(PassThru), "_msmaskedexpload");
4250
4251 setShadow(&I, Shadow);
4252
4253 // TODO: Store origins.
4254 setOrigin(&I, getCleanOrigin());
4255 }
4256
4257 void handleMaskedCompressStore(IntrinsicInst &I) {
4258 IRBuilder<> IRB(&I);
4259 Value *Values = I.getArgOperand(0);
4260 Value *Ptr = I.getArgOperand(1);
4261 MaybeAlign Align = I.getParamAlign(1);
4262 Value *Mask = I.getArgOperand(2);
4263
4264 if (ClCheckAccessAddress) {
4265 insertCheckShadowOf(Ptr, &I);
4266 insertCheckShadowOf(Mask, &I);
4267 }
4268
4269 Value *Shadow = getShadow(Values);
4270 Type *ElementShadowTy =
4271 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4272 auto [ShadowPtr, OriginPtrs] =
4273 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4274
4275 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4276
4277 // TODO: Store origins.
4278 }
4279
4280 void handleMaskedGather(IntrinsicInst &I) {
4281 IRBuilder<> IRB(&I);
4282 Value *Ptrs = I.getArgOperand(0);
4283 const Align Alignment = I.getParamAlign(0).valueOrOne();
4284 Value *Mask = I.getArgOperand(1);
4285 Value *PassThru = I.getArgOperand(2);
4286
4287 Type *PtrsShadowTy = getShadowTy(Ptrs);
4288 if (ClCheckAccessAddress) {
4289 insertCheckShadowOf(Mask, &I);
4290 Value *MaskedPtrShadow = IRB.CreateSelect(
4291 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4292 "_msmaskedptrs");
4293 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4294 }
4295
4296 if (!PropagateShadow) {
4297 setShadow(&I, getCleanShadow(&I));
4298 setOrigin(&I, getCleanOrigin());
4299 return;
4300 }
4301
4302 Type *ShadowTy = getShadowTy(&I);
4303 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4304 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4305 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4306
4307 Value *Shadow =
4308 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4309 getShadow(PassThru), "_msmaskedgather");
4310
4311 setShadow(&I, Shadow);
4312
4313 // TODO: Store origins.
4314 setOrigin(&I, getCleanOrigin());
4315 }
4316
4317 void handleMaskedScatter(IntrinsicInst &I) {
4318 IRBuilder<> IRB(&I);
4319 Value *Values = I.getArgOperand(0);
4320 Value *Ptrs = I.getArgOperand(1);
4321 const Align Alignment = I.getParamAlign(1).valueOrOne();
4322 Value *Mask = I.getArgOperand(2);
4323
4324 Type *PtrsShadowTy = getShadowTy(Ptrs);
4325 if (ClCheckAccessAddress) {
4326 insertCheckShadowOf(Mask, &I);
4327 Value *MaskedPtrShadow = IRB.CreateSelect(
4328 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4329 "_msmaskedptrs");
4330 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4331 }
4332
4333 Value *Shadow = getShadow(Values);
4334 Type *ElementShadowTy =
4335 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4336 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4337 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4338
4339 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4340
4341 // TODO: Store origin.
4342 }
4343
4344 // Intrinsic::masked_store
4345 //
4346 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4347 // stores are lowered to Intrinsic::masked_store.
4348 void handleMaskedStore(IntrinsicInst &I) {
4349 IRBuilder<> IRB(&I);
4350 Value *V = I.getArgOperand(0);
4351 Value *Ptr = I.getArgOperand(1);
4352 const Align Alignment = I.getParamAlign(1).valueOrOne();
4353 Value *Mask = I.getArgOperand(2);
4354 Value *Shadow = getShadow(V);
4355
4356 if (ClCheckAccessAddress) {
4357 insertCheckShadowOf(Ptr, &I);
4358 insertCheckShadowOf(Mask, &I);
4359 }
4360
4361 Value *ShadowPtr;
4362 Value *OriginPtr;
4363 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4364 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4365
4366 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4367
4368 if (!MS.TrackOrigins)
4369 return;
4370
4371 auto &DL = F.getDataLayout();
4372 paintOrigin(IRB, getOrigin(V), OriginPtr,
4373 DL.getTypeStoreSize(Shadow->getType()),
4374 std::max(Alignment, kMinOriginAlignment));
4375 }
4376
4377 // Intrinsic::masked_load
4378 //
4379 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4380 // loads are lowered to Intrinsic::masked_load.
4381 void handleMaskedLoad(IntrinsicInst &I) {
4382 IRBuilder<> IRB(&I);
4383 Value *Ptr = I.getArgOperand(0);
4384 const Align Alignment = I.getParamAlign(0).valueOrOne();
4385 Value *Mask = I.getArgOperand(1);
4386 Value *PassThru = I.getArgOperand(2);
4387
4388 if (ClCheckAccessAddress) {
4389 insertCheckShadowOf(Ptr, &I);
4390 insertCheckShadowOf(Mask, &I);
4391 }
4392
4393 if (!PropagateShadow) {
4394 setShadow(&I, getCleanShadow(&I));
4395 setOrigin(&I, getCleanOrigin());
4396 return;
4397 }
4398
4399 Type *ShadowTy = getShadowTy(&I);
4400 Value *ShadowPtr, *OriginPtr;
4401 std::tie(ShadowPtr, OriginPtr) =
4402 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4403 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4404 getShadow(PassThru), "_msmaskedld"));
4405
4406 if (!MS.TrackOrigins)
4407 return;
4408
4409 // Choose between PassThru's and the loaded value's origins.
4410 Value *MaskedPassThruShadow = IRB.CreateAnd(
4411 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4412
4413 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4414
4415 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4416 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4417
4418 setOrigin(&I, Origin);
4419 }
4420
4421 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4422 // dst mask src
4423 //
4424 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4425 // by handleMaskedStore.
4426 //
4427 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4428 // vector of integers, unlike the LLVM masked intrinsics, which require a
4429 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4430 // mentions that the x86 backend does not know how to efficiently convert
4431 // from a vector of booleans back into the AVX mask format; therefore, they
4432 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4433 // intrinsics.
4434 void handleAVXMaskedStore(IntrinsicInst &I) {
4435 assert(I.arg_size() == 3);
4436
4437 IRBuilder<> IRB(&I);
4438
4439 Value *Dst = I.getArgOperand(0);
4440 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4441
4442 Value *Mask = I.getArgOperand(1);
4443 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4444
4445 Value *Src = I.getArgOperand(2);
4446 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4447
4448 const Align Alignment = Align(1);
4449
4450 Value *SrcShadow = getShadow(Src);
4451
4452 if (ClCheckAccessAddress) {
4453 insertCheckShadowOf(Dst, &I);
4454 insertCheckShadowOf(Mask, &I);
4455 }
4456
4457 Value *DstShadowPtr;
4458 Value *DstOriginPtr;
4459 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4460 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4461
4462 SmallVector<Value *, 2> ShadowArgs;
4463 ShadowArgs.append(1, DstShadowPtr);
4464 ShadowArgs.append(1, Mask);
4465 // The intrinsic may require floating-point but shadows can be arbitrary
4466 // bit patterns, of which some would be interpreted as "invalid"
4467 // floating-point values (NaN etc.); we assume the intrinsic will happily
4468 // copy them.
4469 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4470
4471 CallInst *CI =
4472 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4473 setShadow(&I, CI);
4474
4475 if (!MS.TrackOrigins)
4476 return;
4477
4478 // Approximation only
4479 auto &DL = F.getDataLayout();
4480 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4481 DL.getTypeStoreSize(SrcShadow->getType()),
4482 std::max(Alignment, kMinOriginAlignment));
4483 }
4484
4485 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4486 // return src mask
4487 //
4488 // Masked-off values are replaced with 0, which conveniently also represents
4489 // initialized memory.
4490 //
4491 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4492 // by handleMaskedLoad.
4493 //
4494 // We do not combine this with handleMaskedLoad; see comment in
4495 // handleAVXMaskedStore for the rationale.
4496 //
4497 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4498 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4499 // parameter.
4500 void handleAVXMaskedLoad(IntrinsicInst &I) {
4501 assert(I.arg_size() == 2);
4502
4503 IRBuilder<> IRB(&I);
4504
4505 Value *Src = I.getArgOperand(0);
4506 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4507
4508 Value *Mask = I.getArgOperand(1);
4509 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4510
4511 const Align Alignment = Align(1);
4512
4513 if (ClCheckAccessAddress) {
4514 insertCheckShadowOf(Mask, &I);
4515 }
4516
4517 Type *SrcShadowTy = getShadowTy(Src);
4518 Value *SrcShadowPtr, *SrcOriginPtr;
4519 std::tie(SrcShadowPtr, SrcOriginPtr) =
4520 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4521
4522 SmallVector<Value *, 2> ShadowArgs;
4523 ShadowArgs.append(1, SrcShadowPtr);
4524 ShadowArgs.append(1, Mask);
4525
4526 CallInst *CI =
4527 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4528 // The AVX masked load intrinsics do not have integer variants. We use the
4529 // floating-point variants, which will happily copy the shadows even if
4530 // they are interpreted as "invalid" floating-point values (NaN etc.).
4531 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4532
4533 if (!MS.TrackOrigins)
4534 return;
4535
4536 // The "pass-through" value is always zero (initialized). To the extent
4537 // that that results in initialized aligned 4-byte chunks, the origin value
4538 // is ignored. It is therefore correct to simply copy the origin from src.
4539 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4540 setOrigin(&I, PtrSrcOrigin);
4541 }
4542
4543 // Test whether the mask indices are initialized, only checking the bits that
4544 // are actually used.
4545 //
4546 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4547 // used/checked.
4548 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4549 assert(isFixedIntVector(Idx));
4550 auto IdxVectorSize =
4551 cast<FixedVectorType>(Idx->getType())->getNumElements();
4552 assert(isPowerOf2_64(IdxVectorSize));
4553
4554 // Compiler isn't smart enough, let's help it
4555 if (isa<Constant>(Idx))
4556 return;
4557
4558 auto *IdxShadow = getShadow(Idx);
4559 Value *Truncated = IRB.CreateTrunc(
4560 IdxShadow,
4561 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4562 IdxVectorSize));
4563 insertCheckShadow(Truncated, getOrigin(Idx), I);
4564 }
4565
4566 // Instrument AVX permutation intrinsic.
4567 // We apply the same permutation (argument index 1) to the shadow.
4568 void handleAVXVpermilvar(IntrinsicInst &I) {
4569 IRBuilder<> IRB(&I);
4570 Value *Shadow = getShadow(&I, 0);
4571 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4572
4573 // Shadows are integer-ish types but some intrinsics require a
4574 // different (e.g., floating-point) type.
4575 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4576 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4577 {Shadow, I.getArgOperand(1)});
4578
4579 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4580 setOriginForNaryOp(I);
4581 }
4582
4583 // Instrument AVX permutation intrinsic.
4584 // We apply the same permutation (argument index 1) to the shadows.
4585 void handleAVXVpermi2var(IntrinsicInst &I) {
4586 assert(I.arg_size() == 3);
4587 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4588 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4589 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4590 [[maybe_unused]] auto ArgVectorSize =
4591 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4592 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4593 ->getNumElements() == ArgVectorSize);
4594 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4595 ->getNumElements() == ArgVectorSize);
4596 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4597 assert(I.getType() == I.getArgOperand(0)->getType());
4598 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4599 IRBuilder<> IRB(&I);
4600 Value *AShadow = getShadow(&I, 0);
4601 Value *Idx = I.getArgOperand(1);
4602 Value *BShadow = getShadow(&I, 2);
4603
4604 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4605
4606 // Shadows are integer-ish types but some intrinsics require a
4607 // different (e.g., floating-point) type.
4608 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4609 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4610 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4611 {AShadow, Idx, BShadow});
4612 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4613 setOriginForNaryOp(I);
4614 }
4615
4616 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4617 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4618 }
4619
4620 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4621 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4622 }
4623
4624 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4625 return isFixedIntVectorTy(V->getType());
4626 }
4627
4628 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4629 return isFixedFPVectorTy(V->getType());
4630 }
4631
4632 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4633 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4634 // i32 rounding)
4635 //
4636 // Inconveniently, some similar intrinsics have a different operand order:
4637 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4638 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4639 // i16 mask)
4640 //
4641 // If the return type has more elements than A, the excess elements are
4642 // zeroed (and the corresponding shadow is initialized).
4643 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4644 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4645 // i8 mask)
4646 //
4647 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4648 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4649 // where all_or_nothing(x) is fully uninitialized if x has any
4650 // uninitialized bits
4651 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4652 IRBuilder<> IRB(&I);
4653
4654 assert(I.arg_size() == 4);
4655 Value *A = I.getOperand(0);
4656 Value *WriteThrough;
4657     Value *Mask;
4658     Value *RoundingMode;
4659 if (LastMask) {
4660 WriteThrough = I.getOperand(2);
4661 Mask = I.getOperand(3);
4662 RoundingMode = I.getOperand(1);
4663 } else {
4664 WriteThrough = I.getOperand(1);
4665 Mask = I.getOperand(2);
4666 RoundingMode = I.getOperand(3);
4667 }
4668
4669 assert(isFixedFPVector(A));
4670 assert(isFixedIntVector(WriteThrough));
4671
4672 unsigned ANumElements =
4673 cast<FixedVectorType>(A->getType())->getNumElements();
4674 [[maybe_unused]] unsigned WriteThruNumElements =
4675 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4676 assert(ANumElements == WriteThruNumElements ||
4677 ANumElements * 2 == WriteThruNumElements);
4678
4679 assert(Mask->getType()->isIntegerTy());
4680 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4681 assert(ANumElements == MaskNumElements ||
4682 ANumElements * 2 == MaskNumElements);
4683
4684 assert(WriteThruNumElements == MaskNumElements);
4685
4686 // Some bits of the mask may be unused, though it's unusual to have partly
4687 // uninitialized bits.
4688 insertCheckShadowOf(Mask, &I);
4689
4690 assert(RoundingMode->getType()->isIntegerTy());
4691 // Only some bits of the rounding mode are used, though it's very
4692 // unusual to have uninitialized bits there (more commonly, it's a
4693 // constant).
4694 insertCheckShadowOf(RoundingMode, &I);
4695
4696 assert(I.getType() == WriteThrough->getType());
4697
4698 Value *AShadow = getShadow(A);
4699 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4700
4701 if (ANumElements * 2 == MaskNumElements) {
4702 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4703 // from the zeroed shadow instead of the writethrough's shadow.
4704 Mask =
4705 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4706 Mask =
4707 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4708 }
4709
4710     // Convert the integer mask to a vector of i1 (e.g., an i16 mask to <16 x i1>)
4711 Mask = IRB.CreateBitCast(
4712 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4713 "_ms_mask_bitcast");
4714
4715 /// For floating-point to integer conversion, the output is:
4716 /// - fully uninitialized if *any* bit of the input is uninitialized
4717     /// - fully initialized if all bits of the input are initialized
4718 /// We apply the same principle on a per-element basis for vectors.
4719 ///
4720 /// We use the scalar width of the return type instead of A's.
4721 AShadow = IRB.CreateSExt(
4722 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4723 getShadowTy(&I), "_ms_a_shadow");
4724
4725 Value *WriteThroughShadow = getShadow(WriteThrough);
4726 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4727 "_ms_writethru_select");
4728
4729 setShadow(&I, Shadow);
4730 setOriginForNaryOp(I);
4731 }
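  // Worked example of the formula above (hypothetical values), for a
  // 4-element conversion with Mask = 0b0101:
  //   AShadow            = { 0, 0x00000001, 0xFFFF0000, 0 }
  //   all_or_nothing     = { 0, 0xFFFFFFFF, 0xFFFFFFFF, 0 }   (SExt of != 0)
  //   WriteThroughShadow = { W0, W1, W2, W3 }
  //   DstShadow          = { 0, W1, 0xFFFFFFFF, W3 }
  // i.e. lanes selected by the mask get the all-or-nothing shadow of A, and
  // the remaining lanes keep the writethrough's shadow.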
4732
4733 // Instrument BMI / BMI2 intrinsics.
4734 // All of these intrinsics are Z = I(X, Y)
4735 // where the types of all operands and the result match, and are either i32 or
4736 // i64. The following instrumentation happens to work for all of them:
4737 // Sz = I(Sx, Y) | (sext (Sy != 0))
4738 void handleBmiIntrinsic(IntrinsicInst &I) {
4739 IRBuilder<> IRB(&I);
4740 Type *ShadowTy = getShadowTy(&I);
4741
4742 // If any bit of the mask operand is poisoned, then the whole thing is.
4743 Value *SMask = getShadow(&I, 1);
4744 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4745 ShadowTy);
4746 // Apply the same intrinsic to the shadow of the first operand.
4747 Value *S = IRB.CreateCall(I.getCalledFunction(),
4748 {getShadow(&I, 0), I.getOperand(1)});
4749 S = IRB.CreateOr(SMask, S);
4750 setShadow(&I, S);
4751 setOriginForNaryOp(I);
4752 }
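  // Worked example of the formula above (hypothetical values), for a BMI2
  // PEXT with X = abcdefgh (bit a is the MSB) and Y = 0b00001111:
  //   Sx = 0b10100000 (bits a and c uninitialized), Sy = 0 (Y fully defined)
  //   I(Sx, Y)      = pext(Sx, Y) = 0  -> the extracted low nibble is clean
  //   sext(Sy != 0) = 0                -> a clean mask contributes nothing
  //   Sz = 0, i.e. only the bits of X that Y actually selects can taint Z.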
4753
4754 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4755 SmallVector<int, 8> Mask;
4756 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4757 Mask.append(2, X);
4758 }
4759 return Mask;
4760 }
4761
4762 // Instrument pclmul intrinsics.
4763 // These intrinsics operate either on odd or on even elements of the input
4764 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4765 // Replace the unused elements with copies of the used ones, ex:
4766 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4767 // or
4768 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4769 // and then apply the usual shadow combining logic.
4770 void handlePclmulIntrinsic(IntrinsicInst &I) {
4771 IRBuilder<> IRB(&I);
4772 unsigned Width =
4773 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4774 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4775 "pclmul 3rd operand must be a constant");
4776 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4777 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4778 getPclmulMask(Width, Imm & 0x01));
4779 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4780 getPclmulMask(Width, Imm & 0x10));
4781 ShadowAndOriginCombiner SOC(this, IRB);
4782 SOC.Add(Shuf0, getOrigin(&I, 0));
4783 SOC.Add(Shuf1, getOrigin(&I, 1));
4784 SOC.Done(&I);
4785 }
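  // Illustrative example (hypothetical): pclmulqdq on <2 x i64> operands with
  // Imm = 0x00 multiplies element 0 of each operand. The masks duplicate each
  // shadow's element 0 into element 1, so the combined shadow of the 128-bit
  // product depends only on the elements that were actually multiplied; an
  // uninitialized element 1 in either input is correctly ignored.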
4786
4787 // Instrument _mm_*_sd|ss intrinsics
4788 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4789 IRBuilder<> IRB(&I);
4790 unsigned Width =
4791 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4792 Value *First = getShadow(&I, 0);
4793 Value *Second = getShadow(&I, 1);
4794 // First element of second operand, remaining elements of first operand
4795 SmallVector<int, 16> Mask;
4796 Mask.push_back(Width);
4797 for (unsigned i = 1; i < Width; i++)
4798 Mask.push_back(i);
4799 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4800
4801 setShadow(&I, Shadow);
4802 setOriginForNaryOp(I);
4803 }
4804
4805 void handleVtestIntrinsic(IntrinsicInst &I) {
4806 IRBuilder<> IRB(&I);
4807 Value *Shadow0 = getShadow(&I, 0);
4808 Value *Shadow1 = getShadow(&I, 1);
4809 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4810 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4811 Value *Scalar = convertShadowToScalar(NZ, IRB);
4812 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4813
4814 setShadow(&I, Shadow);
4815 setOriginForNaryOp(I);
4816 }
4817
4818 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4819 IRBuilder<> IRB(&I);
4820 unsigned Width =
4821 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4822 Value *First = getShadow(&I, 0);
4823 Value *Second = getShadow(&I, 1);
4824 Value *OrShadow = IRB.CreateOr(First, Second);
4825 // First element of both OR'd together, remaining elements of first operand
4826 SmallVector<int, 16> Mask;
4827 Mask.push_back(Width);
4828 for (unsigned i = 1; i < Width; i++)
4829 Mask.push_back(i);
4830 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4831
4832 setShadow(&I, Shadow);
4833 setOriginForNaryOp(I);
4834 }
4835
4836   // _mm_round_pd / _mm_round_ps.
4837 // Similar to maybeHandleSimpleNomemIntrinsic except
4838 // the second argument is guaranteed to be a constant integer.
4839 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4840 assert(I.getArgOperand(0)->getType() == I.getType());
4841 assert(I.arg_size() == 2);
4842 assert(isa<ConstantInt>(I.getArgOperand(1)));
4843
4844 IRBuilder<> IRB(&I);
4845 ShadowAndOriginCombiner SC(this, IRB);
4846 SC.Add(I.getArgOperand(0));
4847 SC.Done(&I);
4848 }
4849
4850 // Instrument @llvm.abs intrinsic.
4851 //
4852 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4853 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
4854 void handleAbsIntrinsic(IntrinsicInst &I) {
4855 assert(I.arg_size() == 2);
4856 Value *Src = I.getArgOperand(0);
4857 Value *IsIntMinPoison = I.getArgOperand(1);
4858
4859 assert(I.getType()->isIntOrIntVectorTy());
4860
4861 assert(Src->getType() == I.getType());
4862
4863 assert(IsIntMinPoison->getType()->isIntegerTy());
4864 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4865
4866 IRBuilder<> IRB(&I);
4867 Value *SrcShadow = getShadow(Src);
4868
4869 APInt MinVal =
4870 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4871 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4872 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4873
4874 Value *PoisonedShadow = getPoisonedShadow(Src);
4875 Value *PoisonedIfIntMinShadow =
4876 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4877 Value *Shadow =
4878 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4879
4880 setShadow(&I, Shadow);
4881 setOrigin(&I, getOrigin(&I, 0));
4882 }
4883
4884 void handleIsFpClass(IntrinsicInst &I) {
4885 IRBuilder<> IRB(&I);
4886 Value *Shadow = getShadow(&I, 0);
4887 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4888 setOrigin(&I, getOrigin(&I, 0));
4889 }
4890
4891 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4892 IRBuilder<> IRB(&I);
4893 Value *Shadow0 = getShadow(&I, 0);
4894 Value *Shadow1 = getShadow(&I, 1);
4895 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4896 Value *ShadowElt1 =
4897 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4898
4899 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4900 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4901 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4902
4903 setShadow(&I, Shadow);
4904 setOriginForNaryOp(I);
4905 }
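  // Illustrative example for the handler above (hypothetical values), for
  // {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b) with
  // Sa = 0x000000FF and Sb = 0:
  //   ShadowElt0 = Sa | Sb = 0x000000FF        (shadow of the i32 result)
  //   ShadowElt1 = (ShadowElt0 != 0) = true    (the overflow bit is tainted
  //                                             whenever any input bit is)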
4906
4907 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4908 assert(isa<FixedVectorType>(V->getType()));
4909 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4910 Value *Shadow = getShadow(V);
4911 return IRB.CreateExtractElement(Shadow,
4912 ConstantInt::get(IRB.getInt32Ty(), 0));
4913 }
4914
4915 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4916 //
4917 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4918 // (<8 x i64>, <16 x i8>, i8)
4919 // A WriteThru Mask
4920 //
4921 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4922 // (<16 x i32>, <16 x i8>, i16)
4923 //
4924 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4925 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4926 //
4927 // If Dst has more elements than A, the excess elements are zeroed (and the
4928 // corresponding shadow is initialized).
4929 //
4930 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4931 // and is much faster than this handler.
4932 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4933 IRBuilder<> IRB(&I);
4934
4935 assert(I.arg_size() == 3);
4936 Value *A = I.getOperand(0);
4937 Value *WriteThrough = I.getOperand(1);
4938 Value *Mask = I.getOperand(2);
4939
4940 assert(isFixedIntVector(A));
4941 assert(isFixedIntVector(WriteThrough));
4942
4943 unsigned ANumElements =
4944 cast<FixedVectorType>(A->getType())->getNumElements();
4945 unsigned OutputNumElements =
4946 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4947 assert(ANumElements == OutputNumElements ||
4948 ANumElements * 2 == OutputNumElements);
4949
4950 assert(Mask->getType()->isIntegerTy());
4951 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4952 insertCheckShadowOf(Mask, &I);
4953
4954 assert(I.getType() == WriteThrough->getType());
4955
4956 // Widen the mask, if necessary, to have one bit per element of the output
4957 // vector.
4958 // We want the extra bits to have '1's, so that the CreateSelect will
4959 // select the values from AShadow instead of WriteThroughShadow ("maskless"
4960 // versions of the intrinsics are sometimes implemented using an all-1's
4961 // mask and an undefined value for WriteThroughShadow). We accomplish this
4962 // by using bitwise NOT before and after the ZExt.
4963 if (ANumElements != OutputNumElements) {
4964 Mask = IRB.CreateNot(Mask);
4965 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
4966 "_ms_widen_mask");
4967 Mask = IRB.CreateNot(Mask);
4968 }
4969 Mask = IRB.CreateBitCast(
4970 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4971
4972 Value *AShadow = getShadow(A);
4973
4974 // The return type might have more elements than the input.
4975 // Temporarily shrink the return type's number of elements.
4976 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
4977
4978 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
4979 // This handler treats them all as truncation, which leads to some rare
4980 // false positives in the cases where the truncated bytes could
4981 // unambiguously saturate the value e.g., if A = ??????10 ????????
4982 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
4983 // fully defined, but the truncated byte is ????????.
4984 //
4985 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
4986 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
4987 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4988
4989 Value *WriteThroughShadow = getShadow(WriteThrough);
4990
4991 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
4992 setShadow(&I, Shadow);
4993 setOriginForNaryOp(I);
4994 }
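  // Worked example of the mask widening above (hypothetical values), for an
  // <8 x i64> -> <16 x i8> down-convert with Mask = 0b10110001 (i8):
  //   ~Mask         = 0b01001110
  //   zext to i16   = 0b00000000_01001110
  //   ~(...)        = 0b11111111_10110001
  // so the eight "extra" lanes of the <16 x i8> output select AShadow, whose
  // upper half was zero-extended, i.e. those lanes are reported as
  // initialized.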
4995
4996 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
4997 // values and perform an operation whose shadow propagation should be handled
4998 // as all-or-nothing [*], with masking provided by a vector and a mask
4999 // supplied as an integer.
5000 //
5001 // [*] if all bits of a vector element are initialized, the output is fully
5002 // initialized; otherwise, the output is fully uninitialized
5003 //
5004 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
5005 // (<16 x float>, <16 x float>, i16)
5006 // A WriteThru Mask
5007 //
5008 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
5009 // (<2 x double>, <2 x double>, i8)
5010 //
5011 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
5012 // (<8 x double>, i32, <8 x double>, i8, i32)
5013 // A Imm WriteThru Mask Rounding
5014 //
5015 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
5016 // be fully initialized.
5017 //
5018 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
5019 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
5020 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
5021 unsigned WriteThruIndex,
5022 unsigned MaskIndex) {
5023 IRBuilder<> IRB(&I);
5024
5025 unsigned NumArgs = I.arg_size();
5026 assert(AIndex < NumArgs);
5027 assert(WriteThruIndex < NumArgs);
5028 assert(MaskIndex < NumArgs);
5029 assert(AIndex != WriteThruIndex);
5030 assert(AIndex != MaskIndex);
5031 assert(WriteThruIndex != MaskIndex);
5032
5033 Value *A = I.getOperand(AIndex);
5034 Value *WriteThru = I.getOperand(WriteThruIndex);
5035 Value *Mask = I.getOperand(MaskIndex);
5036
5037 assert(isFixedFPVector(A));
5038 assert(isFixedFPVector(WriteThru));
5039
5040 [[maybe_unused]] unsigned ANumElements =
5041 cast<FixedVectorType>(A->getType())->getNumElements();
5042 unsigned OutputNumElements =
5043 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
5044 assert(ANumElements == OutputNumElements);
5045
5046 for (unsigned i = 0; i < NumArgs; ++i) {
5047 if (i != AIndex && i != WriteThruIndex) {
5048 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5049 // they be fully initialized.
5050 assert(I.getOperand(i)->getType()->isIntegerTy());
5051 insertCheckShadowOf(I.getOperand(i), &I);
5052 }
5053 }
5054
5055 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5056 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
5057 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
5058 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5059
5060 assert(I.getType() == WriteThru->getType());
5061
5062 Mask = IRB.CreateBitCast(
5063 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5064
5065 Value *AShadow = getShadow(A);
5066
5067 // All-or-nothing shadow
5068 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
5069 AShadow->getType());
5070
5071 Value *WriteThruShadow = getShadow(WriteThru);
5072
5073 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
5074 setShadow(&I, Shadow);
5075
5076 setOriginForNaryOp(I);
5077 }
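  // Worked example for the handler above (hypothetical values), for a
  // 4-element operation (e.g., rcp14.ps.128) with the mask truncated to
  // 0b0011:
  //   AShadow         = { 0x00010000, 0, 0, 0xFFFFFFFF }
  //   all-or-nothing  = { 0xFFFFFFFF, 0, 0, 0xFFFFFFFF }
  //   WriteThruShadow = { W0, W1, W2, W3 }
  //   DstShadow       = { 0xFFFFFFFF, 0, W2, W3 }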
5078
5079 // For sh.* compiler intrinsics:
5080 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5081 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5082 // A B WriteThru Mask RoundingMode
5083 //
5084 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5085 // DstShadow[1..7] = AShadow[1..7]
5086 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5087 IRBuilder<> IRB(&I);
5088
5089 assert(I.arg_size() == 5);
5090 Value *A = I.getOperand(0);
5091 Value *B = I.getOperand(1);
5092 Value *WriteThrough = I.getOperand(2);
5093 Value *Mask = I.getOperand(3);
5094 Value *RoundingMode = I.getOperand(4);
5095
5096 // Technically, we could probably just check whether the LSB is
5097 // initialized, but intuitively it feels like a partly uninitialized mask
5098 // is unintended, and we should warn the user immediately.
5099 insertCheckShadowOf(Mask, &I);
5100 insertCheckShadowOf(RoundingMode, &I);
5101
5102 assert(isa<FixedVectorType>(A->getType()));
5103 unsigned NumElements =
5104 cast<FixedVectorType>(A->getType())->getNumElements();
5105 assert(NumElements == 8);
5106 assert(A->getType() == B->getType());
5107 assert(B->getType() == WriteThrough->getType());
5108 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5109 assert(RoundingMode->getType()->isIntegerTy());
5110
5111 Value *ALowerShadow = extractLowerShadow(IRB, A);
5112 Value *BLowerShadow = extractLowerShadow(IRB, B);
5113
5114 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5115
5116 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5117
5118 Mask = IRB.CreateBitCast(
5119 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5120 Value *MaskLower =
5121 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5122
5123 Value *AShadow = getShadow(A);
5124 Value *DstLowerShadow =
5125 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5126 Value *DstShadow = IRB.CreateInsertElement(
5127 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5128 "_msprop");
5129
5130 setShadow(&I, DstShadow);
5131 setOriginForNaryOp(I);
5132 }
5133
5134 // Approximately handle AVX Galois Field Affine Transformation
5135 //
5136 // e.g.,
5137 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5138 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5139 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5140 // Out A x b
5141 // where A and x are packed matrices, b is a vector,
5142 // Out = A * x + b in GF(2)
5143 //
5144 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5145 // computation also includes a parity calculation.
5146 //
5147 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5148 // Out_Shadow = (V1_Shadow & V2_Shadow)
5149 // | (V1 & V2_Shadow)
5150 // | (V1_Shadow & V2 )
5151 //
5152 // We approximate the shadow of gf2p8affineqb using:
5153 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5154 // | gf2p8affineqb(x, A_shadow, 0)
5155 // | gf2p8affineqb(x_Shadow, A, 0)
5156 // | set1_epi8(b_Shadow)
5157 //
5158 // This approximation has false negatives: if an intermediate dot-product
5159 // contains an even number of 1's, the parity is 0.
5160 // It has no false positives.
5161 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5162 IRBuilder<> IRB(&I);
5163
5164 assert(I.arg_size() == 3);
5165 Value *A = I.getOperand(0);
5166 Value *X = I.getOperand(1);
5167 Value *B = I.getOperand(2);
5168
5169 assert(isFixedIntVector(A));
5170 assert(cast<VectorType>(A->getType())
5171 ->getElementType()
5172 ->getScalarSizeInBits() == 8);
5173
5174 assert(A->getType() == X->getType());
5175
5176 assert(B->getType()->isIntegerTy());
5177 assert(B->getType()->getScalarSizeInBits() == 8);
5178
5179 assert(I.getType() == A->getType());
5180
5181 Value *AShadow = getShadow(A);
5182 Value *XShadow = getShadow(X);
5183 Value *BZeroShadow = getCleanShadow(B);
5184
5185 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5186 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5187 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5188 {X, AShadow, BZeroShadow});
5189 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5190 {XShadow, A, BZeroShadow});
5191
5192 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5193 Value *BShadow = getShadow(B);
5194 Value *BBroadcastShadow = getCleanShadow(AShadow);
5195 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5196 // This loop generates a lot of LLVM IR, which we expect that CodeGen will
5197 // lower appropriately (e.g., VPBROADCASTB).
5198 // Besides, b is often a constant, in which case it is fully initialized.
5199 for (unsigned i = 0; i < NumElements; i++)
5200 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5201
5202 setShadow(&I, IRB.CreateOr(
5203 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5204 setOriginForNaryOp(I);
5205 }
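  // Worked example of the underlying AND rule (hypothetical single-bit
  // values): V1 = 0 (initialized), V2 = ? (uninitialized):
  //   V1_Shadow & V2_Shadow = 0 & 1 = 0
  //   V1        & V2_Shadow = 0 & 1 = 0
  //   V1_Shadow & V2        = 0 & ? = 0
  // so Out_Shadow = 0: AND-ing an uninitialized bit with a known zero yields
  // an initialized zero. The gf2p8affineqb approximation above inherits this
  // property for each product term, but may still miss cases where the parity
  // of several uninitialized products is in fact determined.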
5206
5207 // Handle Arm NEON vector load intrinsics (vld*).
5208 //
5209 // The WithLane instructions (ld[234]lane) are similar to:
5210 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5211 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5212 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5213 // %A)
5214 //
5215 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5216 // to:
5217 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5218 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5219 unsigned int numArgs = I.arg_size();
5220
5221 // Return type is a struct of vectors of integers or floating-point
5222 assert(I.getType()->isStructTy());
5223 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5224 assert(RetTy->getNumElements() > 0);
5225     assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5226            RetTy->getElementType(0)->isFPOrFPVectorTy());
5227 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5228 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5229
5230 if (WithLane) {
5231 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5232 assert(4 <= numArgs && numArgs <= 6);
5233
5234 // Return type is a struct of the input vectors
5235 assert(RetTy->getNumElements() + 2 == numArgs);
5236 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5237 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5238 } else {
5239 assert(numArgs == 1);
5240 }
5241
5242 IRBuilder<> IRB(&I);
5243
5244 SmallVector<Value *, 6> ShadowArgs;
5245 if (WithLane) {
5246 for (unsigned int i = 0; i < numArgs - 2; i++)
5247 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5248
5249 // Lane number, passed verbatim
5250 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5251 ShadowArgs.push_back(LaneNumber);
5252
5253 // TODO: blend shadow of lane number into output shadow?
5254 insertCheckShadowOf(LaneNumber, &I);
5255 }
5256
5257 Value *Src = I.getArgOperand(numArgs - 1);
5258 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5259
5260 Type *SrcShadowTy = getShadowTy(Src);
5261 auto [SrcShadowPtr, SrcOriginPtr] =
5262 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5263 ShadowArgs.push_back(SrcShadowPtr);
5264
5265 // The NEON vector load instructions handled by this function all have
5266 // integer variants. It is easier to use those rather than trying to cast
5267 // a struct of vectors of floats into a struct of vectors of integers.
5268 CallInst *CI =
5269 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5270 setShadow(&I, CI);
5271
5272 if (!MS.TrackOrigins)
5273 return;
5274
5275 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5276 setOrigin(&I, PtrSrcOrigin);
5277 }
5278
5279 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5280 /// and vst{2,3,4}lane).
5281 ///
5282 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5283 /// last argument, with the initial arguments being the inputs (and lane
5284 /// number for vst{2,3,4}lane). They return void.
5285 ///
5286 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5287 /// abcdabcdabcdabcd... into *outP
5288 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5289 /// writes aaaa...bbbb...cccc...dddd... into *outP
5290 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5291 /// These instructions can all be instrumented with essentially the same
5292 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5293 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5294 IRBuilder<> IRB(&I);
5295
5296 // Don't use getNumOperands() because it includes the callee
5297 int numArgOperands = I.arg_size();
5298
5299 // The last arg operand is the output (pointer)
5300 assert(numArgOperands >= 1);
5301 Value *Addr = I.getArgOperand(numArgOperands - 1);
5302 assert(Addr->getType()->isPointerTy());
5303 int skipTrailingOperands = 1;
5304
5305     if (ClCheckAccessAddress)
5306       insertCheckShadowOf(Addr, &I);
5307
5308 // Second-last operand is the lane number (for vst{2,3,4}lane)
5309 if (useLane) {
5310 skipTrailingOperands++;
5311 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5312       assert(isa<IntegerType>(
5313           I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5314 }
5315
5316 SmallVector<Value *, 8> ShadowArgs;
5317 // All the initial operands are the inputs
5318 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5319 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5320 Value *Shadow = getShadow(&I, i);
5321 ShadowArgs.append(1, Shadow);
5322 }
5323
5324 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5325 // e.g., for:
5326 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5327 // we know the type of the output (and its shadow) is <16 x i8>.
5328 //
5329 // Arm NEON VST is unusual because the last argument is the output address:
5330 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5331 // call void @llvm.aarch64.neon.st2.v16i8.p0
5332 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5333 // and we have no type information about P's operand. We must manually
5334 // compute the type (<16 x i8> x 2).
5335 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5336 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5337 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5338 (numArgOperands - skipTrailingOperands));
5339 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5340
5341 if (useLane)
5342 ShadowArgs.append(1,
5343 I.getArgOperand(numArgOperands - skipTrailingOperands));
5344
5345 Value *OutputShadowPtr, *OutputOriginPtr;
5346 // AArch64 NEON does not need alignment (unless OS requires it)
5347 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5348 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5349 ShadowArgs.append(1, OutputShadowPtr);
5350
5351 CallInst *CI =
5352 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5353 setShadow(&I, CI);
5354
5355 if (MS.TrackOrigins) {
5356 // TODO: if we modelled the vst* instruction more precisely, we could
5357 // more accurately track the origins (e.g., if both inputs are
5358 // uninitialized for vst2, we currently blame the second input, even
5359 // though part of the output depends only on the first input).
5360 //
5361 // This is particularly imprecise for vst{2,3,4}lane, since only one
5362 // lane of each input is actually copied to the output.
5363 OriginCombiner OC(this, IRB);
5364 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5365 OC.Add(I.getArgOperand(i));
5366
5367 const DataLayout &DL = F.getDataLayout();
5368 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5369 OutputOriginPtr);
5370 }
5371 }
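  // Illustrative example for the handler above (hypothetical), for
  //   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %A, <16 x i8> %B,
  //                                             ptr %P)
  // OutputVectorTy is computed as <32 x i8> (16 elements x 2 inputs), the
  // shadows of %A and %B are passed to the same st2 intrinsic, and the
  // interleaved shadow is written to the shadow of *%P.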
5372
5373 // <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8
5374 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5375 // <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
5376 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5377 // <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8
5378   //                            (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5379 //
5380 // Note:
5381 // - < 4 x *> is a 2x2 matrix
5382 // - <16 x *> is a 2x8 matrix and 8x2 matrix respectively
5383 //
5384 // The general shadow propagation approach is:
5385 // 1) get the shadows of the input matrices %X and %Y
5386 // 2) change the shadow values to 0x1 if the corresponding value is fully
5387 // initialized, and 0x0 otherwise
5388 // 3) perform a matrix multiplication on the shadows of %X and %Y. The output
5389 // will be a 2x2 matrix; for each element, a value of 0x8 means all the
5390 // corresponding inputs were clean.
5391 // 4) blend in the shadow of %R
5392 //
5393 // TODO: consider allowing multiplication of zero with an uninitialized value
5394 // to result in an initialized value.
5395 //
5396 // TODO: handle floating-point matrix multiply using ummla on the shadows:
5397 // case Intrinsic::aarch64_neon_bfmmla:
5398 // handleNEONMatrixMultiply(I, /*ARows=*/ 2, /*ACols=*/ 4,
5399 // /*BRows=*/ 4, /*BCols=*/ 2);
5400 //
5401 void handleNEONMatrixMultiply(IntrinsicInst &I, unsigned int ARows,
5402 unsigned int ACols, unsigned int BRows,
5403 unsigned int BCols) {
5404 IRBuilder<> IRB(&I);
5405
5406 assert(I.arg_size() == 3);
5407 Value *R = I.getArgOperand(0);
5408 Value *A = I.getArgOperand(1);
5409 Value *B = I.getArgOperand(2);
5410
5411 assert(I.getType() == R->getType());
5412
5413 assert(isa<FixedVectorType>(R->getType()));
5414 assert(isa<FixedVectorType>(A->getType()));
5415 assert(isa<FixedVectorType>(B->getType()));
5416
5417 [[maybe_unused]] FixedVectorType *RTy = cast<FixedVectorType>(R->getType());
5418 [[maybe_unused]] FixedVectorType *ATy = cast<FixedVectorType>(A->getType());
5419 [[maybe_unused]] FixedVectorType *BTy = cast<FixedVectorType>(B->getType());
5420
5421 assert(ACols == BRows);
5422 assert(ATy->getNumElements() == ARows * ACols);
5423 assert(BTy->getNumElements() == BRows * BCols);
5424 assert(RTy->getNumElements() == ARows * BCols);
5425
5426 LLVM_DEBUG(dbgs() << "### R: " << *RTy->getElementType() << "\n");
5427 LLVM_DEBUG(dbgs() << "### A: " << *ATy->getElementType() << "\n");
5428 if (RTy->getElementType()->isIntegerTy()) {
5429       // Types are not identical e.g., <4 x i32> %R, <16 x i8> %A
5430       assert(ATy->getElementType()->isIntegerTy());
5431     } else {
5432       assert(RTy->getElementType()->isFloatingPointTy());
5433       assert(ATy->getElementType()->isFloatingPointTy());
5434     }
5435 assert(ATy->getElementType() == BTy->getElementType());
5436
5437 Value *ShadowR = getShadow(&I, 0);
5438 Value *ShadowA = getShadow(&I, 1);
5439 Value *ShadowB = getShadow(&I, 2);
5440
5441 // If the value is fully initialized, the shadow will be 000...001.
5442 // Otherwise, the shadow will be all zero.
5443 // (This is the opposite of how we typically handle shadows.)
5444 ShadowA = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowA, getCleanShadow(A)),
5445 ShadowA->getType());
5446 ShadowB = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowB, getCleanShadow(B)),
5447 ShadowB->getType());
5448
5449 Value *ShadowAB = IRB.CreateIntrinsic(
5450 I.getType(), I.getIntrinsicID(), {getCleanShadow(R), ShadowA, ShadowB});
5451
5452 Value *FullyInit = ConstantVector::getSplat(
5453 RTy->getElementCount(),
5454 ConstantInt::get(cast<VectorType>(getShadowTy(R))->getElementType(),
5455 ACols));
5456
5457 ShadowAB = IRB.CreateSExt(IRB.CreateICmpNE(ShadowAB, FullyInit),
5458 ShadowAB->getType());
5459
5460 ShadowR = IRB.CreateSExt(IRB.CreateICmpNE(ShadowR, getCleanShadow(R)),
5461 ShadowR->getType());
5462
5463 setShadow(&I, IRB.CreateOr(ShadowAB, ShadowR));
5464 setOriginForNaryOp(I);
5465 }
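  // Worked example for the handler above (hypothetical values), for smmla
  // with ARows=2, ACols=8, BCols=2: after the ICmpEQ/ZExt step, each byte of
  // ShadowA/ShadowB is 1 if that input byte is fully initialized and 0
  // otherwise. The shadow matrix multiply then computes, for output element
  // (r, c), the number of clean (row r of A, column c of B) byte pairs. Only
  // if that count equals ACols == 8 were all contributing inputs clean, so
  // the ICmpNE against the splat of ACols flags every other element as
  // uninitialized before blending in the shadow of %R.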
5466
5467 /// Handle intrinsics by applying the intrinsic to the shadows.
5468 ///
5469 /// The trailing arguments are passed verbatim to the intrinsic, though any
5470 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5471 /// intrinsic with one trailing verbatim argument:
5472 /// out = intrinsic(var1, var2, opType)
5473 /// we compute:
5474 /// shadow[out] =
5475 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5476 ///
5477 /// Typically, shadowIntrinsicID will be specified by the caller to be
5478 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5479 /// intrinsic of the same type.
5480 ///
5481 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5482 /// bit-patterns (for example, if the intrinsic accepts floats for
5483 /// var1, we require that it doesn't care if inputs are NaNs).
5484 ///
5485 /// For example, this can be applied to the Arm NEON vector table intrinsics
5486 /// (tbl{1,2,3,4}).
5487 ///
5488 /// The origin is approximated using setOriginForNaryOp.
5489 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5490 Intrinsic::ID shadowIntrinsicID,
5491 unsigned int trailingVerbatimArgs) {
5492 IRBuilder<> IRB(&I);
5493
5494 assert(trailingVerbatimArgs < I.arg_size());
5495
5496 SmallVector<Value *, 8> ShadowArgs;
5497 // Don't use getNumOperands() because it includes the callee
5498 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5499 Value *Shadow = getShadow(&I, i);
5500
5501 // Shadows are integer-ish types but some intrinsics require a
5502 // different (e.g., floating-point) type.
5503 ShadowArgs.push_back(
5504 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5505 }
5506
5507 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5508 i++) {
5509 Value *Arg = I.getArgOperand(i);
5510 ShadowArgs.push_back(Arg);
5511 }
5512
5513 CallInst *CI =
5514 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5515 Value *CombinedShadow = CI;
5516
5517 // Combine the computed shadow with the shadow of trailing args
5518 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5519 i++) {
5520 Value *Shadow =
5521 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5522 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5523 }
5524
5525 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5526
5527 setOriginForNaryOp(I);
5528 }
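  // Illustrative example for the handler above (hypothetical), for the Arm
  // NEON table lookup
  //   %out = call <8 x i8> @llvm.aarch64.neon.tbl1.v8i8(<16 x i8> %t,
  //                                                     <8 x i8> %idx)
  // with trailingVerbatimArgs == 1, the computed shadow is
  //   tbl1(shadow(%t), %idx) | shadow(%idx)
  // i.e. the table's shadow is looked up with the same indices as the data,
  // and uninitialized index bytes additionally taint the result through the
  // final OR.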
5529
5530 // Approximation only
5531 //
5532 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5533 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5534 assert(I.arg_size() == 2);
5535
5536 handleShadowOr(I);
5537 }
5538
5539 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5540 switch (I.getIntrinsicID()) {
5541 case Intrinsic::uadd_with_overflow:
5542 case Intrinsic::sadd_with_overflow:
5543 case Intrinsic::usub_with_overflow:
5544 case Intrinsic::ssub_with_overflow:
5545 case Intrinsic::umul_with_overflow:
5546 case Intrinsic::smul_with_overflow:
5547 handleArithmeticWithOverflow(I);
5548 break;
5549 case Intrinsic::abs:
5550 handleAbsIntrinsic(I);
5551 break;
5552 case Intrinsic::bitreverse:
5553 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5554 /*trailingVerbatimArgs*/ 0);
5555 break;
5556 case Intrinsic::is_fpclass:
5557 handleIsFpClass(I);
5558 break;
5559 case Intrinsic::lifetime_start:
5560 handleLifetimeStart(I);
5561 break;
5562 case Intrinsic::launder_invariant_group:
5563 case Intrinsic::strip_invariant_group:
5564 handleInvariantGroup(I);
5565 break;
5566 case Intrinsic::bswap:
5567 handleBswap(I);
5568 break;
5569 case Intrinsic::ctlz:
5570 case Intrinsic::cttz:
5571 handleCountLeadingTrailingZeros(I);
5572 break;
5573 case Intrinsic::masked_compressstore:
5574 handleMaskedCompressStore(I);
5575 break;
5576 case Intrinsic::masked_expandload:
5577 handleMaskedExpandLoad(I);
5578 break;
5579 case Intrinsic::masked_gather:
5580 handleMaskedGather(I);
5581 break;
5582 case Intrinsic::masked_scatter:
5583 handleMaskedScatter(I);
5584 break;
5585 case Intrinsic::masked_store:
5586 handleMaskedStore(I);
5587 break;
5588 case Intrinsic::masked_load:
5589 handleMaskedLoad(I);
5590 break;
5591 case Intrinsic::vector_reduce_and:
5592 handleVectorReduceAndIntrinsic(I);
5593 break;
5594 case Intrinsic::vector_reduce_or:
5595 handleVectorReduceOrIntrinsic(I);
5596 break;
5597
5598 case Intrinsic::vector_reduce_add:
5599 case Intrinsic::vector_reduce_xor:
5600 case Intrinsic::vector_reduce_mul:
5601 // Signed/Unsigned Min/Max
5602 // TODO: handling similarly to AND/OR may be more precise.
5603 case Intrinsic::vector_reduce_smax:
5604 case Intrinsic::vector_reduce_smin:
5605 case Intrinsic::vector_reduce_umax:
5606 case Intrinsic::vector_reduce_umin:
5607 // TODO: this has no false positives, but arguably we should check that all
5608 // the bits are initialized.
5609 case Intrinsic::vector_reduce_fmax:
5610 case Intrinsic::vector_reduce_fmin:
5611 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5612 break;
5613
5614 case Intrinsic::vector_reduce_fadd:
5615 case Intrinsic::vector_reduce_fmul:
5616 handleVectorReduceWithStarterIntrinsic(I);
5617 break;
5618
5619 case Intrinsic::scmp:
5620 case Intrinsic::ucmp: {
5621 handleShadowOr(I);
5622 break;
5623 }
5624
5625 case Intrinsic::fshl:
5626 case Intrinsic::fshr:
5627 handleFunnelShift(I);
5628 break;
5629
5630 case Intrinsic::is_constant:
5631 // The result of llvm.is.constant() is always defined.
5632 setShadow(&I, getCleanShadow(&I));
5633 setOrigin(&I, getCleanOrigin());
5634 break;
5635
5636 default:
5637 return false;
5638 }
5639
5640 return true;
5641 }
5642
5643 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5644 switch (I.getIntrinsicID()) {
5645 case Intrinsic::x86_sse_stmxcsr:
5646 handleStmxcsr(I);
5647 break;
5648 case Intrinsic::x86_sse_ldmxcsr:
5649 handleLdmxcsr(I);
5650 break;
5651
5652 // Convert Scalar Double Precision Floating-Point Value
5653 // to Unsigned Doubleword Integer
5654 // etc.
5655 case Intrinsic::x86_avx512_vcvtsd2usi64:
5656 case Intrinsic::x86_avx512_vcvtsd2usi32:
5657 case Intrinsic::x86_avx512_vcvtss2usi64:
5658 case Intrinsic::x86_avx512_vcvtss2usi32:
5659 case Intrinsic::x86_avx512_cvttss2usi64:
5660 case Intrinsic::x86_avx512_cvttss2usi:
5661 case Intrinsic::x86_avx512_cvttsd2usi64:
5662 case Intrinsic::x86_avx512_cvttsd2usi:
5663 case Intrinsic::x86_avx512_cvtusi2ss:
5664 case Intrinsic::x86_avx512_cvtusi642sd:
5665 case Intrinsic::x86_avx512_cvtusi642ss:
5666 handleSSEVectorConvertIntrinsic(I, 1, true);
5667 break;
5668 case Intrinsic::x86_sse2_cvtsd2si64:
5669 case Intrinsic::x86_sse2_cvtsd2si:
5670 case Intrinsic::x86_sse2_cvtsd2ss:
5671 case Intrinsic::x86_sse2_cvttsd2si64:
5672 case Intrinsic::x86_sse2_cvttsd2si:
5673 case Intrinsic::x86_sse_cvtss2si64:
5674 case Intrinsic::x86_sse_cvtss2si:
5675 case Intrinsic::x86_sse_cvttss2si64:
5676 case Intrinsic::x86_sse_cvttss2si:
5677 handleSSEVectorConvertIntrinsic(I, 1);
5678 break;
5679 case Intrinsic::x86_sse_cvtps2pi:
5680 case Intrinsic::x86_sse_cvttps2pi:
5681 handleSSEVectorConvertIntrinsic(I, 2);
5682 break;
5683
5684 // TODO:
5685 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5686 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5687 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5688
5689 case Intrinsic::x86_vcvtps2ph_128:
5690 case Intrinsic::x86_vcvtps2ph_256: {
5691 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5692 break;
5693 }
5694
5695 // Convert Packed Single Precision Floating-Point Values
5696 // to Packed Signed Doubleword Integer Values
5697 //
5698 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5699 // (<16 x float>, <16 x i32>, i16, i32)
5700 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5701 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5702 break;
5703
5704 // Convert Packed Double Precision Floating-Point Values
5705 // to Packed Single Precision Floating-Point Values
5706 case Intrinsic::x86_sse2_cvtpd2ps:
5707 case Intrinsic::x86_sse2_cvtps2dq:
5708 case Intrinsic::x86_sse2_cvtpd2dq:
5709 case Intrinsic::x86_sse2_cvttps2dq:
5710 case Intrinsic::x86_sse2_cvttpd2dq:
5711 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5712 case Intrinsic::x86_avx_cvt_ps2dq_256:
5713 case Intrinsic::x86_avx_cvt_pd2dq_256:
5714 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5715 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5716 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5717 break;
5718 }
5719
5720 // Convert Single-Precision FP Value to 16-bit FP Value
5721 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5722 // (<16 x float>, i32, <16 x i16>, i16)
5723 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5724 // (<4 x float>, i32, <8 x i16>, i8)
5725 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5726 // (<8 x float>, i32, <8 x i16>, i8)
5727 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5728 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5729 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5730 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5731 break;
5732
5733 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5734 case Intrinsic::x86_avx512_psll_w_512:
5735 case Intrinsic::x86_avx512_psll_d_512:
5736 case Intrinsic::x86_avx512_psll_q_512:
5737 case Intrinsic::x86_avx512_pslli_w_512:
5738 case Intrinsic::x86_avx512_pslli_d_512:
5739 case Intrinsic::x86_avx512_pslli_q_512:
5740 case Intrinsic::x86_avx512_psrl_w_512:
5741 case Intrinsic::x86_avx512_psrl_d_512:
5742 case Intrinsic::x86_avx512_psrl_q_512:
5743 case Intrinsic::x86_avx512_psra_w_512:
5744 case Intrinsic::x86_avx512_psra_d_512:
5745 case Intrinsic::x86_avx512_psra_q_512:
5746 case Intrinsic::x86_avx512_psrli_w_512:
5747 case Intrinsic::x86_avx512_psrli_d_512:
5748 case Intrinsic::x86_avx512_psrli_q_512:
5749 case Intrinsic::x86_avx512_psrai_w_512:
5750 case Intrinsic::x86_avx512_psrai_d_512:
5751 case Intrinsic::x86_avx512_psrai_q_512:
5752 case Intrinsic::x86_avx512_psra_q_256:
5753 case Intrinsic::x86_avx512_psra_q_128:
5754 case Intrinsic::x86_avx512_psrai_q_256:
5755 case Intrinsic::x86_avx512_psrai_q_128:
5756 case Intrinsic::x86_avx2_psll_w:
5757 case Intrinsic::x86_avx2_psll_d:
5758 case Intrinsic::x86_avx2_psll_q:
5759 case Intrinsic::x86_avx2_pslli_w:
5760 case Intrinsic::x86_avx2_pslli_d:
5761 case Intrinsic::x86_avx2_pslli_q:
5762 case Intrinsic::x86_avx2_psrl_w:
5763 case Intrinsic::x86_avx2_psrl_d:
5764 case Intrinsic::x86_avx2_psrl_q:
5765 case Intrinsic::x86_avx2_psra_w:
5766 case Intrinsic::x86_avx2_psra_d:
5767 case Intrinsic::x86_avx2_psrli_w:
5768 case Intrinsic::x86_avx2_psrli_d:
5769 case Intrinsic::x86_avx2_psrli_q:
5770 case Intrinsic::x86_avx2_psrai_w:
5771 case Intrinsic::x86_avx2_psrai_d:
5772 case Intrinsic::x86_sse2_psll_w:
5773 case Intrinsic::x86_sse2_psll_d:
5774 case Intrinsic::x86_sse2_psll_q:
5775 case Intrinsic::x86_sse2_pslli_w:
5776 case Intrinsic::x86_sse2_pslli_d:
5777 case Intrinsic::x86_sse2_pslli_q:
5778 case Intrinsic::x86_sse2_psrl_w:
5779 case Intrinsic::x86_sse2_psrl_d:
5780 case Intrinsic::x86_sse2_psrl_q:
5781 case Intrinsic::x86_sse2_psra_w:
5782 case Intrinsic::x86_sse2_psra_d:
5783 case Intrinsic::x86_sse2_psrli_w:
5784 case Intrinsic::x86_sse2_psrli_d:
5785 case Intrinsic::x86_sse2_psrli_q:
5786 case Intrinsic::x86_sse2_psrai_w:
5787 case Intrinsic::x86_sse2_psrai_d:
5788 case Intrinsic::x86_mmx_psll_w:
5789 case Intrinsic::x86_mmx_psll_d:
5790 case Intrinsic::x86_mmx_psll_q:
5791 case Intrinsic::x86_mmx_pslli_w:
5792 case Intrinsic::x86_mmx_pslli_d:
5793 case Intrinsic::x86_mmx_pslli_q:
5794 case Intrinsic::x86_mmx_psrl_w:
5795 case Intrinsic::x86_mmx_psrl_d:
5796 case Intrinsic::x86_mmx_psrl_q:
5797 case Intrinsic::x86_mmx_psra_w:
5798 case Intrinsic::x86_mmx_psra_d:
5799 case Intrinsic::x86_mmx_psrli_w:
5800 case Intrinsic::x86_mmx_psrli_d:
5801 case Intrinsic::x86_mmx_psrli_q:
5802 case Intrinsic::x86_mmx_psrai_w:
5803 case Intrinsic::x86_mmx_psrai_d:
5804 handleVectorShiftIntrinsic(I, /* Variable */ false);
5805 break;
5806 case Intrinsic::x86_avx2_psllv_d:
5807 case Intrinsic::x86_avx2_psllv_d_256:
5808 case Intrinsic::x86_avx512_psllv_d_512:
5809 case Intrinsic::x86_avx2_psllv_q:
5810 case Intrinsic::x86_avx2_psllv_q_256:
5811 case Intrinsic::x86_avx512_psllv_q_512:
5812 case Intrinsic::x86_avx2_psrlv_d:
5813 case Intrinsic::x86_avx2_psrlv_d_256:
5814 case Intrinsic::x86_avx512_psrlv_d_512:
5815 case Intrinsic::x86_avx2_psrlv_q:
5816 case Intrinsic::x86_avx2_psrlv_q_256:
5817 case Intrinsic::x86_avx512_psrlv_q_512:
5818 case Intrinsic::x86_avx2_psrav_d:
5819 case Intrinsic::x86_avx2_psrav_d_256:
5820 case Intrinsic::x86_avx512_psrav_d_512:
5821 case Intrinsic::x86_avx512_psrav_q_128:
5822 case Intrinsic::x86_avx512_psrav_q_256:
5823 case Intrinsic::x86_avx512_psrav_q_512:
5824 handleVectorShiftIntrinsic(I, /* Variable */ true);
5825 break;
5826
5827 // Pack with Signed/Unsigned Saturation
5828 case Intrinsic::x86_sse2_packsswb_128:
5829 case Intrinsic::x86_sse2_packssdw_128:
5830 case Intrinsic::x86_sse2_packuswb_128:
5831 case Intrinsic::x86_sse41_packusdw:
5832 case Intrinsic::x86_avx2_packsswb:
5833 case Intrinsic::x86_avx2_packssdw:
5834 case Intrinsic::x86_avx2_packuswb:
5835 case Intrinsic::x86_avx2_packusdw:
5836 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5837 // (<32 x i16> %a, <32 x i16> %b)
5838 // <32 x i16> @llvm.x86.avx512.packssdw.512
5839 // (<16 x i32> %a, <16 x i32> %b)
5840 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5841 case Intrinsic::x86_avx512_packsswb_512:
5842 case Intrinsic::x86_avx512_packssdw_512:
5843 case Intrinsic::x86_avx512_packuswb_512:
5844 case Intrinsic::x86_avx512_packusdw_512:
5845 handleVectorPackIntrinsic(I);
5846 break;
5847
5848 case Intrinsic::x86_sse41_pblendvb:
5849 case Intrinsic::x86_sse41_blendvpd:
5850 case Intrinsic::x86_sse41_blendvps:
5851 case Intrinsic::x86_avx_blendv_pd_256:
5852 case Intrinsic::x86_avx_blendv_ps_256:
5853 case Intrinsic::x86_avx2_pblendvb:
5854 handleBlendvIntrinsic(I);
5855 break;
5856
5857 case Intrinsic::x86_avx_dp_ps_256:
5858 case Intrinsic::x86_sse41_dppd:
5859 case Intrinsic::x86_sse41_dpps:
5860 handleDppIntrinsic(I);
5861 break;
5862
5863 case Intrinsic::x86_mmx_packsswb:
5864 case Intrinsic::x86_mmx_packuswb:
5865 handleVectorPackIntrinsic(I, 16);
5866 break;
5867
5868 case Intrinsic::x86_mmx_packssdw:
5869 handleVectorPackIntrinsic(I, 32);
5870 break;
5871
5872 case Intrinsic::x86_mmx_psad_bw:
5873 handleVectorSadIntrinsic(I, true);
5874 break;
5875 case Intrinsic::x86_sse2_psad_bw:
5876 case Intrinsic::x86_avx2_psad_bw:
5877 handleVectorSadIntrinsic(I);
5878 break;
5879
5880 // Multiply and Add Packed Words
5881 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5882 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5883 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5884 //
5885 // Multiply and Add Packed Signed and Unsigned Bytes
5886 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5887 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5888 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5889 //
5890 // These intrinsics are auto-upgraded into non-masked forms:
5891 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5892 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5893 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5894 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5895 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5896 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5897 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5898 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5899 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5900 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5901 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5902 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5903 case Intrinsic::x86_sse2_pmadd_wd:
5904 case Intrinsic::x86_avx2_pmadd_wd:
5905 case Intrinsic::x86_avx512_pmaddw_d_512:
5906 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5907 case Intrinsic::x86_avx2_pmadd_ub_sw:
5908 case Intrinsic::x86_avx512_pmaddubs_w_512:
5909 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5910 /*ZeroPurifies=*/true);
5911 break;
5912
5913 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5914 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5915 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5916 /*ZeroPurifies=*/true,
5917 /*EltSizeInBits=*/8);
5918 break;
5919
5920 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5921 case Intrinsic::x86_mmx_pmadd_wd:
5922 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
5923 /*ZeroPurifies=*/true,
5924 /*EltSizeInBits=*/16);
5925 break;
5926
5927 // AVX Vector Neural Network Instructions: bytes
5928 //
5929 // Multiply and Add Signed Bytes
5930 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5931 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5932 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5933 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5934 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
5935 // (<16 x i32>, <64 x i8>, <64 x i8>)
5936 //
5937 // Multiply and Add Signed Bytes With Saturation
5938 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
5939 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5940 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
5941 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5942 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
5943 // (<16 x i32>, <64 x i8>, <64 x i8>)
5944 //
5945 // Multiply and Add Signed and Unsigned Bytes
5946 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
5947 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5948 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
5949 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5950 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
5951 // (<16 x i32>, <64 x i8>, <64 x i8>)
5952 //
5953 // Multiply and Add Signed and Unsigned Bytes With Saturation
5954 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
5955 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5956 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
5957 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5958 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
5959 // (<16 x i32>, <64 x i8>, <64 x i8>)
5960 //
5961 // Multiply and Add Unsigned and Signed Bytes
5962 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
5963 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5964 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
5965 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5966 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
5967 // (<16 x i32>, <64 x i8>, <64 x i8>)
5968 //
5969 // Multiply and Add Unsigned and Signed Bytes With Saturation
5970 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
5971 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5972 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
5973 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5974 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
5975 // (<16 x i32>, <64 x i8>, <64 x i8>)
5976 //
5977 // Multiply and Add Unsigned Bytes
5978 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
5979 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5980 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
5981 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5982 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
5983 // (<16 x i32>, <64 x i8>, <64 x i8>)
5984 //
5985 // Multiply and Add Unsigned Bytes With Saturation
5986 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
5987 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5988 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
5989 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5990 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
5991 // (<16 x i32>, <64 x i8>, <64 x i8>)
5992 //
5993 // These intrinsics are auto-upgraded into non-masked forms:
5994 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
5995 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5996 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
5997 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5998 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
5999 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6000 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
6001 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6002 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
6003 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6004 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
6005 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6006 //
6007 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
6008 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6009 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
6010 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6011 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
6012 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6013 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
6014 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6015 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
6016 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6017 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
6018 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6019 case Intrinsic::x86_avx512_vpdpbusd_128:
6020 case Intrinsic::x86_avx512_vpdpbusd_256:
6021 case Intrinsic::x86_avx512_vpdpbusd_512:
6022 case Intrinsic::x86_avx512_vpdpbusds_128:
6023 case Intrinsic::x86_avx512_vpdpbusds_256:
6024 case Intrinsic::x86_avx512_vpdpbusds_512:
6025 case Intrinsic::x86_avx2_vpdpbssd_128:
6026 case Intrinsic::x86_avx2_vpdpbssd_256:
6027 case Intrinsic::x86_avx10_vpdpbssd_512:
6028 case Intrinsic::x86_avx2_vpdpbssds_128:
6029 case Intrinsic::x86_avx2_vpdpbssds_256:
6030 case Intrinsic::x86_avx10_vpdpbssds_512:
6031 case Intrinsic::x86_avx2_vpdpbsud_128:
6032 case Intrinsic::x86_avx2_vpdpbsud_256:
6033 case Intrinsic::x86_avx10_vpdpbsud_512:
6034 case Intrinsic::x86_avx2_vpdpbsuds_128:
6035 case Intrinsic::x86_avx2_vpdpbsuds_256:
6036 case Intrinsic::x86_avx10_vpdpbsuds_512:
6037 case Intrinsic::x86_avx2_vpdpbuud_128:
6038 case Intrinsic::x86_avx2_vpdpbuud_256:
6039 case Intrinsic::x86_avx10_vpdpbuud_512:
6040 case Intrinsic::x86_avx2_vpdpbuuds_128:
6041 case Intrinsic::x86_avx2_vpdpbuuds_256:
6042 case Intrinsic::x86_avx10_vpdpbuuds_512:
6043 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6044 /*ZeroPurifies=*/true);
6045 break;
6046
6047 // AVX Vector Neural Network Instructions: words
6048 //
6049 // Multiply and Add Signed Word Integers
6050 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
6051 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6052 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
6053 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6054 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
6055 // (<16 x i32>, <32 x i16>, <32 x i16>)
6056 //
6057 // Multiply and Add Signed Word Integers With Saturation
6058 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
6059 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6060 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
6061 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6062 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
6063 // (<16 x i32>, <32 x i16>, <32 x i16>)
6064 //
6065 // Multiply and Add Signed and Unsigned Word Integers
6066 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
6067 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6068 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
6069 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6070 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
6071 // (<16 x i32>, <32 x i16>, <32 x i16>)
6072 //
6073 // Multiply and Add Signed and Unsigned Word Integers With Saturation
6074 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
6075 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6076 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
6077 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6078 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
6079 // (<16 x i32>, <32 x i16>, <32 x i16>)
6080 //
6081 // Multiply and Add Unsigned and Signed Word Integers
6082 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
6083 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6084 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
6085 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6086 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
6087 // (<16 x i32>, <32 x i16>, <32 x i16>)
6088 //
6089 // Multiply and Add Unsigned and Signed Word Integers With Saturation
6090 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
6091 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6092 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
6093 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6094 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
6095 // (<16 x i32>, <32 x i16>, <32 x i16>)
6096 //
6097 // Multiply and Add Unsigned and Unsigned Word Integers
6098 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
6099 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6100 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
6101 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6102 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
6103 // (<16 x i32>, <32 x i16>, <32 x i16>)
6104 //
6105 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
6106 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
6107 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6108 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
6109 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6110 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
6111 // (<16 x i32>, <32 x i16>, <32 x i16>)
6112 //
6113 // These intrinsics are auto-upgraded into non-masked forms:
6114 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
6115 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6116 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
6117 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6118 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6119 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6120 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6121 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6122 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6123 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6124 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6125 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6126 //
6127 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6128 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6129 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6130 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6131 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6132 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6133 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6134 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6135 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6136 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6137 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6138 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6139 case Intrinsic::x86_avx512_vpdpwssd_128:
6140 case Intrinsic::x86_avx512_vpdpwssd_256:
6141 case Intrinsic::x86_avx512_vpdpwssd_512:
6142 case Intrinsic::x86_avx512_vpdpwssds_128:
6143 case Intrinsic::x86_avx512_vpdpwssds_256:
6144 case Intrinsic::x86_avx512_vpdpwssds_512:
6145 case Intrinsic::x86_avx2_vpdpwsud_128:
6146 case Intrinsic::x86_avx2_vpdpwsud_256:
6147 case Intrinsic::x86_avx10_vpdpwsud_512:
6148 case Intrinsic::x86_avx2_vpdpwsuds_128:
6149 case Intrinsic::x86_avx2_vpdpwsuds_256:
6150 case Intrinsic::x86_avx10_vpdpwsuds_512:
6151 case Intrinsic::x86_avx2_vpdpwusd_128:
6152 case Intrinsic::x86_avx2_vpdpwusd_256:
6153 case Intrinsic::x86_avx10_vpdpwusd_512:
6154 case Intrinsic::x86_avx2_vpdpwusds_128:
6155 case Intrinsic::x86_avx2_vpdpwusds_256:
6156 case Intrinsic::x86_avx10_vpdpwusds_512:
6157 case Intrinsic::x86_avx2_vpdpwuud_128:
6158 case Intrinsic::x86_avx2_vpdpwuud_256:
6159 case Intrinsic::x86_avx10_vpdpwuud_512:
6160 case Intrinsic::x86_avx2_vpdpwuuds_128:
6161 case Intrinsic::x86_avx2_vpdpwuuds_256:
6162 case Intrinsic::x86_avx10_vpdpwuuds_512:
6163 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6164 /*ZeroPurifies=*/true);
6165 break;
6166
6167 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6168 // Precision
6169 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6170 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6171 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6172 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6173 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6174 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
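//
// ZeroPurifies=false here: unlike the integer dot products above, a known
// zero floating-point multiplicand does not guarantee a known product,
// since the other operand may be NaN or infinite.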
6175 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6176 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6177 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6178 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6179 /*ZeroPurifies=*/false);
6180 break;
6181
6182 case Intrinsic::x86_sse_cmp_ss:
6183 case Intrinsic::x86_sse2_cmp_sd:
6184 case Intrinsic::x86_sse_comieq_ss:
6185 case Intrinsic::x86_sse_comilt_ss:
6186 case Intrinsic::x86_sse_comile_ss:
6187 case Intrinsic::x86_sse_comigt_ss:
6188 case Intrinsic::x86_sse_comige_ss:
6189 case Intrinsic::x86_sse_comineq_ss:
6190 case Intrinsic::x86_sse_ucomieq_ss:
6191 case Intrinsic::x86_sse_ucomilt_ss:
6192 case Intrinsic::x86_sse_ucomile_ss:
6193 case Intrinsic::x86_sse_ucomigt_ss:
6194 case Intrinsic::x86_sse_ucomige_ss:
6195 case Intrinsic::x86_sse_ucomineq_ss:
6196 case Intrinsic::x86_sse2_comieq_sd:
6197 case Intrinsic::x86_sse2_comilt_sd:
6198 case Intrinsic::x86_sse2_comile_sd:
6199 case Intrinsic::x86_sse2_comigt_sd:
6200 case Intrinsic::x86_sse2_comige_sd:
6201 case Intrinsic::x86_sse2_comineq_sd:
6202 case Intrinsic::x86_sse2_ucomieq_sd:
6203 case Intrinsic::x86_sse2_ucomilt_sd:
6204 case Intrinsic::x86_sse2_ucomile_sd:
6205 case Intrinsic::x86_sse2_ucomigt_sd:
6206 case Intrinsic::x86_sse2_ucomige_sd:
6207 case Intrinsic::x86_sse2_ucomineq_sd:
6208 handleVectorCompareScalarIntrinsic(I);
6209 break;
6210
6211 case Intrinsic::x86_avx_cmp_pd_256:
6212 case Intrinsic::x86_avx_cmp_ps_256:
6213 case Intrinsic::x86_sse2_cmp_pd:
6214 case Intrinsic::x86_sse_cmp_ps:
6215 handleVectorComparePackedIntrinsic(I);
6216 break;
6217
6218 case Intrinsic::x86_bmi_bextr_32:
6219 case Intrinsic::x86_bmi_bextr_64:
6220 case Intrinsic::x86_bmi_bzhi_32:
6221 case Intrinsic::x86_bmi_bzhi_64:
6222 case Intrinsic::x86_bmi_pdep_32:
6223 case Intrinsic::x86_bmi_pdep_64:
6224 case Intrinsic::x86_bmi_pext_32:
6225 case Intrinsic::x86_bmi_pext_64:
6226 handleBmiIntrinsic(I);
6227 break;
6228
6229 case Intrinsic::x86_pclmulqdq:
6230 case Intrinsic::x86_pclmulqdq_256:
6231 case Intrinsic::x86_pclmulqdq_512:
6232 handlePclmulIntrinsic(I);
6233 break;
6234
6235 case Intrinsic::x86_avx_round_pd_256:
6236 case Intrinsic::x86_avx_round_ps_256:
6237 case Intrinsic::x86_sse41_round_pd:
6238 case Intrinsic::x86_sse41_round_ps:
6239 handleRoundPdPsIntrinsic(I);
6240 break;
6241
6242 case Intrinsic::x86_sse41_round_sd:
6243 case Intrinsic::x86_sse41_round_ss:
6244 handleUnarySdSsIntrinsic(I);
6245 break;
6246
6247 case Intrinsic::x86_sse2_max_sd:
6248 case Intrinsic::x86_sse_max_ss:
6249 case Intrinsic::x86_sse2_min_sd:
6250 case Intrinsic::x86_sse_min_ss:
6251 handleBinarySdSsIntrinsic(I);
6252 break;
6253
6254 case Intrinsic::x86_avx_vtestc_pd:
6255 case Intrinsic::x86_avx_vtestc_pd_256:
6256 case Intrinsic::x86_avx_vtestc_ps:
6257 case Intrinsic::x86_avx_vtestc_ps_256:
6258 case Intrinsic::x86_avx_vtestnzc_pd:
6259 case Intrinsic::x86_avx_vtestnzc_pd_256:
6260 case Intrinsic::x86_avx_vtestnzc_ps:
6261 case Intrinsic::x86_avx_vtestnzc_ps_256:
6262 case Intrinsic::x86_avx_vtestz_pd:
6263 case Intrinsic::x86_avx_vtestz_pd_256:
6264 case Intrinsic::x86_avx_vtestz_ps:
6265 case Intrinsic::x86_avx_vtestz_ps_256:
6266 case Intrinsic::x86_avx_ptestc_256:
6267 case Intrinsic::x86_avx_ptestnzc_256:
6268 case Intrinsic::x86_avx_ptestz_256:
6269 case Intrinsic::x86_sse41_ptestc:
6270 case Intrinsic::x86_sse41_ptestnzc:
6271 case Intrinsic::x86_sse41_ptestz:
6272 handleVtestIntrinsic(I);
6273 break;
6274
6275 // Packed Horizontal Add/Subtract
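// For the horizontal ops below, the shadow of each output element is the
// OR of the shadows of the two adjacent source elements it combines.
// Shards is the number of independent 128-bit halves (1 for the SSE forms,
// 2 for the 256-bit AVX2 forms), and ReinterpretElemWidth is the element
// width the shadow is reinterpreted to before pairing.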
6276 case Intrinsic::x86_ssse3_phadd_w:
6277 case Intrinsic::x86_ssse3_phadd_w_128:
6278 case Intrinsic::x86_ssse3_phsub_w:
6279 case Intrinsic::x86_ssse3_phsub_w_128:
6280 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6281 /*ReinterpretElemWidth=*/16);
6282 break;
6283
6284 case Intrinsic::x86_avx2_phadd_w:
6285 case Intrinsic::x86_avx2_phsub_w:
6286 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6287 /*ReinterpretElemWidth=*/16);
6288 break;
6289
6290 // Packed Horizontal Add/Subtract
6291 case Intrinsic::x86_ssse3_phadd_d:
6292 case Intrinsic::x86_ssse3_phadd_d_128:
6293 case Intrinsic::x86_ssse3_phsub_d:
6294 case Intrinsic::x86_ssse3_phsub_d_128:
6295 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6296 /*ReinterpretElemWidth=*/32);
6297 break;
6298
6299 case Intrinsic::x86_avx2_phadd_d:
6300 case Intrinsic::x86_avx2_phsub_d:
6301 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6302 /*ReinterpretElemWidth=*/32);
6303 break;
6304
6305 // Packed Horizontal Add/Subtract and Saturate
6306 case Intrinsic::x86_ssse3_phadd_sw:
6307 case Intrinsic::x86_ssse3_phadd_sw_128:
6308 case Intrinsic::x86_ssse3_phsub_sw:
6309 case Intrinsic::x86_ssse3_phsub_sw_128:
6310 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6311 /*ReinterpretElemWidth=*/16);
6312 break;
6313
6314 case Intrinsic::x86_avx2_phadd_sw:
6315 case Intrinsic::x86_avx2_phsub_sw:
6316 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6317 /*ReinterpretElemWidth=*/16);
6318 break;
6319
6320 // Packed Single/Double Precision Floating-Point Horizontal Add
6321 case Intrinsic::x86_sse3_hadd_ps:
6322 case Intrinsic::x86_sse3_hadd_pd:
6323 case Intrinsic::x86_sse3_hsub_ps:
6324 case Intrinsic::x86_sse3_hsub_pd:
6325 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6326 break;
6327
6328 case Intrinsic::x86_avx_hadd_pd_256:
6329 case Intrinsic::x86_avx_hadd_ps_256:
6330 case Intrinsic::x86_avx_hsub_pd_256:
6331 case Intrinsic::x86_avx_hsub_ps_256:
6332 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6333 break;
6334
6335 case Intrinsic::x86_avx_maskstore_ps:
6336 case Intrinsic::x86_avx_maskstore_pd:
6337 case Intrinsic::x86_avx_maskstore_ps_256:
6338 case Intrinsic::x86_avx_maskstore_pd_256:
6339 case Intrinsic::x86_avx2_maskstore_d:
6340 case Intrinsic::x86_avx2_maskstore_q:
6341 case Intrinsic::x86_avx2_maskstore_d_256:
6342 case Intrinsic::x86_avx2_maskstore_q_256: {
6343 handleAVXMaskedStore(I);
6344 break;
6345 }
6346
6347 case Intrinsic::x86_avx_maskload_ps:
6348 case Intrinsic::x86_avx_maskload_pd:
6349 case Intrinsic::x86_avx_maskload_ps_256:
6350 case Intrinsic::x86_avx_maskload_pd_256:
6351 case Intrinsic::x86_avx2_maskload_d:
6352 case Intrinsic::x86_avx2_maskload_q:
6353 case Intrinsic::x86_avx2_maskload_d_256:
6354 case Intrinsic::x86_avx2_maskload_q_256: {
6355 handleAVXMaskedLoad(I);
6356 break;
6357 }
6358
 6359 // Packed Floating-Point Arithmetic and Min/Max
6360 case Intrinsic::x86_avx512fp16_add_ph_512:
6361 case Intrinsic::x86_avx512fp16_sub_ph_512:
6362 case Intrinsic::x86_avx512fp16_mul_ph_512:
6363 case Intrinsic::x86_avx512fp16_div_ph_512:
6364 case Intrinsic::x86_avx512fp16_max_ph_512:
6365 case Intrinsic::x86_avx512fp16_min_ph_512:
6366 case Intrinsic::x86_avx512_min_ps_512:
6367 case Intrinsic::x86_avx512_min_pd_512:
6368 case Intrinsic::x86_avx512_max_ps_512:
6369 case Intrinsic::x86_avx512_max_pd_512: {
6370 // These AVX512 variants contain the rounding mode as a trailing flag.
6371 // Earlier variants do not have a trailing flag and are already handled
6372 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6373 // maybeHandleUnknownIntrinsic.
6374 [[maybe_unused]] bool Success =
6375 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6376 assert(Success);
6377 break;
6378 }
6379
6380 case Intrinsic::x86_avx_vpermilvar_pd:
6381 case Intrinsic::x86_avx_vpermilvar_pd_256:
6382 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6383 case Intrinsic::x86_avx_vpermilvar_ps:
6384 case Intrinsic::x86_avx_vpermilvar_ps_256:
6385 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6386 handleAVXVpermilvar(I);
6387 break;
6388 }
6389
6390 case Intrinsic::x86_avx512_vpermi2var_d_128:
6391 case Intrinsic::x86_avx512_vpermi2var_d_256:
6392 case Intrinsic::x86_avx512_vpermi2var_d_512:
6393 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6394 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6395 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6396 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6397 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6398 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6399 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6400 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6401 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6402 case Intrinsic::x86_avx512_vpermi2var_q_128:
6403 case Intrinsic::x86_avx512_vpermi2var_q_256:
6404 case Intrinsic::x86_avx512_vpermi2var_q_512:
6405 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6406 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6407 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6408 handleAVXVpermi2var(I);
6409 break;
6410
6411 // Packed Shuffle
6412 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6413 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6414 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6415 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6416 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6417 //
6418 // The following intrinsics are auto-upgraded:
6419 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
 6420 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6421 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6422 case Intrinsic::x86_avx2_pshuf_b:
6423 case Intrinsic::x86_sse_pshuf_w:
6424 case Intrinsic::x86_ssse3_pshuf_b_128:
6425 case Intrinsic::x86_ssse3_pshuf_b:
6426 case Intrinsic::x86_avx512_pshuf_b_512:
6427 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6428 /*trailingVerbatimArgs=*/1);
6429 break;
6430
6431 // AVX512 PMOV: Packed MOV, with truncation
6432 // Precisely handled by applying the same intrinsic to the shadow
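// For example, the shadow of mask.pmov.dw.512(src, passthru, mask) is
// computed as mask.pmov.dw.512(shadow(src), shadow(passthru), mask): the
// truncation keeps exactly the shadow of the bits that survive, and the
// mask selects per lane between the truncated shadow and the passthru
// shadow.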
6433 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6434 case Intrinsic::x86_avx512_mask_pmov_db_512:
6435 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6436 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6437 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6438 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6439 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6440 /*trailingVerbatimArgs=*/1);
6441 break;
6442 }
6443
 6444 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6445 // Approximately handled using the corresponding truncation intrinsic
6446 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
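// Truncating the shadow discards the shadow of the high bits that
// saturation would inspect, so poison confined to those bits may be
// missed; this is the approximation referred to above.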
6447 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6448 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6449 handleIntrinsicByApplyingToShadow(I,
6450 Intrinsic::x86_avx512_mask_pmov_dw_512,
6451 /* trailingVerbatimArgs=*/1);
6452 break;
6453 }
6454
6455 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6456 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6457 handleIntrinsicByApplyingToShadow(I,
6458 Intrinsic::x86_avx512_mask_pmov_db_512,
6459 /* trailingVerbatimArgs=*/1);
6460 break;
6461 }
6462
6463 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6464 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6465 handleIntrinsicByApplyingToShadow(I,
6466 Intrinsic::x86_avx512_mask_pmov_qb_512,
6467 /* trailingVerbatimArgs=*/1);
6468 break;
6469 }
6470
6471 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6472 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6473 handleIntrinsicByApplyingToShadow(I,
6474 Intrinsic::x86_avx512_mask_pmov_qw_512,
6475 /* trailingVerbatimArgs=*/1);
6476 break;
6477 }
6478
6479 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6480 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6481 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6482 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6483 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6484 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6485 // slow-path handler.
6486 handleAVX512VectorDownConvert(I);
6487 break;
6488 }
6489
 6490 // AVX512/AVX10 Reciprocal Square Root
6491 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6492 // (<16 x float>, <16 x float>, i16)
6493 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6494 // (<8 x float>, <8 x float>, i8)
6495 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6496 // (<4 x float>, <4 x float>, i8)
6497 //
6498 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6499 // (<8 x double>, <8 x double>, i8)
6500 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6501 // (<4 x double>, <4 x double>, i8)
6502 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6503 // (<2 x double>, <2 x double>, i8)
6504 //
6505 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6506 // (<32 x bfloat>, <32 x bfloat>, i32)
6507 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6508 // (<16 x bfloat>, <16 x bfloat>, i16)
6509 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6510 // (<8 x bfloat>, <8 x bfloat>, i8)
6511 //
6512 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6513 // (<32 x half>, <32 x half>, i32)
6514 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6515 // (<16 x half>, <16 x half>, i16)
6516 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6517 // (<8 x half>, <8 x half>, i8)
6518 //
6519 // TODO: 3-operand variants are not handled:
6520 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6521 // (<2 x double>, <2 x double>, <2 x double>, i8)
6522 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6523 // (<4 x float>, <4 x float>, <4 x float>, i8)
6524 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6525 // (<8 x half>, <8 x half>, <8 x half>, i8)
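//
// Roughly: the generic masked-FP handler below derives the result shadow
// from the shadow of operand AIndex for the lanes selected by the mask,
// and from the WriteThru operand's shadow for the masked-off lanes.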
6526 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6527 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6528 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6529 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6530 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6531 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6532 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6533 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6534 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6535 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6536 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6537 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6538 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6539 /*MaskIndex=*/2);
6540 break;
6541
 6542 // AVX512/AVX10 Reciprocal
6543 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6544 // (<16 x float>, <16 x float>, i16)
6545 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6546 // (<8 x float>, <8 x float>, i8)
6547 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6548 // (<4 x float>, <4 x float>, i8)
6549 //
6550 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6551 // (<8 x double>, <8 x double>, i8)
6552 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6553 // (<4 x double>, <4 x double>, i8)
6554 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6555 // (<2 x double>, <2 x double>, i8)
6556 //
6557 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6558 // (<32 x bfloat>, <32 x bfloat>, i32)
6559 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6560 // (<16 x bfloat>, <16 x bfloat>, i16)
6561 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6562 // (<8 x bfloat>, <8 x bfloat>, i8)
6563 //
6564 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6565 // (<32 x half>, <32 x half>, i32)
6566 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6567 // (<16 x half>, <16 x half>, i16)
6568 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6569 // (<8 x half>, <8 x half>, i8)
6570 //
6571 // TODO: 3-operand variants are not handled:
6572 // <2 x double> @llvm.x86.avx512.rcp14.sd
6573 // (<2 x double>, <2 x double>, <2 x double>, i8)
6574 // <4 x float> @llvm.x86.avx512.rcp14.ss
6575 // (<4 x float>, <4 x float>, <4 x float>, i8)
6576 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6577 // (<8 x half>, <8 x half>, <8 x half>, i8)
6578 case Intrinsic::x86_avx512_rcp14_ps_512:
6579 case Intrinsic::x86_avx512_rcp14_ps_256:
6580 case Intrinsic::x86_avx512_rcp14_ps_128:
6581 case Intrinsic::x86_avx512_rcp14_pd_512:
6582 case Intrinsic::x86_avx512_rcp14_pd_256:
6583 case Intrinsic::x86_avx512_rcp14_pd_128:
6584 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6585 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6586 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6587 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6588 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6589 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6590 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6591 /*MaskIndex=*/2);
6592 break;
6593
6594 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6595 // (<32 x half>, i32, <32 x half>, i32, i32)
6596 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6597 // (<16 x half>, i32, <16 x half>, i32, i16)
6598 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6599 // (<8 x half>, i32, <8 x half>, i32, i8)
6600 //
6601 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6602 // (<16 x float>, i32, <16 x float>, i16, i32)
6603 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6604 // (<8 x float>, i32, <8 x float>, i8)
6605 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6606 // (<4 x float>, i32, <4 x float>, i8)
6607 //
6608 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6609 // (<8 x double>, i32, <8 x double>, i8, i32)
6610 // A Imm WriteThru Mask Rounding
6611 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6612 // (<4 x double>, i32, <4 x double>, i8)
6613 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6614 // (<2 x double>, i32, <2 x double>, i8)
6615 // A Imm WriteThru Mask
6616 //
6617 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6618 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6619 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6620 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6621 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6622 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6623 //
6624 // Not supported: three vectors
6625 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6626 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6627 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6628 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6629 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6630 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6631 // i32)
6632 // A B WriteThru Mask Imm
6633 // Rounding
6634 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6635 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6636 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6637 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6638 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6639 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6640 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6641 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6642 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6643 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6644 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6645 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6646 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6647 /*MaskIndex=*/3);
6648 break;
6649
6650 // AVX512 FP16 Arithmetic
6651 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6652 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6653 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6654 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6655 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6656 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6657 visitGenericScalarHalfwordInst(I);
6658 break;
6659 }
6660
6661 // AVX Galois Field New Instructions
6662 case Intrinsic::x86_vgf2p8affineqb_128:
6663 case Intrinsic::x86_vgf2p8affineqb_256:
6664 case Intrinsic::x86_vgf2p8affineqb_512:
6665 handleAVXGF2P8Affine(I);
6666 break;
6667
6668 default:
6669 return false;
6670 }
6671
6672 return true;
6673 }
6674
6675 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6676 switch (I.getIntrinsicID()) {
6677 case Intrinsic::aarch64_neon_rshrn:
6678 case Intrinsic::aarch64_neon_sqrshl:
6679 case Intrinsic::aarch64_neon_sqrshrn:
6680 case Intrinsic::aarch64_neon_sqrshrun:
6681 case Intrinsic::aarch64_neon_sqshl:
6682 case Intrinsic::aarch64_neon_sqshlu:
6683 case Intrinsic::aarch64_neon_sqshrn:
6684 case Intrinsic::aarch64_neon_sqshrun:
6685 case Intrinsic::aarch64_neon_srshl:
6686 case Intrinsic::aarch64_neon_sshl:
6687 case Intrinsic::aarch64_neon_uqrshl:
6688 case Intrinsic::aarch64_neon_uqrshrn:
6689 case Intrinsic::aarch64_neon_uqshl:
6690 case Intrinsic::aarch64_neon_uqshrn:
6691 case Intrinsic::aarch64_neon_urshl:
6692 case Intrinsic::aarch64_neon_ushl:
6693 // Not handled here: aarch64_neon_vsli (vector shift left and insert)
6694 handleVectorShiftIntrinsic(I, /* Variable */ false);
6695 break;
6696
6697 // TODO: handling max/min similarly to AND/OR may be more precise
6698 // Floating-Point Maximum/Minimum Pairwise
6699 case Intrinsic::aarch64_neon_fmaxp:
6700 case Intrinsic::aarch64_neon_fminp:
6701 // Floating-Point Maximum/Minimum Number Pairwise
6702 case Intrinsic::aarch64_neon_fmaxnmp:
6703 case Intrinsic::aarch64_neon_fminnmp:
6704 // Signed/Unsigned Maximum/Minimum Pairwise
6705 case Intrinsic::aarch64_neon_smaxp:
6706 case Intrinsic::aarch64_neon_sminp:
6707 case Intrinsic::aarch64_neon_umaxp:
6708 case Intrinsic::aarch64_neon_uminp:
6709 // Add Pairwise
6710 case Intrinsic::aarch64_neon_addp:
6711 // Floating-point Add Pairwise
6712 case Intrinsic::aarch64_neon_faddp:
6713 // Add Long Pairwise
6714 case Intrinsic::aarch64_neon_saddlp:
6715 case Intrinsic::aarch64_neon_uaddlp: {
6716 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6717 break;
6718 }
6719
6720 // Floating-point Convert to integer, rounding to nearest with ties to Away
6721 case Intrinsic::aarch64_neon_fcvtas:
6722 case Intrinsic::aarch64_neon_fcvtau:
6723 // Floating-point convert to integer, rounding toward minus infinity
6724 case Intrinsic::aarch64_neon_fcvtms:
6725 case Intrinsic::aarch64_neon_fcvtmu:
6726 // Floating-point convert to integer, rounding to nearest with ties to even
6727 case Intrinsic::aarch64_neon_fcvtns:
6728 case Intrinsic::aarch64_neon_fcvtnu:
6729 // Floating-point convert to integer, rounding toward plus infinity
6730 case Intrinsic::aarch64_neon_fcvtps:
6731 case Intrinsic::aarch64_neon_fcvtpu:
6732 // Floating-point Convert to integer, rounding toward Zero
6733 case Intrinsic::aarch64_neon_fcvtzs:
6734 case Intrinsic::aarch64_neon_fcvtzu:
6735 // Floating-point convert to lower precision narrow, rounding to odd
6736 case Intrinsic::aarch64_neon_fcvtxn:
6737 // Vector Conversions Between Half-Precision and Single-Precision
6738 case Intrinsic::aarch64_neon_vcvthf2fp:
6739 case Intrinsic::aarch64_neon_vcvtfp2hf:
6740 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/false);
6741 break;
6742
6743 // Vector Conversions Between Fixed-Point and Floating-Point
6744 case Intrinsic::aarch64_neon_vcvtfxs2fp:
6745 case Intrinsic::aarch64_neon_vcvtfp2fxs:
6746 case Intrinsic::aarch64_neon_vcvtfxu2fp:
6747 case Intrinsic::aarch64_neon_vcvtfp2fxu:
6748 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/true);
6749 break;
6750
6751 // TODO: bfloat conversions
6752 // - bfloat @llvm.aarch64.neon.bfcvt(float)
6753 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn(<4 x float>)
6754 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn2(<8 x bfloat>, <4 x float>)
6755
6756 // Add reduction to scalar
6757 case Intrinsic::aarch64_neon_faddv:
6758 case Intrinsic::aarch64_neon_saddv:
6759 case Intrinsic::aarch64_neon_uaddv:
6760 // Signed/Unsigned min/max (Vector)
6761 // TODO: handling similarly to AND/OR may be more precise.
6762 case Intrinsic::aarch64_neon_smaxv:
6763 case Intrinsic::aarch64_neon_sminv:
6764 case Intrinsic::aarch64_neon_umaxv:
6765 case Intrinsic::aarch64_neon_uminv:
6766 // Floating-point min/max (vector)
6767 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6768 // but our shadow propagation is the same.
6769 case Intrinsic::aarch64_neon_fmaxv:
6770 case Intrinsic::aarch64_neon_fminv:
6771 case Intrinsic::aarch64_neon_fmaxnmv:
6772 case Intrinsic::aarch64_neon_fminnmv:
6773 // Sum long across vector
6774 case Intrinsic::aarch64_neon_saddlv:
6775 case Intrinsic::aarch64_neon_uaddlv:
6776 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6777 break;
6778
6779 case Intrinsic::aarch64_neon_ld1x2:
6780 case Intrinsic::aarch64_neon_ld1x3:
6781 case Intrinsic::aarch64_neon_ld1x4:
6782 case Intrinsic::aarch64_neon_ld2:
6783 case Intrinsic::aarch64_neon_ld3:
6784 case Intrinsic::aarch64_neon_ld4:
6785 case Intrinsic::aarch64_neon_ld2r:
6786 case Intrinsic::aarch64_neon_ld3r:
6787 case Intrinsic::aarch64_neon_ld4r: {
6788 handleNEONVectorLoad(I, /*WithLane=*/false);
6789 break;
6790 }
6791
6792 case Intrinsic::aarch64_neon_ld2lane:
6793 case Intrinsic::aarch64_neon_ld3lane:
6794 case Intrinsic::aarch64_neon_ld4lane: {
6795 handleNEONVectorLoad(I, /*WithLane=*/true);
6796 break;
6797 }
6798
6799 // Saturating extract narrow
6800 case Intrinsic::aarch64_neon_sqxtn:
6801 case Intrinsic::aarch64_neon_sqxtun:
6802 case Intrinsic::aarch64_neon_uqxtn:
6803 // These only have one argument, but we (ab)use handleShadowOr because it
6804 // does work on single argument intrinsics and will typecast the shadow
6805 // (and update the origin).
6806 handleShadowOr(I);
6807 break;
6808
6809 case Intrinsic::aarch64_neon_st1x2:
6810 case Intrinsic::aarch64_neon_st1x3:
6811 case Intrinsic::aarch64_neon_st1x4:
6812 case Intrinsic::aarch64_neon_st2:
6813 case Intrinsic::aarch64_neon_st3:
6814 case Intrinsic::aarch64_neon_st4: {
6815 handleNEONVectorStoreIntrinsic(I, false);
6816 break;
6817 }
6818
6819 case Intrinsic::aarch64_neon_st2lane:
6820 case Intrinsic::aarch64_neon_st3lane:
6821 case Intrinsic::aarch64_neon_st4lane: {
6822 handleNEONVectorStoreIntrinsic(I, true);
6823 break;
6824 }
6825
6826 // Arm NEON vector table intrinsics have the source/table register(s) as
6827 // arguments, followed by the index register. They return the output.
6828 //
6829 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6830 // original value unchanged in the destination register.'
6831 // Conveniently, zero denotes a clean shadow, which means out-of-range
6832 // indices for TBL will initialize the user data with zero and also clean
6833 // the shadow. (For TBX, neither the user data nor the shadow will be
6834 // updated, which is also correct.)
6835 case Intrinsic::aarch64_neon_tbl1:
6836 case Intrinsic::aarch64_neon_tbl2:
6837 case Intrinsic::aarch64_neon_tbl3:
6838 case Intrinsic::aarch64_neon_tbl4:
6839 case Intrinsic::aarch64_neon_tbx1:
6840 case Intrinsic::aarch64_neon_tbx2:
6841 case Intrinsic::aarch64_neon_tbx3:
6842 case Intrinsic::aarch64_neon_tbx4: {
6843 // The last trailing argument (index register) should be handled verbatim
6844 handleIntrinsicByApplyingToShadow(
6845 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6846 /*trailingVerbatimArgs*/ 1);
6847 break;
6848 }
6849
6850 case Intrinsic::aarch64_neon_fmulx:
6851 case Intrinsic::aarch64_neon_pmul:
6852 case Intrinsic::aarch64_neon_pmull:
6853 case Intrinsic::aarch64_neon_smull:
6854 case Intrinsic::aarch64_neon_pmull64:
6855 case Intrinsic::aarch64_neon_umull: {
6856 handleNEONVectorMultiplyIntrinsic(I);
6857 break;
6858 }
6859
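// The *MMLA instructions compute a 2x2 matrix of i32 accumulators,
//   C(2x2) += A(2x8, i8) x B(8x2, i8)
// per 128-bit vector, which is where the row/column arguments below come
// from.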
6860 case Intrinsic::aarch64_neon_smmla:
6861 case Intrinsic::aarch64_neon_ummla:
6862 case Intrinsic::aarch64_neon_usmmla:
6863 handleNEONMatrixMultiply(I, /*ARows=*/2, /*ACols=*/8, /*BRows=*/8,
6864 /*BCols=*/2);
6865 break;
6866
6867 // <2 x i32> @llvm.aarch64.neon.[us]dot.v2i32.v8i8
6868 // (<2 x i32> %acc, <8 x i8> %a, <8 x i8> %b)
6869 // <4 x i32> @llvm.aarch64.neon.[us]dot.v4i32.v16i8
6870 // (<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b)
6871 case Intrinsic::aarch64_neon_sdot:
6872 case Intrinsic::aarch64_neon_udot:
6873 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6874 /*ZeroPurifies=*/true);
6875 break;
6876
6877 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
6878 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
6879 // <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16
6880 // (<4 x float> %acc, <8 x bfloat> %a, <8 x bfloat> %b)
6881 case Intrinsic::aarch64_neon_bfdot:
6882 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6883 /*ZeroPurifies=*/false);
6884 break;
6885
6886 default:
6887 return false;
6888 }
6889
6890 return true;
6891 }
6892
6893 void visitIntrinsicInst(IntrinsicInst &I) {
6894 if (maybeHandleCrossPlatformIntrinsic(I))
6895 return;
6896
6897 if (maybeHandleX86SIMDIntrinsic(I))
6898 return;
6899
6900 if (maybeHandleArmSIMDIntrinsic(I))
6901 return;
6902
6903 if (maybeHandleUnknownIntrinsic(I))
6904 return;
6905
6906 visitInstruction(I);
6907 }
6908
6909 void visitLibAtomicLoad(CallBase &CB) {
6910 // Since we use getNextNode here, we can't have CB terminate the BB.
6911 assert(isa<CallInst>(CB));
6912
6913 IRBuilder<> IRB(&CB);
6914 Value *Size = CB.getArgOperand(0);
6915 Value *SrcPtr = CB.getArgOperand(1);
6916 Value *DstPtr = CB.getArgOperand(2);
6917 Value *Ordering = CB.getArgOperand(3);
6918 // Convert the call to have at least Acquire ordering to make sure
6919 // the shadow operations aren't reordered before it.
6920 Value *NewOrdering =
6921 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
6922 CB.setArgOperand(3, NewOrdering);
6923
6924 NextNodeIRBuilder NextIRB(&CB);
6925 Value *SrcShadowPtr, *SrcOriginPtr;
6926 std::tie(SrcShadowPtr, SrcOriginPtr) =
6927 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6928 /*isStore*/ false);
6929 Value *DstShadowPtr =
6930 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6931 /*isStore*/ true)
6932 .first;
6933
6934 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
6935 if (MS.TrackOrigins) {
 6936 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
 6937 kMinOriginAlignment);
6938 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
6939 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
6940 }
6941 }
6942
6943 void visitLibAtomicStore(CallBase &CB) {
6944 IRBuilder<> IRB(&CB);
6945 Value *Size = CB.getArgOperand(0);
6946 Value *DstPtr = CB.getArgOperand(2);
6947 Value *Ordering = CB.getArgOperand(3);
6948 // Convert the call to have at least Release ordering to make sure
6949 // the shadow operations aren't reordered after it.
6950 Value *NewOrdering =
6951 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
6952 CB.setArgOperand(3, NewOrdering);
6953
6954 Value *DstShadowPtr =
6955 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
6956 /*isStore*/ true)
6957 .first;
6958
6959 // Atomic store always paints clean shadow/origin. See file header.
6960 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
6961 Align(1));
6962 }
6963
6964 void visitCallBase(CallBase &CB) {
6965 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
6966 if (CB.isInlineAsm()) {
6967 // For inline asm (either a call to asm function, or callbr instruction),
6968 // do the usual thing: check argument shadow and mark all outputs as
6969 // clean. Note that any side effects of the inline asm that are not
6970 // immediately visible in its constraints are not handled.
 6971 if (ClHandleAsmConservative)
 6972 visitAsmInstruction(CB);
6973 else
6974 visitInstruction(CB);
6975 return;
6976 }
6977 LibFunc LF;
6978 if (TLI->getLibFunc(CB, LF)) {
6979 // libatomic.a functions need to have special handling because there isn't
6980 // a good way to intercept them or compile the library with
6981 // instrumentation.
6982 switch (LF) {
6983 case LibFunc_atomic_load:
6984 if (!isa<CallInst>(CB)) {
6985 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load."
 6986 " Ignoring!\n";
6987 break;
6988 }
6989 visitLibAtomicLoad(CB);
6990 return;
6991 case LibFunc_atomic_store:
6992 visitLibAtomicStore(CB);
6993 return;
6994 default:
6995 break;
6996 }
6997 }
6998
6999 if (auto *Call = dyn_cast<CallInst>(&CB)) {
7000 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
7001
7002 // We are going to insert code that relies on the fact that the callee
7003 // will become a non-readonly function after it is instrumented by us. To
7004 // prevent this code from being optimized out, mark that function
7005 // non-readonly in advance.
7006 // TODO: We can likely do better than dropping memory() completely here.
7007 AttributeMask B;
7008 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
7009
 7010 Call->removeFnAttrs(B);
 7011 if (Function *Func = Call->getCalledFunction()) {
7012 Func->removeFnAttrs(B);
7013 }
7014
 7015 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
 7016 }
7017 IRBuilder<> IRB(&CB);
7018 bool MayCheckCall = MS.EagerChecks;
7019 if (Function *Func = CB.getCalledFunction()) {
7020 // __sanitizer_unaligned_{load,store} functions may be called by users
7021 // and always expects shadows in the TLS. So don't check them.
7022 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
7023 }
7024
7025 unsigned ArgOffset = 0;
7026 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
7027 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
7028 if (!A->getType()->isSized()) {
7029 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
7030 continue;
7031 }
7032
7033 if (A->getType()->isScalableTy()) {
7034 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
7035 // Handle as noundef, but don't reserve tls slots.
7036 insertCheckShadowOf(A, &CB);
7037 continue;
7038 }
7039
7040 unsigned Size = 0;
7041 const DataLayout &DL = F.getDataLayout();
7042
7043 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
7044 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
7045 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
7046
7047 if (EagerCheck) {
7048 insertCheckShadowOf(A, &CB);
7049 Size = DL.getTypeAllocSize(A->getType());
7050 } else {
7051 [[maybe_unused]] Value *Store = nullptr;
7052 // Compute the Shadow for arg even if it is ByVal, because
7053 // in that case getShadow() will copy the actual arg shadow to
7054 // __msan_param_tls.
7055 Value *ArgShadow = getShadow(A);
7056 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
7057 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
7058 << " Shadow: " << *ArgShadow << "\n");
7059 if (ByVal) {
7060 // ByVal requires some special handling as it's too big for a single
7061 // load
7062 assert(A->getType()->isPointerTy() &&
7063 "ByVal argument is not a pointer!");
7064 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
7065 if (ArgOffset + Size > kParamTLSSize)
7066 break;
7067 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
7068 MaybeAlign Alignment = std::nullopt;
7069 if (ParamAlignment)
7070 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
7071 Value *AShadowPtr, *AOriginPtr;
7072 std::tie(AShadowPtr, AOriginPtr) =
7073 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
7074 /*isStore*/ false);
7075 if (!PropagateShadow) {
 7076 Store = IRB.CreateMemSet(ArgShadowBase,
 7077 Constant::getNullValue(IRB.getInt8Ty()),
7078 Size, Alignment);
7079 } else {
7080 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
7081 Alignment, Size);
7082 if (MS.TrackOrigins) {
7083 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
7084 // FIXME: OriginSize should be:
7085 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
7086 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
7087 IRB.CreateMemCpy(
7088 ArgOriginBase,
7089 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
7090 AOriginPtr,
7091 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
7092 }
7093 }
7094 } else {
7095 // Any other parameters mean we need bit-grained tracking of uninit
7096 // data
7097 Size = DL.getTypeAllocSize(A->getType());
7098 if (ArgOffset + Size > kParamTLSSize)
7099 break;
 7100 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
 7101 kShadowTLSAlignment);
7102 Constant *Cst = dyn_cast<Constant>(ArgShadow);
7103 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
7104 IRB.CreateStore(getOrigin(A),
7105 getOriginPtrForArgument(IRB, ArgOffset));
7106 }
7107 }
7108 assert(Store != nullptr);
7109 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
7110 }
7111 assert(Size != 0);
7112 ArgOffset += alignTo(Size, kShadowTLSAlignment);
7113 }
7114 LLVM_DEBUG(dbgs() << " done with call args\n");
7115
7116 FunctionType *FT = CB.getFunctionType();
7117 if (FT->isVarArg()) {
7118 VAHelper->visitCallBase(CB, IRB);
7119 }
7120
7121 // Now, get the shadow for the RetVal.
7122 if (!CB.getType()->isSized())
7123 return;
7124 // Don't emit the epilogue for musttail call returns.
7125 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
7126 return;
7127
7128 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
7129 setShadow(&CB, getCleanShadow(&CB));
7130 setOrigin(&CB, getCleanOrigin());
7131 return;
7132 }
7133
7134 IRBuilder<> IRBBefore(&CB);
7135 // Until we have full dynamic coverage, make sure the retval shadow is 0.
7136 Value *Base = getShadowPtrForRetval(IRBBefore);
 7137 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
 7138 kShadowTLSAlignment);
7139 BasicBlock::iterator NextInsn;
7140 if (isa<CallInst>(CB)) {
7141 NextInsn = ++CB.getIterator();
7142 assert(NextInsn != CB.getParent()->end());
7143 } else {
7144 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
7145 if (!NormalDest->getSinglePredecessor()) {
7146 // FIXME: this case is tricky, so we are just conservative here.
7147 // Perhaps we need to split the edge between this BB and NormalDest,
7148 // but a naive attempt to use SplitEdge leads to a crash.
7149 setShadow(&CB, getCleanShadow(&CB));
7150 setOrigin(&CB, getCleanOrigin());
7151 return;
7152 }
7153 // FIXME: NextInsn is likely in a basic block that has not been visited
7154 // yet. Anything inserted there will be instrumented by MSan later!
7155 NextInsn = NormalDest->getFirstInsertionPt();
7156 assert(NextInsn != NormalDest->end() &&
7157 "Could not find insertion point for retval shadow load");
7158 }
7159 IRBuilder<> IRBAfter(&*NextInsn);
7160 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7161 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
7162 "_msret");
7163 setShadow(&CB, RetvalShadow);
7164 if (MS.TrackOrigins)
7165 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
7166 }
7167
7168 bool isAMustTailRetVal(Value *RetVal) {
7169 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
7170 RetVal = I->getOperand(0);
7171 }
7172 if (auto *I = dyn_cast<CallInst>(RetVal)) {
7173 return I->isMustTailCall();
7174 }
7175 return false;
7176 }
7177
7178 void visitReturnInst(ReturnInst &I) {
7179 IRBuilder<> IRB(&I);
7180 Value *RetVal = I.getReturnValue();
7181 if (!RetVal)
7182 return;
7183 // Don't emit the epilogue for musttail call returns.
7184 if (isAMustTailRetVal(RetVal))
7185 return;
7186 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7187 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
7188 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7189 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7190 // must always return fully initialized values. For now, we hardcode "main".
7191 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7192
7193 Value *Shadow = getShadow(RetVal);
7194 bool StoreOrigin = true;
7195 if (EagerCheck) {
7196 insertCheckShadowOf(RetVal, &I);
7197 Shadow = getCleanShadow(RetVal);
7198 StoreOrigin = false;
7199 }
7200
7201 // The caller may still expect information passed over TLS if we pass our
7202 // check
7203 if (StoreShadow) {
7204 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
7205 if (MS.TrackOrigins && StoreOrigin)
7206 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
7207 }
7208 }
7209
7210 void visitPHINode(PHINode &I) {
7211 IRBuilder<> IRB(&I);
7212 if (!PropagateShadow) {
7213 setShadow(&I, getCleanShadow(&I));
7214 setOrigin(&I, getCleanOrigin());
7215 return;
7216 }
7217
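// Create placeholder shadow/origin PHIs now; their incoming values are
// filled in only after every basic block has been visited.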
7218 ShadowPHINodes.push_back(&I);
7219 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
7220 "_msphi_s"));
7221 if (MS.TrackOrigins)
7222 setOrigin(
7223 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
7224 }
7225
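/// Create a per-variable id for origin tracking: a fresh private global
/// whose address serves as a stable, unique token passed to the
/// MsanSetAllocaOrigin* runtime calls.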
7226 Value *getLocalVarIdptr(AllocaInst &I) {
7227 ConstantInt *IntConst =
7228 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
7229 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7230 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7231 IntConst);
7232 }
7233
7234 Value *getLocalVarDescription(AllocaInst &I) {
7235 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
7236 }
7237
7238 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7239 if (PoisonStack && ClPoisonStackWithCall) {
7240 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
7241 } else {
7242 Value *ShadowBase, *OriginBase;
7243 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
7244 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
7245
7246 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
7247 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
7248 }
7249
7250 if (PoisonStack && MS.TrackOrigins) {
7251 Value *Idptr = getLocalVarIdptr(I);
7252 if (ClPrintStackNames) {
7253 Value *Descr = getLocalVarDescription(I);
7254 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
7255 {&I, Len, Idptr, Descr});
7256 } else {
7257 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
7258 }
7259 }
7260 }
7261
7262 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7263 Value *Descr = getLocalVarDescription(I);
7264 if (PoisonStack) {
7265 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
7266 } else {
7267 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
7268 }
7269 }
7270
7271 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7272 if (!InsPoint)
7273 InsPoint = &I;
7274 NextNodeIRBuilder IRB(InsPoint);
7275 const DataLayout &DL = F.getDataLayout();
7276 TypeSize TS = DL.getTypeAllocSize(I.getAllocatedType());
7277 Value *Len = IRB.CreateTypeSize(MS.IntptrTy, TS);
7278 if (I.isArrayAllocation())
7279 Len = IRB.CreateMul(Len,
7280 IRB.CreateZExtOrTrunc(I.getArraySize(), MS.IntptrTy));
7281
7282 if (MS.CompileKernel)
7283 poisonAllocaKmsan(I, IRB, Len);
7284 else
7285 poisonAllocaUserspace(I, IRB, Len);
7286 }
7287
7288 void visitAllocaInst(AllocaInst &I) {
7289 setShadow(&I, getCleanShadow(&I));
7290 setOrigin(&I, getCleanOrigin());
7291 // We'll get to this alloca later unless it's poisoned at the corresponding
7292 // llvm.lifetime.start.
7293 AllocaSet.insert(&I);
7294 }
7295
7296 void visitSelectInst(SelectInst &I) {
7297 // a = select b, c, d
7298 Value *B = I.getCondition();
7299 Value *C = I.getTrueValue();
7300 Value *D = I.getFalseValue();
7301
7302 handleSelectLikeInst(I, B, C, D);
7303 }
7304
7305 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7306 IRBuilder<> IRB(&I);
7307
7308 Value *Sb = getShadow(B);
7309 Value *Sc = getShadow(C);
7310 Value *Sd = getShadow(D);
7311
7312 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
7313 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
7314 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
7315
7316 // Result shadow if condition shadow is 0.
7317 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
7318 Value *Sa1;
7319 if (I.getType()->isAggregateType()) {
7320 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7321 // an extra "select". This results in much more compact IR.
7322 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7323 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7324 } else if (isScalableNonVectorType(I.getType())) {
7325 // This is intended to handle target("aarch64.svcount"), which can't be
7326 // handled in the else branch because of incompatibility with CreateXor
7327 // ("The supported LLVM operations on this type are limited to load,
7328 // store, phi, select and alloca instructions").
7329
7330 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7331 // branch as needed instead.
7332 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7333 } else {
7334 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7335 // If Sb (condition is poisoned), look for bits in c and d that are equal
7336 // and both unpoisoned.
7337 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
7338
7339 // Cast arguments to shadow-compatible type.
7340 C = CreateAppToShadowCast(IRB, C);
7341 D = CreateAppToShadowCast(IRB, D);
7342
7343 // Result shadow if condition shadow is 1.
7344 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7345 }
7346 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7347 setShadow(&I, Sa);
7348 if (MS.TrackOrigins) {
7349 // Origins are always i32, so any vector conditions must be flattened.
7350 // FIXME: consider tracking vector origins for app vectors?
7351 if (B->getType()->isVectorTy()) {
7352 B = convertToBool(B, IRB);
7353 Sb = convertToBool(Sb, IRB);
7354 }
7355 // a = select b, c, d
7356 // Oa = Sb ? Ob : (b ? Oc : Od)
7357 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7358 }
7359 }
7360
7361 void visitLandingPadInst(LandingPadInst &I) {
7362 // Do nothing.
7363 // See https://github.com/google/sanitizers/issues/504
7364 setShadow(&I, getCleanShadow(&I));
7365 setOrigin(&I, getCleanOrigin());
7366 }
7367
7368 void visitCatchSwitchInst(CatchSwitchInst &I) {
7369 setShadow(&I, getCleanShadow(&I));
7370 setOrigin(&I, getCleanOrigin());
7371 }
7372
7373 void visitFuncletPadInst(FuncletPadInst &I) {
7374 setShadow(&I, getCleanShadow(&I));
7375 setOrigin(&I, getCleanOrigin());
7376 }
7377
7378 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7379
7380 void visitExtractValueInst(ExtractValueInst &I) {
7381 IRBuilder<> IRB(&I);
7382 Value *Agg = I.getAggregateOperand();
7383 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7384 Value *AggShadow = getShadow(Agg);
7385 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7386 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7387 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7388 setShadow(&I, ResShadow);
7389 setOriginForNaryOp(I);
7390 }
7391
7392 void visitInsertValueInst(InsertValueInst &I) {
7393 IRBuilder<> IRB(&I);
7394 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7395 Value *AggShadow = getShadow(I.getAggregateOperand());
7396 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7397 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7398 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7399 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7400 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7401 setShadow(&I, Res);
7402 setOriginForNaryOp(I);
7403 }
7404
7405 void dumpInst(Instruction &I) {
7406 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7407 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7408 } else {
7409 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7410 }
7411 errs() << "QQQ " << I << "\n";
7412 }
7413
7414 void visitResumeInst(ResumeInst &I) {
7415 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7416 // Nothing to do here.
7417 }
7418
7419 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7420 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7421 // Nothing to do here.
7422 }
7423
7424 void visitCatchReturnInst(CatchReturnInst &CRI) {
7425 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7426 // Nothing to do here.
7427 }
7428
7429 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7430 IRBuilder<> &IRB, const DataLayout &DL,
7431 bool isOutput) {
7432 // For each assembly argument, we check its value for being initialized.
7433 // If the argument is a pointer, we assume it points to a single element
 7434 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7435 // Each such pointer is instrumented with a call to the runtime library.
7436 Type *OpType = Operand->getType();
7437 // Check the operand value itself.
7438 insertCheckShadowOf(Operand, &I);
7439 if (!OpType->isPointerTy() || !isOutput) {
7440 assert(!isOutput);
7441 return;
7442 }
7443 if (!ElemTy->isSized())
7444 return;
7445 auto Size = DL.getTypeStoreSize(ElemTy);
7446 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7447 if (MS.CompileKernel) {
7448 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7449 } else {
7450 // ElemTy, derived from elementtype(), does not encode the alignment of
7451 // the pointer. Conservatively assume that the shadow memory is unaligned.
7452 // When Size is large, avoid StoreInst as it would expand to many
7453 // instructions.
7454 auto [ShadowPtr, _] =
7455 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7456 if (Size <= 32)
7457 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7458 else
7459 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7460 SizeVal, Align(1));
7461 }
7462 }
7463
7464 /// Get the number of output arguments returned by pointers.
7465 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7466 int NumRetOutputs = 0;
7467 int NumOutputs = 0;
7468 Type *RetTy = cast<Value>(CB)->getType();
7469 if (!RetTy->isVoidTy()) {
7470 // Register outputs are returned via the CallInst return value.
7471 auto *ST = dyn_cast<StructType>(RetTy);
7472 if (ST)
7473 NumRetOutputs = ST->getNumElements();
7474 else
7475 NumRetOutputs = 1;
7476 }
7477 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7478 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
 7479 switch (Info.Type) {
 7480 case InlineAsm::isOutput:
7481 NumOutputs++;
7482 break;
7483 default:
7484 break;
7485 }
7486 }
7487 return NumOutputs - NumRetOutputs;
7488 }
7489
7490 void visitAsmInstruction(Instruction &I) {
7491 // Conservative inline assembly handling: check for poisoned shadow of
7492 // asm() arguments, then unpoison the result and all the memory locations
7493 // pointed to by those arguments.
7494 // An inline asm() statement in C++ contains lists of input and output
7495 // arguments used by the assembly code. These are mapped to operands of the
7496 // CallInst as follows:
 7497 // - nR register outputs ("=r") are returned by value in a single structure
7498 // (SSA value of the CallInst);
7499 // - nO other outputs ("=m" and others) are returned by pointer as first
7500 // nO operands of the CallInst;
7501 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7502 // remaining nI operands.
7503 // The total number of asm() arguments in the source is nR+nO+nI, and the
7504 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7505 // function to be called).
7506 const DataLayout &DL = F.getDataLayout();
7507 CallBase *CB = cast<CallBase>(&I);
7508 IRBuilder<> IRB(&I);
7509 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7510 int OutputArgs = getNumOutputArgs(IA, CB);
7511 // The last operand of a CallInst is the function itself.
7512 int NumOperands = CB->getNumOperands() - 1;
7513
7514 // Check input arguments. Doing so before unpoisoning output arguments, so
7515 // that we won't overwrite uninit values before checking them.
7516 for (int i = OutputArgs; i < NumOperands; i++) {
7517 Value *Operand = CB->getOperand(i);
7518 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7519 /*isOutput*/ false);
7520 }
7521 // Unpoison output arguments. This must happen before the actual InlineAsm
7522 // call, so that the shadow for memory published in the asm() statement
7523 // remains valid.
7524 for (int i = 0; i < OutputArgs; i++) {
7525 Value *Operand = CB->getOperand(i);
7526 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7527 /*isOutput*/ true);
7528 }
7529
7530 setShadow(&I, getCleanShadow(&I));
7531 setOrigin(&I, getCleanOrigin());
7532 }
7533
7534 void visitFreezeInst(FreezeInst &I) {
7535 // Freeze always returns a fully defined value.
7536 setShadow(&I, getCleanShadow(&I));
7537 setOrigin(&I, getCleanOrigin());
7538 }
7539
7540 void visitInstruction(Instruction &I) {
7541 // Everything else: stop propagating and check for poisoned shadow.
 7542 if (ClDumpStrictInstructions)
 7543 dumpInst(I);
7544 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7545 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7546 Value *Operand = I.getOperand(i);
7547 if (Operand->getType()->isSized())
7548 insertCheckShadowOf(Operand, &I);
7549 }
7550 setShadow(&I, getCleanShadow(&I));
7551 setOrigin(&I, getCleanOrigin());
7552 }
7553};
7554
7555struct VarArgHelperBase : public VarArgHelper {
7556 Function &F;
7557 MemorySanitizer &MS;
7558 MemorySanitizerVisitor &MSV;
7559 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7560 const unsigned VAListTagSize;
7561
7562 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7563 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7564 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7565
7566 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7567 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7568 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7569 }
7570
7571 /// Compute the shadow address for a given va_arg.
7572 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7573 return IRB.CreatePtrAdd(
7574 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7575 }
7576
7577 /// Compute the shadow address for a given va_arg.
7578 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7579 unsigned ArgSize) {
7580 // Make sure we don't overflow __msan_va_arg_tls.
7581 if (ArgOffset + ArgSize > kParamTLSSize)
7582 return nullptr;
7583 return getShadowPtrForVAArgument(IRB, ArgOffset);
7584 }
7585
7586 /// Compute the origin address for a given va_arg.
7587 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7588 // getOriginPtrForVAArgument() is always called after
7589 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7590 // overflow.
7591 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7592 ConstantInt::get(MS.IntptrTy, ArgOffset),
7593 "_msarg_va_o");
7594 }
7595
7596 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7597 unsigned BaseOffset) {
7598 // The tail of __msan_va_arg_tls is not large enough to fit the full
7599 // value shadow, but it will be copied to the backup anyway. Make it
7600 // clean.
7601 if (BaseOffset >= kParamTLSSize)
7602 return;
7603 Value *TailSize =
7604 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7605 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7606 TailSize, Align(8));
7607 }
7608
7609 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7610 IRBuilder<> IRB(&I);
7611 Value *VAListTag = I.getArgOperand(0);
7612 const Align Alignment = Align(8);
7613 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7614 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7615 // Unpoison the whole __va_list_tag.
7616 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7617 VAListTagSize, Alignment, false);
7618 }
7619
7620 void visitVAStartInst(VAStartInst &I) override {
7621 if (F.getCallingConv() == CallingConv::Win64)
7622 return;
7623 VAStartInstrumentationList.push_back(&I);
7624 unpoisonVAListTagForInst(I);
7625 }
7626
7627 void visitVACopyInst(VACopyInst &I) override {
7628 if (F.getCallingConv() == CallingConv::Win64)
7629 return;
7630 unpoisonVAListTagForInst(I);
7631 }
7632};
7633
7634/// AMD64-specific implementation of VarArgHelper.
7635struct VarArgAMD64Helper : public VarArgHelperBase {
7636 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7637 // See a comment in visitCallBase for more details.
7638 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7639 static const unsigned AMD64FpEndOffsetSSE = 176;
7640 // If SSE is disabled, fp_offset in va_list is zero.
7641 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
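// For reference, these offsets mirror the SysV AMD64 register save area:
// 48 bytes for the 6 GP argument registers (rdi, rsi, rdx, rcx, r8, r9) at
// 8 bytes each, followed by 128 bytes for the 8 vector registers
// (xmm0-xmm7) at 16 bytes each; overflow (stack) arguments are then staged
// at offset AMD64FpEndOffset in __msan_va_arg_tls.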
7642
7643 unsigned AMD64FpEndOffset;
7644 AllocaInst *VAArgTLSCopy = nullptr;
7645 AllocaInst *VAArgTLSOriginCopy = nullptr;
7646 Value *VAArgOverflowSize = nullptr;
7647
7648 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7649
7650 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7651 MemorySanitizerVisitor &MSV)
7652 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7653 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7654 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7655 if (Attr.isStringAttribute() &&
7656 (Attr.getKindAsString() == "target-features")) {
7657 if (Attr.getValueAsString().contains("-sse"))
7658 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7659 break;
7660 }
7661 }
7662 }
7663
7664 ArgKind classifyArgument(Value *arg) {
7665 // A very rough approximation of X86_64 argument classification rules.
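// For example: i32, i64 and pointers classify as AK_GeneralPurpose; float,
// double and their vectors as AK_FloatingPoint; x86_fp80 (long double),
// i128 and other types fall back to AK_Memory.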
7666 Type *T = arg->getType();
7667 if (T->isX86_FP80Ty())
7668 return AK_Memory;
7669 if (T->isFPOrFPVectorTy())
7670 return AK_FloatingPoint;
7671 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7672 return AK_GeneralPurpose;
7673 if (T->isPointerTy())
7674 return AK_GeneralPurpose;
7675 return AK_Memory;
7676 }
7677
7678 // For VarArg functions, store the argument shadow in an ABI-specific format
7679 // that corresponds to va_list layout.
7680 // We do this because Clang lowers va_arg in the frontend, and this pass
7681 // only sees the low level code that deals with va_list internals.
7682 // A much easier alternative (provided that Clang emits va_arg instructions)
7683 // would have been to associate each live instance of va_list with a copy of
7684 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7685 // order.
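// As a sketch of the layout built below (assuming SSE is enabled, so
// AMD64FpEndOffset is 176): for a call f(1, 2, 3.0) to 'void f(int n, ...)',
// the fixed 'n' only advances GpOffset to 8, the shadow of '2' is stored at
// __msan_va_arg_tls offset 8, the shadow of '3.0' at offset 48
// (AMD64GpEndOffset), and the stored overflow size is 0.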
7686 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7687 unsigned GpOffset = 0;
7688 unsigned FpOffset = AMD64GpEndOffset;
7689 unsigned OverflowOffset = AMD64FpEndOffset;
7690 const DataLayout &DL = F.getDataLayout();
7691
7692 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7693 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7694 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7695 if (IsByVal) {
7696 // ByVal arguments always go to the overflow area.
7697 // Fixed arguments passed through the overflow area will be stepped
7698 // over by va_start, so don't count them towards the offset.
7699 if (IsFixed)
7700 continue;
7701 assert(A->getType()->isPointerTy());
7702 Type *RealTy = CB.getParamByValType(ArgNo);
7703 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7704 uint64_t AlignedSize = alignTo(ArgSize, 8);
7705 unsigned BaseOffset = OverflowOffset;
7706 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7707 Value *OriginBase = nullptr;
7708 if (MS.TrackOrigins)
7709 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7710 OverflowOffset += AlignedSize;
7711
7712 if (OverflowOffset > kParamTLSSize) {
7713 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7714 continue; // We have no space to copy shadow there.
7715 }
7716
7717 Value *ShadowPtr, *OriginPtr;
7718 std::tie(ShadowPtr, OriginPtr) =
7719 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7720 /*isStore*/ false);
7721 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7722 kShadowTLSAlignment, ArgSize);
7723 if (MS.TrackOrigins)
7724 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7725 kShadowTLSAlignment, ArgSize);
7726 } else {
7727 ArgKind AK = classifyArgument(A);
7728 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7729 AK = AK_Memory;
7730 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7731 AK = AK_Memory;
7732 Value *ShadowBase, *OriginBase = nullptr;
7733 switch (AK) {
7734 case AK_GeneralPurpose:
7735 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7736 if (MS.TrackOrigins)
7737 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7738 GpOffset += 8;
7739 assert(GpOffset <= kParamTLSSize);
7740 break;
7741 case AK_FloatingPoint:
7742 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7743 if (MS.TrackOrigins)
7744 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7745 FpOffset += 16;
7746 assert(FpOffset <= kParamTLSSize);
7747 break;
7748 case AK_Memory:
7749 if (IsFixed)
7750 continue;
7751 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7752 uint64_t AlignedSize = alignTo(ArgSize, 8);
7753 unsigned BaseOffset = OverflowOffset;
7754 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7755 if (MS.TrackOrigins) {
7756 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7757 }
7758 OverflowOffset += AlignedSize;
7759 if (OverflowOffset > kParamTLSSize) {
7760 // We have no space to copy shadow there.
7761 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7762 continue;
7763 }
7764 }
7765 // Take fixed arguments into account for GpOffset and FpOffset,
7766 // but don't actually store shadows for them.
7767 // TODO(glider): don't call get*PtrForVAArgument() for them.
7768 if (IsFixed)
7769 continue;
7770 Value *Shadow = MSV.getShadow(A);
7771 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7772 if (MS.TrackOrigins) {
7773 Value *Origin = MSV.getOrigin(A);
7774 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7775 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7776 std::max(kShadowTLSAlignment, kMinOriginAlignment));
7777 }
7778 }
7779 }
7780 Constant *OverflowSize =
7781 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7782 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7783 }
7784
7785 void finalizeInstrumentation() override {
7786 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7787 "finalizeInstrumentation called twice");
7788 if (!VAStartInstrumentationList.empty()) {
7789 // If there is a va_start in this function, make a backup copy of
7790 // va_arg_tls somewhere in the function entry block.
7791 IRBuilder<> IRB(MSV.FnPrologueEnd);
7792 VAArgOverflowSize =
7793 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7794 Value *CopySize = IRB.CreateAdd(
7795 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7796 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7797 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7798 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7799 CopySize, kShadowTLSAlignment, false);
7800
7801 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7802 Intrinsic::umin, CopySize,
7803 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7804 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7805 kShadowTLSAlignment, SrcSize);
7806 if (MS.TrackOrigins) {
7807 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7808 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7809 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7810 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7811 }
7812 }
7813
7814 // Instrument va_start.
7815 // Copy va_list shadow from the backup copy of the TLS contents.
7816 for (CallInst *OrigInst : VAStartInstrumentationList) {
7817 NextNodeIRBuilder IRB(OrigInst);
7818 Value *VAListTag = OrigInst->getArgOperand(0);
7819
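// Note: on x86-64, va_list is
//   struct { unsigned gp_offset; unsigned fp_offset;
//            void *overflow_arg_area; void *reg_save_area; };
// so reg_save_area is read at offset 16 and overflow_arg_area at offset 8
// below.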
7820 Value *RegSaveAreaPtrPtr =
7821 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
7822 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7823 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7824 const Align Alignment = Align(16);
7825 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7826 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7827 Alignment, /*isStore*/ true);
7828 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7829 AMD64FpEndOffset);
7830 if (MS.TrackOrigins)
7831 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7832 Alignment, AMD64FpEndOffset);
7833 Value *OverflowArgAreaPtrPtr =
7834 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
7835 Value *OverflowArgAreaPtr =
7836 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7837 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7838 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7839 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7840 Alignment, /*isStore*/ true);
7841 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7842 AMD64FpEndOffset);
7843 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7844 VAArgOverflowSize);
7845 if (MS.TrackOrigins) {
7846 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7847 AMD64FpEndOffset);
7848 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7849 VAArgOverflowSize);
7850 }
7851 }
7852 }
7853};
7854
7855/// AArch64-specific implementation of VarArgHelper.
7856struct VarArgAArch64Helper : public VarArgHelperBase {
7857 static const unsigned kAArch64GrArgSize = 64;
7858 static const unsigned kAArch64VrArgSize = 128;
7859
7860 static const unsigned AArch64GrBegOffset = 0;
7861 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7862 // Make VR space aligned to 16 bytes.
7863 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7864 static const unsigned AArch64VrEndOffset =
7865 AArch64VrBegOffset + kAArch64VrArgSize;
7866 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
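// For reference: 64 bytes cover the 8 GP argument registers (x0-x7, 8 bytes
// each) and 128 bytes cover the 8 FP/SIMD argument registers (q0-q7, 16
// bytes each), so stack (overflow) varargs are staged at offset 192 within
// __msan_va_arg_tls.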
7867
7868 AllocaInst *VAArgTLSCopy = nullptr;
7869 Value *VAArgOverflowSize = nullptr;
7870
7871 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7872
7873 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7874 MemorySanitizerVisitor &MSV)
7875 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7876
7877 // A very rough approximation of aarch64 argument classification rules.
7878 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7879 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7880 return {AK_GeneralPurpose, 1};
7881 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7882 return {AK_FloatingPoint, 1};
7883
7884 if (T->isArrayTy()) {
7885 auto R = classifyArgument(T->getArrayElementType());
7886 R.second *= T->getScalarType()->getArrayNumElements();
7887 return R;
7888 }
7889
7890 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7891 auto R = classifyArgument(FV->getScalarType());
7892 R.second *= FV->getNumElements();
7893 return R;
7894 }
7895
7896 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7897 return {AK_Memory, 0};
7898 }
7899
7900 // The instrumentation stores the argument shadow in a non ABI-specific
7901 // format because it does not know which arguments are named (Clang, as
7902 // in the x86_64 case, lowers va_arg in the frontend and this pass only
7903 // sees the low level code that deals with va_list internals).
7904 // The first eight GR registers are saved in the first 64 bytes of the
7905 // va_arg TLS array, followed by the first 8 FP/SIMD registers, and then
7906 // the remaining arguments.
7907 // Using constant offset within the va_arg TLS array allows fast copy
7908 // in the finalize instrumentation.
7909 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7910 unsigned GrOffset = AArch64GrBegOffset;
7911 unsigned VrOffset = AArch64VrBegOffset;
7912 unsigned OverflowOffset = AArch64VAEndOffset;
7913
7914 const DataLayout &DL = F.getDataLayout();
7915 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7916 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7917 auto [AK, RegNum] = classifyArgument(A->getType());
7918 if (AK == AK_GeneralPurpose &&
7919 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
7920 AK = AK_Memory;
7921 if (AK == AK_FloatingPoint &&
7922 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
7923 AK = AK_Memory;
7924 Value *Base;
7925 switch (AK) {
7926 case AK_GeneralPurpose:
7927 Base = getShadowPtrForVAArgument(IRB, GrOffset);
7928 GrOffset += 8 * RegNum;
7929 break;
7930 case AK_FloatingPoint:
7931 Base = getShadowPtrForVAArgument(IRB, VrOffset);
7932 VrOffset += 16 * RegNum;
7933 break;
7934 case AK_Memory:
7935 // Don't count fixed arguments in the overflow area - va_start will
7936 // skip right over them.
7937 if (IsFixed)
7938 continue;
7939 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7940 uint64_t AlignedSize = alignTo(ArgSize, 8);
7941 unsigned BaseOffset = OverflowOffset;
7942 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
7943 OverflowOffset += AlignedSize;
7944 if (OverflowOffset > kParamTLSSize) {
7945 // We have no space to copy shadow there.
7946 CleanUnusedTLS(IRB, Base, BaseOffset);
7947 continue;
7948 }
7949 break;
7950 }
7951 // Count Gp/Vr fixed arguments to their respective offsets, but don't
7952 // bother to actually store a shadow.
7953 if (IsFixed)
7954 continue;
7955 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7956 }
7957 Constant *OverflowSize =
7958 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
7959 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7960 }
7961
7962 // Retrieve a va_list field of 'void*' size.
7963 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7964 Value *SaveAreaPtrPtr =
7965 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7966 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
7967 }
7968
7969 // Retrieve a va_list field of 'int' size.
7970 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7971 Value *SaveAreaPtr =
7972 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7973 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
7974 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
7975 }
7976
7977 void finalizeInstrumentation() override {
7978 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7979 "finalizeInstrumentation called twice");
7980 if (!VAStartInstrumentationList.empty()) {
7981 // If there is a va_start in this function, make a backup copy of
7982 // va_arg_tls somewhere in the function entry block.
7983 IRBuilder<> IRB(MSV.FnPrologueEnd);
7984 VAArgOverflowSize =
7985 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7986 Value *CopySize = IRB.CreateAdd(
7987 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
7988 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7989 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7990 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7991 CopySize, kShadowTLSAlignment, false);
7992
7993 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7994 Intrinsic::umin, CopySize,
7995 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7996 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7997 kShadowTLSAlignment, SrcSize);
7998 }
7999
8000 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
8001 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
8002
8003 // Instrument va_start, copy va_list shadow from the backup copy of
8004 // the TLS contents.
8005 for (CallInst *OrigInst : VAStartInstrumentationList) {
8006 NextNodeIRBuilder IRB(OrigInst);
8007
8008 Value *VAListTag = OrigInst->getArgOperand(0);
8009
8010 // The variadic ABI for AArch64 creates two areas to save the incoming
8011 // argument registers (one for the 64-bit general registers x0-x7 and
8012 // another for the 128-bit FP/SIMD registers v0-v7).
8013 // We then need to propagate the shadow arguments to both regions
8014 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
8015 // The remaining arguments are saved in the shadow for 'va::stack'.
8016 // One caveat is that only the non-named (variadic) arguments need to be
8017 // propagated, but at the call site the instrumentation saves shadow for
8018 // 'all' the arguments. So, to copy the shadow values from the va_arg TLS
8019 // array, we need to adjust the offsets for both the GR and VR regions
8020 // based on the __{gr,vr}_offs values (since those are set based on the
8021 // incoming named arguments).
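// For reference, AAPCS64 defines va_list as
//   struct { void *__stack; void *__gr_top; void *__vr_top;
//            int __gr_offs; int __vr_offs; };
// i.e. __stack at offset 0, __gr_top at 8, __vr_top at 16, __gr_offs at 24
// and __vr_offs at 28, which are the offsets read by getVAField64 and
// getVAField32 below.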
8022 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
8023
8024 // Read the stack pointer from the va_list.
8025 Value *StackSaveAreaPtr =
8026 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
8027
8028 // Read both the __gr_top and __gr_off and add them up.
8029 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
8030 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
8031
8032 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
8033 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
8034
8035 // Read both the __vr_top and __vr_off and add them up.
8036 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
8037 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
8038
8039 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
8040 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
8041
8042 // We do not know how many named arguments are being used, while at the
8043 // call site all the arguments were saved. Since __gr_offs is defined as
8044 // '0 - ((8 - named_gr) * 8)', the idea is to propagate only the variadic
8045 // arguments by ignoring the bytes of shadow from named arguments.
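// For example, with two named GP arguments __gr_offs is -(8 - 2) * 8 = -48,
// so GrRegSaveAreaShadowPtrOff = 64 - 48 = 16 bytes of named-argument shadow
// are skipped and GrCopySize = 64 - 16 = 48 bytes of variadic shadow are
// copied.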
8046 Value *GrRegSaveAreaShadowPtrOff =
8047 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
8048
8049 Value *GrRegSaveAreaShadowPtr =
8050 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8051 Align(8), /*isStore*/ true)
8052 .first;
8053
8054 Value *GrSrcPtr =
8055 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
8056 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
8057
8058 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
8059 GrCopySize);
8060
8061 // Again, but for FP/SIMD values.
8062 Value *VrRegSaveAreaShadowPtrOff =
8063 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
8064
8065 Value *VrRegSaveAreaShadowPtr =
8066 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8067 Align(8), /*isStore*/ true)
8068 .first;
8069
8070 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
8071 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
8072 IRB.getInt32(AArch64VrBegOffset)),
8073 VrRegSaveAreaShadowPtrOff);
8074 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
8075
8076 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
8077 VrCopySize);
8078
8079 // And finally for remaining arguments.
8080 Value *StackSaveAreaShadowPtr =
8081 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
8082 Align(16), /*isStore*/ true)
8083 .first;
8084
8085 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
8086 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
8087
8088 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
8089 Align(16), VAArgOverflowSize);
8090 }
8091 }
8092};
8093
8094/// PowerPC64-specific implementation of VarArgHelper.
8095struct VarArgPowerPC64Helper : public VarArgHelperBase {
8096 AllocaInst *VAArgTLSCopy = nullptr;
8097 Value *VAArgSize = nullptr;
8098
8099 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
8100 MemorySanitizerVisitor &MSV)
8101 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
8102
8103 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8104 // For PowerPC, we need to deal with the alignment of stack arguments:
8105 // they are mostly aligned to 8 bytes, but vectors and i128 arrays are
8106 // aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
8107 // For that reason, we compute the current offset from the stack pointer
8108 // (which is always properly aligned) and the offset of the first vararg,
8109 // then subtract them.
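// For example, a <4 x i32> vararg (16 bytes) is aligned up to 16 bytes,
// while on big-endian targets an i32 occupies the last 4 bytes of its
// 8-byte slot, so its shadow offset is bumped by 4 below.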
8110 unsigned VAArgBase;
8111 Triple TargetTriple(F.getParent()->getTargetTriple());
8112 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
8113 // and 32 bytes for ABIv2. This is usually determined by target
8114 // endianness, but in theory could be overridden by function attribute.
8115 if (TargetTriple.isPPC64ELFv2ABI())
8116 VAArgBase = 32;
8117 else
8118 VAArgBase = 48;
8119 unsigned VAArgOffset = VAArgBase;
8120 const DataLayout &DL = F.getDataLayout();
8121 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8122 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8123 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8124 if (IsByVal) {
8125 assert(A->getType()->isPointerTy());
8126 Type *RealTy = CB.getParamByValType(ArgNo);
8127 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8128 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
8129 if (ArgAlign < 8)
8130 ArgAlign = Align(8);
8131 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8132 if (!IsFixed) {
8133 Value *Base =
8134 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8135 if (Base) {
8136 Value *AShadowPtr, *AOriginPtr;
8137 std::tie(AShadowPtr, AOriginPtr) =
8138 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8139 kShadowTLSAlignment, /*isStore*/ false);
8140
8141 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8142 kShadowTLSAlignment, ArgSize);
8143 }
8144 }
8145 VAArgOffset += alignTo(ArgSize, Align(8));
8146 } else {
8147 Value *Base;
8148 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8149 Align ArgAlign = Align(8);
8150 if (A->getType()->isArrayTy()) {
8151 // Arrays are aligned to element size, except for long double
8152 // arrays, which are aligned to 8 bytes.
8153 Type *ElementTy = A->getType()->getArrayElementType();
8154 if (!ElementTy->isPPC_FP128Ty())
8155 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8156 } else if (A->getType()->isVectorTy()) {
8157 // Vectors are naturally aligned.
8158 ArgAlign = Align(ArgSize);
8159 }
8160 if (ArgAlign < 8)
8161 ArgAlign = Align(8);
8162 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8163 if (DL.isBigEndian()) {
8164 // Adjust the shadow for arguments with size < 8 to match the
8165 // placement of bits on a big-endian system.
8166 if (ArgSize < 8)
8167 VAArgOffset += (8 - ArgSize);
8168 }
8169 if (!IsFixed) {
8170 Base =
8171 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8172 if (Base)
8173 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8174 }
8175 VAArgOffset += ArgSize;
8176 VAArgOffset = alignTo(VAArgOffset, Align(8));
8177 }
8178 if (IsFixed)
8179 VAArgBase = VAArgOffset;
8180 }
8181
8182 Constant *TotalVAArgSize =
8183 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8184 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8185 // new class member; here it holds the total size of all varargs.
8186 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8187 }
8188
8189 void finalizeInstrumentation() override {
8190 assert(!VAArgSize && !VAArgTLSCopy &&
8191 "finalizeInstrumentation called twice");
8192 IRBuilder<> IRB(MSV.FnPrologueEnd);
8193 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8194 Value *CopySize = VAArgSize;
8195
8196 if (!VAStartInstrumentationList.empty()) {
8197 // If there is a va_start in this function, make a backup copy of
8198 // va_arg_tls somewhere in the function entry block.
8199
8200 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8201 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8202 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8203 CopySize, kShadowTLSAlignment, false);
8204
8205 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8206 Intrinsic::umin, CopySize,
8207 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
8208 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8209 kShadowTLSAlignment, SrcSize);
8210 }
8211
8212 // Instrument va_start.
8213 // Copy va_list shadow from the backup copy of the TLS contents.
8214 for (CallInst *OrigInst : VAStartInstrumentationList) {
8215 NextNodeIRBuilder IRB(OrigInst);
8216 Value *VAListTag = OrigInst->getArgOperand(0);
8217 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8218
8219 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8220
8221 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8222 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8223 const DataLayout &DL = F.getDataLayout();
8224 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8225 const Align Alignment = Align(IntptrSize);
8226 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8227 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8228 Alignment, /*isStore*/ true);
8229 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8230 CopySize);
8231 }
8232 }
8233};
8234
8235/// PowerPC32-specific implementation of VarArgHelper.
8236struct VarArgPowerPC32Helper : public VarArgHelperBase {
8237 AllocaInst *VAArgTLSCopy = nullptr;
8238 Value *VAArgSize = nullptr;
8239
8240 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8241 MemorySanitizerVisitor &MSV)
8242 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8243
8244 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8245 unsigned VAArgBase;
8246 // Parameter save area is 8 bytes from frame pointer in PPC32
8247 VAArgBase = 8;
8248 unsigned VAArgOffset = VAArgBase;
8249 const DataLayout &DL = F.getDataLayout();
8250 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8251 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8252 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8253 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8254 if (IsByVal) {
8255 assert(A->getType()->isPointerTy());
8256 Type *RealTy = CB.getParamByValType(ArgNo);
8257 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8258 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8259 if (ArgAlign < IntptrSize)
8260 ArgAlign = Align(IntptrSize);
8261 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8262 if (!IsFixed) {
8263 Value *Base =
8264 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8265 if (Base) {
8266 Value *AShadowPtr, *AOriginPtr;
8267 std::tie(AShadowPtr, AOriginPtr) =
8268 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8269 kShadowTLSAlignment, /*isStore*/ false);
8270
8271 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8272 kShadowTLSAlignment, ArgSize);
8273 }
8274 }
8275 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8276 } else {
8277 Value *Base;
8278 Type *ArgTy = A->getType();
8279
8280 // On PPC32, floating-point variable arguments are stored in a separate
8281 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
8282 // them, as they will be found when checking call arguments.
8283 if (!ArgTy->isFloatingPointTy()) {
8284 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
8285 Align ArgAlign = Align(IntptrSize);
8286 if (ArgTy->isArrayTy()) {
8287 // Arrays are aligned to element size, except for long double
8288 // arrays, which are aligned to 8 bytes.
8289 Type *ElementTy = ArgTy->getArrayElementType();
8290 if (!ElementTy->isPPC_FP128Ty())
8291 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8292 } else if (ArgTy->isVectorTy()) {
8293 // Vectors are naturally aligned.
8294 ArgAlign = Align(ArgSize);
8295 }
8296 if (ArgAlign < IntptrSize)
8297 ArgAlign = Align(IntptrSize);
8298 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8299 if (DL.isBigEndian()) {
8300 // Adjust the shadow for arguments with size < IntptrSize to match
8301 // the placement of bits on a big-endian system.
8302 if (ArgSize < IntptrSize)
8303 VAArgOffset += (IntptrSize - ArgSize);
8304 }
8305 if (!IsFixed) {
8306 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
8307 ArgSize);
8308 if (Base)
8309 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
8310 kShadowTLSAlignment);
8311 }
8312 VAArgOffset += ArgSize;
8313 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8314 }
8315 }
8316 }
8317
8318 Constant *TotalVAArgSize =
8319 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8320 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8321 // new class member; here it holds the total size of all varargs.
8322 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8323 }
8324
8325 void finalizeInstrumentation() override {
8326 assert(!VAArgSize && !VAArgTLSCopy &&
8327 "finalizeInstrumentation called twice");
8328 IRBuilder<> IRB(MSV.FnPrologueEnd);
8329 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8330 Value *CopySize = VAArgSize;
8331
8332 if (!VAStartInstrumentationList.empty()) {
8333 // If there is a va_start in this function, make a backup copy of
8334 // va_arg_tls somewhere in the function entry block.
8335
8336 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8337 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8338 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8339 CopySize, kShadowTLSAlignment, false);
8340
8341 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8342 Intrinsic::umin, CopySize,
8343 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8344 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8345 kShadowTLSAlignment, SrcSize);
8346 }
8347
8348 // Instrument va_start.
8349 // Copy va_list shadow from the backup copy of the TLS contents.
8350 for (CallInst *OrigInst : VAStartInstrumentationList) {
8351 NextNodeIRBuilder IRB(OrigInst);
8352 Value *VAListTag = OrigInst->getArgOperand(0);
8353 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8354 Value *RegSaveAreaSize = CopySize;
8355
8356 // In PPC32 va_list_tag is a struct
8357 RegSaveAreaPtrPtr =
8358 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
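// For reference, the SVR4 PPC32 __va_list_tag is
//   struct { char gpr; char fpr; short reserved;
//            void *overflow_arg_area; void *reg_save_area; };
// so reg_save_area is read at offset 8 here and overflow_arg_area at
// offset 4 below.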
8359
8360 // On PPC32, reg_save_area can only hold 32 bytes of data.
8361 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8362 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8363
8364 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8365 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8366
8367 const DataLayout &DL = F.getDataLayout();
8368 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8369 const Align Alignment = Align(IntptrSize);
8370
8371 { // Copy reg save area
8372 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8373 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8374 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8375 Alignment, /*isStore*/ true);
8376 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8377 Alignment, RegSaveAreaSize);
8378
8379 RegSaveAreaShadowPtr =
8380 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8381 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8382 ConstantInt::get(MS.IntptrTy, 32));
8383 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8384 // We fill fp shadow with zeroes as uninitialized fp args should have
8385 // been found during call base check
8386 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8387 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8388 }
8389
8390 { // Copy overflow area
8391 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8392 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8393
8394 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8395 OverflowAreaPtrPtr =
8396 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8397 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8398
8399 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8400
8401 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8402 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8403 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8404 Alignment, /*isStore*/ true);
8405
8406 Value *OverflowVAArgTLSCopyPtr =
8407 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8408 OverflowVAArgTLSCopyPtr =
8409 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8410
8411 OverflowVAArgTLSCopyPtr =
8412 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8413 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8414 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8415 }
8416 }
8417 }
8418};
8419
8420/// SystemZ-specific implementation of VarArgHelper.
8421struct VarArgSystemZHelper : public VarArgHelperBase {
8422 static const unsigned SystemZGpOffset = 16;
8423 static const unsigned SystemZGpEndOffset = 56;
8424 static const unsigned SystemZFpOffset = 128;
8425 static const unsigned SystemZFpEndOffset = 160;
8426 static const unsigned SystemZMaxVrArgs = 8;
8427 static const unsigned SystemZRegSaveAreaSize = 160;
8428 static const unsigned SystemZOverflowOffset = 160;
8429 static const unsigned SystemZVAListTagSize = 32;
8430 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8431 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
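// For reference, these follow the s390x ELF ABI: the GP argument registers
// r2-r6 occupy offsets 16-56 of the register save area (5 * 8 bytes), the FP
// argument registers f0/f2/f4/f6 occupy offsets 128-160, and stack
// (overflow) arguments are staged starting at offset 160. va_list is
//   struct { long __gpr; long __fpr;
//            void *__overflow_arg_area; void *__reg_save_area; };
// hence the pointer offsets 16 and 24 above.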
8432
8433 bool IsSoftFloatABI;
8434 AllocaInst *VAArgTLSCopy = nullptr;
8435 AllocaInst *VAArgTLSOriginCopy = nullptr;
8436 Value *VAArgOverflowSize = nullptr;
8437
8438 enum class ArgKind {
8439 GeneralPurpose,
8440 FloatingPoint,
8441 Vector,
8442 Memory,
8443 Indirect,
8444 };
8445
8446 enum class ShadowExtension { None, Zero, Sign };
8447
8448 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8449 MemorySanitizerVisitor &MSV)
8450 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8451 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8452
8453 ArgKind classifyArgument(Type *T) {
8454 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8455 // only a few possibilities of what it can be. In particular, enums, single
8456 // element structs and large types have already been taken care of.
8457
8458 // Some i128 and fp128 arguments are converted to pointers only in the
8459 // back end.
8460 if (T->isIntegerTy(128) || T->isFP128Ty())
8461 return ArgKind::Indirect;
8462 if (T->isFloatingPointTy())
8463 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8464 if (T->isIntegerTy() || T->isPointerTy())
8465 return ArgKind::GeneralPurpose;
8466 if (T->isVectorTy())
8467 return ArgKind::Vector;
8468 return ArgKind::Memory;
8469 }
8470
8471 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8472 // ABI says: "One of the simple integer types no more than 64 bits wide.
8473 // ... If such an argument is shorter than 64 bits, replace it by a full
8474 // 64-bit integer representing the same number, using sign or zero
8475 // extension". Shadow for an integer argument has the same type as the
8476 // argument itself, so it can be sign or zero extended as well.
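// For example, an i16 vararg with the SExt attribute has its shadow
// sign-extended to 64 bits and stored at GpOffset; with no extension
// attribute the 2-byte shadow is instead stored at GpOffset + 6 (GapSize),
// matching the value's position in the big-endian 8-byte slot.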
8477 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8478 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8479 if (ZExt) {
8480 assert(!SExt);
8481 return ShadowExtension::Zero;
8482 }
8483 if (SExt) {
8484 assert(!ZExt);
8485 return ShadowExtension::Sign;
8486 }
8487 return ShadowExtension::None;
8488 }
8489
8490 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8491 unsigned GpOffset = SystemZGpOffset;
8492 unsigned FpOffset = SystemZFpOffset;
8493 unsigned VrIndex = 0;
8494 unsigned OverflowOffset = SystemZOverflowOffset;
8495 const DataLayout &DL = F.getDataLayout();
8496 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8497 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8498 // SystemZABIInfo does not produce ByVal parameters.
8499 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8500 Type *T = A->getType();
8501 ArgKind AK = classifyArgument(T);
8502 if (AK == ArgKind::Indirect) {
8503 T = MS.PtrTy;
8504 AK = ArgKind::GeneralPurpose;
8505 }
8506 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8507 AK = ArgKind::Memory;
8508 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8509 AK = ArgKind::Memory;
8510 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8511 AK = ArgKind::Memory;
8512 Value *ShadowBase = nullptr;
8513 Value *OriginBase = nullptr;
8514 ShadowExtension SE = ShadowExtension::None;
8515 switch (AK) {
8516 case ArgKind::GeneralPurpose: {
8517 // Always keep track of GpOffset, but store shadow only for varargs.
8518 uint64_t ArgSize = 8;
8519 if (GpOffset + ArgSize <= kParamTLSSize) {
8520 if (!IsFixed) {
8521 SE = getShadowExtension(CB, ArgNo);
8522 uint64_t GapSize = 0;
8523 if (SE == ShadowExtension::None) {
8524 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8525 assert(ArgAllocSize <= ArgSize);
8526 GapSize = ArgSize - ArgAllocSize;
8527 }
8528 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8529 if (MS.TrackOrigins)
8530 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8531 }
8532 GpOffset += ArgSize;
8533 } else {
8534 GpOffset = kParamTLSSize;
8535 }
8536 break;
8537 }
8538 case ArgKind::FloatingPoint: {
8539 // Always keep track of FpOffset, but store shadow only for varargs.
8540 uint64_t ArgSize = 8;
8541 if (FpOffset + ArgSize <= kParamTLSSize) {
8542 if (!IsFixed) {
8543 // PoP says: "A short floating-point datum requires only the
8544 // left-most 32 bit positions of a floating-point register".
8545 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8546 // don't extend shadow and don't mind the gap.
8547 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8548 if (MS.TrackOrigins)
8549 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8550 }
8551 FpOffset += ArgSize;
8552 } else {
8553 FpOffset = kParamTLSSize;
8554 }
8555 break;
8556 }
8557 case ArgKind::Vector: {
8558 // Keep track of VrIndex. No need to store shadow, since vector varargs
8559 // go through AK_Memory.
8560 assert(IsFixed);
8561 VrIndex++;
8562 break;
8563 }
8564 case ArgKind::Memory: {
8565 // Keep track of OverflowOffset and store shadow only for varargs.
8566 // Ignore fixed args, since we need to copy only the vararg portion of
8567 // the overflow area shadow.
8568 if (!IsFixed) {
8569 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8570 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8571 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8572 SE = getShadowExtension(CB, ArgNo);
8573 uint64_t GapSize =
8574 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8575 ShadowBase =
8576 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8577 if (MS.TrackOrigins)
8578 OriginBase =
8579 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8580 OverflowOffset += ArgSize;
8581 } else {
8582 OverflowOffset = kParamTLSSize;
8583 }
8584 }
8585 break;
8586 }
8587 case ArgKind::Indirect:
8588 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8589 }
8590 if (ShadowBase == nullptr)
8591 continue;
8592 Value *Shadow = MSV.getShadow(A);
8593 if (SE != ShadowExtension::None)
8594 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8595 /*Signed*/ SE == ShadowExtension::Sign);
8596 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8597 IRB.CreateStore(Shadow, ShadowBase);
8598 if (MS.TrackOrigins) {
8599 Value *Origin = MSV.getOrigin(A);
8600 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8601 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8602 kMinOriginAlignment);
8603 }
8604 }
8605 Constant *OverflowSize = ConstantInt::get(
8606 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8607 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8608 }
8609
8610 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8611 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8612 IRB.CreateAdd(
8613 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8614 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8615 MS.PtrTy);
8616 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8617 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8618 const Align Alignment = Align(8);
8619 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8620 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8621 /*isStore*/ true);
8622 // TODO(iii): copy only fragments filled by visitCallBase()
8623 // TODO(iii): support packed-stack && !use-soft-float
8624 // For use-soft-float functions, it is enough to copy just the GPRs.
8625 unsigned RegSaveAreaSize =
8626 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8627 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8628 RegSaveAreaSize);
8629 if (MS.TrackOrigins)
8630 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8631 Alignment, RegSaveAreaSize);
8632 }
8633
8634 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8635 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8636 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8637 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8638 IRB.CreateAdd(
8639 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8640 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8641 MS.PtrTy);
8642 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8643 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8644 const Align Alignment = Align(8);
8645 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8646 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8647 Alignment, /*isStore*/ true);
8648 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8649 SystemZOverflowOffset);
8650 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8651 VAArgOverflowSize);
8652 if (MS.TrackOrigins) {
8653 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8654 SystemZOverflowOffset);
8655 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8656 VAArgOverflowSize);
8657 }
8658 }
8659
8660 void finalizeInstrumentation() override {
8661 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8662 "finalizeInstrumentation called twice");
8663 if (!VAStartInstrumentationList.empty()) {
8664 // If there is a va_start in this function, make a backup copy of
8665 // va_arg_tls somewhere in the function entry block.
8666 IRBuilder<> IRB(MSV.FnPrologueEnd);
8667 VAArgOverflowSize =
8668 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8669 Value *CopySize =
8670 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8671 VAArgOverflowSize);
8672 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8673 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8674 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8675 CopySize, kShadowTLSAlignment, false);
8676
8677 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8678 Intrinsic::umin, CopySize,
8679 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8680 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8681 kShadowTLSAlignment, SrcSize);
8682 if (MS.TrackOrigins) {
8683 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8684 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8685 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8686 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8687 }
8688 }
8689
8690 // Instrument va_start.
8691 // Copy va_list shadow from the backup copy of the TLS contents.
8692 for (CallInst *OrigInst : VAStartInstrumentationList) {
8693 NextNodeIRBuilder IRB(OrigInst);
8694 Value *VAListTag = OrigInst->getArgOperand(0);
8695 copyRegSaveArea(IRB, VAListTag);
8696 copyOverflowArea(IRB, VAListTag);
8697 }
8698 }
8699};
8700
8701/// i386-specific implementation of VarArgHelper.
8702struct VarArgI386Helper : public VarArgHelperBase {
8703 AllocaInst *VAArgTLSCopy = nullptr;
8704 Value *VAArgSize = nullptr;
8705
8706 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8707 MemorySanitizerVisitor &MSV)
8708 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8709
8710 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8711 const DataLayout &DL = F.getDataLayout();
8712 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8713 unsigned VAArgOffset = 0;
8714 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8715 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8716 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8717 if (IsByVal) {
8718 assert(A->getType()->isPointerTy());
8719 Type *RealTy = CB.getParamByValType(ArgNo);
8720 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8721 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8722 if (ArgAlign < IntptrSize)
8723 ArgAlign = Align(IntptrSize);
8724 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8725 if (!IsFixed) {
8726 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8727 if (Base) {
8728 Value *AShadowPtr, *AOriginPtr;
8729 std::tie(AShadowPtr, AOriginPtr) =
8730 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8731 kShadowTLSAlignment, /*isStore*/ false);
8732
8733 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8734 kShadowTLSAlignment, ArgSize);
8735 }
8736 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8737 }
8738 } else {
8739 Value *Base;
8740 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8741 Align ArgAlign = Align(IntptrSize);
8742 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8743 if (DL.isBigEndian()) {
8744 // Adjust the shadow for arguments with size < IntptrSize to match
8745 // the placement of bits on a big-endian system.
8746 if (ArgSize < IntptrSize)
8747 VAArgOffset += (IntptrSize - ArgSize);
8748 }
8749 if (!IsFixed) {
8750 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8751 if (Base)
8752 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8753 VAArgOffset += ArgSize;
8754 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8755 }
8756 }
8757 }
8758
8759 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8760 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8761 // new class member; here it holds the total size of all varargs.
8762 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8763 }
8764
8765 void finalizeInstrumentation() override {
8766 assert(!VAArgSize && !VAArgTLSCopy &&
8767 "finalizeInstrumentation called twice");
8768 IRBuilder<> IRB(MSV.FnPrologueEnd);
8769 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8770 Value *CopySize = VAArgSize;
8771
8772 if (!VAStartInstrumentationList.empty()) {
8773 // If there is a va_start in this function, make a backup copy of
8774 // va_arg_tls somewhere in the function entry block.
8775 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8776 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8777 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8778 CopySize, kShadowTLSAlignment, false);
8779
8780 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8781 Intrinsic::umin, CopySize,
8782 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8783 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8784 kShadowTLSAlignment, SrcSize);
8785 }
8786
8787 // Instrument va_start.
8788 // Copy va_list shadow from the backup copy of the TLS contents.
8789 for (CallInst *OrigInst : VAStartInstrumentationList) {
8790 NextNodeIRBuilder IRB(OrigInst);
8791 Value *VAListTag = OrigInst->getArgOperand(0);
8792 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8793 Value *RegSaveAreaPtrPtr =
8794 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8795 PointerType::get(*MS.C, 0));
8796 Value *RegSaveAreaPtr =
8797 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8798 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8799 const DataLayout &DL = F.getDataLayout();
8800 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8801 const Align Alignment = Align(IntptrSize);
8802 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8803 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8804 Alignment, /*isStore*/ true);
8805 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8806 CopySize);
8807 }
8808 }
8809};
8810
8811/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8812/// LoongArch64.
8813struct VarArgGenericHelper : public VarArgHelperBase {
8814 AllocaInst *VAArgTLSCopy = nullptr;
8815 Value *VAArgSize = nullptr;
8816
8817 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8818 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8819 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8820
8821 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8822 unsigned VAArgOffset = 0;
8823 const DataLayout &DL = F.getDataLayout();
8824 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8825 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8826 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8827 if (IsFixed)
8828 continue;
8829 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8830 if (DL.isBigEndian()) {
8831 // Adjust the shadow for arguments with size < IntptrSize to match the
8832 // placement of bits on a big-endian system.
8833 if (ArgSize < IntptrSize)
8834 VAArgOffset += (IntptrSize - ArgSize);
8835 }
8836 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8837 VAArgOffset += ArgSize;
8838 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8839 if (!Base)
8840 continue;
8841 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8842 }
8843
8844 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8845 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8846 // new class member; here it holds the total size of all varargs.
8847 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8848 }
8849
8850 void finalizeInstrumentation() override {
8851 assert(!VAArgSize && !VAArgTLSCopy &&
8852 "finalizeInstrumentation called twice");
8853 IRBuilder<> IRB(MSV.FnPrologueEnd);
8854 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8855 Value *CopySize = VAArgSize;
8856
8857 if (!VAStartInstrumentationList.empty()) {
8858 // If there is a va_start in this function, make a backup copy of
8859 // va_arg_tls somewhere in the function entry block.
8860 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8861 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8862 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8863 CopySize, kShadowTLSAlignment, false);
8864
8865 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8866 Intrinsic::umin, CopySize,
8867 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8868 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8869 kShadowTLSAlignment, SrcSize);
8870 }
8871
8872 // Instrument va_start.
8873 // Copy va_list shadow from the backup copy of the TLS contents.
8874 for (CallInst *OrigInst : VAStartInstrumentationList) {
8875 NextNodeIRBuilder IRB(OrigInst);
8876 Value *VAListTag = OrigInst->getArgOperand(0);
8877 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8878 Value *RegSaveAreaPtrPtr =
8879 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8880 PointerType::get(*MS.C, 0));
8881 Value *RegSaveAreaPtr =
8882 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8883 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8884 const DataLayout &DL = F.getDataLayout();
8885 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8886 const Align Alignment = Align(IntptrSize);
8887 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8888 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8889 Alignment, /*isStore*/ true);
8890 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8891 CopySize);
8892 }
8893 }
8894};
8895
8896 // ARM32, LoongArch64, MIPS and RISCV share the same calling conventions
8897 // regarding VAArgs.
8898using VarArgARM32Helper = VarArgGenericHelper;
8899using VarArgRISCVHelper = VarArgGenericHelper;
8900using VarArgMIPSHelper = VarArgGenericHelper;
8901using VarArgLoongArch64Helper = VarArgGenericHelper;
8902
8903/// A no-op implementation of VarArgHelper.
8904struct VarArgNoOpHelper : public VarArgHelper {
8905 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8906 MemorySanitizerVisitor &MSV) {}
8907
8908 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8909
8910 void visitVAStartInst(VAStartInst &I) override {}
8911
8912 void visitVACopyInst(VACopyInst &I) override {}
8913
8914 void finalizeInstrumentation() override {}
8915};
8916
8917} // end anonymous namespace
8918
8919static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
8920 MemorySanitizerVisitor &Visitor) {
8921 // VarArg handling is implemented for the targets below; on other
8922 // platforms the no-op fallback is used and false positives are possible.
8923 Triple TargetTriple(Func.getParent()->getTargetTriple());
8924
8925 if (TargetTriple.getArch() == Triple::x86)
8926 return new VarArgI386Helper(Func, Msan, Visitor);
8927
8928 if (TargetTriple.getArch() == Triple::x86_64)
8929 return new VarArgAMD64Helper(Func, Msan, Visitor);
8930
8931 if (TargetTriple.isARM())
8932 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8933
8934 if (TargetTriple.isAArch64())
8935 return new VarArgAArch64Helper(Func, Msan, Visitor);
8936
8937 if (TargetTriple.isSystemZ())
8938 return new VarArgSystemZHelper(Func, Msan, Visitor);
8939
8940 // On PowerPC32 VAListTag is a struct
8941 // {char, char, i16 padding, char *, char *}
8942 if (TargetTriple.isPPC32())
8943 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
8944
8945 if (TargetTriple.isPPC64())
8946 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
8947
8948 if (TargetTriple.isRISCV32())
8949 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8950
8951 if (TargetTriple.isRISCV64())
8952 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8953
8954 if (TargetTriple.isMIPS32())
8955 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8956
8957 if (TargetTriple.isMIPS64())
8958 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8959
8960 if (TargetTriple.isLoongArch64())
8961 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
8962 /*VAListTagSize=*/8);
8963
8964 return new VarArgNoOpHelper(Func, Msan, Visitor);
8965}
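// Editor's illustration, not part of this file: an approximate C-level view of
// the PowerPC32 va_list tag described in the comment inside CreateVarArgHelper
// above, following the 32-bit PowerPC SVR4 ABI. The field names below are the
// conventional ABI ones and are not used anywhere in this pass.
struct PPC32VAListTag {
  unsigned char Gpr;       // index of the next general-purpose register slot
  unsigned char Fpr;       // index of the next floating-point register slot
  unsigned short Reserved; // the "i16 padding" from the comment above
  void *OverflowArgArea;   // arguments that spilled to the caller's stack
  void *RegSaveArea;       // saved argument registers (GPRs r3-r10, FPRs f1-f8)
};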
8966
8967bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
8968 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
8969 return false;
8970
8971 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
8972 return false;
8973
8974 MemorySanitizerVisitor Visitor(F, *this, TLI);
8975
8976 // Clear out memory attributes.
8977 AttributeMask B;
8978 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
8979 F.removeFnAttrs(B);
8980
8981 return Visitor.runOnFunction();
8982}
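// Editor's illustration, not part of this file: the attribute-clearing step of
// sanitizeFunction in isolation. Instrumentation inserts loads and stores of
// shadow (and possibly origin) memory, so attributes such as memory(none) or
// speculatable would no longer be accurate for the instrumented body and could
// mislead later optimizations. stripMemoryAttrs is a hypothetical helper name
// used only for this sketch.
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

static void stripMemoryAttrs(llvm::Function &Fn) {
  llvm::AttributeMask Mask;
  Mask.addAttribute(llvm::Attribute::Memory)
      .addAttribute(llvm::Attribute::Speculatable);
  Fn.removeFnAttrs(Mask); // same pattern as in sanitizeFunction above
}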