MemorySanitizer.cpp
1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
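///
/// For illustration, the kind of user-level bug this catches looks roughly
/// like:
///   int x;          // shadow of x is poisoned by the instrumented alloca
///   int y = x + 1;  // the add propagates the poisoned shadow into y
///   if (y)          // branching on a poisoned value triggers __msan_warning
///     do_something();
/// Loads and stores keep the shadow in sync with the data it describes, so
/// the report fires only where the uninitialized bits are actually used.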
21///
22/// But there are differences too. The first and the major one:
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
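///
/// Very roughly, for "int f(int x)" (assuming a single argument occupying the
/// first 8-byte TLS slot) the hand-off looks like:
///   // caller, around the call to f
///   __msan_param_tls[0] = shadow_of_x;
///   int r = f(x);
///   shadow_of_r = __msan_retval_tls[0];
///   // callee f, prologue and epilogue
///   shadow_of_x = __msan_param_tls[0];
///   ...
///   __msan_retval_tls[0] = shadow_of_return_value;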
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
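///
/// For a binary operation c = a + b this boils down to roughly (in IRBuilder
/// terms; the real code goes through the shadow and origin combiners below):
///   Value *Sc = IRB.CreateOr(Sa, Sb);                              // shadow of c
///   Value *Oc = IRB.CreateSelect(IRB.CreateIsNotNull(Sb), Ob, Oa); // origin of c
/// i.e. c inherits b's origin if b is (partially) uninitialized, and a's
/// origin otherwise.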
56///
57/// Every aligned group of 4 consecutive bytes of application memory has one
58/// origin value associated with it. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needlessly overwriting the origin of the 4-byte region on
66/// a short (i.e. 1 byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of an application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, an atomic
72/// store to two disjoint locations cannot be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
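///
/// In code terms, this is roughly what materializeStores() below does for an
/// atomic store:
///   Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
///   IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment); // before the app store
///   if (SI->isAtomic())
///     SI->setOrdering(addReleaseOrdering(SI->getOrdering()));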
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code, LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics are only visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach, generating calls to
101///   __msan_instrument_asm_store(ptr, size)
102/// which defer the memory unpoisoning to the runtime library. The runtime can
103/// perform more complex address checks to figure out whether it's safe to
104/// touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
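///
/// For example, given
///   int v;
///   asm("..." : "=m"(v));
/// the conservative handling emits, right before the asm statement,
///   __msan_instrument_asm_store(&v, sizeof(v));
/// and the runtime decides whether it is safe to unpoison that range.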
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and return the pointers to shadow and origin memory.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter (see the sketch after this list).
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
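///
/// As a rough sketch of the metadata contract above, instrumenting a 4-byte
/// load from address p under KMSAN conceptually becomes
///   struct { void *shadow, *origin; } meta = __msan_metadata_ptr_for_load_4(p);
///   uint32_t shadow = *(uint32_t *)meta.shadow;
///   uint32_t origin = *(uint32_t *)meta.origin;
/// instead of computing the shadow and origin addresses arithmetically as the
/// userspace tool does.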
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
227
228// These constants must be kept in sync with the ones in msan.h.
229static const unsigned kParamTLSSize = 800;
230static const unsigned kRetvalTLSSize = 800;
231
232// Access sizes are powers of two: 1, 2, 4, 8.
233static const size_t kNumberOfAccessSizes = 4;
234
235/// Track origins of uninitialized values.
236///
237/// Adds a section to MemorySanitizer report that points to the allocation
238/// (stack or heap) the uninitialized bits came from originally.
240 "msan-track-origins",
241 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
242 cl::init(0));
243
244static cl::opt<bool> ClKeepGoing("msan-keep-going",
245 cl::desc("keep going after reporting a UMR"),
246 cl::Hidden, cl::init(false));
247
248static cl::opt<bool>
249 ClPoisonStack("msan-poison-stack",
250 cl::desc("poison uninitialized stack variables"), cl::Hidden,
251 cl::init(true));
252
254 "msan-poison-stack-with-call",
255 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
256 cl::init(false));
257
259 "msan-poison-stack-pattern",
260 cl::desc("poison uninitialized stack variables with the given pattern"),
261 cl::Hidden, cl::init(0xff));
262
263static cl::opt<bool>
264 ClPrintStackNames("msan-print-stack-names",
265 cl::desc("Print name of local stack variable"),
266 cl::Hidden, cl::init(true));
267
268static cl::opt<bool>
269 ClPoisonUndef("msan-poison-undef",
270 cl::desc("Poison fully undef temporary values. "
271 "Partially undefined constant vectors "
272 "are unaffected by this flag (see "
273 "-msan-poison-undef-vectors)."),
274 cl::Hidden, cl::init(true));
275
277 "msan-poison-undef-vectors",
278 cl::desc("Precisely poison partially undefined constant vectors. "
279 "If false (legacy behavior), the entire vector is "
280 "considered fully initialized, which may lead to false "
281 "negatives. Fully undefined constant vectors are "
282 "unaffected by this flag (see -msan-poison-undef)."),
283 cl::Hidden, cl::init(false));
284
286 "msan-precise-disjoint-or",
287 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
288 "disjointedness is ignored (i.e., 1|1 is initialized)."),
289 cl::Hidden, cl::init(false));
290
291static cl::opt<bool>
292 ClHandleICmp("msan-handle-icmp",
293 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
294 cl::Hidden, cl::init(true));
295
296static cl::opt<bool>
297 ClHandleICmpExact("msan-handle-icmp-exact",
298 cl::desc("exact handling of relational integer ICmp"),
299 cl::Hidden, cl::init(true));
300
302 "msan-handle-lifetime-intrinsics",
303 cl::desc(
304 "when possible, poison scoped variables at the beginning of the scope "
305 "(slower, but more precise)"),
306 cl::Hidden, cl::init(true));
307
308// When compiling the Linux kernel, we sometimes see false positives related to
309// MSan being unable to understand that inline assembly calls may initialize
310// local variables.
311// This flag makes the compiler conservatively unpoison every memory location
312// passed into an assembly call. Note that this may cause false positives.
313// Because it's impossible to figure out the array sizes, we can only unpoison
314// the first sizeof(type) bytes for each type* pointer.
316 "msan-handle-asm-conservative",
317 cl::desc("conservative handling of inline assembly"), cl::Hidden,
318 cl::init(true));
319
320// This flag controls whether we check the shadow of the address
321// operand of load or store. Such bugs are very rare, since a load from
322// a garbage address typically results in SEGV, but they still happen
323// (e.g. only the lower bits of the address are garbage, or the access happens
324// early at program startup where malloc-ed memory is more likely to
325// be zeroed). As of 2012-08-28 this flag adds a 20% slowdown.
327 "msan-check-access-address",
328 cl::desc("report accesses through a pointer which has poisoned shadow"),
329 cl::Hidden, cl::init(true));
330
332 "msan-eager-checks",
333 cl::desc("check arguments and return values at function call boundaries"),
334 cl::Hidden, cl::init(false));
335
337 "msan-dump-strict-instructions",
338 cl::desc("print out instructions with default strict semantics, i.e., "
339 "check that all the inputs are fully initialized, and mark "
340 "the output as fully initialized. These semantics are applied "
341 "to instructions that could not be handled explicitly nor "
342 "heuristically."),
343 cl::Hidden, cl::init(false));
344
345// Currently, all the heuristically handled instructions are specifically
346// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
347// to parallel 'msan-dump-strict-instructions', and to keep the door open to
348// handling non-intrinsic instructions heuristically.
350 "msan-dump-heuristic-instructions",
351 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
352 "Use -msan-dump-strict-instructions to print instructions that "
353 "could not be handled explicitly nor heuristically."),
354 cl::Hidden, cl::init(false));
355
357 "msan-instrumentation-with-call-threshold",
358 cl::desc(
359 "If the function being instrumented requires more than "
360 "this number of checks and origin stores, use callbacks instead of "
361 "inline checks (-1 means never use callbacks)."),
362 cl::Hidden, cl::init(3500));
363
364static cl::opt<bool>
365 ClEnableKmsan("msan-kernel",
366 cl::desc("Enable KernelMemorySanitizer instrumentation"),
367 cl::Hidden, cl::init(false));
368
369static cl::opt<bool>
370 ClDisableChecks("msan-disable-checks",
371 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
372 cl::init(false));
373
374static cl::opt<bool>
375 ClCheckConstantShadow("msan-check-constant-shadow",
376 cl::desc("Insert checks for constant shadow values"),
377 cl::Hidden, cl::init(true));
378
379// This is off by default because of a bug in gold:
380// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
381static cl::opt<bool>
382 ClWithComdat("msan-with-comdat",
383 cl::desc("Place MSan constructors in comdat sections"),
384 cl::Hidden, cl::init(false));
385
386// These options allow specifying custom memory map parameters.
387// See MemoryMapParams for details.
388static cl::opt<uint64_t> ClAndMask("msan-and-mask",
389 cl::desc("Define custom MSan AndMask"),
390 cl::Hidden, cl::init(0));
391
392static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
393 cl::desc("Define custom MSan XorMask"),
394 cl::Hidden, cl::init(0));
395
396static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
397 cl::desc("Define custom MSan ShadowBase"),
398 cl::Hidden, cl::init(0));
399
400static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
401 cl::desc("Define custom MSan OriginBase"),
402 cl::Hidden, cl::init(0));
403
404static cl::opt<int>
405 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
406 cl::desc("Define threshold for number of checks per "
407 "debug location to force origin update."),
408 cl::Hidden, cl::init(3));
409
410const char kMsanModuleCtorName[] = "msan.module_ctor";
411const char kMsanInitName[] = "__msan_init";
412
413namespace {
414
415// Memory map parameters used in application-to-shadow address calculation.
416// Offset = (Addr & ~AndMask) ^ XorMask
417// Shadow = ShadowBase + Offset
418// Origin = OriginBase + Offset
419struct MemoryMapParams {
420 uint64_t AndMask;
421 uint64_t XorMask;
422 uint64_t ShadowBase;
423 uint64_t OriginBase;
424};
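
// For illustration only (not used by the pass itself): the mapping above,
// transcribed into plain C++. The pass emits equivalent IR when it computes
// shadow and origin pointers (see getShadowOriginPtr() in the visitor below).
[[maybe_unused]] static uint64_t appToShadow(uint64_t Addr,
                                             const MemoryMapParams &P) {
  uint64_t Offset = (Addr & ~P.AndMask) ^ P.XorMask;
  return P.ShadowBase + Offset;
}
[[maybe_unused]] static uint64_t appToOrigin(uint64_t Addr,
                                             const MemoryMapParams &P) {
  uint64_t Offset = (Addr & ~P.AndMask) ^ P.XorMask;
  return P.OriginBase + Offset;
}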
425
426struct PlatformMemoryMapParams {
427 const MemoryMapParams *bits32;
428 const MemoryMapParams *bits64;
429};
430
431} // end anonymous namespace
432
433// i386 Linux
434static const MemoryMapParams Linux_I386_MemoryMapParams = {
435 0x000080000000, // AndMask
436 0, // XorMask (not used)
437 0, // ShadowBase (not used)
438 0x000040000000, // OriginBase
439};
440
441// x86_64 Linux
442static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
443 0, // AndMask (not used)
444 0x500000000000, // XorMask
445 0, // ShadowBase (not used)
446 0x100000000000, // OriginBase
447};
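
// Worked example with the x86_64 Linux parameters above, for an application
// address of 0x700000000010:
//   Offset = (0x700000000010 & ~0) ^ 0x500000000000 = 0x200000000010
//   Shadow = 0x000000000000 + Offset = 0x200000000010
//   Origin = 0x100000000000 + Offset = 0x300000000010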
448
449// mips32 Linux
450// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
451// after picking good constants
452
453// mips64 Linux
454static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
455 0, // AndMask (not used)
456 0x008000000000, // XorMask
457 0, // ShadowBase (not used)
458 0x002000000000, // OriginBase
459};
460
461// ppc32 Linux
462// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
463// after picking good constants
464
465// ppc64 Linux
466static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
467 0xE00000000000, // AndMask
468 0x100000000000, // XorMask
469 0x080000000000, // ShadowBase
470 0x1C0000000000, // OriginBase
471};
472
473// s390x Linux
474static const MemoryMapParams Linux_S390X_MemoryMapParams = {
475 0xC00000000000, // AndMask
476 0, // XorMask (not used)
477 0x080000000000, // ShadowBase
478 0x1C0000000000, // OriginBase
479};
480
481// arm32 Linux
482// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
483// after picking good constants
484
485// aarch64 Linux
486static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
487 0, // AndMask (not used)
488 0x0B00000000000, // XorMask
489 0, // ShadowBase (not used)
490 0x0200000000000, // OriginBase
491};
492
493// loongarch64 Linux
494static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
495 0, // AndMask (not used)
496 0x500000000000, // XorMask
497 0, // ShadowBase (not used)
498 0x100000000000, // OriginBase
499};
500
501// riscv32 Linux
502// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
503// after picking good constants
504
505// aarch64 FreeBSD
506static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
507 0x1800000000000, // AndMask
508 0x0400000000000, // XorMask
509 0x0200000000000, // ShadowBase
510 0x0700000000000, // OriginBase
511};
512
513// i386 FreeBSD
514static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
515 0x000180000000, // AndMask
516 0x000040000000, // XorMask
517 0x000020000000, // ShadowBase
518 0x000700000000, // OriginBase
519};
520
521// x86_64 FreeBSD
522static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
523 0xc00000000000, // AndMask
524 0x200000000000, // XorMask
525 0x100000000000, // ShadowBase
526 0x380000000000, // OriginBase
527};
528
529// x86_64 NetBSD
530static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
531 0, // AndMask
532 0x500000000000, // XorMask
533 0, // ShadowBase
534 0x100000000000, // OriginBase
535};
536
537static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
540};
541
542static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
543 nullptr,
545};
546
547static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
548 nullptr,
550};
551
552static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
553 nullptr,
555};
556
557static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
558 nullptr,
560};
561
562static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
563 nullptr,
565};
566
567static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
568 nullptr,
570};
571
572static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
575};
576
577static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
578 nullptr,
580};
581
582namespace {
583
584/// Instrument functions of a module to detect uninitialized reads.
585///
586/// Instantiating MemorySanitizer inserts the msan runtime library API function
587/// declarations into the module if they don't exist already. Instantiating
588/// ensures the __msan_init function is in the list of global constructors for
589/// the module.
590class MemorySanitizer {
591public:
592 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
593 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
594 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
595 initializeModule(M);
596 }
597
598 // MSan cannot be moved or copied because of MapParams.
599 MemorySanitizer(MemorySanitizer &&) = delete;
600 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
601 MemorySanitizer(const MemorySanitizer &) = delete;
602 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
603
604 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
605
606private:
607 friend struct MemorySanitizerVisitor;
608 friend struct VarArgHelperBase;
609 friend struct VarArgAMD64Helper;
610 friend struct VarArgAArch64Helper;
611 friend struct VarArgPowerPC64Helper;
612 friend struct VarArgPowerPC32Helper;
613 friend struct VarArgSystemZHelper;
614 friend struct VarArgI386Helper;
615 friend struct VarArgGenericHelper;
616
617 void initializeModule(Module &M);
618 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
619 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
620 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
621
622 template <typename... ArgsTy>
623 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
624 ArgsTy... Args);
625
626 /// True if we're compiling the Linux kernel.
627 bool CompileKernel;
628 /// Track origins (allocation points) of uninitialized values.
629 int TrackOrigins;
630 bool Recover;
631 bool EagerChecks;
632
633 Triple TargetTriple;
634 LLVMContext *C;
635 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
636 Type *OriginTy;
637 PointerType *PtrTy; ///< Pointer type in the default address space.
638
639 // XxxTLS variables represent the per-thread state in MSan and per-task state
640 // in KMSAN.
641 // For the userspace these point to thread-local globals. In the kernel land
642 // they point to the members of a per-task struct obtained via a call to
643 // __msan_get_context_state().
644
645 /// Thread-local shadow storage for function parameters.
646 Value *ParamTLS;
647
648 /// Thread-local origin storage for function parameters.
649 Value *ParamOriginTLS;
650
651 /// Thread-local shadow storage for function return value.
652 Value *RetvalTLS;
653
654 /// Thread-local origin storage for function return value.
655 Value *RetvalOriginTLS;
656
657 /// Thread-local shadow storage for in-register va_arg function.
658 Value *VAArgTLS;
659
660 /// Thread-local origin storage for in-register va_arg function.
661 Value *VAArgOriginTLS;
662
663 /// Thread-local storage for the size of the va_arg overflow area.
664 Value *VAArgOverflowSizeTLS;
665
666 /// Are the instrumentation callbacks set up?
667 bool CallbacksInitialized = false;
668
669 /// The run-time callback to print a warning.
670 FunctionCallee WarningFn;
671
672 // These arrays are indexed by log2(AccessSize).
673 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
674 FunctionCallee MaybeWarningVarSizeFn;
675 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
676
677 /// Run-time helper that generates a new origin value for a stack
678 /// allocation.
679 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
680 // No description version
681 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
682
683 /// Run-time helper that poisons stack on function entry.
684 FunctionCallee MsanPoisonStackFn;
685
686 /// Run-time helper that records a store (or any event) of an
687 /// uninitialized value and returns an updated origin id encoding this info.
688 FunctionCallee MsanChainOriginFn;
689
690 /// Run-time helper that paints an origin over a region.
691 FunctionCallee MsanSetOriginFn;
692
693 /// MSan runtime replacements for memmove, memcpy and memset.
694 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
695
696 /// KMSAN callback for task-local function argument shadow.
697 StructType *MsanContextStateTy;
698 FunctionCallee MsanGetContextStateFn;
699
700 /// Functions for poisoning/unpoisoning local variables
701 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
702
703 /// Pair of shadow/origin pointers.
704 Type *MsanMetadata;
705
706 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
707 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
708 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
709 FunctionCallee MsanMetadataPtrForStore_1_8[4];
710 FunctionCallee MsanInstrumentAsmStoreFn;
711
712 /// Storage for return values of the MsanMetadataPtrXxx functions.
713 Value *MsanMetadataAlloca;
714
715 /// Helper to choose between different MsanMetadataPtrXxx().
716 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
717
718 /// Memory map parameters used in application-to-shadow calculation.
719 const MemoryMapParams *MapParams;
720
721 /// Custom memory map parameters used when -msan-shadow-base or
722 /// -msan-origin-base is provided.
723 MemoryMapParams CustomMapParams;
724
725 MDNode *ColdCallWeights;
726
727 /// Branch weights for origin store.
728 MDNode *OriginStoreWeights;
729};
730
731void insertModuleCtor(Module &M) {
734 /*InitArgTypes=*/{},
735 /*InitArgs=*/{},
736 // This callback is invoked when the functions are created the first
737 // time. Hook them into the global ctors list in that case:
738 [&](Function *Ctor, FunctionCallee) {
739 if (!ClWithComdat) {
740 appendToGlobalCtors(M, Ctor, 0);
741 return;
742 }
743 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
744 Ctor->setComdat(MsanCtorComdat);
745 appendToGlobalCtors(M, Ctor, 0, Ctor);
746 });
747}
748
749template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
750 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
751}
752
753} // end anonymous namespace
754
756 bool EagerChecks)
757 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
758 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
759 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
760 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
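
// Note on the precedence above: an explicitly passed ClXxx flag always wins
// over the value the pass was constructed with, and kernel mode implies its
// own defaults. For example, running with -msan-kernel but without
// -msan-track-origins or -msan-keep-going yields Kernel=true, TrackOrigins=2
// and Recover=true regardless of the constructor arguments.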
761
764 // Return early if nosanitize_memory module flag is present for the module.
765 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
766 return PreservedAnalyses::all();
767 bool Modified = false;
768 if (!Options.Kernel) {
769 insertModuleCtor(M);
770 Modified = true;
771 }
772
773 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
774 for (Function &F : M) {
775 if (F.empty())
776 continue;
777 MemorySanitizer Msan(*F.getParent(), Options);
778 Modified |=
779 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
780 }
781
782 if (!Modified)
783 return PreservedAnalyses::all();
784
786 // GlobalsAA is considered stateless and does not get invalidated unless
787 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
788 // make changes that require GlobalsAA to be invalidated.
789 PA.abandon<GlobalsAA>();
790 return PA;
791}
792
794 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
796 OS, MapClassName2PassName);
797 OS << '<';
798 if (Options.Recover)
799 OS << "recover;";
800 if (Options.Kernel)
801 OS << "kernel;";
802 if (Options.EagerChecks)
803 OS << "eager-checks;";
804 OS << "track-origins=" << Options.TrackOrigins;
805 OS << '>';
806}
807
808/// Create a non-const global initialized with the given string.
809///
810/// Creates a writable global for Str so that we can pass it to the
811/// run-time lib. Runtime uses first 4 bytes of the string to store the
812/// frame ID, so the string needs to be mutable.
814 StringRef Str) {
815 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
816 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
817 GlobalValue::PrivateLinkage, StrConst, "");
818}
819
820template <typename... ArgsTy>
822MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
823 ArgsTy... Args) {
824 if (TargetTriple.getArch() == Triple::systemz) {
825 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
826 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
827 std::forward<ArgsTy>(Args)...);
828 }
829
830 return M.getOrInsertFunction(Name, MsanMetadata,
831 std::forward<ArgsTy>(Args)...);
832}
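
// Note on the two shapes produced above (illustrative prototypes, not
// declarations the pass depends on): on most targets the callbacks return the
// shadow/origin pair directly, e.g.
//   MsanMetadata __msan_metadata_ptr_for_load_4(void *addr);
// while on SystemZ they return void and write the pair through an extra
// leading pointer parameter:
//   void __msan_metadata_ptr_for_load_4(MsanMetadata *ret, void *addr);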
833
834/// Create KMSAN API callbacks.
835void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
836 IRBuilder<> IRB(*C);
837
838 // These will be initialized in insertKmsanPrologue().
839 RetvalTLS = nullptr;
840 RetvalOriginTLS = nullptr;
841 ParamTLS = nullptr;
842 ParamOriginTLS = nullptr;
843 VAArgTLS = nullptr;
844 VAArgOriginTLS = nullptr;
845 VAArgOverflowSizeTLS = nullptr;
846
847 WarningFn = M.getOrInsertFunction("__msan_warning",
848 TLI.getAttrList(C, {0}, /*Signed=*/false),
849 IRB.getVoidTy(), IRB.getInt32Ty());
850
851 // Requests the per-task context state (kmsan_context_state*) from the
852 // runtime library.
853 MsanContextStateTy = StructType::get(
854 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
855 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
856 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
858 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
859 OriginTy);
860 MsanGetContextStateFn =
861 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
862
863 MsanMetadata = StructType::get(PtrTy, PtrTy);
864
865 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
866 std::string name_load =
867 "__msan_metadata_ptr_for_load_" + std::to_string(size);
868 std::string name_store =
869 "__msan_metadata_ptr_for_store_" + std::to_string(size);
870 MsanMetadataPtrForLoad_1_8[ind] =
871 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
872 MsanMetadataPtrForStore_1_8[ind] =
873 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
874 }
875
876 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
877 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
878 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
879 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
880
881 // Functions for poisoning and unpoisoning memory.
882 MsanPoisonAllocaFn = M.getOrInsertFunction(
883 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
884 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
885 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
886}
887
889 return M.getOrInsertGlobal(Name, Ty, [&] {
890 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
891 nullptr, Name, nullptr,
893 });
894}
895
896/// Insert declarations for userspace-specific functions and globals.
897void MemorySanitizer::createUserspaceApi(Module &M,
898 const TargetLibraryInfo &TLI) {
899 IRBuilder<> IRB(*C);
900
901 // Create the callback.
902 // FIXME: this function should have "Cold" calling conv,
903 // which is not yet implemented.
904 if (TrackOrigins) {
905 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
906 : "__msan_warning_with_origin_noreturn";
907 WarningFn = M.getOrInsertFunction(WarningFnName,
908 TLI.getAttrList(C, {0}, /*Signed=*/false),
909 IRB.getVoidTy(), IRB.getInt32Ty());
910 } else {
911 StringRef WarningFnName =
912 Recover ? "__msan_warning" : "__msan_warning_noreturn";
913 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
914 }
915
916 // Create the global TLS variables.
917 RetvalTLS =
918 getOrInsertGlobal(M, "__msan_retval_tls",
919 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
920
921 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
922
923 ParamTLS =
924 getOrInsertGlobal(M, "__msan_param_tls",
925 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
926
927 ParamOriginTLS =
928 getOrInsertGlobal(M, "__msan_param_origin_tls",
929 ArrayType::get(OriginTy, kParamTLSSize / 4));
930
931 VAArgTLS =
932 getOrInsertGlobal(M, "__msan_va_arg_tls",
933 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
934
935 VAArgOriginTLS =
936 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
937 ArrayType::get(OriginTy, kParamTLSSize / 4));
938
939 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
940 IRB.getIntPtrTy(M.getDataLayout()));
941
942 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
943 AccessSizeIndex++) {
944 unsigned AccessSize = 1 << AccessSizeIndex;
945 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
946 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
947 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
948 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
949 MaybeWarningVarSizeFn = M.getOrInsertFunction(
950 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
951 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
952 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
953 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
954 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
955 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
956 IRB.getInt32Ty());
957 }
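
  // Note: the loop above creates the fixed-size callbacks
  // __msan_maybe_warning_{1,2,4,8} and __msan_maybe_store_origin_{1,2,4,8}
  // (AccessSize = 1 << AccessSizeIndex), alongside the variable-size fallback
  // __msan_maybe_warning_N that materializeOneCheck() falls back to when a
  // shadow value is too large for the indexed variants.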
958
959 MsanSetAllocaOriginWithDescriptionFn =
960 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
961 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
962 MsanSetAllocaOriginNoDescriptionFn =
963 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
964 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
965 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
966 IRB.getVoidTy(), PtrTy, IntptrTy);
967}
968
969/// Insert extern declarations of runtime-provided functions and globals.
970void MemorySanitizer::initializeCallbacks(Module &M,
971 const TargetLibraryInfo &TLI) {
972 // Only do this once.
973 if (CallbacksInitialized)
974 return;
975
976 IRBuilder<> IRB(*C);
977 // Initialize callbacks that are common for kernel and userspace
978 // instrumentation.
979 MsanChainOriginFn = M.getOrInsertFunction(
980 "__msan_chain_origin",
981 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
982 IRB.getInt32Ty());
983 MsanSetOriginFn = M.getOrInsertFunction(
984 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
985 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
986 MemmoveFn =
987 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
988 MemcpyFn =
989 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
990 MemsetFn = M.getOrInsertFunction("__msan_memset",
991 TLI.getAttrList(C, {1}, /*Signed=*/true),
992 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
993
994 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
995 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
996
997 if (CompileKernel) {
998 createKernelApi(M, TLI);
999 } else {
1000 createUserspaceApi(M, TLI);
1001 }
1002 CallbacksInitialized = true;
1003}
1004
1005FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1006 int size) {
1007 FunctionCallee *Fns =
1008 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1009 switch (size) {
1010 case 1:
1011 return Fns[0];
1012 case 2:
1013 return Fns[1];
1014 case 4:
1015 return Fns[2];
1016 case 8:
1017 return Fns[3];
1018 default:
1019 return nullptr;
1020 }
1021}
1022
1023/// Module-level initialization.
1024///
1025/// Inserts a call to __msan_init into the module's constructor list.
1026void MemorySanitizer::initializeModule(Module &M) {
1027 auto &DL = M.getDataLayout();
1028
1029 TargetTriple = M.getTargetTriple();
1030
1031 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1032 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1033 // Check the overrides first
1034 if (ShadowPassed || OriginPassed) {
1035 CustomMapParams.AndMask = ClAndMask;
1036 CustomMapParams.XorMask = ClXorMask;
1037 CustomMapParams.ShadowBase = ClShadowBase;
1038 CustomMapParams.OriginBase = ClOriginBase;
1039 MapParams = &CustomMapParams;
1040 } else {
1041 switch (TargetTriple.getOS()) {
1042 case Triple::FreeBSD:
1043 switch (TargetTriple.getArch()) {
1044 case Triple::aarch64:
1045 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1046 break;
1047 case Triple::x86_64:
1048 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1049 break;
1050 case Triple::x86:
1051 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1052 break;
1053 default:
1054 report_fatal_error("unsupported architecture");
1055 }
1056 break;
1057 case Triple::NetBSD:
1058 switch (TargetTriple.getArch()) {
1059 case Triple::x86_64:
1060 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1061 break;
1062 default:
1063 report_fatal_error("unsupported architecture");
1064 }
1065 break;
1066 case Triple::Linux:
1067 switch (TargetTriple.getArch()) {
1068 case Triple::x86_64:
1069 MapParams = Linux_X86_MemoryMapParams.bits64;
1070 break;
1071 case Triple::x86:
1072 MapParams = Linux_X86_MemoryMapParams.bits32;
1073 break;
1074 case Triple::mips64:
1075 case Triple::mips64el:
1076 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1077 break;
1078 case Triple::ppc64:
1079 case Triple::ppc64le:
1080 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1081 break;
1082 case Triple::systemz:
1083 MapParams = Linux_S390_MemoryMapParams.bits64;
1084 break;
1085 case Triple::aarch64:
1086 case Triple::aarch64_be:
1087 MapParams = Linux_ARM_MemoryMapParams.bits64;
1088 break;
1090 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1091 break;
1092 default:
1093 report_fatal_error("unsupported architecture");
1094 }
1095 break;
1096 default:
1097 report_fatal_error("unsupported operating system");
1098 }
1099 }
1100
1101 C = &(M.getContext());
1102 IRBuilder<> IRB(*C);
1103 IntptrTy = IRB.getIntPtrTy(DL);
1104 OriginTy = IRB.getInt32Ty();
1105 PtrTy = IRB.getPtrTy();
1106
1107 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1108 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1109
1110 if (!CompileKernel) {
1111 if (TrackOrigins)
1112 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1113 return new GlobalVariable(
1114 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1115 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1116 });
1117
1118 if (Recover)
1119 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1120 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1121 GlobalValue::WeakODRLinkage,
1122 IRB.getInt32(Recover), "__msan_keep_going");
1123 });
1124 }
1125}
1126
1127namespace {
1128
1129/// A helper class that handles instrumentation of VarArg
1130/// functions on a particular platform.
1131///
1132/// Implementations are expected to insert the instrumentation
1133/// necessary to propagate argument shadow through VarArg function
1134/// calls. Visit* methods are called during an InstVisitor pass over
1135/// the function, and should avoid creating new basic blocks. A new
1136/// instance of this class is created for each instrumented function.
1137struct VarArgHelper {
1138 virtual ~VarArgHelper() = default;
1139
1140 /// Visit a CallBase.
1141 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1142
1143 /// Visit a va_start call.
1144 virtual void visitVAStartInst(VAStartInst &I) = 0;
1145
1146 /// Visit a va_copy call.
1147 virtual void visitVACopyInst(VACopyInst &I) = 0;
1148
1149 /// Finalize function instrumentation.
1150 ///
1151 /// This method is called after visiting all interesting (see above)
1152 /// instructions in a function.
1153 virtual void finalizeInstrumentation() = 0;
1154};
1155
1156struct MemorySanitizerVisitor;
1157
1158} // end anonymous namespace
1159
1160static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1161 MemorySanitizerVisitor &Visitor);
1162
1163static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1164 if (TS.isScalable())
1165 // Scalable types unconditionally take slowpaths.
1166 return kNumberOfAccessSizes;
1167 unsigned TypeSizeFixed = TS.getFixedValue();
1168 if (TypeSizeFixed <= 8)
1169 return 0;
1170 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1171}
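
// Worked example for the mapping above: fixed sizes of 1..8 bytes map to
// index 0, 9..16 bytes to 1, 17..32 bytes to 2 and 33..64 bytes to 3;
// anything larger, as well as any scalable size, yields kNumberOfAccessSizes
// and therefore takes the generic (non-indexed) handling.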
1172
1173namespace {
1174
1175/// Helper class to attach the debug information of the given instruction to
1176/// new instructions inserted after it.
1177class NextNodeIRBuilder : public IRBuilder<> {
1178public:
1179 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1180 SetCurrentDebugLocation(IP->getDebugLoc());
1181 }
1182};
1183
1184/// This class does all the work for a given function. Store and Load
1185/// instructions store and load corresponding shadow and origin
1186/// values. Most instructions propagate shadow from arguments to their
1187/// return values. Certain instructions (most importantly, BranchInst)
1188/// test their argument shadow and print reports (with a runtime call) if it's
1189/// non-zero.
1190struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1191 Function &F;
1192 MemorySanitizer &MS;
1193 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1194 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1195 std::unique_ptr<VarArgHelper> VAHelper;
1196 const TargetLibraryInfo *TLI;
1197 Instruction *FnPrologueEnd;
1198 SmallVector<Instruction *, 16> Instructions;
1199
1200 // The following flags disable parts of MSan instrumentation based on
1201 // exclusion list contents and command-line options.
1202 bool InsertChecks;
1203 bool PropagateShadow;
1204 bool PoisonStack;
1205 bool PoisonUndef;
1206 bool PoisonUndefVectors;
1207
1208 struct ShadowOriginAndInsertPoint {
1209 Value *Shadow;
1210 Value *Origin;
1211 Instruction *OrigIns;
1212
1213 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1214 : Shadow(S), Origin(O), OrigIns(I) {}
1215 };
1217 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1218 SmallSetVector<AllocaInst *, 16> AllocaSet;
1221 int64_t SplittableBlocksCount = 0;
1222
1223 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1224 const TargetLibraryInfo &TLI)
1225 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1226 bool SanitizeFunction =
1227 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1228 InsertChecks = SanitizeFunction;
1229 PropagateShadow = SanitizeFunction;
1230 PoisonStack = SanitizeFunction && ClPoisonStack;
1231 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1232 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1233
1234 // In the presence of unreachable blocks, we may see Phi nodes with
1235 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1236 // blocks, such nodes will not have any shadow value associated with them.
1237 // It's easier to remove unreachable blocks than deal with missing shadow.
1239
1240 MS.initializeCallbacks(*F.getParent(), TLI);
1241 FnPrologueEnd =
1242 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1243 .CreateIntrinsic(Intrinsic::donothing, {});
1244
1245 if (MS.CompileKernel) {
1246 IRBuilder<> IRB(FnPrologueEnd);
1247 insertKmsanPrologue(IRB);
1248 }
1249
1250 LLVM_DEBUG(if (!InsertChecks) dbgs()
1251 << "MemorySanitizer is not inserting checks into '"
1252 << F.getName() << "'\n");
1253 }
1254
1255 bool instrumentWithCalls(Value *V) {
1256 // Constants will likely be eliminated by follow-up passes.
1257 if (isa<Constant>(V))
1258 return false;
1259 ++SplittableBlocksCount;
1261 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1262 }
1263
1264 bool isInPrologue(Instruction &I) {
1265 return I.getParent() == FnPrologueEnd->getParent() &&
1266 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1267 }
1268
1269 // Creates a new origin and records the stack trace. In general we can call
1270 // this function for any origin manipulation we like. However, it costs
1271 // runtime resources, so use it wisely, only where it can provide additional
1272 // information helpful to a user.
1273 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1274 if (MS.TrackOrigins <= 1)
1275 return V;
1276 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1277 }
1278
1279 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1280 const DataLayout &DL = F.getDataLayout();
1281 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1282 if (IntptrSize == kOriginSize)
1283 return Origin;
1284 assert(IntptrSize == kOriginSize * 2);
1285 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1286 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1287 }
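
  // Worked example for originToIntptr(): with 8-byte pointers the 4-byte
  // origin is replicated into both halves, e.g. 0x11223344 becomes
  // 0x1122334411223344, so paintOrigin() below can fill two adjacent 4-byte
  // origin slots with a single aligned 8-byte store.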
1288
1289 /// Fill memory range with the given origin value.
1290 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1291 TypeSize TS, Align Alignment) {
1292 const DataLayout &DL = F.getDataLayout();
1293 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1294 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1295 assert(IntptrAlignment >= kMinOriginAlignment);
1296 assert(IntptrSize >= kOriginSize);
1297
1298 // Note: The loop-based form works for fixed-length vectors too; however,
1299 // we prefer to unroll and specialize the alignment below.
1300 if (TS.isScalable()) {
1301 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1302 Value *RoundUp =
1303 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1304 Value *End =
1305 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1306 auto [InsertPt, Index] =
1308 IRB.SetInsertPoint(InsertPt);
1309
1310 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1312 return;
1313 }
1314
1315 unsigned Size = TS.getFixedValue();
1316
1317 unsigned Ofs = 0;
1318 Align CurrentAlignment = Alignment;
1319 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1320 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1321 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1322 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1323 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1324 : IntptrOriginPtr;
1325 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1326 Ofs += IntptrSize / kOriginSize;
1327 CurrentAlignment = IntptrAlignment;
1328 }
1329 }
1330
1331 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1332 Value *GEP =
1333 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1334 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1335 CurrentAlignment = kMinOriginAlignment;
1336 }
1337 }
1338
1339 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1340 Value *OriginPtr, Align Alignment) {
1341 const DataLayout &DL = F.getDataLayout();
1342 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1343 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1344 // ZExt cannot convert between vector and scalar
1345 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1346 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1347 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1348 // Origin is not needed: value is initialized or const shadow is
1349 // ignored.
1350 return;
1351 }
1352 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1353 // Copy origin as the value is definitely uninitialized.
1354 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1355 OriginAlignment);
1356 return;
1357 }
1358 // Fallback to runtime check, which still can be optimized out later.
1359 }
1360
1361 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1362 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1363 if (instrumentWithCalls(ConvertedShadow) &&
1364 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1365 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1366 Value *ConvertedShadow2 =
1367 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1368 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1369 CB->addParamAttr(0, Attribute::ZExt);
1370 CB->addParamAttr(2, Attribute::ZExt);
1371 } else {
1372 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1374 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1375 IRBuilder<> IRBNew(CheckTerm);
1376 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1377 OriginAlignment);
1378 }
1379 }
1380
1381 void materializeStores() {
1382 for (StoreInst *SI : StoreList) {
1383 IRBuilder<> IRB(SI);
1384 Value *Val = SI->getValueOperand();
1385 Value *Addr = SI->getPointerOperand();
1386 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1387 Value *ShadowPtr, *OriginPtr;
1388 Type *ShadowTy = Shadow->getType();
1389 const Align Alignment = SI->getAlign();
1390 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1391 std::tie(ShadowPtr, OriginPtr) =
1392 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1393
1394 [[maybe_unused]] StoreInst *NewSI =
1395 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1396 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1397
1398 if (SI->isAtomic())
1399 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1400
1401 if (MS.TrackOrigins && !SI->isAtomic())
1402 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1403 OriginAlignment);
1404 }
1405 }
1406
1407 // Returns true if Debug Location corresponds to multiple warnings.
1408 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1409 if (MS.TrackOrigins < 2)
1410 return false;
1411
1412 if (LazyWarningDebugLocationCount.empty())
1413 for (const auto &I : InstrumentationList)
1414 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1415
1416 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1417 }
1418
1419 /// Helper function to insert a warning at IRB's current insert point.
1420 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1421 if (!Origin)
1422 Origin = (Value *)IRB.getInt32(0);
1423 assert(Origin->getType()->isIntegerTy());
1424
1425 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1426 // Try to create additional origin with debug info of the last origin
1427 // instruction. It may provide additional information to the user.
1428 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1429 assert(MS.TrackOrigins);
1430 auto NewDebugLoc = OI->getDebugLoc();
1431 // Origin update with missing or the same debug location provides no
1432 // additional value.
1433 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1434 // Insert update just before the check, so we call runtime only just
1435 // before the report.
1436 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1437 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1438 Origin = updateOrigin(Origin, IRBOrigin);
1439 }
1440 }
1441 }
1442
1443 if (MS.CompileKernel || MS.TrackOrigins)
1444 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1445 else
1446 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1447 // FIXME: Insert UnreachableInst if !MS.Recover?
1448 // This may invalidate some of the following checks and needs to be done
1449 // at the very end.
1450 }
1451
1452 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1453 Value *Origin) {
1454 const DataLayout &DL = F.getDataLayout();
1455 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1456 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1457 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1458 // ZExt cannot convert between vector and scalar
1459 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1460 Value *ConvertedShadow2 =
1461 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1462
1463 if (SizeIndex < kNumberOfAccessSizes) {
1464 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1465 CallBase *CB = IRB.CreateCall(
1466 Fn,
1467 {ConvertedShadow2,
1468 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1469 CB->addParamAttr(0, Attribute::ZExt);
1470 CB->addParamAttr(1, Attribute::ZExt);
1471 } else {
1472 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1473 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1474 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1475 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1476 CallBase *CB = IRB.CreateCall(
1477 Fn,
1478 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1479 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1480 CB->addParamAttr(1, Attribute::ZExt);
1481 CB->addParamAttr(2, Attribute::ZExt);
1482 }
1483 } else {
1484 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1486 Cmp, &*IRB.GetInsertPoint(),
1487 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1488
1489 IRB.SetInsertPoint(CheckTerm);
1490 insertWarningFn(IRB, Origin);
1491 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1492 }
1493 }
1494
1495 void materializeInstructionChecks(
1496 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1497 const DataLayout &DL = F.getDataLayout();
1498 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1499 // correct origin.
1500 bool Combine = !MS.TrackOrigins;
1501 Instruction *Instruction = InstructionChecks.front().OrigIns;
1502 Value *Shadow = nullptr;
1503 for (const auto &ShadowData : InstructionChecks) {
1504 assert(ShadowData.OrigIns == Instruction);
1505 IRBuilder<> IRB(Instruction);
1506
1507 Value *ConvertedShadow = ShadowData.Shadow;
1508
1509 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1510 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1511 // Skip, value is initialized or const shadow is ignored.
1512 continue;
1513 }
1514 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1515 // Report as the value is definitely uninitialized.
1516 insertWarningFn(IRB, ShadowData.Origin);
1517 if (!MS.Recover)
1518 return; // Always fail and stop here, no need to check the rest.
1519 // Skip the entire instruction.
1520 continue;
1521 }
1522 // Fallback to runtime check, which still can be optimized out later.
1523 }
1524
1525 if (!Combine) {
1526 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1527 continue;
1528 }
1529
1530 if (!Shadow) {
1531 Shadow = ConvertedShadow;
1532 continue;
1533 }
1534
1535 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1536 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1537 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1538 }
1539
1540 if (Shadow) {
1541 assert(Combine);
1542 IRBuilder<> IRB(Instruction);
1543 materializeOneCheck(IRB, Shadow, nullptr);
1544 }
1545 }
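
  // Note on the combining above: with origin tracking disabled, the shadows
  // of all values checked for one instruction are converted to i1 and OR-ed
  // together, so a single conditional call to the warning callback covers the
  // whole instruction; with origin tracking enabled, every shadow keeps its
  // own check so that the matching origin can be reported.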
1546
1547 void materializeChecks() {
1548#ifndef NDEBUG
1549 // For assert below.
1550 SmallPtrSet<Instruction *, 16> Done;
1551#endif
1552
1553 for (auto I = InstrumentationList.begin();
1554 I != InstrumentationList.end();) {
1555 auto OrigIns = I->OrigIns;
1556 // Checks are grouped by the original instruction: all checks recorded by
1557 // `insertCheckShadow` for an instruction are materialized at once.
1558 assert(Done.insert(OrigIns).second);
1559 auto J = std::find_if(I + 1, InstrumentationList.end(),
1560 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1561 return OrigIns != R.OrigIns;
1562 });
1563 // Process all checks of instruction at once.
1564 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1565 I = J;
1566 }
1567
1568 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1569 }
1570
1571 // Insert the KMSAN prologue: fetch the per-task context state and cache
1571 // pointers to its shadow and origin TLS fields.
1572 void insertKmsanPrologue(IRBuilder<> &IRB) {
1573 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1574 Constant *Zero = IRB.getInt32(0);
1575 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1576 {Zero, IRB.getInt32(0)}, "param_shadow");
1577 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1578 {Zero, IRB.getInt32(1)}, "retval_shadow");
1579 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1580 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1581 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1582 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1583 MS.VAArgOverflowSizeTLS =
1584 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1585 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1586 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1587 {Zero, IRB.getInt32(5)}, "param_origin");
1588 MS.RetvalOriginTLS =
1589 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1590 {Zero, IRB.getInt32(6)}, "retval_origin");
1591 if (MS.TargetTriple.getArch() == Triple::systemz)
1592 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1593 }
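// Illustrative sketch of the layout assumed above: __msan_get_context_state()
// returns a pointer to a per-task structure whose fields correspond to the GEP
// indices used in this prologue. Conceptually (field names and types follow the
// KMSAN runtime's kmsan_context_state and are only an approximation here, not
// the authoritative definition):
//
//   struct kmsan_context_state {
//     char param_tls[/*...*/];          // index 0: param_shadow
//     char retval_tls[/*...*/];         // index 1: retval_shadow
//     char va_arg_tls[/*...*/];         // index 2: va_arg_shadow
//     char va_arg_origin_tls[/*...*/];  // index 3: va_arg_origin
//     u64  va_arg_overflow_size_tls;    // index 4: va_arg_overflow_size
//     char param_origin_tls[/*...*/];   // index 5: param_origin
//     u32  retval_origin_tls;           // index 6: retval_origin
//   };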
1594
1595 /// Add MemorySanitizer instrumentation to a function.
1596 bool runOnFunction() {
1597 // Iterate all BBs in depth-first order and create shadow instructions
1598 // for all instructions (where applicable).
1599 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1600 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1601 visit(*BB);
1602
1603 // `visit` above only collects instructions. Process them after iterating the
1604 // CFG to avoid placing requirements on CFG transformations.
1605 for (Instruction *I : Instructions)
1606 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1607
1608 // Finalize PHI nodes.
1609 for (PHINode *PN : ShadowPHINodes) {
1610 PHINode *PNS = cast<PHINode>(getShadow(PN));
1611 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1612 size_t NumValues = PN->getNumIncomingValues();
1613 for (size_t v = 0; v < NumValues; v++) {
1614 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1615 if (PNO)
1616 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1617 }
1618 }
1619
1620 VAHelper->finalizeInstrumentation();
1621
1622 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1623 // instrumenting only allocas.
1624 if (InstrumentLifetimeStart) {
1625 for (auto Item : LifetimeStartList) {
1626 instrumentAlloca(*Item.second, Item.first);
1627 AllocaSet.remove(Item.second);
1628 }
1629 }
1630 // Poison the allocas for which we didn't instrument the corresponding
1631 // lifetime intrinsics.
1632 for (AllocaInst *AI : AllocaSet)
1633 instrumentAlloca(*AI);
1634
1635 // Insert shadow value checks.
1636 materializeChecks();
1637
1638 // Delayed instrumentation of StoreInst.
1639 // This may not add new address checks.
1640 materializeStores();
1641
1642 return true;
1643 }
1644
1645 /// Compute the shadow type that corresponds to a given Value.
1646 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1647
1648 /// Compute the shadow type that corresponds to a given Type.
1649 Type *getShadowTy(Type *OrigTy) {
1650 if (!OrigTy->isSized()) {
1651 return nullptr;
1652 }
1653 // For integer type, shadow is the same as the original type.
1654 // This may return weird-sized types like i1.
1655 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1656 return IT;
1657 const DataLayout &DL = F.getDataLayout();
1658 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1659 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1660 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1661 VT->getElementCount());
1662 }
1663 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1664 return ArrayType::get(getShadowTy(AT->getElementType()),
1665 AT->getNumElements());
1666 }
1667 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1668 SmallVector<Type *, 4> Elements;
1669 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1670 Elements.push_back(getShadowTy(ST->getElementType(i)));
1671 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1672 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1673 return Res;
1674 }
1675 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1676 return IntegerType::get(*MS.C, TypeSize);
1677 }
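// A few examples of the mapping implemented above (the struct case assumes a
// 64-bit target; sizes shown only for illustration):
//   i32             ==> i32
//   float           ==> i32                (final getTypeSizeInBits case)
//   <4 x float>     ==> <4 x i32>
//   [8 x i16]       ==> [8 x i16]
//   { ptr, double } ==> { i64, i64 }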
1678
1679 /// Extract combined shadow of struct elements as a bool
1680 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1681 IRBuilder<> &IRB) {
1682 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1683 Value *Aggregator = FalseVal;
1684
1685 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1686 // Combine by ORing together each element's bool shadow
1687 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1688 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1689
1690 if (Aggregator != FalseVal)
1691 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1692 else
1693 Aggregator = ShadowBool;
1694 }
1695
1696 return Aggregator;
1697 }
1698
1699 // Extract combined shadow of array elements
1700 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1701 IRBuilder<> &IRB) {
1702 if (!Array->getNumElements())
1703 return IRB.getIntN(/* width */ 1, /* value */ 0);
1704
1705 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1706 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1707
1708 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1709 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1710 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1711 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1712 }
1713 return Aggregator;
1714 }
1715
1716 /// Convert a shadow value to its flattened variant. The resulting
1717 /// shadow may not necessarily have the same bit width as the input
1718 /// value, but it will always be comparable to zero.
1719 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1720 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1721 return collapseStructShadow(Struct, V, IRB);
1722 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1723 return collapseArrayShadow(Array, V, IRB);
1724 if (isa<VectorType>(V->getType())) {
1725 if (isa<ScalableVectorType>(V->getType()))
1726 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1727 unsigned BitWidth =
1728 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1729 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1730 }
1731 return V;
1732 }
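// Worked example of the flattening above for a shadow S of type
// { i32, [2 x i8] } (names are illustrative):
//   B0     = (S.0 != 0)                    // struct element via convertToBool
//   B1     = ((S.1[0] | S.1[1]) != 0)      // array element via collapseArrayShadow
//   Result = B0 | B1
// A fixed vector shadow such as <4 x i8> is simply bitcast to i32; a scalable
// vector is OR-reduced first. Only comparability to zero is preserved, not the
// original bit layout.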
1733
1734 // Convert a scalar value to an i1 by comparing with 0
1735 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1736 Type *VTy = V->getType();
1737 if (!VTy->isIntegerTy())
1738 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1739 if (VTy->getIntegerBitWidth() == 1)
1740 // Just converting a bool to a bool, so do nothing.
1741 return V;
1742 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1743 }
1744
1745 Type *ptrToIntPtrType(Type *PtrTy) const {
1746 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1747 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1748 VectTy->getElementCount());
1749 }
1750 assert(PtrTy->isIntOrPtrTy());
1751 return MS.IntptrTy;
1752 }
1753
1754 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1755 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1756 return VectorType::get(
1757 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1758 VectTy->getElementCount());
1759 }
1760 assert(IntPtrTy == MS.IntptrTy);
1761 return MS.PtrTy;
1762 }
1763
1764 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1765 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1766 return ConstantVector::getSplat(
1767 VectTy->getElementCount(),
1768 constToIntPtr(VectTy->getElementType(), C));
1769 }
1770 assert(IntPtrTy == MS.IntptrTy);
1771 return ConstantInt::get(MS.IntptrTy, C);
1772 }
1773
1774 /// Returns the integer shadow offset that corresponds to a given
1775 /// application address, whereby:
1776 ///
1777 /// Offset = (Addr & ~AndMask) ^ XorMask
1778 /// Shadow = ShadowBase + Offset
1779 /// Origin = (OriginBase + Offset) & ~Alignment
1780 ///
1781 /// Note: for efficiency, many shadow mappings only require the XorMask
1782 /// and OriginBase; the AndMask and ShadowBase are often zero.
1783 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1784 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1785 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1786
1787 if (uint64_t AndMask = MS.MapParams->AndMask)
1788 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1789
1790 if (uint64_t XorMask = MS.MapParams->XorMask)
1791 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1792 return OffsetLong;
1793 }
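// Worked example, assuming the Linux x86_64 mapping (AndMask = 0,
// XorMask = 0x500000000000, ShadowBase = 0, OriginBase = 0x100000000000 --
// illustrative values; the authoritative constants live in the MemoryMapParams
// tables earlier in this file):
//   Addr   = 0x700000001234
//   Offset = Addr ^ 0x500000000000             = 0x200000001234
//   Shadow = Offset                              (ShadowBase == 0)
//   Origin = (0x100000000000 + Offset) & ~3ULL = 0x300000001234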
1794
1795 /// Compute the shadow and origin addresses corresponding to a given
1796 /// application address.
1797 ///
1798 /// Shadow = ShadowBase + Offset
1799 /// Origin = (OriginBase + Offset) & ~3ULL
1800 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1801 /// a single pointee.
1802 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1803 std::pair<Value *, Value *>
1804 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1805 MaybeAlign Alignment) {
1806 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1807 if (!VectTy) {
1808 assert(Addr->getType()->isPointerTy());
1809 } else {
1810 assert(VectTy->getElementType()->isPointerTy());
1811 }
1812 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1813 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1814 Value *ShadowLong = ShadowOffset;
1815 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1816 ShadowLong =
1817 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1818 }
1819 Value *ShadowPtr = IRB.CreateIntToPtr(
1820 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1821
1822 Value *OriginPtr = nullptr;
1823 if (MS.TrackOrigins) {
1824 Value *OriginLong = ShadowOffset;
1825 uint64_t OriginBase = MS.MapParams->OriginBase;
1826 if (OriginBase != 0)
1827 OriginLong =
1828 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1829 if (!Alignment || *Alignment < kMinOriginAlignment) {
1830 uint64_t Mask = kMinOriginAlignment.value() - 1;
1831 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1832 }
1833 OriginPtr = IRB.CreateIntToPtr(
1834 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1835 }
1836 return std::make_pair(ShadowPtr, OriginPtr);
1837 }
1838
1839 template <typename... ArgsTy>
1840 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1841 ArgsTy... Args) {
1842 if (MS.TargetTriple.getArch() == Triple::systemz) {
1843 IRB.CreateCall(Callee,
1844 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1845 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1846 }
1847
1848 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1849 }
1850
1851 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1852 IRBuilder<> &IRB,
1853 Type *ShadowTy,
1854 bool isStore) {
1855 Value *ShadowOriginPtrs;
1856 const DataLayout &DL = F.getDataLayout();
1857 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1858
1859 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1860 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1861 if (Getter) {
1862 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1863 } else {
1864 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1865 ShadowOriginPtrs = createMetadataCall(
1866 IRB,
1867 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1868 AddrCast, SizeVal);
1869 }
1870 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1871 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1872 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1873
1874 return std::make_pair(ShadowPtr, OriginPtr);
1875 }
1876
1877 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1878 /// a single pointee.
1879 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1880 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1881 IRBuilder<> &IRB,
1882 Type *ShadowTy,
1883 bool isStore) {
1884 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1885 if (!VectTy) {
1886 assert(Addr->getType()->isPointerTy());
1887 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1888 }
1889
1890 // TODO: Support callbacks with vectors of addresses.
1891 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1892 Value *ShadowPtrs = ConstantInt::getNullValue(
1893 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1894 Value *OriginPtrs = nullptr;
1895 if (MS.TrackOrigins)
1896 OriginPtrs = ConstantInt::getNullValue(
1897 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1898 for (unsigned i = 0; i < NumElements; ++i) {
1899 Value *OneAddr =
1900 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1901 auto [ShadowPtr, OriginPtr] =
1902 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1903
1904 ShadowPtrs = IRB.CreateInsertElement(
1905 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1906 if (MS.TrackOrigins)
1907 OriginPtrs = IRB.CreateInsertElement(
1908 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1909 }
1910 return {ShadowPtrs, OriginPtrs};
1911 }
1912
1913 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1914 Type *ShadowTy,
1915 MaybeAlign Alignment,
1916 bool isStore) {
1917 if (MS.CompileKernel)
1918 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1919 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1920 }
1921
1922 /// Compute the shadow address for a given function argument.
1923 ///
1924 /// Shadow = ParamTLS+ArgOffset.
1925 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1926 return IRB.CreatePtrAdd(MS.ParamTLS,
1927 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1928 }
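// Argument shadows are laid out back to back in the ParamTLS buffer, each slot
// rounded up to kShadowTLSAlignment. Illustrative offsets, assuming an 8-byte
// kShadowTLSAlignment:
//   void f(i32 %a, double %b, <4 x i32> %c)
//     shadow(%a) at ParamTLS + 0
//     shadow(%b) at ParamTLS + 8
//     shadow(%c) at ParamTLS + 16
// Arguments whose cumulative offset would exceed kParamTLSSize are treated as
// overflow and get a clean shadow (see getShadow() below).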
1929
1930 /// Compute the origin address for a given function argument.
1931 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1932 if (!MS.TrackOrigins)
1933 return nullptr;
1934 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1935 ConstantInt::get(MS.IntptrTy, ArgOffset),
1936 "_msarg_o");
1937 }
1938
1939 /// Compute the shadow address for a retval.
1940 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1941 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1942 }
1943
1944 /// Compute the origin address for a retval.
1945 Value *getOriginPtrForRetval() {
1946 // We keep a single origin for the entire retval. Might be too optimistic.
1947 return MS.RetvalOriginTLS;
1948 }
1949
1950 /// Set SV to be the shadow value for V.
1951 void setShadow(Value *V, Value *SV) {
1952 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1953 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1954 }
1955
1956 /// Set Origin to be the origin value for V.
1957 void setOrigin(Value *V, Value *Origin) {
1958 if (!MS.TrackOrigins)
1959 return;
1960 assert(!OriginMap.count(V) && "Values may only have one origin");
1961 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1962 OriginMap[V] = Origin;
1963 }
1964
1965 Constant *getCleanShadow(Type *OrigTy) {
1966 Type *ShadowTy = getShadowTy(OrigTy);
1967 if (!ShadowTy)
1968 return nullptr;
1969 return Constant::getNullValue(ShadowTy);
1970 }
1971
1972 /// Create a clean shadow value for a given value.
1973 ///
1974 /// Clean shadow (all zeroes) means all bits of the value are defined
1975 /// (initialized).
1976 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
1977
1978 /// Create a dirty shadow of a given shadow type.
1979 Constant *getPoisonedShadow(Type *ShadowTy) {
1980 assert(ShadowTy);
1981 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
1982 return Constant::getAllOnesValue(ShadowTy);
1983 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
1984 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
1985 getPoisonedShadow(AT->getElementType()));
1986 return ConstantArray::get(AT, Vals);
1987 }
1988 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
1989 SmallVector<Constant *, 4> Vals;
1990 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1991 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
1992 return ConstantStruct::get(ST, Vals);
1993 }
1994 llvm_unreachable("Unexpected shadow type");
1995 }
1996
1997 /// Create a dirty shadow for a given value.
1998 Constant *getPoisonedShadow(Value *V) {
1999 Type *ShadowTy = getShadowTy(V);
2000 if (!ShadowTy)
2001 return nullptr;
2002 return getPoisonedShadow(ShadowTy);
2003 }
2004
2005 /// Create a clean (zero) origin.
2006 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2007
2008 /// Get the shadow value for a given Value.
2009 ///
2010 /// This function either returns the value set earlier with setShadow,
2011 /// or extracts it from ParamTLS (for function arguments).
2012 Value *getShadow(Value *V) {
2013 if (Instruction *I = dyn_cast<Instruction>(V)) {
2014 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2015 return getCleanShadow(V);
2016 // For instructions the shadow is already stored in the map.
2017 Value *Shadow = ShadowMap[V];
2018 if (!Shadow) {
2019 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2020 assert(Shadow && "No shadow for a value");
2021 }
2022 return Shadow;
2023 }
2024 // Handle fully undefined values
2025 // (partially undefined constant vectors are handled later)
2026 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2027 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2028 : getCleanShadow(V);
2029 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2030 return AllOnes;
2031 }
2032 if (Argument *A = dyn_cast<Argument>(V)) {
2033 // For arguments we compute the shadow on demand and store it in the map.
2034 Value *&ShadowPtr = ShadowMap[V];
2035 if (ShadowPtr)
2036 return ShadowPtr;
2037 Function *F = A->getParent();
2038 IRBuilder<> EntryIRB(FnPrologueEnd);
2039 unsigned ArgOffset = 0;
2040 const DataLayout &DL = F->getDataLayout();
2041 for (auto &FArg : F->args()) {
2042 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2043 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2044 ? "vscale not fully supported\n"
2045 : "Arg is not sized\n"));
2046 if (A == &FArg) {
2047 ShadowPtr = getCleanShadow(V);
2048 setOrigin(A, getCleanOrigin());
2049 break;
2050 }
2051 continue;
2052 }
2053
2054 unsigned Size = FArg.hasByValAttr()
2055 ? DL.getTypeAllocSize(FArg.getParamByValType())
2056 : DL.getTypeAllocSize(FArg.getType());
2057
2058 if (A == &FArg) {
2059 bool Overflow = ArgOffset + Size > kParamTLSSize;
2060 if (FArg.hasByValAttr()) {
2061 // ByVal pointer itself has clean shadow. We copy the actual
2062 // argument shadow to the underlying memory.
2063 // Figure out maximal valid memcpy alignment.
2064 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2065 FArg.getParamAlign(), FArg.getParamByValType());
2066 Value *CpShadowPtr, *CpOriginPtr;
2067 std::tie(CpShadowPtr, CpOriginPtr) =
2068 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2069 /*isStore*/ true);
2070 if (!PropagateShadow || Overflow) {
2071 // ParamTLS overflow.
2072 EntryIRB.CreateMemSet(
2073 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2074 Size, ArgAlign);
2075 } else {
2076 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2077 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2078 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2079 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2080 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2081
2082 if (MS.TrackOrigins) {
2083 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2084 // FIXME: OriginSize should be:
2085 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2086 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2087 EntryIRB.CreateMemCpy(
2088 CpOriginPtr,
2089 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2090 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2091 OriginSize);
2092 }
2093 }
2094 }
2095
2096 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2097 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2098 ShadowPtr = getCleanShadow(V);
2099 setOrigin(A, getCleanOrigin());
2100 } else {
2101 // Shadow over TLS
2102 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2103 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2104 kShadowTLSAlignment);
2105 if (MS.TrackOrigins) {
2106 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2107 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2108 }
2109 }
2110 LLVM_DEBUG(dbgs()
2111 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2112 break;
2113 }
2114
2115 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2116 }
2117 assert(ShadowPtr && "Could not find shadow for an argument");
2118 return ShadowPtr;
2119 }
2120
2121 // Check for partially-undefined constant vectors
2122 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2123 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2124 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2125 PoisonUndefVectors) {
2126 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2127 SmallVector<Constant *, 32> ShadowVector(NumElems);
2128 for (unsigned i = 0; i != NumElems; ++i) {
2129 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2130 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2131 : getCleanShadow(Elem);
2132 }
2133
2134 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2135 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2136 << *ShadowConstant << "\n");
2137
2138 return ShadowConstant;
2139 }
2140
2141 // TODO: partially-undefined constant arrays, structures, and nested types
2142
2143 // For everything else the shadow is zero.
2144 return getCleanShadow(V);
2145 }
2146
2147 /// Get the shadow for i-th argument of the instruction I.
2148 Value *getShadow(Instruction *I, int i) {
2149 return getShadow(I->getOperand(i));
2150 }
2151
2152 /// Get the origin for a value.
2153 Value *getOrigin(Value *V) {
2154 if (!MS.TrackOrigins)
2155 return nullptr;
2156 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2157 return getCleanOrigin();
2158 assert((isa<Instruction>(V) || isa<Argument>(V)) &&
2159 "Unexpected value type in getOrigin()");
2160 if (Instruction *I = dyn_cast<Instruction>(V)) {
2161 if (I->getMetadata(LLVMContext::MD_nosanitize))
2162 return getCleanOrigin();
2163 }
2164 Value *Origin = OriginMap[V];
2165 assert(Origin && "Missing origin");
2166 return Origin;
2167 }
2168
2169 /// Get the origin for i-th argument of the instruction I.
2170 Value *getOrigin(Instruction *I, int i) {
2171 return getOrigin(I->getOperand(i));
2172 }
2173
2174 /// Remember the place where a shadow check should be inserted.
2175 ///
2176 /// This location will later be instrumented with a check that will print a
2177 /// UMR warning at runtime if the shadow value is not 0.
2178 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2179 assert(Shadow);
2180 if (!InsertChecks)
2181 return;
2182
2183 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2184 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2185 << *OrigIns << "\n");
2186 return;
2187 }
2188#ifndef NDEBUG
2189 Type *ShadowTy = Shadow->getType();
2190 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2191 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2192 "Can only insert checks for integer, vector, and aggregate shadow "
2193 "types");
2194#endif
2195 InstrumentationList.push_back(
2196 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2197 }
2198
2199 /// Get shadow for value, and remember the place where a shadow check should
2200 /// be inserted.
2201 ///
2202 /// This location will later be instrumented with a check that will print a
2203 /// UMR warning at runtime if the value is not fully defined.
2204 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2205 assert(Val);
2206 Value *Shadow, *Origin;
2207 if (ClCheckConstantShadow) {
2208 Shadow = getShadow(Val);
2209 if (!Shadow)
2210 return;
2211 Origin = getOrigin(Val);
2212 } else {
2213 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2214 if (!Shadow)
2215 return;
2216 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2217 }
2218 insertCheckShadow(Shadow, Origin, OrigIns);
2219 }
2220
2221 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2222 switch (a) {
2223 case AtomicOrdering::NotAtomic:
2224 return AtomicOrdering::NotAtomic;
2225 case AtomicOrdering::Unordered:
2226 case AtomicOrdering::Monotonic:
2227 case AtomicOrdering::Release:
2228 return AtomicOrdering::Release;
2229 case AtomicOrdering::Acquire:
2230 case AtomicOrdering::AcquireRelease:
2231 return AtomicOrdering::AcquireRelease;
2232 case AtomicOrdering::SequentiallyConsistent:
2233 return AtomicOrdering::SequentiallyConsistent;
2234 }
2235 llvm_unreachable("Unknown ordering");
2236 }
2237
2238 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2239 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2240 uint32_t OrderingTable[NumOrderings] = {};
2241
2242 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2243 OrderingTable[(int)AtomicOrderingCABI::release] =
2244 (int)AtomicOrderingCABI::release;
2245 OrderingTable[(int)AtomicOrderingCABI::consume] =
2246 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2247 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2248 (int)AtomicOrderingCABI::acq_rel;
2249 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2250 (int)AtomicOrderingCABI::seq_cst;
2251
2252 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2253 }
2254
2255 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2256 switch (a) {
2257 case AtomicOrdering::NotAtomic:
2258 return AtomicOrdering::NotAtomic;
2259 case AtomicOrdering::Unordered:
2260 case AtomicOrdering::Monotonic:
2261 case AtomicOrdering::Acquire:
2262 return AtomicOrdering::Acquire;
2263 case AtomicOrdering::Release:
2264 case AtomicOrdering::AcquireRelease:
2265 return AtomicOrdering::AcquireRelease;
2266 case AtomicOrdering::SequentiallyConsistent:
2267 return AtomicOrdering::SequentiallyConsistent;
2268 }
2269 llvm_unreachable("Unknown ordering");
2270 }
2271
2272 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2273 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2274 uint32_t OrderingTable[NumOrderings] = {};
2275
2276 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2277 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2278 OrderingTable[(int)AtomicOrderingCABI::consume] =
2279 (int)AtomicOrderingCABI::acquire;
2280 OrderingTable[(int)AtomicOrderingCABI::release] =
2281 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2282 (int)AtomicOrderingCABI::acq_rel;
2283 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2284 (int)AtomicOrderingCABI::seq_cst;
2285
2286 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2287 }
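// Note on how the two tables above are meant to be used (the call sites appear
// later in this file; this summary is illustrative): for the __atomic_* libcalls
// the memory ordering is a runtime i32 in the C ABI encoding, so it cannot be
// strengthened statically. Instead, the dynamic ordering value is used as an
// index (via extractelement) into one of these constant tables, yielding an
// ordering that is at least release for stores and at least acquire for loads,
// mirroring what addReleaseOrdering()/addAcquireOrdering() do for atomic
// instructions.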
2288
2289 // ------------------- Visitors.
2290 using InstVisitor<MemorySanitizerVisitor>::visit;
2291 void visit(Instruction &I) {
2292 if (I.getMetadata(LLVMContext::MD_nosanitize))
2293 return;
2294 // Don't want to visit if we're in the prologue
2295 if (isInPrologue(I))
2296 return;
2297 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2298 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2299 // We still need to set the shadow and origin to clean values.
2300 setShadow(&I, getCleanShadow(&I));
2301 setOrigin(&I, getCleanOrigin());
2302 return;
2303 }
2304
2305 Instructions.push_back(&I);
2306 }
2307
2308 /// Instrument LoadInst
2309 ///
2310 /// Loads the corresponding shadow and (optionally) origin.
2311 /// Optionally, checks that the load address is fully defined.
2312 void visitLoadInst(LoadInst &I) {
2313 assert(I.getType()->isSized() && "Load type must have size");
2314 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2315 NextNodeIRBuilder IRB(&I);
2316 Type *ShadowTy = getShadowTy(&I);
2317 Value *Addr = I.getPointerOperand();
2318 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2319 const Align Alignment = I.getAlign();
2320 if (PropagateShadow) {
2321 std::tie(ShadowPtr, OriginPtr) =
2322 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2323 setShadow(&I,
2324 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2325 } else {
2326 setShadow(&I, getCleanShadow(&I));
2327 }
2328
2329 if (ClCheckAccessAddress)
2330 insertCheckShadowOf(I.getPointerOperand(), &I);
2331
2332 if (I.isAtomic())
2333 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2334
2335 if (MS.TrackOrigins) {
2336 if (PropagateShadow) {
2337 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2338 setOrigin(
2339 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2340 } else {
2341 setOrigin(&I, getCleanOrigin());
2342 }
2343 }
2344 }
2345
2346 /// Instrument StoreInst
2347 ///
2348 /// Stores the corresponding shadow and (optionally) origin.
2349 /// Optionally, checks that the store address is fully defined.
2350 void visitStoreInst(StoreInst &I) {
2351 StoreList.push_back(&I);
2352 if (ClCheckAccessAddress)
2353 insertCheckShadowOf(I.getPointerOperand(), &I);
2354 }
2355
2356 void handleCASOrRMW(Instruction &I) {
2357 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2358
2359 IRBuilder<> IRB(&I);
2360 Value *Addr = I.getOperand(0);
2361 Value *Val = I.getOperand(1);
2362 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2363 /*isStore*/ true)
2364 .first;
2365
2366 if (ClCheckAccessAddress)
2367 insertCheckShadowOf(Addr, &I);
2368
2369 // Only test the conditional argument of cmpxchg instruction.
2370 // The other argument can potentially be uninitialized, but we cannot
2371 // detect this situation reliably without possible false positives.
2372 if (isa<AtomicCmpXchgInst>(I))
2373 insertCheckShadowOf(Val, &I);
2374
2375 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2376
2377 setShadow(&I, getCleanShadow(&I));
2378 setOrigin(&I, getCleanOrigin());
2379 }
2380
2381 void visitAtomicRMWInst(AtomicRMWInst &I) {
2382 handleCASOrRMW(I);
2383 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2384 }
2385
2386 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2387 handleCASOrRMW(I);
2388 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2389 }
2390
2391 // Vector manipulation.
2392 void visitExtractElementInst(ExtractElementInst &I) {
2393 insertCheckShadowOf(I.getOperand(1), &I);
2394 IRBuilder<> IRB(&I);
2395 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2396 "_msprop"));
2397 setOrigin(&I, getOrigin(&I, 0));
2398 }
2399
2400 void visitInsertElementInst(InsertElementInst &I) {
2401 insertCheckShadowOf(I.getOperand(2), &I);
2402 IRBuilder<> IRB(&I);
2403 auto *Shadow0 = getShadow(&I, 0);
2404 auto *Shadow1 = getShadow(&I, 1);
2405 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2406 "_msprop"));
2407 setOriginForNaryOp(I);
2408 }
2409
2410 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2411 IRBuilder<> IRB(&I);
2412 auto *Shadow0 = getShadow(&I, 0);
2413 auto *Shadow1 = getShadow(&I, 1);
2414 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2415 "_msprop"));
2416 setOriginForNaryOp(I);
2417 }
2418
2419 // Casts.
2420 void visitSExtInst(SExtInst &I) {
2421 IRBuilder<> IRB(&I);
2422 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2423 setOrigin(&I, getOrigin(&I, 0));
2424 }
2425
2426 void visitZExtInst(ZExtInst &I) {
2427 IRBuilder<> IRB(&I);
2428 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2429 setOrigin(&I, getOrigin(&I, 0));
2430 }
2431
2432 void visitTruncInst(TruncInst &I) {
2433 IRBuilder<> IRB(&I);
2434 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2435 setOrigin(&I, getOrigin(&I, 0));
2436 }
2437
2438 void visitBitCastInst(BitCastInst &I) {
2439 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2440 // a musttail call and a ret, don't instrument. New instructions are not
2441 // allowed after a musttail call.
2442 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2443 if (CI->isMustTailCall())
2444 return;
2445 IRBuilder<> IRB(&I);
2446 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2447 setOrigin(&I, getOrigin(&I, 0));
2448 }
2449
2450 void visitPtrToIntInst(PtrToIntInst &I) {
2451 IRBuilder<> IRB(&I);
2452 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2453 "_msprop_ptrtoint"));
2454 setOrigin(&I, getOrigin(&I, 0));
2455 }
2456
2457 void visitIntToPtrInst(IntToPtrInst &I) {
2458 IRBuilder<> IRB(&I);
2459 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2460 "_msprop_inttoptr"));
2461 setOrigin(&I, getOrigin(&I, 0));
2462 }
2463
2464 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2465 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2466 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2467 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2468 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2469 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2470
2471 /// Propagate shadow for bitwise AND.
2472 ///
2473 /// This code is exact, i.e. if, for example, a bit in the left argument
2474 /// is defined and 0, then neither the value nor the definedness of the
2475 /// corresponding bit in B affects the resulting shadow.
2476 void visitAnd(BinaryOperator &I) {
2477 IRBuilder<> IRB(&I);
2478 // "And" of 0 and a poisoned value results in unpoisoned value.
2479 // 1&1 => 1; 0&1 => 0; p&1 => p;
2480 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2481 // 1&p => p; 0&p => 0; p&p => p;
2482 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2483 Value *S1 = getShadow(&I, 0);
2484 Value *S2 = getShadow(&I, 1);
2485 Value *V1 = I.getOperand(0);
2486 Value *V2 = I.getOperand(1);
2487 if (V1->getType() != S1->getType()) {
2488 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2489 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2490 }
2491 Value *S1S2 = IRB.CreateAnd(S1, S2);
2492 Value *V1S2 = IRB.CreateAnd(V1, S2);
2493 Value *S1V2 = IRB.CreateAnd(S1, V2);
2494 setShadow(&I, IRB.CreateOr({S1S2, V1S2, S1V2}));
2495 setOriginForNaryOp(I);
2496 }
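// Worked bit-level example of the rule above (illustrative values; bit i of
// Sx set means bit i of Vx is uninitialized):
//   V1 = 0b1100, S1 = 0b0010   (bit 1 of V1 poisoned)
//   V2 = 0b1010, S2 = 0b0001   (bit 0 of V2 poisoned)
//   S1&S2 = 0b0000, V1&S2 = 0b0000, S1&V2 = 0b0010  ==>  S = 0b0010
// Bit 0 of the result is clean even though bit 0 of V2 is poisoned, because
// bit 0 of V1 is a defined 0 and forces the AND to 0; bit 1 stays poisoned
// because V2's defined 1 lets V1's poisoned bit flow through.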
2497
2498 void visitOr(BinaryOperator &I) {
2499 IRBuilder<> IRB(&I);
2500 // "Or" of 1 and a poisoned value results in unpoisoned value:
2501 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2502 // 1|0 => 1; 0|0 => 0; p|0 => p;
2503 // 1|p => 1; 0|p => p; p|p => p;
2504 //
2505 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2506 //
2507 // If the "disjoint OR" property is violated, the result is poison, and
2508 // hence the entire shadow is uninitialized:
2509 // S = S | SignExt(V1 & V2 != 0)
2510 Value *S1 = getShadow(&I, 0);
2511 Value *S2 = getShadow(&I, 1);
2512 Value *V1 = I.getOperand(0);
2513 Value *V2 = I.getOperand(1);
2514 if (V1->getType() != S1->getType()) {
2515 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2516 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2517 }
2518
2519 Value *NotV1 = IRB.CreateNot(V1);
2520 Value *NotV2 = IRB.CreateNot(V2);
2521
2522 Value *S1S2 = IRB.CreateAnd(S1, S2);
2523 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2524 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2525
2526 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2527
2528 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2529 Value *V1V2 = IRB.CreateAnd(V1, V2);
2530 Value *DisjointOrShadow = IRB.CreateSExt(
2531 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2532 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2533 }
2534
2535 setShadow(&I, S);
2536 setOriginForNaryOp(I);
2537 }
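// Note on the disjoint-OR term above: `or disjoint` is poison whenever the
// operands share a set bit, so if V1 & V2 is non-zero the SExt produces an
// all-ones shadow (per element, for vectors) and the result is treated as
// uninitialized regardless of S1/S2. When ClPreciseDisjointOr is off, the
// disjoint hint is simply ignored and only the bitwise rule is applied.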
2538
2539 /// Default propagation of shadow and/or origin.
2540 ///
2541 /// This class implements the general case of shadow propagation, used in all
2542 /// cases where we don't know and/or don't care about what the operation
2543 /// actually does. It converts all input shadow values to a common type
2544 /// (extending or truncating as necessary), and bitwise OR's them.
2545 ///
2546 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2547 /// fully initialized), and less prone to false positives.
2548 ///
2549 /// This class also implements the general case of origin propagation. For a
2550 /// Nary operation, the result origin is set to the origin of an argument that
2551 /// is not entirely initialized. If there is more than one such argument, the
2552 /// rightmost of them is picked. It does not matter which one is picked if all
2553 /// arguments are initialized.
2554 template <bool CombineShadow> class Combiner {
2555 Value *Shadow = nullptr;
2556 Value *Origin = nullptr;
2557 IRBuilder<> &IRB;
2558 MemorySanitizerVisitor *MSV;
2559
2560 public:
2561 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2562 : IRB(IRB), MSV(MSV) {}
2563
2564 /// Add a pair of shadow and origin values to the mix.
2565 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2566 if (CombineShadow) {
2567 assert(OpShadow);
2568 if (!Shadow)
2569 Shadow = OpShadow;
2570 else {
2571 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2572 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2573 }
2574 }
2575
2576 if (MSV->MS.TrackOrigins) {
2577 assert(OpOrigin);
2578 if (!Origin) {
2579 Origin = OpOrigin;
2580 } else {
2581 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2582 // No point in adding something that might result in 0 origin value.
2583 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2584 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2585 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2586 }
2587 }
2588 }
2589 return *this;
2590 }
2591
2592 /// Add an application value to the mix.
2593 Combiner &Add(Value *V) {
2594 Value *OpShadow = MSV->getShadow(V);
2595 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2596 return Add(OpShadow, OpOrigin);
2597 }
2598
2599 /// Set the current combined values as the given instruction's shadow
2600 /// and origin.
2601 void Done(Instruction *I) {
2602 if (CombineShadow) {
2603 assert(Shadow);
2604 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2605 MSV->setShadow(I, Shadow);
2606 }
2607 if (MSV->MS.TrackOrigins) {
2608 assert(Origin);
2609 MSV->setOrigin(I, Origin);
2610 }
2611 }
2612
2613 /// Store the current combined value at the specified origin
2614 /// location.
2615 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2616 if (MSV->MS.TrackOrigins) {
2617 assert(Origin);
2618 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2619 }
2620 }
2621 };
2622
2623 using ShadowAndOriginCombiner = Combiner<true>;
2624 using OriginCombiner = Combiner<false>;
2625
2626 /// Propagate origin for arbitrary operation.
2627 void setOriginForNaryOp(Instruction &I) {
2628 if (!MS.TrackOrigins)
2629 return;
2630 IRBuilder<> IRB(&I);
2631 OriginCombiner OC(this, IRB);
2632 for (Use &Op : I.operands())
2633 OC.Add(Op.get());
2634 OC.Done(&I);
2635 }
2636
2637 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2638 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2639 "Vector of pointers is not a valid shadow type");
2640 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2641 Ty->getScalarSizeInBits()
2642 : Ty->getPrimitiveSizeInBits();
2643 }
2644
2645 /// Cast between two shadow types, extending or truncating as
2646 /// necessary.
2647 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2648 bool Signed = false) {
2649 Type *srcTy = V->getType();
2650 if (srcTy == dstTy)
2651 return V;
2652 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2653 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2654 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2655 return IRB.CreateICmpNE(V, getCleanShadow(V));
2656
2657 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2658 return IRB.CreateIntCast(V, dstTy, Signed);
2659 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2660 cast<VectorType>(dstTy)->getElementCount() ==
2661 cast<VectorType>(srcTy)->getElementCount())
2662 return IRB.CreateIntCast(V, dstTy, Signed);
2663 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2664 Value *V2 =
2665 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2666 return IRB.CreateBitCast(V2, dstTy);
2667 // TODO: handle struct types.
2668 }
2669
2670 /// Cast an application value to the type of its own shadow.
2671 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2672 Type *ShadowTy = getShadowTy(V);
2673 if (V->getType() == ShadowTy)
2674 return V;
2675 if (V->getType()->isPtrOrPtrVectorTy())
2676 return IRB.CreatePtrToInt(V, ShadowTy);
2677 else
2678 return IRB.CreateBitCast(V, ShadowTy);
2679 }
2680
2681 /// Propagate shadow for arbitrary operation.
2682 void handleShadowOr(Instruction &I) {
2683 IRBuilder<> IRB(&I);
2684 ShadowAndOriginCombiner SC(this, IRB);
2685 for (Use &Op : I.operands())
2686 SC.Add(Op.get());
2687 SC.Done(&I);
2688 }
2689
2690 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2691 // of elements.
2692 //
2693 // For example, suppose we have:
2694 // VectorA: <a1, a2, a3, a4, a5, a6>
2695 // VectorB: <b1, b2, b3, b4, b5, b6>
2696 // ReductionFactor: 3.
2697 // The output would be:
2698 // <a1|a2|a3, a4|a5|a6, b1|b2|b3, b4|b5|b6>
2699 //
2700 // This is convenient for instrumenting horizontal add/sub.
2701 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2702 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2703 Value *VectorA, Value *VectorB) {
2704 assert(isa<FixedVectorType>(VectorA->getType()));
2705 unsigned TotalNumElems =
2706 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2707
2708 if (VectorB) {
2709 assert(VectorA->getType() == VectorB->getType());
2710 TotalNumElems = TotalNumElems * 2;
2711 }
2712
2713 assert(TotalNumElems % ReductionFactor == 0);
2714
2715 Value *Or = nullptr;
2716
2717 IRBuilder<> IRB(&I);
2718 for (unsigned i = 0; i < ReductionFactor; i++) {
2719 SmallVector<int, 16> Mask;
2720 for (unsigned X = 0; X < TotalNumElems; X += ReductionFactor)
2721 Mask.push_back(X + i);
2722
2723 Value *Masked;
2724 if (VectorB)
2725 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2726 else
2727 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2728
2729 if (Or)
2730 Or = IRB.CreateOr(Or, Masked);
2731 else
2732 Or = Masked;
2733 }
2734
2735 return Or;
2736 }
2737
2738 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2739 /// fields.
2740 ///
2741 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2742 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2743 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I) {
2744 assert(I.arg_size() == 1 || I.arg_size() == 2);
2745
2746 assert(I.getType()->isVectorTy());
2747 assert(I.getArgOperand(0)->getType()->isVectorTy());
2748
2749 [[maybe_unused]] FixedVectorType *ParamType =
2750 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2751 assert((I.arg_size() != 2) ||
2752 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2753 [[maybe_unused]] FixedVectorType *ReturnType =
2754 cast<FixedVectorType>(I.getType());
2755 assert(ParamType->getNumElements() * I.arg_size() ==
2756 2 * ReturnType->getNumElements());
2757
2758 IRBuilder<> IRB(&I);
2759
2760 // Horizontal OR of shadow
2761 Value *FirstArgShadow = getShadow(&I, 0);
2762 Value *SecondArgShadow = nullptr;
2763 if (I.arg_size() == 2)
2764 SecondArgShadow = getShadow(&I, 1);
2765
2766 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2767 SecondArgShadow);
2768
2769 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2770
2771 setShadow(&I, OrShadow);
2772 setOriginForNaryOp(I);
2773 }
2774
2775 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2776 /// fields, with the parameters reinterpreted to have elements of a specified
2777 /// width. For example:
2778 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2779 /// conceptually operates on
2780 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2781 /// and can be handled with ReinterpretElemWidth == 16.
2782 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I,
2783 int ReinterpretElemWidth) {
2784 assert(I.arg_size() == 1 || I.arg_size() == 2);
2785
2786 assert(I.getType()->isVectorTy());
2787 assert(I.getArgOperand(0)->getType()->isVectorTy());
2788
2789 FixedVectorType *ParamType =
2790 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2791 assert((I.arg_size() != 2) ||
2792 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2793
2794 [[maybe_unused]] FixedVectorType *ReturnType =
2795 cast<FixedVectorType>(I.getType());
2796 assert(ParamType->getNumElements() * I.arg_size() ==
2797 2 * ReturnType->getNumElements());
2798
2799 IRBuilder<> IRB(&I);
2800
2801 FixedVectorType *ReinterpretShadowTy = nullptr;
2802 assert(isAligned(Align(ReinterpretElemWidth),
2803 ParamType->getPrimitiveSizeInBits()));
2804 ReinterpretShadowTy = FixedVectorType::get(
2805 IRB.getIntNTy(ReinterpretElemWidth),
2806 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2807
2808 // Horizontal OR of shadow
2809 Value *FirstArgShadow = getShadow(&I, 0);
2810 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2811
2812 // If we had two parameters each with an odd number of elements, the total
2813 // number of elements is even, but we have never seen this in extant
2814 // instruction sets, so we enforce that each parameter must have an even
2815 // number of elements.
2816 assert(isAligned(
2817 Align(2),
2818 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2819
2820 Value *SecondArgShadow = nullptr;
2821 if (I.arg_size() == 2) {
2822 SecondArgShadow = getShadow(&I, 1);
2823 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2824 }
2825
2826 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, FirstArgShadow,
2827 SecondArgShadow);
2828
2829 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2830
2831 setShadow(&I, OrShadow);
2832 setOriginForNaryOp(I);
2833 }
2834
2835 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2836
2837 // Handle multiplication by constant.
2838 //
2839 // Handle a special case of multiplication by constant that may have one or
2840 // more zeros in the lower bits. This makes corresponding number of lower bits
2841 // of the result zero as well. We model it by shifting the other operand
2842 // shadow left by the required number of bits. Effectively, we transform
2843 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2844 // We use multiplication by 2**N instead of shift to cover the case of
2845 // multiplication by 0, which may occur in some elements of a vector operand.
2846 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2847 Value *OtherArg) {
2848 Constant *ShadowMul;
2849 Type *Ty = ConstArg->getType();
2850 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2851 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2852 Type *EltTy = VTy->getElementType();
2853 SmallVector<Constant *, 16> Elements;
2854 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2855 if (ConstantInt *Elt =
2856 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2857 const APInt &V = Elt->getValue();
2858 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2859 Elements.push_back(ConstantInt::get(EltTy, V2));
2860 } else {
2861 Elements.push_back(ConstantInt::get(EltTy, 1));
2862 }
2863 }
2864 ShadowMul = ConstantVector::get(Elements);
2865 } else {
2866 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2867 const APInt &V = Elt->getValue();
2868 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2869 ShadowMul = ConstantInt::get(Ty, V2);
2870 } else {
2871 ShadowMul = ConstantInt::get(Ty, 1);
2872 }
2873 }
2874
2875 IRBuilder<> IRB(&I);
2876 setShadow(&I,
2877 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2878 setOrigin(&I, getOrigin(OtherArg));
2879 }
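// Worked example of the rule above (illustrative values): multiplying by
// 24 = 3 * 2**3 guarantees the three low bits of the result are zero, hence
// initialized, so ShadowMul = 1 << countr_zero(24) = 8 and the other operand's
// shadow is effectively shifted left by 3:
//   Sx = 0b00000101  ==>  Sresult = Sx * 8 = 0b00101000
// A constant multiplier of 0 makes ShadowMul 0 (the shift amount equals the bit
// width), clearing the result shadow entirely; that is correct, since X * 0 is
// a defined 0, and is why a multiply is used here instead of a shift.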
2880
2881 void visitMul(BinaryOperator &I) {
2882 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2883 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2884 if (constOp0 && !constOp1)
2885 handleMulByConstant(I, constOp0, I.getOperand(1));
2886 else if (constOp1 && !constOp0)
2887 handleMulByConstant(I, constOp1, I.getOperand(0));
2888 else
2889 handleShadowOr(I);
2890 }
2891
2892 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2893 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2894 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2895 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2896 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2897 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2898
2899 void handleIntegerDiv(Instruction &I) {
2900 IRBuilder<> IRB(&I);
2901 // Strict on the second argument.
2902 insertCheckShadowOf(I.getOperand(1), &I);
2903 setShadow(&I, getShadow(&I, 0));
2904 setOrigin(&I, getOrigin(&I, 0));
2905 }
2906
2907 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2908 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2909 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2910 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2911
2912 // Floating point division is side-effect free. We cannot require that the
2913 // divisor be fully initialized, so we must propagate shadow. See PR37523.
2914 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2915 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2916
2917 /// Instrument == and != comparisons.
2918 ///
2919 /// Sometimes the comparison result is known even if some of the bits of the
2920 /// arguments are not.
2921 void handleEqualityComparison(ICmpInst &I) {
2922 IRBuilder<> IRB(&I);
2923 Value *A = I.getOperand(0);
2924 Value *B = I.getOperand(1);
2925 Value *Sa = getShadow(A);
2926 Value *Sb = getShadow(B);
2927
2928 // Get rid of pointers and vectors of pointers.
2929 // For ints (and vectors of ints), types of A and Sa match,
2930 // and this is a no-op.
2931 A = IRB.CreatePointerCast(A, Sa->getType());
2932 B = IRB.CreatePointerCast(B, Sb->getType());
2933
2934 // A == B <==> (C = A^B) == 0
2935 // A != B <==> (C = A^B) != 0
2936 // Sc = Sa | Sb
2937 Value *C = IRB.CreateXor(A, B);
2938 Value *Sc = IRB.CreateOr(Sa, Sb);
2939 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2940 // Result is defined if one of the following is true
2941 // * there is a defined 1 bit in C
2942 // * C is fully defined
2943 // Si = !(C & ~Sc) && Sc
2944 Value *Zero = Constant::getNullValue(Sc->getType());
2945 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2946 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2947 Value *RHS =
2948 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2949 Value *Si = IRB.CreateAnd(LHS, RHS);
2950 Si->setName("_msprop_icmp");
2951 setShadow(&I, Si);
2952 setOriginForNaryOp(I);
2953 }
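// Worked example of Si = !(C & ~Sc) && Sc with 4-bit operands (illustrative;
// '?' marks a bit whose value depends on a poisoned input bit):
//   A = 0b10?0, Sa = 0b0010;  B = 0b1100, Sb = 0b0000
//   C = A ^ B = 0b01?0;  Sc = Sa | Sb = 0b0010
//   C & ~Sc = 0b0100 != 0, so Si = 0: there is a defined 1 bit in C and the
//   comparison is "not equal" no matter what the poisoned bit of A holds.
// If the only differing bits were poisoned, C & ~Sc would be 0 while Sc != 0,
// giving Si = 1 (the i1 result is undefined).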
2954
2955 /// Instrument relational comparisons.
2956 ///
2957 /// This function does exact shadow propagation for all relational
2958 /// comparisons of integers, pointers and vectors of those.
2959 /// FIXME: output seems suboptimal when one of the operands is a constant
2960 void handleRelationalComparisonExact(ICmpInst &I) {
2961 IRBuilder<> IRB(&I);
2962 Value *A = I.getOperand(0);
2963 Value *B = I.getOperand(1);
2964 Value *Sa = getShadow(A);
2965 Value *Sb = getShadow(B);
2966
2967 // Get rid of pointers and vectors of pointers.
2968 // For ints (and vectors of ints), types of A and Sa match,
2969 // and this is a no-op.
2970 A = IRB.CreatePointerCast(A, Sa->getType());
2971 B = IRB.CreatePointerCast(B, Sb->getType());
2972
2973 // Let [a0, a1] be the interval of possible values of A, taking into account
2974 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
2975 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
2976 bool IsSigned = I.isSigned();
2977
2978 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
2979 if (IsSigned) {
2980 // Sign-flip to map from signed range to unsigned range. Relation A vs B
2981 // should be preserved, if checked with `getUnsignedPredicate()`.
2982 // The relationship between Amin, Amax, Bmin and Bmax is also unaffected,
2983 // as they are created by effectively adding to / subtracting from A (or B)
2984 // a shadow-derived value, with no overflow, either before or after the
2985 // sign flip.
2986 APInt MinVal =
2987 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
2988 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
2989 }
2990 // Minimize undefined bits.
2991 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
2992 Value *Max = IRB.CreateOr(V, S);
2993 return std::make_pair(Min, Max);
2994 };
2995
2996 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
2997 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
2998 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
2999 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3000
3001 Value *Si = IRB.CreateXor(S1, S2);
3002 setShadow(&I, Si);
3003 setOriginForNaryOp(I);
3004 }
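// Worked example of the interval reasoning above for an unsigned 4-bit
// comparison A < B (illustrative; '?' marks a poisoned bit):
//   A = 0b01?0, Sa = 0b0010  ==>  [Amin, Amax] = [0b0100, 0b0110]
//   B = 0b1000, Sb = 0       ==>  [Bmin, Bmax] = [0b1000, 0b1000]
//   S1 = (Amin < Bmax) = true, S2 = (Amax < Bmin) = true, Si = S1 ^ S2 = 0:
//   every possible A is below B, so the result is defined.
// With B = 0b0101 instead, S1 = true but S2 = false, so Si = 1: the outcome
// depends on the poisoned bit.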
3005
3006 /// Instrument signed relational comparisons.
3007 ///
3008 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3009 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3010 void handleSignedRelationalComparison(ICmpInst &I) {
3011 Constant *constOp;
3012 Value *op = nullptr;
3013 CmpInst::Predicate pre;
3014 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3015 op = I.getOperand(0);
3016 pre = I.getPredicate();
3017 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3018 op = I.getOperand(1);
3019 pre = I.getSwappedPredicate();
3020 } else {
3021 handleShadowOr(I);
3022 return;
3023 }
3024
3025 if ((constOp->isNullValue() &&
3026 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3027 (constOp->isAllOnesValue() &&
3028 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3029 IRBuilder<> IRB(&I);
3030 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3031 "_msprop_icmp_s");
3032 setShadow(&I, Shadow);
3033 setOrigin(&I, getOrigin(op));
3034 } else {
3035 handleShadowOr(I);
3036 }
3037 }
3038
3039 void visitICmpInst(ICmpInst &I) {
3040 if (!ClHandleICmp) {
3041 handleShadowOr(I);
3042 return;
3043 }
3044 if (I.isEquality()) {
3045 handleEqualityComparison(I);
3046 return;
3047 }
3048
3049 assert(I.isRelational());
3050 if (ClHandleICmpExact) {
3051 handleRelationalComparisonExact(I);
3052 return;
3053 }
3054 if (I.isSigned()) {
3055 handleSignedRelationalComparison(I);
3056 return;
3057 }
3058
3059 assert(I.isUnsigned());
3060 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3061 handleRelationalComparisonExact(I);
3062 return;
3063 }
3064
3065 handleShadowOr(I);
3066 }
3067
3068 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3069
3070 void handleShift(BinaryOperator &I) {
3071 IRBuilder<> IRB(&I);
3072 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3073 // Otherwise perform the same shift on S1.
3074 Value *S1 = getShadow(&I, 0);
3075 Value *S2 = getShadow(&I, 1);
3076 Value *S2Conv =
3077 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3078 Value *V2 = I.getOperand(1);
3079 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3080 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3081 setOriginForNaryOp(I);
3082 }
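// Example of the rule above (illustrative values): for `shl i8 %x, 3` with a
// fully defined shift amount, S2Conv is 0 and the result shadow is simply the
// operand shadow shifted the same way, e.g. Sx = 0b00000101 gives 0b00101000.
// If any bit of the shift amount is uninitialized, S2Conv is all-ones and the
// entire result is poisoned.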
3083
3084 void visitShl(BinaryOperator &I) { handleShift(I); }
3085 void visitAShr(BinaryOperator &I) { handleShift(I); }
3086 void visitLShr(BinaryOperator &I) { handleShift(I); }
3087
3088 void handleFunnelShift(IntrinsicInst &I) {
3089 IRBuilder<> IRB(&I);
3090 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3091 // Otherwise perform the same shift on S0 and S1.
3092 Value *S0 = getShadow(&I, 0);
3093 Value *S1 = getShadow(&I, 1);
3094 Value *S2 = getShadow(&I, 2);
3095 Value *S2Conv =
3096 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3097 Value *V2 = I.getOperand(2);
3098 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3099 {S0, S1, V2});
3100 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3101 setOriginForNaryOp(I);
3102 }
3103
3104 /// Instrument llvm.memmove
3105 ///
3106 /// At this point we don't know if llvm.memmove will be inlined or not.
3107 /// If we don't instrument it and it gets inlined,
3108 /// our interceptor will not kick in and we will lose the memmove.
3109 /// If we instrument the call here, but it does not get inlined,
3110 /// we will memmove the shadow twice, which is bad in the case
3111 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3112 ///
3113 /// Similar situation exists for memcpy and memset.
3114 void visitMemMoveInst(MemMoveInst &I) {
3115 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3116 IRBuilder<> IRB(&I);
3117 IRB.CreateCall(MS.MemmoveFn,
3118 {I.getArgOperand(0), I.getArgOperand(1),
3119 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3120 I.eraseFromParent();
3121 }
3122
3123 /// Instrument memcpy
3124 ///
3125 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3126 /// unfortunate as it may slow down small constant memcpys.
3127 /// FIXME: consider doing manual inline for small constant sizes and proper
3128 /// alignment.
3129 ///
3130 /// Note: This also handles memcpy.inline, which promises no calls to external
3131 /// functions as an optimization. However, with instrumentation enabled this
3132 /// is difficult to promise; additionally, we know that the MSan runtime
3133 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3134 /// instrumentation it's safe to turn memcpy.inline into a call to
3135 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3136 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3137 void visitMemCpyInst(MemCpyInst &I) {
3138 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3139 IRBuilder<> IRB(&I);
3140 IRB.CreateCall(MS.MemcpyFn,
3141 {I.getArgOperand(0), I.getArgOperand(1),
3142 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3143 I.eraseFromParent();
3144 }
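// Conceptually, the llvm.memcpy above is thus replaced by a call to the
// runtime's __msan_memcpy(dst, src, n), which performs the copy and also
// copies the corresponding shadow (and, if enabled, origin) bytes.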
3145
3146 // Same as memcpy.
3147 void visitMemSetInst(MemSetInst &I) {
3148 IRBuilder<> IRB(&I);
3149 IRB.CreateCall(
3150 MS.MemsetFn,
3151 {I.getArgOperand(0),
3152 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3153 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3154 I.eraseFromParent();
3155 }
3156
3157 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3158
3159 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3160
3161 /// Handle vector store-like intrinsics.
3162 ///
3163 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3164 /// has 1 pointer argument and 1 vector argument, returns void.
3165 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3166 assert(I.arg_size() == 2);
3167
3168 IRBuilder<> IRB(&I);
3169 Value *Addr = I.getArgOperand(0);
3170 Value *Shadow = getShadow(&I, 1);
3171 Value *ShadowPtr, *OriginPtr;
3172
3173 // We don't know the pointer alignment (could be unaligned SSE store!).
3174 // Have to assume the worst case.
3175 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3176 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3177 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3178
3180 if (ClCheckAccessAddress)
3181 insertCheckShadowOf(Addr, &I);
3181
3182 // FIXME: factor out common code from materializeStores
3183 if (MS.TrackOrigins)
3184 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3185 return true;
3186 }
3187
3188 /// Handle vector load-like intrinsics.
3189 ///
3190 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3191 /// has 1 pointer argument, returns a vector.
3192 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3193 assert(I.arg_size() == 1);
3194
3195 IRBuilder<> IRB(&I);
3196 Value *Addr = I.getArgOperand(0);
3197
3198 Type *ShadowTy = getShadowTy(&I);
3199 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3200 if (PropagateShadow) {
3201 // We don't know the pointer alignment (could be unaligned SSE load!).
3202 // Have to assume the worst case.
3203 const Align Alignment = Align(1);
3204 std::tie(ShadowPtr, OriginPtr) =
3205 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3206 setShadow(&I,
3207 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3208 } else {
3209 setShadow(&I, getCleanShadow(&I));
3210 }
3211
3212 if (ClCheckAccessAddress)
3213 insertCheckShadowOf(Addr, &I);
3214
3215 if (MS.TrackOrigins) {
3216 if (PropagateShadow)
3217 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3218 else
3219 setOrigin(&I, getCleanOrigin());
3220 }
3221 return true;
3222 }
3223
3224 /// Handle (SIMD arithmetic)-like intrinsics.
3225 ///
3226 /// Instrument intrinsics with any number of arguments of the same type [*],
3227 /// equal to the return type, plus a specified number of trailing flags of
3228 /// any type.
3229 ///
3230 /// [*] The type should be simple (no aggregates or pointers; vectors are
3231 /// fine).
3232 ///
3233 /// Caller guarantees that this intrinsic does not access memory.
3234 ///
3235 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3236 /// by this handler. See horizontalReduce().
3237 ///
3238 /// TODO: permutation intrinsics are also often incorrectly matched.
3239 [[maybe_unused]] bool
3240 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3241 unsigned int trailingFlags) {
3242 Type *RetTy = I.getType();
3243 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3244 return false;
3245
3246 unsigned NumArgOperands = I.arg_size();
3247 assert(NumArgOperands >= trailingFlags);
3248 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3249 Type *Ty = I.getArgOperand(i)->getType();
3250 if (Ty != RetTy)
3251 return false;
3252 }
3253
3254 IRBuilder<> IRB(&I);
3255 ShadowAndOriginCombiner SC(this, IRB);
3256 for (unsigned i = 0; i < NumArgOperands; ++i)
3257 SC.Add(I.getArgOperand(i));
3258 SC.Done(&I);
3259
3260 return true;
3261 }
3262
3263 /// Returns whether it was able to heuristically instrument unknown
3264 /// intrinsics.
3265 ///
3266 /// The main purpose of this code is to do something reasonable with all
3267 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3268 /// We recognize several classes of intrinsics by their argument types and
3269 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3270 /// sure that we know what the intrinsic does.
3271 ///
3272 /// We special-case intrinsics where this approach fails. See llvm.bswap
3273 /// handling as an example of that.
3274 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3275 unsigned NumArgOperands = I.arg_size();
3276 if (NumArgOperands == 0)
3277 return false;
3278
3279 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3280 I.getArgOperand(1)->getType()->isVectorTy() &&
3281 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3282 // This looks like a vector store.
3283 return handleVectorStoreIntrinsic(I);
3284 }
3285
3286 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3287 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3288 // This looks like a vector load.
3289 return handleVectorLoadIntrinsic(I);
3290 }
3291
3292 if (I.doesNotAccessMemory())
3293 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3294 return true;
3295
3296 // FIXME: detect and handle SSE maskstore/maskload?
3297 // Some cases are now handled in handleAVXMasked{Load,Store}.
3298 return false;
3299 }
3300
3301 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3302 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3303 if (ClDumpStrictIntrinsics)
3304 dumpInst(I);
3305
3306 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3307 << "\n");
3308 return true;
3309 } else
3310 return false;
3311 }
3312
3313 void handleInvariantGroup(IntrinsicInst &I) {
3314 setShadow(&I, getShadow(&I, 0));
3315 setOrigin(&I, getOrigin(&I, 0));
3316 }
3317
3318 void handleLifetimeStart(IntrinsicInst &I) {
3319 if (!PoisonStack)
3320 return;
3321 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3322 if (AI)
3323 LifetimeStartList.push_back(std::make_pair(&I, AI));
3324 }
3325
3326 void handleBswap(IntrinsicInst &I) {
3327 IRBuilder<> IRB(&I);
3328 Value *Op = I.getArgOperand(0);
3329 Type *OpType = Op->getType();
3330 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3331 getShadow(Op)));
3332 setOrigin(&I, getOrigin(Op));
3333 }
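// Illustration of the bswap handling above: for an i16 whose low byte is
// uninitialized (shadow 0x00FF), the shadow of the swapped result is
// bswap(0x00FF) = 0xFF00, so the poisoned byte travels with its data byte.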
3334
3335 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3336 // and a 1. If the input is all zero, it is fully initialized iff
3337 // !is_zero_poison.
3338 //
3339 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3340 // concrete value 0/1, and ? is an uninitialized bit:
3341 // - 0001 0??? is fully initialized
3342 // - 000? ???? is fully uninitialized (*)
3343 // - ???? ???? is fully uninitialized
3344 // - 0000 0000 is fully uninitialized if is_zero_poison,
3345 // fully initialized otherwise
3346 //
3347 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3348 // only need to poison 4 bits.
3349 //
3350 // OutputShadow =
3351 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3352 // || (is_zero_poison && AllZeroSrc)
3353 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3354 IRBuilder<> IRB(&I);
3355 Value *Src = I.getArgOperand(0);
3356 Value *SrcShadow = getShadow(Src);
3357
3358 Value *False = IRB.getInt1(false);
3359 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3360 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3361 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3362 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3363
3364 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3365 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3366
3367 Value *NotAllZeroShadow =
3368 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3369 Value *OutputShadow =
3370 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3371
3372 // If zero poison is requested, mix in with the shadow
3373 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3374 if (!IsZeroPoison->isZeroValue()) {
3375 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3376 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3377 }
3378
3379 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3380
3381 setShadow(&I, OutputShadow);
3382 setOriginForNaryOp(I);
3383 }
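// Working the formula above through the ctlz example "0001 0???": the
// concrete input has 3 leading zeros however the '?' bits resolve, and the
// shadow 0000 0111 has 5 leading zeros, so (3 >= 5) is false and the result
// is clean. For "000? ????" the shadow 0001 1111 has 3 leading zeros while
// the concrete input has at least 3, so the compare holds, the shadow is
// non-zero, and the result is poisoned, as the table above states.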
3384
3385 /// Handle Arm NEON vector convert intrinsics.
3386 ///
3387 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3388 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64(double)
3389 ///
3390 /// For x86 SSE vector convert intrinsics, see
3391 /// handleSSEVectorConvertIntrinsic().
3392 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I) {
3393 assert(I.arg_size() == 1);
3394
3395 IRBuilder<> IRB(&I);
3396 Value *S0 = getShadow(&I, 0);
3397
3398 /// For scalars:
3399 /// Since they are converting from floating-point to integer, the output is
3400 /// - fully uninitialized if *any* bit of the input is uninitialized
3401 /// - fully initialized if all bits of the input are initialized
3402 /// We apply the same principle on a per-field basis for vectors.
3403 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3404 getShadowTy(&I));
3405 setShadow(&I, OutShadow);
3406 setOriginForNaryOp(I);
3407 }
3408
3409 /// Some instructions have additional zero-elements in the return type
3410 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3411 ///
3412 /// This function will return a vector type with the same number of elements
3413 /// as the input, but the same per-element width as the return value, e.g.,
3414 /// <8 x i8>.
3415 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3416 assert(isa<FixedVectorType>(getShadowTy(&I)));
3417 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3418
3419 // TODO: generalize beyond 2x?
3420 if (ShadowType->getElementCount() ==
3421 cast<VectorType>(Src->getType())->getElementCount() * 2)
3422 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3423
3424 assert(ShadowType->getElementCount() ==
3425 cast<VectorType>(Src->getType())->getElementCount());
3426
3427 return ShadowType;
3428 }
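// e.g., for the pmov.qb.512 signature above, getShadowTy(&I) is <16 x i8> but
// the source has 8 elements, so the returned shadow type is <8 x i8>; the
// dropped lanes correspond to the zeroed tail of the return value.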
3429
3430 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3431 /// to match the length of the shadow for the instruction.
3432 /// If scalar types of the vectors are different, it will use the type of the
3433 /// input vector.
3434 /// This is more type-safe than CreateShadowCast().
3435 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3436 IRBuilder<> IRB(&I);
3437 assert(isa<FixedVectorType>(Shadow->getType()));
3438 assert(isa<FixedVectorType>(I.getType()));
3439
3440 Value *FullShadow = getCleanShadow(&I);
3441 unsigned ShadowNumElems =
3442 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3443 unsigned FullShadowNumElems =
3444 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3445
3446 assert((ShadowNumElems == FullShadowNumElems) ||
3447 (ShadowNumElems * 2 == FullShadowNumElems));
3448
3449 if (ShadowNumElems == FullShadowNumElems) {
3450 FullShadow = Shadow;
3451 } else {
3452 // TODO: generalize beyond 2x?
3453 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3454 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3455
3456 // Append zeros
3457 FullShadow =
3458 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3459 }
3460
3461 return FullShadow;
3462 }
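// Sketch of the extension above: if Shadow is <8 x i16> and the instruction's
// shadow type is <16 x i16>, ShadowMask is [0..15]; indices 8..15 select from
// the second shuffle operand (the clean, all-zero vector), so the result is
// the original 8 shadow lanes followed by 8 zero (initialized) lanes.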
3463
3464 /// Handle x86 SSE vector conversion.
3465 ///
3466 /// e.g., single-precision to half-precision conversion:
3467 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3468 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3469 ///
3470 /// floating-point to integer:
3471 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3472 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3473 ///
3474 /// Note: if the output has more elements, they are zero-initialized (and
3475 /// therefore the shadow will also be initialized).
3476 ///
3477 /// This differs from handleSSEVectorConvertIntrinsic() because it
3478 /// propagates uninitialized shadow (instead of checking the shadow).
3479 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3480 bool HasRoundingMode) {
3481 if (HasRoundingMode) {
3482 assert(I.arg_size() == 2);
3483 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3484 assert(RoundingMode->getType()->isIntegerTy());
3485 } else {
3486 assert(I.arg_size() == 1);
3487 }
3488
3489 Value *Src = I.getArgOperand(0);
3490 assert(Src->getType()->isVectorTy());
3491
3492 // The return type might have more elements than the input.
3493 // Temporarily shrink the return type's number of elements.
3494 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3495
3496 IRBuilder<> IRB(&I);
3497 Value *S0 = getShadow(&I, 0);
3498
3499 /// For scalars:
3500 /// Since they are converting to and/or from floating-point, the output is:
3501 /// - fully uninitialized if *any* bit of the input is uninitialized
3502 /// - fully initialized if all bits of the input are initialized
3503 /// We apply the same principle on a per-field basis for vectors.
3504 Value *Shadow =
3505 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3506
3507 // The return type might have more elements than the input.
3508 // Extend the return type back to its original width if necessary.
3509 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3510
3511 setShadow(&I, FullShadow);
3512 setOriginForNaryOp(I);
3513 }
3514
3515 // Instrument x86 SSE vector convert intrinsic.
3516 //
3517 // This function instruments intrinsics like cvtsi2ss:
3518 // %Out = int_xxx_cvtyyy(%ConvertOp)
3519 // or
3520 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3521 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3522 // number of \p Out elements, and (if it has 2 arguments) copies the rest of
3523 // the elements from \p CopyOp.
3524 // In most cases conversion involves a floating-point value which may trigger
3525 // a hardware exception when not fully initialized. For this reason we require
3526 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3527 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3528 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3529 // return a fully initialized value.
3530 //
3531 // For Arm NEON vector convert intrinsics, see
3532 // handleNEONVectorConvertIntrinsic().
3533 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3534 bool HasRoundingMode = false) {
3535 IRBuilder<> IRB(&I);
3536 Value *CopyOp, *ConvertOp;
3537
3538 assert((!HasRoundingMode ||
3539 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3540 "Invalid rounding mode");
3541
3542 switch (I.arg_size() - HasRoundingMode) {
3543 case 2:
3544 CopyOp = I.getArgOperand(0);
3545 ConvertOp = I.getArgOperand(1);
3546 break;
3547 case 1:
3548 ConvertOp = I.getArgOperand(0);
3549 CopyOp = nullptr;
3550 break;
3551 default:
3552 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3553 }
3554
3555 // The first *NumUsedElements* elements of ConvertOp are converted to the
3556 // same number of output elements. The rest of the output is copied from
3557 // CopyOp, or (if not available) filled with zeroes.
3558 // Combine shadow for elements of ConvertOp that are used in this operation,
3559 // and insert a check.
3560 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3561 // int->any conversion.
3562 Value *ConvertShadow = getShadow(ConvertOp);
3563 Value *AggShadow = nullptr;
3564 if (ConvertOp->getType()->isVectorTy()) {
3565 AggShadow = IRB.CreateExtractElement(
3566 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3567 for (int i = 1; i < NumUsedElements; ++i) {
3568 Value *MoreShadow = IRB.CreateExtractElement(
3569 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3570 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3571 }
3572 } else {
3573 AggShadow = ConvertShadow;
3574 }
3575 assert(AggShadow->getType()->isIntegerTy());
3576 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3577
3578 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3579 // ConvertOp.
3580 if (CopyOp) {
3581 assert(CopyOp->getType() == I.getType());
3582 assert(CopyOp->getType()->isVectorTy());
3583 Value *ResultShadow = getShadow(CopyOp);
3584 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3585 for (int i = 0; i < NumUsedElements; ++i) {
3586 ResultShadow = IRB.CreateInsertElement(
3587 ResultShadow, ConstantInt::getNullValue(EltTy),
3588 ConstantInt::get(IRB.getInt32Ty(), i));
3589 }
3590 setShadow(&I, ResultShadow);
3591 setOrigin(&I, getOrigin(CopyOp));
3592 } else {
3593 setShadow(&I, getCleanShadow(&I));
3594 setOrigin(&I, getCleanOrigin());
3595 }
3596 }
3597
3598 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3599 // zeroes if it is zero, and all ones otherwise.
3600 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3601 if (S->getType()->isVectorTy())
3602 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3603 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3604 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3605 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3606 }
3607
3608 // Given a vector, extract its first element, and return all
3609 // zeroes if it is zero, and all ones otherwise.
3610 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3611 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3612 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3613 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3614 }
3615
3616 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3617 Type *T = S->getType();
3618 assert(T->isVectorTy());
3619 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3620 return IRB.CreateSExt(S2, T);
3621 }
3622
3623 // Instrument vector shift intrinsic.
3624 //
3625 // This function instruments intrinsics like int_x86_avx2_psll_w.
3626 // Intrinsic shifts %In by %ShiftSize bits.
3627 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3628 // size, and the rest is ignored. Behavior is defined even if shift size is
3629 // greater than register (or field) width.
3630 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3631 assert(I.arg_size() == 2);
3632 IRBuilder<> IRB(&I);
3633 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3634 // Otherwise perform the same shift on S1.
3635 Value *S1 = getShadow(&I, 0);
3636 Value *S2 = getShadow(&I, 1);
3637 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3638 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3639 Value *V1 = I.getOperand(0);
3640 Value *V2 = I.getOperand(1);
3641 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3642 {IRB.CreateBitCast(S1, V1->getType()), V2});
3643 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3644 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3645 setOriginForNaryOp(I);
3646 }
3647
3648 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3649 // vectors.
3650 Type *getMMXVectorTy(unsigned EltSizeInBits,
3651 unsigned X86_MMXSizeInBits = 64) {
3652 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3653 "Illegal MMX vector element size");
3654 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3655 X86_MMXSizeInBits / EltSizeInBits);
3656 }
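// e.g., getMMXVectorTy(16) returns <4 x i16> (64 / 16 elements), and
// getMMXVectorTy(8, 128) returns <16 x i8>.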
3657
3658 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3659 // intrinsic.
3660 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3661 switch (id) {
3662 case Intrinsic::x86_sse2_packsswb_128:
3663 case Intrinsic::x86_sse2_packuswb_128:
3664 return Intrinsic::x86_sse2_packsswb_128;
3665
3666 case Intrinsic::x86_sse2_packssdw_128:
3667 case Intrinsic::x86_sse41_packusdw:
3668 return Intrinsic::x86_sse2_packssdw_128;
3669
3670 case Intrinsic::x86_avx2_packsswb:
3671 case Intrinsic::x86_avx2_packuswb:
3672 return Intrinsic::x86_avx2_packsswb;
3673
3674 case Intrinsic::x86_avx2_packssdw:
3675 case Intrinsic::x86_avx2_packusdw:
3676 return Intrinsic::x86_avx2_packssdw;
3677
3678 case Intrinsic::x86_mmx_packsswb:
3679 case Intrinsic::x86_mmx_packuswb:
3680 return Intrinsic::x86_mmx_packsswb;
3681
3682 case Intrinsic::x86_mmx_packssdw:
3683 return Intrinsic::x86_mmx_packssdw;
3684
3685 case Intrinsic::x86_avx512_packssdw_512:
3686 case Intrinsic::x86_avx512_packusdw_512:
3687 return Intrinsic::x86_avx512_packssdw_512;
3688
3689 case Intrinsic::x86_avx512_packsswb_512:
3690 case Intrinsic::x86_avx512_packuswb_512:
3691 return Intrinsic::x86_avx512_packsswb_512;
3692
3693 default:
3694 llvm_unreachable("unexpected intrinsic id");
3695 }
3696 }
3697
3698 // Instrument vector pack intrinsic.
3699 //
3700 // This function instruments intrinsics like x86_mmx_packsswb, that
3701 // packs elements of 2 input vectors into half as many bits with saturation.
3702 // Shadow is propagated with the signed variant of the same intrinsic applied
3703 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3704 // MMXEltSizeInBits is used only for x86mmx arguments.
3705 //
3706 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
3707 void handleVectorPackIntrinsic(IntrinsicInst &I,
3708 unsigned MMXEltSizeInBits = 0) {
3709 assert(I.arg_size() == 2);
3710 IRBuilder<> IRB(&I);
3711 Value *S1 = getShadow(&I, 0);
3712 Value *S2 = getShadow(&I, 1);
3713 assert(S1->getType()->isVectorTy());
3714
3715 // SExt and ICmpNE below must apply to individual elements of input vectors.
3716 // In case of x86mmx arguments, cast them to appropriate vector types and
3717 // back.
3718 Type *T =
3719 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3720 if (MMXEltSizeInBits) {
3721 S1 = IRB.CreateBitCast(S1, T);
3722 S2 = IRB.CreateBitCast(S2, T);
3723 }
3724 Value *S1_ext =
3725 IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3726 Value *S2_ext =
3727 IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3728 if (MMXEltSizeInBits) {
3729 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3730 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3731 }
3732
3733 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3734 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3735 "_msprop_vector_pack");
3736 if (MMXEltSizeInBits)
3737 S = IRB.CreateBitCast(S, getShadowTy(&I));
3738 setShadow(&I, S);
3739 setOriginForNaryOp(I);
3740 }
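// A worked case of the pack instrumentation above: sext(S != 0) turns each
// poisoned i16 element into 0xFFFF (-1) and each clean one into 0; the signed
// pack then saturates -1 to -1 and 0 to 0, so every packed i8 shadow element
// is all-ones exactly when its source element was poisoned.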
3741
3742 // Convert `Mask` into `<n x i1>`.
3743 Constant *createDppMask(unsigned Width, unsigned Mask) {
3744 SmallVector<Constant *, 4> R(Width);
3745 for (auto &M : R) {
3746 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3747 Mask >>= 1;
3748 }
3749 return ConstantVector::get(R);
3750 }
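// e.g., createDppMask(4, 0x5) yields <i1 true, i1 false, i1 true, i1 false>:
// bit 0 of the mask maps to element 0, bit 1 to element 1, and so on.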
3751
3752 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3753 // arg is poisoned, entire dot product is poisoned.
3754 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3755 unsigned DstMask) {
3756 const unsigned Width =
3757 cast<FixedVectorType>(S->getType())->getNumElements();
3758
3759 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3760 Constant::getNullValue(S->getType()));
3761 Value *SElem = IRB.CreateOrReduce(S);
3762 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3763 Value *DstMaskV = createDppMask(Width, DstMask);
3764
3765 return IRB.CreateSelect(
3766 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3767 }
3768
3769 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3770 //
3771 // The 2- and 4-element versions produce a single scalar dot product and then
3772 // put it into the elements of the output vector selected by the 4 lowest bits
3773 // of the mask. The top 4 bits of the mask control which elements of the input
3774 // to use for the dot product.
3775 //
3776 // The 8-element version's mask still has only 4 bits for the input and 4 bits
3777 // for the output mask. According to the spec it simply operates as the
3778 // 4-element version on the first 4 elements of inputs and output, and then on
3779 // the last 4 elements of inputs and output.
3780 void handleDppIntrinsic(IntrinsicInst &I) {
3781 IRBuilder<> IRB(&I);
3782
3783 Value *S0 = getShadow(&I, 0);
3784 Value *S1 = getShadow(&I, 1);
3785 Value *S = IRB.CreateOr(S0, S1);
3786
3787 const unsigned Width =
3788 cast<FixedVectorType>(S->getType())->getNumElements();
3789 assert(Width == 2 || Width == 4 || Width == 8);
3790
3791 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3792 const unsigned SrcMask = Mask >> 4;
3793 const unsigned DstMask = Mask & 0xf;
3794
3795 // Calculate shadow as `<n x i1>`.
3796 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3797 if (Width == 8) {
3798 // First 4 elements of shadow are already calculated. `makeDppShadow`
3799 // operates on 32 bit masks, so we can just shift masks, and repeat.
3800 SI1 = IRB.CreateOr(
3801 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3802 }
3803 // Extend to real size of shadow, poisoning either all or none bits of an
3804 // element.
3805 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3806
3807 setShadow(&I, S);
3808 setOriginForNaryOp(I);
3809 }
3810
3811 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3812 C = CreateAppToShadowCast(IRB, C);
3813 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3814 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3815 C = IRB.CreateAShr(C, ElSize - 1);
3816 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3817 return IRB.CreateTrunc(C, FVT);
3818 }
3819
3820 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3821 void handleBlendvIntrinsic(IntrinsicInst &I) {
3822 Value *C = I.getOperand(2);
3823 Value *T = I.getOperand(1);
3824 Value *F = I.getOperand(0);
3825
3826 Value *Sc = getShadow(&I, 2);
3827 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3828
3829 {
3830 IRBuilder<> IRB(&I);
3831 // Extract top bit from condition and its shadow.
3832 C = convertBlendvToSelectMask(IRB, C);
3833 Sc = convertBlendvToSelectMask(IRB, Sc);
3834
3835 setShadow(C, Sc);
3836 setOrigin(C, Oc);
3837 }
3838
3839 handleSelectLikeInst(I, C, T, F);
3840 }
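// e.g., for blendvps, lane i of the result is T[i] when the top (sign) bit of
// C[i] is set and F[i] otherwise; the code above therefore extracts the top
// bits of C and of C's shadow and reuses the generic select instrumentation
// via handleSelectLikeInst().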
3841
3842 // Instrument sum-of-absolute-differences intrinsic.
3843 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3844 const unsigned SignificantBitsPerResultElement = 16;
3845 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3846 unsigned ZeroBitsPerResultElement =
3847 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3848
3849 IRBuilder<> IRB(&I);
3850 auto *Shadow0 = getShadow(&I, 0);
3851 auto *Shadow1 = getShadow(&I, 1);
3852 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3853 S = IRB.CreateBitCast(S, ResTy);
3854 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3855 ResTy);
3856 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3857 S = IRB.CreateBitCast(S, getShadowTy(&I));
3858 setShadow(&I, S);
3859 setOriginForNaryOp(I);
3860 }
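// For @llvm.x86.sse2.psad.bw, each 64-bit result element is a sum of absolute
// differences that fits in 16 bits, so its upper 48 bits are always zero and
// hence initialized; the sext above marks the whole element and the final
// LShr by ZeroBitsPerResultElement (48) clears the shadow of those known-zero
// bits again.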
3861
3862 // Instrument multiply-add(-accumulate)? intrinsics.
3863 //
3864 // e.g., Two operands:
3865 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3866 //
3867 // Two operands which require an EltSizeInBits override:
3868 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3869 //
3870 // Three operands:
3871 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3872 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3873 // (this is equivalent to multiply-add on %a and %b, followed by
3874 // adding/"accumulating" %s. "Accumulation" stores the result in one
3875 // of the source registers, but this accumulate vs. add distinction
3876 // is lost when dealing with LLVM intrinsics.)
3877 void handleVectorPmaddIntrinsic(IntrinsicInst &I, unsigned ReductionFactor,
3878 unsigned EltSizeInBits = 0) {
3879 IRBuilder<> IRB(&I);
3880
3881 [[maybe_unused]] FixedVectorType *ReturnType =
3882 cast<FixedVectorType>(I.getType());
3883 assert(isa<FixedVectorType>(ReturnType));
3884
3885 // Vectors A and B, and shadows
3886 Value *Va = nullptr;
3887 Value *Vb = nullptr;
3888 Value *Sa = nullptr;
3889 Value *Sb = nullptr;
3890
3891 assert(I.arg_size() == 2 || I.arg_size() == 3);
3892 if (I.arg_size() == 2) {
3893 Va = I.getOperand(0);
3894 Vb = I.getOperand(1);
3895
3896 Sa = getShadow(&I, 0);
3897 Sb = getShadow(&I, 1);
3898 } else if (I.arg_size() == 3) {
3899 // Operand 0 is the accumulator. We will deal with that below.
3900 Va = I.getOperand(1);
3901 Vb = I.getOperand(2);
3902
3903 Sa = getShadow(&I, 1);
3904 Sb = getShadow(&I, 2);
3905 }
3906
3907 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
3908 assert(ParamType == Vb->getType());
3909
3910 assert(ParamType->getPrimitiveSizeInBits() ==
3911 ReturnType->getPrimitiveSizeInBits());
3912
3913 if (I.arg_size() == 3) {
3914 [[maybe_unused]] auto *AccumulatorType =
3915 cast<FixedVectorType>(I.getOperand(0)->getType());
3916 assert(AccumulatorType == ReturnType);
3917 }
3918
3919 FixedVectorType *ImplicitReturnType = ReturnType;
3920 // Step 1: instrument multiplication of corresponding vector elements
3921 if (EltSizeInBits) {
3922 ImplicitReturnType = cast<FixedVectorType>(
3923 getMMXVectorTy(EltSizeInBits * ReductionFactor,
3924 ParamType->getPrimitiveSizeInBits()));
3925 ParamType = cast<FixedVectorType>(
3926 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
3927
3928 Va = IRB.CreateBitCast(Va, ParamType);
3929 Vb = IRB.CreateBitCast(Vb, ParamType);
3930
3931 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
3932 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
3933 } else {
3934 assert(ParamType->getNumElements() ==
3935 ReturnType->getNumElements() * ReductionFactor);
3936 }
3937
3938 // Multiplying an *initialized* zero by an uninitialized element results in
3939 // an initialized zero element.
3940 //
3941 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
3942 // results in an unpoisoned value. We can therefore adapt the visitAnd()
3943 // instrumentation:
3944 // OutShadow = (SaNonZero & SbNonZero)
3945 // | (VaNonZero & SbNonZero)
3946 // | (SaNonZero & VbNonZero)
3947 // where non-zero is checked on a per-element basis (not per bit).
3948 Value *SZero = Constant::getNullValue(Va->getType());
3949 Value *VZero = Constant::getNullValue(Sa->getType());
3950 Value *SaNonZero = IRB.CreateICmpNE(Sa, SZero);
3951 Value *SbNonZero = IRB.CreateICmpNE(Sb, SZero);
3952 Value *VaNonZero = IRB.CreateICmpNE(Va, VZero);
3953 Value *VbNonZero = IRB.CreateICmpNE(Vb, VZero);
3954
3955 Value *SaAndSbNonZero = IRB.CreateAnd(SaNonZero, SbNonZero);
3956 Value *VaAndSbNonZero = IRB.CreateAnd(VaNonZero, SbNonZero);
3957 Value *SaAndVbNonZero = IRB.CreateAnd(SaNonZero, VbNonZero);
3958
3959 // Each element of the vector is represented by a single bit (poisoned or
3960 // not) e.g., <8 x i1>.
3961 Value *And = IRB.CreateOr({SaAndSbNonZero, VaAndSbNonZero, SaAndVbNonZero});
3962
3963 // Extend <8 x i1> to <8 x i16>.
3964 // (The real pmadd intrinsic would have computed intermediate values of
3965 // <8 x i32>, but that is irrelevant for our shadow purposes because we
3966 // consider each element to be either fully initialized or fully
3967 // uninitialized.)
3968 And = IRB.CreateSExt(And, Sa->getType());
3969
3970 // Step 2: instrument horizontal add
3971 // We don't need bit-precise horizontalReduce because we only want to check
3972 // if each pair/quad of elements is fully zero.
3973 // Cast to <4 x i32>.
3974 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
3975
3976 // Compute <4 x i1>, then extend back to <4 x i32>.
3977 Value *OutShadow = IRB.CreateSExt(
3978 IRB.CreateICmpNE(Horizontal,
3979 Constant::getNullValue(Horizontal->getType())),
3980 ImplicitReturnType);
3981
3982 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
3983 // AVX, it is already correct).
3984 if (EltSizeInBits)
3985 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
3986
3987 // Step 3 (if applicable): instrument accumulator
3988 if (I.arg_size() == 3)
3989 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
3990
3991 setShadow(&I, OutShadow);
3992 setOriginForNaryOp(I);
3993 }
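// For intuition, with @llvm.x86.sse2.pmadd.wd: if a lane of %a is fully
// uninitialized but the matching lane of %b is an initialized zero, their
// product is an initialized zero and does not poison the pair's i32 result.
// A result element becomes poisoned only if, for one of its two products,
// (SaNonZero & SbNonZero) | (VaNonZero & SbNonZero) | (SaNonZero & VbNonZero)
// holds for that lane, per the AND-style rule above.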
3994
3995 // Instrument compare-packed intrinsic.
3996 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
3997 // all-ones shadow.
3998 void handleVectorComparePackedIntrinsic(IntrinsicInst &I) {
3999 IRBuilder<> IRB(&I);
4000 Type *ResTy = getShadowTy(&I);
4001 auto *Shadow0 = getShadow(&I, 0);
4002 auto *Shadow1 = getShadow(&I, 1);
4003 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4004 Value *S = IRB.CreateSExt(
4005 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4006 setShadow(&I, S);
4007 setOriginForNaryOp(I);
4008 }
4009
4010 // Instrument compare-scalar intrinsic.
4011 // This handles both cmp* intrinsics which return the result in the first
4012 // element of a vector, and comi* which return the result as i32.
4013 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4014 IRBuilder<> IRB(&I);
4015 auto *Shadow0 = getShadow(&I, 0);
4016 auto *Shadow1 = getShadow(&I, 1);
4017 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4018 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4019 setShadow(&I, S);
4020 setOriginForNaryOp(I);
4021 }
4022
4023 // Instrument generic vector reduction intrinsics
4024 // by ORing together all their fields.
4025 //
4026 // If AllowShadowCast is true, the return type does not need to be the same
4027 // type as the fields
4028 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4029 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4030 assert(I.arg_size() == 1);
4031
4032 IRBuilder<> IRB(&I);
4033 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4034 if (AllowShadowCast)
4035 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4036 else
4037 assert(S->getType() == getShadowTy(&I));
4038 setShadow(&I, S);
4039 setOriginForNaryOp(I);
4040 }
4041
4042 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4043 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4044 // %a1)
4045 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4046 //
4047 // The type of the return value, initial starting value, and elements of the
4048 // vector must be identical.
4049 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4050 assert(I.arg_size() == 2);
4051
4052 IRBuilder<> IRB(&I);
4053 Value *Shadow0 = getShadow(&I, 0);
4054 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4055 assert(Shadow0->getType() == Shadow1->getType());
4056 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4057 assert(S->getType() == getShadowTy(&I));
4058 setShadow(&I, S);
4059 setOriginForNaryOp(I);
4060 }
4061
4062 // Instrument vector.reduce.or intrinsic.
4063 // Valid (non-poisoned) set bits in the operand pull low the
4064 // corresponding shadow bits.
4065 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4066 assert(I.arg_size() == 1);
4067
4068 IRBuilder<> IRB(&I);
4069 Value *OperandShadow = getShadow(&I, 0);
4070 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4071 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4072 // Bit N is clean if any field's bit N is 1 and unpoison
4073 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4074 // Otherwise, it is clean if every field's bit N is unpoison
4075 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4076 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4077
4078 setShadow(&I, S);
4079 setOrigin(&I, getOrigin(&I, 0));
4080 }
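// A small example of the rule above, using 4-bit lanes: take elements 0b1100
// with shadow 0b0001 (bit 0 poisoned) and 0b0101 with a clean shadow. The
// and-reduce of (~Operand | Shadow) is 0b0011 & 0b1010 = 0b0010, the
// or-reduce of the shadows is 0b0001, and their AND is 0: the result is fully
// clean because the second element contributes a known 1 at bit 0 and every
// other bit position is initialized in all elements.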
4081
4082 // Instrument vector.reduce.and intrinsic.
4083 // Valid (non-poisoned) unset bits in the operand pull down the
4084 // corresponding shadow bits.
4085 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4086 assert(I.arg_size() == 1);
4087
4088 IRBuilder<> IRB(&I);
4089 Value *OperandShadow = getShadow(&I, 0);
4090 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4091 // Bit N is clean if any field's bit N is 0 and unpoison
4092 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4093 // Otherwise, it is clean if every field's bit N is unpoison
4094 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4095 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4096
4097 setShadow(&I, S);
4098 setOrigin(&I, getOrigin(&I, 0));
4099 }
4100
4101 void handleStmxcsr(IntrinsicInst &I) {
4102 IRBuilder<> IRB(&I);
4103 Value *Addr = I.getArgOperand(0);
4104 Type *Ty = IRB.getInt32Ty();
4105 Value *ShadowPtr =
4106 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4107
4108 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4109
4110 if (ClCheckAccessAddress)
4111 insertCheckShadowOf(Addr, &I);
4112 }
4113
4114 void handleLdmxcsr(IntrinsicInst &I) {
4115 if (!InsertChecks)
4116 return;
4117
4118 IRBuilder<> IRB(&I);
4119 Value *Addr = I.getArgOperand(0);
4120 Type *Ty = IRB.getInt32Ty();
4121 const Align Alignment = Align(1);
4122 Value *ShadowPtr, *OriginPtr;
4123 std::tie(ShadowPtr, OriginPtr) =
4124 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4125
4126 if (ClCheckAccessAddress)
4127 insertCheckShadowOf(Addr, &I);
4128
4129 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4130 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4131 : getCleanOrigin();
4132 insertCheckShadow(Shadow, Origin, &I);
4133 }
4134
4135 void handleMaskedExpandLoad(IntrinsicInst &I) {
4136 IRBuilder<> IRB(&I);
4137 Value *Ptr = I.getArgOperand(0);
4138 MaybeAlign Align = I.getParamAlign(0);
4139 Value *Mask = I.getArgOperand(1);
4140 Value *PassThru = I.getArgOperand(2);
4141
4142 if (ClCheckAccessAddress) {
4143 insertCheckShadowOf(Ptr, &I);
4144 insertCheckShadowOf(Mask, &I);
4145 }
4146
4147 if (!PropagateShadow) {
4148 setShadow(&I, getCleanShadow(&I));
4149 setOrigin(&I, getCleanOrigin());
4150 return;
4151 }
4152
4153 Type *ShadowTy = getShadowTy(&I);
4154 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4155 auto [ShadowPtr, OriginPtr] =
4156 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4157
4158 Value *Shadow =
4159 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4160 getShadow(PassThru), "_msmaskedexpload");
4161
4162 setShadow(&I, Shadow);
4163
4164 // TODO: Store origins.
4165 setOrigin(&I, getCleanOrigin());
4166 }
4167
4168 void handleMaskedCompressStore(IntrinsicInst &I) {
4169 IRBuilder<> IRB(&I);
4170 Value *Values = I.getArgOperand(0);
4171 Value *Ptr = I.getArgOperand(1);
4172 MaybeAlign Align = I.getParamAlign(1);
4173 Value *Mask = I.getArgOperand(2);
4174
4175 if (ClCheckAccessAddress) {
4176 insertCheckShadowOf(Ptr, &I);
4177 insertCheckShadowOf(Mask, &I);
4178 }
4179
4180 Value *Shadow = getShadow(Values);
4181 Type *ElementShadowTy =
4182 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4183 auto [ShadowPtr, OriginPtrs] =
4184 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4185
4186 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4187
4188 // TODO: Store origins.
4189 }
4190
4191 void handleMaskedGather(IntrinsicInst &I) {
4192 IRBuilder<> IRB(&I);
4193 Value *Ptrs = I.getArgOperand(0);
4194 const Align Alignment = I.getParamAlign(0).valueOrOne();
4195 Value *Mask = I.getArgOperand(1);
4196 Value *PassThru = I.getArgOperand(2);
4197
4198 Type *PtrsShadowTy = getShadowTy(Ptrs);
4199 if (ClCheckAccessAddress) {
4200 insertCheckShadowOf(Mask, &I);
4201 Value *MaskedPtrShadow = IRB.CreateSelect(
4202 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4203 "_msmaskedptrs");
4204 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4205 }
4206
4207 if (!PropagateShadow) {
4208 setShadow(&I, getCleanShadow(&I));
4209 setOrigin(&I, getCleanOrigin());
4210 return;
4211 }
4212
4213 Type *ShadowTy = getShadowTy(&I);
4214 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4215 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4216 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4217
4218 Value *Shadow =
4219 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4220 getShadow(PassThru), "_msmaskedgather");
4221
4222 setShadow(&I, Shadow);
4223
4224 // TODO: Store origins.
4225 setOrigin(&I, getCleanOrigin());
4226 }
4227
4228 void handleMaskedScatter(IntrinsicInst &I) {
4229 IRBuilder<> IRB(&I);
4230 Value *Values = I.getArgOperand(0);
4231 Value *Ptrs = I.getArgOperand(1);
4232 const Align Alignment = I.getParamAlign(1).valueOrOne();
4233 Value *Mask = I.getArgOperand(2);
4234
4235 Type *PtrsShadowTy = getShadowTy(Ptrs);
4236 if (ClCheckAccessAddress) {
4237 insertCheckShadowOf(Mask, &I);
4238 Value *MaskedPtrShadow = IRB.CreateSelect(
4239 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4240 "_msmaskedptrs");
4241 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4242 }
4243
4244 Value *Shadow = getShadow(Values);
4245 Type *ElementShadowTy =
4246 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4247 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4248 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4249
4250 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4251
4252 // TODO: Store origin.
4253 }
4254
4255 // Intrinsic::masked_store
4256 //
4257 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4258 // stores are lowered to Intrinsic::masked_store.
4259 void handleMaskedStore(IntrinsicInst &I) {
4260 IRBuilder<> IRB(&I);
4261 Value *V = I.getArgOperand(0);
4262 Value *Ptr = I.getArgOperand(1);
4263 const Align Alignment = I.getParamAlign(1).valueOrOne();
4264 Value *Mask = I.getArgOperand(2);
4265 Value *Shadow = getShadow(V);
4266
4267 if (ClCheckAccessAddress) {
4268 insertCheckShadowOf(Ptr, &I);
4269 insertCheckShadowOf(Mask, &I);
4270 }
4271
4272 Value *ShadowPtr;
4273 Value *OriginPtr;
4274 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4275 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4276
4277 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4278
4279 if (!MS.TrackOrigins)
4280 return;
4281
4282 auto &DL = F.getDataLayout();
4283 paintOrigin(IRB, getOrigin(V), OriginPtr,
4284 DL.getTypeStoreSize(Shadow->getType()),
4285 std::max(Alignment, kMinOriginAlignment));
4286 }
4287
4288 // Intrinsic::masked_load
4289 //
4290 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4291 // loads are lowered to Intrinsic::masked_load.
4292 void handleMaskedLoad(IntrinsicInst &I) {
4293 IRBuilder<> IRB(&I);
4294 Value *Ptr = I.getArgOperand(0);
4295 const Align Alignment = I.getParamAlign(0).valueOrOne();
4296 Value *Mask = I.getArgOperand(1);
4297 Value *PassThru = I.getArgOperand(2);
4298
4299 if (ClCheckAccessAddress) {
4300 insertCheckShadowOf(Ptr, &I);
4301 insertCheckShadowOf(Mask, &I);
4302 }
4303
4304 if (!PropagateShadow) {
4305 setShadow(&I, getCleanShadow(&I));
4306 setOrigin(&I, getCleanOrigin());
4307 return;
4308 }
4309
4310 Type *ShadowTy = getShadowTy(&I);
4311 Value *ShadowPtr, *OriginPtr;
4312 std::tie(ShadowPtr, OriginPtr) =
4313 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4314 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4315 getShadow(PassThru), "_msmaskedld"));
4316
4317 if (!MS.TrackOrigins)
4318 return;
4319
4320 // Choose between PassThru's and the loaded value's origins.
4321 Value *MaskedPassThruShadow = IRB.CreateAnd(
4322 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4323
4324 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4325
4326 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4327 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4328
4329 setOrigin(&I, Origin);
4330 }
4331
4332 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4333 // dst mask src
4334 //
4335 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4336 // by handleMaskedStore.
4337 //
4338 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4339 // vector of integers, unlike the LLVM masked intrinsics, which require a
4340 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4341 // mentions that the x86 backend does not know how to efficiently convert
4342 // from a vector of booleans back into the AVX mask format; therefore, they
4343 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4344 // intrinsics.
4345 void handleAVXMaskedStore(IntrinsicInst &I) {
4346 assert(I.arg_size() == 3);
4347
4348 IRBuilder<> IRB(&I);
4349
4350 Value *Dst = I.getArgOperand(0);
4351 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4352
4353 Value *Mask = I.getArgOperand(1);
4354 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4355
4356 Value *Src = I.getArgOperand(2);
4357 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4358
4359 const Align Alignment = Align(1);
4360
4361 Value *SrcShadow = getShadow(Src);
4362
4363 if (ClCheckAccessAddress) {
4364 insertCheckShadowOf(Dst, &I);
4365 insertCheckShadowOf(Mask, &I);
4366 }
4367
4368 Value *DstShadowPtr;
4369 Value *DstOriginPtr;
4370 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4371 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4372
4373 SmallVector<Value *, 2> ShadowArgs;
4374 ShadowArgs.append(1, DstShadowPtr);
4375 ShadowArgs.append(1, Mask);
4376 // The intrinsic may require floating-point but shadows can be arbitrary
4377 // bit patterns, of which some would be interpreted as "invalid"
4378 // floating-point values (NaN etc.); we assume the intrinsic will happily
4379 // copy them.
4380 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4381
4382 CallInst *CI =
4383 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4384 setShadow(&I, CI);
4385
4386 if (!MS.TrackOrigins)
4387 return;
4388
4389 // Approximation only
4390 auto &DL = F.getDataLayout();
4391 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4392 DL.getTypeStoreSize(SrcShadow->getType()),
4393 std::max(Alignment, kMinOriginAlignment));
4394 }
4395
4396 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4397 // return src mask
4398 //
4399 // Masked-off values are replaced with 0, which conveniently also represents
4400 // initialized memory.
4401 //
4402 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4403 // by handleMaskedLoad.
4404 //
4405 // We do not combine this with handleMaskedLoad; see comment in
4406 // handleAVXMaskedStore for the rationale.
4407 //
4408 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4409 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4410 // parameter.
4411 void handleAVXMaskedLoad(IntrinsicInst &I) {
4412 assert(I.arg_size() == 2);
4413
4414 IRBuilder<> IRB(&I);
4415
4416 Value *Src = I.getArgOperand(0);
4417 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4418
4419 Value *Mask = I.getArgOperand(1);
4420 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4421
4422 const Align Alignment = Align(1);
4423
4424 if (ClCheckAccessAddress) {
4425 insertCheckShadowOf(Mask, &I);
4426 }
4427
4428 Type *SrcShadowTy = getShadowTy(Src);
4429 Value *SrcShadowPtr, *SrcOriginPtr;
4430 std::tie(SrcShadowPtr, SrcOriginPtr) =
4431 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4432
4433 SmallVector<Value *, 2> ShadowArgs;
4434 ShadowArgs.append(1, SrcShadowPtr);
4435 ShadowArgs.append(1, Mask);
4436
4437 CallInst *CI =
4438 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4439 // The AVX masked load intrinsics do not have integer variants. We use the
4440 // floating-point variants, which will happily copy the shadows even if
4441 // they are interpreted as "invalid" floating-point values (NaN etc.).
4442 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4443
4444 if (!MS.TrackOrigins)
4445 return;
4446
4447 // The "pass-through" value is always zero (initialized). To the extent
4448 // that that results in initialized aligned 4-byte chunks, the origin value
4449 // is ignored. It is therefore correct to simply copy the origin from src.
4450 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4451 setOrigin(&I, PtrSrcOrigin);
4452 }
4453
4454 // Test whether the mask indices are initialized, only checking the bits that
4455 // are actually used.
4456 //
4457 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4458 // used/checked.
4459 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4460 assert(isFixedIntVector(Idx));
4461 auto IdxVectorSize =
4462 cast<FixedVectorType>(Idx->getType())->getNumElements();
4463 assert(isPowerOf2_64(IdxVectorSize));
4464
4465 // Compiler isn't smart enough, let's help it
4466 if (isa<Constant>(Idx))
4467 return;
4468
4469 auto *IdxShadow = getShadow(Idx);
4470 Value *Truncated = IRB.CreateTrunc(
4471 IdxShadow,
4472 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4473 IdxVectorSize));
4474 insertCheckShadow(Truncated, getOrigin(Idx), I);
4475 }
4476
4477 // Instrument AVX permutation intrinsic.
4478 // We apply the same permutation (argument index 1) to the shadow.
4479 void handleAVXVpermilvar(IntrinsicInst &I) {
4480 IRBuilder<> IRB(&I);
4481 Value *Shadow = getShadow(&I, 0);
4482 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4483
4484 // Shadows are integer-ish types but some intrinsics require a
4485 // different (e.g., floating-point) type.
4486 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4487 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4488 {Shadow, I.getArgOperand(1)});
4489
4490 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4491 setOriginForNaryOp(I);
4492 }
4493
4494 // Instrument AVX permutation intrinsic.
4495 // We apply the same permutation (argument index 1) to the shadows.
4496 void handleAVXVpermi2var(IntrinsicInst &I) {
4497 assert(I.arg_size() == 3);
4498 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4499 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4500 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4501 [[maybe_unused]] auto ArgVectorSize =
4502 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4503 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4504 ->getNumElements() == ArgVectorSize);
4505 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4506 ->getNumElements() == ArgVectorSize);
4507 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4508 assert(I.getType() == I.getArgOperand(0)->getType());
4509 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4510 IRBuilder<> IRB(&I);
4511 Value *AShadow = getShadow(&I, 0);
4512 Value *Idx = I.getArgOperand(1);
4513 Value *BShadow = getShadow(&I, 2);
4514
4515 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4516
4517 // Shadows are integer-ish types but some intrinsics require a
4518 // different (e.g., floating-point) type.
4519 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4520 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4521 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4522 {AShadow, Idx, BShadow});
4523 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4524 setOriginForNaryOp(I);
4525 }
4526
4527 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4528 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4529 }
4530
4531 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4532 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4533 }
4534
4535 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4536 return isFixedIntVectorTy(V->getType());
4537 }
4538
4539 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4540 return isFixedFPVectorTy(V->getType());
4541 }
4542
4543 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4544 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4545 // i32 rounding)
4546 //
4547 // Inconveniently, some similar intrinsics have a different operand order:
4548 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4549 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4550 // i16 mask)
4551 //
4552 // If the return type has more elements than A, the excess elements are
4553 // zeroed (and the corresponding shadow is initialized).
4554 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4555 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4556 // i8 mask)
4557 //
4558 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4559 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4560 // where all_or_nothing(x) is fully uninitialized if x has any
4561 // uninitialized bits
4562 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4563 IRBuilder<> IRB(&I);
4564
4565 assert(I.arg_size() == 4);
4566 Value *A = I.getOperand(0);
4567 Value *WriteThrough;
4568 Value *Mask;
4569 Value *RoundingMode;
4570 if (LastMask) {
4571 WriteThrough = I.getOperand(2);
4572 Mask = I.getOperand(3);
4573 RoundingMode = I.getOperand(1);
4574 } else {
4575 WriteThrough = I.getOperand(1);
4576 Mask = I.getOperand(2);
4577 RoundingMode = I.getOperand(3);
4578 }
4579
4580 assert(isFixedFPVector(A));
4581 assert(isFixedIntVector(WriteThrough));
4582
4583 unsigned ANumElements =
4584 cast<FixedVectorType>(A->getType())->getNumElements();
4585 [[maybe_unused]] unsigned WriteThruNumElements =
4586 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4587 assert(ANumElements == WriteThruNumElements ||
4588 ANumElements * 2 == WriteThruNumElements);
4589
4590 assert(Mask->getType()->isIntegerTy());
4591 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4592 assert(ANumElements == MaskNumElements ||
4593 ANumElements * 2 == MaskNumElements);
4594
4595 assert(WriteThruNumElements == MaskNumElements);
4596
4597 // Some bits of the mask may be unused, though it's unusual to have partly
4598 // uninitialized bits.
4599 insertCheckShadowOf(Mask, &I);
4600
4601 assert(RoundingMode->getType()->isIntegerTy());
4602 // Only some bits of the rounding mode are used, though it's very
4603 // unusual to have uninitialized bits there (more commonly, it's a
4604 // constant).
4605 insertCheckShadowOf(RoundingMode, &I);
4606
4607 assert(I.getType() == WriteThrough->getType());
4608
4609 Value *AShadow = getShadow(A);
4610 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4611
4612 if (ANumElements * 2 == MaskNumElements) {
4613 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4614 // from the zeroed shadow instead of the writethrough's shadow.
4615 Mask =
4616 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4617 Mask =
4618 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4619 }
4620
4621 // Convert i16 mask to <16 x i1>
4622 Mask = IRB.CreateBitCast(
4623 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4624 "_ms_mask_bitcast");
4625
4626 /// For floating-point to integer conversion, the output is:
4627 /// - fully uninitialized if *any* bit of the input is uninitialized
4628 /// - fully initialized if all bits of the input are initialized
4629 /// We apply the same principle on a per-element basis for vectors.
4630 ///
4631 /// We use the scalar width of the return type instead of A's.
4632 AShadow = IRB.CreateSExt(
4633 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4634 getShadowTy(&I), "_ms_a_shadow");
4635
4636 Value *WriteThroughShadow = getShadow(WriteThrough);
4637 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4638 "_ms_writethru_select");
4639
4640 setShadow(&I, Shadow);
4641 setOriginForNaryOp(I);
4642 }
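// Illustrative sketch (not part of the original source; value names are
// hypothetical): for
//   <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
//       (<16 x float> %a, <16 x i32> %wt, i16 %mask, i32 %rounding)
// the shadow computation above roughly corresponds to:
//   %acmp  = icmp ne <16 x i32> %a_shadow, zeroinitializer
//   %ash   = sext <16 x i1> %acmp to <16 x i32>
//   %maskv = bitcast i16 %mask to <16 x i1>
//   %dstsh = select <16 x i1> %maskv, <16 x i32> %ash, <16 x i32> %wt_shadow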
4643
4644 // Instrument BMI / BMI2 intrinsics.
4645 // All of these intrinsics are Z = I(X, Y)
4646 // where the types of all operands and the result match, and are either i32 or
4647 // i64. The following instrumentation happens to work for all of them:
4648 // Sz = I(Sx, Y) | (sext (Sy != 0))
4649 void handleBmiIntrinsic(IntrinsicInst &I) {
4650 IRBuilder<> IRB(&I);
4651 Type *ShadowTy = getShadowTy(&I);
4652
4653 // If any bit of the mask operand is poisoned, then the whole thing is.
4654 Value *SMask = getShadow(&I, 1);
4655 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4656 ShadowTy);
4657 // Apply the same intrinsic to the shadow of the first operand.
4658 Value *S = IRB.CreateCall(I.getCalledFunction(),
4659 {getShadow(&I, 0), I.getOperand(1)});
4660 S = IRB.CreateOr(SMask, S);
4661 setShadow(&I, S);
4662 setOriginForNaryOp(I);
4663 }
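// Illustrative sketch (not part of the original source; value names are
// hypothetical): for i32 @llvm.x86.bmi.bzhi.32(i32 %x, i32 %y), the handler
// above emits roughly:
//   %sy  = sext (icmp ne i32 %y_shadow, 0) to i32
//   %sop = call i32 @llvm.x86.bmi.bzhi.32(i32 %x_shadow, i32 %y)
//   %sz  = or i32 %sy, %sop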
4664
4665 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4666 SmallVector<int, 8> Mask;
4667 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4668 Mask.append(2, X);
4669 }
4670 return Mask;
4671 }
4672
4673 // Instrument pclmul intrinsics.
4674 // These intrinsics operate either on odd or on even elements of the input
4675 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4676 // Replace the unused elements with copies of the used ones, ex:
4677 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4678 // or
4679 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4680 // and then apply the usual shadow combining logic.
4681 void handlePclmulIntrinsic(IntrinsicInst &I) {
4682 IRBuilder<> IRB(&I);
4683 unsigned Width =
4684 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4685 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4686 "pclmul 3rd operand must be a constant");
4687 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4688 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4689 getPclmulMask(Width, Imm & 0x01));
4690 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4691 getPclmulMask(Width, Imm & 0x10));
4692 ShadowAndOriginCombiner SOC(this, IRB);
4693 SOC.Add(Shuf0, getOrigin(&I, 0));
4694 SOC.Add(Shuf1, getOrigin(&I, 1));
4695 SOC.Done(&I);
4696 }
4697
4698 // Instrument _mm_*_sd|ss intrinsics
4699 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4700 IRBuilder<> IRB(&I);
4701 unsigned Width =
4702 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4703 Value *First = getShadow(&I, 0);
4704 Value *Second = getShadow(&I, 1);
4705 // First element of second operand, remaining elements of first operand
4706 SmallVector<int, 16> Mask;
4707 Mask.push_back(Width);
4708 for (unsigned i = 1; i < Width; i++)
4709 Mask.push_back(i);
4710 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4711
4712 setShadow(&I, Shadow);
4713 setOriginForNaryOp(I);
4714 }
4715
4716 void handleVtestIntrinsic(IntrinsicInst &I) {
4717 IRBuilder<> IRB(&I);
4718 Value *Shadow0 = getShadow(&I, 0);
4719 Value *Shadow1 = getShadow(&I, 1);
4720 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4721 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4722 Value *Scalar = convertShadowToScalar(NZ, IRB);
4723 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4724
4725 setShadow(&I, Shadow);
4726 setOriginForNaryOp(I);
4727 }
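// Illustrative sketch (not part of the original source; value names are
// hypothetical): for i32 @llvm.x86.sse41.ptestz(<2 x i64> %a, <2 x i64> %b),
// the code above approximates the i32 result as uninitialized iff any bit of
// either input's shadow is set:
//   %or = or <2 x i64> %a_shadow, %b_shadow
//   %nz = icmp ne <2 x i64> %or, zeroinitializer
// which is then collapsed to a scalar and zero-extended into the result
// shadow.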
4728
4729 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4730 IRBuilder<> IRB(&I);
4731 unsigned Width =
4732 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4733 Value *First = getShadow(&I, 0);
4734 Value *Second = getShadow(&I, 1);
4735 Value *OrShadow = IRB.CreateOr(First, Second);
4736 // First element of both OR'd together, remaining elements of first operand
4737 SmallVector<int, 16> Mask;
4738 Mask.push_back(Width);
4739 for (unsigned i = 1; i < Width; i++)
4740 Mask.push_back(i);
4741 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4742
4743 setShadow(&I, Shadow);
4744 setOriginForNaryOp(I);
4745 }
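// Illustrative example (not part of the original source): for
//   <4 x float> @llvm.x86.sse.min.ss(<4 x float> %a, <4 x float> %b)
// Width == 4 and Mask == <4, 1, 2, 3>, so the shuffle above yields
//   element 0:     %a_shadow[0] | %b_shadow[0]
//   elements 1..3: %a_shadow[1..3]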
4746
4747 // _mm_round_pd / _mm_round_ps.
4748 // Similar to maybeHandleSimpleNomemIntrinsic, except that
4749 // the second argument is guaranteed to be a constant integer.
4750 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4751 assert(I.getArgOperand(0)->getType() == I.getType());
4752 assert(I.arg_size() == 2);
4753 assert(isa<ConstantInt>(I.getArgOperand(1)));
4754
4755 IRBuilder<> IRB(&I);
4756 ShadowAndOriginCombiner SC(this, IRB);
4757 SC.Add(I.getArgOperand(0));
4758 SC.Done(&I);
4759 }
4760
4761 // Instrument @llvm.abs intrinsic.
4762 //
4763 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4764 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
4765 void handleAbsIntrinsic(IntrinsicInst &I) {
4766 assert(I.arg_size() == 2);
4767 Value *Src = I.getArgOperand(0);
4768 Value *IsIntMinPoison = I.getArgOperand(1);
4769
4770 assert(I.getType()->isIntOrIntVectorTy());
4771
4772 assert(Src->getType() == I.getType());
4773
4774 assert(IsIntMinPoison->getType()->isIntegerTy());
4775 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4776
4777 IRBuilder<> IRB(&I);
4778 Value *SrcShadow = getShadow(Src);
4779
4780 APInt MinVal =
4781 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4782 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4783 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4784
4785 Value *PoisonedShadow = getPoisonedShadow(Src);
4786 Value *PoisonedIfIntMinShadow =
4787 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4788 Value *Shadow =
4789 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4790
4791 setShadow(&I, Shadow);
4792 setOrigin(&I, getOrigin(&I, 0));
4793 }
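// Illustrative note (not part of the original source): the nested selects
// above compute, per element,
//   Shadow = IsIntMinPoison ? (Src == INT_MIN ? <all ones> : SrcShadow)
//                           : SrcShadow
// i.e., the result is additionally poisoned only when is_int_min_poison is
// set and the (concrete) input value equals the minimum signed integer.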
4794
4795 void handleIsFpClass(IntrinsicInst &I) {
4796 IRBuilder<> IRB(&I);
4797 Value *Shadow = getShadow(&I, 0);
4798 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4799 setOrigin(&I, getOrigin(&I, 0));
4800 }
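// Illustrative note (not part of the original source): for
//   i1 @llvm.is.fpclass.f32(float %x, i32 immarg %test)
// the i1 result shadow above is simply (%x_shadow != 0), i.e., any
// uninitialized bit of %x makes the classification result uninitialized.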
4801
4802 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4803 IRBuilder<> IRB(&I);
4804 Value *Shadow0 = getShadow(&I, 0);
4805 Value *Shadow1 = getShadow(&I, 1);
4806 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4807 Value *ShadowElt1 =
4808 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4809
4810 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4811 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4812 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4813
4814 setShadow(&I, Shadow);
4815 setOriginForNaryOp(I);
4816 }
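// Illustrative example (not part of the original source): for
//   {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
// the shadow above is
//   { %a_shadow | %b_shadow, (%a_shadow | %b_shadow) != 0 }
// i.e., the overflow bit is uninitialized whenever any input bit is.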
4817
4818 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4819 assert(isa<FixedVectorType>(V->getType()));
4820 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4821 Value *Shadow = getShadow(V);
4822 return IRB.CreateExtractElement(Shadow,
4823 ConstantInt::get(IRB.getInt32Ty(), 0));
4824 }
4825
4826 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4827 //
4828 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4829 // (<8 x i64>, <16 x i8>, i8)
4830 // A WriteThru Mask
4831 //
4832 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4833 // (<16 x i32>, <16 x i8>, i16)
4834 //
4835 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4836 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4837 //
4838 // If Dst has more elements than A, the excess elements are zeroed (and the
4839 // corresponding shadow is initialized).
4840 //
4841 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4842 // and is much faster than this handler.
4843 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4844 IRBuilder<> IRB(&I);
4845
4846 assert(I.arg_size() == 3);
4847 Value *A = I.getOperand(0);
4848 Value *WriteThrough = I.getOperand(1);
4849 Value *Mask = I.getOperand(2);
4850
4851 assert(isFixedIntVector(A));
4852 assert(isFixedIntVector(WriteThrough));
4853
4854 unsigned ANumElements =
4855 cast<FixedVectorType>(A->getType())->getNumElements();
4856 unsigned OutputNumElements =
4857 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4858 assert(ANumElements == OutputNumElements ||
4859 ANumElements * 2 == OutputNumElements);
4860
4861 assert(Mask->getType()->isIntegerTy());
4862 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4863 insertCheckShadowOf(Mask, &I);
4864
4865 assert(I.getType() == WriteThrough->getType());
4866
4867 // Widen the mask, if necessary, to have one bit per element of the output
4868 // vector.
4869 // We want the extra bits to have '1's, so that the CreateSelect will
4870 // select the values from AShadow instead of WriteThroughShadow ("maskless"
4871 // versions of the intrinsics are sometimes implemented using an all-1's
4872 // mask and an undefined value for WriteThroughShadow). We accomplish this
4873 // by using bitwise NOT before and after the ZExt.
4874 if (ANumElements != OutputNumElements) {
4875 Mask = IRB.CreateNot(Mask);
4876 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
4877 "_ms_widen_mask");
4878 Mask = IRB.CreateNot(Mask);
4879 }
4880 Mask = IRB.CreateBitCast(
4881 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4882
4883 Value *AShadow = getShadow(A);
4884
4885 // The return type might have more elements than the input.
4886 // Temporarily shrink the return type's number of elements.
4887 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
4888
4889 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
4890 // This handler treats them all as truncation, which leads to some rare
4891 // false positives in the cases where the truncated bytes could
4892 // unambiguously saturate the value e.g., if A = ??????10 ????????
4893 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
4894 // fully defined, but the truncated byte is ????????.
4895 //
4896 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
4897 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
4898 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4899
4900 Value *WriteThroughShadow = getShadow(WriteThrough);
4901
4902 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
4903 setShadow(&I, Shadow);
4904 setOriginForNaryOp(I);
4905 }
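// Illustrative example (not part of the original source) of the mask-widening
// trick above: with ANumElements == 8, OutputNumElements == 16 and
// Mask == 0b10110001,
//   NOT  -> 0b01001110
//   ZExt -> 0b00000000'01001110
//   NOT  -> 0b11111111'10110001
// i.e., the original mask ends up in the low bits and the extra high bits are
// all 1's, so the select picks the (zero-extended, initialized) AShadow for
// the excess output elements.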
4906
4907 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
4908 // values and perform an operation whose shadow propagation should be handled
4909 // as all-or-nothing [*], with masking provided by a vector and a mask
4910 // supplied as an integer.
4911 //
4912 // [*] if all bits of a vector element are initialized, the output is fully
4913 // initialized; otherwise, the output is fully uninitialized
4914 //
4915 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
4916 // (<16 x float>, <16 x float>, i16)
4917 // A WriteThru Mask
4918 //
4919 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
4920 // (<2 x double>, <2 x double>, i8)
4921 //
4922 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
4923 // (<8 x double>, i32, <8 x double>, i8, i32)
4924 // A Imm WriteThru Mask Rounding
4925 //
4926 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
4927 // be fully initialized.
4928 //
4929 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
4930 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
4931 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
4932 unsigned WriteThruIndex,
4933 unsigned MaskIndex) {
4934 IRBuilder<> IRB(&I);
4935
4936 unsigned NumArgs = I.arg_size();
4937 assert(AIndex < NumArgs);
4938 assert(WriteThruIndex < NumArgs);
4939 assert(MaskIndex < NumArgs);
4940 assert(AIndex != WriteThruIndex);
4941 assert(AIndex != MaskIndex);
4942 assert(WriteThruIndex != MaskIndex);
4943
4944 Value *A = I.getOperand(AIndex);
4945 Value *WriteThru = I.getOperand(WriteThruIndex);
4946 Value *Mask = I.getOperand(MaskIndex);
4947
4948 assert(isFixedFPVector(A));
4949 assert(isFixedFPVector(WriteThru));
4950
4951 [[maybe_unused]] unsigned ANumElements =
4952 cast<FixedVectorType>(A->getType())->getNumElements();
4953 unsigned OutputNumElements =
4954 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
4955 assert(ANumElements == OutputNumElements);
4956
4957 for (unsigned i = 0; i < NumArgs; ++i) {
4958 if (i != AIndex && i != WriteThruIndex) {
4959 // Imm, Mask, Rounding etc. are "control" data, hence we require that
4960 // they be fully initialized.
4961 assert(I.getOperand(i)->getType()->isIntegerTy());
4962 insertCheckShadowOf(I.getOperand(i), &I);
4963 }
4964 }
4965
4966 // The mask has 1 bit per element of A, but a minimum of 8 bits.
4967 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
4968 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
4969 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4970
4971 assert(I.getType() == WriteThru->getType());
4972
4973 Mask = IRB.CreateBitCast(
4974 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4975
4976 Value *AShadow = getShadow(A);
4977
4978 // All-or-nothing shadow
4979 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
4980 AShadow->getType());
4981
4982 Value *WriteThruShadow = getShadow(WriteThru);
4983
4984 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
4985 setShadow(&I, Shadow);
4986
4987 setOriginForNaryOp(I);
4988 }
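// Illustrative example (not part of the original source): for
//   <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
//       (<8 x double> %a, i32 %imm, <8 x double> %wt, i8 %mask, i32 %rc)
// a caller would pass AIndex == 0, WriteThruIndex == 2, MaskIndex == 3; the
// loop above then requires the shadows of %imm, %mask and %rc to be clean,
// and the per-element select is
//   Dst_shadow[i] = mask[i] ? (A_shadow[i] != 0 ? <all ones> : 0)
//                           : WriteThru_shadow[i]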
4989
4990 // For sh.* compiler intrinsics:
4991 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
4992 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
4993 // A B WriteThru Mask RoundingMode
4994 //
4995 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
4996 // DstShadow[1..7] = AShadow[1..7]
4997 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
4998 IRBuilder<> IRB(&I);
4999
5000 assert(I.arg_size() == 5);
5001 Value *A = I.getOperand(0);
5002 Value *B = I.getOperand(1);
5003 Value *WriteThrough = I.getOperand(2);
5004 Value *Mask = I.getOperand(3);
5005 Value *RoundingMode = I.getOperand(4);
5006
5007 // Technically, we could probably just check whether the LSB is
5008 // initialized, but intuitively it feels like a partly uninitialized mask
5009 // is unintended, and we should warn the user immediately.
5010 insertCheckShadowOf(Mask, &I);
5011 insertCheckShadowOf(RoundingMode, &I);
5012
5013 assert(isa<FixedVectorType>(A->getType()));
5014 unsigned NumElements =
5015 cast<FixedVectorType>(A->getType())->getNumElements();
5016 assert(NumElements == 8);
5017 assert(A->getType() == B->getType());
5018 assert(B->getType() == WriteThrough->getType());
5019 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5020 assert(RoundingMode->getType()->isIntegerTy());
5021
5022 Value *ALowerShadow = extractLowerShadow(IRB, A);
5023 Value *BLowerShadow = extractLowerShadow(IRB, B);
5024
5025 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5026
5027 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5028
5029 Mask = IRB.CreateBitCast(
5030 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5031 Value *MaskLower =
5032 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5033
5034 Value *AShadow = getShadow(A);
5035 Value *DstLowerShadow =
5036 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5037 Value *DstShadow = IRB.CreateInsertElement(
5038 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5039 "_msprop");
5040
5041 setShadow(&I, DstShadow);
5042 setOriginForNaryOp(I);
5043 }
5044
5045 // Approximately handle AVX Galois Field Affine Transformation
5046 //
5047 // e.g.,
5048 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5049 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5050 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5051 // Out A x b
5052 // where A and x are packed matrices, b is a vector,
5053 // Out = A * x + b in GF(2)
5054 //
5055 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5056 // computation also includes a parity calculation.
5057 //
5058 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5059 // Out_Shadow = (V1_Shadow & V2_Shadow)
5060 // | (V1 & V2_Shadow)
5061 // | (V1_Shadow & V2 )
5062 //
5063 // We approximate the shadow of gf2p8affineqb using:
5064 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5065 // | gf2p8affineqb(x, A_shadow, 0)
5066 // | gf2p8affineqb(x_Shadow, A, 0)
5067 // | set1_epi8(b_Shadow)
5068 //
5069 // This approximation has false negatives: if an intermediate dot-product
5070 // contains an even number of 1's, the parity is 0.
5071 // It has no false positives.
5072 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5073 IRBuilder<> IRB(&I);
5074
5075 assert(I.arg_size() == 3);
5076 Value *A = I.getOperand(0);
5077 Value *X = I.getOperand(1);
5078 Value *B = I.getOperand(2);
5079
5080 assert(isFixedIntVector(A));
5081 assert(cast<VectorType>(A->getType())
5082 ->getElementType()
5083 ->getScalarSizeInBits() == 8);
5084
5085 assert(A->getType() == X->getType());
5086
5087 assert(B->getType()->isIntegerTy());
5088 assert(B->getType()->getScalarSizeInBits() == 8);
5089
5090 assert(I.getType() == A->getType());
5091
5092 Value *AShadow = getShadow(A);
5093 Value *XShadow = getShadow(X);
5094 Value *BZeroShadow = getCleanShadow(B);
5095
5096 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5097 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5098 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5099 {X, AShadow, BZeroShadow});
5100 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5101 {XShadow, A, BZeroShadow});
5102
5103 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5104 Value *BShadow = getShadow(B);
5105 Value *BBroadcastShadow = getCleanShadow(AShadow);
5106 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5107 // This loop generates a lot of LLVM IR, which we expect that CodeGen will
5108 // lower appropriately (e.g., VPBROADCASTB).
5109 // Besides, b is often a constant, in which case it is fully initialized.
5110 for (unsigned i = 0; i < NumElements; i++)
5111 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5112
5113 setShadow(&I, IRB.CreateOr(
5114 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5115 setOriginForNaryOp(I);
5116 }
5117
5118 // Handle Arm NEON vector load intrinsics (vld*).
5119 //
5120 // The WithLane instructions (ld[234]lane) are similar to:
5121 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5122 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5123 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5124 // %A)
5125 //
5126 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5127 // to:
5128 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5129 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5130 unsigned int numArgs = I.arg_size();
5131
5132 // Return type is a struct of vectors of integers or floating-point
5133 assert(I.getType()->isStructTy());
5134 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5135 assert(RetTy->getNumElements() > 0);
5136 assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5137 RetTy->getElementType(0)->isFPOrFPVectorTy());
5138 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5139 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5140
5141 if (WithLane) {
5142 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5143 assert(4 <= numArgs && numArgs <= 6);
5144
5145 // Return type is a struct of the input vectors
5146 assert(RetTy->getNumElements() + 2 == numArgs);
5147 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5148 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5149 } else {
5150 assert(numArgs == 1);
5151 }
5152
5153 IRBuilder<> IRB(&I);
5154
5155 SmallVector<Value *, 6> ShadowArgs;
5156 if (WithLane) {
5157 for (unsigned int i = 0; i < numArgs - 2; i++)
5158 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5159
5160 // Lane number, passed verbatim
5161 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5162 ShadowArgs.push_back(LaneNumber);
5163
5164 // TODO: blend shadow of lane number into output shadow?
5165 insertCheckShadowOf(LaneNumber, &I);
5166 }
5167
5168 Value *Src = I.getArgOperand(numArgs - 1);
5169 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5170
5171 Type *SrcShadowTy = getShadowTy(Src);
5172 auto [SrcShadowPtr, SrcOriginPtr] =
5173 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5174 ShadowArgs.push_back(SrcShadowPtr);
5175
5176 // The NEON vector load instructions handled by this function all have
5177 // integer variants. It is easier to use those rather than trying to cast
5178 // a struct of vectors of floats into a struct of vectors of integers.
5179 CallInst *CI =
5180 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5181 setShadow(&I, CI);
5182
5183 if (!MS.TrackOrigins)
5184 return;
5185
5186 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5187 setOrigin(&I, PtrSrcOrigin);
5188 }
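// Illustrative sketch (not part of the original source; value names are
// hypothetical): for
//   call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
// the handler above loads the shadow by issuing the same intrinsic on the
// shadow address:
//   %sp = <shadow pointer for %A>
//   %s  = call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %sp)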
5189
5190 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5191 /// and vst{2,3,4}lane).
5192 ///
5193 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5194 /// last argument, with the initial arguments being the inputs (and lane
5195 /// number for vst{2,3,4}lane). They return void.
5196 ///
5197 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5198 /// abcdabcdabcdabcd... into *outP
5199 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5200 /// writes aaaa...bbbb...cccc...dddd... into *outP
5201 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5202 /// These instructions can all be instrumented with essentially the same
5203 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5204 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5205 IRBuilder<> IRB(&I);
5206
5207 // Don't use getNumOperands() because it includes the callee
5208 int numArgOperands = I.arg_size();
5209
5210 // The last arg operand is the output (pointer)
5211 assert(numArgOperands >= 1);
5212 Value *Addr = I.getArgOperand(numArgOperands - 1);
5213 assert(Addr->getType()->isPointerTy());
5214 int skipTrailingOperands = 1;
5215
5217 insertCheckShadowOf(Addr, &I);
5218
5219 // Second-last operand is the lane number (for vst{2,3,4}lane)
5220 if (useLane) {
5221 skipTrailingOperands++;
5222 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5223 assert(isa<IntegerType>(
5224 I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5225 }
5226
5227 SmallVector<Value *, 8> ShadowArgs;
5228 // All the initial operands are the inputs
5229 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5230 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5231 Value *Shadow = getShadow(&I, i);
5232 ShadowArgs.append(1, Shadow);
5233 }
5234
5235 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5236 // e.g., for:
5237 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5238 // we know the type of the output (and its shadow) is <16 x i8>.
5239 //
5240 // Arm NEON VST is unusual because the last argument is the output address:
5241 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5242 // call void @llvm.aarch64.neon.st2.v16i8.p0
5243 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5244 // and we have no type information about P's operand. We must manually
5245 // compute the type (<16 x i8> x 2).
5246 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5247 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5248 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5249 (numArgOperands - skipTrailingOperands));
5250 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5251
5252 if (useLane)
5253 ShadowArgs.append(1,
5254 I.getArgOperand(numArgOperands - skipTrailingOperands));
5255
5256 Value *OutputShadowPtr, *OutputOriginPtr;
5257 // AArch64 NEON does not need alignment (unless OS requires it)
5258 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5259 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5260 ShadowArgs.append(1, OutputShadowPtr);
5261
5262 CallInst *CI =
5263 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5264 setShadow(&I, CI);
5265
5266 if (MS.TrackOrigins) {
5267 // TODO: if we modelled the vst* instruction more precisely, we could
5268 // more accurately track the origins (e.g., if both inputs are
5269 // uninitialized for vst2, we currently blame the second input, even
5270 // though part of the output depends only on the first input).
5271 //
5272 // This is particularly imprecise for vst{2,3,4}lane, since only one
5273 // lane of each input is actually copied to the output.
5274 OriginCombiner OC(this, IRB);
5275 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5276 OC.Add(I.getArgOperand(i));
5277
5278 const DataLayout &DL = F.getDataLayout();
5279 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5280 OutputOriginPtr);
5281 }
5282 }
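// Illustrative sketch (not part of the original source; value names are
// hypothetical): for
//   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %A, <16 x i8> %B,
//                                             ptr %P)
// the handler above emits roughly:
//   %sp = <shadow pointer for %P>
//   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %A_shadow,
//                                             <16 x i8> %B_shadow, ptr %sp)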
5283
5284 /// Handle intrinsics by applying the intrinsic to the shadows.
5285 ///
5286 /// The trailing arguments are passed verbatim to the intrinsic, though any
5287 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5288 /// intrinsic with one trailing verbatim argument:
5289 /// out = intrinsic(var1, var2, opType)
5290 /// we compute:
5291 /// shadow[out] =
5292 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5293 ///
5294 /// Typically, shadowIntrinsicID will be specified by the caller to be
5295 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5296 /// intrinsic of the same type.
5297 ///
5298 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5299 /// bit-patterns (for example, if the intrinsic accepts floats for
5300 /// var1, we require that it doesn't care if inputs are NaNs).
5301 ///
5302 /// For example, this can be applied to the Arm NEON vector table intrinsics
5303 /// (tbl{1,2,3,4}).
5304 ///
5305 /// The origin is approximated using setOriginForNaryOp.
5306 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5307 Intrinsic::ID shadowIntrinsicID,
5308 unsigned int trailingVerbatimArgs) {
5309 IRBuilder<> IRB(&I);
5310
5311 assert(trailingVerbatimArgs < I.arg_size());
5312
5313 SmallVector<Value *, 8> ShadowArgs;
5314 // Don't use getNumOperands() because it includes the callee
5315 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5316 Value *Shadow = getShadow(&I, i);
5317
5318 // Shadows are integer-ish types but some intrinsics require a
5319 // different (e.g., floating-point) type.
5320 ShadowArgs.push_back(
5321 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5322 }
5323
5324 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5325 i++) {
5326 Value *Arg = I.getArgOperand(i);
5327 ShadowArgs.push_back(Arg);
5328 }
5329
5330 CallInst *CI =
5331 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5332 Value *CombinedShadow = CI;
5333
5334 // Combine the computed shadow with the shadow of trailing args
5335 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5336 i++) {
5337 Value *Shadow =
5338 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5339 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5340 }
5341
5342 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5343
5344 setOriginForNaryOp(I);
5345 }
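// Illustrative example (not part of the original source): for
//   <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %a, <16 x i8> %b)
// with trailingVerbatimArgs == 1, the shadow computed above is roughly
//   pshuf.b.128(%a_shadow, %b) | %b_shadow
// i.e., the index operand is passed through unchanged, but its own shadow
// still taints the result.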
5346
5347 // Approximation only
5348 //
5349 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5350 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5351 assert(I.arg_size() == 2);
5352
5353 handleShadowOr(I);
5354 }
5355
5356 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5357 switch (I.getIntrinsicID()) {
5358 case Intrinsic::uadd_with_overflow:
5359 case Intrinsic::sadd_with_overflow:
5360 case Intrinsic::usub_with_overflow:
5361 case Intrinsic::ssub_with_overflow:
5362 case Intrinsic::umul_with_overflow:
5363 case Intrinsic::smul_with_overflow:
5364 handleArithmeticWithOverflow(I);
5365 break;
5366 case Intrinsic::abs:
5367 handleAbsIntrinsic(I);
5368 break;
5369 case Intrinsic::bitreverse:
5370 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5371 /*trailingVerbatimArgs*/ 0);
5372 break;
5373 case Intrinsic::is_fpclass:
5374 handleIsFpClass(I);
5375 break;
5376 case Intrinsic::lifetime_start:
5377 handleLifetimeStart(I);
5378 break;
5379 case Intrinsic::launder_invariant_group:
5380 case Intrinsic::strip_invariant_group:
5381 handleInvariantGroup(I);
5382 break;
5383 case Intrinsic::bswap:
5384 handleBswap(I);
5385 break;
5386 case Intrinsic::ctlz:
5387 case Intrinsic::cttz:
5388 handleCountLeadingTrailingZeros(I);
5389 break;
5390 case Intrinsic::masked_compressstore:
5391 handleMaskedCompressStore(I);
5392 break;
5393 case Intrinsic::masked_expandload:
5394 handleMaskedExpandLoad(I);
5395 break;
5396 case Intrinsic::masked_gather:
5397 handleMaskedGather(I);
5398 break;
5399 case Intrinsic::masked_scatter:
5400 handleMaskedScatter(I);
5401 break;
5402 case Intrinsic::masked_store:
5403 handleMaskedStore(I);
5404 break;
5405 case Intrinsic::masked_load:
5406 handleMaskedLoad(I);
5407 break;
5408 case Intrinsic::vector_reduce_and:
5409 handleVectorReduceAndIntrinsic(I);
5410 break;
5411 case Intrinsic::vector_reduce_or:
5412 handleVectorReduceOrIntrinsic(I);
5413 break;
5414
5415 case Intrinsic::vector_reduce_add:
5416 case Intrinsic::vector_reduce_xor:
5417 case Intrinsic::vector_reduce_mul:
5418 // Signed/Unsigned Min/Max
5419 // TODO: handling similarly to AND/OR may be more precise.
5420 case Intrinsic::vector_reduce_smax:
5421 case Intrinsic::vector_reduce_smin:
5422 case Intrinsic::vector_reduce_umax:
5423 case Intrinsic::vector_reduce_umin:
5424 // TODO: this has no false positives, but arguably we should check that all
5425 // the bits are initialized.
5426 case Intrinsic::vector_reduce_fmax:
5427 case Intrinsic::vector_reduce_fmin:
5428 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5429 break;
5430
5431 case Intrinsic::vector_reduce_fadd:
5432 case Intrinsic::vector_reduce_fmul:
5433 handleVectorReduceWithStarterIntrinsic(I);
5434 break;
5435
5436 case Intrinsic::scmp:
5437 case Intrinsic::ucmp: {
5438 handleShadowOr(I);
5439 break;
5440 }
5441
5442 case Intrinsic::fshl:
5443 case Intrinsic::fshr:
5444 handleFunnelShift(I);
5445 break;
5446
5447 case Intrinsic::is_constant:
5448 // The result of llvm.is.constant() is always defined.
5449 setShadow(&I, getCleanShadow(&I));
5450 setOrigin(&I, getCleanOrigin());
5451 break;
5452
5453 default:
5454 return false;
5455 }
5456
5457 return true;
5458 }
5459
5460 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5461 switch (I.getIntrinsicID()) {
5462 case Intrinsic::x86_sse_stmxcsr:
5463 handleStmxcsr(I);
5464 break;
5465 case Intrinsic::x86_sse_ldmxcsr:
5466 handleLdmxcsr(I);
5467 break;
5468
5469 // Convert Scalar Double Precision Floating-Point Value
5470 // to Unsigned Doubleword Integer
5471 // etc.
5472 case Intrinsic::x86_avx512_vcvtsd2usi64:
5473 case Intrinsic::x86_avx512_vcvtsd2usi32:
5474 case Intrinsic::x86_avx512_vcvtss2usi64:
5475 case Intrinsic::x86_avx512_vcvtss2usi32:
5476 case Intrinsic::x86_avx512_cvttss2usi64:
5477 case Intrinsic::x86_avx512_cvttss2usi:
5478 case Intrinsic::x86_avx512_cvttsd2usi64:
5479 case Intrinsic::x86_avx512_cvttsd2usi:
5480 case Intrinsic::x86_avx512_cvtusi2ss:
5481 case Intrinsic::x86_avx512_cvtusi642sd:
5482 case Intrinsic::x86_avx512_cvtusi642ss:
5483 handleSSEVectorConvertIntrinsic(I, 1, true);
5484 break;
5485 case Intrinsic::x86_sse2_cvtsd2si64:
5486 case Intrinsic::x86_sse2_cvtsd2si:
5487 case Intrinsic::x86_sse2_cvtsd2ss:
5488 case Intrinsic::x86_sse2_cvttsd2si64:
5489 case Intrinsic::x86_sse2_cvttsd2si:
5490 case Intrinsic::x86_sse_cvtss2si64:
5491 case Intrinsic::x86_sse_cvtss2si:
5492 case Intrinsic::x86_sse_cvttss2si64:
5493 case Intrinsic::x86_sse_cvttss2si:
5494 handleSSEVectorConvertIntrinsic(I, 1);
5495 break;
5496 case Intrinsic::x86_sse_cvtps2pi:
5497 case Intrinsic::x86_sse_cvttps2pi:
5498 handleSSEVectorConvertIntrinsic(I, 2);
5499 break;
5500
5501 // TODO:
5502 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5503 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5504 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5505
5506 case Intrinsic::x86_vcvtps2ph_128:
5507 case Intrinsic::x86_vcvtps2ph_256: {
5508 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5509 break;
5510 }
5511
5512 // Convert Packed Single Precision Floating-Point Values
5513 // to Packed Signed Doubleword Integer Values
5514 //
5515 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5516 // (<16 x float>, <16 x i32>, i16, i32)
5517 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5518 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5519 break;
5520
5521 // Convert Packed Double Precision Floating-Point Values
5522 // to Packed Single Precision Floating-Point Values
5523 case Intrinsic::x86_sse2_cvtpd2ps:
5524 case Intrinsic::x86_sse2_cvtps2dq:
5525 case Intrinsic::x86_sse2_cvtpd2dq:
5526 case Intrinsic::x86_sse2_cvttps2dq:
5527 case Intrinsic::x86_sse2_cvttpd2dq:
5528 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5529 case Intrinsic::x86_avx_cvt_ps2dq_256:
5530 case Intrinsic::x86_avx_cvt_pd2dq_256:
5531 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5532 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5533 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5534 break;
5535 }
5536
5537 // Convert Single-Precision FP Value to 16-bit FP Value
5538 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5539 // (<16 x float>, i32, <16 x i16>, i16)
5540 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5541 // (<4 x float>, i32, <8 x i16>, i8)
5542 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5543 // (<8 x float>, i32, <8 x i16>, i8)
5544 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5545 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5546 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5547 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5548 break;
5549
5550 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5551 case Intrinsic::x86_avx512_psll_w_512:
5552 case Intrinsic::x86_avx512_psll_d_512:
5553 case Intrinsic::x86_avx512_psll_q_512:
5554 case Intrinsic::x86_avx512_pslli_w_512:
5555 case Intrinsic::x86_avx512_pslli_d_512:
5556 case Intrinsic::x86_avx512_pslli_q_512:
5557 case Intrinsic::x86_avx512_psrl_w_512:
5558 case Intrinsic::x86_avx512_psrl_d_512:
5559 case Intrinsic::x86_avx512_psrl_q_512:
5560 case Intrinsic::x86_avx512_psra_w_512:
5561 case Intrinsic::x86_avx512_psra_d_512:
5562 case Intrinsic::x86_avx512_psra_q_512:
5563 case Intrinsic::x86_avx512_psrli_w_512:
5564 case Intrinsic::x86_avx512_psrli_d_512:
5565 case Intrinsic::x86_avx512_psrli_q_512:
5566 case Intrinsic::x86_avx512_psrai_w_512:
5567 case Intrinsic::x86_avx512_psrai_d_512:
5568 case Intrinsic::x86_avx512_psrai_q_512:
5569 case Intrinsic::x86_avx512_psra_q_256:
5570 case Intrinsic::x86_avx512_psra_q_128:
5571 case Intrinsic::x86_avx512_psrai_q_256:
5572 case Intrinsic::x86_avx512_psrai_q_128:
5573 case Intrinsic::x86_avx2_psll_w:
5574 case Intrinsic::x86_avx2_psll_d:
5575 case Intrinsic::x86_avx2_psll_q:
5576 case Intrinsic::x86_avx2_pslli_w:
5577 case Intrinsic::x86_avx2_pslli_d:
5578 case Intrinsic::x86_avx2_pslli_q:
5579 case Intrinsic::x86_avx2_psrl_w:
5580 case Intrinsic::x86_avx2_psrl_d:
5581 case Intrinsic::x86_avx2_psrl_q:
5582 case Intrinsic::x86_avx2_psra_w:
5583 case Intrinsic::x86_avx2_psra_d:
5584 case Intrinsic::x86_avx2_psrli_w:
5585 case Intrinsic::x86_avx2_psrli_d:
5586 case Intrinsic::x86_avx2_psrli_q:
5587 case Intrinsic::x86_avx2_psrai_w:
5588 case Intrinsic::x86_avx2_psrai_d:
5589 case Intrinsic::x86_sse2_psll_w:
5590 case Intrinsic::x86_sse2_psll_d:
5591 case Intrinsic::x86_sse2_psll_q:
5592 case Intrinsic::x86_sse2_pslli_w:
5593 case Intrinsic::x86_sse2_pslli_d:
5594 case Intrinsic::x86_sse2_pslli_q:
5595 case Intrinsic::x86_sse2_psrl_w:
5596 case Intrinsic::x86_sse2_psrl_d:
5597 case Intrinsic::x86_sse2_psrl_q:
5598 case Intrinsic::x86_sse2_psra_w:
5599 case Intrinsic::x86_sse2_psra_d:
5600 case Intrinsic::x86_sse2_psrli_w:
5601 case Intrinsic::x86_sse2_psrli_d:
5602 case Intrinsic::x86_sse2_psrli_q:
5603 case Intrinsic::x86_sse2_psrai_w:
5604 case Intrinsic::x86_sse2_psrai_d:
5605 case Intrinsic::x86_mmx_psll_w:
5606 case Intrinsic::x86_mmx_psll_d:
5607 case Intrinsic::x86_mmx_psll_q:
5608 case Intrinsic::x86_mmx_pslli_w:
5609 case Intrinsic::x86_mmx_pslli_d:
5610 case Intrinsic::x86_mmx_pslli_q:
5611 case Intrinsic::x86_mmx_psrl_w:
5612 case Intrinsic::x86_mmx_psrl_d:
5613 case Intrinsic::x86_mmx_psrl_q:
5614 case Intrinsic::x86_mmx_psra_w:
5615 case Intrinsic::x86_mmx_psra_d:
5616 case Intrinsic::x86_mmx_psrli_w:
5617 case Intrinsic::x86_mmx_psrli_d:
5618 case Intrinsic::x86_mmx_psrli_q:
5619 case Intrinsic::x86_mmx_psrai_w:
5620 case Intrinsic::x86_mmx_psrai_d:
5621 handleVectorShiftIntrinsic(I, /* Variable */ false);
5622 break;
5623 case Intrinsic::x86_avx2_psllv_d:
5624 case Intrinsic::x86_avx2_psllv_d_256:
5625 case Intrinsic::x86_avx512_psllv_d_512:
5626 case Intrinsic::x86_avx2_psllv_q:
5627 case Intrinsic::x86_avx2_psllv_q_256:
5628 case Intrinsic::x86_avx512_psllv_q_512:
5629 case Intrinsic::x86_avx2_psrlv_d:
5630 case Intrinsic::x86_avx2_psrlv_d_256:
5631 case Intrinsic::x86_avx512_psrlv_d_512:
5632 case Intrinsic::x86_avx2_psrlv_q:
5633 case Intrinsic::x86_avx2_psrlv_q_256:
5634 case Intrinsic::x86_avx512_psrlv_q_512:
5635 case Intrinsic::x86_avx2_psrav_d:
5636 case Intrinsic::x86_avx2_psrav_d_256:
5637 case Intrinsic::x86_avx512_psrav_d_512:
5638 case Intrinsic::x86_avx512_psrav_q_128:
5639 case Intrinsic::x86_avx512_psrav_q_256:
5640 case Intrinsic::x86_avx512_psrav_q_512:
5641 handleVectorShiftIntrinsic(I, /* Variable */ true);
5642 break;
5643
5644 // Pack with Signed/Unsigned Saturation
5645 case Intrinsic::x86_sse2_packsswb_128:
5646 case Intrinsic::x86_sse2_packssdw_128:
5647 case Intrinsic::x86_sse2_packuswb_128:
5648 case Intrinsic::x86_sse41_packusdw:
5649 case Intrinsic::x86_avx2_packsswb:
5650 case Intrinsic::x86_avx2_packssdw:
5651 case Intrinsic::x86_avx2_packuswb:
5652 case Intrinsic::x86_avx2_packusdw:
5653 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5654 // (<32 x i16> %a, <32 x i16> %b)
5655 // <32 x i16> @llvm.x86.avx512.packssdw.512
5656 // (<16 x i32> %a, <16 x i32> %b)
5657 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5658 case Intrinsic::x86_avx512_packsswb_512:
5659 case Intrinsic::x86_avx512_packssdw_512:
5660 case Intrinsic::x86_avx512_packuswb_512:
5661 case Intrinsic::x86_avx512_packusdw_512:
5662 handleVectorPackIntrinsic(I);
5663 break;
5664
5665 case Intrinsic::x86_sse41_pblendvb:
5666 case Intrinsic::x86_sse41_blendvpd:
5667 case Intrinsic::x86_sse41_blendvps:
5668 case Intrinsic::x86_avx_blendv_pd_256:
5669 case Intrinsic::x86_avx_blendv_ps_256:
5670 case Intrinsic::x86_avx2_pblendvb:
5671 handleBlendvIntrinsic(I);
5672 break;
5673
5674 case Intrinsic::x86_avx_dp_ps_256:
5675 case Intrinsic::x86_sse41_dppd:
5676 case Intrinsic::x86_sse41_dpps:
5677 handleDppIntrinsic(I);
5678 break;
5679
5680 case Intrinsic::x86_mmx_packsswb:
5681 case Intrinsic::x86_mmx_packuswb:
5682 handleVectorPackIntrinsic(I, 16);
5683 break;
5684
5685 case Intrinsic::x86_mmx_packssdw:
5686 handleVectorPackIntrinsic(I, 32);
5687 break;
5688
5689 case Intrinsic::x86_mmx_psad_bw:
5690 handleVectorSadIntrinsic(I, true);
5691 break;
5692 case Intrinsic::x86_sse2_psad_bw:
5693 case Intrinsic::x86_avx2_psad_bw:
5694 handleVectorSadIntrinsic(I);
5695 break;
5696
5697 // Multiply and Add Packed Words
5698 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5699 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5700 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5701 //
5702 // Multiply and Add Packed Signed and Unsigned Bytes
5703 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5704 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5705 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5706 //
5707 // These intrinsics are auto-upgraded into non-masked forms:
5708 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5709 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5710 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5711 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5712 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5713 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5714 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5715 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5716 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5717 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5718 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5719 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5720 case Intrinsic::x86_sse2_pmadd_wd:
5721 case Intrinsic::x86_avx2_pmadd_wd:
5722 case Intrinsic::x86_avx512_pmaddw_d_512:
5723 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5724 case Intrinsic::x86_avx2_pmadd_ub_sw:
5725 case Intrinsic::x86_avx512_pmaddubs_w_512:
5726 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2);
5727 break;
5728
5729 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5730 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5731 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/8);
5732 break;
5733
5734 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5735 case Intrinsic::x86_mmx_pmadd_wd:
5736 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5737 break;
5738
5739 // AVX Vector Neural Network Instructions: bytes
5740 //
5741 // Multiply and Add Packed Signed and Unsigned Bytes
5742 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
5743 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5744 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
5745 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5746 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
5747 // (<16 x i32>, <64 x i8>, <64 x i8>)
5748 //
5749 // Multiply and Add Unsigned and Signed Bytes With Saturation
5750 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
5751 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5752 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
5753 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5754 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
5755 // (<16 x i32>, <64 x i8>, <64 x i8>)
5756 //
5757 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5758 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5759 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5760 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5761 //
5762 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
5763 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5764 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
5765 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5766 //
5767 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
5768 // (<16 x i32>, <16 x i32>, <16 x i32>)
5769 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
5770 // (<16 x i32>, <16 x i32>, <16 x i32>)
5771 //
5772 // These intrinsics are auto-upgraded into non-masked forms:
5773 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
5774 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5775 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
5776 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5777 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
5778 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5779 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
5780 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5781 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
5782 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5783 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
5784 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5785 //
5786 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
5787 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5788 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
5789 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5790 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
5791 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5792 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
5793 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5794 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
5795 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5796 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
5797 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5798 case Intrinsic::x86_avx512_vpdpbusd_128:
5799 case Intrinsic::x86_avx512_vpdpbusd_256:
5800 case Intrinsic::x86_avx512_vpdpbusd_512:
5801 case Intrinsic::x86_avx512_vpdpbusds_128:
5802 case Intrinsic::x86_avx512_vpdpbusds_256:
5803 case Intrinsic::x86_avx512_vpdpbusds_512:
5804 case Intrinsic::x86_avx2_vpdpbssd_128:
5805 case Intrinsic::x86_avx2_vpdpbssd_256:
5806 case Intrinsic::x86_avx10_vpdpbssd_512:
5807 case Intrinsic::x86_avx2_vpdpbssds_128:
5808 case Intrinsic::x86_avx2_vpdpbssds_256:
5809 case Intrinsic::x86_avx10_vpdpbssds_512:
5810 case Intrinsic::x86_avx2_vpdpbsud_128:
5811 case Intrinsic::x86_avx2_vpdpbsud_256:
5812 case Intrinsic::x86_avx10_vpdpbsud_512:
5813 case Intrinsic::x86_avx2_vpdpbsuds_128:
5814 case Intrinsic::x86_avx2_vpdpbsuds_256:
5815 case Intrinsic::x86_avx10_vpdpbsuds_512:
5816 case Intrinsic::x86_avx2_vpdpbuud_128:
5817 case Intrinsic::x86_avx2_vpdpbuud_256:
5818 case Intrinsic::x86_avx10_vpdpbuud_512:
5819 case Intrinsic::x86_avx2_vpdpbuuds_128:
5820 case Intrinsic::x86_avx2_vpdpbuuds_256:
5821 case Intrinsic::x86_avx10_vpdpbuuds_512:
5822 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/4, /*EltSize=*/8);
5823 break;
5824
5825 // AVX Vector Neural Network Instructions: words
5826 //
5827 // Multiply and Add Signed Word Integers
5828 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
5829 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5830 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
5831 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5832 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
5833 // (<16 x i32>, <16 x i32>, <16 x i32>)
5834 //
5835 // Multiply and Add Signed Word Integers With Saturation
5836 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
5837 // (< 4 x i32>, < 4 x i32>, < 4 x i32>)
5838 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
5839 // (< 8 x i32>, < 8 x i32>, < 8 x i32>)
5840 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
5841 // (<16 x i32>, <16 x i32>, <16 x i32>)
5842 //
5843 // These intrinsics are auto-upgraded into non-masked forms:
5844 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
5845 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5846 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
5847 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5848 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
5849 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5850 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
5851 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5852 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
5853 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5854 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
5855 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5856 //
5857 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
5858 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5859 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
5860 // (<4 x i32>, <4 x i32>, <4 x i32>, i8)
5861 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
5862 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5863 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
5864 // (<8 x i32>, <8 x i32>, <8 x i32>, i8)
5865 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
5866 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5867 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
5868 // (<16 x i32>, <16 x i32>, <16 x i32>, i16)
5869 case Intrinsic::x86_avx512_vpdpwssd_128:
5870 case Intrinsic::x86_avx512_vpdpwssd_256:
5871 case Intrinsic::x86_avx512_vpdpwssd_512:
5872 case Intrinsic::x86_avx512_vpdpwssds_128:
5873 case Intrinsic::x86_avx512_vpdpwssds_256:
5874 case Intrinsic::x86_avx512_vpdpwssds_512:
5875 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2, /*EltSize=*/16);
5876 break;
5877
5878 // TODO: Dot Product of BF16 Pairs Accumulated Into Packed Single
5879 // Precision
5880 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
5881 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
5882 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
5883 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
5884 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
5885 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
5886 // handleVectorPmaddIntrinsic() currently only handles integer types.
5887
5888 case Intrinsic::x86_sse_cmp_ss:
5889 case Intrinsic::x86_sse2_cmp_sd:
5890 case Intrinsic::x86_sse_comieq_ss:
5891 case Intrinsic::x86_sse_comilt_ss:
5892 case Intrinsic::x86_sse_comile_ss:
5893 case Intrinsic::x86_sse_comigt_ss:
5894 case Intrinsic::x86_sse_comige_ss:
5895 case Intrinsic::x86_sse_comineq_ss:
5896 case Intrinsic::x86_sse_ucomieq_ss:
5897 case Intrinsic::x86_sse_ucomilt_ss:
5898 case Intrinsic::x86_sse_ucomile_ss:
5899 case Intrinsic::x86_sse_ucomigt_ss:
5900 case Intrinsic::x86_sse_ucomige_ss:
5901 case Intrinsic::x86_sse_ucomineq_ss:
5902 case Intrinsic::x86_sse2_comieq_sd:
5903 case Intrinsic::x86_sse2_comilt_sd:
5904 case Intrinsic::x86_sse2_comile_sd:
5905 case Intrinsic::x86_sse2_comigt_sd:
5906 case Intrinsic::x86_sse2_comige_sd:
5907 case Intrinsic::x86_sse2_comineq_sd:
5908 case Intrinsic::x86_sse2_ucomieq_sd:
5909 case Intrinsic::x86_sse2_ucomilt_sd:
5910 case Intrinsic::x86_sse2_ucomile_sd:
5911 case Intrinsic::x86_sse2_ucomigt_sd:
5912 case Intrinsic::x86_sse2_ucomige_sd:
5913 case Intrinsic::x86_sse2_ucomineq_sd:
5914 handleVectorCompareScalarIntrinsic(I);
5915 break;
5916
5917 case Intrinsic::x86_avx_cmp_pd_256:
5918 case Intrinsic::x86_avx_cmp_ps_256:
5919 case Intrinsic::x86_sse2_cmp_pd:
5920 case Intrinsic::x86_sse_cmp_ps:
5921 handleVectorComparePackedIntrinsic(I);
5922 break;
5923
5924 case Intrinsic::x86_bmi_bextr_32:
5925 case Intrinsic::x86_bmi_bextr_64:
5926 case Intrinsic::x86_bmi_bzhi_32:
5927 case Intrinsic::x86_bmi_bzhi_64:
5928 case Intrinsic::x86_bmi_pdep_32:
5929 case Intrinsic::x86_bmi_pdep_64:
5930 case Intrinsic::x86_bmi_pext_32:
5931 case Intrinsic::x86_bmi_pext_64:
5932 handleBmiIntrinsic(I);
5933 break;
5934
5935 case Intrinsic::x86_pclmulqdq:
5936 case Intrinsic::x86_pclmulqdq_256:
5937 case Intrinsic::x86_pclmulqdq_512:
5938 handlePclmulIntrinsic(I);
5939 break;
5940
5941 case Intrinsic::x86_avx_round_pd_256:
5942 case Intrinsic::x86_avx_round_ps_256:
5943 case Intrinsic::x86_sse41_round_pd:
5944 case Intrinsic::x86_sse41_round_ps:
5945 handleRoundPdPsIntrinsic(I);
5946 break;
5947
5948 case Intrinsic::x86_sse41_round_sd:
5949 case Intrinsic::x86_sse41_round_ss:
5950 handleUnarySdSsIntrinsic(I);
5951 break;
5952
5953 case Intrinsic::x86_sse2_max_sd:
5954 case Intrinsic::x86_sse_max_ss:
5955 case Intrinsic::x86_sse2_min_sd:
5956 case Intrinsic::x86_sse_min_ss:
5957 handleBinarySdSsIntrinsic(I);
5958 break;
5959
5960 case Intrinsic::x86_avx_vtestc_pd:
5961 case Intrinsic::x86_avx_vtestc_pd_256:
5962 case Intrinsic::x86_avx_vtestc_ps:
5963 case Intrinsic::x86_avx_vtestc_ps_256:
5964 case Intrinsic::x86_avx_vtestnzc_pd:
5965 case Intrinsic::x86_avx_vtestnzc_pd_256:
5966 case Intrinsic::x86_avx_vtestnzc_ps:
5967 case Intrinsic::x86_avx_vtestnzc_ps_256:
5968 case Intrinsic::x86_avx_vtestz_pd:
5969 case Intrinsic::x86_avx_vtestz_pd_256:
5970 case Intrinsic::x86_avx_vtestz_ps:
5971 case Intrinsic::x86_avx_vtestz_ps_256:
5972 case Intrinsic::x86_avx_ptestc_256:
5973 case Intrinsic::x86_avx_ptestnzc_256:
5974 case Intrinsic::x86_avx_ptestz_256:
5975 case Intrinsic::x86_sse41_ptestc:
5976 case Intrinsic::x86_sse41_ptestnzc:
5977 case Intrinsic::x86_sse41_ptestz:
5978 handleVtestIntrinsic(I);
5979 break;
5980
5981 // Packed Horizontal Add/Subtract
5982 case Intrinsic::x86_ssse3_phadd_w:
5983 case Intrinsic::x86_ssse3_phadd_w_128:
5984 case Intrinsic::x86_avx2_phadd_w:
5985 case Intrinsic::x86_ssse3_phsub_w:
5986 case Intrinsic::x86_ssse3_phsub_w_128:
5987 case Intrinsic::x86_avx2_phsub_w: {
5988 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
5989 break;
5990 }
5991
5992 // Packed Horizontal Add/Subtract
5993 case Intrinsic::x86_ssse3_phadd_d:
5994 case Intrinsic::x86_ssse3_phadd_d_128:
5995 case Intrinsic::x86_avx2_phadd_d:
5996 case Intrinsic::x86_ssse3_phsub_d:
5997 case Intrinsic::x86_ssse3_phsub_d_128:
5998 case Intrinsic::x86_avx2_phsub_d: {
5999 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/32);
6000 break;
6001 }
6002
6003 // Packed Horizontal Add/Subtract and Saturate
6004 case Intrinsic::x86_ssse3_phadd_sw:
6005 case Intrinsic::x86_ssse3_phadd_sw_128:
6006 case Intrinsic::x86_avx2_phadd_sw:
6007 case Intrinsic::x86_ssse3_phsub_sw:
6008 case Intrinsic::x86_ssse3_phsub_sw_128:
6009 case Intrinsic::x86_avx2_phsub_sw: {
6010 handlePairwiseShadowOrIntrinsic(I, /*ReinterpretElemWidth=*/16);
6011 break;
6012 }
6013
6014 // Packed Single/Double Precision Floating-Point Horizontal Add
6015 case Intrinsic::x86_sse3_hadd_ps:
6016 case Intrinsic::x86_sse3_hadd_pd:
6017 case Intrinsic::x86_avx_hadd_pd_256:
6018 case Intrinsic::x86_avx_hadd_ps_256:
6019 case Intrinsic::x86_sse3_hsub_ps:
6020 case Intrinsic::x86_sse3_hsub_pd:
6021 case Intrinsic::x86_avx_hsub_pd_256:
6022 case Intrinsic::x86_avx_hsub_ps_256: {
6023 handlePairwiseShadowOrIntrinsic(I);
6024 break;
6025 }
6026
6027 case Intrinsic::x86_avx_maskstore_ps:
6028 case Intrinsic::x86_avx_maskstore_pd:
6029 case Intrinsic::x86_avx_maskstore_ps_256:
6030 case Intrinsic::x86_avx_maskstore_pd_256:
6031 case Intrinsic::x86_avx2_maskstore_d:
6032 case Intrinsic::x86_avx2_maskstore_q:
6033 case Intrinsic::x86_avx2_maskstore_d_256:
6034 case Intrinsic::x86_avx2_maskstore_q_256: {
6035 handleAVXMaskedStore(I);
6036 break;
6037 }
6038
6039 case Intrinsic::x86_avx_maskload_ps:
6040 case Intrinsic::x86_avx_maskload_pd:
6041 case Intrinsic::x86_avx_maskload_ps_256:
6042 case Intrinsic::x86_avx_maskload_pd_256:
6043 case Intrinsic::x86_avx2_maskload_d:
6044 case Intrinsic::x86_avx2_maskload_q:
6045 case Intrinsic::x86_avx2_maskload_d_256:
6046 case Intrinsic::x86_avx2_maskload_q_256: {
6047 handleAVXMaskedLoad(I);
6048 break;
6049 }
6050
6051 // Packed
6052 case Intrinsic::x86_avx512fp16_add_ph_512:
6053 case Intrinsic::x86_avx512fp16_sub_ph_512:
6054 case Intrinsic::x86_avx512fp16_mul_ph_512:
6055 case Intrinsic::x86_avx512fp16_div_ph_512:
6056 case Intrinsic::x86_avx512fp16_max_ph_512:
6057 case Intrinsic::x86_avx512fp16_min_ph_512:
6058 case Intrinsic::x86_avx512_min_ps_512:
6059 case Intrinsic::x86_avx512_min_pd_512:
6060 case Intrinsic::x86_avx512_max_ps_512:
6061 case Intrinsic::x86_avx512_max_pd_512: {
6062 // These AVX512 variants contain the rounding mode as a trailing flag.
6063 // Earlier variants do not have a trailing flag and are already handled
6064 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6065 // maybeHandleUnknownIntrinsic.
6066 [[maybe_unused]] bool Success =
6067 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6068 assert(Success);
6069 break;
6070 }
6071
6072 case Intrinsic::x86_avx_vpermilvar_pd:
6073 case Intrinsic::x86_avx_vpermilvar_pd_256:
6074 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6075 case Intrinsic::x86_avx_vpermilvar_ps:
6076 case Intrinsic::x86_avx_vpermilvar_ps_256:
6077 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6078 handleAVXVpermilvar(I);
6079 break;
6080 }
6081
6082 case Intrinsic::x86_avx512_vpermi2var_d_128:
6083 case Intrinsic::x86_avx512_vpermi2var_d_256:
6084 case Intrinsic::x86_avx512_vpermi2var_d_512:
6085 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6086 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6087 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6088 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6089 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6090 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6091 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6092 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6093 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6094 case Intrinsic::x86_avx512_vpermi2var_q_128:
6095 case Intrinsic::x86_avx512_vpermi2var_q_256:
6096 case Intrinsic::x86_avx512_vpermi2var_q_512:
6097 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6098 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6099 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6100 handleAVXVpermi2var(I);
6101 break;
6102
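// For the byte/word shuffles below, the shadow is computed by applying the
// same shuffle intrinsic to the shadow operand, with the shuffle control
// passed through verbatim (trailingVerbatimArgs=1). Sketch for
// llvm.x86.ssse3.pshuf.b.128: if control lane j selects source byte i, the
// result shadow in lane j is the shadow of source byte i, so uninitialized
// bytes follow the data wherever the shuffle moves them.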
6103 // Packed Shuffle
6104 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6105 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6106 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6107 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6108 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6109 //
6110 // The following intrinsics are auto-upgraded:
6111 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6112 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6113 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6114 case Intrinsic::x86_avx2_pshuf_b:
6115 case Intrinsic::x86_sse_pshuf_w:
6116 case Intrinsic::x86_ssse3_pshuf_b_128:
6117 case Intrinsic::x86_ssse3_pshuf_b:
6118 case Intrinsic::x86_avx512_pshuf_b_512:
6119 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6120 /*trailingVerbatimArgs=*/1);
6121 break;
6122
6123 // AVX512 PMOV: Packed MOV, with truncation
6124 // Precisely handled by applying the same intrinsic to the shadow
6125 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6126 case Intrinsic::x86_avx512_mask_pmov_db_512:
6127 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6128 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6129 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6130 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6131 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6132 /*trailingVerbatimArgs=*/1);
6133 break;
6134 }
6135
6136 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6137 // Approximately handled using the corresponding truncation intrinsic
6138 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6139 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6140 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6141 handleIntrinsicByApplyingToShadow(I,
6142 Intrinsic::x86_avx512_mask_pmov_dw_512,
6143 /* trailingVerbatimArgs=*/1);
6144 break;
6145 }
6146
6147 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6148 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6149 handleIntrinsicByApplyingToShadow(I,
6150 Intrinsic::x86_avx512_mask_pmov_db_512,
6151 /* trailingVerbatimArgs=*/1);
6152 break;
6153 }
6154
6155 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6156 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6157 handleIntrinsicByApplyingToShadow(I,
6158 Intrinsic::x86_avx512_mask_pmov_qb_512,
6159 /* trailingVerbatimArgs=*/1);
6160 break;
6161 }
6162
6163 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6164 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6165 handleIntrinsicByApplyingToShadow(I,
6166 Intrinsic::x86_avx512_mask_pmov_qw_512,
6167 /* trailingVerbatimArgs=*/1);
6168 break;
6169 }
6170
6171 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6172 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6173 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6174 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6175 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6176 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6177 // slow-path handler.
6178 handleAVX512VectorDownConvert(I);
6179 break;
6180 }
6181
6182 // AVX512/AVX10 Reciprocal Square Root
6183 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6184 // (<16 x float>, <16 x float>, i16)
6185 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6186 // (<8 x float>, <8 x float>, i8)
6187 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6188 // (<4 x float>, <4 x float>, i8)
6189 //
6190 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6191 // (<8 x double>, <8 x double>, i8)
6192 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6193 // (<4 x double>, <4 x double>, i8)
6194 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6195 // (<2 x double>, <2 x double>, i8)
6196 //
6197 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6198 // (<32 x bfloat>, <32 x bfloat>, i32)
6199 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6200 // (<16 x bfloat>, <16 x bfloat>, i16)
6201 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6202 // (<8 x bfloat>, <8 x bfloat>, i8)
6203 //
6204 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6205 // (<32 x half>, <32 x half>, i32)
6206 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6207 // (<16 x half>, <16 x half>, i16)
6208 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6209 // (<8 x half>, <8 x half>, i8)
6210 //
6211 // TODO: 3-operand variants are not handled:
6212 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6213 // (<2 x double>, <2 x double>, <2 x double>, i8)
6214 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6215 // (<4 x float>, <4 x float>, <4 x float>, i8)
6216 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6217 // (<8 x half>, <8 x half>, <8 x half>, i8)
6218 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6219 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6220 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6221 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6222 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6223 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6224 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6225 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6226 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6227 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6228 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6229 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6230 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6231 /*MaskIndex=*/2);
6232 break;
6233
6234 // AVX512/AVX10 Reciprocal
6235 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6236 // (<16 x float>, <16 x float>, i16)
6237 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6238 // (<8 x float>, <8 x float>, i8)
6239 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6240 // (<4 x float>, <4 x float>, i8)
6241 //
6242 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6243 // (<8 x double>, <8 x double>, i8)
6244 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6245 // (<4 x double>, <4 x double>, i8)
6246 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6247 // (<2 x double>, <2 x double>, i8)
6248 //
6249 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6250 // (<32 x bfloat>, <32 x bfloat>, i32)
6251 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6252 // (<16 x bfloat>, <16 x bfloat>, i16)
6253 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6254 // (<8 x bfloat>, <8 x bfloat>, i8)
6255 //
6256 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6257 // (<32 x half>, <32 x half>, i32)
6258 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6259 // (<16 x half>, <16 x half>, i16)
6260 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6261 // (<8 x half>, <8 x half>, i8)
6262 //
6263 // TODO: 3-operand variants are not handled:
6264 // <2 x double> @llvm.x86.avx512.rcp14.sd
6265 // (<2 x double>, <2 x double>, <2 x double>, i8)
6266 // <4 x float> @llvm.x86.avx512.rcp14.ss
6267 // (<4 x float>, <4 x float>, <4 x float>, i8)
6268 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6269 // (<8 x half>, <8 x half>, <8 x half>, i8)
6270 case Intrinsic::x86_avx512_rcp14_ps_512:
6271 case Intrinsic::x86_avx512_rcp14_ps_256:
6272 case Intrinsic::x86_avx512_rcp14_ps_128:
6273 case Intrinsic::x86_avx512_rcp14_pd_512:
6274 case Intrinsic::x86_avx512_rcp14_pd_256:
6275 case Intrinsic::x86_avx512_rcp14_pd_128:
6276 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6277 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6278 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6279 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6280 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6281 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6282 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6283 /*MaskIndex=*/2);
6284 break;
6285
6286 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6287 // (<32 x half>, i32, <32 x half>, i32, i32)
6288 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6289 // (<16 x half>, i32, <16 x half>, i32, i16)
6290 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6291 // (<8 x half>, i32, <8 x half>, i32, i8)
6292 //
6293 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6294 // (<16 x float>, i32, <16 x float>, i16, i32)
6295 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6296 // (<8 x float>, i32, <8 x float>, i8)
6297 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6298 // (<4 x float>, i32, <4 x float>, i8)
6299 //
6300 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6301 // (<8 x double>, i32, <8 x double>, i8, i32)
6302 // A Imm WriteThru Mask Rounding
6303 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6304 // (<4 x double>, i32, <4 x double>, i8)
6305 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6306 // (<2 x double>, i32, <2 x double>, i8)
6307 // A Imm WriteThru Mask
6308 //
6309 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6310 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6311 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6312 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6313 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6314 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6315 //
6316 // Not supported: three vectors
6317 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6318 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6319 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6320 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6321 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6322 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6323 // i32)
6324 // A B WriteThru Mask Imm
6325 // Rounding
6326 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6327 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6328 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6329 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6330 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6331 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6332 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6333 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6334 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6335 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6336 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6337 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6338 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6339 /*MaskIndex=*/3);
6340 break;
6341
6342 // AVX512 FP16 Arithmetic
6343 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6344 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6345 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6346 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6347 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6348 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6349 visitGenericScalarHalfwordInst(I);
6350 break;
6351 }
6352
6353 // AVX Galois Field New Instructions
6354 case Intrinsic::x86_vgf2p8affineqb_128:
6355 case Intrinsic::x86_vgf2p8affineqb_256:
6356 case Intrinsic::x86_vgf2p8affineqb_512:
6357 handleAVXGF2P8Affine(I);
6358 break;
6359
6360 default:
6361 return false;
6362 }
6363
6364 return true;
6365 }
6366
6367 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6368 switch (I.getIntrinsicID()) {
6369 case Intrinsic::aarch64_neon_rshrn:
6370 case Intrinsic::aarch64_neon_sqrshl:
6371 case Intrinsic::aarch64_neon_sqrshrn:
6372 case Intrinsic::aarch64_neon_sqrshrun:
6373 case Intrinsic::aarch64_neon_sqshl:
6374 case Intrinsic::aarch64_neon_sqshlu:
6375 case Intrinsic::aarch64_neon_sqshrn:
6376 case Intrinsic::aarch64_neon_sqshrun:
6377 case Intrinsic::aarch64_neon_srshl:
6378 case Intrinsic::aarch64_neon_sshl:
6379 case Intrinsic::aarch64_neon_uqrshl:
6380 case Intrinsic::aarch64_neon_uqrshrn:
6381 case Intrinsic::aarch64_neon_uqshl:
6382 case Intrinsic::aarch64_neon_uqshrn:
6383 case Intrinsic::aarch64_neon_urshl:
6384 case Intrinsic::aarch64_neon_ushl:
6385 // Not handled here: aarch64_neon_vsli (vector shift left and insert)
6386 handleVectorShiftIntrinsic(I, /* Variable */ false);
6387 break;
6388
6389 // TODO: handling max/min similarly to AND/OR may be more precise
6390 // Floating-Point Maximum/Minimum Pairwise
6391 case Intrinsic::aarch64_neon_fmaxp:
6392 case Intrinsic::aarch64_neon_fminp:
6393 // Floating-Point Maximum/Minimum Number Pairwise
6394 case Intrinsic::aarch64_neon_fmaxnmp:
6395 case Intrinsic::aarch64_neon_fminnmp:
6396 // Signed/Unsigned Maximum/Minimum Pairwise
6397 case Intrinsic::aarch64_neon_smaxp:
6398 case Intrinsic::aarch64_neon_sminp:
6399 case Intrinsic::aarch64_neon_umaxp:
6400 case Intrinsic::aarch64_neon_uminp:
6401 // Add Pairwise
6402 case Intrinsic::aarch64_neon_addp:
6403 // Floating-point Add Pairwise
6404 case Intrinsic::aarch64_neon_faddp:
6405 // Add Long Pairwise
6406 case Intrinsic::aarch64_neon_saddlp:
6407 case Intrinsic::aarch64_neon_uaddlp: {
6408 handlePairwiseShadowOrIntrinsic(I);
6409 break;
6410 }
6411
6412 // Floating-point Convert to integer, rounding to nearest with ties to Away
6413 case Intrinsic::aarch64_neon_fcvtas:
6414 case Intrinsic::aarch64_neon_fcvtau:
6415 // Floating-point convert to integer, rounding toward minus infinity
6416 case Intrinsic::aarch64_neon_fcvtms:
6417 case Intrinsic::aarch64_neon_fcvtmu:
6418 // Floating-point convert to integer, rounding to nearest with ties to even
6419 case Intrinsic::aarch64_neon_fcvtns:
6420 case Intrinsic::aarch64_neon_fcvtnu:
6421 // Floating-point convert to integer, rounding toward plus infinity
6422 case Intrinsic::aarch64_neon_fcvtps:
6423 case Intrinsic::aarch64_neon_fcvtpu:
6424 // Floating-point Convert to integer, rounding toward Zero
6425 case Intrinsic::aarch64_neon_fcvtzs:
6426 case Intrinsic::aarch64_neon_fcvtzu:
6427 // Floating-point convert to lower precision narrow, rounding to odd
6428 case Intrinsic::aarch64_neon_fcvtxn: {
6429 handleNEONVectorConvertIntrinsic(I);
6430 break;
6431 }
6432
6433 // Add reduction to scalar
6434 case Intrinsic::aarch64_neon_faddv:
6435 case Intrinsic::aarch64_neon_saddv:
6436 case Intrinsic::aarch64_neon_uaddv:
6437 // Signed/Unsigned min/max (Vector)
6438 // TODO: handling similarly to AND/OR may be more precise.
6439 case Intrinsic::aarch64_neon_smaxv:
6440 case Intrinsic::aarch64_neon_sminv:
6441 case Intrinsic::aarch64_neon_umaxv:
6442 case Intrinsic::aarch64_neon_uminv:
6443 // Floating-point min/max (vector)
6444 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6445 // but our shadow propagation is the same.
6446 case Intrinsic::aarch64_neon_fmaxv:
6447 case Intrinsic::aarch64_neon_fminv:
6448 case Intrinsic::aarch64_neon_fmaxnmv:
6449 case Intrinsic::aarch64_neon_fminnmv:
6450 // Sum long across vector
6451 case Intrinsic::aarch64_neon_saddlv:
6452 case Intrinsic::aarch64_neon_uaddlv:
6453 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6454 break;
6455
6456 case Intrinsic::aarch64_neon_ld1x2:
6457 case Intrinsic::aarch64_neon_ld1x3:
6458 case Intrinsic::aarch64_neon_ld1x4:
6459 case Intrinsic::aarch64_neon_ld2:
6460 case Intrinsic::aarch64_neon_ld3:
6461 case Intrinsic::aarch64_neon_ld4:
6462 case Intrinsic::aarch64_neon_ld2r:
6463 case Intrinsic::aarch64_neon_ld3r:
6464 case Intrinsic::aarch64_neon_ld4r: {
6465 handleNEONVectorLoad(I, /*WithLane=*/false);
6466 break;
6467 }
6468
6469 case Intrinsic::aarch64_neon_ld2lane:
6470 case Intrinsic::aarch64_neon_ld3lane:
6471 case Intrinsic::aarch64_neon_ld4lane: {
6472 handleNEONVectorLoad(I, /*WithLane=*/true);
6473 break;
6474 }
6475
6476 // Saturating extract narrow
6477 case Intrinsic::aarch64_neon_sqxtn:
6478 case Intrinsic::aarch64_neon_sqxtun:
6479 case Intrinsic::aarch64_neon_uqxtn:
6480 // These only have one argument, but we (ab)use handleShadowOr because it
6481 // does work on single argument intrinsics and will typecast the shadow
6482 // (and update the origin).
6483 handleShadowOr(I);
6484 break;
6485
6486 case Intrinsic::aarch64_neon_st1x2:
6487 case Intrinsic::aarch64_neon_st1x3:
6488 case Intrinsic::aarch64_neon_st1x4:
6489 case Intrinsic::aarch64_neon_st2:
6490 case Intrinsic::aarch64_neon_st3:
6491 case Intrinsic::aarch64_neon_st4: {
6492 handleNEONVectorStoreIntrinsic(I, false);
6493 break;
6494 }
6495
6496 case Intrinsic::aarch64_neon_st2lane:
6497 case Intrinsic::aarch64_neon_st3lane:
6498 case Intrinsic::aarch64_neon_st4lane: {
6499 handleNEONVectorStoreIntrinsic(I, true);
6500 break;
6501 }
6502
6503 // Arm NEON vector table intrinsics have the source/table register(s) as
6504 // arguments, followed by the index register. They return the output.
6505 //
6506 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6507 // original value unchanged in the destination register.'
6508 // Conveniently, zero denotes a clean shadow, which means out-of-range
6509 // indices for TBL will initialize the user data with zero and also clean
6510 // the shadow. (For TBX, neither the user data nor the shadow will be
6511 // updated, which is also correct.)
6512 case Intrinsic::aarch64_neon_tbl1:
6513 case Intrinsic::aarch64_neon_tbl2:
6514 case Intrinsic::aarch64_neon_tbl3:
6515 case Intrinsic::aarch64_neon_tbl4:
6516 case Intrinsic::aarch64_neon_tbx1:
6517 case Intrinsic::aarch64_neon_tbx2:
6518 case Intrinsic::aarch64_neon_tbx3:
6519 case Intrinsic::aarch64_neon_tbx4: {
6520 // The last trailing argument (index register) should be handled verbatim
6521 handleIntrinsicByApplyingToShadow(
6522 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6523 /*trailingVerbatimArgs*/ 1);
6524 break;
6525 }
6526
6527 case Intrinsic::aarch64_neon_fmulx:
6528 case Intrinsic::aarch64_neon_pmul:
6529 case Intrinsic::aarch64_neon_pmull:
6530 case Intrinsic::aarch64_neon_smull:
6531 case Intrinsic::aarch64_neon_pmull64:
6532 case Intrinsic::aarch64_neon_umull: {
6533 handleNEONVectorMultiplyIntrinsic(I);
6534 break;
6535 }
6536
6537 default:
6538 return false;
6539 }
6540
6541 return true;
6542 }
6543
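// Intrinsic dispatch: try the cross-platform handlers first, then the x86
// and Arm SIMD tables above, then the generic unknown-intrinsic heuristics;
// anything still unhandled falls back to the conservative visitInstruction,
// which checks all operand shadows and produces a clean result.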
6544 void visitIntrinsicInst(IntrinsicInst &I) {
6545 if (maybeHandleCrossPlatformIntrinsic(I))
6546 return;
6547
6548 if (maybeHandleX86SIMDIntrinsic(I))
6549 return;
6550
6551 if (maybeHandleArmSIMDIntrinsic(I))
6552 return;
6553
6554 if (maybeHandleUnknownIntrinsic(I))
6555 return;
6556
6557 visitInstruction(I);
6558 }
6559
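// The libatomic entry point handled here is (assuming the usual libatomic
// signature) void __atomic_load(size_t size, void *src, void *dst, int order),
// which is why operand 0 is the size, operands 1/2 are the source/destination
// pointers, and operand 3 is the memory ordering strengthened below.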
6560 void visitLibAtomicLoad(CallBase &CB) {
6561 // Since we use getNextNode here, we can't have CB terminate the BB.
6562 assert(isa<CallInst>(CB));
6563
6564 IRBuilder<> IRB(&CB);
6565 Value *Size = CB.getArgOperand(0);
6566 Value *SrcPtr = CB.getArgOperand(1);
6567 Value *DstPtr = CB.getArgOperand(2);
6568 Value *Ordering = CB.getArgOperand(3);
6569 // Convert the call to have at least Acquire ordering to make sure
6570 // the shadow operations aren't reordered before it.
6571 Value *NewOrdering =
6572 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
6573 CB.setArgOperand(3, NewOrdering);
6574
6575 NextNodeIRBuilder NextIRB(&CB);
6576 Value *SrcShadowPtr, *SrcOriginPtr;
6577 std::tie(SrcShadowPtr, SrcOriginPtr) =
6578 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6579 /*isStore*/ false);
6580 Value *DstShadowPtr =
6581 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6582 /*isStore*/ true)
6583 .first;
6584
6585 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
6586 if (MS.TrackOrigins) {
6587 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
6588 kMinOriginAlignment);
6589 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
6590 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
6591 }
6592 }
6593
6594 void visitLibAtomicStore(CallBase &CB) {
6595 IRBuilder<> IRB(&CB);
6596 Value *Size = CB.getArgOperand(0);
6597 Value *DstPtr = CB.getArgOperand(2);
6598 Value *Ordering = CB.getArgOperand(3);
6599 // Convert the call to have at least Release ordering to make sure
6600 // the shadow operations aren't reordered after it.
6601 Value *NewOrdering =
6602 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
6603 CB.setArgOperand(3, NewOrdering);
6604
6605 Value *DstShadowPtr =
6606 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
6607 /*isStore*/ true)
6608 .first;
6609
6610 // Atomic store always paints clean shadow/origin. See file header.
6611 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
6612 Align(1));
6613 }
6614
6615 void visitCallBase(CallBase &CB) {
6616 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
6617 if (CB.isInlineAsm()) {
6618 // For inline asm (either a call to asm function, or callbr instruction),
6619 // do the usual thing: check argument shadow and mark all outputs as
6620 // clean. Note that any side effects of the inline asm that are not
6621 // immediately visible in its constraints are not handled.
6622 if (ClHandleAsmConservative)
6623 visitAsmInstruction(CB);
6624 else
6625 visitInstruction(CB);
6626 return;
6627 }
6628 LibFunc LF;
6629 if (TLI->getLibFunc(CB, LF)) {
6630 // libatomic.a functions need to have special handling because there isn't
6631 // a good way to intercept them or compile the library with
6632 // instrumentation.
6633 switch (LF) {
6634 case LibFunc_atomic_load:
6635 if (!isa<CallInst>(CB)) {
6636 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load. "
6637 "Ignoring!\n";
6638 break;
6639 }
6640 visitLibAtomicLoad(CB);
6641 return;
6642 case LibFunc_atomic_store:
6643 visitLibAtomicStore(CB);
6644 return;
6645 default:
6646 break;
6647 }
6648 }
6649
6650 if (auto *Call = dyn_cast<CallInst>(&CB)) {
6651 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
6652
6653 // We are going to insert code that relies on the fact that the callee
6654 // will become a non-readonly function after it is instrumented by us. To
6655 // prevent this code from being optimized out, mark that function
6656 // non-readonly in advance.
6657 // TODO: We can likely do better than dropping memory() completely here.
6658 AttributeMask B;
6659 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
6660
6661 Call->removeFnAttrs(B);
6662 if (Function *Func = Call->getCalledFunction()) {
6663 Func->removeFnAttrs(B);
6664 }
6665
6666 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
6667 }
6668 IRBuilder<> IRB(&CB);
6669 bool MayCheckCall = MS.EagerChecks;
6670 if (Function *Func = CB.getCalledFunction()) {
6671 // __sanitizer_unaligned_{load,store} functions may be called by users
6672 // and always expect shadows in the TLS, so don't check them.
6673 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
6674 }
6675
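// Argument shadows are laid out sequentially in __msan_param_tls: each
// argument takes getTypeAllocSize(type) bytes rounded up to
// kShadowTLSAlignment at the running ArgOffset computed below. Once an
// argument would overflow the fixed-size buffer (kParamTLSSize), the loop
// stops storing shadows, trading precision for a bounded TLS buffer.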
6676 unsigned ArgOffset = 0;
6677 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
6678 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
6679 if (!A->getType()->isSized()) {
6680 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
6681 continue;
6682 }
6683
6684 if (A->getType()->isScalableTy()) {
6685 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
6686 // Handle as noundef, but don't reserve tls slots.
6687 insertCheckShadowOf(A, &CB);
6688 continue;
6689 }
6690
6691 unsigned Size = 0;
6692 const DataLayout &DL = F.getDataLayout();
6693
6694 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
6695 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
6696 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
6697
6698 if (EagerCheck) {
6699 insertCheckShadowOf(A, &CB);
6700 Size = DL.getTypeAllocSize(A->getType());
6701 } else {
6702 [[maybe_unused]] Value *Store = nullptr;
6703 // Compute the Shadow for arg even if it is ByVal, because
6704 // in that case getShadow() will copy the actual arg shadow to
6705 // __msan_param_tls.
6706 Value *ArgShadow = getShadow(A);
6707 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
6708 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
6709 << " Shadow: " << *ArgShadow << "\n");
6710 if (ByVal) {
6711 // ByVal requires some special handling as it's too big for a single
6712 // load
6713 assert(A->getType()->isPointerTy() &&
6714 "ByVal argument is not a pointer!");
6715 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
6716 if (ArgOffset + Size > kParamTLSSize)
6717 break;
6718 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
6719 MaybeAlign Alignment = std::nullopt;
6720 if (ParamAlignment)
6721 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
6722 Value *AShadowPtr, *AOriginPtr;
6723 std::tie(AShadowPtr, AOriginPtr) =
6724 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
6725 /*isStore*/ false);
6726 if (!PropagateShadow) {
6727 Store = IRB.CreateMemSet(ArgShadowBase,
6728 Constant::getNullValue(IRB.getInt8Ty()),
6729 Size, Alignment);
6730 } else {
6731 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
6732 Alignment, Size);
6733 if (MS.TrackOrigins) {
6734 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
6735 // FIXME: OriginSize should be:
6736 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
6737 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
6738 IRB.CreateMemCpy(
6739 ArgOriginBase,
6740 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
6741 AOriginPtr,
6742 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
6743 }
6744 }
6745 } else {
6746 // Any other parameters mean we need bit-grained tracking of uninit
6747 // data
6748 Size = DL.getTypeAllocSize(A->getType());
6749 if (ArgOffset + Size > kParamTLSSize)
6750 break;
6751 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
6752 kShadowTLSAlignment);
6753 Constant *Cst = dyn_cast<Constant>(ArgShadow);
6754 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
6755 IRB.CreateStore(getOrigin(A),
6756 getOriginPtrForArgument(IRB, ArgOffset));
6757 }
6758 }
6759 assert(Store != nullptr);
6760 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
6761 }
6762 assert(Size != 0);
6763 ArgOffset += alignTo(Size, kShadowTLSAlignment);
6764 }
6765 LLVM_DEBUG(dbgs() << " done with call args\n");
6766
6767 FunctionType *FT = CB.getFunctionType();
6768 if (FT->isVarArg()) {
6769 VAHelper->visitCallBase(CB, IRB);
6770 }
6771
6772 // Now, get the shadow for the RetVal.
6773 if (!CB.getType()->isSized())
6774 return;
6775 // Don't emit the epilogue for musttail call returns.
6776 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
6777 return;
6778
6779 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
6780 setShadow(&CB, getCleanShadow(&CB));
6781 setOrigin(&CB, getCleanOrigin());
6782 return;
6783 }
6784
6785 IRBuilder<> IRBBefore(&CB);
6786 // Until we have full dynamic coverage, make sure the retval shadow is 0.
6787 Value *Base = getShadowPtrForRetval(IRBBefore);
6788 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
6789 kShadowTLSAlignment);
6790 BasicBlock::iterator NextInsn;
6791 if (isa<CallInst>(CB)) {
6792 NextInsn = ++CB.getIterator();
6793 assert(NextInsn != CB.getParent()->end());
6794 } else {
6795 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
6796 if (!NormalDest->getSinglePredecessor()) {
6797 // FIXME: this case is tricky, so we are just conservative here.
6798 // Perhaps we need to split the edge between this BB and NormalDest,
6799 // but a naive attempt to use SplitEdge leads to a crash.
6800 setShadow(&CB, getCleanShadow(&CB));
6801 setOrigin(&CB, getCleanOrigin());
6802 return;
6803 }
6804 // FIXME: NextInsn is likely in a basic block that has not been visited
6805 // yet. Anything inserted there will be instrumented by MSan later!
6806 NextInsn = NormalDest->getFirstInsertionPt();
6807 assert(NextInsn != NormalDest->end() &&
6808 "Could not find insertion point for retval shadow load");
6809 }
6810 IRBuilder<> IRBAfter(&*NextInsn);
6811 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
6812 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
6813 "_msret");
6814 setShadow(&CB, RetvalShadow);
6815 if (MS.TrackOrigins)
6816 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
6817 }
6818
6819 bool isAMustTailRetVal(Value *RetVal) {
6820 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
6821 RetVal = I->getOperand(0);
6822 }
6823 if (auto *I = dyn_cast<CallInst>(RetVal)) {
6824 return I->isMustTailCall();
6825 }
6826 return false;
6827 }
6828
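// Return-value protocol (a sketch, with hypothetical shadow values): the
// callee's epilogue below stores the return shadow into __msan_retval_tls,
// and the caller side in visitCallBase reloads it right after the call:
//   store i32 %ret_shadow, ptr @__msan_retval_tls      ; emitted here
//   %_msret = load i32, ptr @__msan_retval_tls         ; emitted at the call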
6829 void visitReturnInst(ReturnInst &I) {
6830 IRBuilder<> IRB(&I);
6831 Value *RetVal = I.getReturnValue();
6832 if (!RetVal)
6833 return;
6834 // Don't emit the epilogue for musttail call returns.
6835 if (isAMustTailRetVal(RetVal))
6836 return;
6837 Value *ShadowPtr = getShadowPtrForRetval(IRB);
6838 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
6839 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
6840 // FIXME: Consider using SpecialCaseList to specify a list of functions that
6841 // must always return fully initialized values. For now, we hardcode "main".
6842 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
6843
6844 Value *Shadow = getShadow(RetVal);
6845 bool StoreOrigin = true;
6846 if (EagerCheck) {
6847 insertCheckShadowOf(RetVal, &I);
6848 Shadow = getCleanShadow(RetVal);
6849 StoreOrigin = false;
6850 }
6851
6852 // The caller may still expect information passed over TLS if we pass our
6853 // check
6854 if (StoreShadow) {
6855 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
6856 if (MS.TrackOrigins && StoreOrigin)
6857 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
6858 }
6859 }
6860
6861 void visitPHINode(PHINode &I) {
6862 IRBuilder<> IRB(&I);
6863 if (!PropagateShadow) {
6864 setShadow(&I, getCleanShadow(&I));
6865 setOrigin(&I, getCleanOrigin());
6866 return;
6867 }
6868
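// The incoming shadow/origin values are not filled in here: predecessors may
// not have been instrumented yet, so the empty PHIs are recorded in
// ShadowPHINodes and completed once every block has its shadows.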
6869 ShadowPHINodes.push_back(&I);
6870 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
6871 "_msphi_s"));
6872 if (MS.TrackOrigins)
6873 setOrigin(
6874 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
6875 }
6876
6877 Value *getLocalVarIdptr(AllocaInst &I) {
6878 ConstantInt *IntConst =
6879 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
6880 return new GlobalVariable(*F.getParent(), IntConst->getType(),
6881 /*isConstant=*/false, GlobalValue::PrivateLinkage,
6882 IntConst);
6883 }
6884
6885 Value *getLocalVarDescription(AllocaInst &I) {
6886 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
6887 }
6888
6889 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6890 if (PoisonStack && ClPoisonStackWithCall) {
6891 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
6892 } else {
6893 Value *ShadowBase, *OriginBase;
6894 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
6895 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
6896
6897 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
6898 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
6899 }
6900
6901 if (PoisonStack && MS.TrackOrigins) {
6902 Value *Idptr = getLocalVarIdptr(I);
6903 if (ClPrintStackNames) {
6904 Value *Descr = getLocalVarDescription(I);
6905 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
6906 {&I, Len, Idptr, Descr});
6907 } else {
6908 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
6909 }
6910 }
6911 }
6912
6913 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
6914 Value *Descr = getLocalVarDescription(I);
6915 if (PoisonStack) {
6916 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
6917 } else {
6918 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
6919 }
6920 }
6921
6922 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
6923 if (!InsPoint)
6924 InsPoint = &I;
6925 NextNodeIRBuilder IRB(InsPoint);
6926 const DataLayout &DL = F.getDataLayout();
6927 TypeSize TS = DL.getTypeAllocSize(I.getAllocatedType());
6928 Value *Len = IRB.CreateTypeSize(MS.IntptrTy, TS);
6929 if (I.isArrayAllocation())
6930 Len = IRB.CreateMul(Len,
6931 IRB.CreateZExtOrTrunc(I.getArraySize(), MS.IntptrTy));
6932
6933 if (MS.CompileKernel)
6934 poisonAllocaKmsan(I, IRB, Len);
6935 else
6936 poisonAllocaUserspace(I, IRB, Len);
6937 }
6938
6939 void visitAllocaInst(AllocaInst &I) {
6940 setShadow(&I, getCleanShadow(&I));
6941 setOrigin(&I, getCleanOrigin());
6942 // We'll get to this alloca later unless it's poisoned at the corresponding
6943 // llvm.lifetime.start.
6944 AllocaSet.insert(&I);
6945 }
6946
6947 void visitSelectInst(SelectInst &I) {
6948 // a = select b, c, d
6949 Value *B = I.getCondition();
6950 Value *C = I.getTrueValue();
6951 Value *D = I.getFalseValue();
6952
6953 handleSelectLikeInst(I, B, C, D);
6954 }
6955
6956 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
6957 IRBuilder<> IRB(&I);
6958
6959 Value *Sb = getShadow(B);
6960 Value *Sc = getShadow(C);
6961 Value *Sd = getShadow(D);
6962
6963 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
6964 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
6965 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
6966
6967 // Result shadow if condition shadow is 0.
6968 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
6969 Value *Sa1;
6970 if (I.getType()->isAggregateType()) {
6971 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
6972 // an extra "select". This results in much more compact IR.
6973 // Sa = select Sb, poisoned, (select b, Sc, Sd)
6974 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
6975 } else {
6976 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
6977 // If Sb (condition is poisoned), look for bits in c and d that are equal
6978 // and both unpoisoned.
6979 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
6980
6981 // Cast arguments to shadow-compatible type.
6982 C = CreateAppToShadowCast(IRB, C);
6983 D = CreateAppToShadowCast(IRB, D);
6984
6985 // Result shadow if condition shadow is 1.
6986 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
6987 }
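// Worked example of the non-aggregate case: if bit k of c and d is equal and
// initialized in both (Sc[k] == Sd[k] == 0, c[k] == d[k]), then
// ((c^d) | Sc | Sd)[k] == 0, so the result bit stays clean even when the
// condition shadow Sb is poisoned -- the outcome of the select cannot depend
// on the condition for that bit.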
6988 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
6989 setShadow(&I, Sa);
6990 if (MS.TrackOrigins) {
6991 // Origins are always i32, so any vector conditions must be flattened.
6992 // FIXME: consider tracking vector origins for app vectors?
6993 if (B->getType()->isVectorTy()) {
6994 B = convertToBool(B, IRB);
6995 Sb = convertToBool(Sb, IRB);
6996 }
6997 // a = select b, c, d
6998 // Oa = Sb ? Ob : (b ? Oc : Od)
6999 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7000 }
7001 }
7002
7003 void visitLandingPadInst(LandingPadInst &I) {
7004 // Do nothing.
7005 // See https://github.com/google/sanitizers/issues/504
7006 setShadow(&I, getCleanShadow(&I));
7007 setOrigin(&I, getCleanOrigin());
7008 }
7009
7010 void visitCatchSwitchInst(CatchSwitchInst &I) {
7011 setShadow(&I, getCleanShadow(&I));
7012 setOrigin(&I, getCleanOrigin());
7013 }
7014
7015 void visitFuncletPadInst(FuncletPadInst &I) {
7016 setShadow(&I, getCleanShadow(&I));
7017 setOrigin(&I, getCleanOrigin());
7018 }
7019
7020 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7021
7022 void visitExtractValueInst(ExtractValueInst &I) {
7023 IRBuilder<> IRB(&I);
7024 Value *Agg = I.getAggregateOperand();
7025 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7026 Value *AggShadow = getShadow(Agg);
7027 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7028 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7029 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7030 setShadow(&I, ResShadow);
7031 setOriginForNaryOp(I);
7032 }
7033
7034 void visitInsertValueInst(InsertValueInst &I) {
7035 IRBuilder<> IRB(&I);
7036 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7037 Value *AggShadow = getShadow(I.getAggregateOperand());
7038 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7039 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7040 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7041 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7042 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7043 setShadow(&I, Res);
7044 setOriginForNaryOp(I);
7045 }
7046
7047 void dumpInst(Instruction &I) {
7048 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7049 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7050 } else {
7051 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7052 }
7053 errs() << "QQQ " << I << "\n";
7054 }
7055
7056 void visitResumeInst(ResumeInst &I) {
7057 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7058 // Nothing to do here.
7059 }
7060
7061 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7062 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7063 // Nothing to do here.
7064 }
7065
7066 void visitCatchReturnInst(CatchReturnInst &CRI) {
7067 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7068 // Nothing to do here.
7069 }
7070
7071 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7072 IRBuilder<> &IRB, const DataLayout &DL,
7073 bool isOutput) {
7074 // For each assembly argument, we check its value for being initialized.
7075 // If the argument is a pointer, we assume it points to a single element
7076 // of the corresponding type (or to a 8-byte word, if the type is unsized).
7077 // Each such pointer is instrumented with a call to the runtime library.
7078 Type *OpType = Operand->getType();
7079 // Check the operand value itself.
7080 insertCheckShadowOf(Operand, &I);
7081 if (!OpType->isPointerTy() || !isOutput) {
7082 assert(!isOutput);
7083 return;
7084 }
7085 if (!ElemTy->isSized())
7086 return;
7087 auto Size = DL.getTypeStoreSize(ElemTy);
7088 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7089 if (MS.CompileKernel) {
7090 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7091 } else {
7092 // ElemTy, derived from elementtype(), does not encode the alignment of
7093 // the pointer. Conservatively assume that the shadow memory is unaligned.
7094 // When Size is large, avoid StoreInst as it would expand to many
7095 // instructions.
7096 auto [ShadowPtr, _] =
7097 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7098 if (Size <= 32)
7099 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7100 else
7101 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7102 SizeVal, Align(1));
7103 }
7104 }
7105
7106 /// Get the number of output arguments returned by pointers.
7107 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7108 int NumRetOutputs = 0;
7109 int NumOutputs = 0;
7110 Type *RetTy = cast<Value>(CB)->getType();
7111 if (!RetTy->isVoidTy()) {
7112 // Register outputs are returned via the CallInst return value.
7113 auto *ST = dyn_cast<StructType>(RetTy);
7114 if (ST)
7115 NumRetOutputs = ST->getNumElements();
7116 else
7117 NumRetOutputs = 1;
7118 }
7119 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7120 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7121 switch (Info.Type) {
7122 case InlineAsm::isOutput:
7123 NumOutputs++;
7124 break;
7125 default:
7126 break;
7127 }
7128 }
7129 return NumOutputs - NumRetOutputs;
7130 }
7131
7132 void visitAsmInstruction(Instruction &I) {
7133 // Conservative inline assembly handling: check for poisoned shadow of
7134 // asm() arguments, then unpoison the result and all the memory locations
7135 // pointed to by those arguments.
7136 // An inline asm() statement in C++ contains lists of input and output
7137 // arguments used by the assembly code. These are mapped to operands of the
7138 // CallInst as follows:
7139 // - nR register outputs ("=r) are returned by value in a single structure
7140 // (SSA value of the CallInst);
7141 // - nO other outputs ("=m" and others) are returned by pointer as first
7142 // nO operands of the CallInst;
7143 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7144 // remaining nI operands.
7145 // The total number of asm() arguments in the source is nR+nO+nI, and the
7146 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7147 // function to be called).
7148 const DataLayout &DL = F.getDataLayout();
7149 CallBase *CB = cast<CallBase>(&I);
7150 IRBuilder<> IRB(&I);
7151 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7152 int OutputArgs = getNumOutputArgs(IA, CB);
7153 // The last operand of a CallInst is the function itself.
7154 int NumOperands = CB->getNumOperands() - 1;
7155
7156 // Check input arguments. Doing so before unpoisoning output arguments, so
7157 // that we won't overwrite uninit values before checking them.
7158 for (int i = OutputArgs; i < NumOperands; i++) {
7159 Value *Operand = CB->getOperand(i);
7160 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7161 /*isOutput*/ false);
7162 }
7163 // Unpoison output arguments. This must happen before the actual InlineAsm
7164 // call, so that the shadow for memory published in the asm() statement
7165 // remains valid.
7166 for (int i = 0; i < OutputArgs; i++) {
7167 Value *Operand = CB->getOperand(i);
7168 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7169 /*isOutput*/ true);
7170 }
7171
7172 setShadow(&I, getCleanShadow(&I));
7173 setOrigin(&I, getCleanOrigin());
7174 }
7175
7176 void visitFreezeInst(FreezeInst &I) {
7177 // Freeze always returns a fully defined value.
7178 setShadow(&I, getCleanShadow(&I));
7179 setOrigin(&I, getCleanOrigin());
7180 }
7181
7182 void visitInstruction(Instruction &I) {
7183 // Everything else: stop propagating and check for poisoned shadow.
7184 if (ClDumpStrictInstructions)
7185 dumpInst(I);
7186 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7187 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7188 Value *Operand = I.getOperand(i);
7189 if (Operand->getType()->isSized())
7190 insertCheckShadowOf(Operand, &I);
7191 }
7192 setShadow(&I, getCleanShadow(&I));
7193 setOrigin(&I, getCleanOrigin());
7194 }
7195};
7196
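// Shared plumbing for the per-ABI va_arg helpers below: computing offsets
// into __msan_va_arg_tls / __msan_va_arg_origin_tls, zeroing the unused TLS
// tail, and unpoisoning the va_list tag at va_start/va_copy.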
7197struct VarArgHelperBase : public VarArgHelper {
7198 Function &F;
7199 MemorySanitizer &MS;
7200 MemorySanitizerVisitor &MSV;
7201 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7202 const unsigned VAListTagSize;
7203
7204 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7205 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7206 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7207
7208 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7209 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7210 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7211 }
7212
7213 /// Compute the shadow address for a given va_arg.
7214 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7215 return IRB.CreatePtrAdd(
7216 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7217 }
7218
7219 /// Compute the shadow address for a given va_arg.
7220 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7221 unsigned ArgSize) {
7222 // Make sure we don't overflow __msan_va_arg_tls.
7223 if (ArgOffset + ArgSize > kParamTLSSize)
7224 return nullptr;
7225 return getShadowPtrForVAArgument(IRB, ArgOffset);
7226 }
7227
7228 /// Compute the origin address for a given va_arg.
7229 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7230 // getOriginPtrForVAArgument() is always called after
7231 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7232 // overflow.
7233 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7234 ConstantInt::get(MS.IntptrTy, ArgOffset),
7235 "_msarg_va_o");
7236 }
7237
7238 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7239 unsigned BaseOffset) {
7240 // The tail of __msan_va_arg_tls is not large enough to fit the full
7241 // value shadow, but it will be copied to the backup anyway. Make it
7242 // clean.
7243 if (BaseOffset >= kParamTLSSize)
7244 return;
7245 Value *TailSize =
7246 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7247 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7248 TailSize, Align(8));
7249 }
7250
7251 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7252 IRBuilder<> IRB(&I);
7253 Value *VAListTag = I.getArgOperand(0);
7254 const Align Alignment = Align(8);
7255 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7256 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7257 // Unpoison the whole __va_list_tag.
7258 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7259 VAListTagSize, Alignment, false);
7260 }
7261
7262 void visitVAStartInst(VAStartInst &I) override {
7263 if (F.getCallingConv() == CallingConv::Win64)
7264 return;
7265 VAStartInstrumentationList.push_back(&I);
7266 unpoisonVAListTagForInst(I);
7267 }
7268
7269 void visitVACopyInst(VACopyInst &I) override {
7270 if (F.getCallingConv() == CallingConv::Win64)
7271 return;
7272 unpoisonVAListTagForInst(I);
7273 }
7274};
7275
7276/// AMD64-specific implementation of VarArgHelper.
7277struct VarArgAMD64Helper : public VarArgHelperBase {
7278 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7279 // See a comment in visitCallBase for more details.
7280 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7281 static const unsigned AMD64FpEndOffsetSSE = 176;
7282 // If SSE is disabled, fp_offset in va_list is zero.
7283 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7284
7285 unsigned AMD64FpEndOffset;
7286 AllocaInst *VAArgTLSCopy = nullptr;
7287 AllocaInst *VAArgTLSOriginCopy = nullptr;
7288 Value *VAArgOverflowSize = nullptr;
7289
7290 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7291
7292 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7293 MemorySanitizerVisitor &MSV)
7294 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7295 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7296 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7297 if (Attr.isStringAttribute() &&
7298 (Attr.getKindAsString() == "target-features")) {
7299 if (Attr.getValueAsString().contains("-sse"))
7300 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7301 break;
7302 }
7303 }
7304 }
7305
7306 ArgKind classifyArgument(Value *arg) {
7307 // A very rough approximation of X86_64 argument classification rules.
7308 Type *T = arg->getType();
7309 if (T->isX86_FP80Ty())
7310 return AK_Memory;
7311 if (T->isFPOrFPVectorTy())
7312 return AK_FloatingPoint;
7313 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7314 return AK_GeneralPurpose;
7315 if (T->isPointerTy())
7316 return AK_GeneralPurpose;
7317 return AK_Memory;
7318 }
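// Rough examples of the classification above: i32, i64 and pointers go to
// the GP register area (offsets 0..48 of the va_arg TLS), float/double and
// FP vectors to the FP area (48..176 with SSE), while x86_fp80 and anything
// unrecognized spills to the overflow area past AMD64FpEndOffset.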
7319
7320 // For VarArg functions, store the argument shadow in an ABI-specific format
7321 // that corresponds to va_list layout.
7322 // We do this because Clang lowers va_arg in the frontend, and this pass
7323 // only sees the low level code that deals with va_list internals.
7324 // A much easier alternative (provided that Clang emits va_arg instructions)
7325 // would have been to associate each live instance of va_list with a copy of
7326 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7327 // order.
7328 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7329 unsigned GpOffset = 0;
7330 unsigned FpOffset = AMD64GpEndOffset;
7331 unsigned OverflowOffset = AMD64FpEndOffset;
7332 const DataLayout &DL = F.getDataLayout();
7333
7334 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7335 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7336 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7337 if (IsByVal) {
7338 // ByVal arguments always go to the overflow area.
7339 // Fixed arguments passed through the overflow area will be stepped
7340 // over by va_start, so don't count them towards the offset.
7341 if (IsFixed)
7342 continue;
7343 assert(A->getType()->isPointerTy());
7344 Type *RealTy = CB.getParamByValType(ArgNo);
7345 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7346 uint64_t AlignedSize = alignTo(ArgSize, 8);
7347 unsigned BaseOffset = OverflowOffset;
7348 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7349 Value *OriginBase = nullptr;
7350 if (MS.TrackOrigins)
7351 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7352 OverflowOffset += AlignedSize;
7353
7354 if (OverflowOffset > kParamTLSSize) {
7355 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7356 continue; // We have no space to copy shadow there.
7357 }
7358
7359 Value *ShadowPtr, *OriginPtr;
7360 std::tie(ShadowPtr, OriginPtr) =
7361 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7362 /*isStore*/ false);
7363 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7364 kShadowTLSAlignment, ArgSize);
7365 if (MS.TrackOrigins)
7366 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7367 kShadowTLSAlignment, ArgSize);
7368 } else {
7369 ArgKind AK = classifyArgument(A);
7370 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7371 AK = AK_Memory;
7372 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7373 AK = AK_Memory;
7374 Value *ShadowBase, *OriginBase = nullptr;
7375 switch (AK) {
7376 case AK_GeneralPurpose:
7377 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7378 if (MS.TrackOrigins)
7379 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7380 GpOffset += 8;
7381 assert(GpOffset <= kParamTLSSize);
7382 break;
7383 case AK_FloatingPoint:
7384 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7385 if (MS.TrackOrigins)
7386 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7387 FpOffset += 16;
7388 assert(FpOffset <= kParamTLSSize);
7389 break;
7390 case AK_Memory:
7391 if (IsFixed)
7392 continue;
7393 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7394 uint64_t AlignedSize = alignTo(ArgSize, 8);
7395 unsigned BaseOffset = OverflowOffset;
7396 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7397 if (MS.TrackOrigins) {
7398 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7399 }
7400 OverflowOffset += AlignedSize;
7401 if (OverflowOffset > kParamTLSSize) {
7402 // We have no space to copy shadow there.
7403 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7404 continue;
7405 }
7406 }
7407 // Take fixed arguments into account for GpOffset and FpOffset,
7408 // but don't actually store shadows for them.
7409 // TODO(glider): don't call get*PtrForVAArgument() for them.
7410 if (IsFixed)
7411 continue;
7412 Value *Shadow = MSV.getShadow(A);
7413 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7414 if (MS.TrackOrigins) {
7415 Value *Origin = MSV.getOrigin(A);
7416 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7417 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7418 std::max(kShadowTLSAlignment, kMinOriginAlignment));
7419 }
7420 }
7421 }
7422 Constant *OverflowSize =
7423 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7424 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7425 }
7426
7427 void finalizeInstrumentation() override {
7428 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7429 "finalizeInstrumentation called twice");
7430 if (!VAStartInstrumentationList.empty()) {
7431 // If there is a va_start in this function, make a backup copy of
7432 // va_arg_tls somewhere in the function entry block.
7433 IRBuilder<> IRB(MSV.FnPrologueEnd);
7434 VAArgOverflowSize =
7435 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7436 Value *CopySize = IRB.CreateAdd(
7437 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7438 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7439 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7440 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7441 CopySize, kShadowTLSAlignment, false);
7442
7443 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7444 Intrinsic::umin, CopySize,
7445 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7446 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7447 kShadowTLSAlignment, SrcSize);
7448 if (MS.TrackOrigins) {
7449 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7450 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7451 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7452 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7453 }
7454 }
7455
7456 // Instrument va_start.
7457 // Copy va_list shadow from the backup copy of the TLS contents.
7458 for (CallInst *OrigInst : VAStartInstrumentationList) {
7459 NextNodeIRBuilder IRB(OrigInst);
7460 Value *VAListTag = OrigInst->getArgOperand(0);
7461
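// For reference, the SysV x86_64 va_list element is
//   struct { i32 gp_offset; i32 fp_offset; ptr overflow_arg_area; ptr reg_save_area; };
// so the loads below read overflow_arg_area at byte offset 8 and
// reg_save_area at byte offset 16.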
7462 Value *RegSaveAreaPtrPtr =
7463 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
7464 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7465 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7466 const Align Alignment = Align(16);
7467 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7468 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7469 Alignment, /*isStore*/ true);
7470 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7471 AMD64FpEndOffset);
7472 if (MS.TrackOrigins)
7473 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7474 Alignment, AMD64FpEndOffset);
7475 Value *OverflowArgAreaPtrPtr =
7476 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
7477 Value *OverflowArgAreaPtr =
7478 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7479 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7480 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7481 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7482 Alignment, /*isStore*/ true);
7483 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7484 AMD64FpEndOffset);
7485 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7486 VAArgOverflowSize);
7487 if (MS.TrackOrigins) {
7488 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7489 AMD64FpEndOffset);
7490 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7491 VAArgOverflowSize);
7492 }
7493 }
7494 }
7495};
7496
7497/// AArch64-specific implementation of VarArgHelper.
7498struct VarArgAArch64Helper : public VarArgHelperBase {
7499 static const unsigned kAArch64GrArgSize = 64;
7500 static const unsigned kAArch64VrArgSize = 128;
7501
7502 static const unsigned AArch64GrBegOffset = 0;
7503 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7504 // Make VR space aligned to 16 bytes.
7505 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7506 static const unsigned AArch64VrEndOffset =
7507 AArch64VrBegOffset + kAArch64VrArgSize;
7508 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
7509
7510 AllocaInst *VAArgTLSCopy = nullptr;
7511 Value *VAArgOverflowSize = nullptr;
7512
7513 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7514
7515 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7516 MemorySanitizerVisitor &MSV)
7517 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7518
7519 // A very rough approximation of aarch64 argument classification rules.
7520 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7521 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7522 return {AK_GeneralPurpose, 1};
7523 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7524 return {AK_FloatingPoint, 1};
7525
7526 if (T->isArrayTy()) {
7527 auto R = classifyArgument(T->getArrayElementType());
7528 R.second *= T->getScalarType()->getArrayNumElements();
7529 return R;
7530 }
7531
7532 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7533 auto R = classifyArgument(FV->getScalarType());
7534 R.second *= FV->getNumElements();
7535 return R;
7536 }
7537
7538 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7539 return {AK_Memory, 0};
7540 }
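// For example, a <4 x float> argument classifies as {AK_FloatingPoint, 4}
// and will consume four 16-byte VR slots in visitCallBase below; an i128,
// which matches none of the cases, classifies as {AK_Memory, 0} and goes to
// the overflow area.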
7541
7542 // The instrumentation stores the argument shadow in a non ABI-specific
7543 // format because it does not know which arguments are named (since Clang,
7544 // as in the x86_64 case, lowers va_arg in the frontend and this pass only
7545 // sees the low-level code that deals with va_list internals).
7546 // The first eight GR registers are saved in the first 64 bytes of the
7547 // va_arg TLS array, followed by the first 8 FP/SIMD registers, and then
7548 // the remaining arguments.
7549 // Using constant offsets within the va_arg TLS array allows a fast copy
7550 // in finalizeInstrumentation().
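// Resulting va_arg TLS layout used by this helper (an editor's sketch derived
// from the constants above, not ABI documentation):
//
//   [  0,  64)  shadow of GP register args (up to 8 slots of 8 bytes)
//   [ 64, 192)  shadow of FP/SIMD register args (up to 8 slots of 16 bytes)
//   [192, ...)  shadow of arguments passed in memory (overflow area)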
7551 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7552 unsigned GrOffset = AArch64GrBegOffset;
7553 unsigned VrOffset = AArch64VrBegOffset;
7554 unsigned OverflowOffset = AArch64VAEndOffset;
7555
7556 const DataLayout &DL = F.getDataLayout();
7557 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7558 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7559 auto [AK, RegNum] = classifyArgument(A->getType());
7560 if (AK == AK_GeneralPurpose &&
7561 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
7562 AK = AK_Memory;
7563 if (AK == AK_FloatingPoint &&
7564 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
7565 AK = AK_Memory;
7566 Value *Base;
7567 switch (AK) {
7568 case AK_GeneralPurpose:
7569 Base = getShadowPtrForVAArgument(IRB, GrOffset);
7570 GrOffset += 8 * RegNum;
7571 break;
7572 case AK_FloatingPoint:
7573 Base = getShadowPtrForVAArgument(IRB, VrOffset);
7574 VrOffset += 16 * RegNum;
7575 break;
7576 case AK_Memory:
7577 // Don't count fixed arguments in the overflow area - va_start will
7578 // skip right over them.
7579 if (IsFixed)
7580 continue;
7581 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7582 uint64_t AlignedSize = alignTo(ArgSize, 8);
7583 unsigned BaseOffset = OverflowOffset;
7584 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
7585 OverflowOffset += AlignedSize;
7586 if (OverflowOffset > kParamTLSSize) {
7587 // We have no space to copy shadow there.
7588 CleanUnusedTLS(IRB, Base, BaseOffset);
7589 continue;
7590 }
7591 break;
7592 }
7593 // Count Gp/Vr fixed arguments to their respective offsets, but don't
7594 // bother to actually store a shadow.
7595 if (IsFixed)
7596 continue;
7597 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7598 }
7599 Constant *OverflowSize =
7600 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
7601 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7602 }
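// Worked example (editor's illustration, assuming a call such as
// printf("%d %f", 42, 3.14)): the fixed 'fmt' pointer only advances GrOffset
// from 0 to 8 without storing shadow; the variadic i32 '42' stores its shadow
// at TLS offset 8 (GrOffset becomes 16); the variadic double '3.14' stores
// its shadow at TLS offset 64 (VrOffset becomes 80); OverflowSize is 0.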
7603
7604 // Retrieve a va_list field of 'void*' size.
7605 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7606 Value *SaveAreaPtrPtr =
7607 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7608 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
7609 }
7610
7611 // Retrieve a va_list field of 'int' size.
7612 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7613 Value *SaveAreaPtr =
7614 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7615 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
7616 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
7617 }
7618
7619 void finalizeInstrumentation() override {
7620 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7621 "finalizeInstrumentation called twice");
7622 if (!VAStartInstrumentationList.empty()) {
7623 // If there is a va_start in this function, make a backup copy of
7624 // va_arg_tls somewhere in the function entry block.
7625 IRBuilder<> IRB(MSV.FnPrologueEnd);
7626 VAArgOverflowSize =
7627 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7628 Value *CopySize = IRB.CreateAdd(
7629 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
7630 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7631 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7632 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7633 CopySize, kShadowTLSAlignment, false);
7634
7635 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7636 Intrinsic::umin, CopySize,
7637 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7638 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7639 kShadowTLSAlignment, SrcSize);
7640 }
7641
7642 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
7643 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
7644
7645 // Instrument va_start, copy va_list shadow from the backup copy of
7646 // the TLS contents.
7647 for (CallInst *OrigInst : VAStartInstrumentationList) {
7648 NextNodeIRBuilder IRB(OrigInst);
7649
7650 Value *VAListTag = OrigInst->getArgOperand(0);
7651
7652 // The variadic ABI for AArch64 creates two areas to save the incoming
7653 // argument registers (one for the 64-bit general registers xn-x7 and
7654 // another for the 128-bit FP/SIMD registers vn-v7).
7655 // We then need to propagate the shadow arguments to both regions,
7656 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
7657 // The remaining arguments are saved in the shadow of 'va::__stack'.
7658 // One caveat is that only the unnamed (variadic) arguments need to be
7659 // propagated, whereas the call site instrumentation saves 'all' the
7660 // arguments. So, to copy the shadow values from the va_arg TLS array,
7661 // we need to adjust the offsets of both the GR and VR regions based on
7662 // the __{gr,vr}_offs values (since those are set according to the number
7663 // of incoming named arguments).
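// For reference, the AAPCS64 va_list layout that the getVAField64/getVAField32
// offsets below (0, 8, 16, 24, 28) index into (an editor's sketch of the
// standard definition, not taken from this file):
//
//   struct va_list {
//     void *__stack;    // offset  0: next argument passed on the stack
//     void *__gr_top;   // offset  8: end of the GP register save area
//     void *__vr_top;   // offset 16: end of the FP/SIMD register save area
//     int   __gr_offs;  // offset 24: negative offset from __gr_top
//     int   __vr_offs;  // offset 28: negative offset from __vr_top
//   };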
7664 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
7665
7666 // Read the stack pointer from the va_list.
7667 Value *StackSaveAreaPtr =
7668 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
7669
7670 // Read both the __gr_top and __gr_off and add them up.
7671 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
7672 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
7673
7674 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
7675 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
7676
7677 // Read both the __vr_top and __vr_off and add them up.
7678 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
7679 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
7680
7681 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
7682 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
7683
7684 // The instrumentation does not know how many named arguments are being
7685 // used and, at the call site, all the arguments were saved. Since __gr_offs
7686 // is defined as '0 - ((8 - named_gr) * 8)', the idea is to propagate only
7687 // the variadic arguments by ignoring the bytes of shadow from named ones.
7688 Value *GrRegSaveAreaShadowPtrOff =
7689 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
7690
7691 Value *GrRegSaveAreaShadowPtr =
7692 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7693 Align(8), /*isStore*/ true)
7694 .first;
7695
7696 Value *GrSrcPtr =
7697 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
7698 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
7699
7700 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
7701 GrCopySize);
7702
7703 // Again, but for FP/SIMD values.
7704 Value *VrRegSaveAreaShadowPtrOff =
7705 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
7706
7707 Value *VrRegSaveAreaShadowPtr =
7708 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7709 Align(8), /*isStore*/ true)
7710 .first;
7711
7712 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
7713 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
7714 IRB.getInt32(AArch64VrBegOffset)),
7715 VrRegSaveAreaShadowPtrOff);
7716 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
7717
7718 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
7719 VrCopySize);
7720
7721 // And finally for remaining arguments.
7722 Value *StackSaveAreaShadowPtr =
7723 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
7724 Align(16), /*isStore*/ true)
7725 .first;
7726
7727 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
7728 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
7729
7730 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
7731 Align(16), VAArgOverflowSize);
7732 }
7733 }
7734};
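// Editor's note - a minimal variadic function of the kind these helpers
// instrument (purely illustrative; the function name is hypothetical):
//
//   #include <stdarg.h>
//   long sum(int n, ...) {
//     va_list ap;
//     va_start(ap, n);   // finalizeInstrumentation() hooks va_start to copy
//     long s = 0;        // the saved shadow into the register save and
//     for (int i = 0; i < n; i++)  // overflow areas of 'ap'.
//       s += va_arg(ap, long);
//     va_end(ap);
//     return s;
//   }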
7735
7736/// PowerPC64-specific implementation of VarArgHelper.
7737struct VarArgPowerPC64Helper : public VarArgHelperBase {
7738 AllocaInst *VAArgTLSCopy = nullptr;
7739 Value *VAArgSize = nullptr;
7740
7741 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
7742 MemorySanitizerVisitor &MSV)
7743 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
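// Editor's note: on PPC64 (ELF ABI) va_list is a plain pointer into the
// parameter save area - hence VAListTagSize == 8 above and the single memcpy
// of the whole shadow in finalizeInstrumentation() below. Sketch of the
// assumption:
//
//   typedef char *va_list;   // 8 bytes; points at the next argument slot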
7744
7745 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7746 // For PowerPC, we need to deal with the alignment of stack arguments -
7747 // they are mostly aligned to 8 bytes, but vectors and i128 arrays
7748 // are aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
7749 // For that reason, we compute the current offset from the stack pointer
7750 // (which is always properly aligned) and the offset of the first vararg,
7751 // then subtract them.
7752 unsigned VAArgBase;
7753 Triple TargetTriple(F.getParent()->getTargetTriple());
7754 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
7755 // and 32 bytes for ABIv2. This is usually determined by target
7756 // endianness, but in theory could be overridden by function attribute.
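// Editor's sketch of where 32 and 48 come from (assumed from the 64-bit
// PowerPC ELF ABI documents): the ELFv2 stack frame header is back chain (0),
// CR save (8), LR save (16), TOC save (24), so the parameter save area starts
// at 32; ELFv1 additionally reserves compiler and linker doublewords before
// the TOC save, pushing the parameter save area to 48.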
7757 if (TargetTriple.isPPC64ELFv2ABI())
7758 VAArgBase = 32;
7759 else
7760 VAArgBase = 48;
7761 unsigned VAArgOffset = VAArgBase;
7762 const DataLayout &DL = F.getDataLayout();
7763 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7764 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7765 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7766 if (IsByVal) {
7767 assert(A->getType()->isPointerTy());
7768 Type *RealTy = CB.getParamByValType(ArgNo);
7769 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7770 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
7771 if (ArgAlign < 8)
7772 ArgAlign = Align(8);
7773 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7774 if (!IsFixed) {
7775 Value *Base =
7776 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7777 if (Base) {
7778 Value *AShadowPtr, *AOriginPtr;
7779 std::tie(AShadowPtr, AOriginPtr) =
7780 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7781 kShadowTLSAlignment, /*isStore*/ false);
7782
7783 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7784 kShadowTLSAlignment, ArgSize);
7785 }
7786 }
7787 VAArgOffset += alignTo(ArgSize, Align(8));
7788 } else {
7789 Value *Base;
7790 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7791 Align ArgAlign = Align(8);
7792 if (A->getType()->isArrayTy()) {
7793 // Arrays are aligned to element size, except for long double
7794 // arrays, which are aligned to 8 bytes.
7795 Type *ElementTy = A->getType()->getArrayElementType();
7796 if (!ElementTy->isPPC_FP128Ty())
7797 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7798 } else if (A->getType()->isVectorTy()) {
7799 // Vectors are naturally aligned.
7800 ArgAlign = Align(ArgSize);
7801 }
7802 if (ArgAlign < 8)
7803 ArgAlign = Align(8);
7804 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7805 if (DL.isBigEndian()) {
7806 // Adjust the shadow for arguments with size < 8 to match the
7807 // placement of bits on a big-endian system.
7808 if (ArgSize < 8)
7809 VAArgOffset += (8 - ArgSize);
7810 }
7811 if (!IsFixed) {
7812 Base =
7813 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7814 if (Base)
7815 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7816 }
7817 VAArgOffset += ArgSize;
7818 VAArgOffset = alignTo(VAArgOffset, Align(8));
7819 }
7820 if (IsFixed)
7821 VAArgBase = VAArgOffset;
7822 }
7823
7824 Constant *TotalVAArgSize =
7825 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
7826 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
7827 // new class member, i.e. it holds the total size of all varargs.
7828 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
7829 }
7830
7831 void finalizeInstrumentation() override {
7832 assert(!VAArgSize && !VAArgTLSCopy &&
7833 "finalizeInstrumentation called twice");
7834 IRBuilder<> IRB(MSV.FnPrologueEnd);
7835 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7836 Value *CopySize = VAArgSize;
7837
7838 if (!VAStartInstrumentationList.empty()) {
7839 // If there is a va_start in this function, make a backup copy of
7840 // va_arg_tls somewhere in the function entry block.
7841
7842 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7843 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7844 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7845 CopySize, kShadowTLSAlignment, false);
7846
7847 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7848 Intrinsic::umin, CopySize,
7849 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
7850 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7851 kShadowTLSAlignment, SrcSize);
7852 }
7853
7854 // Instrument va_start.
7855 // Copy va_list shadow from the backup copy of the TLS contents.
7856 for (CallInst *OrigInst : VAStartInstrumentationList) {
7857 NextNodeIRBuilder IRB(OrigInst);
7858 Value *VAListTag = OrigInst->getArgOperand(0);
7859 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
7860
7861 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
7862
7863 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7864 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7865 const DataLayout &DL = F.getDataLayout();
7866 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7867 const Align Alignment = Align(IntptrSize);
7868 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7869 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7870 Alignment, /*isStore*/ true);
7871 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7872 CopySize);
7873 }
7874 }
7875};
7876
7877/// PowerPC32-specific implementation of VarArgHelper.
7878struct VarArgPowerPC32Helper : public VarArgHelperBase {
7879 AllocaInst *VAArgTLSCopy = nullptr;
7880 Value *VAArgSize = nullptr;
7881
7882 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
7883 MemorySanitizerVisitor &MSV)
7884 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
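// Editor's sketch of the SVR4 PPC32 va_list tag that the offsets used in
// finalizeInstrumentation() below (4 for overflow_arg_area, 8 for
// reg_save_area) refer to (an assumption based on the standard ABI):
//
//   struct __va_list_tag {
//     unsigned char gpr;         // offset 0: number of GP registers consumed
//     unsigned char fpr;         // offset 1: number of FP registers consumed
//     unsigned short reserved;   // offset 2
//     void *overflow_arg_area;   // offset 4
//     void *reg_save_area;       // offset 8
//   };                           // 12 bytes -> VAListTagSize == 12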
7885
7886 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7887 unsigned VAArgBase;
7888 // Parameter save area is 8 bytes from frame pointer in PPC32
7889 VAArgBase = 8;
7890 unsigned VAArgOffset = VAArgBase;
7891 const DataLayout &DL = F.getDataLayout();
7892 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
7893 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7894 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7895 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7896 if (IsByVal) {
7897 assert(A->getType()->isPointerTy());
7898 Type *RealTy = CB.getParamByValType(ArgNo);
7899 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7900 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
7901 if (ArgAlign < IntptrSize)
7902 ArgAlign = Align(IntptrSize);
7903 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7904 if (!IsFixed) {
7905 Value *Base =
7906 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7907 if (Base) {
7908 Value *AShadowPtr, *AOriginPtr;
7909 std::tie(AShadowPtr, AOriginPtr) =
7910 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7911 kShadowTLSAlignment, /*isStore*/ false);
7912
7913 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7914 kShadowTLSAlignment, ArgSize);
7915 }
7916 }
7917 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
7918 } else {
7919 Value *Base;
7920 Type *ArgTy = A->getType();
7921
7922 // On PPC32, floating-point variable arguments are stored in a separate
7923 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
7924 // them, as they will be found when checking the call arguments.
7925 if (!ArgTy->isFloatingPointTy()) {
7926 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
7927 Align ArgAlign = Align(IntptrSize);
7928 if (ArgTy->isArrayTy()) {
7929 // Arrays are aligned to element size, except for long double
7930 // arrays, which are aligned to 8 bytes.
7931 Type *ElementTy = ArgTy->getArrayElementType();
7932 if (!ElementTy->isPPC_FP128Ty())
7933 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7934 } else if (ArgTy->isVectorTy()) {
7935 // Vectors are naturally aligned.
7936 ArgAlign = Align(ArgSize);
7937 }
7938 if (ArgAlign < IntptrSize)
7939 ArgAlign = Align(IntptrSize);
7940 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7941 if (DL.isBigEndian()) {
7942 // Adjust the shadow for arguments with size < IntptrSize to match
7943 // the placement of bits on a big-endian system.
7944 if (ArgSize < IntptrSize)
7945 VAArgOffset += (IntptrSize - ArgSize);
7946 }
7947 if (!IsFixed) {
7948 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
7949 ArgSize);
7950 if (Base)
7951 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
7952 kShadowTLSAlignment);
7953 }
7954 VAArgOffset += ArgSize;
7955 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
7956 }
7957 }
7958 }
7959
7960 Constant *TotalVAArgSize =
7961 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
7962 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
7963 // new class member, i.e. it holds the total size of all varargs.
7964 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
7965 }
7966
7967 void finalizeInstrumentation() override {
7968 assert(!VAArgSize && !VAArgTLSCopy &&
7969 "finalizeInstrumentation called twice");
7970 IRBuilder<> IRB(MSV.FnPrologueEnd);
7971 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
7972 Value *CopySize = VAArgSize;
7973
7974 if (!VAStartInstrumentationList.empty()) {
7975 // If there is a va_start in this function, make a backup copy of
7976 // va_arg_tls somewhere in the function entry block.
7977
7978 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7979 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7980 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7981 CopySize, kShadowTLSAlignment, false);
7982
7983 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7984 Intrinsic::umin, CopySize,
7985 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7986 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7987 kShadowTLSAlignment, SrcSize);
7988 }
7989
7990 // Instrument va_start.
7991 // Copy va_list shadow from the backup copy of the TLS contents.
7992 for (CallInst *OrigInst : VAStartInstrumentationList) {
7993 NextNodeIRBuilder IRB(OrigInst);
7994 Value *VAListTag = OrigInst->getArgOperand(0);
7995 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
7996 Value *RegSaveAreaSize = CopySize;
7997
7998 // In PPC32 va_list_tag is a struct
7999 RegSaveAreaPtrPtr =
8000 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8001
8002 // On PPC 32 reg_save_area can only hold 32 bytes of data
8003 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8004 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8005
8006 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8007 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8008
8009 const DataLayout &DL = F.getDataLayout();
8010 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8011 const Align Alignment = Align(IntptrSize);
8012
8013 { // Copy reg save area
8014 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8015 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8016 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8017 Alignment, /*isStore*/ true);
8018 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8019 Alignment, RegSaveAreaSize);
8020
8021 RegSaveAreaShadowPtr =
8022 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8023 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8024 ConstantInt::get(MS.IntptrTy, 32));
8025 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8026 // We fill fp shadow with zeroes as uninitialized fp args should have
8027 // been found during call base check
8028 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8029 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8030 }
8031
8032 { // Copy overflow area
8033 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8034 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8035
8036 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8037 OverflowAreaPtrPtr =
8038 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8039 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8040
8041 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8042
8043 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8044 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8045 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8046 Alignment, /*isStore*/ true);
8047
8048 Value *OverflowVAArgTLSCopyPtr =
8049 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8050 OverflowVAArgTLSCopyPtr =
8051 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8052
8053 OverflowVAArgTLSCopyPtr =
8054 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8055 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8056 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8057 }
8058 }
8059 }
8060};
8061
8062/// SystemZ-specific implementation of VarArgHelper.
8063struct VarArgSystemZHelper : public VarArgHelperBase {
8064 static const unsigned SystemZGpOffset = 16;
8065 static const unsigned SystemZGpEndOffset = 56;
8066 static const unsigned SystemZFpOffset = 128;
8067 static const unsigned SystemZFpEndOffset = 160;
8068 static const unsigned SystemZMaxVrArgs = 8;
8069 static const unsigned SystemZRegSaveAreaSize = 160;
8070 static const unsigned SystemZOverflowOffset = 160;
8071 static const unsigned SystemZVAListTagSize = 32;
8072 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8073 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
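// Editor's sketch of what the constants above describe (assumed from the
// s390x ELF ABI, shown only for orientation):
//
//   struct __va_list_tag {        // va_list on SystemZ
//     long __gpr;                 // offset  0
//     long __fpr;                 // offset  8
//     void *__overflow_arg_area;  // offset 16 (SystemZOverflowArgAreaPtrOffset)
//     void *__reg_save_area;      // offset 24 (SystemZRegSaveAreaPtrOffset)
//   };
//
// Within the 160-byte register save area, the GPR argument slots (r2-r6)
// occupy bytes [16, 56) and the FPR argument slots (f0, f2, f4, f6) occupy
// bytes [128, 160), matching the SystemZGp*/SystemZFp* offsets above.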
8074
8075 bool IsSoftFloatABI;
8076 AllocaInst *VAArgTLSCopy = nullptr;
8077 AllocaInst *VAArgTLSOriginCopy = nullptr;
8078 Value *VAArgOverflowSize = nullptr;
8079
8080 enum class ArgKind {
8081 GeneralPurpose,
8082 FloatingPoint,
8083 Vector,
8084 Memory,
8085 Indirect,
8086 };
8087
8088 enum class ShadowExtension { None, Zero, Sign };
8089
8090 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8091 MemorySanitizerVisitor &MSV)
8092 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8093 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8094
8095 ArgKind classifyArgument(Type *T) {
8096 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8097 // only a few possibilities of what it can be. In particular, enums, single
8098 // element structs and large types have already been taken care of.
8099
8100 // Some i128 and fp128 arguments are converted to pointers only in the
8101 // back end.
8102 if (T->isIntegerTy(128) || T->isFP128Ty())
8103 return ArgKind::Indirect;
8104 if (T->isFloatingPointTy())
8105 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8106 if (T->isIntegerTy() || T->isPointerTy())
8107 return ArgKind::GeneralPurpose;
8108 if (T->isVectorTy())
8109 return ArgKind::Vector;
8110 return ArgKind::Memory;
8111 }
8112
8113 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8114 // ABI says: "One of the simple integer types no more than 64 bits wide.
8115 // ... If such an argument is shorter than 64 bits, replace it by a full
8116 // 64-bit integer representing the same number, using sign or zero
8117 // extension". Shadow for an integer argument has the same type as the
8118 // argument itself, so it can be sign or zero extended as well.
8119 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8120 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8121 if (ZExt) {
8122 assert(!SExt);
8123 return ShadowExtension::Zero;
8124 }
8125 if (SExt) {
8126 assert(!ZExt);
8127 return ShadowExtension::Sign;
8128 }
8129 return ShadowExtension::None;
8130 }
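// For example (editor's illustration): an i32 vararg passed 'signext' yields
// ShadowExtension::Sign, so visitCallBase() sign-extends its 32-bit shadow to
// i64 via CreateShadowCast() before storing it, mirroring how the ABI widens
// the value itself to a full 64-bit slot.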
8131
8132 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8133 unsigned GpOffset = SystemZGpOffset;
8134 unsigned FpOffset = SystemZFpOffset;
8135 unsigned VrIndex = 0;
8136 unsigned OverflowOffset = SystemZOverflowOffset;
8137 const DataLayout &DL = F.getDataLayout();
8138 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8139 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8140 // SystemZABIInfo does not produce ByVal parameters.
8141 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8142 Type *T = A->getType();
8143 ArgKind AK = classifyArgument(T);
8144 if (AK == ArgKind::Indirect) {
8145 T = MS.PtrTy;
8146 AK = ArgKind::GeneralPurpose;
8147 }
8148 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8149 AK = ArgKind::Memory;
8150 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8151 AK = ArgKind::Memory;
8152 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8153 AK = ArgKind::Memory;
8154 Value *ShadowBase = nullptr;
8155 Value *OriginBase = nullptr;
8156 ShadowExtension SE = ShadowExtension::None;
8157 switch (AK) {
8158 case ArgKind::GeneralPurpose: {
8159 // Always keep track of GpOffset, but store shadow only for varargs.
8160 uint64_t ArgSize = 8;
8161 if (GpOffset + ArgSize <= kParamTLSSize) {
8162 if (!IsFixed) {
8163 SE = getShadowExtension(CB, ArgNo);
8164 uint64_t GapSize = 0;
8165 if (SE == ShadowExtension::None) {
8166 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8167 assert(ArgAllocSize <= ArgSize);
8168 GapSize = ArgSize - ArgAllocSize;
8169 }
8170 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8171 if (MS.TrackOrigins)
8172 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8173 }
8174 GpOffset += ArgSize;
8175 } else {
8176 GpOffset = kParamTLSSize;
8177 }
8178 break;
8179 }
8180 case ArgKind::FloatingPoint: {
8181 // Always keep track of FpOffset, but store shadow only for varargs.
8182 uint64_t ArgSize = 8;
8183 if (FpOffset + ArgSize <= kParamTLSSize) {
8184 if (!IsFixed) {
8185 // PoP says: "A short floating-point datum requires only the
8186 // left-most 32 bit positions of a floating-point register".
8187 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8188 // don't extend shadow and don't mind the gap.
8189 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8190 if (MS.TrackOrigins)
8191 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8192 }
8193 FpOffset += ArgSize;
8194 } else {
8195 FpOffset = kParamTLSSize;
8196 }
8197 break;
8198 }
8199 case ArgKind::Vector: {
8200 // Keep track of VrIndex. No need to store shadow, since vector varargs
8201 // go through AK_Memory.
8202 assert(IsFixed);
8203 VrIndex++;
8204 break;
8205 }
8206 case ArgKind::Memory: {
8207 // Keep track of OverflowOffset and store shadow only for varargs.
8208 // Ignore fixed args, since we need to copy only the vararg portion of
8209 // the overflow area shadow.
8210 if (!IsFixed) {
8211 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8212 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8213 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8214 SE = getShadowExtension(CB, ArgNo);
8215 uint64_t GapSize =
8216 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8217 ShadowBase =
8218 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8219 if (MS.TrackOrigins)
8220 OriginBase =
8221 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8222 OverflowOffset += ArgSize;
8223 } else {
8224 OverflowOffset = kParamTLSSize;
8225 }
8226 }
8227 break;
8228 }
8229 case ArgKind::Indirect:
8230 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8231 }
8232 if (ShadowBase == nullptr)
8233 continue;
8234 Value *Shadow = MSV.getShadow(A);
8235 if (SE != ShadowExtension::None)
8236 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8237 /*Signed*/ SE == ShadowExtension::Sign);
8238 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8239 IRB.CreateStore(Shadow, ShadowBase);
8240 if (MS.TrackOrigins) {
8241 Value *Origin = MSV.getOrigin(A);
8242 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8243 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8244 kMinOriginAlignment);
8245 }
8246 }
8247 Constant *OverflowSize = ConstantInt::get(
8248 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8249 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8250 }
8251
8252 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8253 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8254 IRB.CreateAdd(
8255 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8256 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8257 MS.PtrTy);
8258 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8259 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8260 const Align Alignment = Align(8);
8261 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8262 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8263 /*isStore*/ true);
8264 // TODO(iii): copy only fragments filled by visitCallBase()
8265 // TODO(iii): support packed-stack && !use-soft-float
8266 // For use-soft-float functions, it is enough to copy just the GPRs.
8267 unsigned RegSaveAreaSize =
8268 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8269 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8270 RegSaveAreaSize);
8271 if (MS.TrackOrigins)
8272 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8273 Alignment, RegSaveAreaSize);
8274 }
8275
8276 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8277 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8278 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8279 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8280 IRB.CreateAdd(
8281 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8282 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8283 MS.PtrTy);
8284 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8285 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8286 const Align Alignment = Align(8);
8287 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8288 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8289 Alignment, /*isStore*/ true);
8290 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8291 SystemZOverflowOffset);
8292 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8293 VAArgOverflowSize);
8294 if (MS.TrackOrigins) {
8295 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8296 SystemZOverflowOffset);
8297 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8298 VAArgOverflowSize);
8299 }
8300 }
8301
8302 void finalizeInstrumentation() override {
8303 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8304 "finalizeInstrumentation called twice");
8305 if (!VAStartInstrumentationList.empty()) {
8306 // If there is a va_start in this function, make a backup copy of
8307 // va_arg_tls somewhere in the function entry block.
8308 IRBuilder<> IRB(MSV.FnPrologueEnd);
8309 VAArgOverflowSize =
8310 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8311 Value *CopySize =
8312 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8313 VAArgOverflowSize);
8314 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8315 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8316 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8317 CopySize, kShadowTLSAlignment, false);
8318
8319 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8320 Intrinsic::umin, CopySize,
8321 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8322 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8323 kShadowTLSAlignment, SrcSize);
8324 if (MS.TrackOrigins) {
8325 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8326 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8327 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8328 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8329 }
8330 }
8331
8332 // Instrument va_start.
8333 // Copy va_list shadow from the backup copy of the TLS contents.
8334 for (CallInst *OrigInst : VAStartInstrumentationList) {
8335 NextNodeIRBuilder IRB(OrigInst);
8336 Value *VAListTag = OrigInst->getArgOperand(0);
8337 copyRegSaveArea(IRB, VAListTag);
8338 copyOverflowArea(IRB, VAListTag);
8339 }
8340 }
8341};
8342
8343/// i386-specific implementation of VarArgHelper.
8344struct VarArgI386Helper : public VarArgHelperBase {
8345 AllocaInst *VAArgTLSCopy = nullptr;
8346 Value *VAArgSize = nullptr;
8347
8348 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8349 MemorySanitizerVisitor &MSV)
8350 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
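// Editor's note: with the i386 calling conventions all variadic arguments are
// passed on the stack and va_list is a plain pointer into that argument area
// (hence VAListTagSize == 4), so a single sequential shadow layout suffices.
// Sketch of the assumption:
//
//   typedef char *va_list;   // 4 bytes; points at the next stack argument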
8351
8352 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8353 const DataLayout &DL = F.getDataLayout();
8354 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8355 unsigned VAArgOffset = 0;
8356 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8357 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8358 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8359 if (IsByVal) {
8360 assert(A->getType()->isPointerTy());
8361 Type *RealTy = CB.getParamByValType(ArgNo);
8362 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8363 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8364 if (ArgAlign < IntptrSize)
8365 ArgAlign = Align(IntptrSize);
8366 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8367 if (!IsFixed) {
8368 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8369 if (Base) {
8370 Value *AShadowPtr, *AOriginPtr;
8371 std::tie(AShadowPtr, AOriginPtr) =
8372 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8373 kShadowTLSAlignment, /*isStore*/ false);
8374
8375 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8376 kShadowTLSAlignment, ArgSize);
8377 }
8378 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8379 }
8380 } else {
8381 Value *Base;
8382 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8383 Align ArgAlign = Align(IntptrSize);
8384 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8385 if (DL.isBigEndian()) {
8386 // Adjust the shadow for arguments with size < IntptrSize to match
8387 // the placement of bits on a big-endian system.
8388 if (ArgSize < IntptrSize)
8389 VAArgOffset += (IntptrSize - ArgSize);
8390 }
8391 if (!IsFixed) {
8392 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8393 if (Base)
8394 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8395 VAArgOffset += ArgSize;
8396 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8397 }
8398 }
8399 }
8400
8401 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8402 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8403 // new class member, i.e. it holds the total size of all varargs.
8404 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8405 }
8406
8407 void finalizeInstrumentation() override {
8408 assert(!VAArgSize && !VAArgTLSCopy &&
8409 "finalizeInstrumentation called twice");
8410 IRBuilder<> IRB(MSV.FnPrologueEnd);
8411 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8412 Value *CopySize = VAArgSize;
8413
8414 if (!VAStartInstrumentationList.empty()) {
8415 // If there is a va_start in this function, make a backup copy of
8416 // va_arg_tls somewhere in the function entry block.
8417 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8418 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8419 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8420 CopySize, kShadowTLSAlignment, false);
8421
8422 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8423 Intrinsic::umin, CopySize,
8424 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8425 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8426 kShadowTLSAlignment, SrcSize);
8427 }
8428
8429 // Instrument va_start.
8430 // Copy va_list shadow from the backup copy of the TLS contents.
8431 for (CallInst *OrigInst : VAStartInstrumentationList) {
8432 NextNodeIRBuilder IRB(OrigInst);
8433 Value *VAListTag = OrigInst->getArgOperand(0);
8434 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8435 Value *RegSaveAreaPtrPtr =
8436 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8437 PointerType::get(*MS.C, 0));
8438 Value *RegSaveAreaPtr =
8439 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8440 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8441 const DataLayout &DL = F.getDataLayout();
8442 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8443 const Align Alignment = Align(IntptrSize);
8444 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8445 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8446 Alignment, /*isStore*/ true);
8447 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8448 CopySize);
8449 }
8450 }
8451};
8452
8453/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8454/// LoongArch64.
8455struct VarArgGenericHelper : public VarArgHelperBase {
8456 AllocaInst *VAArgTLSCopy = nullptr;
8457 Value *VAArgSize = nullptr;
8458
8459 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8460 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8461 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8462
8463 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8464 unsigned VAArgOffset = 0;
8465 const DataLayout &DL = F.getDataLayout();
8466 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8467 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8468 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8469 if (IsFixed)
8470 continue;
8471 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8472 if (DL.isBigEndian()) {
8473 // Adjust the shadow for arguments with size < IntptrSize to match the
8474 // placement of bits on a big-endian system.
8475 if (ArgSize < IntptrSize)
8476 VAArgOffset += (IntptrSize - ArgSize);
8477 }
8478 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8479 VAArgOffset += ArgSize;
8480 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8481 if (!Base)
8482 continue;
8483 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8484 }
8485
8486 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8487 // Here we use VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating a
8488 // new class member, i.e. it holds the total size of all varargs.
8489 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8490 }
8491
8492 void finalizeInstrumentation() override {
8493 assert(!VAArgSize && !VAArgTLSCopy &&
8494 "finalizeInstrumentation called twice");
8495 IRBuilder<> IRB(MSV.FnPrologueEnd);
8496 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8497 Value *CopySize = VAArgSize;
8498
8499 if (!VAStartInstrumentationList.empty()) {
8500 // If there is a va_start in this function, make a backup copy of
8501 // va_arg_tls somewhere in the function entry block.
8502 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8503 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8504 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8505 CopySize, kShadowTLSAlignment, false);
8506
8507 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8508 Intrinsic::umin, CopySize,
8509 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8510 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8511 kShadowTLSAlignment, SrcSize);
8512 }
8513
8514 // Instrument va_start.
8515 // Copy va_list shadow from the backup copy of the TLS contents.
8516 for (CallInst *OrigInst : VAStartInstrumentationList) {
8517 NextNodeIRBuilder IRB(OrigInst);
8518 Value *VAListTag = OrigInst->getArgOperand(0);
8519 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8520 Value *RegSaveAreaPtrPtr =
8521 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8522 PointerType::get(*MS.C, 0));
8523 Value *RegSaveAreaPtr =
8524 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8525 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8526 const DataLayout &DL = F.getDataLayout();
8527 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8528 const Align Alignment = Align(IntptrSize);
8529 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8530 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8531 Alignment, /*isStore*/ true);
8532 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8533 CopySize);
8534 }
8535 }
8536};
8537
8538 // ARM32, LoongArch64, MIPS and RISCV share the same calling conventions
8539 // regarding varargs.
8540using VarArgARM32Helper = VarArgGenericHelper;
8541using VarArgRISCVHelper = VarArgGenericHelper;
8542using VarArgMIPSHelper = VarArgGenericHelper;
8543using VarArgLoongArch64Helper = VarArgGenericHelper;
8544
8545/// A no-op implementation of VarArgHelper.
8546struct VarArgNoOpHelper : public VarArgHelper {
8547 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8548 MemorySanitizerVisitor &MSV) {}
8549
8550 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8551
8552 void visitVAStartInst(VAStartInst &I) override {}
8553
8554 void visitVACopyInst(VACopyInst &I) override {}
8555
8556 void finalizeInstrumentation() override {}
8557};
8558
8559} // end anonymous namespace
8560
8561static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
8562 MemorySanitizerVisitor &Visitor) {
8563 // VarArg handling is implemented only for the targets handled below; on
8564 // any other target the no-op helper is used and false positives are possible.
8565 Triple TargetTriple(Func.getParent()->getTargetTriple());
8566
8567 if (TargetTriple.getArch() == Triple::x86)
8568 return new VarArgI386Helper(Func, Msan, Visitor);
8569
8570 if (TargetTriple.getArch() == Triple::x86_64)
8571 return new VarArgAMD64Helper(Func, Msan, Visitor);
8572
8573 if (TargetTriple.isARM())
8574 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8575
8576 if (TargetTriple.isAArch64())
8577 return new VarArgAArch64Helper(Func, Msan, Visitor);
8578
8579 if (TargetTriple.isSystemZ())
8580 return new VarArgSystemZHelper(Func, Msan, Visitor);
8581
8582 // On PowerPC32 VAListTag is a struct
8583 // {char, char, i16 padding, char *, char *}
8584 if (TargetTriple.isPPC32())
8585 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
8586
8587 if (TargetTriple.isPPC64())
8588 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
8589
8590 if (TargetTriple.isRISCV32())
8591 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8592
8593 if (TargetTriple.isRISCV64())
8594 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8595
8596 if (TargetTriple.isMIPS32())
8597 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8598
8599 if (TargetTriple.isMIPS64())
8600 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8601
8602 if (TargetTriple.isLoongArch64())
8603 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
8604 /*VAListTagSize=*/8);
8605
8606 return new VarArgNoOpHelper(Func, Msan, Visitor);
8607}
8608
8609bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
8610 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
8611 return false;
8612
8613 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
8614 return false;
8615
8616 MemorySanitizerVisitor Visitor(F, *this, TLI);
8617
8618 // Clear out memory attributes.
8619 AttributeMask B;
8620 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
8621 F.removeFnAttrs(B);
8622
8623 return Visitor.runOnFunction();
8624}
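// Editor's note on exercising this pass (for reference; the flags are from
// the MemorySanitizer user documentation, and 'msan' is the pass name
// registered with the new pass manager):
//
//   clang -fsanitize=memory -fsanitize-memory-track-origins=2 -g test.c
//   opt -passes=msan test.ll -S -o test.msan.ll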
#define Success
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
constexpr LLT S1
This file implements a class to represent arbitrary precision integral constant values and operations...
static bool isStore(int Opcode)
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
static cl::opt< ITMode > IT(cl::desc("IT block support"), cl::Hidden, cl::init(DefaultIT), cl::values(clEnumValN(DefaultIT, "arm-default-it", "Generate any type of IT block"), clEnumValN(RestrictedIT, "arm-restrict-it", "Disallow complex IT blocks")))
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClWithComdat("asan-with-comdat", cl::desc("Place ASan constructors in comdat sections"), cl::Hidden, cl::init(true))
VarLocInsertPt getNextNode(const DbgRecord *DVR)
Atomic ordering constants.
This file contains the simple types necessary to represent the attributes associated with functions a...
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< StatepointGC > D("statepoint-example", "an example strategy for statepoint")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
Analysis containing CSE Info
Definition CSEInfo.cpp:27
This file contains the declarations for the subclasses of Constant, which represent the different fla...
const MemoryMapParams Linux_LoongArch64_MemoryMapParams
const MemoryMapParams Linux_X86_64_MemoryMapParams
static cl::opt< int > ClTrackOrigins("dfsan-track-origins", cl::desc("Track origins of labels"), cl::Hidden, cl::init(0))
static AtomicOrdering addReleaseOrdering(AtomicOrdering AO)
static AtomicOrdering addAcquireOrdering(AtomicOrdering AO)
const MemoryMapParams Linux_AArch64_MemoryMapParams
static bool isAMustTailRetVal(Value *RetVal)
This file provides an implementation of debug counters.
#define DEBUG_COUNTER(VARNAME, COUNTERNAME, DESC)
This file defines the DenseMap class.
This file builds on the ADT/GraphTraits.h file to build generic depth first graph iterator.
@ Default
static bool runOnFunction(Function &F, bool PostInlining)
This is the interface for a simple mod/ref and alias analysis over globals.
static size_t TypeSizeToSizeIndex(uint32_t TypeSize)
#define op(i)
Hexagon Common GEP
#define _
Hexagon Vector Combine
Module.h This file contains the declarations for the Module class.
static LVOptions Options
Definition LVOptions.cpp:25
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
Machine Check Debug Module
static const PlatformMemoryMapParams Linux_S390_MemoryMapParams
static const Align kMinOriginAlignment
static cl::opt< uint64_t > ClShadowBase("msan-shadow-base", cl::desc("Define custom MSan ShadowBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClPoisonUndef("msan-poison-undef", cl::desc("Poison fully undef temporary values. " "Partially undefined constant vectors " "are unaffected by this flag (see " "-msan-poison-undef-vectors)."), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_X86_MemoryMapParams
static cl::opt< uint64_t > ClOriginBase("msan-origin-base", cl::desc("Define custom MSan OriginBase"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClCheckConstantShadow("msan-check-constant-shadow", cl::desc("Insert checks for constant shadow values"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams
static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams
static const unsigned kOriginSize
static cl::opt< bool > ClWithComdat("msan-with-comdat", cl::desc("Place MSan constructors in comdat sections"), cl::Hidden, cl::init(false))
static cl::opt< int > ClTrackOrigins("msan-track-origins", cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden, cl::init(0))
Track origins of uninitialized values.
static cl::opt< int > ClInstrumentationWithCallThreshold("msan-instrumentation-with-call-threshold", cl::desc("If the function being instrumented requires more than " "this number of checks and origin stores, use callbacks instead of " "inline checks (-1 means never use callbacks)."), cl::Hidden, cl::init(3500))
static cl::opt< int > ClPoisonStackPattern("msan-poison-stack-pattern", cl::desc("poison uninitialized stack variables with the given pattern"), cl::Hidden, cl::init(0xff))
static const Align kShadowTLSAlignment
static cl::opt< bool > ClHandleICmpExact("msan-handle-icmp-exact", cl::desc("exact handling of relational integer ICmp"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams
static cl::opt< bool > ClDumpStrictInstructions("msan-dump-strict-instructions", cl::desc("print out instructions with default strict semantics i.e.," "check that all the inputs are fully initialized, and mark " "the output as fully initialized. These semantics are applied " "to instructions that could not be handled explicitly nor " "heuristically."), cl::Hidden, cl::init(false))
static Constant * getOrInsertGlobal(Module &M, StringRef Name, Type *Ty)
static cl::opt< bool > ClPreciseDisjointOr("msan-precise-disjoint-or", cl::desc("Precisely poison disjoint OR. If false (legacy behavior), " "disjointedness is ignored (i.e., 1|1 is initialized)."), cl::Hidden, cl::init(false))
static const MemoryMapParams Linux_S390X_MemoryMapParams
static cl::opt< bool > ClPoisonStack("msan-poison-stack", cl::desc("poison uninitialized stack variables"), cl::Hidden, cl::init(true))
static const MemoryMapParams Linux_I386_MemoryMapParams
const char kMsanInitName[]
static cl::opt< bool > ClPoisonUndefVectors("msan-poison-undef-vectors", cl::desc("Precisely poison partially undefined constant vectors. " "If false (legacy behavior), the entire vector is " "considered fully initialized, which may lead to false " "negatives. Fully undefined constant vectors are " "unaffected by this flag (see -msan-poison-undef)."), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPrintStackNames("msan-print-stack-names", cl::desc("Print name of local stack variable"), cl::Hidden, cl::init(true))
static cl::opt< uint64_t > ClAndMask("msan-and-mask", cl::desc("Define custom MSan AndMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleLifetimeIntrinsics("msan-handle-lifetime-intrinsics", cl::desc("when possible, poison scoped variables at the beginning of the scope " "(slower, but more precise)"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClKeepGoing("msan-keep-going", cl::desc("keep going after reporting a UMR"), cl::Hidden, cl::init(false))
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams
static GlobalVariable * createPrivateConstGlobalForString(Module &M, StringRef Str)
Create a non-const global initialized with the given string.
static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClEagerChecks("msan-eager-checks", cl::desc("check arguments and return values at function call boundaries"), cl::Hidden, cl::init(false))
static cl::opt< int > ClDisambiguateWarning("msan-disambiguate-warning-threshold", cl::desc("Define threshold for number of checks per " "debug location to force origin update."), cl::Hidden, cl::init(3))
static VarArgHelper * CreateVarArgHelper(Function &Func, MemorySanitizer &Msan, MemorySanitizerVisitor &Visitor)
static const MemoryMapParams Linux_MIPS64_MemoryMapParams
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams
static cl::opt< uint64_t > ClXorMask("msan-xor-mask", cl::desc("Define custom MSan XorMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleAsmConservative("msan-handle-asm-conservative", cl::desc("conservative handling of inline assembly"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams
static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams
static const unsigned kParamTLSSize
static cl::opt< bool > ClHandleICmp("msan-handle-icmp", cl::desc("propagate shadow through ICmpEQ and ICmpNE"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClEnableKmsan("msan-kernel", cl::desc("Enable KernelMemorySanitizer instrumentation"), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStackWithCall("msan-poison-stack-with-call", cl::desc("poison uninitialized stack variables with a call"), cl::Hidden, cl::init(false))
static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams
static cl::opt< bool > ClDumpHeuristicInstructions("msan-dump-heuristic-instructions", cl::desc("Prints 'unknown' instructions that were handled heuristically. " "Use -msan-dump-strict-instructions to print instructions that " "could not be handled explicitly nor heuristically."), cl::Hidden, cl::init(false))
static const unsigned kRetvalTLSSize
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams
const char kMsanModuleCtorName[]
static const MemoryMapParams FreeBSD_I386_MemoryMapParams
static cl::opt< bool > ClCheckAccessAddress("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClDisableChecks("msan-disable-checks", cl::desc("Apply no_sanitize to the whole file"), cl::Hidden, cl::init(false))
#define T
FunctionAnalysisManager FAM
if(PassOpts->AAPipeline)
const SmallVectorImpl< MachineOperand > & Cond
static const char * name
void visit(MachineFunction &MF, MachineBasicBlock &Start, std::function< void(MachineBasicBlock *)> op)
This file implements a set that has insertion order iteration characteristics.
This file defines the SmallPtrSet class.
This file defines the SmallVector class.
This file contains some functions that are useful when dealing with strings.
#define LLVM_DEBUG(...)
Definition Debug.h:114
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
static SymbolRef::Type getType(const Symbol *Sym)
Definition TapiFile.cpp:39
Value * RHS
Value * LHS
static APInt getSignedMinValue(unsigned numBits)
Gets minimum signed value of APInt for a specific bit width.
Definition APInt.h:219
void setAlignment(Align Align)
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
const T & front() const
front - Get the first element.
Definition ArrayRef.h:150
static LLVM_ABI ArrayType * get(Type *ElementType, uint64_t NumElements)
This static method is the primary way to construct an ArrayType.
This class stores enough information to efficiently remove some attributes from an existing AttrBuild...
AttributeMask & addAttribute(Attribute::AttrKind Val)
Add an attribute to the mask.
iterator end()
Definition BasicBlock.h:472
LLVM_ABI const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
LLVM_ABI const BasicBlock * getSinglePredecessor() const
Return the predecessor of this block if it has a single predecessor block.
InstListType::iterator iterator
Instruction iterators...
Definition BasicBlock.h:170
bool isInlineAsm() const
Check if this call is an inline asm statement.
Function * getCalledFunction() const
Returns the function called, or null if this is an indirect function invocation or the function signa...
bool hasRetAttr(Attribute::AttrKind Kind) const
Determine whether the return value has the given attribute.
LLVM_ABI bool paramHasAttr(unsigned ArgNo, Attribute::AttrKind Kind) const
Determine whether the argument or parameter has the given attribute.
void removeFnAttrs(const AttributeMask &AttrsToRemove)
Removes the attributes from the function.
void setCannotMerge()
MaybeAlign getParamAlign(unsigned ArgNo) const
Extract the alignment for a call or parameter (0=unknown).
Type * getParamByValType(unsigned ArgNo) const
Extract the byval type for a call or parameter.
Value * getCalledOperand() const
Type * getParamElementType(unsigned ArgNo) const
Extract the elementtype type for a parameter.
Value * getArgOperand(unsigned i) const
void setArgOperand(unsigned i, Value *v)
FunctionType * getFunctionType() const
iterator_range< User::op_iterator > args()
Iteration adapter for range-for loops.
void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind)
Adds the attribute to the indicated argument.
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:676
@ ICMP_SLT
signed less than
Definition InstrTypes.h:705
@ ICMP_SLE
signed less or equal
Definition InstrTypes.h:706
@ ICMP_SGT
signed greater than
Definition InstrTypes.h:703
@ ICMP_SGE
signed greater or equal
Definition InstrTypes.h:704
static LLVM_ABI Constant * get(ArrayType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getString(LLVMContext &Context, StringRef Initializer, bool AddNull=true)
This method constructs a CDS and initializes it with a text string.
static LLVM_ABI Constant * get(LLVMContext &Context, ArrayRef< uint8_t > Elts)
get() constructors - Return a constant with vector type with an element count and element type matchi...
static ConstantInt * getSigned(IntegerType *Ty, int64_t V)
Return a ConstantInt with the specified value for the specified type.
Definition Constants.h:131
static LLVM_ABI ConstantInt * getBool(LLVMContext &Context, bool V)
static LLVM_ABI Constant * get(StructType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getSplat(ElementCount EC, Constant *Elt)
Return a ConstantVector with the specified constant in each element.
static LLVM_ABI Constant * get(ArrayRef< Constant * > V)
This is an important base class in LLVM.
Definition Constant.h:43
static LLVM_ABI Constant * getAllOnesValue(Type *Ty)
LLVM_ABI bool isAllOnesValue() const
Return true if this is the value that would be returned by getAllOnesValue.
static LLVM_ABI Constant * getNullValue(Type *Ty)
Constructor to create a '0' constant of arbitrary type.
LLVM_ABI Constant * getAggregateElement(unsigned Elt) const
For aggregates (struct/array/vector) return the constant that corresponds to the specified element if...
LLVM_ABI bool isZeroValue() const
Return true if the value is negative zero or a null value.
Definition Constants.cpp:76
LLVM_ABI bool isNullValue() const
Return true if this is the value that would be returned by getNullValue.
Definition Constants.cpp:90
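For a pass that tracks which bits of a value are initialized, getNullValue and getAllOnesValue are the natural way to materialize an "all bits defined" and an "all bits undefined" constant for a given shadow type. A minimal sketch with hypothetical helper names, not the pass's own API:

#include "llvm/IR/Constants.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Hypothetical helpers: a zero shadow marks every bit as initialized,
// an all-ones shadow marks every bit as uninitialized.
static Constant *getExampleCleanShadow(Type *ShadowTy) {
  return Constant::getNullValue(ShadowTy);
}
static Constant *getExamplePoisonedShadow(Type *ShadowTy) {
  return Constant::getAllOnesValue(ShadowTy);
}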
static bool shouldExecute(unsigned CounterName)
bool empty() const
Definition DenseMap.h:109
unsigned getNumElements() const
static LLVM_ABI FixedVectorType * get(Type *ElementType, unsigned NumElts)
Definition Type.cpp:803
static FixedVectorType * getHalfElementsVectorType(FixedVectorType *VTy)
A handy container for a FunctionType+Callee-pointer pair, which can be passed around as a single enti...
unsigned getNumParams() const
Return the number of fixed parameters this function type requires.
LLVM_ABI void setComdat(Comdat *C)
Definition Globals.cpp:214
@ PrivateLinkage
Like Internal, but omit from symbol table.
Definition GlobalValue.h:61
@ ExternalLinkage
Externally visible function.
Definition GlobalValue.h:53
Analysis pass providing a never-invalidated alias analysis result.
ConstantInt * getInt1(bool V)
Get a constant value representing either true or false.
Definition IRBuilder.h:497
Value * CreateInsertElement(Type *VecTy, Value *NewElt, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2579
Value * CreateConstGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name="")
Definition IRBuilder.h:1939
AllocaInst * CreateAlloca(Type *Ty, unsigned AddrSpace, Value *ArraySize=nullptr, const Twine &Name="")
Definition IRBuilder.h:1833
IntegerType * getInt1Ty()
Fetch the type representing a single bit.
Definition IRBuilder.h:547
LLVM_ABI CallInst * CreateMaskedCompressStore(Value *Val, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr)
Create a call to Masked Compress Store intrinsic.
Value * CreateInsertValue(Value *Agg, Value *Val, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2633
Value * CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2567
IntegerType * getIntNTy(unsigned N)
Fetch the type representing an N-bit integer.
Definition IRBuilder.h:575
LoadInst * CreateAlignedLoad(Type *Ty, Value *Ptr, MaybeAlign Align, const char *Name)
Definition IRBuilder.h:1867
Value * CreateZExtOrTrunc(Value *V, Type *DestTy, const Twine &Name="")
Create a ZExt or Trunc from the integer value V to DestTy.
Definition IRBuilder.h:2103
CallInst * CreateMemCpy(Value *Dst, MaybeAlign DstAlign, Value *Src, MaybeAlign SrcAlign, uint64_t Size, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memcpy between the specified pointers.
Definition IRBuilder.h:687
LLVM_ABI CallInst * CreateAndReduce(Value *Src)
Create a vector int AND reduction intrinsic of the source vector.
Value * CreatePointerCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2254
Value * CreateExtractValue(Value *Agg, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2626
LLVM_ABI CallInst * CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Load intrinsic.
LLVM_ABI Value * CreateSelect(Value *C, Value *True, Value *False, const Twine &Name="", Instruction *MDFrom=nullptr)
BasicBlock::iterator GetInsertPoint() const
Definition IRBuilder.h:202
Value * CreateSExt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2097
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2202
Value * CreateLShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1513
IntegerType * getInt32Ty()
Fetch the type representing a 32-bit integer.
Definition IRBuilder.h:562
ConstantInt * getInt8(uint8_t C)
Get a constant 8-bit value.
Definition IRBuilder.h:512
Value * CreatePtrAdd(Value *Ptr, Value *Offset, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:2039
IntegerType * getInt64Ty()
Fetch the type representing a 64-bit integer.
Definition IRBuilder.h:567
Value * CreateUDiv(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1454
Value * CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2336
Value * CreateGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:1926
Value * CreateNeg(Value *V, const Twine &Name="", bool HasNSW=false)
Definition IRBuilder.h:1784
LLVM_ABI CallInst * CreateOrReduce(Value *Src)
Create a vector int OR reduction intrinsic of the source vector.
LLVM_ABI Value * CreateBinaryIntrinsic(Intrinsic::ID ID, Value *LHS, Value *RHS, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with 2 operands which is mangled on the first type.
LLVM_ABI CallInst * CreateIntrinsic(Intrinsic::ID ID, ArrayRef< Type * > Types, ArrayRef< Value * > Args, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with Args, mangled using Types.
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition IRBuilder.h:522
PHINode * CreatePHI(Type *Ty, unsigned NumReservedValues, const Twine &Name="")
Definition IRBuilder.h:2497
Value * CreateNot(Value *V, const Twine &Name="")
Definition IRBuilder.h:1808
Value * CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2332
LLVM_ABI DebugLoc getCurrentDebugLocation() const
Get location information used by debugging information.
Definition IRBuilder.cpp:64
Value * CreateSub(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1420
Value * CreateBitCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2207
ConstantInt * getIntN(unsigned N, uint64_t C)
Get a constant N-bit value, zero extended or truncated from a 64-bit value.
Definition IRBuilder.h:533
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool...
Definition IRBuilder.h:1850
Value * CreateShl(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1492
CallInst * CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, MaybeAlign Align, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memset to the specified pointer and the specified value.
Definition IRBuilder.h:630
Value * CreateZExt(Value *V, Type *DestTy, const Twine &Name="", bool IsNonNeg=false)
Definition IRBuilder.h:2085
Value * CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name="")
Definition IRBuilder.h:2601
LLVMContext & getContext() const
Definition IRBuilder.h:203
Value * CreateAnd(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1551
StoreInst * CreateStore(Value *Val, Value *Ptr, bool isVolatile=false)
Definition IRBuilder.h:1863
LLVM_ABI CallInst * CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment, Value *Mask)
Create a call to Masked Store intrinsic.
Value * CreateAdd(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1403
Value * CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2197
Value * CreateIsNotNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg != 0.
Definition IRBuilder.h:2659
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args={}, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:2511
Value * CreateTrunc(Value *V, Type *DestTy, const Twine &Name="", bool IsNUW=false, bool IsNSW=false)
Definition IRBuilder.h:2071
PointerType * getPtrTy(unsigned AddrSpace=0)
Fetch the type representing a pointer.
Definition IRBuilder.h:605
Value * CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:1708
Value * CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2364
LLVM_ABI Value * CreateTypeSize(Type *Ty, TypeSize Size)
Create an expression which evaluates to the number of units in Size at runtime.
Value * CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2344
Value * CreateIntCast(Value *V, Type *DestTy, bool isSigned, const Twine &Name="")
Definition IRBuilder.h:2280
Value * CreateIsNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg == 0.
Definition IRBuilder.h:2654
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition IRBuilder.h:207
Type * getVoidTy()
Fetch the type representing void.
Definition IRBuilder.h:600
StoreInst * CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align, bool isVolatile=false)
Definition IRBuilder.h:1886
LLVM_ABI CallInst * CreateMaskedExpandLoad(Type *Ty, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Expand Load intrinsic.
Value * CreateInBoundsPtrAdd(Value *Ptr, Value *Offset, const Twine &Name="")
Definition IRBuilder.h:2044
Value * CreateAShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1532
Value * CreateXor(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1599
Value * CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2442
Value * CreateOr(Value *LHS, Value *RHS, const Twine &Name="", bool IsDisjoint=false)
Definition IRBuilder.h:1573
IntegerType * getInt8Ty()
Fetch the type representing an 8-bit integer.
Definition IRBuilder.h:552
Value * CreateMul(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1437
LLVM_ABI CallInst * CreateMaskedScatter(Value *Val, Value *Ptrs, Align Alignment, Value *Mask=nullptr)
Create a call to Masked Scatter intrinsic.
LLVM_ABI CallInst * CreateMaskedGather(Type *Ty, Value *Ptrs, Align Alignment, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Gather intrinsic.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition IRBuilder.h:2788
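Most of the IRBuilder entries above exist to emit the handful of instructions that recompute a shadow value next to the instruction being instrumented. A hedged sketch of one common pattern, conservatively OR-ing the operand shadows and storing the result; the helper name and signature are illustrative, not the pass's real interface:

#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/Alignment.h"
using namespace llvm;

// Treat a result bit as uninitialized if the corresponding bit of either
// operand's shadow is set, then store the combined shadow through ShadowPtr.
static void emitExampleShadowOrAndStore(IRBuilder<> &IRB, Value *ShadowA,
                                        Value *ShadowB, Value *ShadowPtr,
                                        Align Alignment) {
  Value *Combined = IRB.CreateOr(ShadowA, ShadowB, "_shadow_or");
  IRB.CreateAlignedStore(Combined, ShadowPtr, Alignment);
}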
std::vector< ConstraintInfo > ConstraintInfoVector
Definition InlineAsm.h:123
void visit(Iterator Start, Iterator End)
Definition InstVisitor.h:87
const DebugLoc & getDebugLoc() const
Return the debug location for this node as a DebugLoc.
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
LLVM_ABI bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instructi...
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition Type.cpp:319
LLVM_ABI MDNode * createUnlikelyBranchWeights()
Return metadata containing two branch weights, with significant bias towards false destination.
Definition MDBuilder.cpp:48
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return a 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
PreservedAnalyses & abandon()
Mark an analysis as abandoned.
Definition Analysis.h:171
bool remove(const value_type &X)
Remove an item from the set vector.
Definition SetVector.h:180
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:150
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
StringRef - Represent a constant reference to a string, i.e.
Definition StringRef.h:55
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition Type.cpp:414
unsigned getNumElements() const
Random access to the elements.
Type * getElementType(unsigned N) const
Analysis pass providing the TargetLibraryInfo.
Provides information about what library functions are available for the current target.
AttributeList getAttrList(LLVMContext *C, ArrayRef< unsigned > ArgNos, bool Signed, bool Ret=false, AttributeList AL=AttributeList()) const
bool getLibFunc(StringRef funcName, LibFunc &F) const
Searches for a particular function name.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isMIPS64() const
Tests whether the target is MIPS 64-bit (little and big endian).
Definition Triple.h:1039
@ loongarch64
Definition Triple.h:65
bool isRISCV32() const
Tests whether the target is 32-bit RISC-V.
Definition Triple.h:1082
bool isPPC32() const
Tests whether the target is 32-bit PowerPC (little and big endian).
Definition Triple.h:1055
ArchType getArch() const
Get the parsed architecture type of this triple.
Definition Triple.h:412
bool isRISCV64() const
Tests whether the target is 64-bit RISC-V.
Definition Triple.h:1087
bool isLoongArch64() const
Tests whether the target is 64-bit LoongArch.
Definition Triple.h:1028
bool isMIPS32() const
Tests whether the target is MIPS 32-bit (little and big endian).
Definition Triple.h:1034
bool isARM() const
Tests whether the target is ARM (little and big endian).
Definition Triple.h:922
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1060
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:1007
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1106
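The Triple predicates above are how a target-dependent instrumentation pass selects per-architecture parameters. A hedged sketch of that dispatch shape; the offsets below are placeholders, not the real shadow-mapping constants of any target:

#include "llvm/TargetParser/Triple.h"
#include <cstdint>
using namespace llvm;

// Placeholder per-target constant selection; a real pass would substitute the
// documented mapping parameters for each supported architecture.
static uint64_t pickExampleShadowOffset(const Triple &TargetTriple) {
  if (TargetTriple.getArch() == Triple::x86_64)
    return 0x100000000000ULL; // placeholder value
  if (TargetTriple.isAArch64() || TargetTriple.isRISCV64())
    return 0x200000000000ULL; // placeholder value
  return 0; // target not handled in this sketch
}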
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:45
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:273
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:264
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:246
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:267
Type * getArrayElementType() const
Definition Type.h:408
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:165
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:281
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:352
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:198
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:311
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:231
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:184
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:255
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:240
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:225
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:139
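The Type queries above (isSized, getPrimitiveSizeInBits, getScalarSizeInBits, getNumElements, ...) are what a pass needs to derive a shadow type of matching width for an application value. A minimal sketch, assuming a plain integer (or vector-of-integer) shadow of the same bit width; the helper name is illustrative:

#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Map an application type to an integer type of the same bit width, keeping
// fixed vectors element-wise; fall back to the original type otherwise.
static Type *getExampleShadowTy(Type *Ty) {
  if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
    Type *EltTy = IntegerType::get(Ty->getContext(), VTy->getScalarSizeInBits());
    return FixedVectorType::get(EltTy, VTy->getNumElements());
  }
  TypeSize TS = Ty->getPrimitiveSizeInBits();
  if (Ty->isSized() && !TS.isScalable() && TS.getFixedValue() != 0)
    return IntegerType::get(Ty->getContext(), (unsigned)TS.getFixedValue());
  return Ty;
}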
Value * getOperand(unsigned i) const
Definition User.h:232
unsigned getNumOperands() const
Definition User.h:254
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:390
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:201
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:169
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
CallInst * Call
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ BasicBlock
Various leaf nodes.
Definition ISDOpcodes.h:81
initializer< Ty > init(const Ty &Val)
Function * Kernel
Summary of a kernel (=entry point for target offloading).
Definition OpenMPOpt.h:21
NodeAddr< FuncNode * > Func
Definition RDFGraph.h:393
friend class Instruction
Iterator for Instructions in a `BasicBlock`.
Definition BasicBlock.h:73
This is an optimization pass for GlobalISel generic memory operations.
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:344
FunctionAddr VTableAddr Value
Definition InstrProf.h:137
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1655
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B,...
Definition STLExtras.h:2472
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ Done
Definition Threading.h:60
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0,...
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:284
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:337
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
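getOrCreateSanitizerCtorAndInitFunctions is the standard helper for lazily creating a module constructor that calls a runtime initialization entry point, and it pairs with appendToGlobalCtors listed further below. A hedged sketch; the ctor and init names here are placeholders, not this pass's real symbols:

#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;

// Create "example.module_ctor" calling "__example_init" (both placeholder
// names) if they do not exist yet, and register the ctor with priority 0.
static void ensureExampleModuleCtor(Module &M) {
  getOrCreateSanitizerCtorAndInitFunctions(
      M, /*CtorName=*/"example.module_ctor", /*InitName=*/"__example_init",
      /*InitArgTypes=*/{}, /*InitArgs=*/{},
      [&M](Function *Ctor, FunctionCallee) {
        appendToGlobalCtors(M, Ctor, /*Priority=*/0);
      });
}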
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:71
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
DWARFExpression::Operation Op
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the ...
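SplitBlockAndInsertIfThen, together with MDBuilder::createUnlikelyBranchWeights above, is the usual way to materialize a rarely-taken check: branch on a non-zero shadow and call a reporting routine on the cold path. A hedged sketch; "__example_report" is a placeholder callee, not the pass's real runtime interface:

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

// Split before InsertBefore, branch to a new unlikely block when Shadow is
// non-zero, and emit a call to a placeholder reporting function there.
static void emitExampleShadowCheck(Instruction *InsertBefore, Value *Shadow) {
  IRBuilder<> IRB(InsertBefore);
  Value *Cmp = IRB.CreateICmpNE(
      Shadow, Constant::getNullValue(Shadow->getType()), "_mscmp");
  Instruction *Then = SplitBlockAndInsertIfThen(
      Cmp, InsertBefore->getIterator(), /*Unreachable=*/false,
      MDBuilder(IRB.getContext()).createUnlikelyBranchWeights());
  IRB.SetInsertPoint(Then);
  Module *M = InsertBefore->getModule();
  FunctionCallee Report = M->getOrInsertFunction(
      "__example_report", FunctionType::get(IRB.getVoidTy(), /*isVarArg=*/false));
  IRB.CreateCall(Report);
}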
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if...
Definition Local.cpp:3861
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that cannot be reached from the function's entry.
Definition Local.cpp:2883
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if the module has the flag attached; if not, add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:70
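The run entry point and the CRTP mix-in above are the new-pass-manager surface of a module instrumentation pass. A minimal skeleton of that shape, under the assumption that the pass reports nothing preserved once it has modified the IR; the pass name is illustrative:

#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

// Skeleton module pass: do the instrumentation work, then tell the pass
// manager whether any analyses survived.
struct ExampleInstrumentationPass : PassInfoMixin<ExampleInstrumentationPass> {
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
    bool Modified = false;
    // ... visit the functions of M, inserting instrumentation and setting
    // Modified accordingly; AM would be queried for any required analyses ...
    (void)AM;
    return Modified ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};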