1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
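///
/// As an editor's illustrative sketch (not part of the upstream comment), for
/// application code such as:
///
///   int x;           // stack shadow poisoned: shadow(x) = 0xffffffff
///   int y = 42;      // fully initialized:     shadow(y) = 0
///   int z = x + y;   // propagated (approx.):  shadow(z) = shadow(x) | shadow(y)
///   if (z) { ... }   // branch on a poisoned value -> __msan_warning*()
///
/// the tool stays silent while poisoned values are merely copied or computed
/// with, and reports only when they influence observable behavior.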
21///
22/// But there are differences too. The first and the major one:
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
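///
/// Illustrative sketch (assumed, not from the upstream comment): for
///
///   int f(int x);
///   int r = f(a);
///
/// the instrumented caller writes shadow(a) into __msan_param_tls before the
/// call, the callee reads its argument shadow from that slot, writes the
/// shadow of its return value into __msan_retval_tls, and the caller reads
/// shadow(r) back from __msan_retval_tls after the call returns.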
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
56///
57/// Every 4 aligned, consecutive bytes of application memory have one origin
58/// value associated with them. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
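///
/// A small worked example (illustrative): if bytes 0x1000-0x1001 hold
/// uninitialized data from allocation A and bytes 0x1002-0x1003 hold
/// uninitialized data from allocation B, the single origin slot covering
/// 0x1000-0x1003 keeps whichever origin was stored last, so a report about
/// byte 0x1000 may name allocation B.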
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needlessly overwriting the origin of the 4-byte region
66/// on a short (i.e. 1-byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, an atomic
72/// store to two disjoint locations cannot be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
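///
/// Illustrative ordering sketch in C-like pseudocode (assumed shapes, not
/// code from this file):
///
///   // instrumented atomic store:
///   *shadow(p) = 0;        // store clean shadow first
///   atomic_store(p, v);    // then perform the application store
///
///   // instrumented atomic load:
///   v = atomic_load(p);    // perform the application load first
///   s = *shadow(p);        // then load the shadow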
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics may only be visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach and generate calls to
101/// __msan_instrument_asm_store(ptr, size),
102/// which defers the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
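///
/// Illustrative sketch (assumed shapes, not code from this file): for an asm
/// statement with a pointer argument p of type T*, the instrumentation emits
///
///   __msan_instrument_asm_store(p, sizeof(T));
///
/// before the asm call, and the runtime decides whether the corresponding
/// shadow can safely be unpoisoned.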
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses
121/// are possible and return the pointers to shadow and origin memory.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
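///
/// To make the KMSAN metadata convention above concrete, an illustrative
/// sketch (assumed shapes, not code from this file) of a 4-byte load from p:
///
///   { shadow_ptr, origin_ptr } = __msan_metadata_ptr_for_load_4(p);
///   shadow = *(u32 *)shadow_ptr;
///   origin = *(u32 *)origin_ptr;
///
/// with the pair returned in registers or written through a hidden parameter
/// depending on the target ABI, as described above.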
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
151#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
194#include "llvm/Support/CommandLine.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
225static const Align kMinOriginAlignment = Align(4);
226static const Align kShadowTLSAlignment = Align(8);
227
228// These constants must be kept in sync with the ones in msan.h.
229// TODO: increase size to match SVE/SVE2/SME/SME2 limits
230static const unsigned kParamTLSSize = 800;
231static const unsigned kRetvalTLSSize = 800;
232
233// Accesses sizes are powers of two: 1, 2, 4, 8.
234static const size_t kNumberOfAccessSizes = 4;
235
236/// Track origins of uninitialized values.
237///
238/// Adds a section to MemorySanitizer report that points to the allocation
239/// (stack or heap) the uninitialized bits came from originally.
241 "msan-track-origins",
242 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
243 cl::init(0));
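// Editor's note (usage, not part of the upstream source): origin tracking is
// normally enabled from the clang driver with
//   clang -fsanitize=memory -fsanitize-memory-track-origins=2 ...
// which corresponds to this option; when exercising the pass directly it can
// also be set via -mllvm -msan-track-origins=2.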
244
245static cl::opt<bool> ClKeepGoing("msan-keep-going",
246 cl::desc("keep going after reporting a UMR"),
247 cl::Hidden, cl::init(false));
248
249static cl::opt<bool>
250 ClPoisonStack("msan-poison-stack",
251 cl::desc("poison uninitialized stack variables"), cl::Hidden,
252 cl::init(true));
253
255 "msan-poison-stack-with-call",
256 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
257 cl::init(false));
258
260 "msan-poison-stack-pattern",
261 cl::desc("poison uninitialized stack variables with the given pattern"),
262 cl::Hidden, cl::init(0xff));
263
264static cl::opt<bool>
265 ClPrintStackNames("msan-print-stack-names",
266 cl::desc("Print name of local stack variable"),
267 cl::Hidden, cl::init(true));
268
269static cl::opt<bool>
270 ClPoisonUndef("msan-poison-undef",
271 cl::desc("Poison fully undef temporary values. "
272 "Partially undefined constant vectors "
273 "are unaffected by this flag (see "
274 "-msan-poison-undef-vectors)."),
275 cl::Hidden, cl::init(true));
276
278 "msan-poison-undef-vectors",
279 cl::desc("Precisely poison partially undefined constant vectors. "
280 "If false (legacy behavior), the entire vector is "
281 "considered fully initialized, which may lead to false "
282 "negatives. Fully undefined constant vectors are "
283 "unaffected by this flag (see -msan-poison-undef)."),
284 cl::Hidden, cl::init(false));
285
287 "msan-precise-disjoint-or",
288 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
289 "disjointedness is ignored (i.e., 1|1 is initialized)."),
290 cl::Hidden, cl::init(false));
291
292static cl::opt<bool>
293 ClHandleICmp("msan-handle-icmp",
294 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
295 cl::Hidden, cl::init(true));
296
297static cl::opt<bool>
298 ClHandleICmpExact("msan-handle-icmp-exact",
299 cl::desc("exact handling of relational integer ICmp"),
300 cl::Hidden, cl::init(true));
301
303 "msan-handle-lifetime-intrinsics",
304 cl::desc(
305 "when possible, poison scoped variables at the beginning of the scope "
306 "(slower, but more precise)"),
307 cl::Hidden, cl::init(true));
308
309// When compiling the Linux kernel, we sometimes see false positives related to
310// MSan being unable to understand that inline assembly calls may initialize
311// local variables.
312// This flag makes the compiler conservatively unpoison every memory location
313// passed into an assembly call. Note that this may cause false positives.
314// Because it's impossible to figure out the array sizes, we can only unpoison
315// the first sizeof(type) bytes for each type* pointer.
317 "msan-handle-asm-conservative",
318 cl::desc("conservative handling of inline assembly"), cl::Hidden,
319 cl::init(true));
320
321// This flag controls whether we check the shadow of the address
322// operand of a load or store. Such bugs are very rare, since a load from
323// a garbage address typically results in SEGV, but they still happen
324// (e.g. only the lower bits of the address are garbage, or the access
325// happens early at program startup where malloc-ed memory is more likely
326// to be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
328 "msan-check-access-address",
329 cl::desc("report accesses through a pointer which has poisoned shadow"),
330 cl::Hidden, cl::init(true));
331
333 "msan-eager-checks",
334 cl::desc("check arguments and return values at function call boundaries"),
335 cl::Hidden, cl::init(false));
336
338 "msan-dump-strict-instructions",
339 cl::desc("print out instructions with default strict semantics i.e.,"
340 "check that all the inputs are fully initialized, and mark "
341 "the output as fully initialized. These semantics are applied "
342 "to instructions that could not be handled explicitly nor "
343 "heuristically."),
344 cl::Hidden, cl::init(false));
345
346// Currently, all the heuristically handled instructions are specifically
347// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
348// to parallel 'msan-dump-strict-instructions', and to keep the door open to
349// handling non-intrinsic instructions heuristically.
351 "msan-dump-heuristic-instructions",
352 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
353 "Use -msan-dump-strict-instructions to print instructions that "
354 "could not be handled explicitly nor heuristically."),
355 cl::Hidden, cl::init(false));
356
358 "msan-instrumentation-with-call-threshold",
359 cl::desc(
360 "If the function being instrumented requires more than "
361 "this number of checks and origin stores, use callbacks instead of "
362 "inline checks (-1 means never use callbacks)."),
363 cl::Hidden, cl::init(3500));
364
365static cl::opt<bool>
366 ClEnableKmsan("msan-kernel",
367 cl::desc("Enable KernelMemorySanitizer instrumentation"),
368 cl::Hidden, cl::init(false));
369
370static cl::opt<bool>
371 ClDisableChecks("msan-disable-checks",
372 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
373 cl::init(false));
374
375static cl::opt<bool>
376 ClCheckConstantShadow("msan-check-constant-shadow",
377 cl::desc("Insert checks for constant shadow values"),
378 cl::Hidden, cl::init(true));
379
380// This is off by default because of a bug in gold:
381// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
382static cl::opt<bool>
383 ClWithComdat("msan-with-comdat",
384 cl::desc("Place MSan constructors in comdat sections"),
385 cl::Hidden, cl::init(false));
386
387// These options allow to specify custom memory map parameters
388// See MemoryMapParams for details.
389static cl::opt<uint64_t> ClAndMask("msan-and-mask",
390 cl::desc("Define custom MSan AndMask"),
391 cl::Hidden, cl::init(0));
392
393static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
394 cl::desc("Define custom MSan XorMask"),
395 cl::Hidden, cl::init(0));
396
397static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
398 cl::desc("Define custom MSan ShadowBase"),
399 cl::Hidden, cl::init(0));
400
401static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
402 cl::desc("Define custom MSan OriginBase"),
403 cl::Hidden, cl::init(0));
404
405static cl::opt<int>
406 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
407 cl::desc("Define threshold for number of checks per "
408 "debug location to force origin update."),
409 cl::Hidden, cl::init(3));
410
411const char kMsanModuleCtorName[] = "msan.module_ctor";
412const char kMsanInitName[] = "__msan_init";
413
414namespace {
415
416// Memory map parameters used in application-to-shadow address calculation.
417// Offset = (Addr & ~AndMask) ^ XorMask
418// Shadow = ShadowBase + Offset
419// Origin = OriginBase + Offset
420struct MemoryMapParams {
421 uint64_t AndMask;
422 uint64_t XorMask;
423 uint64_t ShadowBase;
424 uint64_t OriginBase;
425};
426
427struct PlatformMemoryMapParams {
428 const MemoryMapParams *bits32;
429 const MemoryMapParams *bits64;
430};
431
432} // end anonymous namespace
433
434// i386 Linux
435static const MemoryMapParams Linux_I386_MemoryMapParams = {
436 0x000080000000, // AndMask
437 0, // XorMask (not used)
438 0, // ShadowBase (not used)
439 0x000040000000, // OriginBase
440};
441
442// x86_64 Linux
443static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
444 0, // AndMask (not used)
445 0x500000000000, // XorMask
446 0, // ShadowBase (not used)
447 0x100000000000, // OriginBase
448};
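// Worked example (illustrative, derived from the formula above): with these
// parameters an application address A = 0x700000001000 maps to
//   Offset = (A & ~0) ^ 0x500000000000 = 0x200000001000
//   Shadow = 0x0 + Offset = 0x200000001000
//   Origin = 0x100000000000 + Offset = 0x300000001000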
449
450// mips32 Linux
451// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
452// after picking good constants
453
454// mips64 Linux
455static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
456 0, // AndMask (not used)
457 0x008000000000, // XorMask
458 0, // ShadowBase (not used)
459 0x002000000000, // OriginBase
460};
461
462// ppc32 Linux
463// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
464// after picking good constants
465
466// ppc64 Linux
467static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
468 0xE00000000000, // AndMask
469 0x100000000000, // XorMask
470 0x080000000000, // ShadowBase
471 0x1C0000000000, // OriginBase
472};
473
474// s390x Linux
475static const MemoryMapParams Linux_S390X_MemoryMapParams = {
476 0xC00000000000, // AndMask
477 0, // XorMask (not used)
478 0x080000000000, // ShadowBase
479 0x1C0000000000, // OriginBase
480};
481
482// arm32 Linux
483// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
484// after picking good constants
485
486// aarch64 Linux
487static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
488 0, // AndMask (not used)
489 0x0B00000000000, // XorMask
490 0, // ShadowBase (not used)
491 0x0200000000000, // OriginBase
492};
493
494// loongarch64 Linux
495static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
496 0, // AndMask (not used)
497 0x500000000000, // XorMask
498 0, // ShadowBase (not used)
499 0x100000000000, // OriginBase
500};
501
502// riscv32 Linux
503// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
504// after picking good constants
505
506// aarch64 FreeBSD
507static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
508 0x1800000000000, // AndMask
509 0x0400000000000, // XorMask
510 0x0200000000000, // ShadowBase
511 0x0700000000000, // OriginBase
512};
513
514// i386 FreeBSD
515static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
516 0x000180000000, // AndMask
517 0x000040000000, // XorMask
518 0x000020000000, // ShadowBase
519 0x000700000000, // OriginBase
520};
521
522// x86_64 FreeBSD
523static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
524 0xc00000000000, // AndMask
525 0x200000000000, // XorMask
526 0x100000000000, // ShadowBase
527 0x380000000000, // OriginBase
528};
529
530// x86_64 NetBSD
531static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
532 0, // AndMask
533 0x500000000000, // XorMask
534 0, // ShadowBase
535 0x100000000000, // OriginBase
536};
537
538static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
539 &Linux_I386_MemoryMapParams,
540 &Linux_X86_64_MemoryMapParams,
541};
542
543static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
544 nullptr,
545 &Linux_MIPS64_MemoryMapParams,
546};
547
548static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
549 nullptr,
550 &Linux_PowerPC64_MemoryMapParams,
551};
552
553static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
554 nullptr,
555 &Linux_S390X_MemoryMapParams,
556};
557
558static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
559 nullptr,
560 &Linux_AArch64_MemoryMapParams,
561};
562
563static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
564 nullptr,
565 &Linux_LoongArch64_MemoryMapParams,
566};
567
568static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
569 nullptr,
570 &FreeBSD_AArch64_MemoryMapParams,
571};
572
573static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
574 &FreeBSD_I386_MemoryMapParams,
575 &FreeBSD_X86_64_MemoryMapParams,
576};
577
578static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
579 nullptr,
580 &NetBSD_X86_64_MemoryMapParams,
581};
582
583namespace {
584
585/// Instrument functions of a module to detect uninitialized reads.
586///
587/// Instantiating MemorySanitizer inserts the msan runtime library API function
588/// declarations into the module if they don't exist already. Instantiating
589/// ensures the __msan_init function is in the list of global constructors for
590/// the module.
591class MemorySanitizer {
592public:
593 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
594 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
595 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
596 initializeModule(M);
597 }
598
599 // MSan cannot be moved or copied because of MapParams.
600 MemorySanitizer(MemorySanitizer &&) = delete;
601 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
602 MemorySanitizer(const MemorySanitizer &) = delete;
603 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
604
605 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
606
607private:
608 friend struct MemorySanitizerVisitor;
609 friend struct VarArgHelperBase;
610 friend struct VarArgAMD64Helper;
611 friend struct VarArgAArch64Helper;
612 friend struct VarArgPowerPC64Helper;
613 friend struct VarArgPowerPC32Helper;
614 friend struct VarArgSystemZHelper;
615 friend struct VarArgI386Helper;
616 friend struct VarArgGenericHelper;
617
618 void initializeModule(Module &M);
619 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
620 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
621 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
622
623 template <typename... ArgsTy>
624 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
625 ArgsTy... Args);
626
627 /// True if we're compiling the Linux kernel.
628 bool CompileKernel;
629 /// Track origins (allocation points) of uninitialized values.
630 int TrackOrigins;
631 bool Recover;
632 bool EagerChecks;
633
634 Triple TargetTriple;
635 LLVMContext *C;
636 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
637 Type *OriginTy;
638 PointerType *PtrTy; ///< Pointer type in the default address space.
639
640 // XxxTLS variables represent the per-thread state in MSan and per-task state
641 // in KMSAN.
642 // For the userspace these point to thread-local globals. In the kernel land
643 // they point to the members of a per-task struct obtained via a call to
644 // __msan_get_context_state().
645
646 /// Thread-local shadow storage for function parameters.
647 Value *ParamTLS;
648
649 /// Thread-local origin storage for function parameters.
650 Value *ParamOriginTLS;
651
652 /// Thread-local shadow storage for function return value.
653 Value *RetvalTLS;
654
655 /// Thread-local origin storage for function return value.
656 Value *RetvalOriginTLS;
657
658 /// Thread-local shadow storage for in-register va_arg function.
659 Value *VAArgTLS;
660
661 /// Thread-local origin storage for in-register va_arg function.
662 Value *VAArgOriginTLS;
663
664 /// Thread-local storage for the size of the va_arg overflow area.
665 Value *VAArgOverflowSizeTLS;
666
667 /// Are the instrumentation callbacks set up?
668 bool CallbacksInitialized = false;
669
670 /// The run-time callback to print a warning.
671 FunctionCallee WarningFn;
672
673 // These arrays are indexed by log2(AccessSize).
674 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
675 FunctionCallee MaybeWarningVarSizeFn;
676 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
677
678 /// Run-time helper that generates a new origin value for a stack
679 /// allocation.
680 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
681 // No description version
682 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
683
684 /// Run-time helper that poisons stack on function entry.
685 FunctionCallee MsanPoisonStackFn;
686
687 /// Run-time helper that records a store (or any event) of an
688 /// uninitialized value and returns an updated origin id encoding this info.
689 FunctionCallee MsanChainOriginFn;
690
691 /// Run-time helper that paints an origin over a region.
692 FunctionCallee MsanSetOriginFn;
693
694 /// MSan runtime replacements for memmove, memcpy and memset.
695 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
696
697 /// KMSAN callback for task-local function argument shadow.
698 StructType *MsanContextStateTy;
699 FunctionCallee MsanGetContextStateFn;
700
701 /// Functions for poisoning/unpoisoning local variables
702 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
703
704 /// Pair of shadow/origin pointers.
705 Type *MsanMetadata;
706
707 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
708 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
709 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
710 FunctionCallee MsanMetadataPtrForStore_1_8[4];
711 FunctionCallee MsanInstrumentAsmStoreFn;
712
713 /// Storage for return values of the MsanMetadataPtrXxx functions.
714 Value *MsanMetadataAlloca;
715
716 /// Helper to choose between different MsanMetadataPtrXxx().
717 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
718
719 /// Memory map parameters used in application-to-shadow calculation.
720 const MemoryMapParams *MapParams;
721
722 /// Custom memory map parameters used when -msan-shadow-base or
723 /// -msan-origin-base is provided.
724 MemoryMapParams CustomMapParams;
725
726 MDNode *ColdCallWeights;
727
728 /// Branch weights for origin store.
729 MDNode *OriginStoreWeights;
730};
731
732void insertModuleCtor(Module &M) {
733 getOrCreateSanitizerCtorAndInitFunctions(
734 M, kMsanModuleCtorName, kMsanInitName,
735 /*InitArgTypes=*/{},
736 /*InitArgs=*/{},
737 // This callback is invoked when the functions are created the first
738 // time. Hook them into the global ctors list in that case:
739 [&](Function *Ctor, FunctionCallee) {
740 if (!ClWithComdat) {
741 appendToGlobalCtors(M, Ctor, 0);
742 return;
743 }
744 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
745 Ctor->setComdat(MsanCtorComdat);
746 appendToGlobalCtors(M, Ctor, 0, Ctor);
747 });
748}
749
750template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
751 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
752}
753
754} // end anonymous namespace
755
756MemorySanitizerOptions::MemorySanitizerOptions(bool K, int TO, bool R,
757 bool EagerChecks)
758 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
759 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
760 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
761 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
762
763PreservedAnalyses MemorySanitizerPass::run(Module &M,
764 ModuleAnalysisManager &AM) {
765 // Return early if nosanitize_memory module flag is present for the module.
766 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
767 return PreservedAnalyses::all();
768 bool Modified = false;
769 if (!Options.Kernel) {
770 insertModuleCtor(M);
771 Modified = true;
772 }
773
774 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
775 for (Function &F : M) {
776 if (F.empty())
777 continue;
778 MemorySanitizer Msan(*F.getParent(), Options);
779 Modified |=
780 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
781 }
782
783 if (!Modified)
784 return PreservedAnalyses::all();
785
786 PreservedAnalyses PA = PreservedAnalyses::none();
787 // GlobalsAA is considered stateless and does not get invalidated unless
788 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
789 // make changes that require GlobalsAA to be invalidated.
790 PA.abandon<GlobalsAA>();
791 return PA;
792}
793
794void MemorySanitizerPass::printPipeline(
795 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
796 static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
797 OS, MapClassName2PassName);
798 OS << '<';
799 if (Options.Recover)
800 OS << "recover;";
801 if (Options.Kernel)
802 OS << "kernel;";
803 if (Options.EagerChecks)
804 OS << "eager-checks;";
805 OS << "track-origins=" << Options.TrackOrigins;
806 OS << '>';
807}
808
809/// Create a non-const global initialized with the given string.
810///
811/// Creates a writable global for Str so that we can pass it to the
812/// run-time lib. Runtime uses first 4 bytes of the string to store the
813/// frame ID, so the string needs to be mutable.
814static GlobalVariable *createPrivateConstGlobalForString(Module &M,
815 StringRef Str) {
816 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
817 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
818 GlobalValue::PrivateLinkage, StrConst, "");
819}
820
821template <typename... ArgsTy>
822FunctionCallee
823MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
824 ArgsTy... Args) {
825 if (TargetTriple.getArch() == Triple::systemz) {
826 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
827 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
828 std::forward<ArgsTy>(Args)...);
829 }
830
831 return M.getOrInsertFunction(Name, MsanMetadata,
832 std::forward<ArgsTy>(Args)...);
833}
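// For illustration (assumed declaration shapes based on the code above): on
// most targets this yields callbacks that return the shadow/origin pair
// directly, e.g.
//   declare { ptr, ptr } @__msan_metadata_ptr_for_load_1(ptr)
// while on SystemZ the pair is written through a hidden first parameter:
//   declare void @__msan_metadata_ptr_for_load_1(ptr, ptr)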
834
835/// Create KMSAN API callbacks.
836void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
837 IRBuilder<> IRB(*C);
838
839 // These will be initialized in insertKmsanPrologue().
840 RetvalTLS = nullptr;
841 RetvalOriginTLS = nullptr;
842 ParamTLS = nullptr;
843 ParamOriginTLS = nullptr;
844 VAArgTLS = nullptr;
845 VAArgOriginTLS = nullptr;
846 VAArgOverflowSizeTLS = nullptr;
847
848 WarningFn = M.getOrInsertFunction("__msan_warning",
849 TLI.getAttrList(C, {0}, /*Signed=*/false),
850 IRB.getVoidTy(), IRB.getInt32Ty());
851
852 // Requests the per-task context state (kmsan_context_state*) from the
853 // runtime library.
854 MsanContextStateTy = StructType::get(
855 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
856 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
858 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
859 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
860 OriginTy);
861 MsanGetContextStateFn =
862 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
863
864 MsanMetadata = StructType::get(PtrTy, PtrTy);
865
866 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
867 std::string name_load =
868 "__msan_metadata_ptr_for_load_" + std::to_string(size);
869 std::string name_store =
870 "__msan_metadata_ptr_for_store_" + std::to_string(size);
871 MsanMetadataPtrForLoad_1_8[ind] =
872 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
873 MsanMetadataPtrForStore_1_8[ind] =
874 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
875 }
876
877 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
878 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
879 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
880 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
881
882 // Functions for poisoning and unpoisoning memory.
883 MsanPoisonAllocaFn = M.getOrInsertFunction(
884 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
885 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
886 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
887}
888
889static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
890 return M.getOrInsertGlobal(Name, Ty, [&] {
891 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
892 nullptr, Name, nullptr,
893 GlobalVariable::InitialExecTLSModel);
894 });
895}
896
897/// Insert declarations for userspace-specific functions and globals.
898void MemorySanitizer::createUserspaceApi(Module &M,
899 const TargetLibraryInfo &TLI) {
900 IRBuilder<> IRB(*C);
901
902 // Create the callback.
903 // FIXME: this function should have "Cold" calling conv,
904 // which is not yet implemented.
905 if (TrackOrigins) {
906 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
907 : "__msan_warning_with_origin_noreturn";
908 WarningFn = M.getOrInsertFunction(WarningFnName,
909 TLI.getAttrList(C, {0}, /*Signed=*/false),
910 IRB.getVoidTy(), IRB.getInt32Ty());
911 } else {
912 StringRef WarningFnName =
913 Recover ? "__msan_warning" : "__msan_warning_noreturn";
914 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
915 }
916
917 // Create the global TLS variables.
918 RetvalTLS =
919 getOrInsertGlobal(M, "__msan_retval_tls",
920 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
921
922 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
923
924 ParamTLS =
925 getOrInsertGlobal(M, "__msan_param_tls",
926 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
927
928 ParamOriginTLS =
929 getOrInsertGlobal(M, "__msan_param_origin_tls",
930 ArrayType::get(OriginTy, kParamTLSSize / 4));
931
932 VAArgTLS =
933 getOrInsertGlobal(M, "__msan_va_arg_tls",
934 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
935
936 VAArgOriginTLS =
937 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
938 ArrayType::get(OriginTy, kParamTLSSize / 4));
939
940 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
941 IRB.getIntPtrTy(M.getDataLayout()));
942
943 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
944 AccessSizeIndex++) {
945 unsigned AccessSize = 1 << AccessSizeIndex;
946 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
947 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
948 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
949 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
950 MaybeWarningVarSizeFn = M.getOrInsertFunction(
951 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
952 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
953 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
954 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
955 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
956 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
957 IRB.getInt32Ty());
958 }
959
960 MsanSetAllocaOriginWithDescriptionFn =
961 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
962 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
963 MsanSetAllocaOriginNoDescriptionFn =
964 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
965 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
966 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
967 IRB.getVoidTy(), PtrTy, IntptrTy);
968}
969
970/// Insert extern declaration of runtime-provided functions and globals.
971void MemorySanitizer::initializeCallbacks(Module &M,
972 const TargetLibraryInfo &TLI) {
973 // Only do this once.
974 if (CallbacksInitialized)
975 return;
976
977 IRBuilder<> IRB(*C);
978 // Initialize callbacks that are common for kernel and userspace
979 // instrumentation.
980 MsanChainOriginFn = M.getOrInsertFunction(
981 "__msan_chain_origin",
982 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
983 IRB.getInt32Ty());
984 MsanSetOriginFn = M.getOrInsertFunction(
985 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
986 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
987 MemmoveFn =
988 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
989 MemcpyFn =
990 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
991 MemsetFn = M.getOrInsertFunction("__msan_memset",
992 TLI.getAttrList(C, {1}, /*Signed=*/true),
993 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
994
995 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
996 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
997
998 if (CompileKernel) {
999 createKernelApi(M, TLI);
1000 } else {
1001 createUserspaceApi(M, TLI);
1002 }
1003 CallbacksInitialized = true;
1004}
1005
1006FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1007 int size) {
1008 FunctionCallee *Fns =
1009 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1010 switch (size) {
1011 case 1:
1012 return Fns[0];
1013 case 2:
1014 return Fns[1];
1015 case 4:
1016 return Fns[2];
1017 case 8:
1018 return Fns[3];
1019 default:
1020 return nullptr;
1021 }
1022}
1023
1024/// Module-level initialization.
1025///
1026/// Inserts a call to __msan_init into the module's constructor list.
1027void MemorySanitizer::initializeModule(Module &M) {
1028 auto &DL = M.getDataLayout();
1029
1030 TargetTriple = M.getTargetTriple();
1031
1032 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1033 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1034 // Check the overrides first
1035 if (ShadowPassed || OriginPassed) {
1036 CustomMapParams.AndMask = ClAndMask;
1037 CustomMapParams.XorMask = ClXorMask;
1038 CustomMapParams.ShadowBase = ClShadowBase;
1039 CustomMapParams.OriginBase = ClOriginBase;
1040 MapParams = &CustomMapParams;
1041 } else {
1042 switch (TargetTriple.getOS()) {
1043 case Triple::FreeBSD:
1044 switch (TargetTriple.getArch()) {
1045 case Triple::aarch64:
1046 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1047 break;
1048 case Triple::x86_64:
1049 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1050 break;
1051 case Triple::x86:
1052 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1053 break;
1054 default:
1055 report_fatal_error("unsupported architecture");
1056 }
1057 break;
1058 case Triple::NetBSD:
1059 switch (TargetTriple.getArch()) {
1060 case Triple::x86_64:
1061 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1062 break;
1063 default:
1064 report_fatal_error("unsupported architecture");
1065 }
1066 break;
1067 case Triple::Linux:
1068 switch (TargetTriple.getArch()) {
1069 case Triple::x86_64:
1070 MapParams = Linux_X86_MemoryMapParams.bits64;
1071 break;
1072 case Triple::x86:
1073 MapParams = Linux_X86_MemoryMapParams.bits32;
1074 break;
1075 case Triple::mips64:
1076 case Triple::mips64el:
1077 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1078 break;
1079 case Triple::ppc64:
1080 case Triple::ppc64le:
1081 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1082 break;
1083 case Triple::systemz:
1084 MapParams = Linux_S390_MemoryMapParams.bits64;
1085 break;
1086 case Triple::aarch64:
1087 case Triple::aarch64_be:
1088 MapParams = Linux_ARM_MemoryMapParams.bits64;
1089 break;
1090 case Triple::loongarch64:
1091 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1092 break;
1093 default:
1094 report_fatal_error("unsupported architecture");
1095 }
1096 break;
1097 default:
1098 report_fatal_error("unsupported operating system");
1099 }
1100 }
1101
1102 C = &(M.getContext());
1103 IRBuilder<> IRB(*C);
1104 IntptrTy = IRB.getIntPtrTy(DL);
1105 OriginTy = IRB.getInt32Ty();
1106 PtrTy = IRB.getPtrTy();
1107
1108 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1109 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1110
1111 if (!CompileKernel) {
1112 if (TrackOrigins)
1113 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1114 return new GlobalVariable(
1115 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1116 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1117 });
1118
1119 if (Recover)
1120 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1121 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1122 GlobalValue::WeakODRLinkage,
1123 IRB.getInt32(Recover), "__msan_keep_going");
1124 });
1125 }
1126}
1127
1128namespace {
1129
1130/// A helper class that handles instrumentation of VarArg
1131/// functions on a particular platform.
1132///
1133/// Implementations are expected to insert the instrumentation
1134/// necessary to propagate argument shadow through VarArg function
1135/// calls. Visit* methods are called during an InstVisitor pass over
1136/// the function, and should avoid creating new basic blocks. A new
1137/// instance of this class is created for each instrumented function.
1138struct VarArgHelper {
1139 virtual ~VarArgHelper() = default;
1140
1141 /// Visit a CallBase.
1142 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1143
1144 /// Visit a va_start call.
1145 virtual void visitVAStartInst(VAStartInst &I) = 0;
1146
1147 /// Visit a va_copy call.
1148 virtual void visitVACopyInst(VACopyInst &I) = 0;
1149
1150 /// Finalize function instrumentation.
1151 ///
1152 /// This method is called after visiting all interesting (see above)
1153 /// instructions in a function.
1154 virtual void finalizeInstrumentation() = 0;
1155};
1156
1157struct MemorySanitizerVisitor;
1158
1159} // end anonymous namespace
1160
1161static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1162 MemorySanitizerVisitor &Visitor);
1163
1164static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1165 if (TS.isScalable())
1166 // Scalable types unconditionally take slowpaths.
1167 return kNumberOfAccessSizes;
1168 unsigned TypeSizeFixed = TS.getFixedValue();
1169 if (TypeSizeFixed <= 8)
1170 return 0;
1171 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1172}
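// For example (illustrative): a shadow of up to 8 bits maps to index 0
// (1-byte access), 32 bits to index 2, 64 bits to index 3, and 128 bits to
// index 4 == kNumberOfAccessSizes, which callers treat as "too large for the
// fixed-size fast path".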
1173
1174namespace {
1175
1176/// Helper class to attach debug information of the given instruction onto new
1177/// instructions inserted after.
1178class NextNodeIRBuilder : public IRBuilder<> {
1179public:
1180 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1181 SetCurrentDebugLocation(IP->getDebugLoc());
1182 }
1183};
1184
1185/// This class does all the work for a given function. Store and Load
1186/// instructions store and load corresponding shadow and origin
1187/// values. Most instructions propagate shadow from arguments to their
1188/// return values. Certain instructions (most importantly, BranchInst)
1189/// test their argument shadow and print reports (with a runtime call) if it's
1190/// non-zero.
1191struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1192 Function &F;
1193 MemorySanitizer &MS;
1194 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1195 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1196 std::unique_ptr<VarArgHelper> VAHelper;
1197 const TargetLibraryInfo *TLI;
1198 Instruction *FnPrologueEnd;
1199 SmallVector<Instruction *, 16> Instructions;
1200
1201 // The following flags disable parts of MSan instrumentation based on
1202 // exclusion list contents and command-line options.
1203 bool InsertChecks;
1204 bool PropagateShadow;
1205 bool PoisonStack;
1206 bool PoisonUndef;
1207 bool PoisonUndefVectors;
1208
1209 struct ShadowOriginAndInsertPoint {
1210 Value *Shadow;
1211 Value *Origin;
1212 Instruction *OrigIns;
1213
1214 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1215 : Shadow(S), Origin(O), OrigIns(I) {}
1216 };
1217 SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1218 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1219 SmallSetVector<AllocaInst *, 16> AllocaSet;
1220 SmallVector<std::pair<IntrinsicInst *, AllocaInst *>, 16> LifetimeStartList;
1221 SmallVector<StoreInst *, 16> StoreList;
1222 int64_t SplittableBlocksCount = 0;
1223
1224 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1225 const TargetLibraryInfo &TLI)
1226 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1227 bool SanitizeFunction =
1228 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1229 InsertChecks = SanitizeFunction;
1230 PropagateShadow = SanitizeFunction;
1231 PoisonStack = SanitizeFunction && ClPoisonStack;
1232 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1233 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1234
1235 // In the presence of unreachable blocks, we may see Phi nodes with
1236 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1237 // blocks, such nodes will not have any shadow value associated with them.
1238 // It's easier to remove unreachable blocks than deal with missing shadow.
1239 removeUnreachableBlocks(F);
1240
1241 MS.initializeCallbacks(*F.getParent(), TLI);
1242 FnPrologueEnd =
1243 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1244 .CreateIntrinsic(Intrinsic::donothing, {});
1245
1246 if (MS.CompileKernel) {
1247 IRBuilder<> IRB(FnPrologueEnd);
1248 insertKmsanPrologue(IRB);
1249 }
1250
1251 LLVM_DEBUG(if (!InsertChecks) dbgs()
1252 << "MemorySanitizer is not inserting checks into '"
1253 << F.getName() << "'\n");
1254 }
1255
1256 bool instrumentWithCalls(Value *V) {
1257 // Constants likely will be eliminated by follow-up passes.
1258 if (isa<Constant>(V))
1259 return false;
1260 ++SplittableBlocksCount;
1261 return ClInstrumentationWithCallThreshold >= 0 &&
1262 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1263 }
1264
1265 bool isInPrologue(Instruction &I) {
1266 return I.getParent() == FnPrologueEnd->getParent() &&
1267 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1268 }
1269
1270 // Creates a new origin and records the stack trace. In general we can call
1271 // this function for any origin manipulation we like. However it will cost
1272 // runtime resources. So use this wisely only if it can provide additional
1273 // information helpful to a user.
1274 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1275 if (MS.TrackOrigins <= 1)
1276 return V;
1277 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1278 }
1279
1280 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1281 const DataLayout &DL = F.getDataLayout();
1282 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1283 if (IntptrSize == kOriginSize)
1284 return Origin;
1285 assert(IntptrSize == kOriginSize * 2);
1286 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1287 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1288 }
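  // For example (illustrative): on a 64-bit target an origin id 0x0000abcd is
  // widened to 0x0000abcd0000abcd, so a single intptr-sized store paints two
  // adjacent 4-byte origin slots at once.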
1289
1290 /// Fill memory range with the given origin value.
1291 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1292 TypeSize TS, Align Alignment) {
1293 const DataLayout &DL = F.getDataLayout();
1294 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1295 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1296 assert(IntptrAlignment >= kMinOriginAlignment);
1297 assert(IntptrSize >= kOriginSize);
1298
1299 // Note: The loop based formation works for fixed length vectors too,
1300 // however we prefer to unroll and specialize alignment below.
1301 if (TS.isScalable()) {
1302 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1303 Value *RoundUp =
1304 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1305 Value *End =
1306 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1307 auto [InsertPt, Index] =
1308 SplitBlockAndInsertSimpleForLoop(End, &*IRB.GetInsertPoint());
1309 IRB.SetInsertPoint(InsertPt);
1310
1311 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1312 IRB.CreateAlignedStore(Origin, GEP, kMinOriginAlignment);
1313 return;
1314 }
1315
1316 unsigned Size = TS.getFixedValue();
1317
1318 unsigned Ofs = 0;
1319 Align CurrentAlignment = Alignment;
1320 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1321 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1322 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1323 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1324 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1325 : IntptrOriginPtr;
1326 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1327 Ofs += IntptrSize / kOriginSize;
1328 CurrentAlignment = IntptrAlignment;
1329 }
1330 }
1331
1332 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1333 Value *GEP =
1334 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1335 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1336 CurrentAlignment = kMinOriginAlignment;
1337 }
1338 }
1339
1340 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1341 Value *OriginPtr, Align Alignment) {
1342 const DataLayout &DL = F.getDataLayout();
1343 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1344 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1345 // ZExt cannot convert between vector and scalar
1346 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1347 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1348 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1349 // Origin is not needed: value is initialized or const shadow is
1350 // ignored.
1351 return;
1352 }
1353 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1354 // Copy origin as the value is definitely uninitialized.
1355 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1356 OriginAlignment);
1357 return;
1358 }
1359 // Fallback to runtime check, which still can be optimized out later.
1360 }
1361
1362 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1363 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1364 if (instrumentWithCalls(ConvertedShadow) &&
1365 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1366 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1367 Value *ConvertedShadow2 =
1368 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1369 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1370 CB->addParamAttr(0, Attribute::ZExt);
1371 CB->addParamAttr(2, Attribute::ZExt);
1372 } else {
1373 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1374 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1375 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1376 IRBuilder<> IRBNew(CheckTerm);
1377 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1378 OriginAlignment);
1379 }
1380 }
1381
1382 void materializeStores() {
1383 for (StoreInst *SI : StoreList) {
1384 IRBuilder<> IRB(SI);
1385 Value *Val = SI->getValueOperand();
1386 Value *Addr = SI->getPointerOperand();
1387 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1388 Value *ShadowPtr, *OriginPtr;
1389 Type *ShadowTy = Shadow->getType();
1390 const Align Alignment = SI->getAlign();
1391 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1392 std::tie(ShadowPtr, OriginPtr) =
1393 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1394
1395 [[maybe_unused]] StoreInst *NewSI =
1396 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1397 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1398
1399 if (SI->isAtomic())
1400 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1401
1402 if (MS.TrackOrigins && !SI->isAtomic())
1403 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1404 OriginAlignment);
1405 }
1406 }
1407
1408 // Returns true if Debug Location corresponds to multiple warnings.
1409 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1410 if (MS.TrackOrigins < 2)
1411 return false;
1412
1413 if (LazyWarningDebugLocationCount.empty())
1414 for (const auto &I : InstrumentationList)
1415 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1416
1417 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1418 }
1419
1420 /// Helper function to insert a warning at IRB's current insert point.
1421 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1422 if (!Origin)
1423 Origin = (Value *)IRB.getInt32(0);
1424 assert(Origin->getType()->isIntegerTy());
1425
1426 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1427 // Try to create additional origin with debug info of the last origin
1428 // instruction. It may provide additional information to the user.
1429 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1430 assert(MS.TrackOrigins);
1431 auto NewDebugLoc = OI->getDebugLoc();
1432 // Origin update with missing or the same debug location provides no
1433 // additional value.
1434 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1435 // Insert update just before the check, so we call runtime only just
1436 // before the report.
1437 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1438 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1439 Origin = updateOrigin(Origin, IRBOrigin);
1440 }
1441 }
1442 }
1443
1444 if (MS.CompileKernel || MS.TrackOrigins)
1445 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1446 else
1447 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1448 // FIXME: Insert UnreachableInst if !MS.Recover?
1449 // This may invalidate some of the following checks and needs to be done
1450 // at the very end.
1451 }
1452
1453 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1454 Value *Origin) {
1455 const DataLayout &DL = F.getDataLayout();
1456 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1457 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1458 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1459 // ZExt cannot convert between vector and scalar
1460 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1461 Value *ConvertedShadow2 =
1462 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1463
1464 if (SizeIndex < kNumberOfAccessSizes) {
1465 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1466 CallBase *CB = IRB.CreateCall(
1467 Fn,
1468 {ConvertedShadow2,
1469 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1470 CB->addParamAttr(0, Attribute::ZExt);
1471 CB->addParamAttr(1, Attribute::ZExt);
1472 } else {
1473 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1474 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1475 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1476 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1477 CallBase *CB = IRB.CreateCall(
1478 Fn,
1479 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1480 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1481 CB->addParamAttr(1, Attribute::ZExt);
1482 CB->addParamAttr(2, Attribute::ZExt);
1483 }
1484 } else {
1485 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1486 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1487 Cmp, &*IRB.GetInsertPoint(),
1488 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1489
1490 IRB.SetInsertPoint(CheckTerm);
1491 insertWarningFn(IRB, Origin);
1492 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1493 }
1494 }
1495
1496 void materializeInstructionChecks(
1497 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1498 const DataLayout &DL = F.getDataLayout();
1499 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1500 // correct origin.
1501 bool Combine = !MS.TrackOrigins;
1502 Instruction *Instruction = InstructionChecks.front().OrigIns;
1503 Value *Shadow = nullptr;
1504 for (const auto &ShadowData : InstructionChecks) {
1505 assert(ShadowData.OrigIns == Instruction);
1506 IRBuilder<> IRB(Instruction);
1507
1508 Value *ConvertedShadow = ShadowData.Shadow;
1509
1510 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1511 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1512 // Skip, value is initialized or const shadow is ignored.
1513 continue;
1514 }
1515 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1516 // Report as the value is definitely uninitialized.
1517 insertWarningFn(IRB, ShadowData.Origin);
1518 if (!MS.Recover)
1519 return; // Always fail and stop here, no need to check the rest.
1520 // Skip the entire instruction.
1521 continue;
1522 }
1523 // Fallback to runtime check, which still can be optimized out later.
1524 }
1525
1526 if (!Combine) {
1527 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1528 continue;
1529 }
1530
1531 if (!Shadow) {
1532 Shadow = ConvertedShadow;
1533 continue;
1534 }
1535
1536 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1537 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1538 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1539 }
1540
1541 if (Shadow) {
1542 assert(Combine);
1543 IRBuilder<> IRB(Instruction);
1544 materializeOneCheck(IRB, Shadow, nullptr);
1545 }
1546 }
1547
1548 static bool isAArch64SVCount(Type *Ty) {
1549 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1550 return TTy->getName() == "aarch64.svcount";
1551 return false;
1552 }
1553
1554 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1555 // 'target("aarch64.svcount")', but not e.g., <vscale x 4 x i32>.
1556 static bool isScalableNonVectorType(Type *Ty) {
1557 if (!isAArch64SVCount(Ty))
1558 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1559 << "\n");
1560
1561 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1562 }
1563
1564 void materializeChecks() {
1565#ifndef NDEBUG
1566 // For assert below.
1567 SmallPtrSet<Instruction *, 16> Done;
1568#endif
1569
1570 for (auto I = InstrumentationList.begin();
1571 I != InstrumentationList.end();) {
1572 auto OrigIns = I->OrigIns;
1573 // Checks are grouped by the original instruction. We materialize all
1574 // checks recorded by `insertCheckShadow` for an instruction at once.
1575 assert(Done.insert(OrigIns).second);
1576 auto J = std::find_if(I + 1, InstrumentationList.end(),
1577 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1578 return OrigIns != R.OrigIns;
1579 });
1580 // Process all checks of instruction at once.
1581 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1582 I = J;
1583 }
1584
1585 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1586 }
1587
1588 // Sets up the shadow/origin TLS pointers from the KMSAN per-task context state.
1589 void insertKmsanPrologue(IRBuilder<> &IRB) {
1590 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1591 Constant *Zero = IRB.getInt32(0);
1592 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1593 {Zero, IRB.getInt32(0)}, "param_shadow");
1594 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1595 {Zero, IRB.getInt32(1)}, "retval_shadow");
1596 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1597 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1598 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1599 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1600 MS.VAArgOverflowSizeTLS =
1601 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1602 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1603 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1604 {Zero, IRB.getInt32(5)}, "param_origin");
1605 MS.RetvalOriginTLS =
1606 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1607 {Zero, IRB.getInt32(6)}, "retval_origin");
1608 if (MS.TargetTriple.getArch() == Triple::systemz)
1609 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1610 }
1611
1612 /// Add MemorySanitizer instrumentation to a function.
1613 bool runOnFunction() {
1614 // Iterate all BBs in depth-first order and create shadow instructions
1615 // for all instructions (where applicable).
1616 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1617 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1618 visit(*BB);
1619
1620 // `visit` above only collects instructions. Process them after iterating
1621 // over the CFG so that instrumentation does not depend on CFG transformations.
1622 for (Instruction *I : Instructions)
1623 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1624
1625 // Finalize PHI nodes.
1626 for (PHINode *PN : ShadowPHINodes) {
1627 PHINode *PNS = cast<PHINode>(getShadow(PN));
1628 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1629 size_t NumValues = PN->getNumIncomingValues();
1630 for (size_t v = 0; v < NumValues; v++) {
1631 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1632 if (PNO)
1633 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1634 }
1635 }
1636
1637 VAHelper->finalizeInstrumentation();
1638
1639 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1640 // instrumenting only allocas.
1641 if (InstrumentLifetimeStart) {
1642 for (auto Item : LifetimeStartList) {
1643 instrumentAlloca(*Item.second, Item.first);
1644 AllocaSet.remove(Item.second);
1645 }
1646 }
1647 // Poison the allocas for which we didn't instrument the corresponding
1648 // lifetime intrinsics.
1649 for (AllocaInst *AI : AllocaSet)
1650 instrumentAlloca(*AI);
1651
1652 // Insert shadow value checks.
1653 materializeChecks();
1654
1655 // Delayed instrumentation of StoreInst.
1656 // This may not add new address checks.
1657 materializeStores();
1658
1659 return true;
1660 }
1661
1662 /// Compute the shadow type that corresponds to a given Value.
1663 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1664
1665 /// Compute the shadow type that corresponds to a given Type.
1666 Type *getShadowTy(Type *OrigTy) {
1667 if (!OrigTy->isSized()) {
1668 return nullptr;
1669 }
1670 // For integer type, shadow is the same as the original type.
1671 // This may return weird-sized types like i1.
1672 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1673 return IT;
1674 const DataLayout &DL = F.getDataLayout();
1675 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1676 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1677 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1678 VT->getElementCount());
1679 }
1680 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1681 return ArrayType::get(getShadowTy(AT->getElementType()),
1682 AT->getNumElements());
1683 }
1684 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1685 SmallVector<Type *, 4> Elements;
1686 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1687 Elements.push_back(getShadowTy(ST->getElementType(i)));
1688 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1689 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1690 return Res;
1691 }
1692 if (isScalableNonVectorType(OrigTy)) {
1693 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1694 << "\n");
1695 return OrigTy;
1696 }
1697
1698 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1699 return IntegerType::get(*MS.C, TypeSize);
1700 }
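// A few illustrative input/output pairs for the mapping above (a sketch,
// assuming 64-bit pointers; exact widths come from the DataLayout):
//   i32           ==> i32
//   float         ==> i32
//   <4 x float>   ==> <4 x i32>
//   [2 x double]  ==> [2 x i64]
//   { ptr, i32 }  ==> { i64, i32 }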
1701
1702 /// Extract combined shadow of struct elements as a bool
1703 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1704 IRBuilder<> &IRB) {
1705 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1706 Value *Aggregator = FalseVal;
1707
1708 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1709 // Combine by ORing together each element's bool shadow
1710 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1711 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1712
1713 if (Aggregator != FalseVal)
1714 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1715 else
1716 Aggregator = ShadowBool;
1717 }
1718
1719 return Aggregator;
1720 }
1721
1722 // Extract combined shadow of array elements
1723 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1724 IRBuilder<> &IRB) {
1725 if (!Array->getNumElements())
1726 return IRB.getIntN(/* width */ 1, /* value */ 0);
1727
1728 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1729 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1730
1731 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1732 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1733 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1734 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1735 }
1736 return Aggregator;
1737 }
1738
1739 /// Convert a shadow value to its flattened variant. The resulting
1740 /// shadow may not necessarily have the same bit width as the input
1741 /// value, but it will always be comparable to zero.
1742 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1743 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1744 return collapseStructShadow(Struct, V, IRB);
1745 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1746 return collapseArrayShadow(Array, V, IRB);
1747 if (isa<VectorType>(V->getType())) {
1748 if (isa<ScalableVectorType>(V->getType()))
1749 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1750 unsigned BitWidth =
1751 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1752 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1753 }
1754 return V;
1755 }
1756
1757 // Convert a scalar value to an i1 by comparing with 0
1758 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1759 Type *VTy = V->getType();
1760 if (!VTy->isIntegerTy())
1761 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1762 if (VTy->getIntegerBitWidth() == 1)
1763 // Just converting a bool to a bool, so do nothing.
1764 return V;
1765 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1766 }
1767
1768 Type *ptrToIntPtrType(Type *PtrTy) const {
1769 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1770 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1771 VectTy->getElementCount());
1772 }
1773 assert(PtrTy->isIntOrPtrTy());
1774 return MS.IntptrTy;
1775 }
1776
1777 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1778 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1779 return VectorType::get(
1780 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1781 VectTy->getElementCount());
1782 }
1783 assert(IntPtrTy == MS.IntptrTy);
1784 return MS.PtrTy;
1785 }
1786
1787 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1788 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1789 return ConstantVector::getSplat(
1790 VectTy->getElementCount(),
1791 constToIntPtr(VectTy->getElementType(), C));
1792 }
1793 assert(IntPtrTy == MS.IntptrTy);
1794 // TODO: Avoid implicit trunc?
1795 // See https://github.com/llvm/llvm-project/issues/112510.
1796 return ConstantInt::get(MS.IntptrTy, C, /*IsSigned=*/false,
1797 /*ImplicitTrunc=*/true);
1798 }
1799
1800 /// Returns the integer shadow offset that corresponds to a given
1801 /// application address, whereby:
1802 ///
1803 /// Offset = (Addr & ~AndMask) ^ XorMask
1804 /// Shadow = ShadowBase + Offset
1805 /// Origin = (OriginBase + Offset) & ~Alignment
1806 ///
1807 /// Note: for efficiency, many shadow mappings only require the XorMask
1808 /// and OriginBase; the AndMask and ShadowBase are often zero.
1809 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1810 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1811 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1812
1813 if (uint64_t AndMask = MS.MapParams->AndMask)
1814 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1815
1816 if (uint64_t XorMask = MS.MapParams->XorMask)
1817 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1818 return OffsetLong;
1819 }
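// Worked example (a sketch, assuming the x86_64 Linux mapping parameters
// defined earlier in this file: AndMask == 0, XorMask == 0x500000000000):
//   Addr   = 0x7fff'8000'1234
//   Offset = (Addr & ~0) ^ 0x500000000000 = 0x2fff'8000'1234
// With AndMask == 0 the CreateAnd above is skipped, so a single xor per
// access is emitted.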
1820
1821 /// Compute the shadow and origin addresses corresponding to a given
1822 /// application address.
1823 ///
1824 /// Shadow = ShadowBase + Offset
1825 /// Origin = (OriginBase + Offset) & ~3ULL
1826 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow
1827 /// type of a single pointee.
1828 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1829 std::pair<Value *, Value *>
1830 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1831 MaybeAlign Alignment) {
1832 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1833 if (!VectTy) {
1834 assert(Addr->getType()->isPointerTy());
1835 } else {
1836 assert(VectTy->getElementType()->isPointerTy());
1837 }
1838 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1839 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1840 Value *ShadowLong = ShadowOffset;
1841 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1842 ShadowLong =
1843 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1844 }
1845 Value *ShadowPtr = IRB.CreateIntToPtr(
1846 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1847
1848 Value *OriginPtr = nullptr;
1849 if (MS.TrackOrigins) {
1850 Value *OriginLong = ShadowOffset;
1851 uint64_t OriginBase = MS.MapParams->OriginBase;
1852 if (OriginBase != 0)
1853 OriginLong =
1854 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1855 if (!Alignment || *Alignment < kMinOriginAlignment) {
1856 uint64_t Mask = kMinOriginAlignment.value() - 1;
1857 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1858 }
1859 OriginPtr = IRB.CreateIntToPtr(
1860 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1861 }
1862 return std::make_pair(ShadowPtr, OriginPtr);
1863 }
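// Continuing the sketch above with origin tracking enabled and the x86_64
// Linux OriginBase of 0x100000000000: a 1-byte access whose shadow offset is
// 0x2fff'8000'1235 gets
//   OriginLong = (0x2fff'8000'1235 + 0x100000000000) & ~3ULL
//              = 0x12fff'8000'1234
// i.e. the single origin slot shared by the surrounding 4 application bytes.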
1864
1865 template <typename... ArgsTy>
1866 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1867 ArgsTy... Args) {
1868 if (MS.TargetTriple.getArch() == Triple::systemz) {
1869 IRB.CreateCall(Callee,
1870 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1871 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1872 }
1873
1874 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1875 }
1876
1877 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1878 IRBuilder<> &IRB,
1879 Type *ShadowTy,
1880 bool isStore) {
1881 Value *ShadowOriginPtrs;
1882 const DataLayout &DL = F.getDataLayout();
1883 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1884
1885 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1886 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1887 if (Getter) {
1888 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1889 } else {
1890 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1891 ShadowOriginPtrs = createMetadataCall(
1892 IRB,
1893 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1894 AddrCast, SizeVal);
1895 }
1896 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1897 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1898 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1899
1900 return std::make_pair(ShadowPtr, OriginPtr);
1901 }
1902
1903 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow
1904 /// type of a single pointee.
1905 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1906 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1907 IRBuilder<> &IRB,
1908 Type *ShadowTy,
1909 bool isStore) {
1910 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1911 if (!VectTy) {
1912 assert(Addr->getType()->isPointerTy());
1913 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1914 }
1915
1916 // TODO: Support callbacks with vectors of addresses.
1917 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1918 Value *ShadowPtrs = ConstantInt::getNullValue(
1919 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1920 Value *OriginPtrs = nullptr;
1921 if (MS.TrackOrigins)
1922 OriginPtrs = ConstantInt::getNullValue(
1923 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1924 for (unsigned i = 0; i < NumElements; ++i) {
1925 Value *OneAddr =
1926 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1927 auto [ShadowPtr, OriginPtr] =
1928 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1929
1930 ShadowPtrs = IRB.CreateInsertElement(
1931 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1932 if (MS.TrackOrigins)
1933 OriginPtrs = IRB.CreateInsertElement(
1934 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1935 }
1936 return {ShadowPtrs, OriginPtrs};
1937 }
1938
1939 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1940 Type *ShadowTy,
1941 MaybeAlign Alignment,
1942 bool isStore) {
1943 if (MS.CompileKernel)
1944 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1945 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1946 }
1947
1948 /// Compute the shadow address for a given function argument.
1949 ///
1950 /// Shadow = ParamTLS+ArgOffset.
1951 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1952 return IRB.CreatePtrAdd(MS.ParamTLS,
1953 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1954 }
1955
1956 /// Compute the origin address for a given function argument.
1957 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1958 if (!MS.TrackOrigins)
1959 return nullptr;
1960 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1961 ConstantInt::get(MS.IntptrTy, ArgOffset),
1962 "_msarg_o");
1963 }
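// Illustrative layout (a sketch; actual offsets depend on the argument types):
// for a callee `void f(i32 %a, double %b, <4 x i32> %c)` the argument shadows
// live at __msan_param_tls + 0, + 8 and + 16 respectively, since every slot is
// rounded up to kShadowTLSAlignment (8 bytes), as done in getShadow() below.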
1964
1965 /// Compute the shadow address for a retval.
1966 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1967 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1968 }
1969
1970 /// Compute the origin address for a retval.
1971 Value *getOriginPtrForRetval() {
1972 // We keep a single origin for the entire retval. Might be too optimistic.
1973 return MS.RetvalOriginTLS;
1974 }
1975
1976 /// Set SV to be the shadow value for V.
1977 void setShadow(Value *V, Value *SV) {
1978 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1979 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1980 }
1981
1982 /// Set Origin to be the origin value for V.
1983 void setOrigin(Value *V, Value *Origin) {
1984 if (!MS.TrackOrigins)
1985 return;
1986 assert(!OriginMap.count(V) && "Values may only have one origin");
1987 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1988 OriginMap[V] = Origin;
1989 }
1990
1991 Constant *getCleanShadow(Type *OrigTy) {
1992 Type *ShadowTy = getShadowTy(OrigTy);
1993 if (!ShadowTy)
1994 return nullptr;
1995 return Constant::getNullValue(ShadowTy);
1996 }
1997
1998 /// Create a clean shadow value for a given value.
1999 ///
2000 /// Clean shadow (all zeroes) means all bits of the value are defined
2001 /// (initialized).
2002 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2003
2004 /// Create a dirty shadow of a given shadow type.
2005 Constant *getPoisonedShadow(Type *ShadowTy) {
2006 assert(ShadowTy);
2007 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2008 return Constant::getAllOnesValue(ShadowTy);
2009 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2010 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2011 getPoisonedShadow(AT->getElementType()));
2012 return ConstantArray::get(AT, Vals);
2013 }
2014 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2015 SmallVector<Constant *, 4> Vals;
2016 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2017 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2018 return ConstantStruct::get(ST, Vals);
2019 }
2020 llvm_unreachable("Unexpected shadow type");
2021 }
2022
2023 /// Create a dirty shadow for a given value.
2024 Constant *getPoisonedShadow(Value *V) {
2025 Type *ShadowTy = getShadowTy(V);
2026 if (!ShadowTy)
2027 return nullptr;
2028 return getPoisonedShadow(ShadowTy);
2029 }
2030
2031 /// Create a clean (zero) origin.
2032 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2033
2034 /// Get the shadow value for a given Value.
2035 ///
2036 /// This function either returns the value set earlier with setShadow,
2037 /// or extracts it from ParamTLS (for function arguments).
2038 Value *getShadow(Value *V) {
2039 if (Instruction *I = dyn_cast<Instruction>(V)) {
2040 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2041 return getCleanShadow(V);
2042 // For instructions the shadow is already stored in the map.
2043 Value *Shadow = ShadowMap[V];
2044 if (!Shadow) {
2045 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2046 assert(Shadow && "No shadow for a value");
2047 }
2048 return Shadow;
2049 }
2050 // Handle fully undefined values
2051 // (partially undefined constant vectors are handled later)
2052 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2053 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2054 : getCleanShadow(V);
2055 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2056 return AllOnes;
2057 }
2058 if (Argument *A = dyn_cast<Argument>(V)) {
2059 // For arguments we compute the shadow on demand and store it in the map.
2060 Value *&ShadowPtr = ShadowMap[V];
2061 if (ShadowPtr)
2062 return ShadowPtr;
2063 Function *F = A->getParent();
2064 IRBuilder<> EntryIRB(FnPrologueEnd);
2065 unsigned ArgOffset = 0;
2066 const DataLayout &DL = F->getDataLayout();
2067 for (auto &FArg : F->args()) {
2068 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2069 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2070 ? "vscale not fully supported\n"
2071 : "Arg is not sized\n"));
2072 if (A == &FArg) {
2073 ShadowPtr = getCleanShadow(V);
2074 setOrigin(A, getCleanOrigin());
2075 break;
2076 }
2077 continue;
2078 }
2079
2080 unsigned Size = FArg.hasByValAttr()
2081 ? DL.getTypeAllocSize(FArg.getParamByValType())
2082 : DL.getTypeAllocSize(FArg.getType());
2083
2084 if (A == &FArg) {
2085 bool Overflow = ArgOffset + Size > kParamTLSSize;
2086 if (FArg.hasByValAttr()) {
2087 // ByVal pointer itself has clean shadow. We copy the actual
2088 // argument shadow to the underlying memory.
2089 // Figure out maximal valid memcpy alignment.
2090 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2091 FArg.getParamAlign(), FArg.getParamByValType());
2092 Value *CpShadowPtr, *CpOriginPtr;
2093 std::tie(CpShadowPtr, CpOriginPtr) =
2094 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2095 /*isStore*/ true);
2096 if (!PropagateShadow || Overflow) {
2097 // ParamTLS overflow.
2098 EntryIRB.CreateMemSet(
2099 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2100 Size, ArgAlign);
2101 } else {
2102 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2103 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2104 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2105 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2106 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2107
2108 if (MS.TrackOrigins) {
2109 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2110 // FIXME: OriginSize should be:
2111 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2112 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2113 EntryIRB.CreateMemCpy(
2114 CpOriginPtr,
2115 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2116 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2117 OriginSize);
2118 }
2119 }
2120 }
2121
2122 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2123 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2124 ShadowPtr = getCleanShadow(V);
2125 setOrigin(A, getCleanOrigin());
2126 } else {
2127 // Shadow over TLS
2128 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2129 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2130 kShadowTLSAlignment);
2131 if (MS.TrackOrigins) {
2132 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2133 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2134 }
2135 }
2136 LLVM_DEBUG(dbgs()
2137 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2138 break;
2139 }
2140
2141 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2142 }
2143 assert(ShadowPtr && "Could not find shadow for an argument");
2144 return ShadowPtr;
2145 }
2146
2147 // Check for partially-undefined constant vectors
2148 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2149 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2150 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2151 PoisonUndefVectors) {
2152 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2153 SmallVector<Constant *, 32> ShadowVector(NumElems);
2154 for (unsigned i = 0; i != NumElems; ++i) {
2155 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2156 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2157 : getCleanShadow(Elem);
2158 }
2159
2160 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2161 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2162 << *ShadowConstant << "\n");
2163
2164 return ShadowConstant;
2165 }
2166
2167 // TODO: partially-undefined constant arrays, structures, and nested types
2168
2169 // For everything else the shadow is zero.
2170 return getCleanShadow(V);
2171 }
2172
2173 /// Get the shadow for i-th argument of the instruction I.
2174 Value *getShadow(Instruction *I, int i) {
2175 return getShadow(I->getOperand(i));
2176 }
2177
2178 /// Get the origin for a value.
2179 Value *getOrigin(Value *V) {
2180 if (!MS.TrackOrigins)
2181 return nullptr;
2182 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2182 return getCleanOrigin();
2183 assert((isa<Instruction>(V) || isa<Argument>(V)) &&
2185 "Unexpected value type in getOrigin()");
2186 if (Instruction *I = dyn_cast<Instruction>(V)) {
2187 if (I->getMetadata(LLVMContext::MD_nosanitize))
2188 return getCleanOrigin();
2189 }
2190 Value *Origin = OriginMap[V];
2191 assert(Origin && "Missing origin");
2192 return Origin;
2193 }
2194
2195 /// Get the origin for i-th argument of the instruction I.
2196 Value *getOrigin(Instruction *I, int i) {
2197 return getOrigin(I->getOperand(i));
2198 }
2199
2200 /// Remember the place where a shadow check should be inserted.
2201 ///
2202 /// This location will later be instrumented with a check that will print a
2203 /// UMR warning at runtime if the shadow value is not 0.
2204 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2205 assert(Shadow);
2206 if (!InsertChecks)
2207 return;
2208
2209 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2210 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2211 << *OrigIns << "\n");
2212 return;
2213 }
2214
2215 Type *ShadowTy = Shadow->getType();
2216 if (isScalableNonVectorType(ShadowTy)) {
2217 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2218 << " before " << *OrigIns << "\n");
2219 return;
2220 }
2221#ifndef NDEBUG
2222 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2223 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2224 "Can only insert checks for integer, vector, and aggregate shadow "
2225 "types");
2226#endif
2227 InstrumentationList.push_back(
2228 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2229 }
2230
2231 /// Get shadow for value, and remember the place where a shadow check should
2232 /// be inserted.
2233 ///
2234 /// This location will later be instrumented with a check that will print a
2235 /// UMR warning at runtime if the value is not fully defined.
2236 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2237 assert(Val);
2238 Value *Shadow, *Origin;
2239 if (ClCheckConstantShadow) {
2240 Shadow = getShadow(Val);
2241 if (!Shadow)
2242 return;
2243 Origin = getOrigin(Val);
2244 } else {
2245 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2246 if (!Shadow)
2247 return;
2248 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2249 }
2250 insertCheckShadow(Shadow, Origin, OrigIns);
2251 }
2252
2253 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2254 switch (a) {
2255 case AtomicOrdering::NotAtomic:
2256 return AtomicOrdering::NotAtomic;
2257 case AtomicOrdering::Unordered:
2258 case AtomicOrdering::Monotonic:
2259 case AtomicOrdering::Release:
2260 return AtomicOrdering::Release;
2261 case AtomicOrdering::Acquire:
2262 case AtomicOrdering::AcquireRelease:
2263 return AtomicOrdering::AcquireRelease;
2264 case AtomicOrdering::SequentiallyConsistent:
2265 return AtomicOrdering::SequentiallyConsistent;
2266 }
2267 llvm_unreachable("Unknown ordering");
2268 }
2269
2270 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2271 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2272 uint32_t OrderingTable[NumOrderings] = {};
2273
2274 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2275 OrderingTable[(int)AtomicOrderingCABI::release] =
2276 (int)AtomicOrderingCABI::release;
2277 OrderingTable[(int)AtomicOrderingCABI::consume] =
2278 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2279 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2280 (int)AtomicOrderingCABI::acq_rel;
2281 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2282 (int)AtomicOrderingCABI::seq_cst;
2283
2284 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2285 }
2286
2287 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2288 switch (a) {
2289 case AtomicOrdering::NotAtomic:
2290 return AtomicOrdering::NotAtomic;
2291 case AtomicOrdering::Unordered:
2292 case AtomicOrdering::Monotonic:
2293 case AtomicOrdering::Acquire:
2294 return AtomicOrdering::Acquire;
2295 case AtomicOrdering::Release:
2296 case AtomicOrdering::AcquireRelease:
2297 return AtomicOrdering::AcquireRelease;
2298 case AtomicOrdering::SequentiallyConsistent:
2299 return AtomicOrdering::SequentiallyConsistent;
2300 }
2301 llvm_unreachable("Unknown ordering");
2302 }
2303
2304 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2305 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2306 uint32_t OrderingTable[NumOrderings] = {};
2307
2308 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2309 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2310 OrderingTable[(int)AtomicOrderingCABI::consume] =
2311 (int)AtomicOrderingCABI::acquire;
2312 OrderingTable[(int)AtomicOrderingCABI::release] =
2313 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2314 (int)AtomicOrderingCABI::acq_rel;
2315 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2316 (int)AtomicOrderingCABI::seq_cst;
2317
2318 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2319 }
2320
2321 // ------------------- Visitors.
2322 using InstVisitor<MemorySanitizerVisitor>::visit;
2323 void visit(Instruction &I) {
2324 if (I.getMetadata(LLVMContext::MD_nosanitize))
2325 return;
2326 // Don't want to visit if we're in the prologue
2327 if (isInPrologue(I))
2328 return;
2329 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2330 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2331 // We still need to set the shadow and origin to clean values.
2332 setShadow(&I, getCleanShadow(&I));
2333 setOrigin(&I, getCleanOrigin());
2334 return;
2335 }
2336
2337 Instructions.push_back(&I);
2338 }
2339
2340 /// Instrument LoadInst
2341 ///
2342 /// Loads the corresponding shadow and (optionally) origin.
2343 /// Optionally, checks that the load address is fully defined.
2344 void visitLoadInst(LoadInst &I) {
2345 assert(I.getType()->isSized() && "Load type must have size");
2346 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2347 NextNodeIRBuilder IRB(&I);
2348 Type *ShadowTy = getShadowTy(&I);
2349 Value *Addr = I.getPointerOperand();
2350 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2351 const Align Alignment = I.getAlign();
2352 if (PropagateShadow) {
2353 std::tie(ShadowPtr, OriginPtr) =
2354 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2355 setShadow(&I,
2356 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2357 } else {
2358 setShadow(&I, getCleanShadow(&I));
2359 }
2360
2361 if (ClCheckAccessAddress)
2362 insertCheckShadowOf(I.getPointerOperand(), &I);
2363
2364 if (I.isAtomic())
2365 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2366
2367 if (MS.TrackOrigins) {
2368 if (PropagateShadow) {
2369 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2370 setOrigin(
2371 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2372 } else {
2373 setOrigin(&I, getCleanOrigin());
2374 }
2375 }
2376 }
2377
2378 /// Instrument StoreInst
2379 ///
2380 /// Stores the corresponding shadow and (optionally) origin.
2381 /// Optionally, checks that the store address is fully defined.
2382 void visitStoreInst(StoreInst &I) {
2383 StoreList.push_back(&I);
2384 if (ClCheckAccessAddress)
2385 insertCheckShadowOf(I.getPointerOperand(), &I);
2386 }
2387
2388 void handleCASOrRMW(Instruction &I) {
2389 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2390
2391 IRBuilder<> IRB(&I);
2392 Value *Addr = I.getOperand(0);
2393 Value *Val = I.getOperand(1);
2394 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2395 /*isStore*/ true)
2396 .first;
2397
2398 if (ClCheckAccessAddress)
2399 insertCheckShadowOf(Addr, &I);
2400
2401 // Only test the conditional argument of the cmpxchg instruction.
2402 // The other argument can potentially be uninitialized, but we cannot
2403 // detect this situation reliably without possible false positives.
2404 if (isa<AtomicCmpXchgInst>(I))
2405 insertCheckShadowOf(Val, &I);
2406
2407 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2408
2409 setShadow(&I, getCleanShadow(&I));
2410 setOrigin(&I, getCleanOrigin());
2411 }
2412
2413 void visitAtomicRMWInst(AtomicRMWInst &I) {
2414 handleCASOrRMW(I);
2415 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2416 }
2417
2418 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2419 handleCASOrRMW(I);
2420 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2421 }
2422
2423 // Vector manipulation.
2424 void visitExtractElementInst(ExtractElementInst &I) {
2425 insertCheckShadowOf(I.getOperand(1), &I);
2426 IRBuilder<> IRB(&I);
2427 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2428 "_msprop"));
2429 setOrigin(&I, getOrigin(&I, 0));
2430 }
2431
2432 void visitInsertElementInst(InsertElementInst &I) {
2433 insertCheckShadowOf(I.getOperand(2), &I);
2434 IRBuilder<> IRB(&I);
2435 auto *Shadow0 = getShadow(&I, 0);
2436 auto *Shadow1 = getShadow(&I, 1);
2437 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2438 "_msprop"));
2439 setOriginForNaryOp(I);
2440 }
2441
2442 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2443 IRBuilder<> IRB(&I);
2444 auto *Shadow0 = getShadow(&I, 0);
2445 auto *Shadow1 = getShadow(&I, 1);
2446 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2447 "_msprop"));
2448 setOriginForNaryOp(I);
2449 }
2450
2451 // Casts.
2452 void visitSExtInst(SExtInst &I) {
2453 IRBuilder<> IRB(&I);
2454 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2455 setOrigin(&I, getOrigin(&I, 0));
2456 }
2457
2458 void visitZExtInst(ZExtInst &I) {
2459 IRBuilder<> IRB(&I);
2460 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2461 setOrigin(&I, getOrigin(&I, 0));
2462 }
2463
2464 void visitTruncInst(TruncInst &I) {
2465 IRBuilder<> IRB(&I);
2466 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2467 setOrigin(&I, getOrigin(&I, 0));
2468 }
2469
2470 void visitBitCastInst(BitCastInst &I) {
2471 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2472 // a musttail call and a ret, don't instrument. New instructions are not
2473 // allowed after a musttail call.
2474 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2475 if (CI->isMustTailCall())
2476 return;
2477 IRBuilder<> IRB(&I);
2478 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2479 setOrigin(&I, getOrigin(&I, 0));
2480 }
2481
2482 void visitPtrToIntInst(PtrToIntInst &I) {
2483 IRBuilder<> IRB(&I);
2484 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2485 "_msprop_ptrtoint"));
2486 setOrigin(&I, getOrigin(&I, 0));
2487 }
2488
2489 void visitIntToPtrInst(IntToPtrInst &I) {
2490 IRBuilder<> IRB(&I);
2491 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2492 "_msprop_inttoptr"));
2493 setOrigin(&I, getOrigin(&I, 0));
2494 }
2495
2496 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2497 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2498 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2499 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2500 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2501 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2502
2503 /// Generic handler to compute shadow for bitwise AND.
2504 ///
2505 /// This is used by 'visitAnd' but also as a primitive for other handlers.
2506 ///
2507 /// This code is precise: it implements the rule that "And" of an initialized
2508 /// zero bit always results in an initialized value:
2509 // 1&1 => 1; 0&1 => 0; p&1 => p;
2510 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2511 // 1&p => p; 0&p => 0; p&p => p;
2512 //
2513 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2514 Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
2515 Value *S2) {
2516 Value *S1S2 = IRB.CreateAnd(S1, S2);
2517 Value *V1S2 = IRB.CreateAnd(V1, S2);
2518 Value *S1V2 = IRB.CreateAnd(S1, V2);
2519
2520 if (V1->getType() != S1->getType()) {
2521 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2522 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2523 }
2524
2525 return IRB.CreateOr({S1S2, V1S2, S1V2});
2526 }
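// Worked 4-bit example (illustrative): V1 = 0b1100 with S1 = 0b0011 (its two
// low bits are unknown), V2 = 0b0101 fully initialized (S2 = 0):
//   S1S2 = 0, V1S2 = 0, S1V2 = 0b0011 & 0b0101 = 0b0001
// Only bit 0 of the result is poisoned; bit 1 is AND-ed with a known 0 in V2
// and is therefore initialized no matter what bit 1 of V1 holds.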
2527
2528 /// Handler for bitwise AND operator.
2529 void visitAnd(BinaryOperator &I) {
2530 IRBuilder<> IRB(&I);
2531 Value *V1 = I.getOperand(0);
2532 Value *V2 = I.getOperand(1);
2533 Value *S1 = getShadow(&I, 0);
2534 Value *S2 = getShadow(&I, 1);
2535
2536 Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);
2537
2538 setShadow(&I, OutShadow);
2539 setOriginForNaryOp(I);
2540 }
2541
2542 void visitOr(BinaryOperator &I) {
2543 IRBuilder<> IRB(&I);
2544 // "Or" of 1 and a poisoned value results in unpoisoned value:
2545 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2546 // 1|0 => 1; 0|0 => 0; p|0 => p;
2547 // 1|p => 1; 0|p => p; p|p => p;
2548 //
2549 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2550 //
2551 // If the "disjoint OR" property is violated, the result is poison, and
2552 // hence the entire shadow is uninitialized:
2553 // S = S | SignExt(V1 & V2 != 0)
2554 Value *S1 = getShadow(&I, 0);
2555 Value *S2 = getShadow(&I, 1);
2556 Value *V1 = I.getOperand(0);
2557 Value *V2 = I.getOperand(1);
2558 if (V1->getType() != S1->getType()) {
2559 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2560 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2561 }
2562
2563 Value *NotV1 = IRB.CreateNot(V1);
2564 Value *NotV2 = IRB.CreateNot(V2);
2565
2566 Value *S1S2 = IRB.CreateAnd(S1, S2);
2567 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2568 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2569
2570 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2571
2572 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2573 Value *V1V2 = IRB.CreateAnd(V1, V2);
2574 Value *DisjointOrShadow = IRB.CreateSExt(
2575 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2576 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2577 }
2578
2579 setShadow(&I, S);
2580 setOriginForNaryOp(I);
2581 }
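// Worked 4-bit example (illustrative): V1 = 0b1100 with S1 = 0b0011, and
// V2 = 0b0101 fully initialized (S2 = 0):
//   S1S2 = 0, S2NotV1 = 0, S1NotV2 = 0b0011 & ~0b0101 = 0b0010
// Bit 0 of the result is initialized because V2 contributes a known 1 there,
// while bit 1 stays poisoned since both known bits in that position are 0.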
2582
2583 /// Default propagation of shadow and/or origin.
2584 ///
2585 /// This class implements the general case of shadow propagation, used in all
2586 /// cases where we don't know and/or don't care about what the operation
2587 /// actually does. It converts all input shadow values to a common type
2588 /// (extending or truncating as necessary), and bitwise OR's them.
2589 ///
2590 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2591 /// fully initialized), and less prone to false positives.
2592 ///
2593 /// This class also implements the general case of origin propagation. For a
2594 /// Nary operation, result origin is set to the origin of an argument that is
2595 /// not entirely initialized. If there is more than one such argument, the
2596 /// rightmost of them is picked. It does not matter which one is picked if all
2597 /// arguments are initialized.
2598 template <bool CombineShadow> class Combiner {
2599 Value *Shadow = nullptr;
2600 Value *Origin = nullptr;
2601 IRBuilder<> &IRB;
2602 MemorySanitizerVisitor *MSV;
2603
2604 public:
2605 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2606 : IRB(IRB), MSV(MSV) {}
2607
2608 /// Add a pair of shadow and origin values to the mix.
2609 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2610 if (CombineShadow) {
2611 assert(OpShadow);
2612 if (!Shadow)
2613 Shadow = OpShadow;
2614 else {
2615 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2616 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2617 }
2618 }
2619
2620 if (MSV->MS.TrackOrigins) {
2621 assert(OpOrigin);
2622 if (!Origin) {
2623 Origin = OpOrigin;
2624 } else {
2625 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2626 // No point in adding something that might result in 0 origin value.
2627 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2628 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2629 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2630 }
2631 }
2632 }
2633 return *this;
2634 }
2635
2636 /// Add an application value to the mix.
2637 Combiner &Add(Value *V) {
2638 Value *OpShadow = MSV->getShadow(V);
2639 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2640 return Add(OpShadow, OpOrigin);
2641 }
2642
2643 /// Set the current combined values as the given instruction's shadow
2644 /// and origin.
2645 void Done(Instruction *I) {
2646 if (CombineShadow) {
2647 assert(Shadow);
2648 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2649 MSV->setShadow(I, Shadow);
2650 }
2651 if (MSV->MS.TrackOrigins) {
2652 assert(Origin);
2653 MSV->setOrigin(I, Origin);
2654 }
2655 }
2656
2657 /// Store the current combined value at the specified origin
2658 /// location.
2659 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2660 if (MSV->MS.TrackOrigins) {
2661 assert(Origin);
2662 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2663 }
2664 }
2665 };
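// Typical usage (a sketch): handleShadowOr() below constructs a
// ShadowAndOriginCombiner, Add()s every operand and calls Done(&I). For a
// three-operand instruction this emits a chain of two "or" instructions over
// the operand shadows (cast to a common type) plus, when origins are tracked,
// a chain of selects that keeps the origin of the rightmost operand whose
// shadow is non-zero.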
2666
2667 using ShadowAndOriginCombiner = Combiner<true>;
2668 using OriginCombiner = Combiner<false>;
2669
2670 /// Propagate origin for arbitrary operation.
2671 void setOriginForNaryOp(Instruction &I) {
2672 if (!MS.TrackOrigins)
2673 return;
2674 IRBuilder<> IRB(&I);
2675 OriginCombiner OC(this, IRB);
2676 for (Use &Op : I.operands())
2677 OC.Add(Op.get());
2678 OC.Done(&I);
2679 }
2680
2681 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2682 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2683 "Vector of pointers is not a valid shadow type");
2684 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2685 Ty->getScalarSizeInBits()
2686 : Ty->getPrimitiveSizeInBits();
2687 }
2688
2689 /// Cast between two shadow types, extending or truncating as
2690 /// necessary.
2691 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2692 bool Signed = false) {
2693 Type *srcTy = V->getType();
2694 if (srcTy == dstTy)
2695 return V;
2696 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2697 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2698 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2699 return IRB.CreateICmpNE(V, getCleanShadow(V));
2700
2701 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2702 return IRB.CreateIntCast(V, dstTy, Signed);
2703 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2704 cast<VectorType>(dstTy)->getElementCount() ==
2705 cast<VectorType>(srcTy)->getElementCount())
2706 return IRB.CreateIntCast(V, dstTy, Signed);
2707 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2708 Value *V2 =
2709 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2710 return IRB.CreateBitCast(V2, dstTy);
2711 // TODO: handle struct types.
2712 }
2713
2714 /// Cast an application value to the type of its own shadow.
2715 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2716 Type *ShadowTy = getShadowTy(V);
2717 if (V->getType() == ShadowTy)
2718 return V;
2719 if (V->getType()->isPtrOrPtrVectorTy())
2720 return IRB.CreatePtrToInt(V, ShadowTy);
2721 else
2722 return IRB.CreateBitCast(V, ShadowTy);
2723 }
2724
2725 /// Propagate shadow for arbitrary operation.
2726 void handleShadowOr(Instruction &I) {
2727 IRBuilder<> IRB(&I);
2728 ShadowAndOriginCombiner SC(this, IRB);
2729 for (Use &Op : I.operands())
2730 SC.Add(Op.get());
2731 SC.Done(&I);
2732 }
2733
2734 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2735 // of elements.
2736 //
2737 // For example, suppose we have:
2738 // VectorA: <a0, a1, a2, a3, a4, a5>
2739 // VectorB: <b0, b1, b2, b3, b4, b5>
2740 // ReductionFactor: 3
2741 // Shards: 1
2742 // The output would be:
2743 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2744 //
2745 // If we have:
2746 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2747 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2748 // ReductionFactor: 2
2749 // Shards: 2
2750 // then a and b each have 2 "shards", resulting in the output being
2751 // interleaved:
2752 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2753 //
2754 // This is convenient for instrumenting horizontal add/sub.
2755 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2756 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2757 unsigned Shards, Value *VectorA, Value *VectorB) {
2758 assert(isa<FixedVectorType>(VectorA->getType()));
2759 unsigned NumElems =
2760 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2761
2762 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2763 if (VectorB) {
2764 assert(VectorA->getType() == VectorB->getType());
2765 TotalNumElems *= 2;
2766 }
2767
2768 assert(NumElems % (ReductionFactor * Shards) == 0);
2769
2770 Value *Or = nullptr;
2771
2772 IRBuilder<> IRB(&I);
2773 for (unsigned i = 0; i < ReductionFactor; i++) {
2774 SmallVector<int, 16> Mask;
2775
2776 for (unsigned j = 0; j < Shards; j++) {
2777 unsigned Offset = NumElems / Shards * j;
2778
2779 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2780 Mask.push_back(Offset + X + i);
2781
2782 if (VectorB) {
2783 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2784 Mask.push_back(NumElems + Offset + X + i);
2785 }
2786 }
2787
2788 Value *Masked;
2789 if (VectorB)
2790 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2791 else
2792 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2793
2794 if (Or)
2795 Or = IRB.CreateOr(Or, Masked);
2796 else
2797 Or = Masked;
2798 }
2799
2800 return Or;
2801 }
2802
2803 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2804 /// fields.
2805 ///
2806 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2807 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2808 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2809 assert(I.arg_size() == 1 || I.arg_size() == 2);
2810
2811 assert(I.getType()->isVectorTy());
2812 assert(I.getArgOperand(0)->getType()->isVectorTy());
2813
2814 [[maybe_unused]] FixedVectorType *ParamType =
2815 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2816 assert((I.arg_size() != 2) ||
2817 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2818 [[maybe_unused]] FixedVectorType *ReturnType =
2819 cast<FixedVectorType>(I.getType());
2820 assert(ParamType->getNumElements() * I.arg_size() ==
2821 2 * ReturnType->getNumElements());
2822
2823 IRBuilder<> IRB(&I);
2824
2825 // Horizontal OR of shadow
2826 Value *FirstArgShadow = getShadow(&I, 0);
2827 Value *SecondArgShadow = nullptr;
2828 if (I.arg_size() == 2)
2829 SecondArgShadow = getShadow(&I, 1);
2830
2831 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2832 FirstArgShadow, SecondArgShadow);
2833
2834 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2835
2836 setShadow(&I, OrShadow);
2837 setOriginForNaryOp(I);
2838 }
2839
2840 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2841 /// fields, with the parameters reinterpreted to have elements of a specified
2842 /// width. For example:
2843 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2844 /// conceptually operates on
2845 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2846 /// and can be handled with ReinterpretElemWidth == 16.
2847 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
2848 int ReinterpretElemWidth) {
2849 assert(I.arg_size() == 1 || I.arg_size() == 2);
2850
2851 assert(I.getType()->isVectorTy());
2852 assert(I.getArgOperand(0)->getType()->isVectorTy());
2853
2854 FixedVectorType *ParamType =
2855 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2856 assert((I.arg_size() != 2) ||
2857 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2858
2859 [[maybe_unused]] FixedVectorType *ReturnType =
2860 cast<FixedVectorType>(I.getType());
2861 assert(ParamType->getNumElements() * I.arg_size() ==
2862 2 * ReturnType->getNumElements());
2863
2864 IRBuilder<> IRB(&I);
2865
2866 FixedVectorType *ReinterpretShadowTy = nullptr;
2867 assert(isAligned(Align(ReinterpretElemWidth),
2868 ParamType->getPrimitiveSizeInBits()));
2869 ReinterpretShadowTy = FixedVectorType::get(
2870 IRB.getIntNTy(ReinterpretElemWidth),
2871 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2872
2873 // Horizontal OR of shadow
2874 Value *FirstArgShadow = getShadow(&I, 0);
2875 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2876
2877 // If we had two parameters each with an odd number of elements, the total
2878 // number of elements is even, but we have never seen this in extant
2879 // instruction sets, so we enforce that each parameter must have an even
2880 // number of elements.
2881 assert(isAligned(
2882 Align(2),
2883 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2884
2885 Value *SecondArgShadow = nullptr;
2886 if (I.arg_size() == 2) {
2887 SecondArgShadow = getShadow(&I, 1);
2888 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2889 }
2890
2891 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2892 FirstArgShadow, SecondArgShadow);
2893
2894 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2895
2896 setShadow(&I, OrShadow);
2897 setOriginForNaryOp(I);
2898 }
2899
2900 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2901
2902 // Handle multiplication by constant.
2903 //
2904 // Handle a special case of multiplication by constant that may have one or
2905 // more zeros in the lower bits. This makes the corresponding number of lower
2906 // bits of the result zero as well. We model it by shifting the other operand
2907 // shadow left by the required number of bits. Effectively, we transform
2908 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2909 // We use multiplication by 2**N instead of shift to cover the case of
2910 // multiplication by 0, which may occur in some elements of a vector operand.
2911 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2912 Value *OtherArg) {
2913 Constant *ShadowMul;
2914 Type *Ty = ConstArg->getType();
2915 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2916 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2917 Type *EltTy = VTy->getElementType();
2918 SmallVector<Constant *, 16> Elements;
2919 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2920 if (ConstantInt *Elt =
2921 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2922 const APInt &V = Elt->getValue();
2923 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2924 Elements.push_back(ConstantInt::get(EltTy, V2));
2925 } else {
2926 Elements.push_back(ConstantInt::get(EltTy, 1));
2927 }
2928 }
2929 ShadowMul = ConstantVector::get(Elements);
2930 } else {
2931 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2932 const APInt &V = Elt->getValue();
2933 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2934 ShadowMul = ConstantInt::get(Ty, V2);
2935 } else {
2936 ShadowMul = ConstantInt::get(Ty, 1);
2937 }
2938 }
2939
2940 IRBuilder<> IRB(&I);
2941 setShadow(&I,
2942 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2943 setOrigin(&I, getOrigin(OtherArg));
2944 }
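// Worked example (illustrative): for `%r = mul i32 %x, 40` the constant is
// 40 == 5 * 2**3, so countr_zero(40) == 3 and ShadowMul == 8. The shadow of
// %r becomes `mul i32 Sx, 8`, i.e. Sx shifted left by 3: the three low bits
// of the product are always zero and hence initialized regardless of %x.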
2945
2946 void visitMul(BinaryOperator &I) {
2947 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2948 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2949 if (constOp0 && !constOp1)
2950 handleMulByConstant(I, constOp0, I.getOperand(1));
2951 else if (constOp1 && !constOp0)
2952 handleMulByConstant(I, constOp1, I.getOperand(0));
2953 else
2954 handleShadowOr(I);
2955 }
2956
2957 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2958 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2959 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2960 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2961 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2962 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2963
2964 void handleIntegerDiv(Instruction &I) {
2965 IRBuilder<> IRB(&I);
2966 // Strict on the second argument.
2967 insertCheckShadowOf(I.getOperand(1), &I);
2968 setShadow(&I, getShadow(&I, 0));
2969 setOrigin(&I, getOrigin(&I, 0));
2970 }
2971
2972 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2973 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2974 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2975 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2976
2977 // Floating point division is side-effect free. We cannot require that the
2978 // divisor is fully initialized and must propagate shadow. See PR37523.
2979 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2980 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2981
2982 /// Instrument == and != comparisons.
2983 ///
2984 /// Sometimes the comparison result is known even if some of the bits of the
2985 /// arguments are not.
2986 void handleEqualityComparison(ICmpInst &I) {
2987 IRBuilder<> IRB(&I);
2988 Value *A = I.getOperand(0);
2989 Value *B = I.getOperand(1);
2990 Value *Sa = getShadow(A);
2991 Value *Sb = getShadow(B);
2992
2993 // Get rid of pointers and vectors of pointers.
2994 // For ints (and vectors of ints), types of A and Sa match,
2995 // and this is a no-op.
2996 A = IRB.CreatePointerCast(A, Sa->getType());
2997 B = IRB.CreatePointerCast(B, Sb->getType());
2998
2999 // A == B <==> (C = A^B) == 0
3000 // A != B <==> (C = A^B) != 0
3001 // Sc = Sa | Sb
3002 Value *C = IRB.CreateXor(A, B);
3003 Value *Sc = IRB.CreateOr(Sa, Sb);
3004 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
3005 // Result is defined if one of the following is true
3006 // * there is a defined 1 bit in C
3007 // * C is fully defined
3008 // Si = !(C & ~Sc) && Sc
3009 Value *Zero = Constant::getNullValue(Sc->getType());
3010 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
3011 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
3012 Value *RHS =
3013 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
3014 Value *Si = IRB.CreateAnd(LHS, RHS);
3015 Si->setName("_msprop_icmp");
3016 setShadow(&I, Si);
3017 setOriginForNaryOp(I);
3018 }
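// Worked 4-bit example (illustrative): A = 0b10?? (Sa = 0b0011) compared for
// equality with B = 0b0000 (Sb = 0). Then C = A ^ B has a defined 1 in bit 3
// and Sc = 0b0011, so C & ~Sc = 0b1000 != 0: the values differ no matter what
// the unknown bits are, and the shadow Si of the comparison is 0 (defined).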
3019
3020 /// Instrument relational comparisons.
3021 ///
3022 /// This function does exact shadow propagation for all relational
3023 /// comparisons of integers, pointers and vectors of those.
3024 /// FIXME: output seems suboptimal when one of the operands is a constant
3025 void handleRelationalComparisonExact(ICmpInst &I) {
3026 IRBuilder<> IRB(&I);
3027 Value *A = I.getOperand(0);
3028 Value *B = I.getOperand(1);
3029 Value *Sa = getShadow(A);
3030 Value *Sb = getShadow(B);
3031
3032 // Get rid of pointers and vectors of pointers.
3033 // For ints (and vectors of ints), types of A and Sa match,
3034 // and this is a no-op.
3035 A = IRB.CreatePointerCast(A, Sa->getType());
3036 B = IRB.CreatePointerCast(B, Sb->getType());
3037
3038 // Let [a0, a1] be the interval of possible values of A, taking into account
3039 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3040 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
3041 bool IsSigned = I.isSigned();
3042
3043 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3044 if (IsSigned) {
3045 // Sign-flip to map from signed range to unsigned range. Relation A vs B
3046 // should be preserved, if checked with `getUnsignedPredicate()`.
3047 // Relationship between Amin, Amax, Bmin, Bmax also will not be
3048 // affected, as they are created by effectively adding/subtracting from
3049 // A (or B) a value, derived from shadow, with no overflow, either
3050 // before or after sign flip.
3051 APInt MinVal =
3052 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3053 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3054 }
3055 // Minimize undefined bits.
3056 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3057 Value *Max = IRB.CreateOr(V, S);
3058 return std::make_pair(Min, Max);
3059 };
3060
3061 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3062 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3063 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3064 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3065
3066 Value *Si = IRB.CreateXor(S1, S2);
3067 setShadow(&I, Si);
3068 setOriginForNaryOp(I);
3069 }
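// Worked example (illustrative, `icmp ult i8`): A = 0b0000_10?? gives
// [Amin, Amax] = [8, 11]; B = 20 exactly gives [Bmin, Bmax] = [20, 20].
// Both (Amin < Bmax) and (Amax < Bmin) are true, so Si = S1 ^ S2 = 0 and
// `A < B` is defined even though two bits of A are uninitialized.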
3070
3071 /// Instrument signed relational comparisons.
3072 ///
3073 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3074 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3075 void handleSignedRelationalComparison(ICmpInst &I) {
3076 Constant *constOp;
3077 Value *op = nullptr;
3078 CmpInst::Predicate pre;
3079 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3080 op = I.getOperand(0);
3081 pre = I.getPredicate();
3082 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3083 op = I.getOperand(1);
3084 pre = I.getSwappedPredicate();
3085 } else {
3086 handleShadowOr(I);
3087 return;
3088 }
3089
3090 if ((constOp->isNullValue() &&
3091 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3092 (constOp->isAllOnesValue() &&
3093 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3094 IRBuilder<> IRB(&I);
3095 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3096 "_msprop_icmp_s");
3097 setShadow(&I, Shadow);
3098 setOrigin(&I, getOrigin(op));
3099 } else {
3100 handleShadowOr(I);
3101 }
3102 }
3103
3104 void visitICmpInst(ICmpInst &I) {
3105 if (!ClHandleICmp) {
3106 handleShadowOr(I);
3107 return;
3108 }
3109 if (I.isEquality()) {
3110 handleEqualityComparison(I);
3111 return;
3112 }
3113
3114 assert(I.isRelational());
3115 if (ClHandleICmpExact) {
3116 handleRelationalComparisonExact(I);
3117 return;
3118 }
3119 if (I.isSigned()) {
3120 handleSignedRelationalComparison(I);
3121 return;
3122 }
3123
3124 assert(I.isUnsigned());
3125 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3126 handleRelationalComparisonExact(I);
3127 return;
3128 }
3129
3130 handleShadowOr(I);
3131 }
3132
3133 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3134
3135 void handleShift(BinaryOperator &I) {
3136 IRBuilder<> IRB(&I);
3137 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3138 // Otherwise perform the same shift on S1.
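// For example, x = 00?1 (shadow 0010) shifted left by a well-defined 2 gives
// value ?100 and shadow 1000: the poisoned bit simply moves with the shift.
// If the shift amount itself is poisoned, S2Conv becomes all-ones and the
// entire result is poisoned.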
3139 Value *S1 = getShadow(&I, 0);
3140 Value *S2 = getShadow(&I, 1);
3141 Value *S2Conv =
3142 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3143 Value *V2 = I.getOperand(1);
3144 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3145 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3146 setOriginForNaryOp(I);
3147 }
3148
3149 void visitShl(BinaryOperator &I) { handleShift(I); }
3150 void visitAShr(BinaryOperator &I) { handleShift(I); }
3151 void visitLShr(BinaryOperator &I) { handleShift(I); }
3152
3153 void handleFunnelShift(IntrinsicInst &I) {
3154 IRBuilder<> IRB(&I);
3155 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3156 // Otherwise perform the same shift on S0 and S1.
3157 Value *S0 = getShadow(&I, 0);
3158 Value *S1 = getShadow(&I, 1);
3159 Value *S2 = getShadow(&I, 2);
3160 Value *S2Conv =
3161 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3162 Value *V2 = I.getOperand(2);
3163 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3164 {S0, S1, V2});
3165 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3166 setOriginForNaryOp(I);
3167 }
3168
3169 /// Instrument llvm.memmove
3170 ///
3171 /// At this point we don't know if llvm.memmove will be inlined or not.
3172 /// If we don't instrument it and it gets inlined,
3173 /// our interceptor will not kick in and we will lose the memmove.
3174 /// If we instrument the call here, but it does not get inlined,
3175 /// we will memmove the shadow twice, which is bad in the case
3176 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3177 ///
3178 /// Similar situation exists for memcpy and memset.
3179 void visitMemMoveInst(MemMoveInst &I) {
3180 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3181 IRBuilder<> IRB(&I);
3182 IRB.CreateCall(MS.MemmoveFn,
3183 {I.getArgOperand(0), I.getArgOperand(1),
3184 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3185 I.eraseFromParent();
3186 }
3187
3188 /// Instrument memcpy
3189 ///
3190 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3191 /// unfortunate as it may slow down small constant memcpys.
3192 /// FIXME: consider doing manual inline for small constant sizes and proper
3193 /// alignment.
3194 ///
3195 /// Note: This also handles memcpy.inline, which promises no calls to external
3196 /// functions as an optimization. However, with instrumentation enabled this
3197 /// is difficult to promise; additionally, we know that the MSan runtime
3198 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3199 /// instrumentation it's safe to turn memcpy.inline into a call to
3200 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3201 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3202 void visitMemCpyInst(MemCpyInst &I) {
3203 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3204 IRBuilder<> IRB(&I);
3205 IRB.CreateCall(MS.MemcpyFn,
3206 {I.getArgOperand(0), I.getArgOperand(1),
3207 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3208 I.eraseFromParent();
3209 }
3210
3211 // Same as memcpy.
3212 void visitMemSetInst(MemSetInst &I) {
3213 IRBuilder<> IRB(&I);
3214 IRB.CreateCall(
3215 MS.MemsetFn,
3216 {I.getArgOperand(0),
3217 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3218 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3219 I.eraseFromParent();
3220 }
3221
3222 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3223
3224 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3225
3226 /// Handle vector store-like intrinsics.
3227 ///
3228 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3229 /// has 1 pointer argument and 1 vector argument, returns void.
3230 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3231 assert(I.arg_size() == 2);
3232
3233 IRBuilder<> IRB(&I);
3234 Value *Addr = I.getArgOperand(0);
3235 Value *Shadow = getShadow(&I, 1);
3236 Value *ShadowPtr, *OriginPtr;
3237
3238 // We don't know the pointer alignment (could be unaligned SSE store!).
3239 // Have to assume the worst case.
3240 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3241 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3242 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3243
3244 if (ClCheckAccessAddress)
3245 insertCheckShadowOf(Addr, &I);
3246
3247 // FIXME: factor out common code from materializeStores
3248 if (MS.TrackOrigins)
3249 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3250 return true;
3251 }
3252
3253 /// Handle vector load-like intrinsics.
3254 ///
3255 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3256 /// has 1 pointer argument, returns a vector.
3257 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3258 assert(I.arg_size() == 1);
3259
3260 IRBuilder<> IRB(&I);
3261 Value *Addr = I.getArgOperand(0);
3262
3263 Type *ShadowTy = getShadowTy(&I);
3264 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3265 if (PropagateShadow) {
3266 // We don't know the pointer alignment (could be unaligned SSE load!).
3267 // Have to assume the worst case.
3268 const Align Alignment = Align(1);
3269 std::tie(ShadowPtr, OriginPtr) =
3270 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3271 setShadow(&I,
3272 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3273 } else {
3274 setShadow(&I, getCleanShadow(&I));
3275 }
3276
3277 if (ClCheckAccessAddress)
3278 insertCheckShadowOf(Addr, &I);
3279
3280 if (MS.TrackOrigins) {
3281 if (PropagateShadow)
3282 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3283 else
3284 setOrigin(&I, getCleanOrigin());
3285 }
3286 return true;
3287 }
3288
3289 /// Handle (SIMD arithmetic)-like intrinsics.
3290 ///
3291 /// Instrument intrinsics with any number of arguments of the same type [*],
3292 /// equal to the return type, plus a specified number of trailing flags of
3293 /// any type.
3294 ///
3295 /// [*] The type should be simple (no aggregates or pointers; vectors are
3296 /// fine).
3297 ///
3298 /// Caller guarantees that this intrinsic does not access memory.
3299 ///
3300 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3301 /// by this handler. See horizontalReduce().
3302 ///
3303 /// TODO: permutation intrinsics are also often incorrectly matched.
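///
/// For example, a two-operand SIMD intrinsic such as
///   <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16>, <8 x i16>)
/// matches this shape (assuming it is not special-cased elsewhere): all
/// arguments have the return type and no memory is accessed, so the shadows
/// of the arguments are simply OR-combined.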
3304 [[maybe_unused]] bool
3305 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3306 unsigned int trailingFlags) {
3307 Type *RetTy = I.getType();
3308 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3309 return false;
3310
3311 unsigned NumArgOperands = I.arg_size();
3312 assert(NumArgOperands >= trailingFlags);
3313 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3314 Type *Ty = I.getArgOperand(i)->getType();
3315 if (Ty != RetTy)
3316 return false;
3317 }
3318
3319 IRBuilder<> IRB(&I);
3320 ShadowAndOriginCombiner SC(this, IRB);
3321 for (unsigned i = 0; i < NumArgOperands; ++i)
3322 SC.Add(I.getArgOperand(i));
3323 SC.Done(&I);
3324
3325 return true;
3326 }
3327
3328 /// Returns whether it was able to heuristically instrument unknown
3329 /// intrinsics.
3330 ///
3331 /// The main purpose of this code is to do something reasonable with all
3332 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3333 /// We recognize several classes of intrinsics by their argument types and
3334 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3335 /// sure that we know what the intrinsic does.
3336 ///
3337 /// We special-case intrinsics where this approach fails. See llvm.bswap
3338 /// handling as an example of that.
3339 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3340 unsigned NumArgOperands = I.arg_size();
3341 if (NumArgOperands == 0)
3342 return false;
3343
3344 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3345 I.getArgOperand(1)->getType()->isVectorTy() &&
3346 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3347 // This looks like a vector store.
3348 return handleVectorStoreIntrinsic(I);
3349 }
3350
3351 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3352 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3353 // This looks like a vector load.
3354 return handleVectorLoadIntrinsic(I);
3355 }
3356
3357 if (I.doesNotAccessMemory())
3358 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3359 return true;
3360
3361 // FIXME: detect and handle SSE maskstore/maskload?
3362 // Some cases are now handled in handleAVXMasked{Load,Store}.
3363 return false;
3364 }
3365
3366 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3367 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3368 if (ClDumpStrictIntrinsics)
3369 dumpInst(I);
3370
3371 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3372 << "\n");
3373 return true;
3374 } else
3375 return false;
3376 }
3377
3378 void handleInvariantGroup(IntrinsicInst &I) {
3379 setShadow(&I, getShadow(&I, 0));
3380 setOrigin(&I, getOrigin(&I, 0));
3381 }
3382
3383 void handleLifetimeStart(IntrinsicInst &I) {
3384 if (!PoisonStack)
3385 return;
3386 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3387 if (AI)
3388 LifetimeStartList.push_back(std::make_pair(&I, AI));
3389 }
3390
3391 void handleBswap(IntrinsicInst &I) {
3392 IRBuilder<> IRB(&I);
3393 Value *Op = I.getArgOperand(0);
3394 Type *OpType = Op->getType();
3395 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3396 getShadow(Op)));
3397 setOrigin(&I, getOrigin(Op));
3398 }
3399
3400 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3401 // and a 1. If the input is all zero, it is fully initialized iff
3402 // !is_zero_poison.
3403 //
3404 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3405 // concrete value 0/1, and ? is an uninitialized bit:
3406 // - 0001 0??? is fully initialized
3407 // - 000? ???? is fully uninitialized (*)
3408 // - ???? ???? is fully uninitialized
3409 // - 0000 0000 is fully uninitialized if is_zero_poison,
3410 // fully initialized otherwise
3411 //
3412 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3413 // only need to poison 4 bits.
3414 //
3415 // OutputShadow =
3416 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3417 // || (is_zero_poison && AllZeroSrc)
3418 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3419 IRBuilder<> IRB(&I);
3420 Value *Src = I.getArgOperand(0);
3421 Value *SrcShadow = getShadow(Src);
3422
3423 Value *False = IRB.getInt1(false);
3424 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3425 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3426 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3427 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3428
3429 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3430 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3431
3432 Value *NotAllZeroShadow =
3433 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3434 Value *OutputShadow =
3435 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3436
3437 // If zero poison is requested, mix in with the shadow
3438 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3439 if (!IsZeroPoison->isZeroValue()) {
3440 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3441 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3442 }
3443
3444 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3445
3446 setShadow(&I, OutputShadow);
3447 setOriginForNaryOp(I);
3448 }
3449
3450 /// Handle Arm NEON vector convert intrinsics.
3451 ///
3452 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3453 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64(double)
3454 ///
3455 /// For x86 SSE vector convert intrinsics, see
3456 /// handleSSEVectorConvertIntrinsic().
3457 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I) {
3458 assert(I.arg_size() == 1);
3459
3460 IRBuilder<> IRB(&I);
3461 Value *S0 = getShadow(&I, 0);
3462
3463 /// For scalars:
3464 /// Since they are converting from floating-point to integer, the output is
3465 /// - fully uninitialized if *any* bit of the input is uninitialized
3466 /// - fully initialized if all bits of the input are initialized
3467 /// We apply the same principle on a per-field basis for vectors.
3468 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3469 getShadowTy(&I));
3470 setShadow(&I, OutShadow);
3471 setOriginForNaryOp(I);
3472 }
3473
3474 /// Some instructions have additional zero-elements in the return type
3475 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3476 ///
3477 /// This function will return a vector type with the same number of elements
3478 /// as the input, but same per-element width as the return value e.g.,
3479 /// <8 x i8>.
3480 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3481 assert(isa<FixedVectorType>(getShadowTy(&I)));
3482 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3483
3484 // TODO: generalize beyond 2x?
3485 if (ShadowType->getElementCount() ==
3486 cast<VectorType>(Src->getType())->getElementCount() * 2)
3487 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3488
3489 assert(ShadowType->getElementCount() ==
3490 cast<VectorType>(Src->getType())->getElementCount());
3491
3492 return ShadowType;
3493 }
3494
3495 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3496 /// to match the length of the shadow for the instruction.
3497 /// If scalar types of the vectors are different, it will use the type of the
3498 /// input vector.
3499 /// This is more type-safe than CreateShadowCast().
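///
/// For example, a <4 x i32> shadow extended to match an <8 x i32> instruction
/// shadow is shuffled together with the clean (all-zero) shadow using the
/// mask <0, 1, ..., 7>, so elements 4-7 of the result come from the zero
/// vector and are therefore initialized.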
3500 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3501 IRBuilder<> IRB(&I);
3502 assert(isa<FixedVectorType>(Shadow->getType()));
3503 assert(isa<FixedVectorType>(I.getType()));
3504
3505 Value *FullShadow = getCleanShadow(&I);
3506 unsigned ShadowNumElems =
3507 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3508 unsigned FullShadowNumElems =
3509 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3510
3511 assert((ShadowNumElems == FullShadowNumElems) ||
3512 (ShadowNumElems * 2 == FullShadowNumElems));
3513
3514 if (ShadowNumElems == FullShadowNumElems) {
3515 FullShadow = Shadow;
3516 } else {
3517 // TODO: generalize beyond 2x?
3518 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3519 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3520
3521 // Append zeros
3522 FullShadow =
3523 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3524 }
3525
3526 return FullShadow;
3527 }
3528
3529 /// Handle x86 SSE vector conversion.
3530 ///
3531 /// e.g., single-precision to half-precision conversion:
3532 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3533 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3534 ///
3535 /// floating-point to integer:
3536 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3537 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3538 ///
3539 /// Note: if the output has more elements, they are zero-initialized (and
3540 /// therefore the shadow will also be initialized).
3541 ///
3542 /// This differs from handleSSEVectorConvertIntrinsic() because it
3543 /// propagates uninitialized shadow (instead of checking the shadow).
3544 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3545 bool HasRoundingMode) {
3546 if (HasRoundingMode) {
3547 assert(I.arg_size() == 2);
3548 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3549 assert(RoundingMode->getType()->isIntegerTy());
3550 } else {
3551 assert(I.arg_size() == 1);
3552 }
3553
3554 Value *Src = I.getArgOperand(0);
3555 assert(Src->getType()->isVectorTy());
3556
3557 // The return type might have more elements than the input.
3558 // Temporarily shrink the return type's number of elements.
3559 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3560
3561 IRBuilder<> IRB(&I);
3562 Value *S0 = getShadow(&I, 0);
3563
3564 /// For scalars:
3565 /// Since they are converting to and/or from floating-point, the output is:
3566 /// - fully uninitialized if *any* bit of the input is uninitialized
3567 /// - fully initialized if all bits of the input are initialized
3568 /// We apply the same principle on a per-field basis for vectors.
3569 Value *Shadow =
3570 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3571
3572 // The return type might have more elements than the input.
3573 // Extend the return type back to its original width if necessary.
3574 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3575
3576 setShadow(&I, FullShadow);
3577 setOriginForNaryOp(I);
3578 }
3579
3580 // Instrument x86 SSE vector convert intrinsic.
3581 //
3582 // This function instruments intrinsics like cvtsi2ss:
3583 // %Out = int_xxx_cvtyyy(%ConvertOp)
3584 // or
3585 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3586 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3587 // number of \p Out elements, and (if it has 2 arguments) copies the rest of the
3588 // elements from \p CopyOp.
3589 // In most cases conversion involves floating-point value which may trigger a
3590 // hardware exception when not fully initialized. For this reason we require
3591 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3592 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3593 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3594 // return a fully initialized value.
3595 //
3596 // For Arm NEON vector convert intrinsics, see
3597 // handleNEONVectorConvertIntrinsic().
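//
// For example, for <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float> %a, i32 %b)
// with NumUsedElements == 1: the shadow of %b is checked (reporting if %b is
// not fully initialized), and the result shadow is the shadow of %a with
// element 0 zeroed, since element 0 is produced by the checked conversion and
// elements 1-3 are copied from %a.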
3598 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3599 bool HasRoundingMode = false) {
3600 IRBuilder<> IRB(&I);
3601 Value *CopyOp, *ConvertOp;
3602
3603 assert((!HasRoundingMode ||
3604 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3605 "Invalid rounding mode");
3606
3607 switch (I.arg_size() - HasRoundingMode) {
3608 case 2:
3609 CopyOp = I.getArgOperand(0);
3610 ConvertOp = I.getArgOperand(1);
3611 break;
3612 case 1:
3613 ConvertOp = I.getArgOperand(0);
3614 CopyOp = nullptr;
3615 break;
3616 default:
3617 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3618 }
3619
3620 // The first *NumUsedElements* elements of ConvertOp are converted to the
3621 // same number of output elements. The rest of the output is copied from
3622 // CopyOp, or (if not available) filled with zeroes.
3623 // Combine shadow for elements of ConvertOp that are used in this operation,
3624 // and insert a check.
3625 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3626 // int->any conversion.
3627 Value *ConvertShadow = getShadow(ConvertOp);
3628 Value *AggShadow = nullptr;
3629 if (ConvertOp->getType()->isVectorTy()) {
3630 AggShadow = IRB.CreateExtractElement(
3631 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3632 for (int i = 1; i < NumUsedElements; ++i) {
3633 Value *MoreShadow = IRB.CreateExtractElement(
3634 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3635 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3636 }
3637 } else {
3638 AggShadow = ConvertShadow;
3639 }
3640 assert(AggShadow->getType()->isIntegerTy());
3641 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3642
3643 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3644 // ConvertOp.
3645 if (CopyOp) {
3646 assert(CopyOp->getType() == I.getType());
3647 assert(CopyOp->getType()->isVectorTy());
3648 Value *ResultShadow = getShadow(CopyOp);
3649 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3650 for (int i = 0; i < NumUsedElements; ++i) {
3651 ResultShadow = IRB.CreateInsertElement(
3652 ResultShadow, ConstantInt::getNullValue(EltTy),
3653 ConstantInt::get(IRB.getInt32Ty(), i));
3654 }
3655 setShadow(&I, ResultShadow);
3656 setOrigin(&I, getOrigin(CopyOp));
3657 } else {
3658 setShadow(&I, getCleanShadow(&I));
3659 setOrigin(&I, getCleanOrigin());
3660 }
3661 }
3662
3663 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3664 // zeroes if it is zero, and all ones otherwise.
3665 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3666 if (S->getType()->isVectorTy())
3667 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3668 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3669 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3670 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3671 }
3672
3673 // Given a vector, extract its first element, and return all
3674 // zeroes if it is zero, and all ones otherwise.
3675 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3676 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3677 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3678 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3679 }
3680
3681 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3682 Type *T = S->getType();
3683 assert(T->isVectorTy());
3684 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3685 return IRB.CreateSExt(S2, T);
3686 }
3687
3688 // Instrument vector shift intrinsic.
3689 //
3690 // This function instruments intrinsics like int_x86_avx2_psll_w.
3691 // Intrinsic shifts %In by %ShiftSize bits.
3692 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3693 // size, and the rest is ignored. Behavior is defined even if shift size is
3694 // greater than register (or field) width.
3695 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3696 assert(I.arg_size() == 2);
3697 IRBuilder<> IRB(&I);
3698 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3699 // Otherwise perform the same shift on S1.
3700 Value *S1 = getShadow(&I, 0);
3701 Value *S2 = getShadow(&I, 1);
3702 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3703 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3704 Value *V1 = I.getOperand(0);
3705 Value *V2 = I.getOperand(1);
3706 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3707 {IRB.CreateBitCast(S1, V1->getType()), V2});
3708 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3709 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3710 setOriginForNaryOp(I);
3711 }
3712
3713 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3714 // vectors.
3715 Type *getMMXVectorTy(unsigned EltSizeInBits,
3716 unsigned X86_MMXSizeInBits = 64) {
3717 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3718 "Illegal MMX vector element size");
3719 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3720 X86_MMXSizeInBits / EltSizeInBits);
3721 }
3722
3723 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3724 // intrinsic.
3725 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3726 switch (id) {
3727 case Intrinsic::x86_sse2_packsswb_128:
3728 case Intrinsic::x86_sse2_packuswb_128:
3729 return Intrinsic::x86_sse2_packsswb_128;
3730
3731 case Intrinsic::x86_sse2_packssdw_128:
3732 case Intrinsic::x86_sse41_packusdw:
3733 return Intrinsic::x86_sse2_packssdw_128;
3734
3735 case Intrinsic::x86_avx2_packsswb:
3736 case Intrinsic::x86_avx2_packuswb:
3737 return Intrinsic::x86_avx2_packsswb;
3738
3739 case Intrinsic::x86_avx2_packssdw:
3740 case Intrinsic::x86_avx2_packusdw:
3741 return Intrinsic::x86_avx2_packssdw;
3742
3743 case Intrinsic::x86_mmx_packsswb:
3744 case Intrinsic::x86_mmx_packuswb:
3745 return Intrinsic::x86_mmx_packsswb;
3746
3747 case Intrinsic::x86_mmx_packssdw:
3748 return Intrinsic::x86_mmx_packssdw;
3749
3750 case Intrinsic::x86_avx512_packssdw_512:
3751 case Intrinsic::x86_avx512_packusdw_512:
3752 return Intrinsic::x86_avx512_packssdw_512;
3753
3754 case Intrinsic::x86_avx512_packsswb_512:
3755 case Intrinsic::x86_avx512_packuswb_512:
3756 return Intrinsic::x86_avx512_packsswb_512;
3757
3758 default:
3759 llvm_unreachable("unexpected intrinsic id");
3760 }
3761 }
3762
3763 // Instrument vector pack intrinsic.
3764 //
3765 // This function instruments intrinsics like x86_mmx_packsswb, that
3766 // packs elements of 2 input vectors into half as many bits with saturation.
3767 // Shadow is propagated with the signed variant of the same intrinsic applied
3768 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3769 // MMXEltSizeInBits is used only for x86mmx arguments.
3770 //
3771 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
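//
// For example, @llvm.x86.sse2.packsswb.128 packs two <8 x i16> inputs into one
// <16 x i8>. Each shadow element is first widened to all-ones (0xFFFF) if any
// of its bits is poisoned; signed saturation then maps 0xFFFF (-1) to 0xFF and
// 0 to 0x00, so a poisoned source element yields a fully poisoned packed byte
// and a clean element stays clean. (An unsigned pack would saturate -1 to 0
// and lose the poison, hence the signed variant.)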
3772 void handleVectorPackIntrinsic(IntrinsicInst &I,
3773 unsigned MMXEltSizeInBits = 0) {
3774 assert(I.arg_size() == 2);
3775 IRBuilder<> IRB(&I);
3776 Value *S1 = getShadow(&I, 0);
3777 Value *S2 = getShadow(&I, 1);
3778 assert(S1->getType()->isVectorTy());
3779
3780 // SExt and ICmpNE below must apply to individual elements of input vectors.
3781 // In case of x86mmx arguments, cast them to appropriate vector types and
3782 // back.
3783 Type *T =
3784 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3785 if (MMXEltSizeInBits) {
3786 S1 = IRB.CreateBitCast(S1, T);
3787 S2 = IRB.CreateBitCast(S2, T);
3788 }
3789 Value *S1_ext =
3791 Value *S2_ext =
3793 if (MMXEltSizeInBits) {
3794 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3795 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3796 }
3797
3798 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3799 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3800 "_msprop_vector_pack");
3801 if (MMXEltSizeInBits)
3802 S = IRB.CreateBitCast(S, getShadowTy(&I));
3803 setShadow(&I, S);
3804 setOriginForNaryOp(I);
3805 }
3806
3807 // Convert `Mask` into `<n x i1>`.
3808 Constant *createDppMask(unsigned Width, unsigned Mask) {
3809 SmallVector<Constant *, 4> R(Width);
3810 for (auto &M : R) {
3811 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3812 Mask >>= 1;
3813 }
3814 return ConstantVector::get(R);
3815 }
3816
3817 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3818 // arg is poisoned, entire dot product is poisoned.
3819 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3820 unsigned DstMask) {
3821 const unsigned Width =
3822 cast<FixedVectorType>(S->getType())->getNumElements();
3823
3824 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3825 Constant::getNullValue(S->getType()));
3826 Value *SElem = IRB.CreateOrReduce(S);
3827 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3828 Value *DstMaskV = createDppMask(Width, DstMask);
3829
3830 return IRB.CreateSelect(
3831 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3832 }
3833
3834 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3835 //
3836 // The 2- and 4-element versions produce a single scalar dot product and then
3837 // put it into the elements of the output vector selected by the 4 lowest bits
3838 // of the mask. The top 4 bits of the mask control which elements of the input
3839 // to use for the dot product.
3840 //
3841 // The 8-element version's mask still has only 4 bits for the input and 4 bits
3842 // for the output mask. According to the spec it operates as the 4-element
3843 // version on the first 4 elements of inputs and output, and then on the last
3844 // 4 elements of inputs and output.
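//
// For example, with the 4-element version and mask 0x31: SrcMask = 0x3 selects
// input elements 0 and 1 for the dot product, and DstMask = 0x1 writes the
// scalar result to output element 0 (the remaining elements are zeroed and
// therefore initialized). If any bit of the selected input shadows is
// poisoned, the elements selected by DstMask become fully poisoned; otherwise
// they are clean.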
3845 void handleDppIntrinsic(IntrinsicInst &I) {
3846 IRBuilder<> IRB(&I);
3847
3848 Value *S0 = getShadow(&I, 0);
3849 Value *S1 = getShadow(&I, 1);
3850 Value *S = IRB.CreateOr(S0, S1);
3851
3852 const unsigned Width =
3853 cast<FixedVectorType>(S->getType())->getNumElements();
3854 assert(Width == 2 || Width == 4 || Width == 8);
3855
3856 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3857 const unsigned SrcMask = Mask >> 4;
3858 const unsigned DstMask = Mask & 0xf;
3859
3860 // Calculate shadow as `<n x i1>`.
3861 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3862 if (Width == 8) {
3863 // First 4 elements of shadow are already calculated. `makeDppShadow`
3864 // operates on 32-bit masks, so we can just shift the masks and repeat.
3865 SI1 = IRB.CreateOr(
3866 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3867 }
3868 // Extend to real size of shadow, poisoning either all or none bits of an
3869 // element.
3870 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3871
3872 setShadow(&I, S);
3873 setOriginForNaryOp(I);
3874 }
3875
3876 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3877 C = CreateAppToShadowCast(IRB, C);
3878 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3879 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3880 C = IRB.CreateAShr(C, ElSize - 1);
3881 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3882 return IRB.CreateTrunc(C, FVT);
3883 }
3884
3885 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3886 void handleBlendvIntrinsic(IntrinsicInst &I) {
3887 Value *C = I.getOperand(2);
3888 Value *T = I.getOperand(1);
3889 Value *F = I.getOperand(0);
3890
3891 Value *Sc = getShadow(&I, 2);
3892 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3893
3894 {
3895 IRBuilder<> IRB(&I);
3896 // Extract top bit from condition and its shadow.
3897 C = convertBlendvToSelectMask(IRB, C);
3898 Sc = convertBlendvToSelectMask(IRB, Sc);
3899
3900 setShadow(C, Sc);
3901 setOrigin(C, Oc);
3902 }
3903
3904 handleSelectLikeInst(I, C, T, F);
3905 }
3906
3907 // Instrument sum-of-absolute-differences intrinsic.
3908 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3909 const unsigned SignificantBitsPerResultElement = 16;
3910 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3911 unsigned ZeroBitsPerResultElement =
3912 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3913
3914 IRBuilder<> IRB(&I);
3915 auto *Shadow0 = getShadow(&I, 0);
3916 auto *Shadow1 = getShadow(&I, 1);
3917 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3918 S = IRB.CreateBitCast(S, ResTy);
3919 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3920 ResTy);
3921 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3922 S = IRB.CreateBitCast(S, getShadowTy(&I));
3923 setShadow(&I, S);
3924 setOriginForNaryOp(I);
3925 }
3926
3927 // Instrument multiply-add(-accumulate)? intrinsics.
3928 //
3929 // e.g., Two operands:
3930 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3931 //
3932 // Two operands which require an EltSizeInBits override:
3933 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3934 //
3935 // Three operands:
3936 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3937 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3938 // (this is equivalent to multiply-add on %a and %b, followed by
3939 // adding/"accumulating" %s. "Accumulation" stores the result in one
3940 // of the source registers, but this accumulate vs. add distinction
3941 // is lost when dealing with LLVM intrinsics.)
3942 //
3943 // ZeroPurifies means that multiplying a known-zero with an uninitialized
3944 // value results in an initialized value. This is applicable for integer
3945 // multiplication, but not floating-point (counter-example: NaN).
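//
// For example, for <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
// each output element i is a[2*i]*b[2*i] + a[2*i+1]*b[2*i+1]. With
// ZeroPurifies, if a[2*i] is uninitialized but b[2*i] is an initialized zero,
// that product is an initialized zero, so output element i is poisoned only if
// the other product of the pair is poisoned.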
3946 void handleVectorPmaddIntrinsic(IntrinsicInst &I, unsigned ReductionFactor,
3947 bool ZeroPurifies,
3948 unsigned EltSizeInBits = 0) {
3949 IRBuilder<> IRB(&I);
3950
3951 [[maybe_unused]] FixedVectorType *ReturnType =
3952 cast<FixedVectorType>(I.getType());
3953 assert(isa<FixedVectorType>(ReturnType));
3954
3955 // Vectors A and B, and shadows
3956 Value *Va = nullptr;
3957 Value *Vb = nullptr;
3958 Value *Sa = nullptr;
3959 Value *Sb = nullptr;
3960
3961 assert(I.arg_size() == 2 || I.arg_size() == 3);
3962 if (I.arg_size() == 2) {
3963 Va = I.getOperand(0);
3964 Vb = I.getOperand(1);
3965
3966 Sa = getShadow(&I, 0);
3967 Sb = getShadow(&I, 1);
3968 } else if (I.arg_size() == 3) {
3969 // Operand 0 is the accumulator. We will deal with that below.
3970 Va = I.getOperand(1);
3971 Vb = I.getOperand(2);
3972
3973 Sa = getShadow(&I, 1);
3974 Sb = getShadow(&I, 2);
3975 }
3976
3977 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
3978 assert(ParamType == Vb->getType());
3979
3980 assert(ParamType->getPrimitiveSizeInBits() ==
3981 ReturnType->getPrimitiveSizeInBits());
3982
3983 if (I.arg_size() == 3) {
3984 [[maybe_unused]] auto *AccumulatorType =
3985 cast<FixedVectorType>(I.getOperand(0)->getType());
3986 assert(AccumulatorType == ReturnType);
3987 }
3988
3989 FixedVectorType *ImplicitReturnType =
3990 cast<FixedVectorType>(getShadowTy(ReturnType));
3991 // Step 1: instrument multiplication of corresponding vector elements
3992 if (EltSizeInBits) {
3993 ImplicitReturnType = cast<FixedVectorType>(
3994 getMMXVectorTy(EltSizeInBits * ReductionFactor,
3995 ParamType->getPrimitiveSizeInBits()));
3996 ParamType = cast<FixedVectorType>(
3997 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
3998
3999 Va = IRB.CreateBitCast(Va, ParamType);
4000 Vb = IRB.CreateBitCast(Vb, ParamType);
4001
4002 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
4003 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
4004 } else {
4005 assert(ParamType->getNumElements() ==
4006 ReturnType->getNumElements() * ReductionFactor);
4007 }
4008
4009 // Each element of the vector is represented by a single bit (poisoned or
4010 // not) e.g., <8 x i1>.
4011 Value *SaNonZero = IRB.CreateIsNotNull(Sa);
4012 Value *SbNonZero = IRB.CreateIsNotNull(Sb);
4013 Value *And;
4014 if (ZeroPurifies) {
4015 // Multiplying an *initialized* zero by an uninitialized element results
4016 // in an initialized zero element.
4017 //
4018 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4019 // results in an unpoisoned value.
4020 Value *VaInt = Va;
4021 Value *VbInt = Vb;
4022 if (!Va->getType()->isIntegerTy()) {
4023 VaInt = CreateAppToShadowCast(IRB, Va);
4024 VbInt = CreateAppToShadowCast(IRB, Vb);
4025 }
4026
4027 // We check for non-zero on a per-element basis, not per-bit.
4028 Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
4029 Value *VbNonZero = IRB.CreateIsNotNull(VbInt);
4030
4031 And = handleBitwiseAnd(IRB, VaNonZero, VbNonZero, SaNonZero, SbNonZero);
4032 } else {
4033 And = IRB.CreateOr({SaNonZero, SbNonZero});
4034 }
4035
4036 // Extend <8 x i1> to <8 x i16>.
4037 // (The real pmadd intrinsic would have computed intermediate values of
4038 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4039 // consider each element to be either fully initialized or fully
4040 // uninitialized.)
4041 And = IRB.CreateSExt(And, Sa->getType());
4042
4043 // Step 2: instrument horizontal add
4044 // We don't need bit-precise horizontalReduce because we only want to check
4045 // if each pair/quad of elements is fully zero.
4046 // Cast to <4 x i32>.
4047 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4048
4049 // Compute <4 x i1>, then extend back to <4 x i32>.
4050 Value *OutShadow = IRB.CreateSExt(
4051 IRB.CreateICmpNE(Horizontal,
4052 Constant::getNullValue(Horizontal->getType())),
4053 ImplicitReturnType);
4054
4055 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4056 // AVX, it is already correct).
4057 if (EltSizeInBits)
4058 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4059
4060 // Step 3 (if applicable): instrument accumulator
4061 if (I.arg_size() == 3)
4062 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4063
4064 setShadow(&I, OutShadow);
4065 setOriginForNaryOp(I);
4066 }
4067
4068 // Instrument compare-packed intrinsic.
4069 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4070 // all-ones shadow.
4071 void handleVectorComparePackedIntrinsic(IntrinsicInst &I) {
4072 IRBuilder<> IRB(&I);
4073 Type *ResTy = getShadowTy(&I);
4074 auto *Shadow0 = getShadow(&I, 0);
4075 auto *Shadow1 = getShadow(&I, 1);
4076 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4077 Value *S = IRB.CreateSExt(
4078 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4079 setShadow(&I, S);
4080 setOriginForNaryOp(I);
4081 }
4082
4083 // Instrument compare-scalar intrinsic.
4084 // This handles both cmp* intrinsics which return the result in the first
4085 // element of a vector, and comi* which return the result as i32.
4086 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4087 IRBuilder<> IRB(&I);
4088 auto *Shadow0 = getShadow(&I, 0);
4089 auto *Shadow1 = getShadow(&I, 1);
4090 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4091 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4092 setShadow(&I, S);
4093 setOriginForNaryOp(I);
4094 }
4095
4096 // Instrument generic vector reduction intrinsics
4097 // by ORing together all their fields.
4098 //
4099 // If AllowShadowCast is true, the return type does not need to be the same
4100 // type as the fields
4101 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4102 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4103 assert(I.arg_size() == 1);
4104
4105 IRBuilder<> IRB(&I);
4106 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4107 if (AllowShadowCast)
4108 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4109 else
4110 assert(S->getType() == getShadowTy(&I));
4111 setShadow(&I, S);
4112 setOriginForNaryOp(I);
4113 }
4114
4115 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4116 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4117 // %a1)
4118 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4119 //
4120 // The type of the return value, initial starting value, and elements of the
4121 // vector must be identical.
4122 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4123 assert(I.arg_size() == 2);
4124
4125 IRBuilder<> IRB(&I);
4126 Value *Shadow0 = getShadow(&I, 0);
4127 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4128 assert(Shadow0->getType() == Shadow1->getType());
4129 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4130 assert(S->getType() == getShadowTy(&I));
4131 setShadow(&I, S);
4132 setOriginForNaryOp(I);
4133 }
4134
4135 // Instrument vector.reduce.or intrinsic.
4136 // Valid (non-poisoned) set bits in the operand pull low the
4137 // corresponding shadow bits.
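// For example, for i8 @llvm.vector.reduce.or.v2i8(<2 x i8>) where element 0 is
// a fully initialized 0000_0001 and element 1 has a poisoned bit 0: the
// reduced bit 0 is known to be 1 thanks to element 0's initialized 1, so its
// shadow bit is pulled clean. A bit position with no initialized 1 in any
// element is poisoned iff it is poisoned in some element.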
4138 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4139 assert(I.arg_size() == 1);
4140
4141 IRBuilder<> IRB(&I);
4142 Value *OperandShadow = getShadow(&I, 0);
4143 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4144 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4145 // Bit N is clean if any field's bit N is 1 and unpoisoned
4146 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4147 // Otherwise, it is clean if every field's bit N is unpoisoned
4148 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4149 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4150
4151 setShadow(&I, S);
4152 setOrigin(&I, getOrigin(&I, 0));
4153 }
4154
4155 // Instrument vector.reduce.and intrinsic.
4156 // Valid (non-poisoned) unset bits in the operand pull down the
4157 // corresponding shadow bits.
4158 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4159 assert(I.arg_size() == 1);
4160
4161 IRBuilder<> IRB(&I);
4162 Value *OperandShadow = getShadow(&I, 0);
4163 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4164 // Bit N is clean if any field's bit N is 0 and unpoisoned
4165 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4166 // Otherwise, it is clean if every field's bit N is unpoisoned
4167 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4168 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4169
4170 setShadow(&I, S);
4171 setOrigin(&I, getOrigin(&I, 0));
4172 }
4173
4174 void handleStmxcsr(IntrinsicInst &I) {
4175 IRBuilder<> IRB(&I);
4176 Value *Addr = I.getArgOperand(0);
4177 Type *Ty = IRB.getInt32Ty();
4178 Value *ShadowPtr =
4179 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4180
4181 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4182
4183 if (ClCheckAccessAddress)
4184 insertCheckShadowOf(Addr, &I);
4185 }
4186
4187 void handleLdmxcsr(IntrinsicInst &I) {
4188 if (!InsertChecks)
4189 return;
4190
4191 IRBuilder<> IRB(&I);
4192 Value *Addr = I.getArgOperand(0);
4193 Type *Ty = IRB.getInt32Ty();
4194 const Align Alignment = Align(1);
4195 Value *ShadowPtr, *OriginPtr;
4196 std::tie(ShadowPtr, OriginPtr) =
4197 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4198
4199 if (ClCheckAccessAddress)
4200 insertCheckShadowOf(Addr, &I);
4201
4202 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4203 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4204 : getCleanOrigin();
4205 insertCheckShadow(Shadow, Origin, &I);
4206 }
4207
4208 void handleMaskedExpandLoad(IntrinsicInst &I) {
4209 IRBuilder<> IRB(&I);
4210 Value *Ptr = I.getArgOperand(0);
4211 MaybeAlign Align = I.getParamAlign(0);
4212 Value *Mask = I.getArgOperand(1);
4213 Value *PassThru = I.getArgOperand(2);
4214
4215 if (ClCheckAccessAddress) {
4216 insertCheckShadowOf(Ptr, &I);
4217 insertCheckShadowOf(Mask, &I);
4218 }
4219
4220 if (!PropagateShadow) {
4221 setShadow(&I, getCleanShadow(&I));
4222 setOrigin(&I, getCleanOrigin());
4223 return;
4224 }
4225
4226 Type *ShadowTy = getShadowTy(&I);
4227 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4228 auto [ShadowPtr, OriginPtr] =
4229 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4230
4231 Value *Shadow =
4232 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4233 getShadow(PassThru), "_msmaskedexpload");
4234
4235 setShadow(&I, Shadow);
4236
4237 // TODO: Store origins.
4238 setOrigin(&I, getCleanOrigin());
4239 }
4240
4241 void handleMaskedCompressStore(IntrinsicInst &I) {
4242 IRBuilder<> IRB(&I);
4243 Value *Values = I.getArgOperand(0);
4244 Value *Ptr = I.getArgOperand(1);
4245 MaybeAlign Align = I.getParamAlign(1);
4246 Value *Mask = I.getArgOperand(2);
4247
4248 if (ClCheckAccessAddress) {
4249 insertCheckShadowOf(Ptr, &I);
4250 insertCheckShadowOf(Mask, &I);
4251 }
4252
4253 Value *Shadow = getShadow(Values);
4254 Type *ElementShadowTy =
4255 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4256 auto [ShadowPtr, OriginPtrs] =
4257 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4258
4259 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4260
4261 // TODO: Store origins.
4262 }
4263
4264 void handleMaskedGather(IntrinsicInst &I) {
4265 IRBuilder<> IRB(&I);
4266 Value *Ptrs = I.getArgOperand(0);
4267 const Align Alignment = I.getParamAlign(0).valueOrOne();
4268 Value *Mask = I.getArgOperand(1);
4269 Value *PassThru = I.getArgOperand(2);
4270
4271 Type *PtrsShadowTy = getShadowTy(Ptrs);
4272 if (ClCheckAccessAddress) {
4273 insertCheckShadowOf(Mask, &I);
4274 Value *MaskedPtrShadow = IRB.CreateSelect(
4275 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4276 "_msmaskedptrs");
4277 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4278 }
4279
4280 if (!PropagateShadow) {
4281 setShadow(&I, getCleanShadow(&I));
4282 setOrigin(&I, getCleanOrigin());
4283 return;
4284 }
4285
4286 Type *ShadowTy = getShadowTy(&I);
4287 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4288 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4289 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4290
4291 Value *Shadow =
4292 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4293 getShadow(PassThru), "_msmaskedgather");
4294
4295 setShadow(&I, Shadow);
4296
4297 // TODO: Store origins.
4298 setOrigin(&I, getCleanOrigin());
4299 }
4300
4301 void handleMaskedScatter(IntrinsicInst &I) {
4302 IRBuilder<> IRB(&I);
4303 Value *Values = I.getArgOperand(0);
4304 Value *Ptrs = I.getArgOperand(1);
4305 const Align Alignment = I.getParamAlign(1).valueOrOne();
4306 Value *Mask = I.getArgOperand(2);
4307
4308 Type *PtrsShadowTy = getShadowTy(Ptrs);
4309 if (ClCheckAccessAddress) {
4310 insertCheckShadowOf(Mask, &I);
4311 Value *MaskedPtrShadow = IRB.CreateSelect(
4312 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4313 "_msmaskedptrs");
4314 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4315 }
4316
4317 Value *Shadow = getShadow(Values);
4318 Type *ElementShadowTy =
4319 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4320 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4321 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4322
4323 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4324
4325 // TODO: Store origin.
4326 }
4327
4328 // Intrinsic::masked_store
4329 //
4330 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4331 // stores are lowered to Intrinsic::masked_store.
4332 void handleMaskedStore(IntrinsicInst &I) {
4333 IRBuilder<> IRB(&I);
4334 Value *V = I.getArgOperand(0);
4335 Value *Ptr = I.getArgOperand(1);
4336 const Align Alignment = I.getParamAlign(1).valueOrOne();
4337 Value *Mask = I.getArgOperand(2);
4338 Value *Shadow = getShadow(V);
4339
4340 if (ClCheckAccessAddress) {
4341 insertCheckShadowOf(Ptr, &I);
4342 insertCheckShadowOf(Mask, &I);
4343 }
4344
4345 Value *ShadowPtr;
4346 Value *OriginPtr;
4347 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4348 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4349
4350 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4351
4352 if (!MS.TrackOrigins)
4353 return;
4354
4355 auto &DL = F.getDataLayout();
4356 paintOrigin(IRB, getOrigin(V), OriginPtr,
4357 DL.getTypeStoreSize(Shadow->getType()),
4358 std::max(Alignment, kMinOriginAlignment));
4359 }
4360
4361 // Intrinsic::masked_load
4362 //
4363 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4364 // loads are lowered to Intrinsic::masked_load.
4365 void handleMaskedLoad(IntrinsicInst &I) {
4366 IRBuilder<> IRB(&I);
4367 Value *Ptr = I.getArgOperand(0);
4368 const Align Alignment = I.getParamAlign(0).valueOrOne();
4369 Value *Mask = I.getArgOperand(1);
4370 Value *PassThru = I.getArgOperand(2);
4371
4372 if (ClCheckAccessAddress) {
4373 insertCheckShadowOf(Ptr, &I);
4374 insertCheckShadowOf(Mask, &I);
4375 }
4376
4377 if (!PropagateShadow) {
4378 setShadow(&I, getCleanShadow(&I));
4379 setOrigin(&I, getCleanOrigin());
4380 return;
4381 }
4382
4383 Type *ShadowTy = getShadowTy(&I);
4384 Value *ShadowPtr, *OriginPtr;
4385 std::tie(ShadowPtr, OriginPtr) =
4386 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4387 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4388 getShadow(PassThru), "_msmaskedld"));
4389
4390 if (!MS.TrackOrigins)
4391 return;
4392
4393 // Choose between PassThru's and the loaded value's origins.
4394 Value *MaskedPassThruShadow = IRB.CreateAnd(
4395 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4396
4397 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4398
4399 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4400 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4401
4402 setOrigin(&I, Origin);
4403 }
4404
4405 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4406 // dst mask src
4407 //
4408 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4409 // by handleMaskedStore.
4410 //
4411 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4412 // vector of integers, unlike the LLVM masked intrinsics, which require a
4413 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4414 // mentions that the x86 backend does not know how to efficiently convert
4415 // from a vector of booleans back into the AVX mask format; therefore, they
4416 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4417 // intrinsics.
4418 void handleAVXMaskedStore(IntrinsicInst &I) {
4419 assert(I.arg_size() == 3);
4420
4421 IRBuilder<> IRB(&I);
4422
4423 Value *Dst = I.getArgOperand(0);
4424 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4425
4426 Value *Mask = I.getArgOperand(1);
4427 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4428
4429 Value *Src = I.getArgOperand(2);
4430 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4431
4432 const Align Alignment = Align(1);
4433
4434 Value *SrcShadow = getShadow(Src);
4435
4436 if (ClCheckAccessAddress) {
4437 insertCheckShadowOf(Dst, &I);
4438 insertCheckShadowOf(Mask, &I);
4439 }
4440
4441 Value *DstShadowPtr;
4442 Value *DstOriginPtr;
4443 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4444 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4445
4446 SmallVector<Value *, 2> ShadowArgs;
4447 ShadowArgs.append(1, DstShadowPtr);
4448 ShadowArgs.append(1, Mask);
4449 // The intrinsic may require floating-point but shadows can be arbitrary
4450 // bit patterns, of which some would be interpreted as "invalid"
4451 // floating-point values (NaN etc.); we assume the intrinsic will happily
4452 // copy them.
4453 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4454
4455 CallInst *CI =
4456 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4457 setShadow(&I, CI);
4458
4459 if (!MS.TrackOrigins)
4460 return;
4461
4462 // Approximation only
4463 auto &DL = F.getDataLayout();
4464 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4465 DL.getTypeStoreSize(SrcShadow->getType()),
4466 std::max(Alignment, kMinOriginAlignment));
4467 }
4468
4469 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4470 // return src mask
4471 //
4472 // Masked-off values are replaced with 0, which conveniently also represents
4473 // initialized memory.
4474 //
4475 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4476 // by handleMaskedLoad.
4477 //
4478 // We do not combine this with handleMaskedLoad; see comment in
4479 // handleAVXMaskedStore for the rationale.
4480 //
4481 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4482 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4483 // parameter.
4484 void handleAVXMaskedLoad(IntrinsicInst &I) {
4485 assert(I.arg_size() == 2);
4486
4487 IRBuilder<> IRB(&I);
4488
4489 Value *Src = I.getArgOperand(0);
4490 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4491
4492 Value *Mask = I.getArgOperand(1);
4493 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4494
4495 const Align Alignment = Align(1);
4496
4497 if (ClCheckAccessAddress) {
4498 insertCheckShadowOf(Mask, &I);
4499 }
4500
4501 Type *SrcShadowTy = getShadowTy(Src);
4502 Value *SrcShadowPtr, *SrcOriginPtr;
4503 std::tie(SrcShadowPtr, SrcOriginPtr) =
4504 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4505
4506 SmallVector<Value *, 2> ShadowArgs;
4507 ShadowArgs.append(1, SrcShadowPtr);
4508 ShadowArgs.append(1, Mask);
4509
4510 CallInst *CI =
4511 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4512 // The AVX masked load intrinsics do not have integer variants. We use the
4513 // floating-point variants, which will happily copy the shadows even if
4514 // they are interpreted as "invalid" floating-point values (NaN etc.).
4515 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4516
4517 if (!MS.TrackOrigins)
4518 return;
4519
4520 // The "pass-through" value is always zero (initialized). To the extent
4521 // that that results in initialized aligned 4-byte chunks, the origin value
4522 // is ignored. It is therefore correct to simply copy the origin from src.
4523 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4524 setOrigin(&I, PtrSrcOrigin);
4525 }
4526
4527 // Test whether the mask indices are initialized, only checking the bits that
4528 // are actually used.
4529 //
4530 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4531 // used/checked.
4532 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4533 assert(isFixedIntVector(Idx));
4534 auto IdxVectorSize =
4535 cast<FixedVectorType>(Idx->getType())->getNumElements();
4536 assert(isPowerOf2_64(IdxVectorSize));
4537
4538 // Compiler isn't smart enough, let's help it
4539 if (isa<Constant>(Idx))
4540 return;
4541
4542 auto *IdxShadow = getShadow(Idx);
4543 Value *Truncated = IRB.CreateTrunc(
4544 IdxShadow,
4545 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4546 IdxVectorSize));
4547 insertCheckShadow(Truncated, getOrigin(Idx), I);
4548 }
4549
4550 // Instrument AVX permutation intrinsic.
4551 // We apply the same permutation (argument index 1) to the shadow.
4552 void handleAVXVpermilvar(IntrinsicInst &I) {
4553 IRBuilder<> IRB(&I);
4554 Value *Shadow = getShadow(&I, 0);
4555 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4556
4557 // Shadows are integer-ish types but some intrinsics require a
4558 // different (e.g., floating-point) type.
4559 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4560 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4561 {Shadow, I.getArgOperand(1)});
4562
4563 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4564 setOriginForNaryOp(I);
4565 }
4566
4567 // Instrument AVX permutation intrinsic.
4568 // We apply the same permutation (argument index 1) to the shadows.
4569 void handleAVXVpermi2var(IntrinsicInst &I) {
4570 assert(I.arg_size() == 3);
4571 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4572 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4573 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4574 [[maybe_unused]] auto ArgVectorSize =
4575 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4576 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4577 ->getNumElements() == ArgVectorSize);
4578 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4579 ->getNumElements() == ArgVectorSize);
4580 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4581 assert(I.getType() == I.getArgOperand(0)->getType());
4582 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4583 IRBuilder<> IRB(&I);
4584 Value *AShadow = getShadow(&I, 0);
4585 Value *Idx = I.getArgOperand(1);
4586 Value *BShadow = getShadow(&I, 2);
4587
4588 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4589
4590 // Shadows are integer-ish types but some intrinsics require a
4591 // different (e.g., floating-point) type.
4592 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4593 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4594 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4595 {AShadow, Idx, BShadow});
4596 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4597 setOriginForNaryOp(I);
4598 }
4599
4600 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4601 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4602 }
4603
4604 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4605 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4606 }
4607
4608 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4609 return isFixedIntVectorTy(V->getType());
4610 }
4611
4612 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4613 return isFixedFPVectorTy(V->getType());
4614 }
4615
4616 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4617 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4618 // i32 rounding)
4619 //
4620 // Inconveniently, some similar intrinsics have a different operand order:
4621 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4622 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4623 // i16 mask)
4624 //
4625 // If the return type has more elements than A, the excess elements are
4626 // zeroed (and the corresponding shadow is initialized).
4627 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4628 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4629 // i8 mask)
4630 //
4631 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4632 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4633 // where all_or_nothing(x) is fully uninitialized if x has any
4634 // uninitialized bits
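// Illustrative sketch (not part of the original source) of the resulting
// per-element shadow for, e.g., @llvm.x86.avx512.mask.cvtps2dq.512:
//   a_shadow[i] != 0, mask[i] == 1  ->  dst_shadow[i] = 0xFFFFFFFF
//   a_shadow[i] == 0, mask[i] == 1  ->  dst_shadow[i] = 0
//   mask[i] == 0                    ->  dst_shadow[i] = writethru_shadow[i]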
4635 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4636 IRBuilder<> IRB(&I);
4637
4638 assert(I.arg_size() == 4);
4639 Value *A = I.getOperand(0);
4640 Value *WriteThrough;
4641 Value *Mask;
4642 Value *RoundingMode;
4643 if (LastMask) {
4644 WriteThrough = I.getOperand(2);
4645 Mask = I.getOperand(3);
4646 RoundingMode = I.getOperand(1);
4647 } else {
4648 WriteThrough = I.getOperand(1);
4649 Mask = I.getOperand(2);
4650 RoundingMode = I.getOperand(3);
4651 }
4652
4653 assert(isFixedFPVector(A));
4654 assert(isFixedIntVector(WriteThrough));
4655
4656 unsigned ANumElements =
4657 cast<FixedVectorType>(A->getType())->getNumElements();
4658 [[maybe_unused]] unsigned WriteThruNumElements =
4659 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4660 assert(ANumElements == WriteThruNumElements ||
4661 ANumElements * 2 == WriteThruNumElements);
4662
4663 assert(Mask->getType()->isIntegerTy());
4664 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4665 assert(ANumElements == MaskNumElements ||
4666 ANumElements * 2 == MaskNumElements);
4667
4668 assert(WriteThruNumElements == MaskNumElements);
4669
4670 // Some bits of the mask may be unused, though it's unusual to have partly
4671 // uninitialized bits.
4672 insertCheckShadowOf(Mask, &I);
4673
4674 assert(RoundingMode->getType()->isIntegerTy());
4675 // Only some bits of the rounding mode are used, though it's very
4676 // unusual to have uninitialized bits there (more commonly, it's a
4677 // constant).
4678 insertCheckShadowOf(RoundingMode, &I);
4679
4680 assert(I.getType() == WriteThrough->getType());
4681
4682 Value *AShadow = getShadow(A);
4683 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4684
4685 if (ANumElements * 2 == MaskNumElements) {
4686 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4687 // from the zeroed shadow instead of the writethrough's shadow.
4688 Mask =
4689 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4690 Mask =
4691 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4692 }
4693
4694 // Convert i16 mask to <16 x i1>
4695 Mask = IRB.CreateBitCast(
4696 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4697 "_ms_mask_bitcast");
4698
4699 /// For floating-point to integer conversion, the output is:
4700 /// - fully uninitialized if *any* bit of the input is uninitialized
4701 /// - fully initialized if all bits of the input are initialized
4702 /// We apply the same principle on a per-element basis for vectors.
4703 ///
4704 /// We use the scalar width of the return type instead of A's.
4705 AShadow = IRB.CreateSExt(
4706 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4707 getShadowTy(&I), "_ms_a_shadow");
4708
4709 Value *WriteThroughShadow = getShadow(WriteThrough);
4710 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4711 "_ms_writethru_select");
4712
4713 setShadow(&I, Shadow);
4714 setOriginForNaryOp(I);
4715 }
4716
4717 // Instrument BMI / BMI2 intrinsics.
4718 // All of these intrinsics are Z = I(X, Y)
4719 // where the types of all operands and the result match, and are either i32 or
4720 // i64. The following instrumentation happens to work for all of them:
4721 // Sz = I(Sx, Y) | (sext (Sy != 0))
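// For example (an illustrative sketch, not emitted verbatim), for
//   %z = call i32 @llvm.x86.bmi.pext.32(i32 %x, i32 %y)
// the instrumentation below computes roughly:
//   %sy_any = icmp ne i32 %sy, 0
//   %sy_all = sext i1 %sy_any to i32
//   %sz0    = call i32 @llvm.x86.bmi.pext.32(i32 %sx, i32 %y)
//   %sz     = or i32 %sy_all, %sz0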
4722 void handleBmiIntrinsic(IntrinsicInst &I) {
4723 IRBuilder<> IRB(&I);
4724 Type *ShadowTy = getShadowTy(&I);
4725
4726 // If any bit of the mask operand is poisoned, then the whole thing is.
4727 Value *SMask = getShadow(&I, 1);
4728 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4729 ShadowTy);
4730 // Apply the same intrinsic to the shadow of the first operand.
4731 Value *S = IRB.CreateCall(I.getCalledFunction(),
4732 {getShadow(&I, 0), I.getOperand(1)});
4733 S = IRB.CreateOr(SMask, S);
4734 setShadow(&I, S);
4735 setOriginForNaryOp(I);
4736 }
4737
4738 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4739 SmallVector<int, 8> Mask;
4740 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4741 Mask.append(2, X);
4742 }
4743 return Mask;
4744 }
4745
4746 // Instrument pclmul intrinsics.
4747 // These intrinsics operate either on odd or on even elements of the input
4748 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4749 // Replace the unused elements with copies of the used ones, ex:
4750 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4751 // or
4752 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4753 // and then apply the usual shadow combining logic.
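// For example (illustrative; assumes Imm = 0, which selects the even element
// of both operands), for
//   <2 x i64> @llvm.x86.pclmulqdq(<2 x i64> %a, <2 x i64> %b, i8 0)
// the shadows are rearranged as
//   %sa2 = shufflevector <2 x i64> %sa, <2 x i64> poison, <2 x i32> <i32 0, i32 0>
//   %sb2 = shufflevector <2 x i64> %sb, <2 x i64> poison, <2 x i32> <i32 0, i32 0>
// before the usual OR-based shadow combination.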
4754 void handlePclmulIntrinsic(IntrinsicInst &I) {
4755 IRBuilder<> IRB(&I);
4756 unsigned Width =
4757 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4758 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4759 "pclmul 3rd operand must be a constant");
4760 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4761 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4762 getPclmulMask(Width, Imm & 0x01));
4763 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4764 getPclmulMask(Width, Imm & 0x10));
4765 ShadowAndOriginCombiner SOC(this, IRB);
4766 SOC.Add(Shuf0, getOrigin(&I, 0));
4767 SOC.Add(Shuf1, getOrigin(&I, 1));
4768 SOC.Done(&I);
4769 }
4770
4771 // Instrument _mm_*_sd|ss intrinsics
4772 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4773 IRBuilder<> IRB(&I);
4774 unsigned Width =
4775 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4776 Value *First = getShadow(&I, 0);
4777 Value *Second = getShadow(&I, 1);
4778 // First element of second operand, remaining elements of first operand
4779 SmallVector<int, 16> Mask;
4780 Mask.push_back(Width);
4781 for (unsigned i = 1; i < Width; i++)
4782 Mask.push_back(i);
4783 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4784
4785 setShadow(&I, Shadow);
4786 setOriginForNaryOp(I);
4787 }
4788
4789 void handleVtestIntrinsic(IntrinsicInst &I) {
4790 IRBuilder<> IRB(&I);
4791 Value *Shadow0 = getShadow(&I, 0);
4792 Value *Shadow1 = getShadow(&I, 1);
4793 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4794 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4795 Value *Scalar = convertShadowToScalar(NZ, IRB);
4796 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4797
4798 setShadow(&I, Shadow);
4799 setOriginForNaryOp(I);
4800 }
4801
4802 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4803 IRBuilder<> IRB(&I);
4804 unsigned Width =
4805 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4806 Value *First = getShadow(&I, 0);
4807 Value *Second = getShadow(&I, 1);
4808 Value *OrShadow = IRB.CreateOr(First, Second);
4809 // First element of both OR'd together, remaining elements of first operand
4810 SmallVector<int, 16> Mask;
4811 Mask.push_back(Width);
4812 for (unsigned i = 1; i < Width; i++)
4813 Mask.push_back(i);
4814 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4815
4816 setShadow(&I, Shadow);
4817 setOriginForNaryOp(I);
4818 }
4819
4820 // _mm_round_pd / _mm_round_ps.
4821 // Similar to maybeHandleSimpleNomemIntrinsic except
4822 // the second argument is guaranteed to be a constant integer.
4823 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4824 assert(I.getArgOperand(0)->getType() == I.getType());
4825 assert(I.arg_size() == 2);
4826 assert(isa<ConstantInt>(I.getArgOperand(1)));
4827
4828 IRBuilder<> IRB(&I);
4829 ShadowAndOriginCombiner SC(this, IRB);
4830 SC.Add(I.getArgOperand(0));
4831 SC.Done(&I);
4832 }
4833
4834 // Instrument @llvm.abs intrinsic.
4835 //
4836 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4837 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
4838 void handleAbsIntrinsic(IntrinsicInst &I) {
4839 assert(I.arg_size() == 2);
4840 Value *Src = I.getArgOperand(0);
4841 Value *IsIntMinPoison = I.getArgOperand(1);
4842
4843 assert(I.getType()->isIntOrIntVectorTy());
4844
4845 assert(Src->getType() == I.getType());
4846
4847 assert(IsIntMinPoison->getType()->isIntegerTy());
4848 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4849
4850 IRBuilder<> IRB(&I);
4851 Value *SrcShadow = getShadow(Src);
4852
4853 APInt MinVal =
4854 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4855 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4856 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4857
4858 Value *PoisonedShadow = getPoisonedShadow(Src);
4859 Value *PoisonedIfIntMinShadow =
4860 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4861 Value *Shadow =
4862 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4863
4864 setShadow(&I, Shadow);
4865 setOrigin(&I, getOrigin(&I, 0));
4866 }
4867
4868 void handleIsFpClass(IntrinsicInst &I) {
4869 IRBuilder<> IRB(&I);
4870 Value *Shadow = getShadow(&I, 0);
4871 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4872 setOrigin(&I, getOrigin(&I, 0));
4873 }
4874
4875 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4876 IRBuilder<> IRB(&I);
4877 Value *Shadow0 = getShadow(&I, 0);
4878 Value *Shadow1 = getShadow(&I, 1);
4879 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4880 Value *ShadowElt1 =
4881 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4882
4883 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4884 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4885 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4886
4887 setShadow(&I, Shadow);
4888 setOriginForNaryOp(I);
4889 }
4890
4891 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4892 assert(isa<FixedVectorType>(V->getType()));
4893 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4894 Value *Shadow = getShadow(V);
4895 return IRB.CreateExtractElement(Shadow,
4896 ConstantInt::get(IRB.getInt32Ty(), 0));
4897 }
4898
4899 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4900 //
4901 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4902 // (<8 x i64>, <16 x i8>, i8)
4903 // A WriteThru Mask
4904 //
4905 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4906 // (<16 x i32>, <16 x i8>, i16)
4907 //
4908 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4909 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4910 //
4911 // If Dst has more elements than A, the excess elements are zeroed (and the
4912 // corresponding shadow is initialized).
4913 //
4914 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4915 // and is much faster than this handler.
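// An approximate sketch of the shadow computation (not from the original
// source): for
//   <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64> %a, <16 x i8> %wt, i8 %m)
// each i64 shadow lane of %a is truncated to i8, the 8 excess output lanes
// receive a zero (initialized) shadow, and the widened 16-bit mask selects
// per-lane between the truncated shadow and the shadow of %wt.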
4916 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4917 IRBuilder<> IRB(&I);
4918
4919 assert(I.arg_size() == 3);
4920 Value *A = I.getOperand(0);
4921 Value *WriteThrough = I.getOperand(1);
4922 Value *Mask = I.getOperand(2);
4923
4924 assert(isFixedIntVector(A));
4925 assert(isFixedIntVector(WriteThrough));
4926
4927 unsigned ANumElements =
4928 cast<FixedVectorType>(A->getType())->getNumElements();
4929 unsigned OutputNumElements =
4930 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4931 assert(ANumElements == OutputNumElements ||
4932 ANumElements * 2 == OutputNumElements);
4933
4934 assert(Mask->getType()->isIntegerTy());
4935 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4936 insertCheckShadowOf(Mask, &I);
4937
4938 assert(I.getType() == WriteThrough->getType());
4939
4940 // Widen the mask, if necessary, to have one bit per element of the output
4941 // vector.
4942 // We want the extra bits to have '1's, so that the CreateSelect will
4943 // select the values from AShadow instead of WriteThroughShadow ("maskless"
4944 // versions of the intrinsics are sometimes implemented using an all-1's
4945 // mask and an undefined value for WriteThroughShadow). We accomplish this
4946 // by using bitwise NOT before and after the ZExt.
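// Worked example (illustrative): with ANumElements == 8 and
// OutputNumElements == 16, an i8 mask m becomes
//   not(zext(not m)) == 0xFF00 | zext(m)
// i.e., the 8 extra mask bits are all 1's, so the excess lanes take the
// zero-extended (clean) part of AShadow.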
4947 if (ANumElements != OutputNumElements) {
4948 Mask = IRB.CreateNot(Mask);
4949 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
4950 "_ms_widen_mask");
4951 Mask = IRB.CreateNot(Mask);
4952 }
4953 Mask = IRB.CreateBitCast(
4954 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4955
4956 Value *AShadow = getShadow(A);
4957
4958 // The return type might have more elements than the input.
4959 // Temporarily shrink the return type's number of elements.
4960 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
4961
4962 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
4963 // This handler treats them all as truncation, which leads to some rare
4964 // false positives in the cases where the truncated bytes could
4965 // unambiguously saturate the value e.g., if A = ??????10 ????????
4966 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
4967 // fully defined, but the truncated byte is ????????.
4968 //
4969 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
4970 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
4971 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4972
4973 Value *WriteThroughShadow = getShadow(WriteThrough);
4974
4975 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
4976 setShadow(&I, Shadow);
4977 setOriginForNaryOp(I);
4978 }
4979
4980 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
4981 // values and perform an operation whose shadow propagation should be handled
4982 // as all-or-nothing [*], with masking provided by a vector and a mask
4983 // supplied as an integer.
4984 //
4985 // [*] if all bits of a vector element are initialized, the output is fully
4986 // initialized; otherwise, the output is fully uninitialized
4987 //
4988 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
4989 // (<16 x float>, <16 x float>, i16)
4990 // A WriteThru Mask
4991 //
4992 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
4993 // (<2 x double>, <2 x double>, i8)
4994 //
4995 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
4996 // (<8 x double>, i32, <8 x double>, i8, i32)
4997 // A Imm WriteThru Mask Rounding
4998 //
4999 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
5000 // be fully initialized.
5001 //
5002 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
5003 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
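// A worked example (not from the original source): for
//   <16 x float> @llvm.x86.avx512.rsqrt14.ps.512(<16 x float> %a,
//                                                <16 x float> %wt, i16 %m)
// if bit 3 of %m is set and any bit of a_shadow[3] is set, dst_shadow[3]
// becomes all-ones; if bit 5 of %m is clear, dst_shadow[5] is simply
// wt_shadow[5].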
5004 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
5005 unsigned WriteThruIndex,
5006 unsigned MaskIndex) {
5007 IRBuilder<> IRB(&I);
5008
5009 unsigned NumArgs = I.arg_size();
5010 assert(AIndex < NumArgs);
5011 assert(WriteThruIndex < NumArgs);
5012 assert(MaskIndex < NumArgs);
5013 assert(AIndex != WriteThruIndex);
5014 assert(AIndex != MaskIndex);
5015 assert(WriteThruIndex != MaskIndex);
5016
5017 Value *A = I.getOperand(AIndex);
5018 Value *WriteThru = I.getOperand(WriteThruIndex);
5019 Value *Mask = I.getOperand(MaskIndex);
5020
5021 assert(isFixedFPVector(A));
5022 assert(isFixedFPVector(WriteThru));
5023
5024 [[maybe_unused]] unsigned ANumElements =
5025 cast<FixedVectorType>(A->getType())->getNumElements();
5026 unsigned OutputNumElements =
5027 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
5028 assert(ANumElements == OutputNumElements);
5029
5030 for (unsigned i = 0; i < NumArgs; ++i) {
5031 if (i != AIndex && i != WriteThruIndex) {
5032 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5033 // they be fully initialized.
5034 assert(I.getOperand(i)->getType()->isIntegerTy());
5035 insertCheckShadowOf(I.getOperand(i), &I);
5036 }
5037 }
5038
5039 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5040 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
5041 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
5042 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5043
5044 assert(I.getType() == WriteThru->getType());
5045
5046 Mask = IRB.CreateBitCast(
5047 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5048
5049 Value *AShadow = getShadow(A);
5050
5051 // All-or-nothing shadow
5052 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
5053 AShadow->getType());
5054
5055 Value *WriteThruShadow = getShadow(WriteThru);
5056
5057 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
5058 setShadow(&I, Shadow);
5059
5060 setOriginForNaryOp(I);
5061 }
5062
5063 // For sh.* compiler intrinsics:
5064 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5065 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5066 // A B WriteThru Mask RoundingMode
5067 //
5068 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5069 // DstShadow[1..7] = AShadow[1..7]
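// For example (illustrative): with Mask = 1 (lowest bit set),
//   DstShadow[0] = AShadow[0] | BShadow[0]
// and with Mask = 0,
//   DstShadow[0] = WriteThruShadow[0];
// in both cases the upper lanes take AShadow[1..7] unchanged.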
5070 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5071 IRBuilder<> IRB(&I);
5072
5073 assert(I.arg_size() == 5);
5074 Value *A = I.getOperand(0);
5075 Value *B = I.getOperand(1);
5076 Value *WriteThrough = I.getOperand(2);
5077 Value *Mask = I.getOperand(3);
5078 Value *RoundingMode = I.getOperand(4);
5079
5080 // Technically, we could probably just check whether the LSB is
5081 // initialized, but intuitively it feels like a partly uninitialized mask
5082 // is unintended, and we should warn the user immediately.
5083 insertCheckShadowOf(Mask, &I);
5084 insertCheckShadowOf(RoundingMode, &I);
5085
5086 assert(isa<FixedVectorType>(A->getType()));
5087 unsigned NumElements =
5088 cast<FixedVectorType>(A->getType())->getNumElements();
5089 assert(NumElements == 8);
5090 assert(A->getType() == B->getType());
5091 assert(B->getType() == WriteThrough->getType());
5092 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5093 assert(RoundingMode->getType()->isIntegerTy());
5094
5095 Value *ALowerShadow = extractLowerShadow(IRB, A);
5096 Value *BLowerShadow = extractLowerShadow(IRB, B);
5097
5098 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5099
5100 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5101
5102 Mask = IRB.CreateBitCast(
5103 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5104 Value *MaskLower =
5105 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5106
5107 Value *AShadow = getShadow(A);
5108 Value *DstLowerShadow =
5109 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5110 Value *DstShadow = IRB.CreateInsertElement(
5111 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5112 "_msprop");
5113
5114 setShadow(&I, DstShadow);
5115 setOriginForNaryOp(I);
5116 }
5117
5118 // Approximately handle AVX Galois Field Affine Transformation
5119 //
5120 // e.g.,
5121 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5122 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5123 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5124 // Out A x b
5125 // where A and x are packed matrices, b is a vector,
5126 // Out = A * x + b in GF(2)
5127 //
5128 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5129 // computation also includes a parity calculation.
5130 //
5131 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5132 // Out_Shadow = (V1_Shadow & V2_Shadow)
5133 // | (V1 & V2_Shadow)
5134 // | (V1_Shadow & V2 )
5135 //
5136 // We approximate the shadow of gf2p8affineqb using:
5137 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5138 // | gf2p8affineqb(x, A_shadow, 0)
5139 // | gf2p8affineqb(x_Shadow, A, 0)
5140 // | set1_epi8(b_Shadow)
5141 //
5142 // This approximation has false negatives: if an intermediate dot-product
5143 // contains an even number of 1's, the parity is 0.
5144 // It has no false positives.
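// Worked single-bit sketch (illustrative, not from the original source): for
// one output bit
//   out = parity(A_row & x_col) ^ b_bit
// the approximation above effectively computes
//   out_shadow = parity(A_row & x_col_shadow)
//              | parity(A_row_shadow & x_col)
//              | parity(A_row_shadow & x_col_shadow)
//              | b_bit_shadow
// which misses the case where an even number of uninitialized products
// cancel inside a single parity.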
5145 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5146 IRBuilder<> IRB(&I);
5147
5148 assert(I.arg_size() == 3);
5149 Value *A = I.getOperand(0);
5150 Value *X = I.getOperand(1);
5151 Value *B = I.getOperand(2);
5152
5153 assert(isFixedIntVector(A));
5154 assert(cast<VectorType>(A->getType())
5155 ->getElementType()
5156 ->getScalarSizeInBits() == 8);
5157
5158 assert(A->getType() == X->getType());
5159
5160 assert(B->getType()->isIntegerTy());
5161 assert(B->getType()->getScalarSizeInBits() == 8);
5162
5163 assert(I.getType() == A->getType());
5164
5165 Value *AShadow = getShadow(A);
5166 Value *XShadow = getShadow(X);
5167 Value *BZeroShadow = getCleanShadow(B);
5168
5169 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5170 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5171 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5172 {X, AShadow, BZeroShadow});
5173 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5174 {XShadow, A, BZeroShadow});
5175
5176 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5177 Value *BShadow = getShadow(B);
5178 Value *BBroadcastShadow = getCleanShadow(AShadow);
5179 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5180 // This loop generates a lot of LLVM IR, which we expect CodeGen to
5181 // lower appropriately (e.g., VPBROADCASTB).
5182 // Besides, b is often a constant, in which case it is fully initialized.
5183 for (unsigned i = 0; i < NumElements; i++)
5184 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5185
5186 setShadow(&I, IRB.CreateOr(
5187 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5188 setOriginForNaryOp(I);
5189 }
5190
5191 // Handle Arm NEON vector load intrinsics (vld*).
5192 //
5193 // The WithLane instructions (ld[234]lane) are similar to:
5194 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5195 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5196 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5197 // %A)
5198 //
5199 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5200 // to:
5201 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
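// An approximate sketch of the instrumentation (not emitted verbatim): for
//   %p = call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
// the shadow is loaded with the same intrinsic from the shadow address:
//   %sp = call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %sA)
// where %sA is the shadow pointer computed by getShadowOriginPtr().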
5202 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5203 unsigned int numArgs = I.arg_size();
5204
5205 // Return type is a struct of vectors of integers or floating-point
5206 assert(I.getType()->isStructTy());
5207 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5208 assert(RetTy->getNumElements() > 0);
5209 assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5210        RetTy->getElementType(0)->isFPOrFPVectorTy());
5211 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5212 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5213
5214 if (WithLane) {
5215 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5216 assert(4 <= numArgs && numArgs <= 6);
5217
5218 // Return type is a struct of the input vectors
5219 assert(RetTy->getNumElements() + 2 == numArgs);
5220 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5221 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5222 } else {
5223 assert(numArgs == 1);
5224 }
5225
5226 IRBuilder<> IRB(&I);
5227
5228 SmallVector<Value *, 6> ShadowArgs;
5229 if (WithLane) {
5230 for (unsigned int i = 0; i < numArgs - 2; i++)
5231 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5232
5233 // Lane number, passed verbatim
5234 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5235 ShadowArgs.push_back(LaneNumber);
5236
5237 // TODO: blend shadow of lane number into output shadow?
5238 insertCheckShadowOf(LaneNumber, &I);
5239 }
5240
5241 Value *Src = I.getArgOperand(numArgs - 1);
5242 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5243
5244 Type *SrcShadowTy = getShadowTy(Src);
5245 auto [SrcShadowPtr, SrcOriginPtr] =
5246 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5247 ShadowArgs.push_back(SrcShadowPtr);
5248
5249 // The NEON vector load instructions handled by this function all have
5250 // integer variants. It is easier to use those rather than trying to cast
5251 // a struct of vectors of floats into a struct of vectors of integers.
5252 CallInst *CI =
5253 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5254 setShadow(&I, CI);
5255
5256 if (!MS.TrackOrigins)
5257 return;
5258
5259 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5260 setOrigin(&I, PtrSrcOrigin);
5261 }
5262
5263 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5264 /// and vst{2,3,4}lane).
5265 ///
5266 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5267 /// last argument, with the initial arguments being the inputs (and lane
5268 /// number for vst{2,3,4}lane). They return void.
5269 ///
5270 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5271 /// abcdabcdabcdabcd... into *outP
5272 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5273 /// writes aaaa...bbbb...cccc...dddd... into *outP
5274 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5275 /// These instructions can all be instrumented with essentially the same
5276 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
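/// For example (an illustrative sketch), for
///   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %A, <16 x i8> %B, ptr %P)
/// the handler emits roughly
///   call void @llvm.aarch64.neon.st2.v16i8.p0(<16 x i8> %sA, <16 x i8> %sB, ptr %sP)
/// where %sA/%sB are the shadows of the inputs and %sP is the shadow address
/// of %P.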
5277 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5278 IRBuilder<> IRB(&I);
5279
5280 // Don't use getNumOperands() because it includes the callee
5281 int numArgOperands = I.arg_size();
5282
5283 // The last arg operand is the output (pointer)
5284 assert(numArgOperands >= 1);
5285 Value *Addr = I.getArgOperand(numArgOperands - 1);
5286 assert(Addr->getType()->isPointerTy());
5287 int skipTrailingOperands = 1;
5288
5289 if (ClCheckAccessAddress)
5290   insertCheckShadowOf(Addr, &I);
5291
5292 // Second-last operand is the lane number (for vst{2,3,4}lane)
5293 if (useLane) {
5294 skipTrailingOperands++;
5295 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5296 assert(isa<IntegerType>(
5297     I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5298 }
5299
5300 SmallVector<Value *, 8> ShadowArgs;
5301 // All the initial operands are the inputs
5302 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5303 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5304 Value *Shadow = getShadow(&I, i);
5305 ShadowArgs.append(1, Shadow);
5306 }
5307
5308 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5309 // e.g., for:
5310 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5311 // we know the type of the output (and its shadow) is <16 x i8>.
5312 //
5313 // Arm NEON VST is unusual because the last argument is the output address:
5314 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5315 // call void @llvm.aarch64.neon.st2.v16i8.p0
5316 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5317 // and we have no type information about P's operand. We must manually
5318 // compute the type (<16 x i8> x 2).
5319 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5320 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5321 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5322 (numArgOperands - skipTrailingOperands));
5323 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5324
5325 if (useLane)
5326 ShadowArgs.append(1,
5327 I.getArgOperand(numArgOperands - skipTrailingOperands));
5328
5329 Value *OutputShadowPtr, *OutputOriginPtr;
5330 // AArch64 NEON does not need alignment (unless OS requires it)
5331 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5332 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5333 ShadowArgs.append(1, OutputShadowPtr);
5334
5335 CallInst *CI =
5336 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5337 setShadow(&I, CI);
5338
5339 if (MS.TrackOrigins) {
5340 // TODO: if we modelled the vst* instruction more precisely, we could
5341 // more accurately track the origins (e.g., if both inputs are
5342 // uninitialized for vst2, we currently blame the second input, even
5343 // though part of the output depends only on the first input).
5344 //
5345 // This is particularly imprecise for vst{2,3,4}lane, since only one
5346 // lane of each input is actually copied to the output.
5347 OriginCombiner OC(this, IRB);
5348 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5349 OC.Add(I.getArgOperand(i));
5350
5351 const DataLayout &DL = F.getDataLayout();
5352 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5353 OutputOriginPtr);
5354 }
5355 }
5356
5357 // <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8
5358 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5359 // <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
5360 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5361 // <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8
5362 //                              (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5363 //
5364 // Note:
5365 // - < 4 x *> is a 2x2 matrix
5366 // - the <16 x *> operands are a 2x8 matrix and an 8x2 matrix, respectively
5367 //
5368 // The general shadow propagation approach is:
5369 // 1) get the shadows of the input matrices %X and %Y
5370 // 2) change the shadow values to 0x1 if the corresponding value is fully
5371 // initialized, and 0x0 otherwise
5372 // 3) perform a matrix multiplication on the shadows of %X and %Y. The output
5373 // will be a 2x2 matrix; for each element, a value of 0x8 means all the
5374 // corresponding inputs were clean.
5375 // 4) blend in the shadow of %R
5376 //
5377 // TODO: consider allowing multiplication of zero with an uninitialized value
5378 // to result in an initialized value.
5379 //
5380 // TODO: handle floating-point matrix multiply using ummla on the shadows:
5381 // case Intrinsic::aarch64_neon_bfmmla:
5382 // handleNEONMatrixMultiply(I, /*ARows=*/ 2, /*ACols=*/ 4,
5383 // /*BRows=*/ 4, /*BCols=*/ 2);
5384 //
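// A worked example (illustrative): for smmla, if a 2x8 row of %X and an 8x2
// column of %Y are both fully initialized, the dot product of their
// 0/1-encoded shadows is 8 (== ACols), so the comparison against the
// "FullyInit" splat below marks that output element as clean; any
// uninitialized input lane lowers the sum below 8 and poisons the element.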
5385 void handleNEONMatrixMultiply(IntrinsicInst &I, unsigned int ARows,
5386 unsigned int ACols, unsigned int BRows,
5387 unsigned int BCols) {
5388 IRBuilder<> IRB(&I);
5389
5390 assert(I.arg_size() == 3);
5391 Value *R = I.getArgOperand(0);
5392 Value *A = I.getArgOperand(1);
5393 Value *B = I.getArgOperand(2);
5394
5395 assert(I.getType() == R->getType());
5396
5397 assert(isa<FixedVectorType>(R->getType()));
5398 assert(isa<FixedVectorType>(A->getType()));
5399 assert(isa<FixedVectorType>(B->getType()));
5400
5401 [[maybe_unused]] FixedVectorType *RTy = cast<FixedVectorType>(R->getType());
5402 [[maybe_unused]] FixedVectorType *ATy = cast<FixedVectorType>(A->getType());
5403 [[maybe_unused]] FixedVectorType *BTy = cast<FixedVectorType>(B->getType());
5404
5405 assert(ACols == BRows);
5406 assert(ATy->getNumElements() == ARows * ACols);
5407 assert(BTy->getNumElements() == BRows * BCols);
5408 assert(RTy->getNumElements() == ARows * BCols);
5409
5410 LLVM_DEBUG(dbgs() << "### R: " << *RTy->getElementType() << "\n");
5411 LLVM_DEBUG(dbgs() << "### A: " << *ATy->getElementType() << "\n");
5412 if (RTy->getElementType()->isIntegerTy()) {
5413 // Types are not identical e.g., <4 x i32> %R, <16 x i8> %A
5414     assert(ATy->getElementType()->isIntegerTy());
5415   } else {
5416     assert(RTy->getElementType()->isFloatingPointTy());
5417     assert(ATy->getElementType()->isFloatingPointTy());
5418   }
5419 assert(ATy->getElementType() == BTy->getElementType());
5420
5421 Value *ShadowR = getShadow(&I, 0);
5422 Value *ShadowA = getShadow(&I, 1);
5423 Value *ShadowB = getShadow(&I, 2);
5424
5425 // If the value is fully initialized, the shadow will be 000...001.
5426 // Otherwise, the shadow will be all zero.
5427 // (This is the opposite of how we typically handle shadows.)
5428 ShadowA = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowA, getCleanShadow(A)),
5429 ShadowA->getType());
5430 ShadowB = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowB, getCleanShadow(B)),
5431 ShadowB->getType());
5432
5433 Value *ShadowAB = IRB.CreateIntrinsic(
5434 I.getType(), I.getIntrinsicID(), {getCleanShadow(R), ShadowA, ShadowB});
5435
5436 Value *FullyInit = ConstantVector::getSplat(
5437 RTy->getElementCount(),
5438 ConstantInt::get(cast<VectorType>(getShadowTy(R))->getElementType(),
5439 ACols));
5440
5441 ShadowAB = IRB.CreateSExt(IRB.CreateICmpNE(ShadowAB, FullyInit),
5442 ShadowAB->getType());
5443
5444 ShadowR = IRB.CreateSExt(IRB.CreateICmpNE(ShadowR, getCleanShadow(R)),
5445 ShadowR->getType());
5446
5447 setShadow(&I, IRB.CreateOr(ShadowAB, ShadowR));
5448 setOriginForNaryOp(I);
5449 }
5450
5451 /// Handle intrinsics by applying the intrinsic to the shadows.
5452 ///
5453 /// The trailing arguments are passed verbatim to the intrinsic, though any
5454 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5455 /// intrinsic with one trailing verbatim argument:
5456 /// out = intrinsic(var1, var2, opType)
5457 /// we compute:
5458 /// shadow[out] =
5459 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5460 ///
5461 /// Typically, shadowIntrinsicID will be specified by the caller to be
5462 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5463 /// intrinsic of the same type.
5464 ///
5465 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5466 /// bit-patterns (for example, if the intrinsic accepts floats for
5467 /// var1, we require that it doesn't care if inputs are NaNs).
5468 ///
5469 /// For example, this can be applied to the Arm NEON vector table intrinsics
5470 /// (tbl{1,2,3,4}).
5471 ///
5472 /// The origin is approximated using setOriginForNaryOp.
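/// For example (an illustrative sketch), for the NEON table lookup
///   %r = call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %t, <16 x i8> %idx)
/// with trailingVerbatimArgs == 1, the computed shadow is roughly
///   %sr = or <16 x i8>
///            (call <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %st, <16 x i8> %idx)),
///            (shadow of %idx, cast to <16 x i8>)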
5473 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5474 Intrinsic::ID shadowIntrinsicID,
5475 unsigned int trailingVerbatimArgs) {
5476 IRBuilder<> IRB(&I);
5477
5478 assert(trailingVerbatimArgs < I.arg_size());
5479
5480 SmallVector<Value *, 8> ShadowArgs;
5481 // Don't use getNumOperands() because it includes the callee
5482 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5483 Value *Shadow = getShadow(&I, i);
5484
5485 // Shadows are integer-ish types but some intrinsics require a
5486 // different (e.g., floating-point) type.
5487 ShadowArgs.push_back(
5488 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5489 }
5490
5491 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5492 i++) {
5493 Value *Arg = I.getArgOperand(i);
5494 ShadowArgs.push_back(Arg);
5495 }
5496
5497 CallInst *CI =
5498 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5499 Value *CombinedShadow = CI;
5500
5501 // Combine the computed shadow with the shadow of trailing args
5502 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5503 i++) {
5504 Value *Shadow =
5505 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5506 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5507 }
5508
5509 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5510
5511 setOriginForNaryOp(I);
5512 }
5513
5514 // Approximation only
5515 //
5516 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5517 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5518 assert(I.arg_size() == 2);
5519
5520 handleShadowOr(I);
5521 }
5522
5523 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5524 switch (I.getIntrinsicID()) {
5525 case Intrinsic::uadd_with_overflow:
5526 case Intrinsic::sadd_with_overflow:
5527 case Intrinsic::usub_with_overflow:
5528 case Intrinsic::ssub_with_overflow:
5529 case Intrinsic::umul_with_overflow:
5530 case Intrinsic::smul_with_overflow:
5531 handleArithmeticWithOverflow(I);
5532 break;
5533 case Intrinsic::abs:
5534 handleAbsIntrinsic(I);
5535 break;
5536 case Intrinsic::bitreverse:
5537 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5538 /*trailingVerbatimArgs*/ 0);
5539 break;
5540 case Intrinsic::is_fpclass:
5541 handleIsFpClass(I);
5542 break;
5543 case Intrinsic::lifetime_start:
5544 handleLifetimeStart(I);
5545 break;
5546 case Intrinsic::launder_invariant_group:
5547 case Intrinsic::strip_invariant_group:
5548 handleInvariantGroup(I);
5549 break;
5550 case Intrinsic::bswap:
5551 handleBswap(I);
5552 break;
5553 case Intrinsic::ctlz:
5554 case Intrinsic::cttz:
5555 handleCountLeadingTrailingZeros(I);
5556 break;
5557 case Intrinsic::masked_compressstore:
5558 handleMaskedCompressStore(I);
5559 break;
5560 case Intrinsic::masked_expandload:
5561 handleMaskedExpandLoad(I);
5562 break;
5563 case Intrinsic::masked_gather:
5564 handleMaskedGather(I);
5565 break;
5566 case Intrinsic::masked_scatter:
5567 handleMaskedScatter(I);
5568 break;
5569 case Intrinsic::masked_store:
5570 handleMaskedStore(I);
5571 break;
5572 case Intrinsic::masked_load:
5573 handleMaskedLoad(I);
5574 break;
5575 case Intrinsic::vector_reduce_and:
5576 handleVectorReduceAndIntrinsic(I);
5577 break;
5578 case Intrinsic::vector_reduce_or:
5579 handleVectorReduceOrIntrinsic(I);
5580 break;
5581
5582 case Intrinsic::vector_reduce_add:
5583 case Intrinsic::vector_reduce_xor:
5584 case Intrinsic::vector_reduce_mul:
5585 // Signed/Unsigned Min/Max
5586 // TODO: handling similarly to AND/OR may be more precise.
5587 case Intrinsic::vector_reduce_smax:
5588 case Intrinsic::vector_reduce_smin:
5589 case Intrinsic::vector_reduce_umax:
5590 case Intrinsic::vector_reduce_umin:
5591 // TODO: this has no false positives, but arguably we should check that all
5592 // the bits are initialized.
5593 case Intrinsic::vector_reduce_fmax:
5594 case Intrinsic::vector_reduce_fmin:
5595 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5596 break;
5597
5598 case Intrinsic::vector_reduce_fadd:
5599 case Intrinsic::vector_reduce_fmul:
5600 handleVectorReduceWithStarterIntrinsic(I);
5601 break;
5602
5603 case Intrinsic::scmp:
5604 case Intrinsic::ucmp: {
5605 handleShadowOr(I);
5606 break;
5607 }
5608
5609 case Intrinsic::fshl:
5610 case Intrinsic::fshr:
5611 handleFunnelShift(I);
5612 break;
5613
5614 case Intrinsic::is_constant:
5615 // The result of llvm.is.constant() is always defined.
5616 setShadow(&I, getCleanShadow(&I));
5617 setOrigin(&I, getCleanOrigin());
5618 break;
5619
5620 default:
5621 return false;
5622 }
5623
5624 return true;
5625 }
5626
5627 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5628 switch (I.getIntrinsicID()) {
5629 case Intrinsic::x86_sse_stmxcsr:
5630 handleStmxcsr(I);
5631 break;
5632 case Intrinsic::x86_sse_ldmxcsr:
5633 handleLdmxcsr(I);
5634 break;
5635
5636 // Convert Scalar Double Precision Floating-Point Value
5637 // to Unsigned Doubleword Integer
5638 // etc.
5639 case Intrinsic::x86_avx512_vcvtsd2usi64:
5640 case Intrinsic::x86_avx512_vcvtsd2usi32:
5641 case Intrinsic::x86_avx512_vcvtss2usi64:
5642 case Intrinsic::x86_avx512_vcvtss2usi32:
5643 case Intrinsic::x86_avx512_cvttss2usi64:
5644 case Intrinsic::x86_avx512_cvttss2usi:
5645 case Intrinsic::x86_avx512_cvttsd2usi64:
5646 case Intrinsic::x86_avx512_cvttsd2usi:
5647 case Intrinsic::x86_avx512_cvtusi2ss:
5648 case Intrinsic::x86_avx512_cvtusi642sd:
5649 case Intrinsic::x86_avx512_cvtusi642ss:
5650 handleSSEVectorConvertIntrinsic(I, 1, true);
5651 break;
5652 case Intrinsic::x86_sse2_cvtsd2si64:
5653 case Intrinsic::x86_sse2_cvtsd2si:
5654 case Intrinsic::x86_sse2_cvtsd2ss:
5655 case Intrinsic::x86_sse2_cvttsd2si64:
5656 case Intrinsic::x86_sse2_cvttsd2si:
5657 case Intrinsic::x86_sse_cvtss2si64:
5658 case Intrinsic::x86_sse_cvtss2si:
5659 case Intrinsic::x86_sse_cvttss2si64:
5660 case Intrinsic::x86_sse_cvttss2si:
5661 handleSSEVectorConvertIntrinsic(I, 1);
5662 break;
5663 case Intrinsic::x86_sse_cvtps2pi:
5664 case Intrinsic::x86_sse_cvttps2pi:
5665 handleSSEVectorConvertIntrinsic(I, 2);
5666 break;
5667
5668 // TODO:
5669 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5670 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5671 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5672
5673 case Intrinsic::x86_vcvtps2ph_128:
5674 case Intrinsic::x86_vcvtps2ph_256: {
5675 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5676 break;
5677 }
5678
5679 // Convert Packed Single Precision Floating-Point Values
5680 // to Packed Signed Doubleword Integer Values
5681 //
5682 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5683 // (<16 x float>, <16 x i32>, i16, i32)
5684 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5685 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5686 break;
5687
5688 // Convert Packed Double Precision Floating-Point Values
5689 // to Packed Single Precision Floating-Point Values
5690 case Intrinsic::x86_sse2_cvtpd2ps:
5691 case Intrinsic::x86_sse2_cvtps2dq:
5692 case Intrinsic::x86_sse2_cvtpd2dq:
5693 case Intrinsic::x86_sse2_cvttps2dq:
5694 case Intrinsic::x86_sse2_cvttpd2dq:
5695 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5696 case Intrinsic::x86_avx_cvt_ps2dq_256:
5697 case Intrinsic::x86_avx_cvt_pd2dq_256:
5698 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5699 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5700 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5701 break;
5702 }
5703
5704 // Convert Single-Precision FP Value to 16-bit FP Value
5705 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5706 // (<16 x float>, i32, <16 x i16>, i16)
5707 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5708 // (<4 x float>, i32, <8 x i16>, i8)
5709 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5710 // (<8 x float>, i32, <8 x i16>, i8)
5711 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5712 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5713 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5714 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5715 break;
5716
5717 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5718 case Intrinsic::x86_avx512_psll_w_512:
5719 case Intrinsic::x86_avx512_psll_d_512:
5720 case Intrinsic::x86_avx512_psll_q_512:
5721 case Intrinsic::x86_avx512_pslli_w_512:
5722 case Intrinsic::x86_avx512_pslli_d_512:
5723 case Intrinsic::x86_avx512_pslli_q_512:
5724 case Intrinsic::x86_avx512_psrl_w_512:
5725 case Intrinsic::x86_avx512_psrl_d_512:
5726 case Intrinsic::x86_avx512_psrl_q_512:
5727 case Intrinsic::x86_avx512_psra_w_512:
5728 case Intrinsic::x86_avx512_psra_d_512:
5729 case Intrinsic::x86_avx512_psra_q_512:
5730 case Intrinsic::x86_avx512_psrli_w_512:
5731 case Intrinsic::x86_avx512_psrli_d_512:
5732 case Intrinsic::x86_avx512_psrli_q_512:
5733 case Intrinsic::x86_avx512_psrai_w_512:
5734 case Intrinsic::x86_avx512_psrai_d_512:
5735 case Intrinsic::x86_avx512_psrai_q_512:
5736 case Intrinsic::x86_avx512_psra_q_256:
5737 case Intrinsic::x86_avx512_psra_q_128:
5738 case Intrinsic::x86_avx512_psrai_q_256:
5739 case Intrinsic::x86_avx512_psrai_q_128:
5740 case Intrinsic::x86_avx2_psll_w:
5741 case Intrinsic::x86_avx2_psll_d:
5742 case Intrinsic::x86_avx2_psll_q:
5743 case Intrinsic::x86_avx2_pslli_w:
5744 case Intrinsic::x86_avx2_pslli_d:
5745 case Intrinsic::x86_avx2_pslli_q:
5746 case Intrinsic::x86_avx2_psrl_w:
5747 case Intrinsic::x86_avx2_psrl_d:
5748 case Intrinsic::x86_avx2_psrl_q:
5749 case Intrinsic::x86_avx2_psra_w:
5750 case Intrinsic::x86_avx2_psra_d:
5751 case Intrinsic::x86_avx2_psrli_w:
5752 case Intrinsic::x86_avx2_psrli_d:
5753 case Intrinsic::x86_avx2_psrli_q:
5754 case Intrinsic::x86_avx2_psrai_w:
5755 case Intrinsic::x86_avx2_psrai_d:
5756 case Intrinsic::x86_sse2_psll_w:
5757 case Intrinsic::x86_sse2_psll_d:
5758 case Intrinsic::x86_sse2_psll_q:
5759 case Intrinsic::x86_sse2_pslli_w:
5760 case Intrinsic::x86_sse2_pslli_d:
5761 case Intrinsic::x86_sse2_pslli_q:
5762 case Intrinsic::x86_sse2_psrl_w:
5763 case Intrinsic::x86_sse2_psrl_d:
5764 case Intrinsic::x86_sse2_psrl_q:
5765 case Intrinsic::x86_sse2_psra_w:
5766 case Intrinsic::x86_sse2_psra_d:
5767 case Intrinsic::x86_sse2_psrli_w:
5768 case Intrinsic::x86_sse2_psrli_d:
5769 case Intrinsic::x86_sse2_psrli_q:
5770 case Intrinsic::x86_sse2_psrai_w:
5771 case Intrinsic::x86_sse2_psrai_d:
5772 case Intrinsic::x86_mmx_psll_w:
5773 case Intrinsic::x86_mmx_psll_d:
5774 case Intrinsic::x86_mmx_psll_q:
5775 case Intrinsic::x86_mmx_pslli_w:
5776 case Intrinsic::x86_mmx_pslli_d:
5777 case Intrinsic::x86_mmx_pslli_q:
5778 case Intrinsic::x86_mmx_psrl_w:
5779 case Intrinsic::x86_mmx_psrl_d:
5780 case Intrinsic::x86_mmx_psrl_q:
5781 case Intrinsic::x86_mmx_psra_w:
5782 case Intrinsic::x86_mmx_psra_d:
5783 case Intrinsic::x86_mmx_psrli_w:
5784 case Intrinsic::x86_mmx_psrli_d:
5785 case Intrinsic::x86_mmx_psrli_q:
5786 case Intrinsic::x86_mmx_psrai_w:
5787 case Intrinsic::x86_mmx_psrai_d:
5788 handleVectorShiftIntrinsic(I, /* Variable */ false);
5789 break;
5790 case Intrinsic::x86_avx2_psllv_d:
5791 case Intrinsic::x86_avx2_psllv_d_256:
5792 case Intrinsic::x86_avx512_psllv_d_512:
5793 case Intrinsic::x86_avx2_psllv_q:
5794 case Intrinsic::x86_avx2_psllv_q_256:
5795 case Intrinsic::x86_avx512_psllv_q_512:
5796 case Intrinsic::x86_avx2_psrlv_d:
5797 case Intrinsic::x86_avx2_psrlv_d_256:
5798 case Intrinsic::x86_avx512_psrlv_d_512:
5799 case Intrinsic::x86_avx2_psrlv_q:
5800 case Intrinsic::x86_avx2_psrlv_q_256:
5801 case Intrinsic::x86_avx512_psrlv_q_512:
5802 case Intrinsic::x86_avx2_psrav_d:
5803 case Intrinsic::x86_avx2_psrav_d_256:
5804 case Intrinsic::x86_avx512_psrav_d_512:
5805 case Intrinsic::x86_avx512_psrav_q_128:
5806 case Intrinsic::x86_avx512_psrav_q_256:
5807 case Intrinsic::x86_avx512_psrav_q_512:
5808 handleVectorShiftIntrinsic(I, /* Variable */ true);
5809 break;
5810
5811 // Pack with Signed/Unsigned Saturation
5812 case Intrinsic::x86_sse2_packsswb_128:
5813 case Intrinsic::x86_sse2_packssdw_128:
5814 case Intrinsic::x86_sse2_packuswb_128:
5815 case Intrinsic::x86_sse41_packusdw:
5816 case Intrinsic::x86_avx2_packsswb:
5817 case Intrinsic::x86_avx2_packssdw:
5818 case Intrinsic::x86_avx2_packuswb:
5819 case Intrinsic::x86_avx2_packusdw:
5820 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5821 // (<32 x i16> %a, <32 x i16> %b)
5822 // <32 x i16> @llvm.x86.avx512.packssdw.512
5823 // (<16 x i32> %a, <16 x i32> %b)
5824 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5825 case Intrinsic::x86_avx512_packsswb_512:
5826 case Intrinsic::x86_avx512_packssdw_512:
5827 case Intrinsic::x86_avx512_packuswb_512:
5828 case Intrinsic::x86_avx512_packusdw_512:
5829 handleVectorPackIntrinsic(I);
5830 break;
5831
5832 case Intrinsic::x86_sse41_pblendvb:
5833 case Intrinsic::x86_sse41_blendvpd:
5834 case Intrinsic::x86_sse41_blendvps:
5835 case Intrinsic::x86_avx_blendv_pd_256:
5836 case Intrinsic::x86_avx_blendv_ps_256:
5837 case Intrinsic::x86_avx2_pblendvb:
5838 handleBlendvIntrinsic(I);
5839 break;
5840
5841 case Intrinsic::x86_avx_dp_ps_256:
5842 case Intrinsic::x86_sse41_dppd:
5843 case Intrinsic::x86_sse41_dpps:
5844 handleDppIntrinsic(I);
5845 break;
5846
5847 case Intrinsic::x86_mmx_packsswb:
5848 case Intrinsic::x86_mmx_packuswb:
5849 handleVectorPackIntrinsic(I, 16);
5850 break;
5851
5852 case Intrinsic::x86_mmx_packssdw:
5853 handleVectorPackIntrinsic(I, 32);
5854 break;
5855
5856 case Intrinsic::x86_mmx_psad_bw:
5857 handleVectorSadIntrinsic(I, true);
5858 break;
5859 case Intrinsic::x86_sse2_psad_bw:
5860 case Intrinsic::x86_avx2_psad_bw:
5861 handleVectorSadIntrinsic(I);
5862 break;
5863
5864 // Multiply and Add Packed Words
5865 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5866 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5867 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5868 //
5869 // Multiply and Add Packed Signed and Unsigned Bytes
5870 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5871 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5872 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5873 //
5874 // These intrinsics are auto-upgraded into non-masked forms:
5875 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5876 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5877 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5878 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5879 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5880 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5881 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5882 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5883 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5884 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5885 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5886 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5887 case Intrinsic::x86_sse2_pmadd_wd:
5888 case Intrinsic::x86_avx2_pmadd_wd:
5889 case Intrinsic::x86_avx512_pmaddw_d_512:
5890 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5891 case Intrinsic::x86_avx2_pmadd_ub_sw:
5892 case Intrinsic::x86_avx512_pmaddubs_w_512:
5893 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
5894 /*ZeroPurifies=*/true);
5895 break;
5896
5897 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5898 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5899 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
5900 /*ZeroPurifies=*/true, /*EltSizeInBits=*/8);
5901 break;
5902
5903 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5904 case Intrinsic::x86_mmx_pmadd_wd:
5905 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
5906 /*ZeroPurifies=*/true, /*EltSizeInBits=*/16);
5907 break;
5908
5909 // AVX Vector Neural Network Instructions: bytes
5910 //
5911 // Multiply and Add Signed Bytes
5912 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5913 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5914 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5915 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5916 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
5917 // (<16 x i32>, <64 x i8>, <64 x i8>)
5918 //
5919 // Multiply and Add Signed Bytes With Saturation
5920 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
5921 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5922 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
5923 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5924 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
5925 // (<16 x i32>, <64 x i8>, <64 x i8>)
5926 //
5927 // Multiply and Add Signed and Unsigned Bytes
5928 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
5929 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5930 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
5931 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5932 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
5933 // (<16 x i32>, <64 x i8>, <64 x i8>)
5934 //
5935 // Multiply and Add Signed and Unsigned Bytes With Saturation
5936 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
5937 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5938 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
5939 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5940 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
5941 // (<16 x i32>, <64 x i8>, <64 x i8>)
5942 //
5943 // Multiply and Add Unsigned and Signed Bytes
5944 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
5945 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5946 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
5947 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5948 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
5949 // (<16 x i32>, <64 x i8>, <64 x i8>)
5950 //
5951 // Multiply and Add Unsigned and Signed Bytes With Saturation
5952 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
5953 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5954 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
5955 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5956 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
5957 // (<16 x i32>, <64 x i8>, <64 x i8>)
5958 //
5959 // Multiply and Add Unsigned Bytes
5960 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
5961 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5962 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
5963 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5964 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
5965 // (<16 x i32>, <64 x i8>, <64 x i8>)
5966 //
5967 // Multiply and Add Unsigned Bytes With Saturation
5968 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
5969 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5970 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
5971 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5972 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
5973 // (<16 x i32>, <64 x i8>, <64 x i8>)
5974 //
5975 // These intrinsics are auto-upgraded into non-masked forms:
5976 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
5977 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5978 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
5979 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5980 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
5981 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5982 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
5983 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5984 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
5985 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5986 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
5987 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5988 //
5989 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
5990 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5991 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
5992 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5993 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
5994 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5995 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
5996 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5997 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
5998 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5999 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
6000 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6001 case Intrinsic::x86_avx512_vpdpbusd_128:
6002 case Intrinsic::x86_avx512_vpdpbusd_256:
6003 case Intrinsic::x86_avx512_vpdpbusd_512:
6004 case Intrinsic::x86_avx512_vpdpbusds_128:
6005 case Intrinsic::x86_avx512_vpdpbusds_256:
6006 case Intrinsic::x86_avx512_vpdpbusds_512:
6007 case Intrinsic::x86_avx2_vpdpbssd_128:
6008 case Intrinsic::x86_avx2_vpdpbssd_256:
6009 case Intrinsic::x86_avx10_vpdpbssd_512:
6010 case Intrinsic::x86_avx2_vpdpbssds_128:
6011 case Intrinsic::x86_avx2_vpdpbssds_256:
6012 case Intrinsic::x86_avx10_vpdpbssds_512:
6013 case Intrinsic::x86_avx2_vpdpbsud_128:
6014 case Intrinsic::x86_avx2_vpdpbsud_256:
6015 case Intrinsic::x86_avx10_vpdpbsud_512:
6016 case Intrinsic::x86_avx2_vpdpbsuds_128:
6017 case Intrinsic::x86_avx2_vpdpbsuds_256:
6018 case Intrinsic::x86_avx10_vpdpbsuds_512:
6019 case Intrinsic::x86_avx2_vpdpbuud_128:
6020 case Intrinsic::x86_avx2_vpdpbuud_256:
6021 case Intrinsic::x86_avx10_vpdpbuud_512:
6022 case Intrinsic::x86_avx2_vpdpbuuds_128:
6023 case Intrinsic::x86_avx2_vpdpbuuds_256:
6024 case Intrinsic::x86_avx10_vpdpbuuds_512:
6025 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/4,
6026 /*ZeroPurifies=*/true);
6027 break;
6028
6029 // AVX Vector Neural Network Instructions: words
6030 //
6031 // Multiply and Add Signed Word Integers
6032 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
6033 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6034 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
6035 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6036 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
6037 // (<16 x i32>, <32 x i16>, <32 x i16>)
6038 //
6039 // Multiply and Add Signed Word Integers With Saturation
6040 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
6041 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6042 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
6043 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6044 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
6045 // (<16 x i32>, <32 x i16>, <32 x i16>)
6046 //
6047 // Multiply and Add Signed and Unsigned Word Integers
6048 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
6049 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6050 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
6051 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6052 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
6053 // (<16 x i32>, <32 x i16>, <32 x i16>)
6054 //
6055 // Multiply and Add Signed and Unsigned Word Integers With Saturation
6056 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
6057 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6058 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
6059 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6060 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
6061 // (<16 x i32>, <32 x i16>, <32 x i16>)
6062 //
6063 // Multiply and Add Unsigned and Signed Word Integers
6064 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
6065 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6066 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
6067 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6068 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
6069 // (<16 x i32>, <32 x i16>, <32 x i16>)
6070 //
6071 // Multiply and Add Unsigned and Signed Word Integers With Saturation
6072 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
6073 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6074 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
6075 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6076 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
6077 // (<16 x i32>, <32 x i16>, <32 x i16>)
6078 //
6079 // Multiply and Add Unsigned and Unsigned Word Integers
6080 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
6081 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6082 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
6083 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6084 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
6085 // (<16 x i32>, <32 x i16>, <32 x i16>)
6086 //
6087 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
6088 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
6089 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6090 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
6091 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6092 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
6093 // (<16 x i32>, <32 x i16>, <32 x i16>)
6094 //
6095 // These intrinsics are auto-upgraded into non-masked forms:
6096 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
6097 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6098 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
6099 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6100 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6101 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6102 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6103 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6104 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6105 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6106 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6107 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6108 //
6109 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6110 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6111 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6112 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6113 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6114 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6115 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6116 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6117 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6118 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6119 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6120 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6121 case Intrinsic::x86_avx512_vpdpwssd_128:
6122 case Intrinsic::x86_avx512_vpdpwssd_256:
6123 case Intrinsic::x86_avx512_vpdpwssd_512:
6124 case Intrinsic::x86_avx512_vpdpwssds_128:
6125 case Intrinsic::x86_avx512_vpdpwssds_256:
6126 case Intrinsic::x86_avx512_vpdpwssds_512:
6127 case Intrinsic::x86_avx2_vpdpwsud_128:
6128 case Intrinsic::x86_avx2_vpdpwsud_256:
6129 case Intrinsic::x86_avx10_vpdpwsud_512:
6130 case Intrinsic::x86_avx2_vpdpwsuds_128:
6131 case Intrinsic::x86_avx2_vpdpwsuds_256:
6132 case Intrinsic::x86_avx10_vpdpwsuds_512:
6133 case Intrinsic::x86_avx2_vpdpwusd_128:
6134 case Intrinsic::x86_avx2_vpdpwusd_256:
6135 case Intrinsic::x86_avx10_vpdpwusd_512:
6136 case Intrinsic::x86_avx2_vpdpwusds_128:
6137 case Intrinsic::x86_avx2_vpdpwusds_256:
6138 case Intrinsic::x86_avx10_vpdpwusds_512:
6139 case Intrinsic::x86_avx2_vpdpwuud_128:
6140 case Intrinsic::x86_avx2_vpdpwuud_256:
6141 case Intrinsic::x86_avx10_vpdpwuud_512:
6142 case Intrinsic::x86_avx2_vpdpwuuds_128:
6143 case Intrinsic::x86_avx2_vpdpwuuds_256:
6144 case Intrinsic::x86_avx10_vpdpwuuds_512:
6145 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
6146 /*ZeroPurifies=*/true);
6147 break;
6148
6149 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6150 // Precision
6151 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6152 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6153 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6154 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6155 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6156 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
6157 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6158 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6159 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6160 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
6161 /*ZeroPurifies=*/false);
6162 break;
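// [Editorial illustration, not part of the original source.] Unlike the
// integer VNNI cases above, ZeroPurifies is false here because these are
// floating-point dot products: 0.0 * x is not guaranteed to be 0.0 (x may be
// a NaN or an infinity), so a zero multiplicand cannot be used to discard
// the shadow of its partner operand.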
6163
6164 case Intrinsic::x86_sse_cmp_ss:
6165 case Intrinsic::x86_sse2_cmp_sd:
6166 case Intrinsic::x86_sse_comieq_ss:
6167 case Intrinsic::x86_sse_comilt_ss:
6168 case Intrinsic::x86_sse_comile_ss:
6169 case Intrinsic::x86_sse_comigt_ss:
6170 case Intrinsic::x86_sse_comige_ss:
6171 case Intrinsic::x86_sse_comineq_ss:
6172 case Intrinsic::x86_sse_ucomieq_ss:
6173 case Intrinsic::x86_sse_ucomilt_ss:
6174 case Intrinsic::x86_sse_ucomile_ss:
6175 case Intrinsic::x86_sse_ucomigt_ss:
6176 case Intrinsic::x86_sse_ucomige_ss:
6177 case Intrinsic::x86_sse_ucomineq_ss:
6178 case Intrinsic::x86_sse2_comieq_sd:
6179 case Intrinsic::x86_sse2_comilt_sd:
6180 case Intrinsic::x86_sse2_comile_sd:
6181 case Intrinsic::x86_sse2_comigt_sd:
6182 case Intrinsic::x86_sse2_comige_sd:
6183 case Intrinsic::x86_sse2_comineq_sd:
6184 case Intrinsic::x86_sse2_ucomieq_sd:
6185 case Intrinsic::x86_sse2_ucomilt_sd:
6186 case Intrinsic::x86_sse2_ucomile_sd:
6187 case Intrinsic::x86_sse2_ucomigt_sd:
6188 case Intrinsic::x86_sse2_ucomige_sd:
6189 case Intrinsic::x86_sse2_ucomineq_sd:
6190 handleVectorCompareScalarIntrinsic(I);
6191 break;
6192
6193 case Intrinsic::x86_avx_cmp_pd_256:
6194 case Intrinsic::x86_avx_cmp_ps_256:
6195 case Intrinsic::x86_sse2_cmp_pd:
6196 case Intrinsic::x86_sse_cmp_ps:
6197 handleVectorComparePackedIntrinsic(I);
6198 break;
6199
6200 case Intrinsic::x86_bmi_bextr_32:
6201 case Intrinsic::x86_bmi_bextr_64:
6202 case Intrinsic::x86_bmi_bzhi_32:
6203 case Intrinsic::x86_bmi_bzhi_64:
6204 case Intrinsic::x86_bmi_pdep_32:
6205 case Intrinsic::x86_bmi_pdep_64:
6206 case Intrinsic::x86_bmi_pext_32:
6207 case Intrinsic::x86_bmi_pext_64:
6208 handleBmiIntrinsic(I);
6209 break;
6210
6211 case Intrinsic::x86_pclmulqdq:
6212 case Intrinsic::x86_pclmulqdq_256:
6213 case Intrinsic::x86_pclmulqdq_512:
6214 handlePclmulIntrinsic(I);
6215 break;
6216
6217 case Intrinsic::x86_avx_round_pd_256:
6218 case Intrinsic::x86_avx_round_ps_256:
6219 case Intrinsic::x86_sse41_round_pd:
6220 case Intrinsic::x86_sse41_round_ps:
6221 handleRoundPdPsIntrinsic(I);
6222 break;
6223
6224 case Intrinsic::x86_sse41_round_sd:
6225 case Intrinsic::x86_sse41_round_ss:
6226 handleUnarySdSsIntrinsic(I);
6227 break;
6228
6229 case Intrinsic::x86_sse2_max_sd:
6230 case Intrinsic::x86_sse_max_ss:
6231 case Intrinsic::x86_sse2_min_sd:
6232 case Intrinsic::x86_sse_min_ss:
6233 handleBinarySdSsIntrinsic(I);
6234 break;
6235
6236 case Intrinsic::x86_avx_vtestc_pd:
6237 case Intrinsic::x86_avx_vtestc_pd_256:
6238 case Intrinsic::x86_avx_vtestc_ps:
6239 case Intrinsic::x86_avx_vtestc_ps_256:
6240 case Intrinsic::x86_avx_vtestnzc_pd:
6241 case Intrinsic::x86_avx_vtestnzc_pd_256:
6242 case Intrinsic::x86_avx_vtestnzc_ps:
6243 case Intrinsic::x86_avx_vtestnzc_ps_256:
6244 case Intrinsic::x86_avx_vtestz_pd:
6245 case Intrinsic::x86_avx_vtestz_pd_256:
6246 case Intrinsic::x86_avx_vtestz_ps:
6247 case Intrinsic::x86_avx_vtestz_ps_256:
6248 case Intrinsic::x86_avx_ptestc_256:
6249 case Intrinsic::x86_avx_ptestnzc_256:
6250 case Intrinsic::x86_avx_ptestz_256:
6251 case Intrinsic::x86_sse41_ptestc:
6252 case Intrinsic::x86_sse41_ptestnzc:
6253 case Intrinsic::x86_sse41_ptestz:
6254 handleVtestIntrinsic(I);
6255 break;
6256
6257 // Packed Horizontal Add/Subtract
6258 case Intrinsic::x86_ssse3_phadd_w:
6259 case Intrinsic::x86_ssse3_phadd_w_128:
6260 case Intrinsic::x86_ssse3_phsub_w:
6261 case Intrinsic::x86_ssse3_phsub_w_128:
6262 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6263 /*ReinterpretElemWidth=*/16);
6264 break;
6265
6266 case Intrinsic::x86_avx2_phadd_w:
6267 case Intrinsic::x86_avx2_phsub_w:
6268 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6269 /*ReinterpretElemWidth=*/16);
6270 break;
6271
6272 // Packed Horizontal Add/Subtract
6273 case Intrinsic::x86_ssse3_phadd_d:
6274 case Intrinsic::x86_ssse3_phadd_d_128:
6275 case Intrinsic::x86_ssse3_phsub_d:
6276 case Intrinsic::x86_ssse3_phsub_d_128:
6277 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6278 /*ReinterpretElemWidth=*/32);
6279 break;
6280
6281 case Intrinsic::x86_avx2_phadd_d:
6282 case Intrinsic::x86_avx2_phsub_d:
6283 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6284 /*ReinterpretElemWidth=*/32);
6285 break;
6286
6287 // Packed Horizontal Add/Subtract and Saturate
6288 case Intrinsic::x86_ssse3_phadd_sw:
6289 case Intrinsic::x86_ssse3_phadd_sw_128:
6290 case Intrinsic::x86_ssse3_phsub_sw:
6291 case Intrinsic::x86_ssse3_phsub_sw_128:
6292 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6293 /*ReinterpretElemWidth=*/16);
6294 break;
6295
6296 case Intrinsic::x86_avx2_phadd_sw:
6297 case Intrinsic::x86_avx2_phsub_sw:
6298 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6299 /*ReinterpretElemWidth=*/16);
6300 break;
6301
6302 // Packed Single/Double Precision Floating-Point Horizontal Add
6303 case Intrinsic::x86_sse3_hadd_ps:
6304 case Intrinsic::x86_sse3_hadd_pd:
6305 case Intrinsic::x86_sse3_hsub_ps:
6306 case Intrinsic::x86_sse3_hsub_pd:
6307 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6308 break;
6309
6310 case Intrinsic::x86_avx_hadd_pd_256:
6311 case Intrinsic::x86_avx_hadd_ps_256:
6312 case Intrinsic::x86_avx_hsub_pd_256:
6313 case Intrinsic::x86_avx_hsub_ps_256:
6314 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6315 break;
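// [Editorial illustration, not part of the original source.] The 256-bit AVX
// horizontal operations work on two independent 128-bit lanes, e.g. for
// vhaddps ymm with inputs a and b:
//   dst = { a0+a1, a2+a3, b0+b1, b2+b3,     <- low 128-bit lane
//           a4+a5, a6+a7, b4+b5, b6+b7 }    <- high 128-bit lane
// Shards=2 tells handlePairwiseShadowOrIntrinsic to pair elements within
// each lane rather than across the whole vector.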
6316
6317 case Intrinsic::x86_avx_maskstore_ps:
6318 case Intrinsic::x86_avx_maskstore_pd:
6319 case Intrinsic::x86_avx_maskstore_ps_256:
6320 case Intrinsic::x86_avx_maskstore_pd_256:
6321 case Intrinsic::x86_avx2_maskstore_d:
6322 case Intrinsic::x86_avx2_maskstore_q:
6323 case Intrinsic::x86_avx2_maskstore_d_256:
6324 case Intrinsic::x86_avx2_maskstore_q_256: {
6325 handleAVXMaskedStore(I);
6326 break;
6327 }
6328
6329 case Intrinsic::x86_avx_maskload_ps:
6330 case Intrinsic::x86_avx_maskload_pd:
6331 case Intrinsic::x86_avx_maskload_ps_256:
6332 case Intrinsic::x86_avx_maskload_pd_256:
6333 case Intrinsic::x86_avx2_maskload_d:
6334 case Intrinsic::x86_avx2_maskload_q:
6335 case Intrinsic::x86_avx2_maskload_d_256:
6336 case Intrinsic::x86_avx2_maskload_q_256: {
6337 handleAVXMaskedLoad(I);
6338 break;
6339 }
6340
6341 // Packed Floating-Point Arithmetic
6342 case Intrinsic::x86_avx512fp16_add_ph_512:
6343 case Intrinsic::x86_avx512fp16_sub_ph_512:
6344 case Intrinsic::x86_avx512fp16_mul_ph_512:
6345 case Intrinsic::x86_avx512fp16_div_ph_512:
6346 case Intrinsic::x86_avx512fp16_max_ph_512:
6347 case Intrinsic::x86_avx512fp16_min_ph_512:
6348 case Intrinsic::x86_avx512_min_ps_512:
6349 case Intrinsic::x86_avx512_min_pd_512:
6350 case Intrinsic::x86_avx512_max_ps_512:
6351 case Intrinsic::x86_avx512_max_pd_512: {
6352 // These AVX512 variants contain the rounding mode as a trailing flag.
6353 // Earlier variants do not have a trailing flag and are already handled
6354 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6355 // maybeHandleUnknownIntrinsic.
6356 [[maybe_unused]] bool Success =
6357 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6358 assert(Success);
6359 break;
6360 }
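// [Editorial illustration, not part of the original source.] These variants
// carry the rounding/SAE control as a trailing i32 operand, e.g.
//   <16 x float> @llvm.x86.avx512.max.ps.512(<16 x float> %a,
//                                            <16 x float> %b, i32 4)
// Passing trailingFlags=1 keeps that last operand out of the set of data
// operands whose shadows are combined into the result shadow; the precise
// handling is in maybeHandleSimpleNomemIntrinsic.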
6361
6362 case Intrinsic::x86_avx_vpermilvar_pd:
6363 case Intrinsic::x86_avx_vpermilvar_pd_256:
6364 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6365 case Intrinsic::x86_avx_vpermilvar_ps:
6366 case Intrinsic::x86_avx_vpermilvar_ps_256:
6367 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6368 handleAVXVpermilvar(I);
6369 break;
6370 }
6371
6372 case Intrinsic::x86_avx512_vpermi2var_d_128:
6373 case Intrinsic::x86_avx512_vpermi2var_d_256:
6374 case Intrinsic::x86_avx512_vpermi2var_d_512:
6375 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6376 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6377 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6378 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6379 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6380 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6381 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6382 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6383 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6384 case Intrinsic::x86_avx512_vpermi2var_q_128:
6385 case Intrinsic::x86_avx512_vpermi2var_q_256:
6386 case Intrinsic::x86_avx512_vpermi2var_q_512:
6387 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6388 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6389 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6390 handleAVXVpermi2var(I);
6391 break;
6392
6393 // Packed Shuffle
6394 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6395 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6396 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6397 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6398 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6399 //
6400 // The following intrinsics are auto-upgraded:
6401 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6402 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6403 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6404 case Intrinsic::x86_avx2_pshuf_b:
6405 case Intrinsic::x86_sse_pshuf_w:
6406 case Intrinsic::x86_ssse3_pshuf_b_128:
6407 case Intrinsic::x86_ssse3_pshuf_b:
6408 case Intrinsic::x86_avx512_pshuf_b_512:
6409 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6410 /*trailingVerbatimArgs=*/1);
6411 break;
6412
6413 // AVX512 PMOV: Packed MOV, with truncation
6414 // Precisely handled by applying the same intrinsic to the shadow
6415 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6416 case Intrinsic::x86_avx512_mask_pmov_db_512:
6417 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6418 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6419 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6420 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6421 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6422 /*trailingVerbatimArgs=*/1);
6423 break;
6424 }
6425
6426 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6427 // Approximately handled using the corresponding truncation intrinsic
6428 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6429 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6430 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6431 handleIntrinsicByApplyingToShadow(I,
6432 Intrinsic::x86_avx512_mask_pmov_dw_512,
6433 /* trailingVerbatimArgs=*/1);
6434 break;
6435 }
6436
6437 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6438 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6439 handleIntrinsicByApplyingToShadow(I,
6440 Intrinsic::x86_avx512_mask_pmov_db_512,
6441 /* trailingVerbatimArgs=*/1);
6442 break;
6443 }
6444
6445 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6446 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6447 handleIntrinsicByApplyingToShadow(I,
6448 Intrinsic::x86_avx512_mask_pmov_qb_512,
6449 /* trailingVerbatimArgs=*/1);
6450 break;
6451 }
6452
6453 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6454 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6455 handleIntrinsicByApplyingToShadow(I,
6456 Intrinsic::x86_avx512_mask_pmov_qw_512,
6457 /* trailingVerbatimArgs=*/1);
6458 break;
6459 }
6460
6461 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6462 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6463 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6464 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6465 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6466 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6467 // slow-path handler.
6468 handleAVX512VectorDownConvert(I);
6469 break;
6470 }
6471
6472 // AVX512/AVX10 Reciprocal Square Root
6473 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6474 // (<16 x float>, <16 x float>, i16)
6475 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6476 // (<8 x float>, <8 x float>, i8)
6477 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6478 // (<4 x float>, <4 x float>, i8)
6479 //
6480 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6481 // (<8 x double>, <8 x double>, i8)
6482 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6483 // (<4 x double>, <4 x double>, i8)
6484 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6485 // (<2 x double>, <2 x double>, i8)
6486 //
6487 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6488 // (<32 x bfloat>, <32 x bfloat>, i32)
6489 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6490 // (<16 x bfloat>, <16 x bfloat>, i16)
6491 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6492 // (<8 x bfloat>, <8 x bfloat>, i8)
6493 //
6494 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6495 // (<32 x half>, <32 x half>, i32)
6496 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6497 // (<16 x half>, <16 x half>, i16)
6498 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6499 // (<8 x half>, <8 x half>, i8)
6500 //
6501 // TODO: 3-operand variants are not handled:
6502 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6503 // (<2 x double>, <2 x double>, <2 x double>, i8)
6504 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6505 // (<4 x float>, <4 x float>, <4 x float>, i8)
6506 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6507 // (<8 x half>, <8 x half>, <8 x half>, i8)
6508 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6509 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6510 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6511 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6512 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6513 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6514 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6515 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6516 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6517 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6518 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6519 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6520 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6521 /*MaskIndex=*/2);
6522 break;
6523
6524 // AVX512/AVX10 Reciprocal
6525 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6526 // (<16 x float>, <16 x float>, i16)
6527 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6528 // (<8 x float>, <8 x float>, i8)
6529 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6530 // (<4 x float>, <4 x float>, i8)
6531 //
6532 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6533 // (<8 x double>, <8 x double>, i8)
6534 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6535 // (<4 x double>, <4 x double>, i8)
6536 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6537 // (<2 x double>, <2 x double>, i8)
6538 //
6539 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6540 // (<32 x bfloat>, <32 x bfloat>, i32)
6541 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6542 // (<16 x bfloat>, <16 x bfloat>, i16)
6543 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6544 // (<8 x bfloat>, <8 x bfloat>, i8)
6545 //
6546 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6547 // (<32 x half>, <32 x half>, i32)
6548 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6549 // (<16 x half>, <16 x half>, i16)
6550 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6551 // (<8 x half>, <8 x half>, i8)
6552 //
6553 // TODO: 3-operand variants are not handled:
6554 // <2 x double> @llvm.x86.avx512.rcp14.sd
6555 // (<2 x double>, <2 x double>, <2 x double>, i8)
6556 // <4 x float> @llvm.x86.avx512.rcp14.ss
6557 // (<4 x float>, <4 x float>, <4 x float>, i8)
6558 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6559 // (<8 x half>, <8 x half>, <8 x half>, i8)
6560 case Intrinsic::x86_avx512_rcp14_ps_512:
6561 case Intrinsic::x86_avx512_rcp14_ps_256:
6562 case Intrinsic::x86_avx512_rcp14_ps_128:
6563 case Intrinsic::x86_avx512_rcp14_pd_512:
6564 case Intrinsic::x86_avx512_rcp14_pd_256:
6565 case Intrinsic::x86_avx512_rcp14_pd_128:
6566 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6567 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6568 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6569 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6570 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6571 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6572 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6573 /*MaskIndex=*/2);
6574 break;
6575
6576 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6577 // (<32 x half>, i32, <32 x half>, i32, i32)
6578 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6579 // (<16 x half>, i32, <16 x half>, i32, i16)
6580 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6581 // (<8 x half>, i32, <8 x half>, i32, i8)
6582 //
6583 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6584 // (<16 x float>, i32, <16 x float>, i16, i32)
6585 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6586 // (<8 x float>, i32, <8 x float>, i8)
6587 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6588 // (<4 x float>, i32, <4 x float>, i8)
6589 //
6590 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6591 // (<8 x double>, i32, <8 x double>, i8, i32)
6592 // A Imm WriteThru Mask Rounding
6593 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6594 // (<4 x double>, i32, <4 x double>, i8)
6595 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6596 // (<2 x double>, i32, <2 x double>, i8)
6597 // A Imm WriteThru Mask
6598 //
6599 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6600 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6601 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6602 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6603 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6604 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6605 //
6606 // Not supported: three vectors
6607 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6608 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6609 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6610 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6611 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6612 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6613 // i32)
6614 // A B WriteThru Mask Imm
6615 // Rounding
6616 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6617 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6618 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6619 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6620 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6621 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6622 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6623 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6624 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6625 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6626 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6627 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6628 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6629 /*MaskIndex=*/3);
6630 break;
6631
6632 // AVX512 FP16 Arithmetic
6633 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6634 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6635 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6636 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6637 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6638 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6639 visitGenericScalarHalfwordInst(I);
6640 break;
6641 }
6642
6643 // AVX Galois Field New Instructions
6644 case Intrinsic::x86_vgf2p8affineqb_128:
6645 case Intrinsic::x86_vgf2p8affineqb_256:
6646 case Intrinsic::x86_vgf2p8affineqb_512:
6647 handleAVXGF2P8Affine(I);
6648 break;
6649
6650 default:
6651 return false;
6652 }
6653
6654 return true;
6655 }
6656
6657 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6658 switch (I.getIntrinsicID()) {
6659 case Intrinsic::aarch64_neon_rshrn:
6660 case Intrinsic::aarch64_neon_sqrshl:
6661 case Intrinsic::aarch64_neon_sqrshrn:
6662 case Intrinsic::aarch64_neon_sqrshrun:
6663 case Intrinsic::aarch64_neon_sqshl:
6664 case Intrinsic::aarch64_neon_sqshlu:
6665 case Intrinsic::aarch64_neon_sqshrn:
6666 case Intrinsic::aarch64_neon_sqshrun:
6667 case Intrinsic::aarch64_neon_srshl:
6668 case Intrinsic::aarch64_neon_sshl:
6669 case Intrinsic::aarch64_neon_uqrshl:
6670 case Intrinsic::aarch64_neon_uqrshrn:
6671 case Intrinsic::aarch64_neon_uqshl:
6672 case Intrinsic::aarch64_neon_uqshrn:
6673 case Intrinsic::aarch64_neon_urshl:
6674 case Intrinsic::aarch64_neon_ushl:
6675 // Not handled here: aarch64_neon_vsli (vector shift left and insert)
6676 handleVectorShiftIntrinsic(I, /* Variable */ false);
6677 break;
6678
6679 // TODO: handling max/min similarly to AND/OR may be more precise
6680 // Floating-Point Maximum/Minimum Pairwise
6681 case Intrinsic::aarch64_neon_fmaxp:
6682 case Intrinsic::aarch64_neon_fminp:
6683 // Floating-Point Maximum/Minimum Number Pairwise
6684 case Intrinsic::aarch64_neon_fmaxnmp:
6685 case Intrinsic::aarch64_neon_fminnmp:
6686 // Signed/Unsigned Maximum/Minimum Pairwise
6687 case Intrinsic::aarch64_neon_smaxp:
6688 case Intrinsic::aarch64_neon_sminp:
6689 case Intrinsic::aarch64_neon_umaxp:
6690 case Intrinsic::aarch64_neon_uminp:
6691 // Add Pairwise
6692 case Intrinsic::aarch64_neon_addp:
6693 // Floating-point Add Pairwise
6694 case Intrinsic::aarch64_neon_faddp:
6695 // Add Long Pairwise
6696 case Intrinsic::aarch64_neon_saddlp:
6697 case Intrinsic::aarch64_neon_uaddlp: {
6698 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6699 break;
6700 }
6701
6702 // Floating-point Convert to integer, rounding to nearest with ties to Away
6703 case Intrinsic::aarch64_neon_fcvtas:
6704 case Intrinsic::aarch64_neon_fcvtau:
6705 // Floating-point convert to integer, rounding toward minus infinity
6706 case Intrinsic::aarch64_neon_fcvtms:
6707 case Intrinsic::aarch64_neon_fcvtmu:
6708 // Floating-point convert to integer, rounding to nearest with ties to even
6709 case Intrinsic::aarch64_neon_fcvtns:
6710 case Intrinsic::aarch64_neon_fcvtnu:
6711 // Floating-point convert to integer, rounding toward plus infinity
6712 case Intrinsic::aarch64_neon_fcvtps:
6713 case Intrinsic::aarch64_neon_fcvtpu:
6714 // Floating-point Convert to integer, rounding toward Zero
6715 case Intrinsic::aarch64_neon_fcvtzs:
6716 case Intrinsic::aarch64_neon_fcvtzu:
6717 // Floating-point convert to lower precision narrow, rounding to odd
6718 case Intrinsic::aarch64_neon_fcvtxn: {
6719 handleNEONVectorConvertIntrinsic(I);
6720 break;
6721 }
6722
6723 // Add reduction to scalar
6724 case Intrinsic::aarch64_neon_faddv:
6725 case Intrinsic::aarch64_neon_saddv:
6726 case Intrinsic::aarch64_neon_uaddv:
6727 // Signed/Unsigned min/max (Vector)
6728 // TODO: handling similarly to AND/OR may be more precise.
6729 case Intrinsic::aarch64_neon_smaxv:
6730 case Intrinsic::aarch64_neon_sminv:
6731 case Intrinsic::aarch64_neon_umaxv:
6732 case Intrinsic::aarch64_neon_uminv:
6733 // Floating-point min/max (vector)
6734 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6735 // but our shadow propagation is the same.
6736 case Intrinsic::aarch64_neon_fmaxv:
6737 case Intrinsic::aarch64_neon_fminv:
6738 case Intrinsic::aarch64_neon_fmaxnmv:
6739 case Intrinsic::aarch64_neon_fminnmv:
6740 // Sum long across vector
6741 case Intrinsic::aarch64_neon_saddlv:
6742 case Intrinsic::aarch64_neon_uaddlv:
6743 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6744 break;
6745
6746 case Intrinsic::aarch64_neon_ld1x2:
6747 case Intrinsic::aarch64_neon_ld1x3:
6748 case Intrinsic::aarch64_neon_ld1x4:
6749 case Intrinsic::aarch64_neon_ld2:
6750 case Intrinsic::aarch64_neon_ld3:
6751 case Intrinsic::aarch64_neon_ld4:
6752 case Intrinsic::aarch64_neon_ld2r:
6753 case Intrinsic::aarch64_neon_ld3r:
6754 case Intrinsic::aarch64_neon_ld4r: {
6755 handleNEONVectorLoad(I, /*WithLane=*/false);
6756 break;
6757 }
6758
6759 case Intrinsic::aarch64_neon_ld2lane:
6760 case Intrinsic::aarch64_neon_ld3lane:
6761 case Intrinsic::aarch64_neon_ld4lane: {
6762 handleNEONVectorLoad(I, /*WithLane=*/true);
6763 break;
6764 }
6765
6766 // Saturating extract narrow
6767 case Intrinsic::aarch64_neon_sqxtn:
6768 case Intrinsic::aarch64_neon_sqxtun:
6769 case Intrinsic::aarch64_neon_uqxtn:
6770 // These only have one argument, but we (ab)use handleShadowOr because it
6771 // does work on single argument intrinsics and will typecast the shadow
6772 // (and update the origin).
6773 handleShadowOr(I);
6774 break;
6775
6776 case Intrinsic::aarch64_neon_st1x2:
6777 case Intrinsic::aarch64_neon_st1x3:
6778 case Intrinsic::aarch64_neon_st1x4:
6779 case Intrinsic::aarch64_neon_st2:
6780 case Intrinsic::aarch64_neon_st3:
6781 case Intrinsic::aarch64_neon_st4: {
6782 handleNEONVectorStoreIntrinsic(I, false);
6783 break;
6784 }
6785
6786 case Intrinsic::aarch64_neon_st2lane:
6787 case Intrinsic::aarch64_neon_st3lane:
6788 case Intrinsic::aarch64_neon_st4lane: {
6789 handleNEONVectorStoreIntrinsic(I, true);
6790 break;
6791 }
6792
6793 // Arm NEON vector table intrinsics have the source/table register(s) as
6794 // arguments, followed by the index register. They return the output.
6795 //
6796 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6797 // original value unchanged in the destination register.'
6798 // Conveniently, zero denotes a clean shadow, which means out-of-range
6799 // indices for TBL will initialize the user data with zero and also clean
6800 // the shadow. (For TBX, neither the user data nor the shadow will be
6801 // updated, which is also correct.)
6802 case Intrinsic::aarch64_neon_tbl1:
6803 case Intrinsic::aarch64_neon_tbl2:
6804 case Intrinsic::aarch64_neon_tbl3:
6805 case Intrinsic::aarch64_neon_tbl4:
6806 case Intrinsic::aarch64_neon_tbx1:
6807 case Intrinsic::aarch64_neon_tbx2:
6808 case Intrinsic::aarch64_neon_tbx3:
6809 case Intrinsic::aarch64_neon_tbx4: {
6810 // The last trailing argument (index register) should be handled verbatim
6811 handleIntrinsicByApplyingToShadow(
6812 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6813 /*trailingVerbatimArgs*/ 1);
6814 break;
6815 }
6816
6817 case Intrinsic::aarch64_neon_fmulx:
6818 case Intrinsic::aarch64_neon_pmul:
6819 case Intrinsic::aarch64_neon_pmull:
6820 case Intrinsic::aarch64_neon_smull:
6821 case Intrinsic::aarch64_neon_pmull64:
6822 case Intrinsic::aarch64_neon_umull: {
6823 handleNEONVectorMultiplyIntrinsic(I);
6824 break;
6825 }
6826
6827 case Intrinsic::aarch64_neon_smmla:
6828 case Intrinsic::aarch64_neon_ummla:
6829 case Intrinsic::aarch64_neon_usmmla:
6830 handleNEONMatrixMultiply(I, /*ARows=*/2, /*ACols=*/8, /*BRows=*/8,
6831 /*BCols=*/2);
6832 break;
6833
6834 default:
6835 return false;
6836 }
6837
6838 return true;
6839 }
6840
6841 void visitIntrinsicInst(IntrinsicInst &I) {
6842 if (maybeHandleCrossPlatformIntrinsic(I))
6843 return;
6844
6845 if (maybeHandleX86SIMDIntrinsic(I))
6846 return;
6847
6848 if (maybeHandleArmSIMDIntrinsic(I))
6849 return;
6850
6851 if (maybeHandleUnknownIntrinsic(I))
6852 return;
6853
6854 visitInstruction(I);
6855 }
6856
6857 void visitLibAtomicLoad(CallBase &CB) {
6858 // Since we use getNextNode here, we can't have CB terminate the BB.
6859 assert(isa<CallInst>(CB));
6860
6861 IRBuilder<> IRB(&CB);
6862 Value *Size = CB.getArgOperand(0);
6863 Value *SrcPtr = CB.getArgOperand(1);
6864 Value *DstPtr = CB.getArgOperand(2);
6865 Value *Ordering = CB.getArgOperand(3);
6866 // Convert the call to have at least Acquire ordering to make sure
6867 // the shadow operations aren't reordered before it.
6868 Value *NewOrdering =
6869 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
6870 CB.setArgOperand(3, NewOrdering);
6871
6872 NextNodeIRBuilder NextIRB(&CB);
6873 Value *SrcShadowPtr, *SrcOriginPtr;
6874 std::tie(SrcShadowPtr, SrcOriginPtr) =
6875 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6876 /*isStore*/ false);
6877 Value *DstShadowPtr =
6878 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6879 /*isStore*/ true)
6880 .first;
6881
6882 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
6883 if (MS.TrackOrigins) {
6884 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
6885 kMinOriginAlignment);
6886 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
6887 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
6888 }
6889 }
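// [Editorial illustration, not part of the original source.] For a call such
// as __atomic_load(8, &src, &dst, __ATOMIC_RELAXED), the ordering argument
// is upgraded to at least acquire before the call, and the shadow bytes of
// src are copied over the shadow of dst right after the call, so the
// application copy and the shadow copy cannot be observed out of order.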
6890
6891 void visitLibAtomicStore(CallBase &CB) {
6892 IRBuilder<> IRB(&CB);
6893 Value *Size = CB.getArgOperand(0);
6894 Value *DstPtr = CB.getArgOperand(2);
6895 Value *Ordering = CB.getArgOperand(3);
6896 // Convert the call to have at least Release ordering to make sure
6897 // the shadow operations aren't reordered after it.
6898 Value *NewOrdering =
6899 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
6900 CB.setArgOperand(3, NewOrdering);
6901
6902 Value *DstShadowPtr =
6903 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
6904 /*isStore*/ true)
6905 .first;
6906
6907 // Atomic store always paints clean shadow/origin. See file header.
6908 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
6909 Align(1));
6910 }
6911
6912 void visitCallBase(CallBase &CB) {
6913 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
6914 if (CB.isInlineAsm()) {
6915 // For inline asm (either a call to asm function, or callbr instruction),
6916 // do the usual thing: check argument shadow and mark all outputs as
6917 // clean. Note that any side effects of the inline asm that are not
6918 // immediately visible in its constraints are not handled.
6919 if (ClHandleAsmConservative && MS.CompileKernel)
6920 visitAsmInstruction(CB);
6921 else
6922 visitInstruction(CB);
6923 return;
6924 }
6925 LibFunc LF;
6926 if (TLI->getLibFunc(CB, LF)) {
6927 // libatomic.a functions need to have special handling because there isn't
6928 // a good way to intercept them or compile the library with
6929 // instrumentation.
6930 switch (LF) {
6931 case LibFunc_atomic_load:
6932 if (!isa<CallInst>(CB)) {
6933 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load."
6934 " Ignoring!\n";
6935 break;
6936 }
6937 visitLibAtomicLoad(CB);
6938 return;
6939 case LibFunc_atomic_store:
6940 visitLibAtomicStore(CB);
6941 return;
6942 default:
6943 break;
6944 }
6945 }
6946
6947 if (auto *Call = dyn_cast<CallInst>(&CB)) {
6948 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
6949
6950 // We are going to insert code that relies on the fact that the callee
6951 // will become a non-readonly function after it is instrumented by us. To
6952 // prevent this code from being optimized out, mark that function
6953 // non-readonly in advance.
6954 // TODO: We can likely do better than dropping memory() completely here.
6955 AttributeMask B;
6956 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
6957
6958 Call->removeFnAttrs(B);
6959 if (Function *Func = Call->getCalledFunction()) {
6960 Func->removeFnAttrs(B);
6961 }
6962
6963 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
6964 }
6965 IRBuilder<> IRB(&CB);
6966 bool MayCheckCall = MS.EagerChecks;
6967 if (Function *Func = CB.getCalledFunction()) {
6968 // __sanitizer_unaligned_{load,store} functions may be called by users
6969 // and always expect shadows in the TLS, so don't check them.
6970 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
6971 }
6972
6973 unsigned ArgOffset = 0;
6974 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
6975 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
6976 if (!A->getType()->isSized()) {
6977 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
6978 continue;
6979 }
6980
6981 if (A->getType()->isScalableTy()) {
6982 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
6983 // Handle as noundef, but don't reserve tls slots.
6984 insertCheckShadowOf(A, &CB);
6985 continue;
6986 }
6987
6988 unsigned Size = 0;
6989 const DataLayout &DL = F.getDataLayout();
6990
6991 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
6992 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
6993 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
6994
6995 if (EagerCheck) {
6996 insertCheckShadowOf(A, &CB);
6997 Size = DL.getTypeAllocSize(A->getType());
6998 } else {
6999 [[maybe_unused]] Value *Store = nullptr;
7000 // Compute the Shadow for arg even if it is ByVal, because
7001 // in that case getShadow() will copy the actual arg shadow to
7002 // __msan_param_tls.
7003 Value *ArgShadow = getShadow(A);
7004 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
7005 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
7006 << " Shadow: " << *ArgShadow << "\n");
7007 if (ByVal) {
7008 // ByVal requires some special handling as it's too big for a single
7009 // load
7010 assert(A->getType()->isPointerTy() &&
7011 "ByVal argument is not a pointer!");
7012 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
7013 if (ArgOffset + Size > kParamTLSSize)
7014 break;
7015 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
7016 MaybeAlign Alignment = std::nullopt;
7017 if (ParamAlignment)
7018 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
7019 Value *AShadowPtr, *AOriginPtr;
7020 std::tie(AShadowPtr, AOriginPtr) =
7021 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
7022 /*isStore*/ false);
7023 if (!PropagateShadow) {
7024 Store = IRB.CreateMemSet(ArgShadowBase,
7025 Constant::getNullValue(IRB.getInt8Ty()),
7026 Size, Alignment);
7027 } else {
7028 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
7029 Alignment, Size);
7030 if (MS.TrackOrigins) {
7031 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
7032 // FIXME: OriginSize should be:
7033 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
7034 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
7035 IRB.CreateMemCpy(
7036 ArgOriginBase,
7037 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
7038 AOriginPtr,
7039 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
7040 }
7041 }
7042 } else {
7043 // Any other parameters mean we need bit-grained tracking of uninit
7044 // data
7045 Size = DL.getTypeAllocSize(A->getType());
7046 if (ArgOffset + Size > kParamTLSSize)
7047 break;
7048 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
7049 kShadowTLSAlignment);
7050 Constant *Cst = dyn_cast<Constant>(ArgShadow);
7051 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
7052 IRB.CreateStore(getOrigin(A),
7053 getOriginPtrForArgument(IRB, ArgOffset));
7054 }
7055 }
7056 assert(Store != nullptr);
7057 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
7058 }
7059 assert(Size != 0);
7060 ArgOffset += alignTo(Size, kShadowTLSAlignment);
7061 }
7062 LLVM_DEBUG(dbgs() << " done with call args\n");
7063
7064 FunctionType *FT = CB.getFunctionType();
7065 if (FT->isVarArg()) {
7066 VAHelper->visitCallBase(CB, IRB);
7067 }
7068
7069 // Now, get the shadow for the RetVal.
7070 if (!CB.getType()->isSized())
7071 return;
7072 // Don't emit the epilogue for musttail call returns.
7073 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
7074 return;
7075
7076 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
7077 setShadow(&CB, getCleanShadow(&CB));
7078 setOrigin(&CB, getCleanOrigin());
7079 return;
7080 }
7081
7082 IRBuilder<> IRBBefore(&CB);
7083 // Until we have full dynamic coverage, make sure the retval shadow is 0.
7084 Value *Base = getShadowPtrForRetval(IRBBefore);
7085 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
7086 kShadowTLSAlignment);
7087 BasicBlock::iterator NextInsn;
7088 if (isa<CallInst>(CB)) {
7089 NextInsn = ++CB.getIterator();
7090 assert(NextInsn != CB.getParent()->end());
7091 } else {
7092 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
7093 if (!NormalDest->getSinglePredecessor()) {
7094 // FIXME: this case is tricky, so we are just conservative here.
7095 // Perhaps we need to split the edge between this BB and NormalDest,
7096 // but a naive attempt to use SplitEdge leads to a crash.
7097 setShadow(&CB, getCleanShadow(&CB));
7098 setOrigin(&CB, getCleanOrigin());
7099 return;
7100 }
7101 // FIXME: NextInsn is likely in a basic block that has not been visited
7102 // yet. Anything inserted there will be instrumented by MSan later!
7103 NextInsn = NormalDest->getFirstInsertionPt();
7104 assert(NextInsn != NormalDest->end() &&
7105 "Could not find insertion point for retval shadow load");
7106 }
7107 IRBuilder<> IRBAfter(&*NextInsn);
7108 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7109 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
7110 "_msret");
7111 setShadow(&CB, RetvalShadow);
7112 if (MS.TrackOrigins)
7113 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
7114 }
7115
7116 bool isAMustTailRetVal(Value *RetVal) {
7117 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
7118 RetVal = I->getOperand(0);
7119 }
7120 if (auto *I = dyn_cast<CallInst>(RetVal)) {
7121 return I->isMustTailCall();
7122 }
7123 return false;
7124 }
7125
7126 void visitReturnInst(ReturnInst &I) {
7127 IRBuilder<> IRB(&I);
7128 Value *RetVal = I.getReturnValue();
7129 if (!RetVal)
7130 return;
7131 // Don't emit the epilogue for musttail call returns.
7132 if (isAMustTailRetVal(RetVal))
7133 return;
7134 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7135 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
7136 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7137 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7138 // must always return fully initialized values. For now, we hardcode "main".
7139 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7140
7141 Value *Shadow = getShadow(RetVal);
7142 bool StoreOrigin = true;
7143 if (EagerCheck) {
7144 insertCheckShadowOf(RetVal, &I);
7145 Shadow = getCleanShadow(RetVal);
7146 StoreOrigin = false;
7147 }
7148
7149 // The caller may still expect information passed over TLS if we pass our
7150 // check.
7151 if (StoreShadow) {
7152 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
7153 if (MS.TrackOrigins && StoreOrigin)
7154 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
7155 }
7156 }
7157
7158 void visitPHINode(PHINode &I) {
7159 IRBuilder<> IRB(&I);
7160 if (!PropagateShadow) {
7161 setShadow(&I, getCleanShadow(&I));
7162 setOrigin(&I, getCleanOrigin());
7163 return;
7164 }
7165
7166 ShadowPHINodes.push_back(&I);
7167 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
7168 "_msphi_s"));
7169 if (MS.TrackOrigins)
7170 setOrigin(
7171 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
7172 }
7173
7174 Value *getLocalVarIdptr(AllocaInst &I) {
7175 ConstantInt *IntConst =
7176 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
7177 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7178 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7179 IntConst);
7180 }
7181
7182 Value *getLocalVarDescription(AllocaInst &I) {
7183 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
7184 }
7185
7186 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7187 if (PoisonStack && ClPoisonStackWithCall) {
7188 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
7189 } else {
7190 Value *ShadowBase, *OriginBase;
7191 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
7192 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
7193
7194 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
7195 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
7196 }
7197
7198 if (PoisonStack && MS.TrackOrigins) {
7199 Value *Idptr = getLocalVarIdptr(I);
7200 if (ClPrintStackNames) {
7201 Value *Descr = getLocalVarDescription(I);
7202 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
7203 {&I, Len, Idptr, Descr});
7204 } else {
7205 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
7206 }
7207 }
7208 }
7209
7210 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7211 Value *Descr = getLocalVarDescription(I);
7212 if (PoisonStack) {
7213 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
7214 } else {
7215 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
7216 }
7217 }
7218
7219 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7220 if (!InsPoint)
7221 InsPoint = &I;
7222 NextNodeIRBuilder IRB(InsPoint);
7223 const DataLayout &DL = F.getDataLayout();
7224 TypeSize TS = DL.getTypeAllocSize(I.getAllocatedType());
7225 Value *Len = IRB.CreateTypeSize(MS.IntptrTy, TS);
7226 if (I.isArrayAllocation())
7227 Len = IRB.CreateMul(Len,
7228 IRB.CreateZExtOrTrunc(I.getArraySize(), MS.IntptrTy));
7229
7230 if (MS.CompileKernel)
7231 poisonAllocaKmsan(I, IRB, Len);
7232 else
7233 poisonAllocaUserspace(I, IRB, Len);
7234 }
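// [Editorial illustration, not part of the original source.] For an array
// alloca such as
//   %buf = alloca [4 x i32], i64 %n
// Len is computed as 16 * %n, and that many shadow bytes starting at the
// alloca's shadow address are poisoned (or unpoisoned when stack poisoning
// is disabled).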
7235
7236 void visitAllocaInst(AllocaInst &I) {
7237 setShadow(&I, getCleanShadow(&I));
7238 setOrigin(&I, getCleanOrigin());
7239 // We'll get to this alloca later unless it's poisoned at the corresponding
7240 // llvm.lifetime.start.
7241 AllocaSet.insert(&I);
7242 }
7243
7244 void visitSelectInst(SelectInst &I) {
7245 // a = select b, c, d
7246 Value *B = I.getCondition();
7247 Value *C = I.getTrueValue();
7248 Value *D = I.getFalseValue();
7249
7250 handleSelectLikeInst(I, B, C, D);
7251 }
7252
7253 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7254 IRBuilder<> IRB(&I);
7255
7256 Value *Sb = getShadow(B);
7257 Value *Sc = getShadow(C);
7258 Value *Sd = getShadow(D);
7259
7260 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
7261 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
7262 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
7263
7264 // Result shadow if condition shadow is 0.
7265 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
7266 Value *Sa1;
7267 if (I.getType()->isAggregateType()) {
7268 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7269 // an extra "select". This results in much more compact IR.
7270 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7271 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7272 } else if (isScalableNonVectorType(I.getType())) {
7273 // This is intended to handle target("aarch64.svcount"), which can't be
7274 // handled in the else branch because of incompatibility with CreateXor
7275 // ("The supported LLVM operations on this type are limited to load,
7276 // store, phi, select and alloca instructions").
7277
7278 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7279 // branch as needed instead.
7280 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7281 } else {
7282 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7283 // If Sb (condition is poisoned), look for bits in c and d that are equal
7284 // and both unpoisoned.
7285 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
7286
7287 // Cast arguments to shadow-compatible type.
7288 C = CreateAppToShadowCast(IRB, C);
7289 D = CreateAppToShadowCast(IRB, D);
7290
7291 // Result shadow if condition shadow is 1.
7292 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7293 }
7294 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7295 setShadow(&I, Sa);
7296 if (MS.TrackOrigins) {
7297 // Origins are always i32, so any vector conditions must be flattened.
7298 // FIXME: consider tracking vector origins for app vectors?
7299 if (B->getType()->isVectorTy()) {
7300 B = convertToBool(B, IRB);
7301 Sb = convertToBool(Sb, IRB);
7302 }
7303 // a = select b, c, d
7304 // Oa = Sb ? Ob : (b ? Oc : Od)
7305 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7306 }
7307 }
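// [Editorial worked example, not part of the original source.] For
//   %a = select i1 %b, i32 %c, i32 %d
// with a poisoned %b but %c == %d and both fully initialized, the term
//   Sa1 = (C ^ D) | Sc | Sd
// evaluates to 0, so %a is treated as clean: whichever way the poisoned
// condition resolves, the observable result is the same defined value.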
7308
7309 void visitLandingPadInst(LandingPadInst &I) {
7310 // Do nothing.
7311 // See https://github.com/google/sanitizers/issues/504
7312 setShadow(&I, getCleanShadow(&I));
7313 setOrigin(&I, getCleanOrigin());
7314 }
7315
7316 void visitCatchSwitchInst(CatchSwitchInst &I) {
7317 setShadow(&I, getCleanShadow(&I));
7318 setOrigin(&I, getCleanOrigin());
7319 }
7320
7321 void visitFuncletPadInst(FuncletPadInst &I) {
7322 setShadow(&I, getCleanShadow(&I));
7323 setOrigin(&I, getCleanOrigin());
7324 }
7325
7326 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7327
7328 void visitExtractValueInst(ExtractValueInst &I) {
7329 IRBuilder<> IRB(&I);
7330 Value *Agg = I.getAggregateOperand();
7331 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7332 Value *AggShadow = getShadow(Agg);
7333 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7334 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7335 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7336 setShadow(&I, ResShadow);
7337 setOriginForNaryOp(I);
7338 }
7339
7340 void visitInsertValueInst(InsertValueInst &I) {
7341 IRBuilder<> IRB(&I);
7342 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7343 Value *AggShadow = getShadow(I.getAggregateOperand());
7344 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7345 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7346 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7347 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7348 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7349 setShadow(&I, Res);
7350 setOriginForNaryOp(I);
7351 }
7352
7353 void dumpInst(Instruction &I) {
7354 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7355 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7356 } else {
7357 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7358 }
7359 errs() << "QQQ " << I << "\n";
7360 }
7361
7362 void visitResumeInst(ResumeInst &I) {
7363 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7364 // Nothing to do here.
7365 }
7366
7367 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7368 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7369 // Nothing to do here.
7370 }
7371
7372 void visitCatchReturnInst(CatchReturnInst &CRI) {
7373 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7374 // Nothing to do here.
7375 }
7376
7377 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7378 IRBuilder<> &IRB, const DataLayout &DL,
7379 bool isOutput) {
7380 // For each assembly argument, we check its value for being initialized.
7381 // If the argument is a pointer, we assume it points to a single element
7382 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7383 // Each such pointer is instrumented with a call to the runtime library.
7384 Type *OpType = Operand->getType();
7385 // Check the operand value itself.
7386 insertCheckShadowOf(Operand, &I);
7387 if (!OpType->isPointerTy() || !isOutput) {
7388 assert(!isOutput);
7389 return;
7390 }
7391 if (!ElemTy->isSized())
7392 return;
7393 auto Size = DL.getTypeStoreSize(ElemTy);
7394 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7395 if (MS.CompileKernel) {
7396 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7397 } else {
7398 // ElemTy, derived from elementtype(), does not encode the alignment of
7399 // the pointer. Conservatively assume that the shadow memory is unaligned.
7400 // When Size is large, avoid StoreInst as it would expand to many
7401 // instructions.
7402 auto [ShadowPtr, _] =
7403 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7404 if (Size <= 32)
7405 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7406 else
7407 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7408 SizeVal, Align(1));
7409 }
7410 }
7411
7412 /// Get the number of output arguments returned by pointers.
7413 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7414 int NumRetOutputs = 0;
7415 int NumOutputs = 0;
7416 Type *RetTy = cast<Value>(CB)->getType();
7417 if (!RetTy->isVoidTy()) {
7418 // Register outputs are returned via the CallInst return value.
7419 auto *ST = dyn_cast<StructType>(RetTy);
7420 if (ST)
7421 NumRetOutputs = ST->getNumElements();
7422 else
7423 NumRetOutputs = 1;
7424 }
7425 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7426 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7427 switch (Info.Type) {
7428 case InlineAsm::isOutput:
7429 NumOutputs++;
7430 break;
7431 default:
7432 break;
7433 }
7434 }
7435 return NumOutputs - NumRetOutputs;
7436 }
7437
7438 void visitAsmInstruction(Instruction &I) {
7439 // Conservative inline assembly handling: check for poisoned shadow of
7440 // asm() arguments, then unpoison the result and all the memory locations
7441 // pointed to by those arguments.
7442 // An inline asm() statement in C++ contains lists of input and output
7443 // arguments used by the assembly code. These are mapped to operands of the
7444 // CallInst as follows:
7445 // - nR register outputs ("=r") are returned by value in a single structure
7446 // (SSA value of the CallInst);
7447 // - nO other outputs ("=m" and others) are returned by pointer as first
7448 // nO operands of the CallInst;
7449 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7450 // remaining nI operands.
7451 // The total number of asm() arguments in the source is nR+nO+nI, and the
7452 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7453 // function to be called).
7454 const DataLayout &DL = F.getDataLayout();
7455 CallBase *CB = cast<CallBase>(&I);
7456 IRBuilder<> IRB(&I);
7457 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7458 int OutputArgs = getNumOutputArgs(IA, CB);
7459 // The last operand of a CallInst is the function itself.
7460 int NumOperands = CB->getNumOperands() - 1;
7461
7462 // Check input arguments. Do this before unpoisoning output arguments, so
7463 // that we don't overwrite uninitialized values before checking them.
7464 for (int i = OutputArgs; i < NumOperands; i++) {
7465 Value *Operand = CB->getOperand(i);
7466 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7467 /*isOutput*/ false);
7468 }
7469 // Unpoison output arguments. This must happen before the actual InlineAsm
7470 // call, so that the shadow for memory published in the asm() statement
7471 // remains valid.
7472 for (int i = 0; i < OutputArgs; i++) {
7473 Value *Operand = CB->getOperand(i);
7474 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7475 /*isOutput*/ true);
7476 }
7477
7478 setShadow(&I, getCleanShadow(&I));
7479 setOrigin(&I, getCleanOrigin());
7480 }
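// [Editorial illustration, not part of the original source.] For a statement
// like
//   asm volatile("movl %1, %0" : "=m"(out) : "r"(in));
// the register input `in` has its shadow checked, while the memory output
// `out` is passed by pointer and the shadow of its pointee is unpoisoned
// before the call, matching the conservative scheme described above.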
7481
7482 void visitFreezeInst(FreezeInst &I) {
7483 // Freeze always returns a fully defined value.
7484 setShadow(&I, getCleanShadow(&I));
7485 setOrigin(&I, getCleanOrigin());
7486 }
7487
7488 void visitInstruction(Instruction &I) {
7489 // Everything else: stop propagating and check for poisoned shadow.
7490 if (ClDumpStrictInstructions)
7491 dumpInst(I);
7492 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7493 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7494 Value *Operand = I.getOperand(i);
7495 if (Operand->getType()->isSized())
7496 insertCheckShadowOf(Operand, &I);
7497 }
7498 setShadow(&I, getCleanShadow(&I));
7499 setOrigin(&I, getCleanOrigin());
7500 }
7501};
7502
7503struct VarArgHelperBase : public VarArgHelper {
7504 Function &F;
7505 MemorySanitizer &MS;
7506 MemorySanitizerVisitor &MSV;
7507 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7508 const unsigned VAListTagSize;
7509
7510 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7511 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7512 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7513
7514 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7515 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7516 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7517 }
7518
7519 /// Compute the shadow address for a given va_arg.
7520 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7521 return IRB.CreatePtrAdd(
7522 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7523 }
7524
7525 /// Compute the shadow address for a given va_arg.
7526 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7527 unsigned ArgSize) {
7528 // Make sure we don't overflow __msan_va_arg_tls.
7529 if (ArgOffset + ArgSize > kParamTLSSize)
7530 return nullptr;
7531 return getShadowPtrForVAArgument(IRB, ArgOffset);
7532 }
7533
7534 /// Compute the origin address for a given va_arg.
7535 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7536 // getOriginPtrForVAArgument() is always called after
7537 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7538 // overflow.
7539 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7540 ConstantInt::get(MS.IntptrTy, ArgOffset),
7541 "_msarg_va_o");
7542 }
7543
7544 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7545 unsigned BaseOffset) {
7546 // The tail of __msan_va_arg_tls is not large enough to fit the full
7547 // value shadow, but it will be copied to the backup anyway. Make it
7548 // clean.
7549 if (BaseOffset >= kParamTLSSize)
7550 return;
7551 Value *TailSize =
7552 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7553 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7554 TailSize, Align(8));
7555 }
7556
7557 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7558 IRBuilder<> IRB(&I);
7559 Value *VAListTag = I.getArgOperand(0);
7560 const Align Alignment = Align(8);
7561 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7562 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7563 // Unpoison the whole __va_list_tag.
7564 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7565 VAListTagSize, Alignment, false);
7566 }
7567
7568 void visitVAStartInst(VAStartInst &I) override {
7569 if (F.getCallingConv() == CallingConv::Win64)
7570 return;
7571 VAStartInstrumentationList.push_back(&I);
7572 unpoisonVAListTagForInst(I);
7573 }
7574
7575 void visitVACopyInst(VACopyInst &I) override {
7576 if (F.getCallingConv() == CallingConv::Win64)
7577 return;
7578 unpoisonVAListTagForInst(I);
7579 }
7580};
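// [Added note] Common protocol shared by the target-specific helpers below:
//  1. visitCallBase() runs at every call site and writes the shadow of the
//     outgoing variadic arguments into __msan_va_arg_tls using a
//     target-specific layout, plus the total vararg size into
//     MS.VAArgOverflowSizeTLS.
//  2. finalizeInstrumentation() runs in the callee: it snapshots
//     __msan_va_arg_tls into a local alloca in the function prologue, and at
//     every va_start it copies that snapshot over the shadow of the va_list's
//     register-save and overflow areas.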
7581
7582/// AMD64-specific implementation of VarArgHelper.
7583struct VarArgAMD64Helper : public VarArgHelperBase {
7584 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7585 // See a comment in visitCallBase for more details.
7586 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7587 static const unsigned AMD64FpEndOffsetSSE = 176;
7588 // If SSE is disabled, fp_offset in va_list is zero.
7589 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7590
7591 unsigned AMD64FpEndOffset;
7592 AllocaInst *VAArgTLSCopy = nullptr;
7593 AllocaInst *VAArgTLSOriginCopy = nullptr;
7594 Value *VAArgOverflowSize = nullptr;
7595
7596 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7597
7598 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7599 MemorySanitizerVisitor &MSV)
7600 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7601 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7602 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7603 if (Attr.isStringAttribute() &&
7604 (Attr.getKindAsString() == "target-features")) {
7605 if (Attr.getValueAsString().contains("-sse"))
7606 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7607 break;
7608 }
7609 }
7610 }
7611
7612 ArgKind classifyArgument(Value *arg) {
7613 // A very rough approximation of X86_64 argument classification rules.
7614 Type *T = arg->getType();
7615 if (T->isX86_FP80Ty())
7616 return AK_Memory;
7617 if (T->isFPOrFPVectorTy())
7618 return AK_FloatingPoint;
7619 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7620 return AK_GeneralPurpose;
7621 if (T->isPointerTy())
7622 return AK_GeneralPurpose;
7623 return AK_Memory;
7624 }
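// [Added example] How the rough classification above plays out for a call
// like printf("%d %f %Lf", i, d, ld):
//   i  (i32)      -> AK_GeneralPurpose (GP register area)
//   d  (double)   -> AK_FloatingPoint  (SSE register area)
//   ld (x86_fp80) -> AK_Memory         (always passed on the stack)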
7625
7626 // For VarArg functions, store the argument shadow in an ABI-specific format
7627 // that corresponds to va_list layout.
7628 // We do this because Clang lowers va_arg in the frontend, and this pass
7629 // only sees the low-level code that deals with va_list internals.
7630 // A much easier alternative (provided that Clang emitted va_arg
7631 // instructions) would have been to associate each live instance of va_list
7632 // with a copy of MSanParamTLS, and extract the shadow on each va_arg()
7633 // call, in argument-list order.
7634 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7635 unsigned GpOffset = 0;
7636 unsigned FpOffset = AMD64GpEndOffset;
7637 unsigned OverflowOffset = AMD64FpEndOffset;
7638 const DataLayout &DL = F.getDataLayout();
7639
7640 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7641 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7642 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7643 if (IsByVal) {
7644 // ByVal arguments always go to the overflow area.
7645 // Fixed arguments passed through the overflow area will be stepped
7646 // over by va_start, so don't count them towards the offset.
7647 if (IsFixed)
7648 continue;
7649 assert(A->getType()->isPointerTy());
7650 Type *RealTy = CB.getParamByValType(ArgNo);
7651 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7652 uint64_t AlignedSize = alignTo(ArgSize, 8);
7653 unsigned BaseOffset = OverflowOffset;
7654 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7655 Value *OriginBase = nullptr;
7656 if (MS.TrackOrigins)
7657 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7658 OverflowOffset += AlignedSize;
7659
7660 if (OverflowOffset > kParamTLSSize) {
7661 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7662 continue; // We have no space to copy shadow there.
7663 }
7664
7665 Value *ShadowPtr, *OriginPtr;
7666 std::tie(ShadowPtr, OriginPtr) =
7667 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7668 /*isStore*/ false);
7669 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7670 kShadowTLSAlignment, ArgSize);
7671 if (MS.TrackOrigins)
7672 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7673 kShadowTLSAlignment, ArgSize);
7674 } else {
7675 ArgKind AK = classifyArgument(A);
7676 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7677 AK = AK_Memory;
7678 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7679 AK = AK_Memory;
7680 Value *ShadowBase, *OriginBase = nullptr;
7681 switch (AK) {
7682 case AK_GeneralPurpose:
7683 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7684 if (MS.TrackOrigins)
7685 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7686 GpOffset += 8;
7687 assert(GpOffset <= kParamTLSSize);
7688 break;
7689 case AK_FloatingPoint:
7690 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7691 if (MS.TrackOrigins)
7692 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7693 FpOffset += 16;
7694 assert(FpOffset <= kParamTLSSize);
7695 break;
7696 case AK_Memory:
7697 if (IsFixed)
7698 continue;
7699 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7700 uint64_t AlignedSize = alignTo(ArgSize, 8);
7701 unsigned BaseOffset = OverflowOffset;
7702 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7703 if (MS.TrackOrigins) {
7704 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7705 }
7706 OverflowOffset += AlignedSize;
7707 if (OverflowOffset > kParamTLSSize) {
7708 // We have no space to copy shadow there.
7709 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7710 continue;
7711 }
7712 }
7713 // Take fixed arguments into account for GpOffset and FpOffset,
7714 // but don't actually store shadows for them.
7715 // TODO(glider): don't call get*PtrForVAArgument() for them.
7716 if (IsFixed)
7717 continue;
7718 Value *Shadow = MSV.getShadow(A);
7719 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7720 if (MS.TrackOrigins) {
7721 Value *Origin = MSV.getOrigin(A);
7722 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7723 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7724 std::max(kShadowTLSAlignment, kMinOriginAlignment));
7725 }
7726 }
7727 }
7728 Constant *OverflowSize =
7729 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7730 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7731 }
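// [Added note] Resulting __msan_va_arg_tls layout for AMD64, derived from the
// offsets used above (with SSE enabled; without SSE the FP area is empty and
// the overflow area starts at 48):
//   [  0,  48)  shadow of args passed in GP registers (6 x 8 bytes)
//   [ 48, 176)  shadow of args passed in SSE registers (8 x 16 bytes)
//   [176, ...)  shadow of args passed on the stack (overflow area)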
7732
7733 void finalizeInstrumentation() override {
7734 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7735 "finalizeInstrumentation called twice");
7736 if (!VAStartInstrumentationList.empty()) {
7737 // If there is a va_start in this function, make a backup copy of
7738 // va_arg_tls somewhere in the function entry block.
7739 IRBuilder<> IRB(MSV.FnPrologueEnd);
7740 VAArgOverflowSize =
7741 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7742 Value *CopySize = IRB.CreateAdd(
7743 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7744 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7745 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7746 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7747 CopySize, kShadowTLSAlignment, false);
7748
7749 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7750 Intrinsic::umin, CopySize,
7751 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7752 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7753 kShadowTLSAlignment, SrcSize);
7754 if (MS.TrackOrigins) {
7755 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7756 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7757 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7758 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7759 }
7760 }
7761
7762 // Instrument va_start.
7763 // Copy va_list shadow from the backup copy of the TLS contents.
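// [Added note] The constant offsets 16 and 8 used below correspond to the
// standard System V AMD64 va_list layout:
//   struct __va_list_tag {
//     unsigned int gp_offset;     // +0
//     unsigned int fp_offset;     // +4
//     void *overflow_arg_area;    // +8
//     void *reg_save_area;        // +16
//   };                            // 24 bytes total (VAListTagSize above)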
7764 for (CallInst *OrigInst : VAStartInstrumentationList) {
7765 NextNodeIRBuilder IRB(OrigInst);
7766 Value *VAListTag = OrigInst->getArgOperand(0);
7767
7768 Value *RegSaveAreaPtrPtr =
7769 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
7770 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7771 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7772 const Align Alignment = Align(16);
7773 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7774 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7775 Alignment, /*isStore*/ true);
7776 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7777 AMD64FpEndOffset);
7778 if (MS.TrackOrigins)
7779 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7780 Alignment, AMD64FpEndOffset);
7781 Value *OverflowArgAreaPtrPtr =
7782 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
7783 Value *OverflowArgAreaPtr =
7784 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7785 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7786 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7787 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7788 Alignment, /*isStore*/ true);
7789 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7790 AMD64FpEndOffset);
7791 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7792 VAArgOverflowSize);
7793 if (MS.TrackOrigins) {
7794 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7795 AMD64FpEndOffset);
7796 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7797 VAArgOverflowSize);
7798 }
7799 }
7800 }
7801};
7802
7803/// AArch64-specific implementation of VarArgHelper.
7804struct VarArgAArch64Helper : public VarArgHelperBase {
7805 static const unsigned kAArch64GrArgSize = 64;
7806 static const unsigned kAArch64VrArgSize = 128;
7807
7808 static const unsigned AArch64GrBegOffset = 0;
7809 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7810 // Make VR space aligned to 16 bytes.
7811 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7812 static const unsigned AArch64VrEndOffset =
7813 AArch64VrBegOffset + kAArch64VrArgSize;
7814 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
7815
7816 AllocaInst *VAArgTLSCopy = nullptr;
7817 Value *VAArgOverflowSize = nullptr;
7818
7819 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7820
7821 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7822 MemorySanitizerVisitor &MSV)
7823 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7824
7825 // A very rough approximation of aarch64 argument classification rules.
7826 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7827 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7828 return {AK_GeneralPurpose, 1};
7829 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7830 return {AK_FloatingPoint, 1};
7831
7832 if (T->isArrayTy()) {
7833 auto R = classifyArgument(T->getArrayElementType());
7834 R.second *= T->getScalarType()->getArrayNumElements();
7835 return R;
7836 }
7837
7838 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7839 auto R = classifyArgument(FV->getScalarType());
7840 R.second *= FV->getNumElements();
7841 return R;
7842 }
7843
7844 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7845 return {AK_Memory, 0};
7846 }
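// [Added example] classifyArgument(i64) == {AK_GeneralPurpose, 1} and
// classifyArgument(double) == {AK_FloatingPoint, 1}; for an aggregate such as
// [4 x float], the recursion above yields {AK_FloatingPoint, 4}, i.e. four VR
// slots are consumed.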
7847
7848 // The instrumentation stores the argument shadow in a non-ABI-specific
7849 // format because it does not know which arguments are named (since Clang,
7850 // as in the x86_64 case, lowers va_arg in the frontend and this pass only
7851 // sees the low-level code that deals with va_list internals).
7852 // The first eight GR registers are saved in the first 64 bytes of the
7853 // va_arg TLS array, followed by the first 8 FP/SIMD registers, and then
7854 // the remaining arguments.
7855 // Using constant offsets within the va_arg TLS array allows a fast copy
7856 // in finalizeInstrumentation().
7857 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7858 unsigned GrOffset = AArch64GrBegOffset;
7859 unsigned VrOffset = AArch64VrBegOffset;
7860 unsigned OverflowOffset = AArch64VAEndOffset;
7861
7862 const DataLayout &DL = F.getDataLayout();
7863 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7864 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7865 auto [AK, RegNum] = classifyArgument(A->getType());
7866 if (AK == AK_GeneralPurpose &&
7867 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
7868 AK = AK_Memory;
7869 if (AK == AK_FloatingPoint &&
7870 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
7871 AK = AK_Memory;
7872 Value *Base;
7873 switch (AK) {
7874 case AK_GeneralPurpose:
7875 Base = getShadowPtrForVAArgument(IRB, GrOffset);
7876 GrOffset += 8 * RegNum;
7877 break;
7878 case AK_FloatingPoint:
7879 Base = getShadowPtrForVAArgument(IRB, VrOffset);
7880 VrOffset += 16 * RegNum;
7881 break;
7882 case AK_Memory:
7883 // Don't count fixed arguments in the overflow area - va_start will
7884 // skip right over them.
7885 if (IsFixed)
7886 continue;
7887 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7888 uint64_t AlignedSize = alignTo(ArgSize, 8);
7889 unsigned BaseOffset = OverflowOffset;
7890 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
7891 OverflowOffset += AlignedSize;
7892 if (OverflowOffset > kParamTLSSize) {
7893 // We have no space to copy shadow there.
7894 CleanUnusedTLS(IRB, Base, BaseOffset);
7895 continue;
7896 }
7897 break;
7898 }
7899 // Count Gp/Vr fixed arguments to their respective offsets, but don't
7900 // bother to actually store a shadow.
7901 if (IsFixed)
7902 continue;
7903 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7904 }
7905 Constant *OverflowSize =
7906 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
7907 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7908 }
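// [Added note] Resulting __msan_va_arg_tls layout for AArch64, derived from
// the constants above:
//   [  0,  64)  shadow of args passed in general registers x0-x7 (8 x 8)
//   [ 64, 192)  shadow of args passed in FP/SIMD registers v0-v7 (8 x 16)
//   [192, ...)  shadow of args passed on the stack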
7909
7910 // Retrieve a va_list field of 'void*' size.
7911 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7912 Value *SaveAreaPtrPtr =
7913 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7914 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
7915 }
7916
7917 // Retrieve a va_list field of 'int' size.
7918 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7919 Value *SaveAreaPtr =
7920 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7921 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
7922 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
7923 }
7924
7925 void finalizeInstrumentation() override {
7926 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7927 "finalizeInstrumentation called twice");
7928 if (!VAStartInstrumentationList.empty()) {
7929 // If there is a va_start in this function, make a backup copy of
7930 // va_arg_tls somewhere in the function entry block.
7931 IRBuilder<> IRB(MSV.FnPrologueEnd);
7932 VAArgOverflowSize =
7933 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7934 Value *CopySize = IRB.CreateAdd(
7935 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
7936 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7937 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7938 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7939 CopySize, kShadowTLSAlignment, false);
7940
7941 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7942 Intrinsic::umin, CopySize,
7943 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7944 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7945 kShadowTLSAlignment, SrcSize);
7946 }
7947
7948 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
7949 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
7950
7951 // Instrument va_start, copy va_list shadow from the backup copy of
7952 // the TLS contents.
7953 for (CallInst *OrigInst : VAStartInstrumentationList) {
7954 NextNodeIRBuilder IRB(OrigInst);
7955
7956 Value *VAListTag = OrigInst->getArgOperand(0);
7957
7958 // The variadic ABI for AArch64 creates two areas to save the incoming
7959 // argument registers (one for the 64-bit general registers x0-x7 and
7960 // another for the 128-bit FP/SIMD registers v0-v7).
7961 // We then need to propagate the shadow arguments to both regions,
7962 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
7963 // The remaining arguments are saved in the shadow of 'va::stack'.
7964 // One caveat is that only the non-named arguments need to be propagated,
7965 // but the call site instrumentation saves 'all' the arguments.
7966 // So to copy the shadow values from the va_arg TLS array
7967 // we need to adjust the offsets for both the GR and VR fields based on
7968 // the __{gr,vr}_offs values (which are set based on the incoming
7969 // named arguments).
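// [Added note] For reference, the AAPCS64 va_list layout that the
// getVAField64/getVAField32 offsets below (0, 8, 16, 24, 28) index into:
//   struct va_list {
//     void *__stack;   // +0
//     void *__gr_top;  // +8
//     void *__vr_top;  // +16
//     int   __gr_offs; // +24
//     int   __vr_offs; // +28
//   };                 // 32 bytes total (VAListTagSize above)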
7970 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
7971
7972 // Read the stack pointer from the va_list.
7973 Value *StackSaveAreaPtr =
7974 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
7975
7976 // Read both the __gr_top and __gr_off and add them up.
7977 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
7978 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
7979
7980 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
7981 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
7982
7983 // Read both the __vr_top and __vr_off and add them up.
7984 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
7985 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
7986
7987 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
7988 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
7989
7990 // The instrumentation does not know how many named arguments are being
7991 // used, and at the call site all the arguments were saved. Since __gr_offs
7992 // is defined as '0 - ((8 - named_gr) * 8)', the idea is to propagate only
7993 // the variadic arguments by ignoring the bytes of shadow from named arguments.
7994 Value *GrRegSaveAreaShadowPtrOff =
7995 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
7996
7997 Value *GrRegSaveAreaShadowPtr =
7998 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7999 Align(8), /*isStore*/ true)
8000 .first;
8001
8002 Value *GrSrcPtr =
8003 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
8004 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
8005
8006 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
8007 GrCopySize);
8008
8009 // Again, but for FP/SIMD values.
8010 Value *VrRegSaveAreaShadowPtrOff =
8011 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
8012
8013 Value *VrRegSaveAreaShadowPtr =
8014 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8015 Align(8), /*isStore*/ true)
8016 .first;
8017
8018 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
8019 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
8020 IRB.getInt32(AArch64VrBegOffset)),
8021 VrRegSaveAreaShadowPtrOff);
8022 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
8023
8024 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
8025 VrCopySize);
8026
8027 // And finally for remaining arguments.
8028 Value *StackSaveAreaShadowPtr =
8029 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
8030 Align(16), /*isStore*/ true)
8031 .first;
8032
8033 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
8034 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
8035
8036 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
8037 Align(16), VAArgOverflowSize);
8038 }
8039 }
8040};
8041
8042/// PowerPC64-specific implementation of VarArgHelper.
8043struct VarArgPowerPC64Helper : public VarArgHelperBase {
8044 AllocaInst *VAArgTLSCopy = nullptr;
8045 Value *VAArgSize = nullptr;
8046
8047 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
8048 MemorySanitizerVisitor &MSV)
8049 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
8050
8051 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8052 // For PowerPC, we need to deal with the alignment of stack arguments -
8053 // they are mostly aligned to 8 bytes, but vectors and i128 arrays
8054 // are aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
8055 // For that reason, we compute the current offset from the stack pointer
8056 // (which is always properly aligned) and the offset of the first vararg,
8057 // then subtract them.
8058 unsigned VAArgBase;
8059 Triple TargetTriple(F.getParent()->getTargetTriple());
8060 // The parameter save area starts at 48 bytes from the frame pointer for
8061 // ABIv1, and at 32 bytes for ABIv2. This is usually determined by the
8062 // target endianness, but in theory it could be overridden by a function attribute.
8063 if (TargetTriple.isPPC64ELFv2ABI())
8064 VAArgBase = 32;
8065 else
8066 VAArgBase = 48;
8067 unsigned VAArgOffset = VAArgBase;
8068 const DataLayout &DL = F.getDataLayout();
8069 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8070 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8071 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8072 if (IsByVal) {
8073 assert(A->getType()->isPointerTy());
8074 Type *RealTy = CB.getParamByValType(ArgNo);
8075 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8076 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
8077 if (ArgAlign < 8)
8078 ArgAlign = Align(8);
8079 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8080 if (!IsFixed) {
8081 Value *Base =
8082 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8083 if (Base) {
8084 Value *AShadowPtr, *AOriginPtr;
8085 std::tie(AShadowPtr, AOriginPtr) =
8086 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8087 kShadowTLSAlignment, /*isStore*/ false);
8088
8089 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8090 kShadowTLSAlignment, ArgSize);
8091 }
8092 }
8093 VAArgOffset += alignTo(ArgSize, Align(8));
8094 } else {
8095 Value *Base;
8096 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8097 Align ArgAlign = Align(8);
8098 if (A->getType()->isArrayTy()) {
8099 // Arrays are aligned to element size, except for long double
8100 // arrays, which are aligned to 8 bytes.
8101 Type *ElementTy = A->getType()->getArrayElementType();
8102 if (!ElementTy->isPPC_FP128Ty())
8103 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8104 } else if (A->getType()->isVectorTy()) {
8105 // Vectors are naturally aligned.
8106 ArgAlign = Align(ArgSize);
8107 }
8108 if (ArgAlign < 8)
8109 ArgAlign = Align(8);
8110 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8111 if (DL.isBigEndian()) {
8112 // Adjust the shadow for arguments with size < 8 to match the
8113 // placement of bits on big-endian systems.
8114 if (ArgSize < 8)
8115 VAArgOffset += (8 - ArgSize);
8116 }
8117 if (!IsFixed) {
8118 Base =
8119 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8120 if (Base)
8121 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8122 }
8123 VAArgOffset += ArgSize;
8124 VAArgOffset = alignTo(VAArgOffset, Align(8));
8125 }
8126 if (IsFixed)
8127 VAArgBase = VAArgOffset;
8128 }
8129
8130 Constant *TotalVAArgSize =
8131 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8132 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
8133 // a new class member i.e. it is the total size of all VarArgs.
8134 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8135 }
8136
8137 void finalizeInstrumentation() override {
8138 assert(!VAArgSize && !VAArgTLSCopy &&
8139 "finalizeInstrumentation called twice");
8140 IRBuilder<> IRB(MSV.FnPrologueEnd);
8141 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8142 Value *CopySize = VAArgSize;
8143
8144 if (!VAStartInstrumentationList.empty()) {
8145 // If there is a va_start in this function, make a backup copy of
8146 // va_arg_tls somewhere in the function entry block.
8147
8148 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8149 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8150 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8151 CopySize, kShadowTLSAlignment, false);
8152
8153 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8154 Intrinsic::umin, CopySize,
8155 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
8156 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8157 kShadowTLSAlignment, SrcSize);
8158 }
8159
8160 // Instrument va_start.
8161 // Copy va_list shadow from the backup copy of the TLS contents.
8162 for (CallInst *OrigInst : VAStartInstrumentationList) {
8163 NextNodeIRBuilder IRB(OrigInst);
8164 Value *VAListTag = OrigInst->getArgOperand(0);
8165 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8166
8167 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8168
8169 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8170 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8171 const DataLayout &DL = F.getDataLayout();
8172 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8173 const Align Alignment = Align(IntptrSize);
8174 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8175 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8176 Alignment, /*isStore*/ true);
8177 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8178 CopySize);
8179 }
8180 }
8181};
8182
8183/// PowerPC32-specific implementation of VarArgHelper.
8184struct VarArgPowerPC32Helper : public VarArgHelperBase {
8185 AllocaInst *VAArgTLSCopy = nullptr;
8186 Value *VAArgSize = nullptr;
8187
8188 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8189 MemorySanitizerVisitor &MSV)
8190 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8191
8192 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8193 unsigned VAArgBase;
8194 // Parameter save area is 8 bytes from frame pointer in PPC32
8195 VAArgBase = 8;
8196 unsigned VAArgOffset = VAArgBase;
8197 const DataLayout &DL = F.getDataLayout();
8198 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8199 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8200 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8201 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8202 if (IsByVal) {
8203 assert(A->getType()->isPointerTy());
8204 Type *RealTy = CB.getParamByValType(ArgNo);
8205 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8206 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8207 if (ArgAlign < IntptrSize)
8208 ArgAlign = Align(IntptrSize);
8209 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8210 if (!IsFixed) {
8211 Value *Base =
8212 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8213 if (Base) {
8214 Value *AShadowPtr, *AOriginPtr;
8215 std::tie(AShadowPtr, AOriginPtr) =
8216 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8217 kShadowTLSAlignment, /*isStore*/ false);
8218
8219 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8220 kShadowTLSAlignment, ArgSize);
8221 }
8222 }
8223 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8224 } else {
8225 Value *Base;
8226 Type *ArgTy = A->getType();
8227
8228 // On PPC32, floating-point variable arguments are stored in a separate
8229 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
8230 // them, as they will be found when checking call arguments.
8231 if (!ArgTy->isFloatingPointTy()) {
8232 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
8233 Align ArgAlign = Align(IntptrSize);
8234 if (ArgTy->isArrayTy()) {
8235 // Arrays are aligned to element size, except for long double
8236 // arrays, which are aligned to 8 bytes.
8237 Type *ElementTy = ArgTy->getArrayElementType();
8238 if (!ElementTy->isPPC_FP128Ty())
8239 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8240 } else if (ArgTy->isVectorTy()) {
8241 // Vectors are naturally aligned.
8242 ArgAlign = Align(ArgSize);
8243 }
8244 if (ArgAlign < IntptrSize)
8245 ArgAlign = Align(IntptrSize);
8246 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8247 if (DL.isBigEndian()) {
8248 // Adjust the shadow for arguments with size < IntptrSize to match
8249 // the placement of bits on big-endian systems.
8250 if (ArgSize < IntptrSize)
8251 VAArgOffset += (IntptrSize - ArgSize);
8252 }
8253 if (!IsFixed) {
8254 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
8255 ArgSize);
8256 if (Base)
8257 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
8258 kShadowTLSAlignment);
8259 }
8260 VAArgOffset += ArgSize;
8261 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8262 }
8263 }
8264 }
8265
8266 Constant *TotalVAArgSize =
8267 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8268 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
8269 // a new class member i.e. it is the total size of all VarArgs.
8270 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8271 }
8272
8273 void finalizeInstrumentation() override {
8274 assert(!VAArgSize && !VAArgTLSCopy &&
8275 "finalizeInstrumentation called twice");
8276 IRBuilder<> IRB(MSV.FnPrologueEnd);
8277 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8278 Value *CopySize = VAArgSize;
8279
8280 if (!VAStartInstrumentationList.empty()) {
8281 // If there is a va_start in this function, make a backup copy of
8282 // va_arg_tls somewhere in the function entry block.
8283
8284 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8285 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8286 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8287 CopySize, kShadowTLSAlignment, false);
8288
8289 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8290 Intrinsic::umin, CopySize,
8291 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8292 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8293 kShadowTLSAlignment, SrcSize);
8294 }
8295
8296 // Instrument va_start.
8297 // Copy va_list shadow from the backup copy of the TLS contents.
8298 for (CallInst *OrigInst : VAStartInstrumentationList) {
8299 NextNodeIRBuilder IRB(OrigInst);
8300 Value *VAListTag = OrigInst->getArgOperand(0);
8301 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8302 Value *RegSaveAreaSize = CopySize;
8303
8304 // In PPC32 va_list_tag is a struct; the reg_save_area pointer is at offset 8.
8305 RegSaveAreaPtrPtr =
8306 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8307
8308 // On PPC32 the reg_save_area can only hold 32 bytes of data.
8309 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8310 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8311
8312 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8313 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8314
8315 const DataLayout &DL = F.getDataLayout();
8316 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8317 const Align Alignment = Align(IntptrSize);
8318
8319 { // Copy reg save area
8320 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8321 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8322 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8323 Alignment, /*isStore*/ true);
8324 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8325 Alignment, RegSaveAreaSize);
8326
8327 RegSaveAreaShadowPtr =
8328 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8329 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8330 ConstantInt::get(MS.IntptrTy, 32));
8331 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8332 // We fill the FP shadow with zeroes, as uninitialized FP args should
8333 // have been found during the call base check.
8334 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8335 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8336 }
8337
8338 { // Copy overflow area
8339 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8340 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8341
8342 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8343 OverflowAreaPtrPtr =
8344 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8345 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8346
8347 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8348
8349 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8350 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8351 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8352 Alignment, /*isStore*/ true);
8353
8354 Value *OverflowVAArgTLSCopyPtr =
8355 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8356 OverflowVAArgTLSCopyPtr =
8357 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8358
8359 OverflowVAArgTLSCopyPtr =
8360 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8361 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8362 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8363 }
8364 }
8365 }
8366};
8367
8368/// SystemZ-specific implementation of VarArgHelper.
8369struct VarArgSystemZHelper : public VarArgHelperBase {
8370 static const unsigned SystemZGpOffset = 16;
8371 static const unsigned SystemZGpEndOffset = 56;
8372 static const unsigned SystemZFpOffset = 128;
8373 static const unsigned SystemZFpEndOffset = 160;
8374 static const unsigned SystemZMaxVrArgs = 8;
8375 static const unsigned SystemZRegSaveAreaSize = 160;
8376 static const unsigned SystemZOverflowOffset = 160;
8377 static const unsigned SystemZVAListTagSize = 32;
8378 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8379 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
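 // [Added note] These constants mirror the s390x register save area image
 // that visitCallBase() reproduces in __msan_va_arg_tls (offsets taken from
 // the constants above; register names per the s390x ELF ABI):
 //   [ 16,  56)  shadow of args in GPRs r2-r6 (5 x 8 bytes)
 //   [128, 160)  shadow of args in FPRs f0, f2, f4, f6 (4 x 8 bytes)
 //   [160, ...)  shadow of the overflow (stack) argument area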
8380
8381 bool IsSoftFloatABI;
8382 AllocaInst *VAArgTLSCopy = nullptr;
8383 AllocaInst *VAArgTLSOriginCopy = nullptr;
8384 Value *VAArgOverflowSize = nullptr;
8385
8386 enum class ArgKind {
8387 GeneralPurpose,
8388 FloatingPoint,
8389 Vector,
8390 Memory,
8391 Indirect,
8392 };
8393
8394 enum class ShadowExtension { None, Zero, Sign };
8395
8396 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8397 MemorySanitizerVisitor &MSV)
8398 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8399 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8400
8401 ArgKind classifyArgument(Type *T) {
8402 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8403 // only a few possibilities of what it can be. In particular, enums, single
8404 // element structs and large types have already been taken care of.
8405
8406 // Some i128 and fp128 arguments are converted to pointers only in the
8407 // back end.
8408 if (T->isIntegerTy(128) || T->isFP128Ty())
8409 return ArgKind::Indirect;
8410 if (T->isFloatingPointTy())
8411 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8412 if (T->isIntegerTy() || T->isPointerTy())
8413 return ArgKind::GeneralPurpose;
8414 if (T->isVectorTy())
8415 return ArgKind::Vector;
8416 return ArgKind::Memory;
8417 }
8418
8419 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8420 // ABI says: "One of the simple integer types no more than 64 bits wide.
8421 // ... If such an argument is shorter than 64 bits, replace it by a full
8422 // 64-bit integer representing the same number, using sign or zero
8423 // extension". Shadow for an integer argument has the same type as the
8424 // argument itself, so it can be sign or zero extended as well.
8425 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8426 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8427 if (ZExt) {
8428 assert(!SExt);
8429 return ShadowExtension::Zero;
8430 }
8431 if (SExt) {
8432 assert(!ZExt);
8433 return ShadowExtension::Sign;
8434 }
8435 return ShadowExtension::None;
8436 }
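// [Added example] For a vararg passed as 'int' with the signext attribute,
// getShadowExtension() returns ShadowExtension::Sign, and visitCallBase()
// below sign-extends the 32-bit shadow to 64 bits before storing it, matching
// how the value itself is widened per the ABI quote above.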
8437
8438 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8439 unsigned GpOffset = SystemZGpOffset;
8440 unsigned FpOffset = SystemZFpOffset;
8441 unsigned VrIndex = 0;
8442 unsigned OverflowOffset = SystemZOverflowOffset;
8443 const DataLayout &DL = F.getDataLayout();
8444 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8445 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8446 // SystemZABIInfo does not produce ByVal parameters.
8447 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8448 Type *T = A->getType();
8449 ArgKind AK = classifyArgument(T);
8450 if (AK == ArgKind::Indirect) {
8451 T = MS.PtrTy;
8452 AK = ArgKind::GeneralPurpose;
8453 }
8454 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8455 AK = ArgKind::Memory;
8456 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8457 AK = ArgKind::Memory;
8458 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8459 AK = ArgKind::Memory;
8460 Value *ShadowBase = nullptr;
8461 Value *OriginBase = nullptr;
8462 ShadowExtension SE = ShadowExtension::None;
8463 switch (AK) {
8464 case ArgKind::GeneralPurpose: {
8465 // Always keep track of GpOffset, but store shadow only for varargs.
8466 uint64_t ArgSize = 8;
8467 if (GpOffset + ArgSize <= kParamTLSSize) {
8468 if (!IsFixed) {
8469 SE = getShadowExtension(CB, ArgNo);
8470 uint64_t GapSize = 0;
8471 if (SE == ShadowExtension::None) {
8472 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8473 assert(ArgAllocSize <= ArgSize);
8474 GapSize = ArgSize - ArgAllocSize;
8475 }
8476 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8477 if (MS.TrackOrigins)
8478 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8479 }
8480 GpOffset += ArgSize;
8481 } else {
8482 GpOffset = kParamTLSSize;
8483 }
8484 break;
8485 }
8486 case ArgKind::FloatingPoint: {
8487 // Always keep track of FpOffset, but store shadow only for varargs.
8488 uint64_t ArgSize = 8;
8489 if (FpOffset + ArgSize <= kParamTLSSize) {
8490 if (!IsFixed) {
8491 // PoP says: "A short floating-point datum requires only the
8492 // left-most 32 bit positions of a floating-point register".
8493 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8494 // don't extend shadow and don't mind the gap.
8495 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8496 if (MS.TrackOrigins)
8497 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8498 }
8499 FpOffset += ArgSize;
8500 } else {
8501 FpOffset = kParamTLSSize;
8502 }
8503 break;
8504 }
8505 case ArgKind::Vector: {
8506 // Keep track of VrIndex. No need to store shadow, since vector varargs
8507 // go through AK_Memory.
8508 assert(IsFixed);
8509 VrIndex++;
8510 break;
8511 }
8512 case ArgKind::Memory: {
8513 // Keep track of OverflowOffset and store shadow only for varargs.
8514 // Ignore fixed args, since we need to copy only the vararg portion of
8515 // the overflow area shadow.
8516 if (!IsFixed) {
8517 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8518 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8519 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8520 SE = getShadowExtension(CB, ArgNo);
8521 uint64_t GapSize =
8522 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8523 ShadowBase =
8524 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8525 if (MS.TrackOrigins)
8526 OriginBase =
8527 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8528 OverflowOffset += ArgSize;
8529 } else {
8530 OverflowOffset = kParamTLSSize;
8531 }
8532 }
8533 break;
8534 }
8535 case ArgKind::Indirect:
8536 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8537 }
8538 if (ShadowBase == nullptr)
8539 continue;
8540 Value *Shadow = MSV.getShadow(A);
8541 if (SE != ShadowExtension::None)
8542 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8543 /*Signed*/ SE == ShadowExtension::Sign);
8544 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8545 IRB.CreateStore(Shadow, ShadowBase);
8546 if (MS.TrackOrigins) {
8547 Value *Origin = MSV.getOrigin(A);
8548 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8549 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8550 std::max(kShadowTLSAlignment, kMinOriginAlignment));
8551 }
8552 }
8553 Constant *OverflowSize = ConstantInt::get(
8554 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8555 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8556 }
8557
8558 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8559 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8560 IRB.CreateAdd(
8561 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8562 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8563 MS.PtrTy);
8564 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8565 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8566 const Align Alignment = Align(8);
8567 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8568 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8569 /*isStore*/ true);
8570 // TODO(iii): copy only fragments filled by visitCallBase()
8571 // TODO(iii): support packed-stack && !use-soft-float
8572 // For use-soft-float functions, it is enough to copy just the GPRs.
8573 unsigned RegSaveAreaSize =
8574 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8575 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8576 RegSaveAreaSize);
8577 if (MS.TrackOrigins)
8578 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8579 Alignment, RegSaveAreaSize);
8580 }
8581
8582 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8583 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8584 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8585 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8586 IRB.CreateAdd(
8587 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8588 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8589 MS.PtrTy);
8590 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8591 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8592 const Align Alignment = Align(8);
8593 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8594 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8595 Alignment, /*isStore*/ true);
8596 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8597 SystemZOverflowOffset);
8598 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8599 VAArgOverflowSize);
8600 if (MS.TrackOrigins) {
8601 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8602 SystemZOverflowOffset);
8603 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8604 VAArgOverflowSize);
8605 }
8606 }
8607
8608 void finalizeInstrumentation() override {
8609 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8610 "finalizeInstrumentation called twice");
8611 if (!VAStartInstrumentationList.empty()) {
8612 // If there is a va_start in this function, make a backup copy of
8613 // va_arg_tls somewhere in the function entry block.
8614 IRBuilder<> IRB(MSV.FnPrologueEnd);
8615 VAArgOverflowSize =
8616 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8617 Value *CopySize =
8618 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8619 VAArgOverflowSize);
8620 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8621 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8622 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8623 CopySize, kShadowTLSAlignment, false);
8624
8625 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8626 Intrinsic::umin, CopySize,
8627 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8628 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8629 kShadowTLSAlignment, SrcSize);
8630 if (MS.TrackOrigins) {
8631 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8632 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8633 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8634 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8635 }
8636 }
8637
8638 // Instrument va_start.
8639 // Copy va_list shadow from the backup copy of the TLS contents.
8640 for (CallInst *OrigInst : VAStartInstrumentationList) {
8641 NextNodeIRBuilder IRB(OrigInst);
8642 Value *VAListTag = OrigInst->getArgOperand(0);
8643 copyRegSaveArea(IRB, VAListTag);
8644 copyOverflowArea(IRB, VAListTag);
8645 }
8646 }
8647};
8648
8649/// i386-specific implementation of VarArgHelper.
8650struct VarArgI386Helper : public VarArgHelperBase {
8651 AllocaInst *VAArgTLSCopy = nullptr;
8652 Value *VAArgSize = nullptr;
8653
8654 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8655 MemorySanitizerVisitor &MSV)
8656 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8657
8658 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8659 const DataLayout &DL = F.getDataLayout();
8660 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8661 unsigned VAArgOffset = 0;
8662 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8663 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8664 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8665 if (IsByVal) {
8666 assert(A->getType()->isPointerTy());
8667 Type *RealTy = CB.getParamByValType(ArgNo);
8668 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8669 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8670 if (ArgAlign < IntptrSize)
8671 ArgAlign = Align(IntptrSize);
8672 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8673 if (!IsFixed) {
8674 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8675 if (Base) {
8676 Value *AShadowPtr, *AOriginPtr;
8677 std::tie(AShadowPtr, AOriginPtr) =
8678 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8679 kShadowTLSAlignment, /*isStore*/ false);
8680
8681 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8682 kShadowTLSAlignment, ArgSize);
8683 }
8684 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8685 }
8686 } else {
8687 Value *Base;
8688 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8689 Align ArgAlign = Align(IntptrSize);
8690 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8691 if (DL.isBigEndian()) {
8692 // Adjust the shadow for arguments with size < IntptrSize to match
8693 // the placement of bits on big-endian systems.
8694 if (ArgSize < IntptrSize)
8695 VAArgOffset += (IntptrSize - ArgSize);
8696 }
8697 if (!IsFixed) {
8698 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8699 if (Base)
8700 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8701 VAArgOffset += ArgSize;
8702 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8703 }
8704 }
8705 }
8706
8707 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8708 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
8709 // a new class member i.e. it is the total size of all VarArgs.
8710 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8711 }
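// [Added note] On i386 all variadic arguments are passed on the stack, so the
// __msan_va_arg_tls layout built above is simply the shadow of the stack
// argument words in order, aligned to the pointer size.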
8712
8713 void finalizeInstrumentation() override {
8714 assert(!VAArgSize && !VAArgTLSCopy &&
8715 "finalizeInstrumentation called twice");
8716 IRBuilder<> IRB(MSV.FnPrologueEnd);
8717 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8718 Value *CopySize = VAArgSize;
8719
8720 if (!VAStartInstrumentationList.empty()) {
8721 // If there is a va_start in this function, make a backup copy of
8722 // va_arg_tls somewhere in the function entry block.
8723 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8724 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8725 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8726 CopySize, kShadowTLSAlignment, false);
8727
8728 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8729 Intrinsic::umin, CopySize,
8730 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8731 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8732 kShadowTLSAlignment, SrcSize);
8733 }
8734
8735 // Instrument va_start.
8736 // Copy va_list shadow from the backup copy of the TLS contents.
8737 for (CallInst *OrigInst : VAStartInstrumentationList) {
8738 NextNodeIRBuilder IRB(OrigInst);
8739 Value *VAListTag = OrigInst->getArgOperand(0);
8740 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8741 Value *RegSaveAreaPtrPtr =
8742 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8743 PointerType::get(*MS.C, 0));
8744 Value *RegSaveAreaPtr =
8745 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8746 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8747 const DataLayout &DL = F.getDataLayout();
8748 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8749 const Align Alignment = Align(IntptrSize);
8750 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8751 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8752 Alignment, /*isStore*/ true);
8753 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8754 CopySize);
8755 }
8756 }
8757};
8758
8759/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8760/// LoongArch64.
8761struct VarArgGenericHelper : public VarArgHelperBase {
8762 AllocaInst *VAArgTLSCopy = nullptr;
8763 Value *VAArgSize = nullptr;
8764
8765 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8766 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8767 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8768
8769 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8770 unsigned VAArgOffset = 0;
8771 const DataLayout &DL = F.getDataLayout();
8772 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8773 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8774 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8775 if (IsFixed)
8776 continue;
8777 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8778 if (DL.isBigEndian()) {
8779 // Adjust the shadow for arguments with size < IntptrSize to match the
8780 // placement of bits on big-endian systems.
8781 if (ArgSize < IntptrSize)
8782 VAArgOffset += (IntptrSize - ArgSize);
8783 }
8784 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8785 VAArgOffset += ArgSize;
8786 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8787 if (!Base)
8788 continue;
8789 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8790 }
8791
8792 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8793 // Here using VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creation of
8794 // a new class member i.e. it is the total size of all VarArgs.
8795 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8796 }
8797
8798 void finalizeInstrumentation() override {
8799 assert(!VAArgSize && !VAArgTLSCopy &&
8800 "finalizeInstrumentation called twice");
8801 IRBuilder<> IRB(MSV.FnPrologueEnd);
8802 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8803 Value *CopySize = VAArgSize;
8804
8805 if (!VAStartInstrumentationList.empty()) {
8806 // If there is a va_start in this function, make a backup copy of
8807 // va_arg_tls somewhere in the function entry block.
8808 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8809 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8810 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8811 CopySize, kShadowTLSAlignment, false);
8812
8813 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8814 Intrinsic::umin, CopySize,
8815 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8816 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8817 kShadowTLSAlignment, SrcSize);
8818 }
8819
8820 // Instrument va_start.
8821 // Copy va_list shadow from the backup copy of the TLS contents.
8822 for (CallInst *OrigInst : VAStartInstrumentationList) {
8823 NextNodeIRBuilder IRB(OrigInst);
8824 Value *VAListTag = OrigInst->getArgOperand(0);
8825 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8826 Value *RegSaveAreaPtrPtr =
8827 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8828 PointerType::get(*MS.C, 0));
8829 Value *RegSaveAreaPtr =
8830 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8831 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8832 const DataLayout &DL = F.getDataLayout();
8833 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8834 const Align Alignment = Align(IntptrSize);
8835 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8836 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8837 Alignment, /*isStore*/ true);
8838 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8839 CopySize);
8840 }
8841 }
8842};
8843
8844 // ARM32, LoongArch64, MIPS and RISCV share the same calling conventions
8845 // regarding VAArgs.
8846using VarArgARM32Helper = VarArgGenericHelper;
8847using VarArgRISCVHelper = VarArgGenericHelper;
8848using VarArgMIPSHelper = VarArgGenericHelper;
8849using VarArgLoongArch64Helper = VarArgGenericHelper;
8850
8851/// A no-op implementation of VarArgHelper.
8852struct VarArgNoOpHelper : public VarArgHelper {
8853 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8854 MemorySanitizerVisitor &MSV) {}
8855
8856 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8857
8858 void visitVAStartInst(VAStartInst &I) override {}
8859
8860 void visitVACopyInst(VACopyInst &I) override {}
8861
8862 void finalizeInstrumentation() override {}
8863};
8864
8865} // end anonymous namespace
8866
8867static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
8868 MemorySanitizerVisitor &Visitor) {
8869 // VarArg handling is implemented only for the targets listed below; on other
8870 // platforms a no-op helper is used, so false positives are possible there.
8871 Triple TargetTriple(Func.getParent()->getTargetTriple());
8872
8873 if (TargetTriple.getArch() == Triple::x86)
8874 return new VarArgI386Helper(Func, Msan, Visitor);
8875
8876 if (TargetTriple.getArch() == Triple::x86_64)
8877 return new VarArgAMD64Helper(Func, Msan, Visitor);
8878
8879 if (TargetTriple.isARM())
8880 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8881
8882 if (TargetTriple.isAArch64())
8883 return new VarArgAArch64Helper(Func, Msan, Visitor);
8884
8885 if (TargetTriple.isSystemZ())
8886 return new VarArgSystemZHelper(Func, Msan, Visitor);
8887
8888 // On PowerPC32 VAListTag is a struct
8889 // {char, char, i16 padding, char *, char *}
8890 if (TargetTriple.isPPC32())
8891 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
8892
8893 if (TargetTriple.isPPC64())
8894 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
8895
8896 if (TargetTriple.isRISCV32())
8897 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8898
8899 if (TargetTriple.isRISCV64())
8900 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8901
8902 if (TargetTriple.isMIPS32())
8903 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8904
8905 if (TargetTriple.isMIPS64())
8906 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8907
8908 if (TargetTriple.isLoongArch64())
8909 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
8910 /*VAListTagSize=*/8);
8911
8912 return new VarArgNoOpHelper(Func, Msan, Visitor);
8913}
8914
8915bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
8916 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
8917 return false;
8918
8919 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
8920 return false;
8921
8922 MemorySanitizerVisitor Visitor(F, *this, TLI);
8923
8924 // Clear out memory attributes.
8925 AttributeMask B;
8926 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
8927 F.removeFnAttrs(B);
8928
8929 return Visitor.runOnFunction();
8930}
static Constant * getOrInsertGlobal(Module &M, StringRef Name, Type *Ty)
static cl::opt< bool > ClPreciseDisjointOr("msan-precise-disjoint-or", cl::desc("Precisely poison disjoint OR. If false (legacy behavior), " "disjointedness is ignored (i.e., 1|1 is initialized)."), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStack("msan-poison-stack", cl::desc("poison uninitialized stack variables"), cl::Hidden, cl::init(true))
static const MemoryMapParams Linux_I386_MemoryMapParams
const char kMsanInitName[]
static cl::opt< bool > ClPoisonUndefVectors("msan-poison-undef-vectors", cl::desc("Precisely poison partially undefined constant vectors. " "If false (legacy behavior), the entire vector is " "considered fully initialized, which may lead to false " "negatives. Fully undefined constant vectors are " "unaffected by this flag (see -msan-poison-undef)."), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPrintStackNames("msan-print-stack-names", cl::desc("Print name of local stack variable"), cl::Hidden, cl::init(true))
static cl::opt< uint64_t > ClAndMask("msan-and-mask", cl::desc("Define custom MSan AndMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleLifetimeIntrinsics("msan-handle-lifetime-intrinsics", cl::desc("when possible, poison scoped variables at the beginning of the scope " "(slower, but more precise)"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClKeepGoing("msan-keep-going", cl::desc("keep going after reporting a UMR"), cl::Hidden, cl::init(false))
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams
static GlobalVariable * createPrivateConstGlobalForString(Module &M, StringRef Str)
Create a non-const global initialized with the given string.
static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams
static const size_t kNumberOfAccessSizes
static cl::opt< bool > ClEagerChecks("msan-eager-checks", cl::desc("check arguments and return values at function call boundaries"), cl::Hidden, cl::init(false))
static cl::opt< int > ClDisambiguateWarning("msan-disambiguate-warning-threshold", cl::desc("Define threshold for number of checks per " "debug location to force origin update."), cl::Hidden, cl::init(3))
static VarArgHelper * CreateVarArgHelper(Function &Func, MemorySanitizer &Msan, MemorySanitizerVisitor &Visitor)
static const MemoryMapParams Linux_MIPS64_MemoryMapParams
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams
static cl::opt< uint64_t > ClXorMask("msan-xor-mask", cl::desc("Define custom MSan XorMask"), cl::Hidden, cl::init(0))
static cl::opt< bool > ClHandleAsmConservative("msan-handle-asm-conservative", cl::desc("conservative handling of inline assembly"), cl::Hidden, cl::init(true))
static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams
static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams
static const unsigned kParamTLSSize
static cl::opt< bool > ClHandleICmp("msan-handle-icmp", cl::desc("propagate shadow through ICmpEQ and ICmpNE"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClEnableKmsan("msan-kernel", cl::desc("Enable KernelMemorySanitizer instrumentation"), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStackWithCall("msan-poison-stack-with-call", cl::desc("poison uninitialized stack variables with a call"), cl::Hidden, cl::init(false))
static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams
static cl::opt< bool > ClDumpHeuristicInstructions("msan-dump-heuristic-instructions", cl::desc("Prints 'unknown' instructions that were handled heuristically. " "Use -msan-dump-strict-instructions to print instructions that " "could not be handled explicitly nor heuristically."), cl::Hidden, cl::init(false))
static const unsigned kRetvalTLSSize
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams
const char kMsanModuleCtorName[]
static const MemoryMapParams FreeBSD_I386_MemoryMapParams
static cl::opt< bool > ClCheckAccessAddress("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClDisableChecks("msan-disable-checks", cl::desc("Apply no_sanitize to the whole file"), cl::Hidden, cl::init(false))
#define T
FunctionAnalysisManager FAM
if(PassOpts->AAPipeline)
const SmallVectorImpl< MachineOperand > & Cond
static const char * name
void visit(MachineFunction &MF, MachineBasicBlock &Start, std::function< void(MachineBasicBlock *)> op)
This file implements a set that has insertion order iteration characteristics.
This file defines the SmallPtrSet class.
This file defines the SmallVector class.
This file contains some functions that are useful when dealing with strings.
#define LLVM_DEBUG(...)
Definition Debug.h:114
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
static SymbolRef::Type getType(const Symbol *Sym)
Definition TapiFile.cpp:39
Value * RHS
Value * LHS
static APInt getSignedMinValue(unsigned numBits)
Gets minimum signed value of APInt for a specific bit width.
Definition APInt.h:220
void setAlignment(Align Align)
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
const T & front() const
front - Get the first element.
Definition ArrayRef.h:145
static LLVM_ABI ArrayType * get(Type *ElementType, uint64_t NumElements)
This static method is the primary way to construct an ArrayType.
This class stores enough information to efficiently remove some attributes from an existing AttrBuild...
AttributeMask & addAttribute(Attribute::AttrKind Val)
Add an attribute to the mask.
iterator end()
Definition BasicBlock.h:483
LLVM_ABI const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
LLVM_ABI const BasicBlock * getSinglePredecessor() const
Return the predecessor of this block if it has a single predecessor block.
InstListType::iterator iterator
Instruction iterators...
Definition BasicBlock.h:170
bool isInlineAsm() const
Check if this call is an inline asm statement.
Function * getCalledFunction() const
Returns the function called, or null if this is an indirect function invocation or the function signa...
bool hasRetAttr(Attribute::AttrKind Kind) const
Determine whether the return value has the given attribute.
LLVM_ABI bool paramHasAttr(unsigned ArgNo, Attribute::AttrKind Kind) const
Determine whether the argument or parameter has the given attribute.
void removeFnAttrs(const AttributeMask &AttrsToRemove)
Removes the attributes from the function.
void setCannotMerge()
MaybeAlign getParamAlign(unsigned ArgNo) const
Extract the alignment for a call or parameter (0=unknown).
Type * getParamByValType(unsigned ArgNo) const
Extract the byval type for a call or parameter.
Value * getCalledOperand() const
Type * getParamElementType(unsigned ArgNo) const
Extract the elementtype type for a parameter.
Value * getArgOperand(unsigned i) const
void setArgOperand(unsigned i, Value *v)
FunctionType * getFunctionType() const
iterator_range< User::op_iterator > args()
Iteration adapter for range-for loops.
void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind)
Adds the attribute to the indicated argument.
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:676
@ ICMP_SLT
signed less than
Definition InstrTypes.h:705
@ ICMP_SLE
signed less or equal
Definition InstrTypes.h:706
@ ICMP_SGT
signed greater than
Definition InstrTypes.h:703
@ ICMP_SGE
signed greater or equal
Definition InstrTypes.h:704
static LLVM_ABI Constant * get(ArrayType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getString(LLVMContext &Context, StringRef Initializer, bool AddNull=true)
This method constructs a CDS and initializes it with a text string.
static LLVM_ABI Constant * get(LLVMContext &Context, ArrayRef< uint8_t > Elts)
get() constructors - Return a constant with vector type with an element count and element type matchi...
static ConstantInt * getSigned(IntegerType *Ty, int64_t V, bool ImplicitTrunc=false)
Return a ConstantInt with the specified value for the specified type.
Definition Constants.h:135
static LLVM_ABI ConstantInt * getBool(LLVMContext &Context, bool V)
static LLVM_ABI Constant * get(StructType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getSplat(ElementCount EC, Constant *Elt)
Return a ConstantVector with the specified constant in each element.
static LLVM_ABI Constant * get(ArrayRef< Constant * > V)
This is an important base class in LLVM.
Definition Constant.h:43
static LLVM_ABI Constant * getAllOnesValue(Type *Ty)
LLVM_ABI bool isAllOnesValue() const
Return true if this is the value that would be returned by getAllOnesValue.
static LLVM_ABI Constant * getNullValue(Type *Ty)
Constructor to create a '0' constant of arbitrary type.
LLVM_ABI Constant * getAggregateElement(unsigned Elt) const
For aggregates (struct/array/vector) return the constant that corresponds to the specified element if...
LLVM_ABI bool isZeroValue() const
Return true if the value is negative zero or null value.
Definition Constants.cpp:76
LLVM_ABI bool isNullValue() const
Return true if this is the value that would be returned by getNullValue.
Definition Constants.cpp:90
static bool shouldExecute(CounterInfo &Counter)
bool empty() const
Definition DenseMap.h:109
unsigned getNumElements() const
static LLVM_ABI FixedVectorType * get(Type *ElementType, unsigned NumElts)
Definition Type.cpp:802
static FixedVectorType * getHalfElementsVectorType(FixedVectorType *VTy)
A handy container for a FunctionType+Callee-pointer pair, which can be passed around as a single enti...
unsigned getNumParams() const
Return the number of fixed parameters this function type requires.
LLVM_ABI void setComdat(Comdat *C)
Definition Globals.cpp:214
@ PrivateLinkage
Like Internal, but omit from symbol table.
Definition GlobalValue.h:61
@ ExternalLinkage
Externally visible function.
Definition GlobalValue.h:53
Analysis pass providing a never-invalidated alias analysis result.
ConstantInt * getInt1(bool V)
Get a constant value representing either true or false.
Definition IRBuilder.h:497
Value * CreateInsertElement(Type *VecTy, Value *NewElt, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2553
Value * CreateConstGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name="")
Definition IRBuilder.h:1939
AllocaInst * CreateAlloca(Type *Ty, unsigned AddrSpace, Value *ArraySize=nullptr, const Twine &Name="")
Definition IRBuilder.h:1833
IntegerType * getInt1Ty()
Fetch the type representing a single bit.
Definition IRBuilder.h:547
LLVM_ABI CallInst * CreateMaskedCompressStore(Value *Val, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr)
Create a call to Masked Compress Store intrinsic.
Value * CreateInsertValue(Value *Agg, Value *Val, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2607
Value * CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2541
IntegerType * getIntNTy(unsigned N)
Fetch the type representing an N-bit integer.
Definition IRBuilder.h:575
LoadInst * CreateAlignedLoad(Type *Ty, Value *Ptr, MaybeAlign Align, const char *Name)
Definition IRBuilder.h:1867
Value * CreateZExtOrTrunc(Value *V, Type *DestTy, const Twine &Name="")
Create a ZExt or Trunc from the integer value V to DestTy.
Definition IRBuilder.h:2071
CallInst * CreateMemCpy(Value *Dst, MaybeAlign DstAlign, Value *Src, MaybeAlign SrcAlign, uint64_t Size, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memcpy between the specified pointers.
Definition IRBuilder.h:687
LLVM_ABI CallInst * CreateAndReduce(Value *Src)
Create a vector int AND reduction intrinsic of the source vector.
Value * CreatePointerCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2222
Value * CreateExtractValue(Value *Agg, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2600
LLVM_ABI CallInst * CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Load intrinsic.
LLVM_ABI Value * CreateSelect(Value *C, Value *True, Value *False, const Twine &Name="", Instruction *MDFrom=nullptr)
BasicBlock::iterator GetInsertPoint() const
Definition IRBuilder.h:202
Value * CreateSExt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2065
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2170
Value * CreateLShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1513
IntegerType * getInt32Ty()
Fetch the type representing a 32-bit integer.
Definition IRBuilder.h:562
ConstantInt * getInt8(uint8_t C)
Get a constant 8-bit value.
Definition IRBuilder.h:512
Value * CreatePtrAdd(Value *Ptr, Value *Offset, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:2007
IntegerType * getInt64Ty()
Fetch the type representing a 64-bit integer.
Definition IRBuilder.h:567
Value * CreateUDiv(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1454
Value * CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2304
Value * CreateGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:1926
Value * CreateNeg(Value *V, const Twine &Name="", bool HasNSW=false)
Definition IRBuilder.h:1784
LLVM_ABI CallInst * CreateOrReduce(Value *Src)
Create a vector int OR reduction intrinsic of the source vector.
LLVM_ABI Value * CreateBinaryIntrinsic(Intrinsic::ID ID, Value *LHS, Value *RHS, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with 2 operands which is mangled on the first type.
LLVM_ABI CallInst * CreateIntrinsic(Intrinsic::ID ID, ArrayRef< Type * > Types, ArrayRef< Value * > Args, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with Args, mangled using Types.
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition IRBuilder.h:522
PHINode * CreatePHI(Type *Ty, unsigned NumReservedValues, const Twine &Name="")
Definition IRBuilder.h:2465
Value * CreateNot(Value *V, const Twine &Name="")
Definition IRBuilder.h:1808
Value * CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2300
LLVM_ABI DebugLoc getCurrentDebugLocation() const
Get location information used by debugging information.
Definition IRBuilder.cpp:64
Value * CreateSub(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1420
Value * CreateBitCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2175
ConstantInt * getIntN(unsigned N, uint64_t C)
Get a constant N-bit value, zero extended or truncated from a 64-bit value.
Definition IRBuilder.h:533
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool...
Definition IRBuilder.h:1850
Value * CreateShl(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1492
CallInst * CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, MaybeAlign Align, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memset to the specified pointer and the specified value.
Definition IRBuilder.h:630
Value * CreateZExt(Value *V, Type *DestTy, const Twine &Name="", bool IsNonNeg=false)
Definition IRBuilder.h:2053
Value * CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name="")
Definition IRBuilder.h:2575
LLVMContext & getContext() const
Definition IRBuilder.h:203
Value * CreateAnd(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1551
StoreInst * CreateStore(Value *Val, Value *Ptr, bool isVolatile=false)
Definition IRBuilder.h:1863
LLVM_ABI CallInst * CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment, Value *Mask)
Create a call to Masked Store intrinsic.
Value * CreateAdd(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1403
Value * CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2165
Value * CreateIsNotNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg != 0.
Definition IRBuilder.h:2633
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args={}, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:2479
Value * CreateTrunc(Value *V, Type *DestTy, const Twine &Name="", bool IsNUW=false, bool IsNSW=false)
Definition IRBuilder.h:2039
PointerType * getPtrTy(unsigned AddrSpace=0)
Fetch the type representing a pointer.
Definition IRBuilder.h:605
Value * CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:1708
Value * CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2332
LLVM_ABI Value * CreateTypeSize(Type *Ty, TypeSize Size)
Create an expression which evaluates to the number of units in Size at runtime.
Value * CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2312
Value * CreateIntCast(Value *V, Type *DestTy, bool isSigned, const Twine &Name="")
Definition IRBuilder.h:2248
Value * CreateIsNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg == 0.
Definition IRBuilder.h:2628
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition IRBuilder.h:207
Type * getVoidTy()
Fetch the type representing void.
Definition IRBuilder.h:600
StoreInst * CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align, bool isVolatile=false)
Definition IRBuilder.h:1886
LLVM_ABI CallInst * CreateMaskedExpandLoad(Type *Ty, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Expand Load intrinsic.
Value * CreateInBoundsPtrAdd(Value *Ptr, Value *Offset, const Twine &Name="")
Definition IRBuilder.h:2012
Value * CreateAShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1532
Value * CreateXor(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1599
Value * CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2410
Value * CreateOr(Value *LHS, Value *RHS, const Twine &Name="", bool IsDisjoint=false)
Definition IRBuilder.h:1573
IntegerType * getInt8Ty()
Fetch the type representing an 8-bit integer.
Definition IRBuilder.h:552
Value * CreateMul(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1437
LLVM_ABI CallInst * CreateMaskedScatter(Value *Val, Value *Ptrs, Align Alignment, Value *Mask=nullptr)
Create a call to Masked Scatter intrinsic.
LLVM_ABI CallInst * CreateMaskedGather(Type *Ty, Value *Ptrs, Align Alignment, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Gather intrinsic.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition IRBuilder.h:2762
std::vector< ConstraintInfo > ConstraintInfoVector
Definition InlineAsm.h:123
void visit(Iterator Start, Iterator End)
Definition InstVisitor.h:87
const DebugLoc & getDebugLoc() const
Return the debug location for this node as a DebugLoc.
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
LLVM_ABI bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instructi...
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition Type.cpp:318
LLVM_ABI MDNode * createUnlikelyBranchWeights()
Return metadata containing two branch weights, with significant bias towards false destination.
Definition MDBuilder.cpp:48
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return an 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
PreservedAnalyses & abandon()
Mark an analysis as abandoned.
Definition Analysis.h:171
bool remove(const value_type &X)
Remove an item from the set vector.
Definition SetVector.h:181
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
StringRef - Represent a constant reference to a string, i.e.
Definition StringRef.h:55
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition Type.cpp:413
unsigned getNumElements() const
Random access to the elements.
Type * getElementType(unsigned N) const
Analysis pass providing the TargetLibraryInfo.
Provides information about what library functions are available for the current target.
AttributeList getAttrList(LLVMContext *C, ArrayRef< unsigned > ArgNos, bool Signed, bool Ret=false, AttributeList AL=AttributeList()) const
bool getLibFunc(StringRef funcName, LibFunc &F) const
Searches for a particular function name.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isMIPS64() const
Tests whether the target is MIPS 64-bit (little and big endian).
Definition Triple.h:1066
@ loongarch64
Definition Triple.h:65
bool isRISCV32() const
Tests whether the target is 32-bit RISC-V.
Definition Triple.h:1109
bool isPPC32() const
Tests whether the target is 32-bit PowerPC (little and big endian).
Definition Triple.h:1082
ArchType getArch() const
Get the parsed architecture type of this triple.
Definition Triple.h:418
bool isRISCV64() const
Tests whether the target is 64-bit RISC-V.
Definition Triple.h:1114
bool isLoongArch64() const
Tests whether the target is 64-bit LoongArch.
Definition Triple.h:1055
bool isMIPS32() const
Tests whether the target is MIPS 32-bit (little and big endian).
Definition Triple.h:1061
bool isARM() const
Tests whether the target is ARM (little and big endian).
Definition Triple.h:943
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1087
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:1034
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1133
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:45
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:273
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:264
LLVM_ABI bool isScalableTy(SmallPtrSetImpl< const Type * > &Visited) const
Return true if this is a type whose size is a known multiple of vscale.
Definition Type.cpp:61
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:246
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:267
Type * getArrayElementType() const
Definition Type.h:408
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:165
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:280
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:352
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:197
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:311
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:230
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:184
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:255
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:240
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:225
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:139
Value * getOperand(unsigned i) const
Definition User.h:207
unsigned getNumOperands() const
Definition User.h:229
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:397
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:200
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:168
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
CallInst * Call
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ BasicBlock
Various leaf nodes.
Definition ISDOpcodes.h:81
initializer< Ty > init(const Ty &Val)
Function * Kernel
Summary of a kernel (=entry point for target offloading).
Definition OpenMPOpt.h:21
NodeAddr< FuncNode * > Func
Definition RDFGraph.h:393
friend class Instruction
Iterator for Instructions in a `BasicBlock.
Definition BasicBlock.h:73
This is an optimization pass for GlobalISel generic memory operations.
Definition Types.h:26
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:344
@ Offset
Definition DWP.cpp:532
FunctionAddr VTableAddr Value
Definition InstrProf.h:137
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1667
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B,...
Definition STLExtras.h:2544
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ Done
Definition Threading.h:60
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0,...
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:284
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:337
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:163
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
DWARFExpression::Operation Op
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the ...
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if...
Definition Local.cpp:3865
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that can not be reached from the function's entry.
Definition Local.cpp:2883
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if module has flag attached, if not add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:70