LLVM 23.0.0git
IRSimilarityIdentifier.h
Go to the documentation of this file.
1//===- IRSimilarityIdentifier.h - Find similarity in a module --------------==//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// \file
10// Interface file for the IRSimilarityIdentifier for identifying similarities in
11// IR including the IRInstructionMapper, which maps an Instruction to unsigned
12// integers.
13//
14// Two sequences of instructions are called "similar" if they perform the same
15// series of operations for all inputs.
16//
17// \code
18// %1 = add i32 %a, 10
19// %2 = add i32 %a, %1
20// %3 = icmp slt icmp %1, %2
21// \endcode
22//
23// and
24//
25// \code
26// %1 = add i32 11, %a
27// %2 = sub i32 %a, %1
28// %3 = icmp sgt icmp %2, %1
29// \endcode
30//
31// ultimately have the same result, even if the inputs, and structure are
32// slightly different.
33//
34// For instructions, we do not worry about operands that do not have fixed
35// semantic meaning to the program. We consider the opcode that the instruction
36// has, the types, parameters, and extra information such as the function name,
37// or comparison predicate. These are used to create a hash to map instructions
38// to integers to be used in similarity matching in sequences of instructions
39//
40// Terminology:
41// An IRSimilarityCandidate is a region of IRInstructionData (wrapped
42// Instructions), usually used to denote a region of similarity has been found.
43//
44// A SimilarityGroup is a set of IRSimilarityCandidates that are structurally
45// similar to one another.
46//
47//===----------------------------------------------------------------------===//
48
49#ifndef LLVM_ANALYSIS_IRSIMILARITYIDENTIFIER_H
50#define LLVM_ANALYSIS_IRSIMILARITYIDENTIFIER_H
51
52#include "llvm/IR/InstVisitor.h"
54#include "llvm/IR/PassManager.h"
55#include "llvm/Pass.h"
58#include <optional>
59
60namespace llvm {
61
62namespace IRSimilarity {
63
65
66/// This represents what is and is not supported when finding similarity in
67/// Instructions.
68///
69/// Legal Instructions are considered when looking at similarity between
70/// Instructions.
71///
72/// Illegal Instructions cannot be considered when looking for similarity
73/// between Instructions. They act as boundaries between similarity regions.
74///
75/// Invisible Instructions are skipped over during analysis.
76// TODO: Shared with MachineOutliner
78
79/// This provides the utilities for hashing an Instruction to an unsigned
80/// integer. Two IRInstructionDatas produce the same hash value when their
81/// underlying Instructions perform the same operation (even if they don't have
82/// the same input operands.)
83/// As a more concrete example, consider the following:
84///
85/// \code
86/// %add1 = add i32 %a, %b
87/// %add2 = add i32 %c, %d
88/// %add3 = add i64 %e, %f
89/// \endcode
90///
91// Then the IRInstructionData wrappers for these Instructions may be hashed like
92/// so:
93///
94/// \code
95/// ; These two adds have the same types and operand types, so they hash to the
96/// ; same number.
97/// %add1 = add i32 %a, %b ; Hash: 1
98/// %add2 = add i32 %c, %d ; Hash: 1
99/// ; This add produces an i64. This differentiates it from %add1 and %add2. So,
100/// ; it hashes to a different number.
101/// %add3 = add i64 %e, %f; Hash: 2
102/// \endcode
103///
104///
105/// This hashing scheme will be used to represent the program as a very long
106/// string. This string can then be placed in a data structure which can be used
107/// for similarity queries.
108///
109/// TODO: Handle types of Instructions which can be equal even with different
110/// operands. (E.g. comparisons with swapped predicates.)
111/// TODO: Handle CallInsts, which are only checked for function type
112/// by \ref isSameOperationAs.
113/// TODO: Handle GetElementPtrInsts, as some of the operands have to be the
114/// exact same, and some do not.
116 : ilist_node<IRInstructionData, ilist_sentinel_tracking<true>> {
117
118 /// The source Instruction that is being wrapped.
119 Instruction *Inst = nullptr;
120 /// The values of the operands in the Instruction.
122 /// The legality of the wrapped instruction. This is informed by InstrType,
123 /// and is used when checking when two instructions are considered similar.
124 /// If either instruction is not legal, the instructions are automatically not
125 /// considered similar.
126 bool Legal = false;
127
128 /// This is only relevant if we are wrapping a CmpInst where we needed to
129 /// change the predicate of a compare instruction from a greater than form
130 /// to a less than form. It is std::nullopt otherwise.
131 std::optional<CmpInst::Predicate> RevisedPredicate;
132
133 /// This is only relevant if we are wrapping a CallInst. If we are requiring
134 /// that the function calls have matching names as well as types, and the
135 /// call is not an indirect call, this will hold the name of the function. If
136 /// it is an indirect string, it will be the empty string. However, if this
137 /// requirement is not in place it will be the empty string regardless of the
138 /// function call type. The value held here is used to create the hash of the
139 /// instruction, and check to make sure two instructions are close to one
140 /// another.
141 std::optional<std::string> CalleeName;
142
143 /// This structure holds the distances of how far "ahead of" or "behind" the
144 /// target blocks of a branch, or the incoming blocks of a phi nodes are.
145 /// If the value is negative, it means that the block was registered before
146 /// the block of this instruction in terms of blocks in the function.
147 /// Code Example:
148 /// \code
149 /// block_1:
150 /// br i1 %0, label %block_2, label %block_3
151 /// block_2:
152 /// br i1 %1, label %block_1, label %block_2
153 /// block_3:
154 /// br i1 %2, label %block_2, label %block_1
155 /// ; Replacing the labels with relative values, this becomes:
156 /// block_1:
157 /// br i1 %0, distance 1, distance 2
158 /// block_2:
159 /// br i1 %1, distance -1, distance 0
160 /// block_3:
161 /// br i1 %2, distance -1, distance -2
162 /// \endcode
163 /// Taking block_2 as our example, block_1 is "behind" block_2, and block_2 is
164 /// "ahead" of block_2.
166
167 /// Gather the information that is difficult to gather for an Instruction, or
168 /// is changed. i.e. the operands of an Instruction and the Types of those
169 /// operands. This extra information allows for similarity matching to make
170 /// assertions that allow for more flexibility when checking for whether an
171 /// Instruction performs the same operation.
172 LLVM_ABI IRInstructionData(Instruction &I, bool Legality,
175
176 /// Fills data stuctures for IRInstructionData when it is constructed from a
177 // reference or a pointer.
179
180 /// Get the predicate that the compare instruction is using for hashing the
181 /// instruction. the IRInstructionData must be wrapping a CmpInst.
183
184 /// Get the callee name that the call instruction is using for hashing the
185 /// instruction. The IRInstructionData must be wrapping a CallInst.
187
188 /// A function that swaps the predicates to their less than form if they are
189 /// in a greater than form. Otherwise, the predicate is unchanged.
190 ///
191 /// \param CI - The comparison operation to find a consistent preidcate for.
192 /// \return the consistent comparison predicate.
194
195 /// For an IRInstructionData containing a branch, finds the
196 /// relative distances from the source basic block to the target by taking
197 /// the difference of the number assigned to the current basic block and the
198 /// target basic block of the branch.
199 ///
200 /// \param BasicBlockToInteger - The mapping of basic blocks to their location
201 /// in the module.
202 LLVM_ABI void
204
205 /// For an IRInstructionData containing a CallInst, set the function name
206 /// appropriately. This will be an empty string if it is an indirect call,
207 /// or we are not matching by name of the called function. It will be the
208 /// name of the function if \p MatchByName is true and it is not an indirect
209 /// call. We may decide not to match by name in order to expand the
210 /// size of the regions we can match. If a function name has the same type
211 /// signature, but the different name, the region of code is still almost the
212 /// same. Since function names can be treated as constants, the name itself
213 /// could be extrapolated away. However, matching by name provides a
214 /// specificity and more "identical" code than not matching by name.
215 ///
216 /// \param MatchByName - A flag to mark whether we are using the called
217 /// function name as a differentiating parameter.
218 LLVM_ABI void setCalleeName(bool MatchByName = true);
219
220 /// For an IRInstructionData containing a PHINode, finds the
221 /// relative distances from the incoming basic block to the current block by
222 /// taking the difference of the number assigned to the current basic block
223 /// and the incoming basic block of the branch.
224 ///
225 /// \param BasicBlockToInteger - The mapping of basic blocks to their location
226 /// in the module.
227 LLVM_ABI void
229
230 /// Get the BasicBlock based operands for PHINodes and BranchInsts.
231 ///
232 /// \returns A list of relevant BasicBlocks.
234
235 /// Hashes \p Value based on its opcode, types, and operand types.
236 /// Two IRInstructionData instances produce the same hash when they perform
237 /// the same operation.
238 ///
239 /// As a simple example, consider the following instructions.
240 ///
241 /// \code
242 /// %add1 = add i32 %x1, %y1
243 /// %add2 = add i32 %x2, %y2
244 ///
245 /// %sub = sub i32 %x1, %y1
246 ///
247 /// %add_i64 = add i64 %x2, %y2
248 /// \endcode
249 ///
250 /// Because the first two adds operate the same types, and are performing the
251 /// same action, they will be hashed to the same value.
252 ///
253 /// However, the subtraction instruction is not the same as an addition, and
254 /// will be hashed to a different value.
255 ///
256 /// Finally, the last add has a different type compared to the first two add
257 /// instructions, so it will also be hashed to a different value that any of
258 /// the previous instructions.
259 ///
260 /// \param [in] ID - The IRInstructionData instance to be hashed.
261 /// \returns A hash_value of the IRInstructionData.
263 SmallVector<Type *, 4> OperTypes;
264 for (Value *V : ID.OperVals)
265 OperTypes.push_back(V->getType());
266
267 if (isa<CmpInst>(ID.Inst))
268 return llvm::hash_combine(llvm::hash_value(ID.Inst->getOpcode()),
269 llvm::hash_value(ID.Inst->getType()),
270 llvm::hash_value(ID.getPredicate()),
271 llvm::hash_combine_range(OperTypes));
272
274 // To hash intrinsics, we use the opcode, and types like the other
275 // instructions, but also, the Intrinsic ID, and the Name of the
276 // intrinsic.
277 Intrinsic::ID IntrinsicID = II->getIntrinsicID();
278 return llvm::hash_combine(llvm::hash_value(ID.Inst->getOpcode()),
279 llvm::hash_value(ID.Inst->getType()),
280 llvm::hash_value(IntrinsicID),
281 llvm::hash_value(*ID.CalleeName),
282 llvm::hash_combine_range(OperTypes));
283 }
284
285 if (isa<CallInst>(ID.Inst)) {
286 std::string FunctionName = *ID.CalleeName;
287 return llvm::hash_combine(llvm::hash_value(ID.Inst->getOpcode()),
288 llvm::hash_value(ID.Inst->getType()),
289 llvm::hash_value(ID.Inst->getType()),
290 llvm::hash_value(FunctionName),
291 llvm::hash_combine_range(OperTypes));
292 }
293
294 return llvm::hash_combine(llvm::hash_value(ID.Inst->getOpcode()),
295 llvm::hash_value(ID.Inst->getType()),
296 llvm::hash_combine_range(OperTypes));
297 }
298
300};
301
303 : simple_ilist<IRInstructionData, ilist_sentinel_tracking<true>> {};
304
305/// Compare one IRInstructionData class to another IRInstructionData class for
306/// whether they are performing a the same operation, and can mapped to the
307/// same value. For regular instructions if the hash value is the same, then
308/// they will also be close.
309///
310/// \param A - The first IRInstructionData class to compare
311/// \param B - The second IRInstructionData class to compare
312/// \returns true if \p A and \p B are similar enough to be mapped to the same
313/// value.
315
316struct IRInstructionDataTraits : DenseMapInfo<IRInstructionData *> {
317 static unsigned getHashValue(const IRInstructionData *E) {
318 using llvm::hash_value;
319 assert(E && "IRInstructionData is a nullptr?");
320 return hash_value(*E);
321 }
322
323 static bool isEqual(const IRInstructionData *LHS,
324 const IRInstructionData *RHS) {
325 assert(LHS && RHS && "nullptr is not expected as a key");
326 return isClose(*LHS, *RHS);
327 }
328};
329
330/// Helper struct for converting the Instructions in a Module into a vector of
331/// unsigned integers. This vector of unsigned integers can be thought of as a
332/// "numeric string". This numeric string can then be queried by, for example,
333/// data structures that find repeated substrings.
334///
335/// This hashing is done per BasicBlock in the module. To hash Instructions
336/// based off of their operations, each Instruction is wrapped in an
337/// IRInstructionData struct. The unsigned integer for an IRInstructionData
338/// depends on:
339/// - The hash provided by the IRInstructionData.
340/// - Which member of InstrType the IRInstructionData is classified as.
341// See InstrType for more details on the possible classifications, and how they
342// manifest in the numeric string.
343///
344/// The numeric string for an individual BasicBlock is terminated by an unique
345/// unsigned integer. This prevents data structures which rely on repetition
346/// from matching across BasicBlocks. (For example, the SuffixTree.)
347/// As a concrete example, if we have the following two BasicBlocks:
348/// \code
349/// bb0:
350/// %add1 = add i32 %a, %b
351/// %add2 = add i32 %c, %d
352/// %add3 = add i64 %e, %f
353/// bb1:
354/// %sub = sub i32 %c, %d
355/// \endcode
356/// We may hash the Instructions like this (via IRInstructionData):
357/// \code
358/// bb0:
359/// %add1 = add i32 %a, %b ; Hash: 1
360/// %add2 = add i32 %c, %d; Hash: 1
361/// %add3 = add i64 %e, %f; Hash: 2
362/// bb1:
363/// %sub = sub i32 %c, %d; Hash: 3
364/// %add4 = add i32 %c, %d ; Hash: 1
365/// \endcode
366/// And produce a "numeric string representation" like so:
367/// 1, 1, 2, unique_integer_1, 3, 1, unique_integer_2
368///
369/// TODO: This is very similar to the MachineOutliner, and should be
370/// consolidated into the same interface.
372 /// The starting illegal instruction number to map to.
373 ///
374 /// Set to -3 for compatibility with DenseMapInfo<unsigned>.
375 unsigned IllegalInstrNumber = static_cast<unsigned>(-3);
376
377 /// The next available integer to assign to a legal Instruction to.
378 unsigned LegalInstrNumber = 0;
379
380 /// Correspondence from IRInstructionData to unsigned integers.
383
384 /// A mapping for a basic block in a module to its assigned number/location
385 /// in the module.
387
388 /// Set if we added an illegal number in the previous step.
389 /// Since each illegal number is unique, we only need one of them between
390 /// each range of legal numbers. This lets us make sure we don't add more
391 /// than one illegal number per range.
393
394 /// Marks whether we found a illegal instruction in the previous step.
396
397 /// Marks whether we have found a set of instructions that is long enough
398 /// to be considered for similarity.
399 bool HaveLegalRange = false;
400
401 /// Marks whether we should use exact function names, as well as types to
402 /// find similarity between calls.
404
405 /// This allocator pointer is in charge of holding on to the IRInstructionData
406 /// so it is not deallocated until whatever external tool is using it is done
407 /// with the information.
409
410 /// This allocator pointer is in charge of creating the IRInstructionDataList
411 /// so it is not deallocated until whatever external tool is using it is done
412 /// with the information.
414
415 /// Get an allocated IRInstructionData struct using the InstDataAllocator.
416 ///
417 /// \param I - The Instruction to wrap with IRInstructionData.
418 /// \param Legality - A boolean value that is true if the instruction is to
419 /// be considered for similarity, and false if not.
420 /// \param IDL - The InstructionDataList that the IRInstructionData is
421 /// inserted into.
422 /// \returns An allocated IRInstructionData struct.
426
427 /// Get an empty allocated IRInstructionData struct using the
428 /// InstDataAllocator.
429 ///
430 /// \param IDL - The InstructionDataList that the IRInstructionData is
431 /// inserted into.
432 /// \returns An allocated IRInstructionData struct.
435
436 /// Get an allocated IRInstructionDataList object using the IDLAllocator.
437 ///
438 /// \returns An allocated IRInstructionDataList object.
440
442
443 /// Assigns values to all the basic blocks in function \p F starting from
444 /// integer \p BBNumber.
445 ///
446 /// \param F - The function containing the basic blocks to assign numbers to.
447 /// \param BBNumber - The number to start from.
448 void initializeForBBs(Function &F, unsigned &BBNumber) {
449 for (BasicBlock &BB : F)
450 BasicBlockToInteger.insert(std::make_pair(&BB, BBNumber++));
451 }
452
453 /// Assigns values to all the basic blocks in Module \p M.
454 /// \param M - The module containing the basic blocks to assign numbers to.
456 unsigned BBNumber = 0;
457 for (Function &F : M)
458 initializeForBBs(F, BBNumber);
459 }
460
461 /// Maps the Instructions in a BasicBlock \p BB to legal or illegal integers
462 /// determined by \p InstrType. Two Instructions are mapped to the same value
463 /// if they are close as defined by the InstructionData class above.
464 ///
465 /// \param [in] BB - The BasicBlock to be mapped to integers.
466 /// \param [in,out] InstrList - Vector of IRInstructionData to append to.
467 /// \param [in,out] IntegerMapping - Vector of unsigned integers to append to.
468 LLVM_ABI void
470 std::vector<IRInstructionData *> &InstrList,
471 std::vector<unsigned> &IntegerMapping);
472
473 /// Maps an Instruction to a legal integer.
474 ///
475 /// \param [in] It - The Instruction to be mapped to an integer.
476 /// \param [in,out] IntegerMappingForBB - Vector of unsigned integers to
477 /// append to.
478 /// \param [in,out] InstrListForBB - Vector of InstructionData to append to.
479 /// \returns The integer \p It was mapped to.
480 LLVM_ABI unsigned
482 std::vector<unsigned> &IntegerMappingForBB,
483 std::vector<IRInstructionData *> &InstrListForBB);
484
485 /// Maps an Instruction to an illegal integer.
486 ///
487 /// \param [in] It - The \p Instruction to be mapped to an integer.
488 /// \param [in,out] IntegerMappingForBB - Vector of unsigned integers to
489 /// append to.
490 /// \param [in,out] InstrListForBB - Vector of IRInstructionData to append to.
491 /// \param End - true if creating a dummy IRInstructionData at the end of a
492 /// basic block.
493 /// \returns The integer \p It was mapped to.
495 BasicBlock::iterator &It, std::vector<unsigned> &IntegerMappingForBB,
496 std::vector<IRInstructionData *> &InstrListForBB, bool End = false);
497
504
505 /// Custom InstVisitor to classify different instructions for whether it can
506 /// be analyzed for similarity.
508 : public InstVisitor<InstructionClassification, InstrType> {
510
511 // TODO: Determine a scheme to resolve when the label is similar enough.
513 if (EnableBranches)
514 return Legal;
515 return Illegal;
516 }
518 if (EnableBranches)
519 return Legal;
520 return Illegal;
521 }
523 if (EnableBranches)
524 return Legal;
525 return Illegal;
526 }
527 // TODO: Handle allocas.
529 // We exclude variable argument instructions since variable arguments
530 // requires extra checking of the argument list.
532 // We exclude all exception handling cases since they are so context
533 // dependent.
537 // These are disabled due to complications in the CodeExtractor when
538 // outlining these instructions. For instance, It is unclear what we
539 // should do when moving only the start or end lifetime instruction into
540 // an outlined function. Also, assume-like intrinsics could be removed
541 // from the region, removing arguments, causing discrepencies in the
542 // number of inputs between different regions.
543 if (II.isAssumeLikeIntrinsic())
544 return Illegal;
545 return EnableIntrinsics ? Legal : Illegal;
546 }
547 // We only allow call instructions where the function has a name and
548 // is not an indirect call.
551 bool IsIndirectCall = CI.isIndirectCall();
553 return Illegal;
554 if (!F && !IsIndirectCall)
555 return Illegal;
556 // Functions marked with the swifttailcc and tailcc calling conventions
557 // require special handling when outlining musttail functions. The
558 // calling convention must be passed down to the outlined function as
559 // well. Further, there is special handling for musttail calls as well,
560 // requiring a return call directly after. For now, the outliner does not
561 // support this, so we do not handle matching this case either.
565 return Illegal;
567 return Illegal;
568 return Legal;
569 }
570 // TODO: We do not current handle similarity that changes the control flow.
572 // TODO: We do not current handle similarity that changes the control flow.
574 // TODO: Handle interblock similarity.
577
578 // The flag variable that lets the classifier know whether we should
579 // allow branches to be checked for similarity.
580 bool EnableBranches = false;
581
582 // The flag variable that lets the classifier know whether we should
583 // allow indirect calls to be considered legal instructions.
585
586 // Flag that lets the classifier know whether we should allow intrinsics to
587 // be checked for similarity.
588 bool EnableIntrinsics = false;
589
590 // Flag that lets the classifier know whether we should allow tail calls to
591 // be checked for similarity.
593 };
594
595 /// Maps an Instruction to a member of InstrType.
597};
598
599/// This is a class that wraps a range of IRInstructionData from one point to
600/// another in the vector of IRInstructionData, which is a region of the
601/// program. It is also responsible for defining the structure within this
602/// region of instructions.
603///
604/// The structure of a region is defined through a value numbering system
605/// assigned to each unique value in a region at the creation of the
606/// IRSimilarityCandidate.
607///
608/// For example, for each Instruction we add a mapping for each new
609/// value seen in that Instruction.
610/// IR: Mapping Added:
611/// %add1 = add i32 %a, c1 %add1 -> 3, %a -> 1, c1 -> 2
612/// %add2 = add i32 %a, %1 %add2 -> 4
613/// %add3 = add i32 c2, c1 %add3 -> 6, c2 -> 5
614///
615/// We can compare IRSimilarityCandidates against one another.
616/// The \ref isSimilar function compares each IRInstructionData against one
617/// another and if we have the same sequences of IRInstructionData that would
618/// create the same hash, we have similar IRSimilarityCandidates.
619///
620/// We can also compare the structure of IRSimilarityCandidates. If we can
621/// create a mapping of registers in the region contained by one
622/// IRSimilarityCandidate to the region contained by different
623/// IRSimilarityCandidate, they can be considered structurally similar.
624///
625/// IRSimilarityCandidate1: IRSimilarityCandidate2:
626/// %add1 = add i32 %a, %b %add1 = add i32 %d, %e
627/// %add2 = add i32 %a, %c %add2 = add i32 %d, %f
628/// %add3 = add i32 c1, c2 %add3 = add i32 c3, c4
629///
630/// Can have the following mapping from candidate to candidate of:
631/// %a -> %d, %b -> %e, %c -> %f, c1 -> c3, c2 -> c4
632/// and can be considered similar.
633///
634/// IRSimilarityCandidate1: IRSimilarityCandidate2:
635/// %add1 = add i32 %a, %b %add1 = add i32 %d, c4
636/// %add2 = add i32 %a, %c %add2 = add i32 %d, %f
637/// %add3 = add i32 c1, c2 %add3 = add i32 c3, c4
638///
639/// We cannot create the same mapping since the use of c4 is not used in the
640/// same way as %b or c2.
642private:
643 /// The start index of this IRSimilarityCandidate in the instruction list.
644 unsigned StartIdx = 0;
645
646 /// The number of instructions in this IRSimilarityCandidate.
647 unsigned Len = 0;
648
649 /// The first instruction in this IRSimilarityCandidate.
650 IRInstructionData *FirstInst = nullptr;
651
652 /// The last instruction in this IRSimilarityCandidate.
653 IRInstructionData *LastInst = nullptr;
654
655 /// Global Value Numbering structures
656 /// @{
657 /// Stores the mapping of the value to the number assigned to it in the
658 /// IRSimilarityCandidate.
659 DenseMap<Value *, unsigned> ValueToNumber;
660 /// Stores the mapping of the number to the value assigned this number.
661 DenseMap<unsigned, Value *> NumberToValue;
662 /// Stores the mapping of a value's number to canonical numbering in the
663 /// candidate's respective similarity group.
664 DenseMap<unsigned, unsigned> NumberToCanonNum;
665 /// Stores the mapping of canonical number in the candidate's respective
666 /// similarity group to a value number.
667 DenseMap<unsigned, unsigned> CanonNumToNumber;
668 /// @}
669
670public:
671 /// \param StartIdx - The starting location of the region.
672 /// \param Len - The length of the region.
673 /// \param FirstInstIt - The starting IRInstructionData of the region.
674 /// \param LastInstIt - The ending IRInstructionData of the region.
675 LLVM_ABI IRSimilarityCandidate(unsigned StartIdx, unsigned Len,
676 IRInstructionData *FirstInstIt,
677 IRInstructionData *LastInstIt);
678
679 /// \param A - The first IRInstructionCandidate to compare.
680 /// \param B - The second IRInstructionCandidate to compare.
681 /// \returns True when every IRInstructionData in \p A is similar to every
682 /// IRInstructionData in \p B.
683 LLVM_ABI static bool isSimilar(const IRSimilarityCandidate &A,
684 const IRSimilarityCandidate &B);
685
686 /// \param [in] A - The first IRInstructionCandidate to compare.
687 /// \param [in] B - The second IRInstructionCandidate to compare.
688 /// \returns True when every IRInstructionData in \p A is structurally similar
689 /// to \p B.
691 const IRSimilarityCandidate &B);
692
693 /// \param [in] A - The first IRInstructionCandidate to compare.
694 /// \param [in] B - The second IRInstructionCandidate to compare.
695 /// \param [in,out] ValueNumberMappingA - A mapping of value numbers from
696 /// candidate \p A to candidate \B.
697 /// \param [in,out] ValueNumberMappingB - A mapping of value numbers from
698 /// candidate \p B to candidate \A.
699 /// \returns True when every IRInstructionData in \p A is structurally similar
700 /// to \p B.
701 LLVM_ABI static bool
704 DenseMap<unsigned, DenseSet<unsigned>> &ValueNumberMappingA,
705 DenseMap<unsigned, DenseSet<unsigned>> &ValueNumberMappingB);
706
708 /// The IRSimilarityCandidate that holds the instruction the OperVals were
709 /// pulled from.
711
712 /// The operand values to be analyzed.
714
715 /// The current mapping of global value numbers from one IRSimilarityCandidate
716 /// to another IRSimilarityCandidate.
718 };
719
720 /// A helper struct to hold the candidate, for a branch instruction, the
721 /// relative location of a label, and the label itself. This is mostly to
722 /// group the values together before passing them as a bundle to a function.
724 /// The IRSimilarityCandidate that holds the instruction the relative
725 /// location was pulled from.
727
728 /// The relative location to be analyzed.
730
731 /// The corresponding value.
733 };
734
735 /// Compare the operands in \p A and \p B and check that the current mapping
736 /// of global value numbers from \p A to \p B and \p B to \A is consistent.
737 ///
738 /// \param A - The first IRInstructionCandidate, operand values, and current
739 /// operand mappings to compare.
740 /// \param B - The second IRInstructionCandidate, operand values, and current
741 /// operand mappings to compare.
742 /// \returns true if the IRSimilarityCandidates operands are compatible.
745
746 /// Compare the operands in \p A and \p B and check that the current mapping
747 /// of global value numbers from \p A to \p B and \p B to \A is consistent
748 /// given that the operands are commutative.
749 ///
750 /// \param A - The first IRInstructionCandidate, operand values, and current
751 /// operand mappings to compare.
752 /// \param B - The second IRInstructionCandidate, operand values, and current
753 /// operand mappings to compare.
754 /// \returns true if the IRSimilarityCandidates operands are compatible.
757
758 /// Compare the GVN of the assignment value in corresponding instructions in
759 /// IRSimilarityCandidates \p A and \p B and check that there exists a mapping
760 /// between the values and replaces the mapping with a one-to-one value if
761 /// needed.
762 ///
763 /// \param InstValA - The assignment GVN from the first IRSimilarityCandidate.
764 /// \param InstValB - The assignment GVN from the second
765 /// IRSimilarityCandidate.
766 /// \param [in,out] ValueNumberMappingA - A mapping of value numbers from
767 /// candidate \p A to candidate \B.
768 /// \param [in,out] ValueNumberMappingB - A mapping of value numbers from
769 /// candidate \p B to candidate \A.
770 /// \returns true if the IRSimilarityCandidates assignments are compatible.
772 const unsigned InstValA, const unsigned &InstValB,
773 DenseMap<unsigned, DenseSet<unsigned>> &ValueNumberMappingA,
774 DenseMap<unsigned, DenseSet<unsigned>> &ValueNumberMappingB);
775
776 /// Compare the relative locations in \p A and \p B and check that the
777 /// distances match if both locations are contained in the region, and that
778 /// the branches both point outside the region if they do not.
779 /// Example Region:
780 /// \code
781 /// entry:
782 /// br i1 %0, label %block_1, label %block_3
783 /// block_0:
784 /// br i1 %0, label %block_1, label %block_2
785 /// block_1:
786 /// br i1 %0, label %block_2, label %block_3
787 /// block_2:
788 /// br i1 %1, label %block_1, label %block_4
789 /// block_3:
790 /// br i1 %2, label %block_2, label %block_5
791 /// \endcode
792 /// If we compare the branches in block_0 and block_1 the relative values are
793 /// 1 and 2 for both, so we consider this a match.
794 ///
795 /// If we compare the branches in entry and block_0 the relative values are
796 /// 2 and 3, and 1 and 2 respectively. Since these are not the same we do not
797 /// consider them a match.
798 ///
799 /// If we compare the branches in block_1 and block_2 the relative values are
800 /// 1 and 2, and -1 and None respectively. As a result we do not consider
801 /// these to be the same
802 ///
803 /// If we compare the branches in block_2 and block_3 the relative values are
804 /// -1 and None for both. We do consider these to be a match.
805 ///
806 /// \param A - The first IRInstructionCandidate, relative location value,
807 /// and incoming block.
808 /// \param B - The second IRInstructionCandidate, relative location value,
809 /// and incoming block.
810 /// \returns true if the relative locations match.
813
814 /// Create a mapping from the value numbering to a different separate set of
815 /// numbers. This will serve as a guide for relating one candidate to another.
816 /// The canonical number gives use the ability identify which global value
817 /// number in one candidate relates to the global value number in the other.
818 ///
819 /// \param [in, out] CurrCand - The IRSimilarityCandidate to create a
820 /// canonical numbering for.
821 LLVM_ABI static void
823
824 /// Create a mapping for the value numbering of the calling
825 /// IRSimilarityCandidate, to a different separate set of numbers, based on
826 /// the canonical ordering in \p SourceCand. These are defined based on the
827 /// found mappings in \p ToSourceMapping and \p FromSourceMapping. Both of
828 /// these relationships should have the same information, just in opposite
829 /// directions.
830 ///
831 /// \param [in, out] SourceCand - The IRSimilarityCandidate to create a
832 /// canonical numbering from.
833 /// \param ToSourceMapping - The mapping of value numbers from this candidate
834 /// to \p SourceCand.
835 /// \param FromSourceMapping - The mapping of value numbers from \p SoureCand
836 /// to this candidate.
838 IRSimilarityCandidate &SourceCand,
839 DenseMap<unsigned, DenseSet<unsigned>> &ToSourceMapping,
840 DenseMap<unsigned, DenseSet<unsigned>> &FromSourceMapping);
841
842 /// Create a mapping for the value numbering of the calling
843 /// IRSimilarityCandidate, to a different separate set of numbers, based on
844 /// the canonical ordering in \p SourceCand. These are defined based on the
845 /// found mappings in \p ToSourceMapping and \p FromSourceMapping. Both of
846 /// these relationships should have the same information, just in opposite
847 /// directions. Uses the \p OneToOne mapping from target candidate to \p
848 /// SourceCand GVNs to determine the mapping first for values with multiple
849 /// mappings. This mapping is created by the ordering of operands in the
850 /// instruction they are first seen in the candidates.
851 ///
852 /// \param [in, out] SourceCand - The IRSimilarityCandidate to create a
853 /// canonical numbering from.
854 /// \param [in,out] OneToOne - A mapping of value numbers from candidate
855 /// \p A to candidate \B using the structure of the original instructions.
856 /// \param ToSourceMapping - The mapping of value numbers from this candidate
857 /// to \p SourceCand.
858 /// \param FromSourceMapping - The mapping of value numbers from \p SoureCand
859 /// to this candidate.
862 DenseMap<unsigned, DenseSet<unsigned>> &ToSourceMapping,
863 DenseMap<unsigned, DenseSet<unsigned>> &FromSourceMapping);
864
865 /// Create a mapping for the value numbering of the calling
866 /// IRSimilarityCandidate, to a different separate set of numbers, based on
867 /// the canonical ordering in \p SourceCand. These are defined based on the
868 /// canonical mapping defined between \p SoureCandLarge and
869 /// \p TargetCandLarge. These IRSimilarityCandidates are already structurally
870 /// similar, and fully encapsulate the IRSimilarityCandidates in question.
871 /// These are used as a "bridge" from the \p SourceCand to the target.
872 ///
873 /// \param [in, out] SourceCand - The IRSimilarityCandidate to create a
874 /// canonical numbering from.
875 /// \param SoureCandLarge - The IRSimilarityCandidate fully containing
876 /// \p SourceCand.
877 /// \param TargetCandLarge - The IRSimilarityCandidate fully containing
878 /// this Candidate.
879 LLVM_ABI void
881 IRSimilarityCandidate &SourceCandLarge,
882 IRSimilarityCandidate &TargetCandLarge);
883
884 /// \param [in,out] BBSet - The set to track the basic blocks.
886 for (IRInstructionData &ID : *this) {
887 BasicBlock *BB = ID.Inst->getParent();
888 BBSet.insert(BB);
889 }
890 }
891
892 /// \param [in,out] BBSet - The set to track the basic blocks.
893 /// \param [in,out] BBList - A list in order of use to track the basic blocks.
895 SmallVector<BasicBlock *> &BBList) const {
896 for (IRInstructionData &ID : *this) {
897 BasicBlock *BB = ID.Inst->getParent();
898 if (BBSet.insert(BB).second)
899 BBList.push_back(BB);
900 }
901 }
902
903 /// Compare the start and end indices of the two IRSimilarityCandidates for
904 /// whether they overlap. If the start instruction of one
905 /// IRSimilarityCandidate is less than the end instruction of the other, and
906 /// the start instruction of one is greater than the start instruction of the
907 /// other, they overlap.
908 ///
909 /// \returns true if the IRSimilarityCandidates do not have overlapping
910 /// instructions.
911 LLVM_ABI static bool overlap(const IRSimilarityCandidate &A,
912 const IRSimilarityCandidate &B);
913
914 /// \returns the number of instructions in this Candidate.
915 unsigned getLength() const { return Len; }
916
917 /// \returns the start index of this IRSimilarityCandidate.
918 unsigned getStartIdx() const { return StartIdx; }
919
920 /// \returns the end index of this IRSimilarityCandidate.
921 unsigned getEndIdx() const { return StartIdx + Len - 1; }
922
923 /// \returns The first IRInstructionData.
924 IRInstructionData *front() const { return FirstInst; }
925 /// \returns The last IRInstructionData.
926 IRInstructionData *back() const { return LastInst; }
927
928 /// \returns The first Instruction.
929 Instruction *frontInstruction() { return FirstInst->Inst; }
930 /// \returns The last Instruction
931 Instruction *backInstruction() { return LastInst->Inst; }
932
933 /// \returns The BasicBlock the IRSimilarityCandidate starts in.
934 BasicBlock *getStartBB() { return FirstInst->Inst->getParent(); }
935 /// \returns The BasicBlock the IRSimilarityCandidate ends in.
936 BasicBlock *getEndBB() { return LastInst->Inst->getParent(); }
937
938 /// \returns The Function that the IRSimilarityCandidate is located in.
940
941 /// Finds the positive number associated with \p V if it has been mapped.
942 /// \param [in] V - the Value to find.
943 /// \returns The positive number corresponding to the value.
944 /// \returns std::nullopt if not present.
945 std::optional<unsigned> getGVN(Value *V) {
946 assert(V != nullptr && "Value is a nullptr?");
947 auto VNIt = ValueToNumber.find(V);
948 if (VNIt == ValueToNumber.end())
949 return std::nullopt;
950 return VNIt->second;
951 }
952
953 /// Finds the Value associate with \p Num if it exists.
954 /// \param [in] Num - the number to find.
955 /// \returns The Value associated with the number.
956 /// \returns std::nullopt if not present.
957 std::optional<Value *> fromGVN(unsigned Num) {
958 auto VNIt = NumberToValue.find(Num);
959 if (VNIt == NumberToValue.end())
960 return std::nullopt;
961 assert(VNIt->second != nullptr && "Found value is a nullptr!");
962 return VNIt->second;
963 }
964
965 /// Find the canonical number from the global value number \p N stored in the
966 /// candidate.
967 ///
968 /// \param N - The global value number to find the canonical number for.
969 /// \returns An optional containing the value, and std::nullopt if it could
970 /// not be found.
971 std::optional<unsigned> getCanonicalNum(unsigned N) {
972 auto NCIt = NumberToCanonNum.find(N);
973 if (NCIt == NumberToCanonNum.end())
974 return std::nullopt;
975 return NCIt->second;
976 }
977
978 /// Find the global value number from the canonical number \p N stored in the
979 /// candidate.
980 ///
981 /// \param N - The canonical number to find the global vlaue number for.
982 /// \returns An optional containing the value, and std::nullopt if it could
983 /// not be found.
984 std::optional<unsigned> fromCanonicalNum(unsigned N) {
985 auto CNIt = CanonNumToNumber.find(N);
986 if (CNIt == CanonNumToNumber.end())
987 return std::nullopt;
988 return CNIt->second;
989 }
990
991 /// \param RHS -The IRSimilarityCandidate to compare against
992 /// \returns true if the IRSimilarityCandidate is occurs after the
993 /// IRSimilarityCandidate in the program.
995 return getStartIdx() > RHS.getStartIdx();
996 }
997
999 iterator begin() const { return iterator(front()); }
1000 iterator end() const { return std::next(iterator(back())); }
1001};
1002
1006typedef std::vector<IRSimilarityCandidate> SimilarityGroup;
1007typedef std::vector<SimilarityGroup> SimilarityGroupList;
1008
1009/// This class puts all the pieces of the IRInstructionData,
1010/// IRInstructionMapper, IRSimilarityCandidate together.
1011///
1012/// It first feeds the Module or vector of Modules into the IRInstructionMapper,
1013/// and puts all the mapped instructions into a single long list of
1014/// IRInstructionData.
1015///
1016/// The list of unsigned integers is given to the Suffix Tree or similar data
1017/// structure to find repeated subsequences. We construct an
1018/// IRSimilarityCandidate for each instance of the subsequence. We compare them
1019/// against one another since These repeated subsequences can have different
1020/// structure. For each different kind of structure found, we create a
1021/// similarity group.
1022///
1023/// If we had four IRSimilarityCandidates A, B, C, and D where A, B and D are
1024/// structurally similar to one another, while C is different we would have two
1025/// SimilarityGroups:
1026///
1027/// SimilarityGroup 1: SimilarityGroup 2
1028/// A, B, D C
1029///
1030/// A list of the different similarity groups is then returned after
1031/// analyzing the module.
1033public:
1034 IRSimilarityIdentifier(bool MatchBranches = true,
1035 bool MatchIndirectCalls = true,
1036 bool MatchCallsWithName = false,
1037 bool MatchIntrinsics = true,
1038 bool MatchMustTailCalls = true)
1039 : Mapper(&InstDataAllocator, &InstDataListAllocator),
1040 EnableBranches(MatchBranches), EnableIndirectCalls(MatchIndirectCalls),
1041 EnableMatchingCallsByName(MatchCallsWithName),
1042 EnableIntrinsics(MatchIntrinsics),
1043 EnableMustTailCalls(MatchMustTailCalls) {}
1044
1045private:
1046 /// Map the instructions in the module to unsigned integers, using mapping
1047 /// already present in the Mapper if possible.
1048 ///
1049 /// \param [in] M Module - To map to integers.
1050 /// \param [in,out] InstrList - The vector to append IRInstructionData to.
1051 /// \param [in,out] IntegerMapping - The vector to append integers to.
1052 void populateMapper(Module &M, std::vector<IRInstructionData *> &InstrList,
1053 std::vector<unsigned> &IntegerMapping);
1054
1055 /// Map the instructions in the modules vector to unsigned integers, using
1056 /// mapping already present in the mapper if possible.
1057 ///
1058 /// \param [in] Modules - The list of modules to use to populate the mapper
1059 /// \param [in,out] InstrList - The vector to append IRInstructionData to.
1060 /// \param [in,out] IntegerMapping - The vector to append integers to.
1061 void populateMapper(ArrayRef<std::unique_ptr<Module>> &Modules,
1062 std::vector<IRInstructionData *> &InstrList,
1063 std::vector<unsigned> &IntegerMapping);
1064
1065 /// Find the similarity candidates in \p InstrList and corresponding
1066 /// \p UnsignedVec
1067 ///
1068 /// \param [in,out] InstrList - The vector to append IRInstructionData to.
1069 /// \param [in,out] IntegerMapping - The vector to append integers to.
1070 /// candidates found in the program.
1071 void findCandidates(std::vector<IRInstructionData *> &InstrList,
1072 std::vector<unsigned> &IntegerMapping);
1073
1074public:
1075 // Find the IRSimilarityCandidates in the \p Modules and group by structural
1076 // similarity in a SimilarityGroup, each group is returned in a
1077 // SimilarityGroupList.
1078 //
1079 // \param [in] Modules - the modules to analyze.
1080 // \returns The groups of similarity ranges found in the modules.
1082 findSimilarity(ArrayRef<std::unique_ptr<Module>> Modules);
1083
1084 // Find the IRSimilarityCandidates in the given Module grouped by structural
1085 // similarity in a SimilarityGroup, contained inside a SimilarityGroupList.
1086 //
1087 // \param [in] M - the module to analyze.
1088 // \returns The groups of similarity ranges found in the module.
1090
1091 // Clears \ref SimilarityCandidates if it is already filled by a previous run.
1093 // If we've already analyzed a Module or set of Modules, so we must clear
1094 // the SimilarityCandidates to make sure we do not have only old values
1095 // hanging around.
1096 if (SimilarityCandidates)
1097 SimilarityCandidates->clear();
1098 else
1099 SimilarityCandidates = SimilarityGroupList();
1100 }
1101
1102 // \returns The groups of similarity ranges found in the most recently passed
1103 // set of modules.
1104 std::optional<SimilarityGroupList> &getSimilarity() {
1105 return SimilarityCandidates;
1106 }
1107
1108private:
1109 /// The allocator for IRInstructionData.
1111
1112 /// The allocator for IRInstructionDataLists.
1114
1115 /// Map Instructions to unsigned integers and wraps the Instruction in an
1116 /// instance of IRInstructionData.
1117 IRInstructionMapper Mapper;
1118
1119 /// The flag variable that marks whether we should check branches for
1120 /// similarity, or only look within basic blocks.
1121 bool EnableBranches = true;
1122
1123 /// The flag variable that marks whether we allow indirect calls to be checked
1124 /// for similarity, or exclude them as a legal instruction.
1125 bool EnableIndirectCalls = true;
1126
1127 /// The flag variable that marks whether we allow calls to be marked as
1128 /// similar if they do not have the same name, only the same calling
1129 /// convention, attributes and type signature.
1130 bool EnableMatchingCallsByName = true;
1131
1132 /// The flag variable that marks whether we should check intrinsics for
1133 /// similarity.
1134 bool EnableIntrinsics = true;
1135
1136 // The flag variable that marks whether we should allow tailcalls
1137 // to be checked for similarity.
1138 bool EnableMustTailCalls = false;
1139
1140 /// The SimilarityGroups found with the most recent run of \ref
1141 /// findSimilarity. std::nullopt if there is no recent run.
1142 std::optional<SimilarityGroupList> SimilarityCandidates;
1143};
1144
1145} // end namespace IRSimilarity
1146
1147/// An analysis pass based on legacy pass manager that runs and returns
1148/// IRSimilarityIdentifier run on the Module.
1150 std::unique_ptr<IRSimilarity::IRSimilarityIdentifier> IRSI;
1151
1152public:
1153 static char ID;
1155
1157 const IRSimilarity::IRSimilarityIdentifier &getIRSI() const { return *IRSI; }
1158
1159 bool doInitialization(Module &M) override;
1160 bool doFinalization(Module &M) override;
1161 bool runOnModule(Module &M) override;
1162 void getAnalysisUsage(AnalysisUsage &AU) const override {
1163 AU.setPreservesAll();
1164 }
1165};
1166
1167/// An analysis pass that runs and returns the IRSimilarityIdentifier run on the
1168/// Module.
1169class IRSimilarityAnalysis : public AnalysisInfoMixin<IRSimilarityAnalysis> {
1170public:
1172
1174
1175private:
1177 LLVM_ABI static AnalysisKey Key;
1178};
1179
1180/// Printer pass that uses \c IRSimilarityAnalysis.
1182 : public RequiredPassInfoMixin<IRSimilarityAnalysisPrinterPass> {
1183 raw_ostream &OS;
1184
1185public:
1188};
1189
1190} // end namespace llvm
1191
1192#endif // LLVM_ANALYSIS_IRSIMILARITYIDENTIFIER_H
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file defines the BumpPtrAllocator interface.
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
#define LLVM_ABI
Definition Compiler.h:213
static bool IsIndirectCall(const MachineInstr *MI)
This header defines various interfaces for pass management in LLVM.
#define F(x, y, z)
Definition MD5.cpp:54
#define I(x, y, z)
Definition MD5.cpp:57
uint64_t IntrinsicInst * II
Value * RHS
Value * LHS
an instruction to allocate memory on the stack
Represent the analysis usage information of a pass.
void setPreservesAll()
Set by analyses that do not transform their input at all.
Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
LLVM Basic Block Representation.
Definition BasicBlock.h:62
const Function * getParent() const
Return the enclosing method, or null if none.
Definition BasicBlock.h:213
InstListType::iterator iterator
Instruction iterators...
Definition BasicBlock.h:170
Function * getCalledFunction() const
Returns the function called, or null if this is an indirect function invocation or the function signa...
CallingConv::ID getCallingConv() const
LLVM_ABI bool isIndirectCall() const
Return true if the callsite is an indirect call.
CallBr instruction, tracking function calls that may not return control but instead transfer it to a ...
This class represents a function call, abstracting a target machine's calling convention.
bool isMustTailCall() const
This class is the base class for the comparison instructions.
Definition InstrTypes.h:728
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:740
Conditional Branch instruction.
Implements a dense probed hash-table based set.
Definition DenseSet.h:289
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
An analysis pass that runs and returns the IRSimilarityIdentifier run on the Module.
IRSimilarity::IRSimilarityIdentifier Result
LLVM_ABI Result run(Module &M, ModuleAnalysisManager &)
IRSimilarity::IRSimilarityIdentifier & getIRSI()
void getAnalysisUsage(AnalysisUsage &AU) const override
getAnalysisUsage - This function should be overriden by passes that need analysis information to do t...
const IRSimilarity::IRSimilarityIdentifier & getIRSI() const
This is a class that wraps a range of IRInstructionData from one point to another in the vector of IR...
LLVM_ABI void createCanonicalRelationFrom(IRSimilarityCandidate &SourceCand, DenseMap< unsigned, DenseSet< unsigned > > &ToSourceMapping, DenseMap< unsigned, DenseSet< unsigned > > &FromSourceMapping)
Create a mapping for the value numbering of the calling IRSimilarityCandidate, to a different separat...
static LLVM_ABI bool isSimilar(const IRSimilarityCandidate &A, const IRSimilarityCandidate &B)
LLVM_ABI void createCanonicalRelationFrom(IRSimilarityCandidate &SourceCand, DenseMap< unsigned, unsigned > &OneToOne, DenseMap< unsigned, DenseSet< unsigned > > &ToSourceMapping, DenseMap< unsigned, DenseSet< unsigned > > &FromSourceMapping)
Create a mapping for the value numbering of the calling IRSimilarityCandidate, to a different separat...
std::optional< unsigned > getGVN(Value *V)
Finds the positive number associated with V if it has been mapped.
void getBasicBlocks(DenseSet< BasicBlock * > &BBSet, SmallVector< BasicBlock * > &BBList) const
static LLVM_ABI bool compareAssignmentMapping(const unsigned InstValA, const unsigned &InstValB, DenseMap< unsigned, DenseSet< unsigned > > &ValueNumberMappingA, DenseMap< unsigned, DenseSet< unsigned > > &ValueNumberMappingB)
Compare the GVN of the assignment value in corresponding instructions in IRSimilarityCandidates A and...
void getBasicBlocks(DenseSet< BasicBlock * > &BBSet) const
bool operator<(const IRSimilarityCandidate &RHS) const
std::optional< Value * > fromGVN(unsigned Num)
Finds the Value associate with Num if it exists.
static LLVM_ABI bool checkRelativeLocations(RelativeLocMapping A, RelativeLocMapping B)
Compare the relative locations in A and B and check that the distances match if both locations are co...
static LLVM_ABI bool compareStructure(const IRSimilarityCandidate &A, const IRSimilarityCandidate &B)
static LLVM_ABI void createCanonicalMappingFor(IRSimilarityCandidate &CurrCand)
Create a mapping from the value numbering to a different separate set of numbers.
static LLVM_ABI bool overlap(const IRSimilarityCandidate &A, const IRSimilarityCandidate &B)
Compare the start and end indices of the two IRSimilarityCandidates for whether they overlap.
LLVM_ABI IRSimilarityCandidate(unsigned StartIdx, unsigned Len, IRInstructionData *FirstInstIt, IRInstructionData *LastInstIt)
std::optional< unsigned > getCanonicalNum(unsigned N)
Find the canonical number from the global value number N stored in the candidate.
std::optional< unsigned > fromCanonicalNum(unsigned N)
Find the global value number from the canonical number N stored in the candidate.
static LLVM_ABI bool compareNonCommutativeOperandMapping(OperandMapping A, OperandMapping B)
Compare the operands in A and B and check that the current mapping of global value numbers from A to ...
static LLVM_ABI bool compareCommutativeOperandMapping(OperandMapping A, OperandMapping B)
Compare the operands in A and B and check that the current mapping of global value numbers from A to ...
This class puts all the pieces of the IRInstructionData, IRInstructionMapper, IRSimilarityCandidate t...
IRSimilarityIdentifier(bool MatchBranches=true, bool MatchIndirectCalls=true, bool MatchCallsWithName=false, bool MatchIntrinsics=true, bool MatchMustTailCalls=true)
LLVM_ABI SimilarityGroupList & findSimilarity(ArrayRef< std::unique_ptr< Module > > Modules)
std::optional< SimilarityGroupList > & getSimilarity()
Base class for instruction visitors.
Definition InstVisitor.h:78
A wrapper class for inspecting calls to intrinsic functions.
Invoke instruction.
The landingpad instruction holds all of the information necessary to generate correct exception handl...
ModulePass(char &pid)
Definition Pass.h:257
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
A BumpPtrAllocator that allows only elements of a specific type to be allocated.
Definition Allocator.h:390
Represent a constant reference to a string, i.e.
Definition StringRef.h:56
Unconditional Branch instruction.
This class represents the va_arg llvm instruction, which returns an argument of the specified type gi...
LLVM Value Representation.
Definition Value.h:75
std::pair< iterator, bool > insert(const ValueT &V)
Definition DenseSet.h:212
An opaque object representing a hash code.
Definition Hashing.h:77
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
ilist_select_iterator_type< OptionsT, false, false > iterator
unsigned ID
LLVM IR allows to use arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ Tail
Attemps to make calls as fast as possible while guaranteeing that tail call optimization can always b...
Definition CallingConv.h:76
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
std::vector< SimilarityGroup > SimilarityGroupList
std::vector< IRSimilarityCandidate > SimilarityGroup
DenseMap< IRSimilarityCandidate *, DenseMap< unsigned, DenseSet< unsigned > > > CandidateGVNMapping
LLVM_ABI bool isClose(const IRInstructionData &A, const IRInstructionData &B)
Compare one IRInstructionData class to another IRInstructionData class for whether they are performin...
InstrType
This represents what is and is not supported when finding similarity in Instructions.
This is an optimization pass for GlobalISel generic memory operations.
hash_code hash_value(const FixedPointSemantics &Val)
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
hash_code hash_combine(const Ts &...args)
Combine values into a single hash_code.
Definition Hashing.h:305
hash_code hash_combine_range(InputIteratorT first, InputIteratorT last)
Compute a hash_code for a sequence of values.
Definition Hashing.h:285
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
#define N
A CRTP mix-in that provides informational APIs needed for analysis passes.
A special type used by analysis passes to provide an address that identifies that particular analysis...
Definition Analysis.h:29
An information struct used to provide DenseMap with the various necessary components for a given valu...
static unsigned getHashValue(const IRInstructionData *E)
static bool isEqual(const IRInstructionData *LHS, const IRInstructionData *RHS)
This provides the utilities for hashing an Instruction to an unsigned integer.
LLVM_ABI StringRef getCalleeName() const
Get the callee name that the call instruction is using for hashing the instruction.
LLVM_ABI void initializeInstruction()
Fills data stuctures for IRInstructionData when it is constructed from a.
SmallVector< int, 4 > RelativeBlockLocations
This structure holds the distances of how far "ahead of" or "behind" the target blocks of a branch,...
std::optional< std::string > CalleeName
This is only relevant if we are wrapping a CallInst.
LLVM_ABI IRInstructionData(Instruction &I, bool Legality, IRInstructionDataList &IDL)
Gather the information that is difficult to gather for an Instruction, or is changed.
LLVM_ABI ArrayRef< Value * > getBlockOperVals()
Get the BasicBlock based operands for PHINodes and BranchInsts.
Instruction * Inst
The source Instruction that is being wrapped.
static LLVM_ABI CmpInst::Predicate predicateForConsistency(CmpInst *CI)
A function that swaps the predicates to their less than form if they are in a greater than form.
LLVM_ABI void setPHIPredecessors(DenseMap< BasicBlock *, unsigned > &BasicBlockToInteger)
For an IRInstructionData containing a PHINode, finds the relative distances from the incoming basic b...
SmallVector< Value *, 4 > OperVals
The values of the operands in the Instruction.
friend hash_code hash_value(const IRInstructionData &ID)
Hashes Value based on its opcode, types, and operand types.
LLVM_ABI void setBranchSuccessors(DenseMap< BasicBlock *, unsigned > &BasicBlockToInteger)
For an IRInstructionData containing a branch, finds the relative distances from the source basic bloc...
bool Legal
The legality of the wrapped instruction.
LLVM_ABI void setCalleeName(bool MatchByName=true)
For an IRInstructionData containing a CallInst, set the function name appropriately.
LLVM_ABI CmpInst::Predicate getPredicate() const
Get the predicate that the compare instruction is using for hashing the instruction.
std::optional< CmpInst::Predicate > RevisedPredicate
This is only relevant if we are wrapping a CmpInst where we needed to change the predicate of a compa...
Custom InstVisitor to classify different instructions for whether it can be analyzed for similarity.
Helper struct for converting the Instructions in a Module into a vector of unsigned integers.
DenseMap< IRInstructionData *, unsigned, IRInstructionDataTraits > InstructionIntegerMap
Correspondence from IRInstructionData to unsigned integers.
SpecificBumpPtrAllocator< IRInstructionDataList > * IDLAllocator
This allocator pointer is in charge of creating the IRInstructionDataList so it is not deallocated un...
SpecificBumpPtrAllocator< IRInstructionData > * InstDataAllocator
This allocator pointer is in charge of holding on to the IRInstructionData so it is not deallocated u...
bool EnableMatchCallsByName
Marks whether we should use exact function names, as well as types to find similarity between calls.
unsigned LegalInstrNumber
The next available integer to assign to a legal Instruction to.
unsigned IllegalInstrNumber
The starting illegal instruction number to map to.
InstructionClassification InstClassifier
Maps an Instruction to a member of InstrType.
void initializeForBBs(Function &F, unsigned &BBNumber)
Assigns values to all the basic blocks in function F starting from integer BBNumber.
bool HaveLegalRange
Marks whether we have found a set of instructions that is long enough to be considered for similarity...
void initializeForBBs(Module &M)
Assigns values to all the basic blocks in Module M.
LLVM_ABI IRInstructionData * allocateIRInstructionData(Instruction &I, bool Legality, IRInstructionDataList &IDL)
Get an allocated IRInstructionData struct using the InstDataAllocator.
IRInstructionMapper(SpecificBumpPtrAllocator< IRInstructionData > *IDA, SpecificBumpPtrAllocator< IRInstructionDataList > *IDLA)
bool CanCombineWithPrevInstr
Marks whether we found a illegal instruction in the previous step.
DenseMap< BasicBlock *, unsigned > BasicBlockToInteger
A mapping for a basic block in a module to its assigned number/location in the module.
LLVM_ABI IRInstructionDataList * allocateIRInstructionDataList()
Get an allocated IRInstructionDataList object using the IDLAllocator.
LLVM_ABI unsigned mapToLegalUnsigned(BasicBlock::iterator &It, std::vector< unsigned > &IntegerMappingForBB, std::vector< IRInstructionData * > &InstrListForBB)
Maps an Instruction to a legal integer.
LLVM_ABI void convertToUnsignedVec(BasicBlock &BB, std::vector< IRInstructionData * > &InstrList, std::vector< unsigned > &IntegerMapping)
Maps the Instructions in a BasicBlock BB to legal or illegal integers determined by InstrType.
bool AddedIllegalLastTime
Set if we added an illegal number in the previous step.
LLVM_ABI unsigned mapToIllegalUnsigned(BasicBlock::iterator &It, std::vector< unsigned > &IntegerMappingForBB, std::vector< IRInstructionData * > &InstrListForBB, bool End=false)
Maps an Instruction to an illegal integer.
DenseMap< unsigned, DenseSet< unsigned > > & ValueNumberMapping
The current mapping of global value numbers from one IRSimilarityCandidate to another IRSimilarityCan...
const IRSimilarityCandidate & IRSC
The IRSimilarityCandidate that holds the instruction the OperVals were pulled from.
ArrayRef< Value * > & OperVals
The operand values to be analyzed.
A helper struct to hold the candidate, for a branch instruction, the relative location of a label,...
const IRSimilarityCandidate & IRSC
The IRSimilarityCandidate that holds the instruction the relative location was pulled from.
A CRTP mix-in for passes that should not be skipped.