LLVM 20.0.0git
llvm::gsym::GsymCreator Class Reference

GsymCreator is used to emit GSYM data to a stand alone file or section within a file.

#include "llvm/DebugInfo/GSYM/GsymCreator.h"

Public Member Functions

 GsymCreator (bool Quiet=false)
 
llvm::Error save (StringRef Path, llvm::endianness ByteOrder, std::optional< uint64_t > SegmentSize=std::nullopt) const
 Save a GSYM file to a stand alone file.
 
llvm::Error encode (FileWriter &O) const
 Encode a GSYM into the file writer stream at the current position.
 
uint32_t insertString (StringRef S, bool Copy=true)
 Insert a string into the GSYM string table.
 
StringRef getString (uint32_t Offset)
 Retrieve a string from the GSYM string table given its offset.
 
uint32_t insertFile (StringRef Path, sys::path::Style Style=sys::path::Style::native)
 Insert a file into this GSYM creator.
 
void addFunctionInfo (FunctionInfo &&FI)
 Add a function info to this GSYM creator.
 
llvm::Error loadCallSitesFromYAML (StringRef YAMLFile)
 Load call site information from a YAML file.
 
void prepareMergedFunctions (OutputAggregator &Out)
 Organize merged FunctionInfo's.
 
llvm::Error finalize (OutputAggregator &OS)
 Finalize the data in the GSYM creator prior to saving the data out.
 
void setUUID (llvm::ArrayRef< uint8_t > UUIDBytes)
 Set the UUID value.
 
void forEachFunctionInfo (std::function< bool(FunctionInfo &)> const &Callback)
 Thread safe iteration over all function infos.
 
void forEachFunctionInfo (std::function< bool(const FunctionInfo &)> const &Callback) const
 Thread safe const iteration over all function infos.
 
size_t getNumFunctionInfos () const
 Get the current number of FunctionInfo objects contained in this object.
 
void SetValidTextRanges (AddressRanges &TextRanges)
 Set valid .text address ranges that all functions must be contained in.
 
const std::optional< AddressRanges > GetValidTextRanges () const
 Get the valid text ranges.
 
bool IsValidTextAddress (uint64_t Addr) const
 Check if an address is a valid code address.
 
void setBaseAddress (uint64_t Addr)
 Set the base address to use for the GSYM file.
 
bool isQuiet () const
 Whether the transformation should be quiet, i.e. not output warnings.
 
llvm::Expected< std::unique_ptr< GsymCreator > > createSegment (uint64_t SegmentSize, size_t &FuncIdx) const
 Create a segmented GSYM creator starting with function info index FuncIdx.
 

Detailed Description

GsymCreator is used to emit GSYM data to a stand alone file or section within a file.

The GsymCreator is designed to be used in 3 stages:

The first stage involves creating FunctionInfo objects from another source of information like compiler debug info metadata, DWARF or Breakpad files. Any strings in the FunctionInfo or contained information, like InlineInfo or LineTable objects, should get the string table offsets by calling GsymCreator::insertString(...). Any file indexes that are needed should be obtained by calling GsymCreator::insertFile(...). All of the function calls in GsymCreator are thread safe. This allows multiple threads to create and add FunctionInfo objects while parsing debug information.

Once all of the FunctionInfo objects have been added, GsymCreator::finalize(...) must be called prior to saving. This function sorts the FunctionInfo objects, finalizes the string table, and performs any other passes needed to prepare the information to be saved.

Once the object has been finalized, it can be saved to a file or section.

ENCODING

GSYM files are designed to be memory mapped into a process as shared, read only data, and used as is.

The GSYM file format when in a stand alone file consists of:

HEADER

The header is fully described in "llvm/DebugInfo/GSYM/Header.h".

ADDRESS TABLE

The address table immediately follows the header in the file and consists of Header.NumAddresses address offsets. These offsets are sorted and can be binary searched for efficient lookups. Addresses in the address table are stored as offsets from a 64 bit base address found in Header.BaseAddress. This allows the address table to hold 8, 16, or 32 bit offsets instead of a full 64 bit address for each entry. A smaller address table makes the resulting GSYM file smaller and causes fewer pages to be touched during address lookups. The size of each address offset is specified in the header in Header.AddrOffSize. The first offset in the address table is aligned to Header.AddrOffSize alignment to ensure efficient access when loaded into memory.
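The offset-size choice and the binary search described above can be sketched in standalone C++. This is only an illustration under stated assumptions, not LLVM's implementation: the helper names pickAddrOffSize and lookupAddressIndex are hypothetical, the offsets are held as 64 bit values for simplicity, and the fallback to 8 byte offsets is an assumption that the text above does not describe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: smallest offset width, in bytes, that can hold the
// largest offset (MaxAddr - BaseAddr) stored in the address table.
inline uint8_t pickAddrOffSize(uint64_t BaseAddr, uint64_t MaxAddr) {
  assert(MaxAddr >= BaseAddr && "base address must not exceed any address");
  const uint64_t MaxOffset = MaxAddr - BaseAddr;
  if (MaxOffset <= UINT8_MAX)
    return 1;
  if (MaxOffset <= UINT16_MAX)
    return 2;
  if (MaxOffset <= UINT32_MAX)
    return 4;
  return 8; // Assumption: fall back to full 64 bit offsets.
}

// Hypothetical helper: index of the last table entry whose address is less
// than or equal to Addr, or std::nullopt if Addr precedes every entry.
inline std::optional<size_t>
lookupAddressIndex(const std::vector<uint64_t> &SortedOffsets,
                   uint64_t BaseAddr, uint64_t Addr) {
  if (Addr < BaseAddr)
    return std::nullopt;
  const uint64_t Offset = Addr - BaseAddr;
  // upper_bound finds the first entry greater than Offset; the entry just
  // before it is the candidate that may contain Addr.
  auto It =
      std::upper_bound(SortedOffsets.begin(), SortedOffsets.end(), Offset);
  if (It == SortedOffsets.begin())
    return std::nullopt;
  return static_cast<size_t>(It - SortedOffsets.begin()) - 1;
}
```

The index returned by the lookup is the same index that would be used to read the matching entry in the function info offsets table.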

FUNCTION INFO OFFSETS TABLE

The function info offsets table immediately follows the address table and consists of Header.NumAddresses 32 bit file offsets: one for each address in the address table. This data is aligned to a 4 byte boundary. The offsets in this table are the relative offsets from the start offset of the GSYM header and point to the function info data for each address in the address table. Keeping this data separate from the address table helps to reduce the number of pages that are touched when address lookups occur on a GSYM file.
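The alignment rules for the two tables can be worked through with a small layout calculation. The sketch below is an assumption-laden illustration: computeTableLayout and TableLayout are hypothetical names, the header size is passed in as a parameter because it is defined in Header.h rather than here, and the header is assumed to start at a suitably aligned offset so that offsets relative to the header can be aligned directly.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align.
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Offsets are relative to the start of the GSYM header.
struct TableLayout {
  uint64_t AddrTableOffset;       // Start of the address table.
  uint64_t FuncInfoOffsetsOffset; // Start of the function info offsets table.
  uint64_t EndOffset;             // First byte after that table.
};

// Hypothetical helper: place both tables after a header of HeaderSize bytes.
inline TableLayout computeTableLayout(uint64_t HeaderSize,
                                      uint32_t NumAddresses,
                                      uint8_t AddrOffSize) {
  TableLayout L;
  // The address table is aligned to the size of one address offset.
  L.AddrTableOffset = alignUp(HeaderSize, AddrOffSize);
  const uint64_t AddrTableEnd =
      L.AddrTableOffset + uint64_t(NumAddresses) * AddrOffSize;
  // The function info offsets are 32 bit values aligned to 4 bytes.
  L.FuncInfoOffsetsOffset = alignUp(AddrTableEnd, 4);
  L.EndOffset = L.FuncInfoOffsetsOffset + uint64_t(NumAddresses) * 4;
  return L;
}
```

With three 2 byte address offsets, the address table occupies 6 bytes, so 2 bytes of padding are needed before the function info offsets table can start on a 4 byte boundary.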

FILE TABLE

The file table immediately follows the function info offsets table. The encoding of the FileTable is:

struct FileTable { uint32_t Count; FileEntry Files[]; };

The file table starts with a 32 bit count of the number of files that are used in all of the function info, followed by that number of FileEntry structures. The file table is aligned to a 4 byte boundary. Each file in the file table is represented with a FileEntry structure. See "llvm/DebugInfo/GSYM/FileEntry.h" for details.

STRING TABLE

The string table follows the file table in stand alone GSYM files and contains all strings for everything contained in the GSYM file. Any string data should be added to the string table and any references to strings inside GSYM information must be stored as 32 bit string table offsets into this string table. The string table always starts with an empty string at offset zero and is followed by any strings needed by the GSYM information. The start of the string table is not aligned to any boundary.
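The string table rules above (an empty string at offset zero, with every other string referenced by a 32 bit offset) can be illustrated with a minimal standalone table. SimpleStringTable is a hypothetical class written for this sketch; it stores strings NUL terminated in insertion order and does not attempt any of the optimizations a real string table builder might apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical minimal string table: NUL-terminated strings packed back to
// back, with the empty string always at offset zero.
class SimpleStringTable {
public:
  SimpleStringTable() {
    Data.push_back('\0'); // The empty string lives at offset 0.
    Offsets.emplace(std::string(), 0);
  }

  // Returns the offset of S, appending it only if it is not present yet.
  uint32_t insert(const std::string &S) {
    auto It = Offsets.find(S);
    if (It != Offsets.end())
      return It->second;
    const uint32_t Offset = static_cast<uint32_t>(Data.size());
    Data.append(S);
    Data.push_back('\0');
    Offsets.emplace(S, Offset);
    return Offset;
  }

  // Returns the string that starts at Offset.
  std::string get(uint32_t Offset) const {
    assert(Offset < Data.size() && "offset must lie inside the table");
    return std::string(Data.c_str() + Offset);
  }

  size_t size() const { return Data.size(); }

private:
  std::string Data;
  std::unordered_map<std::string, uint32_t> Offsets;
};
```

Inserting the same string twice returns the same offset, which is what allows names shared by many functions to be stored once.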

FUNCTION INFO DATA

The function info data is the payload that contains information about the address that is being looked up. It contains all of the encoded FunctionInfo objects. Each encoded FunctionInfo's data is pointed to by an entry in the Function Info Offsets Table. For details on the exact encoding of FunctionInfo objects, see "llvm/DebugInfo/GSYM/FunctionInfo.h".

Definition at line 134 of file GsymCreator.h.

Constructor & Destructor Documentation

◆ GsymCreator()

GsymCreator::GsymCreator ( bool  Quiet = false)

Definition at line 24 of file GsymCreator.cpp.

References insertFile().

Member Function Documentation

◆ addFunctionInfo()

void GsymCreator::addFunctionInfo ( FunctionInfo &&  FI)

Add a function info to this GSYM creator.

All information in the FunctionInfo object must use the GsymCreator::insertString(...) function when creating string table offsets for names and other strings.

Parameters
FI  The function info object to emplace into our functions list.

Definition at line 395 of file GsymCreator.cpp.

Referenced by llvm::gsym::ObjectFileTransformer::convert().

◆ createSegment()

llvm::Expected< std::unique_ptr< GsymCreator > > GsymCreator::createSegment ( uint64_t  SegmentSize,
size_t &  FuncIdx 
) const

Create a segmented GSYM creator starting with function info index FuncIdx.

This function will create a GsymCreator object that will encode into roughly SegmentSize bytes and return it. It is used by the private saveSegments(...) function and also is used by the GSYM unit tests to test segmenting of GSYM files. The returned GsymCreator can be finalized and encoded.

Parameters
[in]  SegmentSize  The size in bytes to roughly segment the GSYM file into.
[in,out]  FuncIdx  The index of the first function info to encode into the returned GsymCreator. This index will be updated so it can be used in subsequent calls to this function to allow more segments to be created.
Returns
An expected unique pointer to a GsymCreator or an error. The returned unique pointer can be NULL if there are no more functions to encode.

Definition at line 579 of file GsymCreator.cpp.

References llvm::alignTo(), and llvm::createStringError().
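The in/out FuncIdx contract described above, where each call consumes some functions and a null result signals that none are left, can be sketched with a standalone stand-in. Everything here is hypothetical: nextSegment and Segment are illustrative names, each function is reduced to a single encoded size, and the real createSegment(...) also has to account for the header, tables, and strings and can return an error, none of which is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for a segment: the function indexes it holds and their total size.
struct Segment {
  std::vector<size_t> FuncIndexes;
  uint64_t Bytes = 0;
};

// Hypothetical helper: build the next segment of roughly SegmentSize bytes
// starting at FuncIdx, advancing FuncIdx past the functions that were used.
// Returns nullptr when there are no more functions to encode.
inline std::unique_ptr<Segment>
nextSegment(const std::vector<uint64_t> &FuncSizes, uint64_t SegmentSize,
            size_t &FuncIdx) {
  if (FuncIdx >= FuncSizes.size())
    return nullptr;
  auto Seg = std::make_unique<Segment>();
  // Always take at least one function so that every call makes progress,
  // then keep adding functions while they still fit.
  do {
    Seg->Bytes += FuncSizes[FuncIdx];
    Seg->FuncIndexes.push_back(FuncIdx);
    ++FuncIdx;
  } while (FuncIdx < FuncSizes.size() &&
           Seg->Bytes + FuncSizes[FuncIdx] <= SegmentSize);
  return Seg;
}
```

A caller would start with an index of zero and call the helper in a loop until it returns null, reusing the same index variable for every call.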

◆ encode()

llvm::Error GsymCreator::encode ( FileWriter &  O) const

◆ finalize()

llvm::Error GsymCreator::finalize ( OutputAggregator &  OS)

Finalize the data in the GSYM creator prior to saving the data out.

Finalize must be called after all FunctionInfo objects have been added and before GsymCreator::save() is called.

Parameters
OS  Output stream to report duplicate function infos, overlapping function infos, and function infos that were merged or removed.
Returns
An error object that indicates success or failure of the finalize.

Definition at line 241 of file GsymCreator.cpp.

References llvm::AddressRange::contains(), llvm::createStringError(), llvm::StringTableBuilder::finalizeInOrder(), llvm::gsym::FunctionInfo::hasRichInfo(), Idx, llvm::AddressRange::intersects(), OS, llvm::gsym::FunctionInfo::Range, Range, llvm::gsym::OutputAggregator::Report(), llvm::AddressRange::size(), llvm::sort(), llvm::AddressRange::start(), llvm::Error::success(), and std::swap().

◆ forEachFunctionInfo() [1/2]

void GsymCreator::forEachFunctionInfo ( std::function< bool(const FunctionInfo &)> const &  Callback) const

Thread safe const iteration over all function infos.

Parameters
Callback  A callback function that will get called with each FunctionInfo. If the callback returns false, stop iterating.

Definition at line 409 of file GsymCreator.cpp.

◆ forEachFunctionInfo() [2/2]

void GsymCreator::forEachFunctionInfo ( std::function< bool(FunctionInfo &)> const &  Callback)

Thread safe iteration over all function infos.

Parameters
Callback  A callback function that will get called with each FunctionInfo. If the callback returns false, stop iterating.

Definition at line 400 of file GsymCreator.cpp.

◆ getNumFunctionInfos()

size_t GsymCreator::getNumFunctionInfos ( ) const

Get the current number of FunctionInfo objects contained in this object.

Definition at line 418 of file GsymCreator.cpp.

Referenced by llvm::gsym::ObjectFileTransformer::convert(), and llvm::gsym::DwarfTransformer::convert().

◆ getString()

StringRef GsymCreator::getString ( uint32_t  Offset)

Retrieve a string from the GSYM string table given its offset.

The offset is assumed to be a valid offset into the string table; otherwise an assert will be triggered.

Parameters
Offset  The offset of the string to retrieve, previously returned by insertString.
Returns
The string at the given offset in the string table.

Definition at line 388 of file GsymCreator.cpp.

References assert(), I, and llvm::Offset.

◆ GetValidTextRanges()

const std::optional< AddressRanges > llvm::gsym::GsymCreator::GetValidTextRanges ( ) const
inline

Get the valid text ranges.

Definition at line 425 of file GsymCreator.h.

◆ insertFile()

uint32_t GsymCreator::insertFile ( StringRef  Path,
sys::path::Style  Style = sys::path::Style::native 
)

Insert a file into this GSYM creator.

Inserts a file by adding a FileEntry into the "Files" member variable if the file has not already been added. The file path is split into directory and filename which are both added to the string table. This allows paths to be stored efficiently by reusing the directories that are common between multiple files.

Parameters
Path  The path to the file to insert.
Style  The path style for the "Path" parameter.
Returns
The unique file index for the inserted file.

Definition at line 29 of file GsymCreator.cpp.

References llvm::sampleprof::Base, llvm::sys::path::filename(), insertString(), and llvm::sys::path::parent_path().

Referenced by convertFunctionLineTable(), llvm::gsym::CUInfo::DWARFToGSYMFileIndex(), and GsymCreator().
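The directory and filename splitting described above can be sketched without LLVM. This is a simplified illustration: SimpleFileTable is a hypothetical class, paths are split on the last '/' only (no path-style handling and no special treatment of root directories), and sequential string IDs stand in for real string table offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical file table: each file is a (directory ID, basename ID) pair,
// so files in the same directory share a single directory string.
class SimpleFileTable {
public:
  // Returns the index of the entry for Path, adding it if it is new.
  uint32_t insertFile(const std::string &Path) {
    const size_t Slash = Path.rfind('/');
    const std::string Dir =
        Slash == std::string::npos ? std::string() : Path.substr(0, Slash);
    const std::string Base =
        Slash == std::string::npos ? Path : Path.substr(Slash + 1);
    const std::pair<uint32_t, uint32_t> Entry(internString(Dir),
                                              internString(Base));
    auto It = Index.find(Entry);
    if (It != Index.end())
      return It->second;
    const uint32_t FileIdx = static_cast<uint32_t>(Files.size());
    Files.push_back(Entry);
    Index.emplace(Entry, FileIdx);
    return FileIdx;
  }

  size_t numFiles() const { return Files.size(); }
  size_t numStrings() const { return Strings.size(); }

private:
  // Returns a stable ID for S, assigning the next free ID on first use.
  uint32_t internString(const std::string &S) {
    auto Res = Strings.emplace(S, static_cast<uint32_t>(Strings.size()));
    return Res.first->second;
  }

  std::map<std::string, uint32_t> Strings;
  std::vector<std::pair<uint32_t, uint32_t>> Files;
  std::map<std::pair<uint32_t, uint32_t>, uint32_t> Index;
};
```

Two files in the same directory add only one directory string between them, which is the saving the description refers to.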

◆ insertString()

uint32_t GsymCreator::insertString ( StringRef  S,
bool  Copy = true 
)

Insert a string into the GSYM string table.

All strings used by GSYM files must be uniqued by adding them to this string pool and using the returned offset for any string values.

Parameters
S  The string to insert into the string table.
Copy  If true, then make a backing copy of the string. If false, the string is owned by another object that will stay around long enough for the GsymCreator to save the GSYM file.
Returns
The unique 32 bit offset into the string table.

Definition at line 362 of file GsymCreator.cpp.

References llvm::StringTableBuilder::add(), llvm::StringTableBuilder::contains(), llvm::StringRef::empty(), llvm::CachedHashStringRef::hash(), and llvm::StringSet< AllocatorTy >::insert().

Referenced by llvm::gsym::ObjectFileTransformer::convert(), getQualifiedNameIndex(), and insertFile().

◆ isQuiet()

bool llvm::gsym::GsymCreator::isQuiet ( ) const
inline

Whether the transformation should be quiet, i.e. not output warnings.

Definition at line 466 of file GsymCreator.h.

References llvm::Quiet.

◆ IsValidTextAddress()

bool GsymCreator::IsValidTextAddress ( uint64_t  Addr) const

Check if an address is a valid code address.

Any functions whose addresses do not fall within the valid text ranges will not be converted into the final GSYM. This allows the object file to figure out the valid file address ranges of all the code sections and ensure we don't add invalid functions to the final output. Many linkers have issues when dead stripping functions from DWARF debug info: they set the DW_AT_low_pc to zero, but newer DWARF encodes the DW_AT_high_pc as an offset from the DW_AT_low_pc, and these size attributes have no relocations that can be applied. This results in DWARF where many functions have a DW_AT_low_pc of zero and a valid offset size for DW_AT_high_pc. If we extract all valid ranges from an object file that are marked with executable permissions, we can properly ensure that these functions are removed.

Parameters
Addr  An address to check.
Returns
True if the address is in the valid text ranges or if no valid text ranges have been set, false otherwise.

Definition at line 423 of file GsymCreator.cpp.

References Addr.

Referenced by llvm::gsym::ObjectFileTransformer::convert().
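The return value rule above (true when no ranges have been set, otherwise true only for addresses inside a range) can be sketched as a standalone function. The names isValidTextAddress and TextRange are hypothetical, and ranges are assumed to be half-open [start, end) intervals.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Half-open address range [first, second). An assumption for this sketch.
using TextRange = std::pair<uint64_t, uint64_t>;

// Hypothetical helper: every address is accepted when no valid text ranges
// have been set; otherwise the address must fall inside one of the ranges.
inline bool isValidTextAddress(
    const std::optional<std::vector<TextRange>> &ValidTextRanges,
    uint64_t Addr) {
  if (!ValidTextRanges)
    return true;
  for (const TextRange &R : *ValidTextRanges)
    if (Addr >= R.first && Addr < R.second)
      return true;
  return false;
}
```

With a single range of 0x1000 to 0x2000, a dead-stripped function whose low PC was set to zero is rejected, while the same address is accepted if no ranges were ever provided.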

◆ loadCallSitesFromYAML()

llvm::Error GsymCreator::loadCallSitesFromYAML ( StringRef  YAMLFile)

Load call site information from a YAML file.

This function reads call site information from a specified YAML file and adds it to the GSYM data.

Parameters
YAMLFile  The path to the YAML file containing call site information.

Definition at line 192 of file GsymCreator.cpp.

References llvm::gsym::CallSiteInfoLoader::loadYAML().

◆ prepareMergedFunctions()

void GsymCreator::prepareMergedFunctions ( OutputAggregator &  Out)

Organize merged FunctionInfo's.

This method processes the list of function infos (Funcs) to identify and group functions with overlapping address ranges.

Parameters
Out  Output stream to report information about how merged FunctionInfo's were handled.

Definition at line 198 of file GsymCreator.cpp.

References Idx, llvm::gsym::FunctionInfo::MergedFunctions, llvm::gsym::FunctionInfo::Range, llvm::stable_sort(), and std::swap().

◆ save()

llvm::Error GsymCreator::save ( StringRef  Path,
llvm::endianness  ByteOrder,
std::optional< uint64_t >  SegmentSize = std::nullopt
) const

Save a GSYM file to a stand alone file.

Parameters
Path  The file path to save the GSYM file to.
ByteOrder  The endianness to use when saving the file.
SegmentSize  The size in bytes to segment the GSYM file into. If this option is set, this function will create N segments that are each around SegmentSize bytes in size. This allows a very large GSYM file to be broken up into shards. Each GSYM file will have its own file table and string table that contain only the files and strings needed for the shard. If this argument has no value, a single GSYM file that contains all function information will be created.
Returns
An error object that indicates success or failure of the save.

Definition at line 68 of file GsymCreator.cpp.

References encode(), and llvm::errorCodeToError().

◆ setBaseAddress()

void llvm::gsym::GsymCreator::setBaseAddress ( uint64_t  Addr)
inline

Set the base address to use for the GSYM file.

Object files typically get loaded from a base address when the OS loads them into memory. Using GSYM files for symbolication becomes easier if the base address in the GSYM header is the same address, as it allows addresses to be easily slid and allows symbolication without needing to find the original base address in the original object file.

Parameters
Addr  The address to use as the base address of the GSYM file when it is saved to disk.

Definition at line 461 of file GsymCreator.h.

References Addr.

◆ setUUID()

void llvm::gsym::GsymCreator::setUUID ( llvm::ArrayRef< uint8_t >  UUIDBytes)
inline

Set the UUID value.

Parameters
UUIDBytes  The new UUID bytes.

Definition at line 397 of file GsymCreator.h.

References llvm::ArrayRef< T >::begin(), and llvm::ArrayRef< T >::end().

Referenced by llvm::gsym::ObjectFileTransformer::convert().

◆ SetValidTextRanges()

void llvm::gsym::GsymCreator::SetValidTextRanges ( AddressRanges &  TextRanges)
inline

Set valid .text address ranges that all functions must be contained in.

Definition at line 420 of file GsymCreator.h.


The documentation for this class was generated from the following files: