//===- GsymCreator.h --------------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_DEBUGINFO_GSYM_GSYMCREATOR_H
#define LLVM_DEBUGINFO_GSYM_GSYMCREATOR_H

#include <functional>
#include <memory>
#include <mutex>
#include <thread>

#include "llvm/ADT/AddressRanges.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringSet.h"
#include "llvm/DebugInfo/GSYM/FileEntry.h"
#include "llvm/DebugInfo/GSYM/FunctionInfo.h"
#include "llvm/MC/StringTableBuilder.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/Path.h"

namespace llvm {

namespace gsym {
class FileWriter;
class OutputAggregator;

/// GsymCreator is used to emit GSYM data to a stand alone file or section
/// within a file.
///
/// The GsymCreator is designed to be used in 3 stages:
/// - Create FunctionInfo objects and add them
/// - Finalize the GsymCreator object
/// - Save to file or section
///
/// The first stage involves creating FunctionInfo objects from another source
/// of information like compiler debug info metadata, DWARF or Breakpad files.
/// Any strings in the FunctionInfo or contained information, like InlineInfo
/// or LineTable objects, should get the string table offsets by calling
/// GsymCreator::insertString(...). Any file indexes that are needed should be
/// obtained by calling GsymCreator::insertFile(...). All of the function calls
/// in GsymCreator are thread safe.
/// This allows multiple threads to create and
/// add FunctionInfo objects while parsing debug information.
///
/// Once all of the FunctionInfo objects have been added, the
/// GsymCreator::finalize(...) must be called prior to saving. This function
/// will sort the FunctionInfo objects, finalize the string table, and do any
/// other passes on the information needed to prepare the information to be
/// saved.
///
/// Once the object has been finalized, it can be saved to a file or section.
///
/// ENCODING
///
/// GSYM files are designed to be memory mapped into a process as shared, read
/// only data, and used as is.
///
/// The GSYM file format when in a stand alone file consists of:
///   - Header
///   - Address Table
///   - Function Info Offsets
///   - File Table
///   - String Table
///   - Function Info Data
///
/// HEADER
///
/// The header is fully described in "llvm/DebugInfo/GSYM/Header.h".
///
/// ADDRESS TABLE
///
/// The address table immediately follows the header in the file and consists
/// of Header.NumAddresses address offsets. These offsets are sorted and can be
/// binary searched for efficient lookups. Addresses in the address table are
/// stored as offsets from a 64 bit base address found in Header.BaseAddress.
/// This allows the address table to contain 8, 16, or 32 bit offsets, so the
/// address table does not require full 64 bit addresses for each address.
/// The resulting GSYM size is smaller and causes fewer pages to be touched
/// during address lookups when the address table is smaller. The size of the
/// address offsets in the address table is specified in the header in
/// Header.AddrOffSize. The first offset in the address table is aligned to
/// Header.AddrOffSize alignment to ensure efficient access when loaded into
/// memory.
///
/// FUNCTION INFO OFFSETS TABLE
///
/// The function info offsets table immediately follows the address table and
/// consists of Header.NumAddresses 32 bit file offsets: one for each address
/// in the address table. This data is aligned to a 4 byte boundary. The
/// offsets in this table are the relative offsets from the start offset of the
/// GSYM header and point to the function info data for each address in the
/// address table. Keeping this data separate from the address table helps to
/// reduce the number of pages that are touched when address lookups occur on a
/// GSYM file.
///
/// FILE TABLE
///
/// The file table immediately follows the function info offsets table. The
/// encoding of the FileTable is:
///
/// struct FileTable {
///   uint32_t Count;
///   FileEntry Files[];
/// };
///
/// The file table starts with a 32 bit count of the number of files that are
/// used in all of the function info, followed by that number of FileEntry
/// structures. The file table is aligned to a 4 byte boundary. Each file in
/// the file table is represented with a FileEntry structure.
/// See "llvm/DebugInfo/GSYM/FileEntry.h" for details.
///
/// STRING TABLE
///
/// The string table follows the file table in stand alone GSYM files and
/// contains all strings for everything contained in the GSYM file. Any string
/// data should be added to the string table and any references to strings
/// inside GSYM information must be stored as 32 bit string table offsets into
/// this string table. The string table always starts with an empty string at
/// offset zero and is followed by any strings needed by the GSYM information.
/// The start of the string table is not aligned to any boundary.
///
/// FUNCTION INFO DATA
///
/// The function info data is the payload that contains information about the
/// address that is being looked up. It contains all of the encoded
/// FunctionInfo objects. Each encoded FunctionInfo's data is pointed to by an
/// entry in the Function Info Offsets Table. For details on the exact encoding
/// of FunctionInfo objects, see "llvm/DebugInfo/GSYM/FunctionInfo.h".
class GsymCreator {
  // Private member variables require Mutex protections
  mutable std::mutex Mutex;
  std::vector<FunctionInfo> Funcs;
  StringTableBuilder StrTab;
  StringSet<> StringStorage;
  DenseMap<llvm::gsym::FileEntry, uint32_t> FileEntryToIndex;
  // Needed for mapping string offsets back to the string stored in \a StrTab.
  DenseMap<uint64_t, CachedHashStringRef> StringOffsetMap;
  std::vector<llvm::gsym::FileEntry> Files;
  std::vector<uint8_t> UUID;
  std::optional<AddressRanges> ValidTextRanges;
  std::optional<uint64_t> BaseAddress;
  bool IsSegment = false;
  bool Finalized = false;
  bool Quiet;

  /// Get the first function start address.
  ///
  /// \returns The start address of the first FunctionInfo or std::nullopt if
  /// there are no function infos.
  std::optional<uint64_t> getFirstFunctionAddress() const;

  /// Get the last function address.
  ///
  /// \returns The start address of the last FunctionInfo or std::nullopt if
  /// there are no function infos.
  std::optional<uint64_t> getLastFunctionAddress() const;

  /// Get the base address to use for this GSYM file.
  ///
  /// \returns The base address to put into the header and to use when creating
  ///          the address offset table or std::nullopt if there are no valid
  ///          function infos or if the base address wasn't specified.
  std::optional<uint64_t> getBaseAddress() const;

  /// Get the size of an address offset in the address offset table.
  ///
  /// GSYM files store offsets from the base address in the address offset table
  /// and we store the size of the address offsets in the GSYM header. This
  /// function will calculate the size in bytes of these address offsets based
  /// on the current contents of the GSYM file.
  ///
  /// \returns The size in bytes of the address offsets.
  uint8_t getAddressOffsetSize() const;

  /// Get the maximum address offset for the current address offset size.
  ///
  /// This is used when creating the address offset table to ensure we have
  /// values that are in range so we don't end up truncating address offsets
  /// when creating GSYM files as the code evolves.
  ///
  /// \returns The maximum address offset value that will be encoded into a GSYM
  /// file.
  uint64_t getMaxAddressOffset() const;

  /// Calculate the byte size of the GSYM header and tables sizes.
  ///
  /// This function will calculate the exact size in bytes of the encoded GSYM
  /// for the following items:
  /// - The GSYM header
  /// - The Address offset table
  /// - The Address info offset table
  /// - The file table
  /// - The string table
  ///
  /// This is used to help split GSYM files into segments.
  ///
  /// \returns Size in bytes of the GSYM header and tables.
  uint64_t calculateHeaderAndTableSize() const;

  /// Copy a FunctionInfo from the \a SrcGC GSYM creator into this creator.
  ///
  /// Copy the function info and only the needed files and strings and add a
  /// converted FunctionInfo into this object. This is used to segment GSYM
  /// files into separate files while only transferring the files and strings
  /// that are needed from \a SrcGC.
  ///
  /// \param SrcGC The source gsym creator to copy from.
  /// \param FuncInfoIdx The function info index within \a SrcGC to copy.
  /// \returns The number of bytes it will take to encode the function info in
  /// this GsymCreator. This helps calculate the size of the current GSYM
  /// segment file.
  uint64_t copyFunctionInfo(const GsymCreator &SrcGC, size_t FuncInfoIdx);

  /// Copy a string from \a SrcGC into this object.
  ///
  /// Copy a string from \a SrcGC by string table offset into this GSYM creator.
  /// If a string has already been copied, the uniqued string table offset will
  /// be returned, otherwise the string will be copied and a unique offset will
  /// be returned.
  ///
  /// \param SrcGC The source gsym creator to copy from.
  /// \param StrOff The string table offset from \a SrcGC to copy.
  /// \returns The new string table offset of the string within this object.
  uint32_t copyString(const GsymCreator &SrcGC, uint32_t StrOff);

  /// Copy a file from \a SrcGC into this object.
  ///
  /// Copy a file from \a SrcGC by file index into this GSYM creator. Files
  /// consist of two string table entries, one for the directory and one for the
  /// filename. This function will copy any needed strings and ensure the file
  /// is uniqued within this object. If a file already exists in this GSYM
  /// creator the uniqued index will be returned, else the strings will be
  /// copied and the new file index will be returned.
  ///
  /// \param SrcGC The source gsym creator to copy from.
  /// \param FileIdx The 1 based file table index within \a SrcGC to copy. A
  /// file index of zero will always return zero as the zero is a reserved file
  /// index that means no file.
  /// \returns The new file index of the file within this object.
  uint32_t copyFile(const GsymCreator &SrcGC, uint32_t FileIdx);

  /// Inserts a FileEntry into the file table.
  ///
  /// This is used to insert a file entry in a thread safe way into this object.
  ///
  /// \param FE A file entry object that contains valid string table offsets
  /// from this object already.
  uint32_t insertFileEntry(FileEntry FE);

  /// Fixup any string and file references by updating any file indexes and
  /// strings offsets in the InlineInfo parameter.
  ///
  /// When copying InlineInfo entries, we can simply make a copy of the object
  /// and then fixup the files and strings for efficiency.
  ///
  /// \param SrcGC The source gsym creator to copy from.
  /// \param II The inline info that contains file indexes and string offsets
  /// that come from \a SrcGC. The entries will be updated by copying any files
  /// and strings over into this object.
  void fixupInlineInfo(const GsymCreator &SrcGC, InlineInfo &II);

  /// Save this GSYM file into segments that are roughly \a SegmentSize in size.
  ///
  /// When segmented GSYM files are saved to disk, they will use \a Path as a
  /// prefix and then have the first function info address appended to the path
  /// when each segment is saved. Each segmented GSYM file has only the
  /// strings and files that are needed to save the function infos that are in
  /// each segment. These smaller files are easy to compress and download
  /// separately and allow for efficient lookups with very large GSYM files and
  /// segmenting them allows servers to download only the segments that are
  /// needed.
  ///
  /// \param Path The path prefix to use when saving the GSYM files.
  /// \param ByteOrder The endianness to use when saving the file.
  /// \param SegmentSize The size in bytes to segment the GSYM file into.
  llvm::Error saveSegments(StringRef Path, llvm::endianness ByteOrder,
                           uint64_t SegmentSize) const;

  /// Let this creator know that this is a segment of another GsymCreator.
  ///
  /// When we have a segment, we know that function infos will be added in
  /// ascending address range order without having to be finalized. We also
  /// don't need to sort and unique entries during the finalize function call.
  void setIsSegment() {
    IsSegment = true;
  }

public:
  GsymCreator(bool Quiet = false);

  /// Save a GSYM file to a stand alone file.
  ///
  /// \param Path The file path to save the GSYM file to.
  /// \param ByteOrder The endianness to use when saving the file.
  /// \param SegmentSize The size in bytes to segment the GSYM file into. If
  ///                    this option is set this function will create N segments
  ///                    that are all around \a SegmentSize bytes in size. This
  ///                    allows a very large GSYM file to be broken up into
  ///                    shards. Each GSYM file will have its own file table,
  ///                    and string table that only have the files and strings
  ///                    needed for the shard. If this argument has no value,
  ///                    a single GSYM file that contains all function
  ///                    information will be created.
  /// \returns An error object that indicates success or failure of the save.
  llvm::Error save(StringRef Path, llvm::endianness ByteOrder,
                   std::optional<uint64_t> SegmentSize = std::nullopt) const;

  /// Encode a GSYM into the file writer stream at the current position.
  ///
  /// \param O The stream to save the binary data to
  /// \returns An error object that indicates success or failure of the save.
  llvm::Error encode(FileWriter &O) const;

  /// Insert a string into the GSYM string table.
  ///
  /// All strings used by GSYM files must be uniqued by adding them to this
  /// string pool and using the returned offset for any string values.
  ///
  /// \param S The string to insert into the string table.
  /// \param Copy If true, then make a backing copy of the string.
  ///             If false,
  ///             the string is owned by another object that will stay around
  ///             long enough for the GsymCreator to save the GSYM file.
  /// \returns The unique 32 bit offset into the string table.
  uint32_t insertString(StringRef S, bool Copy = true);

  /// Retrieve a string from the GSYM string table given its offset.
  ///
  /// The offset is assumed to be a valid offset into the string table,
  /// otherwise an assert will be triggered.
  ///
  /// \param Offset The offset of the string to retrieve, previously returned by
  /// insertString.
  /// \returns The string at the given offset in the string table.
  StringRef getString(uint32_t Offset);

  /// Insert a file into this GSYM creator.
  ///
  /// Inserts a file by adding a FileEntry into the "Files" member variable if
  /// the file has not already been added. The file path is split into
  /// directory and filename which are both added to the string table. This
  /// allows paths to be stored efficiently by reusing the directories that are
  /// common between multiple files.
  ///
  /// \param Path The path to the file to insert.
  /// \param Style The path style for the "Path" parameter.
  /// \returns The unique file index for the inserted file.
  uint32_t insertFile(StringRef Path,
                      sys::path::Style Style = sys::path::Style::native);

  /// Add a function info to this GSYM creator.
  ///
  /// All information in the FunctionInfo object must use the
  /// GsymCreator::insertString(...) function when creating string table
  /// offsets for names and other strings.
  ///
  /// \param FI The function info object to emplace into our functions list.
  void addFunctionInfo(FunctionInfo &&FI);

  /// Load call site information from a YAML file.
  ///
  /// This function reads call site information from a specified YAML file and
  /// adds it to the GSYM data.
  ///
  /// \param YAMLFile The path to the YAML file containing call site
  /// information.
  llvm::Error loadCallSitesFromYAML(StringRef YAMLFile);

  /// Organize merged FunctionInfo's
  ///
  /// This method processes the list of function infos (Funcs) to identify and
  /// group functions with overlapping address ranges.
  ///
  /// \param Out Output stream to report information about how merged
  /// FunctionInfo's were handled.
  void prepareMergedFunctions(OutputAggregator &Out);

  /// Finalize the data in the GSYM creator prior to saving the data out.
  ///
  /// Finalize must be called after all FunctionInfo objects have been added
  /// and before GsymCreator::save() is called.
  ///
  /// \param OS Output stream to report duplicate function infos, overlapping
  ///           function infos, and function infos that were merged or removed.
  /// \returns An error object that indicates success or failure of the
  ///          finalize.
  llvm::Error finalize(OutputAggregator &OS);

  /// Set the UUID value.
  ///
  /// \param UUIDBytes The new UUID bytes.
  void setUUID(llvm::ArrayRef<uint8_t> UUIDBytes) {
    UUID.assign(UUIDBytes.begin(), UUIDBytes.end());
  }

  /// Thread safe iteration over all function infos.
  ///
  /// \param Callback A callback function that will get called with each
  ///                 FunctionInfo. If the callback returns false, stop
  ///                 iterating.
  void forEachFunctionInfo(
      std::function<bool(FunctionInfo &)> const &Callback);

  /// Thread safe const iteration over all function infos.
  ///
  /// \param Callback A callback function that will get called with each
  ///                 FunctionInfo. If the callback returns false, stop
  ///                 iterating.
  void forEachFunctionInfo(
      std::function<bool(const FunctionInfo &)> const &Callback) const;

  /// Get the current number of FunctionInfo objects contained in this
  /// object.
  size_t getNumFunctionInfos() const;

  /// Set valid .text address ranges that all functions must be contained in.
  void SetValidTextRanges(AddressRanges &TextRanges) {
    ValidTextRanges = TextRanges;
  }

  /// Get the valid text ranges.
  const std::optional<AddressRanges> GetValidTextRanges() const {
    return ValidTextRanges;
  }

  /// Check if an address is a valid code address.
  ///
  /// Any functions whose addresses do not exist within these function bounds
  /// will not be converted into the final GSYM. This allows the object file
  /// to figure out the valid file address ranges of all the code sections
  /// and ensure we don't add invalid functions to the final output. Many
  /// linkers have issues when dead stripping functions from DWARF debug info
  /// where they set the DW_AT_low_pc to zero, but newer DWARF has the
  /// DW_AT_high_pc as an offset from the DW_AT_low_pc and these size
  /// attributes have no relocations that can be applied. This results in DWARF
  /// where many functions have a DW_AT_low_pc of zero and a valid offset size
  /// for DW_AT_high_pc. If we extract all valid ranges from an object file
  /// that are marked with executable permissions, we can properly ensure that
  /// these functions are removed.
  ///
  /// \param Addr An address to check.
  ///
  /// \returns True if the address is in the valid text ranges or if no valid
  ///          text ranges have been set, false otherwise.
  bool IsValidTextAddress(uint64_t Addr) const;

  /// Set the base address to use for the GSYM file.
  ///
  /// Setting the base address to use for the GSYM file. Object files typically
  /// get loaded from a base address when the OS loads them into memory.
  /// Using
  /// GSYM files for symbolication becomes easier if the base address in the
  /// GSYM header is the same address as it allows addresses to be easily slid
  /// and allows symbolication without needing to find the original base
  /// address in the original object file.
  ///
  /// \param Addr The address to use as the base address of the GSYM file
  ///             when it is saved to disk.
  void setBaseAddress(uint64_t Addr) {
    BaseAddress = Addr;
  }

  /// Whether the transformation should be quiet, i.e. not output warnings.
  bool isQuiet() const { return Quiet; }

  /// Create a segmented GSYM creator starting with function info index
  /// \a FuncIdx.
  ///
  /// This function will create a GsymCreator object that will encode into
  /// roughly \a SegmentSize bytes and return it. It is used by the private
  /// saveSegments(...) function and also is used by the GSYM unit tests to test
  /// segmenting of GSYM files. The returned GsymCreator can be finalized and
  /// encoded.
  ///
  /// \param [in] SegmentSize The size in bytes to roughly segment the GSYM file
  /// into.
  /// \param [in,out] FuncIdx The index of the first function info to encode
  /// into the returned GsymCreator. This index will be updated so it can be
  /// used in subsequent calls to this function to allow more segments to be
  /// created.
  /// \returns An expected unique pointer to a GsymCreator or an error. The
  /// returned unique pointer can be NULL if there are no more functions to
  /// encode.
  llvm::Expected<std::unique_ptr<GsymCreator>>
  createSegment(uint64_t SegmentSize, size_t &FuncIdx) const;
};

} // namespace gsym
} // namespace llvm

#endif // LLVM_DEBUGINFO_GSYM_GSYMCREATOR_H
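
The ADDRESS TABLE comment in the class documentation describes sorted address offsets stored relative to Header.BaseAddress, with an offset width just large enough for the largest offset. The block below is a minimal, self-contained sketch of that idea and is not LLVM's implementation. The names `pickAddrOffSize` and `lookupIndex` are hypothetical, the offsets are held in a plain `std::vector<uint64_t>` rather than in an encoded table, and the lookup assumes an address maps to the last entry whose start is at or below it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: smallest offset width in bytes (1, 2, 4 or 8) that can
// hold MaxOffset, the largest (address - base address) value in the table.
inline uint8_t pickAddrOffSize(uint64_t MaxOffset) {
  if (MaxOffset <= UINT8_MAX)
    return 1;
  if (MaxOffset <= UINT16_MAX)
    return 2;
  if (MaxOffset <= UINT32_MAX)
    return 4;
  return 8;
}

// Hypothetical helper: binary search a sorted table of offsets for the entry
// covering Addr. Returns the index of the last offset <= (Addr - Base), or
// std::nullopt if Addr lies below the base address or the table is empty.
inline std::optional<size_t> lookupIndex(const std::vector<uint64_t> &Offsets,
                                         uint64_t Base, uint64_t Addr) {
  if (Addr < Base)
    return std::nullopt;
  const uint64_t Off = Addr - Base;
  // upper_bound finds the first offset strictly greater than Off, so the
  // entry just before it is the candidate match.
  auto It = std::upper_bound(Offsets.begin(), Offsets.end(), Off);
  if (It == Offsets.begin())
    return std::nullopt;
  return static_cast<size_t>(It - Offsets.begin()) - 1;
}
```

In a real GSYM file the index found this way would select the matching entry in the function info offsets table, and the decoded FunctionInfo would still need to be checked to confirm that its range actually contains the address.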