EIC code displayed by LXR

File indexing completed on 2026-05-10 08:44:25

0001 //===- SampleProfReader.h - Read LLVM sample profile data -------*- C++ -*-===//
0002 //
0003 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
0004 // See https://llvm.org/LICENSE.txt for license information.
0005 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
0006 //
0007 //===----------------------------------------------------------------------===//
0008 //
0009 // This file contains definitions needed for reading sample profiles.
0010 //
0011 // NOTE: If you are making changes to this file format, please remember
0012 //       to document them in the Clang documentation at
0013 //       tools/clang/docs/UsersManual.rst.
0014 //
0015 // Text format
0016 // -----------
0017 //
0018 // Sample profiles are written as ASCII text. The file is divided into
0019 // sections, which correspond to each of the functions executed at runtime.
0020 // Each section has the following format:
0021 //
0022 //     function1:total_samples:total_head_samples
0023 //      offset1[.discriminator]: number_of_samples [fn1:num fn2:num ... ]
0024 //      offset2[.discriminator]: number_of_samples [fn3:num fn4:num ... ]
0025 //      ...
0026 //      offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
0027 //      offsetA[.discriminator]: fnA:num_of_total_samples
0028 //       offsetA1[.discriminator]: number_of_samples [fn7:num fn8:num ... ]
0029 //       ...
0030 //      !CFGChecksum: num
0031 //      !Attribute: flags
0032 //
0033 // This is a nested tree in which the indentation represents the nesting level
0034 // of the inline stack. There are no blank lines in the file, and the spacing
0035 // within a single line is fixed. Additional spaces will result in an error
0036 // while reading the file.
0037 //
0038 // Any line starting with the '#' character is completely ignored.
0039 //
0040 // Inlined calls are represented with indentation. The inline stack is a
0041 // stack of source locations in which the top of the stack represents the
0042 // leaf function, and the bottom of the stack represents the actual
0043 // symbol to which the instruction belongs.
0044 //
0045 // Function names must be mangled in order for the profile loader to
0046 // match them in the current translation unit. The two numbers in the
0047 // function header specify how many total samples were accumulated in the
0048 // function (first number), and the total number of samples accumulated
0049 // in the prologue of the function (second number). This head sample
0050 // count provides an indicator of how frequently the function is invoked.
0051 //
0052 // There are three types of lines in the function body.
0053 //
0054 // * Sampled line represents the profile information of a source location.
0055 // * Callsite line represents the profile information of a callsite.
0056 // * Metadata line represents extra metadata of the function.
0057 //
0058 // Each sampled line may contain several items. Some are optional (marked
0059 // below):
0060 //
0061 // a. Source line offset. This number represents the line number
0062 //    in the function where the sample was collected. The line number is
0063 //    always relative to the line where the symbol of the function is
0064 //    defined. So, if the function has its header at line 280, the offset
0065 //    13 is at line 293 in the file.
0066 //
0067 //    Note that this offset should never be a negative number. This could
0068 //    happen in cases like macros. The debug machinery will register the
0069 //    line number at the point of macro expansion. So, if the macro was
0070 //    expanded in a line before the start of the function, the profile
0071 //    converter should emit a 0 as the offset (this means that the optimizers
0072 //    will not be able to associate a meaningful weight to the instructions
0073 //    in the macro).
0074 //
0075 // b. [OPTIONAL] Discriminator. This is used if the sampled program
0076 //    was compiled with DWARF discriminator support
0077 //    (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators).
0078 //    DWARF discriminators are unsigned integer values that allow the
0079 //    compiler to distinguish between multiple execution paths on the
0080 //    same source line location.
0081 //
0082 //    For example, consider the line of code ``if (cond) foo(); else bar();``.
0083 //    If the predicate ``cond`` is true 80% of the time, then the edge
0084 //    into function ``foo`` should be considered to be taken most of the
0085 //    time. But both calls to ``foo`` and ``bar`` are at the same source
0086 //    line, so a sample count at that line is not sufficient. The
0087 //    compiler needs to know which part of that line is taken more
0088 //    frequently.
0089 //
0090 //    This is what discriminators provide. In this case, the calls to
0091 //    ``foo`` and ``bar`` will be at the same line, but will have
0092 //    different discriminator values. This allows the compiler to correctly
0093 //    set edge weights into ``foo`` and ``bar``.
0094 //
0095 // c. Number of samples. This is an integer quantity representing the
0096 //    number of samples collected by the profiler at this source
0097 //    location.
0098 //
0099 // d. [OPTIONAL] Potential call targets and samples. If present, this
0100 //    line contains a call instruction. This models both direct and
0101 //    indirect calls. Each target is listed with its number of samples. For example,
0102 //
0103 //      130: 7  foo:3  bar:2  baz:7
0104 //
0105 //    The above means that at relative line offset 130 there is a call
0106 //    instruction that calls one of ``foo()``, ``bar()`` and ``baz()``,
0107 //    with ``baz()`` being the relatively more frequently called target.
0108 //
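// A minimal sketch of how a sampled line such as "130: 7 foo:3 bar:2 baz:7"
// could be split into the items (a) through (d) described above. This is not
// the reader's actual parser: SampledLine, parseUInt and parseSampledLine are
// hypothetical names introduced for illustration, leading indentation is
// assumed to be stripped already, and the sketch tolerates repeated spaces
// that the real reader may reject.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder for one sampled line; not an LLVM type.
struct SampledLine {
  uint32_t Offset = 0;
  uint32_t Discriminator = 0; // stays 0 when the ".discriminator" part is absent
  uint64_t NumSamples = 0;
  std::vector<std::pair<std::string, uint64_t>> CallTargets;
};

// Accept only non-empty strings of decimal digits. Overflow is not checked
// in this sketch.
static bool parseUInt(const std::string &S, uint64_t &Out) {
  if (S.empty())
    return false;
  Out = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return false;
    Out = Out * 10 + static_cast<uint64_t>(C - '0');
  }
  return true;
}

// Parse "offset[.discriminator]: number_of_samples [fn:num ...]".
// Returns std::nullopt for anything else, including callsite lines, whose
// first item after the colon is "fn:num" rather than a plain number.
std::optional<SampledLine> parseSampledLine(const std::string &Line) {
  size_t Colon = Line.find(':');
  if (Colon == std::string::npos)
    return std::nullopt;

  SampledLine R;
  uint64_t V = 0;
  std::string Loc = Line.substr(0, Colon);
  size_t Dot = Loc.find('.');
  if (!parseUInt(Loc.substr(0, Dot), V))
    return std::nullopt;
  R.Offset = static_cast<uint32_t>(V);
  if (Dot != std::string::npos) {
    if (!parseUInt(Loc.substr(Dot + 1), V))
      return std::nullopt;
    R.Discriminator = static_cast<uint32_t>(V);
  }

  std::istringstream Rest(Line.substr(Colon + 1));
  std::string Tok;
  if (!(Rest >> Tok) || !parseUInt(Tok, R.NumSamples))
    return std::nullopt;

  // Each remaining token is one call target of the form "name:count".
  while (Rest >> Tok) {
    size_t Sep = Tok.rfind(':');
    if (Sep == std::string::npos || Sep == 0)
      return std::nullopt;
    if (!parseUInt(Tok.substr(Sep + 1), V))
      return std::nullopt;
    R.CallTargets.emplace_back(Tok.substr(0, Sep), V);
  }
  return R;
}
```

// With this sketch, parseSampledLine("4.2: 15") yields offset 4,
// discriminator 2 and 15 samples with no call targets, while a callsite line
// such as "5: foo:100" is rejected and would need its own handling.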
0109 // Each callsite line may contain several items. Some are optional.
0110 //
0111 // a. Source line offset. This number represents the line number of the
0112 //    callsite that is inlined in the profiled binary.
0113 //
0114 // b. [OPTIONAL] Discriminator. Same as the discriminator for sampled line.
0115 //
0116 // c. Number of samples. This is an integer quantity representing the
0117 //    total number of samples collected for the inlined instance at this
0118 //    callsite.
0119 //
0120 // Metadata line can occur in lines with one indent only, containing extra
0121 // information for the top-level function. Furthermore, metadata can only
0122 // occur after all the body samples and callsite samples.
0123 // Each metadata line may contain a particular type of metadata, marked by
0124 // the starting characters annotated with !. We process each metadata line
0125 // independently, hence each metadata line has to form an independent piece
0126 // of information that does not require cross-line reference.
0127 // We support the following types of metadata:
0128 //
0129 // a. CFG Checksum (a.k.a. function hash):
0130 //   !CFGChecksum: 12345
0131 // b. Function attributes (see ContextAttributeMask):
0132 //   !Attribute: 1
0133 //
0134 //
0135 // Binary format
0136 // -------------
0137 //
0138 // This is a more compact encoding. Numbers are encoded as ULEB128 values
0139 // and all strings are encoded in a name table. The file is organized in
0140 // the following sections:
0141 //
0142 // MAGIC (uint64_t)
0143 //    File identifier computed by function SPMagic() (0x5350524f463432ff)
0144 //
0145 // VERSION (uint32_t)
0146 //    File format version number computed by SPVersion()
0147 //
0148 // SUMMARY
0149 //    TOTAL_COUNT (uint64_t)
0150 //        Total number of samples in the profile.
0151 //    MAX_COUNT (uint64_t)
0152 //        Maximum value of samples on a line.
0153 //    MAX_FUNCTION_COUNT (uint64_t)
0154 //        Maximum number of samples at function entry (head samples).
0155 //    NUM_COUNTS (uint64_t)
0156 //        Number of lines with samples.
0157 //    NUM_FUNCTIONS (uint64_t)
0158 //        Number of functions with samples.
0159 //    NUM_DETAILED_SUMMARY_ENTRIES (size_t)
0160 //        Number of entries in detailed summary
0161 //    DETAILED_SUMMARY
0162 //        A list of detailed summary entries. Each entry consists of
0163 //        CUTOFF (uint32_t)
0164 //            Required percentile of total sample count expressed as a fraction
0165 //            multiplied by 1000000.
0166 //        MIN_COUNT (uint64_t)
0167 //            The minimum number of samples required to reach the target
0168 //            CUTOFF.
0169 //        NUM_COUNTS (uint64_t)
0170 //            Number of samples to get to the desired percentile.
0171 //
0172 // NAME TABLE
0173 //    SIZE (uint64_t)
0174 //        Number of entries in the name table.
0175 //    NAMES
0176 //        A NUL-separated list of SIZE strings.
0177 //
0178 // FUNCTION BODY (one for each uninlined function body present in the profile)
0179 //    HEAD_SAMPLES (uint64_t) [only for top-level functions]
0180 //        Total number of samples collected at the head (prologue) of the
0181 //        function.
0182 //        NOTE: This field should only be present for top-level functions
0183 //              (i.e., not inlined into any caller). Inlined function calls
0184 //              have no prologue, so they don't need this.
0185 //    NAME_IDX (uint64_t)
0186 //        Index into the name table indicating the function name.
0187 //    SAMPLES (uint64_t)
0188 //        Total number of samples collected in this function.
0189 //    NRECS (uint32_t)
0190 //        Total number of sampling records in this function's profile.
0191 //    BODY RECORDS
0192 //        A list of NRECS entries. Each entry contains:
0193 //          OFFSET (uint32_t)
0194 //            Line offset from the start of the function.
0195 //          DISCRIMINATOR (uint32_t)
0196 //            Discriminator value (see description of discriminators
0197 //            in the text format documentation above).
0198 //          SAMPLES (uint64_t)
0199 //            Number of samples collected at this location.
0200 //          NUM_CALLS (uint32_t)
0201 //            Number of non-inlined function calls made at this location. In the
0202 //            case of direct calls, this number will always be 1. For indirect
0203 //            calls (virtual functions and function pointers) this will
0204 //            represent all the actual functions called at runtime.
0205 //          CALL_TARGETS
0206 //            A list of NUM_CALLS entries for each called function:
0207 //               NAME_IDX (uint64_t)
0208 //                  Index into the name table with the callee name.
0209 //               SAMPLES (uint64_t)
0210 //                  Number of samples collected at the call site.
0211 //    NUM_INLINED_FUNCTIONS (uint32_t)
0212 //      Number of callees inlined into this function.
0213 //    INLINED FUNCTION RECORDS
0214 //      A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
0215 //      callees.
0216 //        OFFSET (uint32_t)
0217 //          Line offset from the start of the function.
0218 //        DISCRIMINATOR (uint32_t)
0219 //          Discriminator value (see description of discriminators
0220 //          in the text format documentation above).
0221 //        FUNCTION BODY
0222 //          A FUNCTION BODY entry describing the inlined function.
0223 //===----------------------------------------------------------------------===//
0224 
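// The binary format above stores numbers as ULEB128: each byte carries seven
// payload bits, least-significant group first, and the high bit marks that
// another byte follows. The sketch below illustrates that encoding only. It
// is not the reader's implementation (LLVM ships its own decoder in
// llvm/Support/LEB128.h), and decodeULEB128Sketch is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>

// Decode one ULEB128 value from [Data, End). On success, advance Data past
// the encoded bytes and return the value. Return std::nullopt, leaving Data
// untouched, if the input is truncated or the value does not fit in 64 bits.
std::optional<uint64_t> decodeULEB128Sketch(const uint8_t *&Data,
                                            const uint8_t *End) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  const uint8_t *P = Data;
  while (P != End) {
    uint8_t Byte = *P++;
    uint64_t Slice = Byte & 0x7f;
    if (Shift >= 64) {
      // Shifting by 64 or more is undefined; only zero padding is acceptable.
      if (Slice != 0)
        return std::nullopt;
    } else {
      // Reject payload bits that would be shifted out of the 64-bit result.
      if (((Slice << Shift) >> Shift) != Slice)
        return std::nullopt;
      Value |= Slice << Shift;
    }
    if ((Byte & 0x80) == 0) {
      Data = P; // commit the new read position only on success
      return Value;
    }
    Shift += 7;
  }
  return std::nullopt; // ran out of bytes before the terminating byte
}
```

// For example, the three bytes 0xE5 0x8E 0x26 decode to 624485
// (101 + (14 << 7) + (38 << 14)), and a lone 0x80 is rejected as truncated.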
0225 #ifndef LLVM_PROFILEDATA_SAMPLEPROFREADER_H
0226 #define LLVM_PROFILEDATA_SAMPLEPROFREADER_H
0227 
0228 #include "llvm/ADT/SmallVector.h"
0229 #include "llvm/ADT/StringRef.h"
0230 #include "llvm/IR/DiagnosticInfo.h"
0231 #include "llvm/IR/LLVMContext.h"
0232 #include "llvm/IR/ProfileSummary.h"
0233 #include "llvm/ProfileData/GCOV.h"
0234 #include "llvm/ProfileData/SampleProf.h"
0235 #include "llvm/ProfileData/SymbolRemappingReader.h"
0236 #include "llvm/Support/Debug.h"
0237 #include "llvm/Support/Discriminator.h"
0238 #include "llvm/Support/ErrorOr.h"
0239 #include "llvm/Support/MemoryBuffer.h"
0240 #include <cstdint>
0241 #include <list>
0242 #include <memory>
0243 #include <optional>
0244 #include <string>
0245 #include <system_error>
0246 #include <unordered_set>
0247 #include <vector>
0248 
0249 namespace llvm {
0250 
0251 class raw_ostream;
0252 class Twine;
0253 
0254 namespace vfs {
0255 class FileSystem;
0256 } // namespace vfs
0257 
0258 namespace sampleprof {
0259 
0260 class SampleProfileReader;
0261 
0262 /// SampleProfileReaderItaniumRemapper remaps the profile data from a
0263 /// sample profile data reader, by applying a provided set of equivalences
0264 /// between components of the symbol names in the profile.
0265 class SampleProfileReaderItaniumRemapper {
0266 public:
0267   SampleProfileReaderItaniumRemapper(std::unique_ptr<MemoryBuffer> B,
0268                                      std::unique_ptr<SymbolRemappingReader> SRR,
0269                                      SampleProfileReader &R)
0270       : Buffer(std::move(B)), Remappings(std::move(SRR)), Reader(R) {
0271     assert(Remappings && "Remappings cannot be nullptr");
0272   }
0273 
0274   /// Create a remapper from the given remapping file. The remapper will
0275   /// be used for profile read in by Reader.
0276   static ErrorOr<std::unique_ptr<SampleProfileReaderItaniumRemapper>>
0277   create(StringRef Filename, vfs::FileSystem &FS, SampleProfileReader &Reader,
0278          LLVMContext &C);
0279 
0280   /// Create a remapper from the given Buffer. The remapper will
0281   /// be used for profile read in by Reader.
0282   static ErrorOr<std::unique_ptr<SampleProfileReaderItaniumRemapper>>
0283   create(std::unique_ptr<MemoryBuffer> &B, SampleProfileReader &Reader,
0284          LLVMContext &C);
0285 
0286   /// Apply remappings to the profile read by Reader.
0287   void applyRemapping(LLVMContext &Ctx);
0288 
0289   bool hasApplied() { return RemappingApplied; }
0290 
0291   /// Insert function name into remapper.
0292   void insert(StringRef FunctionName) { Remappings->insert(FunctionName); }
0293 
0294   /// Query whether an equivalent of \p FunctionName has been inserted into
0295   /// the remapper.
0296   bool exist(StringRef FunctionName) {
0297     return Remappings->lookup(FunctionName);
0298   }
0299 
0300   /// Return the equivalent name in the profile for \p FunctionName if
0301   /// it exists.
0302   std::optional<StringRef> lookUpNameInProfile(StringRef FunctionName);
0303 
0304 private:
0305   // The buffer holding the content read from remapping file.
0306   std::unique_ptr<MemoryBuffer> Buffer;
0307   std::unique_ptr<SymbolRemappingReader> Remappings;
0308   // Map remapping key to the name in the profile. By looking up the
0309   // key in the remapper, a given new name can be mapped to the
0310   // canonical name using the NameMap.
0311   DenseMap<SymbolRemappingReader::Key, StringRef> NameMap;
0312   // The Reader the remapper is servicing.
0313   SampleProfileReader &Reader;
0314   // Indicate whether remapping has been applied to the profile read
0315   // by Reader -- by calling applyRemapping.
0316   bool RemappingApplied = false;
0317 };
0318 
0319 /// Sample-based profile reader.
0320 ///
0321 /// Each profile contains sample counts for all the functions
0322 /// executed. Inside each function, statements are annotated with the
0323 /// collected samples on all the instructions associated with that
0324 /// statement.
0325 ///
0326 /// For this to produce meaningful data, the program needs to be
0327 /// compiled with some debug information (at minimum, line numbers:
0328 /// -gline-tables-only). Otherwise, it will be impossible to match IR
0329 /// instructions to the line numbers collected by the profiler.
0330 ///
0331 /// From the profile file, we are interested in collecting the
0332 /// following information:
0333 ///
0334 /// * A list of functions included in the profile (mangled names).
0335 ///
0336 /// * For each function F:
0337 ///   1. The total number of samples collected in F.
0338 ///
0339 ///   2. The samples collected at each line in F. To provide some
0340 ///      protection against source code shuffling, line numbers should
0341 ///      be relative to the start of the function.
0342 ///
0343 /// The reader supports two file formats: text and binary. The text format
0344 /// is useful for debugging and testing, while the binary format is more
0345 /// compact and I/O efficient. They can both be used interchangeably.
0346 class SampleProfileReader {
0347 public:
0348   SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
0349                       SampleProfileFormat Format = SPF_None)
0350       : Profiles(), Ctx(C), Buffer(std::move(B)), Format(Format) {}
0351 
0352   virtual ~SampleProfileReader() = default;
0353 
0354   /// Read and validate the file header.
0355   virtual std::error_code readHeader() = 0;
0356 
0357   /// Set the bits for FS discriminators. Parameter P specifies the sequence
0358   /// number: P == i is for the i-th round of adding FS discriminators, and
0359   /// P == 0 is for using base discriminators.
0360   void setDiscriminatorMaskedBitFrom(FSDiscriminatorPass P) {
0361     MaskedBitFrom = getFSPassBitEnd(P);
0362   }
0363 
0364   /// Get the bitmask for the discriminators: for FS profiles, return the bit
0365   /// mask for this pass. For non-FS profiles, return (unsigned) -1.
0366   uint32_t getDiscriminatorMask() const {
0367     if (!ProfileIsFS)
0368       return 0xFFFFFFFF;
0369     assert((MaskedBitFrom != 0) && "MaskedBitFrom is not set properly");
0370     return getN1Bits(MaskedBitFrom);
0371   }
0372 
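  // A minimal sketch of the masking idea behind getDiscriminatorMask(),
  // assuming the mask keeps bits 0 through MaskedBitFrom inclusive and clears
  // everything above. lowBitsMask and maskDiscriminator are hypothetical
  // stand-ins written for illustration; they are not the helpers this header
  // calls.

```cpp
#include <cstdint>

// Build a mask with bits 0..HighestBit (inclusive, 0-based) set.
uint32_t lowBitsMask(unsigned HighestBit) {
  // Shifting a 32-bit value by 32 is undefined, so handle the full-width
  // case explicitly.
  if (HighestBit >= 31)
    return 0xFFFFFFFFu;
  return (1u << (HighestBit + 1)) - 1;
}

// Clear every discriminator bit above HighestBit.
uint32_t maskDiscriminator(uint32_t Discriminator, unsigned HighestBit) {
  return Discriminator & lowBitsMask(HighestBit);
}
```

  // Under that assumption, maskDiscriminator(0x1234, 7) keeps only the low
  // byte, 0x34, and a HighestBit of 31 leaves the discriminator unchanged.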
0373   /// The interface to read sample profiles from the associated file.
0374   std::error_code read() {
0375     if (std::error_code EC = readImpl())
0376       return EC;
0377     if (Remapper)
0378       Remapper->applyRemapping(Ctx);
0379     FunctionSamples::UseMD5 = useMD5();
0380     return sampleprof_error::success;
0381   }
0382 
0383   /// Read sample profiles for the given functions.
0384   std::error_code read(const DenseSet<StringRef> &FuncsToUse) {
0385     DenseSet<StringRef> S;
0386     for (StringRef F : FuncsToUse)
0387       if (Profiles.find(FunctionId(F)) == Profiles.end())
0388         S.insert(F);
0389     if (std::error_code EC = read(S, Profiles))
0390       return EC;
0391     return sampleprof_error::success;
0392   }
0393 
0394   /// The implementation to read sample profiles from the associated file.
0395   virtual std::error_code readImpl() = 0;
0396 
0397   /// Print the profile for \p FunctionSamples on stream \p OS.
0398   void dumpFunctionProfile(const FunctionSamples &FS, raw_ostream &OS = dbgs());
0399 
0400   /// Collect functions with definitions in Module M. For readers that
0401   /// support loading function profiles on demand, return true when the
0402   /// reader has been given a module. Always return false for readers
0403   /// that do not support loading function profiles on demand.
0404   virtual bool collectFuncsFromModule() { return false; }
0405 
0406   /// Print all the profiles on stream \p OS.
0407   void dump(raw_ostream &OS = dbgs());
0408 
0409   /// Print all the profiles on stream \p OS in the JSON format.
0410   void dumpJson(raw_ostream &OS = dbgs());
0411 
0412   /// Return the samples collected for function \p F.
0413   FunctionSamples *getSamplesFor(const Function &F) {
0414     // The function name may have been updated by adding suffix. Call
0415     // a helper to (optionally) strip off suffixes so that we can
0416     // match against the original function name in the profile.
0417     StringRef CanonName = FunctionSamples::getCanonicalFnName(F);
0418     return getSamplesFor(CanonName);
0419   }
0420 
0421   /// Return the samples collected for the function named \p Fname.
0422   FunctionSamples *getSamplesFor(StringRef Fname) {
0423     auto It = Profiles.find(FunctionId(Fname));
0424     if (It != Profiles.end())
0425       return &It->second;
0426 
0427     if (FuncNameToProfNameMap && !FuncNameToProfNameMap->empty()) {
0428       auto R = FuncNameToProfNameMap->find(FunctionId(Fname));
0429       if (R != FuncNameToProfNameMap->end()) {
0430         Fname = R->second.stringRef();
0431         auto It = Profiles.find(FunctionId(Fname));
0432         if (It != Profiles.end())
0433           return &It->second;
0434       }
0435     }
0436 
0437     if (Remapper) {
0438       if (auto NameInProfile = Remapper->lookUpNameInProfile(Fname)) {
0439         auto It = Profiles.find(FunctionId(*NameInProfile));
0440         if (It != Profiles.end())
0441           return &It->second;
0442       }
0443     }
0444     return nullptr;
0445   }
0446 
0447   /// Return all the profiles.
0448   SampleProfileMap &getProfiles() { return Profiles; }
0449 
0450   /// Report a parse error message.
0451   void reportError(int64_t LineNumber, const Twine &Msg) const {
0452     Ctx.diagnose(DiagnosticInfoSampleProfile(Buffer->getBufferIdentifier(),
0453                                              LineNumber, Msg));
0454   }
0455 
0456   /// Create a sample profile reader appropriate to the file format.
0457   /// Also create the underlying remapper if RemapFilename is not empty.
0458   /// Parameter P specifies the FSDiscriminatorPass.
0459   static ErrorOr<std::unique_ptr<SampleProfileReader>>
0460   create(StringRef Filename, LLVMContext &C, vfs::FileSystem &FS,
0461          FSDiscriminatorPass P = FSDiscriminatorPass::Base,
0462          StringRef RemapFilename = "");
0463 
0464   /// Create a sample profile reader from the supplied memory buffer.
0465   /// Also create the underlying remapper if RemapFilename is not empty.
0466   /// Parameter P specifies the FSDiscriminatorPass.
0467   static ErrorOr<std::unique_ptr<SampleProfileReader>>
0468   create(std::unique_ptr<MemoryBuffer> &B, LLVMContext &C, vfs::FileSystem &FS,
0469          FSDiscriminatorPass P = FSDiscriminatorPass::Base,
0470          StringRef RemapFilename = "");
0471 
0472   /// Return the profile summary.
0473   ProfileSummary &getSummary() const { return *Summary; }
0474 
0475   MemoryBuffer *getBuffer() const { return Buffer.get(); }
0476 
0477   /// \brief Return the profile format.
0478   SampleProfileFormat getFormat() const { return Format; }
0479 
0480   /// Whether input profile is based on pseudo probes.
0481   bool profileIsProbeBased() const { return ProfileIsProbeBased; }
0482 
0483   /// Whether input profile is fully context-sensitive.
0484   bool profileIsCS() const { return ProfileIsCS; }
0485 
0486   /// Whether input profile contains ShouldBeInlined contexts.
0487   bool profileIsPreInlined() const { return ProfileIsPreInlined; }
0488 
0489   /// Whether input profile is flow-sensitive.
0490   bool profileIsFS() const { return ProfileIsFS; }
0491 
0492   virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() {
0493     return nullptr;
0494   };
0495 
0496   /// It includes all the names that have samples either in outline instance
0497   /// or inline instance.
0498   virtual std::vector<FunctionId> *getNameTable() { return nullptr; }
0499   virtual bool dumpSectionInfo(raw_ostream &OS = dbgs()) { return false; };
0500 
0501   /// Return whether names in the profile are all MD5 numbers.
0502   bool useMD5() const { return ProfileIsMD5; }
0503 
0504   /// Force the profile to use MD5 in Sample contexts, even if function names
0505   /// are present.
0506   virtual void setProfileUseMD5() { ProfileIsMD5 = true; }
0507 
0508   /// Don't read profile without context if the flag is set.
0509   void setSkipFlatProf(bool Skip) { SkipFlatProf = Skip; }
0510 
0511   /// Return whether any name in the profile contains ".__uniq." suffix.
0512   virtual bool hasUniqSuffix() { return false; }
0513 
0514   SampleProfileReaderItaniumRemapper *getRemapper() { return Remapper.get(); }
0515 
0516   void setModule(const Module *Mod) { M = Mod; }
0517 
0518   void setFuncNameToProfNameMap(
0519       const HashKeyMap<std::unordered_map, FunctionId, FunctionId> &FPMap) {
0520     FuncNameToProfNameMap = &FPMap;
0521   }
0522 
0523 protected:
0524   /// Map every function to its associated profile.
0525   ///
0526   /// The profile of every function executed at runtime is collected
0527   /// in the structure FunctionSamples. This maps function objects
0528   /// to their corresponding profiles.
0529   SampleProfileMap Profiles;
0530 
0531   /// LLVM context used to emit diagnostics.
0532   LLVMContext &Ctx;
0533 
0534   /// Memory buffer holding the profile file.
0535   std::unique_ptr<MemoryBuffer> Buffer;
0536 
0537   /// Profile summary information.
0538   std::unique_ptr<ProfileSummary> Summary;
0539 
0540   /// Take ownership of the summary of this reader.
0541   static std::unique_ptr<ProfileSummary>
0542   takeSummary(SampleProfileReader &Reader) {
0543     return std::move(Reader.Summary);
0544   }
0545 
0546   /// Compute summary for this profile.
0547   void computeSummary();
0548 
0549   /// Read sample profiles for the given functions and write them to the given
0550   /// profile map. Currently it's only used for extended binary format to load
0551   /// the profiles on-demand.
0552   virtual std::error_code read(const DenseSet<StringRef> &FuncsToUse,
0553                                SampleProfileMap &Profiles) {
0554     return sampleprof_error::not_implemented;
0555   }
0556 
0557   std::unique_ptr<SampleProfileReaderItaniumRemapper> Remapper;
0558 
0559   // A pointer to the FuncNameToProfNameMap in SampleProfileLoader,
0560   // which maps a function name to the matched profile name. The sample
0561   // loader uses it to look up a profile by the new name.
0562   const HashKeyMap<std::unordered_map, FunctionId, FunctionId>
0563       *FuncNameToProfNameMap = nullptr;
0564 
0565   // A map from a function's context hash to its metadata section range, used
0566   // to read function profile metadata on demand.
0567   std::unordered_map<uint64_t, std::pair<const uint8_t *, const uint8_t *>>
0568       FuncMetadataIndex;
0569 
0570   std::pair<const uint8_t *, const uint8_t *> ProfileSecRange;
0571 
0572   /// Whether the profile has attribute metadata.
0573   bool ProfileHasAttribute = false;
0574 
0575   /// \brief Whether samples are collected based on pseudo probes.
0576   bool ProfileIsProbeBased = false;
0577 
0578   /// Whether function profiles are context-sensitive flat profiles.
0579   bool ProfileIsCS = false;
0580 
0581   /// Whether function profile contains ShouldBeInlined contexts.
0582   bool ProfileIsPreInlined = false;
0583 
0584   /// Number of context-sensitive profiles.
0585   uint32_t CSProfileCount = 0;
0586 
0587   /// Whether the function profiles use FS discriminators.
0588   bool ProfileIsFS = false;
0589 
0590   /// \brief The format of sample.
0591   SampleProfileFormat Format = SPF_None;
0592 
0593   /// \brief The current module being compiled if SampleProfileReader
0594   /// is used by the compiler. If SampleProfileReader is used by other
0595   /// tools, M is usually nullptr.
0596   const Module *M = nullptr;
0597 
0598   /// Zero out the discriminator bits higher than bit MaskedBitFrom (0 based).
0599   /// The default is to keep all the bits.
0600   uint32_t MaskedBitFrom = 31;
0601 
0602   /// Whether the profile uses MD5 for Sample Contexts and function names. This
0603   /// can be one-way overridden by the user to force the use of MD5.
0604   bool ProfileIsMD5 = false;
0605 
0606   /// If SkipFlatProf is true, skip functions marked with !Flat in text mode or
0607   /// sections with SecFlagFlat flag in ExtBinary mode.
0608   bool SkipFlatProf = false;
0609 };
0610 
0611 class SampleProfileReaderText : public SampleProfileReader {
0612 public:
0613   SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
0614       : SampleProfileReader(std::move(B), C, SPF_Text) {}
0615 
0616   /// Read and validate the file header.
0617   std::error_code readHeader() override { return sampleprof_error::success; }
0618 
0619   /// Read sample profiles from the associated file.
0620   std::error_code readImpl() override;
0621 
0622   /// Return true if \p Buffer is in the format supported by this class.
0623   static bool hasFormat(const MemoryBuffer &Buffer);
0624 
0625   /// Text format sample profile does not support MD5 for now.
0626   void setProfileUseMD5() override {}
0627 
0628 private:
0629   /// CSNameTable is used to save full context vectors. This serves as an
0630   /// underlying immutable buffer for all clients.
0631   std::list<SampleContextFrameVector> CSNameTable;
0632 };
0633 
0634 class SampleProfileReaderBinary : public SampleProfileReader {
0635 public:
0636   SampleProfileReaderBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
0637                             SampleProfileFormat Format = SPF_None)
0638       : SampleProfileReader(std::move(B), C, Format) {}
0639 
0640   /// Read and validate the file header.
0641   std::error_code readHeader() override;
0642 
0643   /// Read sample profiles from the associated file.
0644   std::error_code readImpl() override;
0645 
0646   /// It includes all the names that have samples either in outline instance
0647   /// or inline instance.
0648   std::vector<FunctionId> *getNameTable() override {
0649     return &NameTable;
0650   }
0651 
0652 protected:
0653   /// Read a numeric value of type T from the profile.
0654   ///
0655   /// If an error occurs during decoding, a diagnostic message is emitted and
0656   /// EC is set.
0657   ///
0658   /// \returns the read value.
0659   template <typename T> ErrorOr<T> readNumber();
0660 
0661   /// Read a numeric value of type T from the profile. The value is stored
0662   /// without encoding.
0663   template <typename T> ErrorOr<T> readUnencodedNumber();
0664 
0665   /// Read a string from the profile.
0666   ///
0667   /// If an error occurs during decoding, a diagnostic message is emitted and
0668   /// EC is set.
0669   ///
0670   /// \returns the read value.
0671   ErrorOr<StringRef> readString();
0672 
0673   /// Read the string index and check whether it overflows the table.
0674   template <typename T> inline ErrorOr<size_t> readStringIndex(T &Table);
0675 
0676   /// Read the next function profile instance.
0677   std::error_code readFuncProfile(const uint8_t *Start);
0678   std::error_code readFuncProfile(const uint8_t *Start,
0679                                   SampleProfileMap &Profiles);
0680 
0681   /// Read the contents of the given profile instance.
0682   std::error_code readProfile(FunctionSamples &FProfile);
0683 
0684   /// Read the contents of Magic number and Version number.
0685   std::error_code readMagicIdent();
0686 
0687   /// Read profile summary.
0688   std::error_code readSummary();
0689 
0690   /// Read the whole name table.
0691   std::error_code readNameTable();
0692 
0693   /// Read a string indirectly via the name table. Optionally return the index.
0694   ErrorOr<FunctionId> readStringFromTable(size_t *RetIdx = nullptr);
0695 
0696   /// Read a context indirectly via the CSNameTable. Optionally return the
0697   /// index.
0698   ErrorOr<SampleContextFrames> readContextFromTable(size_t *RetIdx = nullptr);
0699 
0700   /// Read a context indirectly via the CSNameTable if the profile has context,
0701   /// otherwise same as readStringFromTable, also return its hash value.
0702   ErrorOr<std::pair<SampleContext, uint64_t>> readSampleContextFromTable();
0703 
0704   /// Points to the current location in the buffer.
0705   const uint8_t *Data = nullptr;
0706 
0707   /// Points to the end of the buffer.
0708   const uint8_t *End = nullptr;
0709 
0710   /// Function name table.
0711   std::vector<FunctionId> NameTable;
0712 
0713   /// CSNameTable is used to save full context vectors. It is the backing buffer
0714   /// for SampleContextFrames.
0715   std::vector<SampleContextFrameVector> CSNameTable;
0716 
0717   /// Table to cache MD5 values of sample contexts corresponding to
0718   /// readSampleContextFromTable(), used to index into Profiles or
0719   /// FuncOffsetTable.
0720   std::vector<uint64_t> MD5SampleContextTable;
0721 
0722   /// The starting address of the table of MD5 values of sample contexts. For
0723   /// a fixed-length MD5 non-CS profile it is the same as MD5NameMemStart,
0724   /// because hashes of non-CS contexts are already in the profile. Otherwise it
0725   /// points to the start of MD5SampleContextTable.
0726   const uint64_t *MD5SampleContextStart = nullptr;
0727 
0728 private:
0729   std::error_code readSummaryEntry(std::vector<ProfileSummaryEntry> &Entries);
0730   virtual std::error_code verifySPMagic(uint64_t Magic) = 0;
0731 };
0732 
0733 class SampleProfileReaderRawBinary : public SampleProfileReaderBinary {
0734 private:
0735   std::error_code verifySPMagic(uint64_t Magic) override;
0736 
0737 public:
0738   SampleProfileReaderRawBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
0739                                SampleProfileFormat Format = SPF_Binary)
0740       : SampleProfileReaderBinary(std::move(B), C, Format) {}
0741 
0742   /// \brief Return true if \p Buffer is in the format supported by this class.
0743   static bool hasFormat(const MemoryBuffer &Buffer);
0744 };
0745 
0746 /// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase defines
0747 /// the basic structure of the extensible binary format.
0748 /// Apart from the magic and version numbers at the beginning, the format is
0749 /// organized in sections. A section table precedes all the sections, and
0750 /// each entry in the table describes the section type, start, size and
0751 /// attributes. The format in each section is defined by the section itself.
0752 ///
0753 /// It is easy to add a new section while maintaining the backward
0754 /// compatibility of the profile. Nothing extra needs to be done. If we want
0755 /// to extend an existing section, such as adding cache-miss information in
0756 /// addition to the sample count in the profile body, we can add a new section
0757 /// with the extension and retire the existing section. We can choose to keep
0758 /// the parser of the old section if we want the reader to be able to read
0759 /// profiles in both the new and old formats.
0760 ///
0761 /// SampleProfileReaderExtBinary/SampleProfileWriterExtBinary define the
0762 /// commonly used sections of a profile in extensible binary format. It is
0763 /// possible to define other types of profile inherited from
0764 /// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase.
0765 class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {
0766 private:
0767   std::error_code decompressSection(const uint8_t *SecStart,
0768                                     const uint64_t SecSize,
0769                                     const uint8_t *&DecompressBuf,
0770                                     uint64_t &DecompressBufSize);
0771 
0772   BumpPtrAllocator Allocator;
0773 
0774 protected:
0775   std::vector<SecHdrTableEntry> SecHdrTable;
0776   std::error_code readSecHdrTableEntry(uint64_t Idx);
0777   std::error_code readSecHdrTable();
0778 
0779   std::error_code readFuncMetadata(bool ProfileHasAttribute,
0780                                    SampleProfileMap &Profiles);
0781   std::error_code readFuncMetadata(bool ProfileHasAttribute);
0782   std::error_code readFuncMetadata(bool ProfileHasAttribute,
0783                                    FunctionSamples *FProfile);
0784   std::error_code readFuncOffsetTable();
0785   std::error_code readFuncProfiles();
0786   std::error_code readFuncProfiles(const DenseSet<StringRef> &FuncsToUse,
0787                                    SampleProfileMap &Profiles);
0788   std::error_code readNameTableSec(bool IsMD5, bool FixedLengthMD5);
0789   std::error_code readCSNameTableSec();
0790   std::error_code readProfileSymbolList();
0791 
0792   std::error_code readHeader() override;
0793   std::error_code verifySPMagic(uint64_t Magic) override = 0;
0794   virtual std::error_code readOneSection(const uint8_t *Start, uint64_t Size,
0795                                          const SecHdrTableEntry &Entry);
0796   // Placeholder for subclasses to dispatch their own section readers.
0797   virtual std::error_code readCustomSection(const SecHdrTableEntry &Entry) = 0;
0798 
0799   /// Determine which container readFuncOffsetTable() should populate, the list
0800   /// FuncOffsetList or the map FuncOffsetTable.
0801   bool useFuncOffsetList() const;
0802 
0803   std::unique_ptr<ProfileSymbolList> ProfSymList;
0804 
0805   /// The table mapping from a function context's MD5 to the offset of its
0806   /// FunctionSample towards file start.
0807   /// At most one of FuncOffsetTable and FuncOffsetList is populated.
0808   DenseMap<hash_code, uint64_t> FuncOffsetTable;
0809 
0810   /// The list version of FuncOffsetTable. This is used if every entry is
0811   /// being accessed.
0812   std::vector<std::pair<SampleContext, uint64_t>> FuncOffsetList;
0813 
0814   /// The set containing the functions to use when compiling a module.
0815   DenseSet<StringRef> FuncsToUse;
0816 
0817 public:
0818   SampleProfileReaderExtBinaryBase(std::unique_ptr<MemoryBuffer> B,
0819                                    LLVMContext &C, SampleProfileFormat Format)
0820       : SampleProfileReaderBinary(std::move(B), C, Format) {}
0821 
0822   /// Read sample profiles in extensible format from the associated file.
0823   std::error_code readImpl() override;
0824 
0825   /// Get the total size of all \p Type sections.
0826   uint64_t getSectionSize(SecType Type);
0827   /// Get the total size of header and all sections.
0828   uint64_t getFileSize();
0829   bool dumpSectionInfo(raw_ostream &OS = dbgs()) override;
0830 
0831   /// Collect functions with definitions in Module M. Return true if
0832   /// the reader has been given a module.
0833   bool collectFuncsFromModule() override;
0834 
0835   std::unique_ptr<ProfileSymbolList> getProfileSymbolList() override {
0836     return std::move(ProfSymList);
0837   }
0838 
0839 private:
0840   /// Read the profiles on demand for the given functions. This is used after
0841   /// stale call graph matching finds new functions whose profiles aren't loaded
0842   /// at the beginning and we need to load the profiles explicitly for
0843   /// potential matching.
0844   std::error_code read(const DenseSet<StringRef> &FuncsToUse,
0845                        SampleProfileMap &Profiles) override;
0846 };
0847 
0848 class SampleProfileReaderExtBinary : public SampleProfileReaderExtBinaryBase {
0849 private:
0850   std::error_code verifySPMagic(uint64_t Magic) override;
0851   std::error_code readCustomSection(const SecHdrTableEntry &Entry) override {
0852     // Update the data reader pointer to the end of the section.
0853     Data = End;
0854     return sampleprof_error::success;
0855   }
0856 
0857 public:
0858   SampleProfileReaderExtBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
0859                                SampleProfileFormat Format = SPF_Ext_Binary)
0860       : SampleProfileReaderExtBinaryBase(std::move(B), C, Format) {}
0861 
0862   /// \brief Return true if \p Buffer is in the format supported by this class.
0863   static bool hasFormat(const MemoryBuffer &Buffer);
0864 };
0865 
0866 using InlineCallStack = SmallVector<FunctionSamples *, 10>;
0867 
0868 // Supported histogram types in GCC.  Currently, we only need support for
0869 // call target histograms.
0870 enum HistType {
0871   HIST_TYPE_INTERVAL,
0872   HIST_TYPE_POW2,
0873   HIST_TYPE_SINGLE_VALUE,
0874   HIST_TYPE_CONST_DELTA,
0875   HIST_TYPE_INDIR_CALL,
0876   HIST_TYPE_AVERAGE,
0877   HIST_TYPE_IOR,
0878   HIST_TYPE_INDIR_CALL_TOPN
0879 };
0880 
0881 class SampleProfileReaderGCC : public SampleProfileReader {
0882 public:
0883   SampleProfileReaderGCC(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
0884       : SampleProfileReader(std::move(B), C, SPF_GCC),
0885         GcovBuffer(Buffer.get()) {}
0886 
0887   /// Read and validate the file header.
0888   std::error_code readHeader() override;
0889 
0890   /// Read sample profiles from the associated file.
0891   std::error_code readImpl() override;
0892 
0893   /// Return true if \p Buffer is in the format supported by this class.
0894   static bool hasFormat(const MemoryBuffer &Buffer);
0895 
0896 protected:
0897   std::error_code readNameTable();
0898   std::error_code readOneFunctionProfile(const InlineCallStack &InlineStack,
0899                                          bool Update, uint32_t Offset);
0900   std::error_code readFunctionProfiles();
0901   std::error_code skipNextWord();
0902   template <typename T> ErrorOr<T> readNumber();
0903   ErrorOr<StringRef> readString();
0904 
0905   /// Read the section tag and check that it's the same as \p Expected.
0906   std::error_code readSectionTag(uint32_t Expected);
0907 
0908   /// GCOV buffer containing the profile.
0909   GCOVBuffer GcovBuffer;
0910 
0911   /// Function names in this profile.
0912   std::vector<std::string> Names;
0913 
0914   /// GCOV tags used to separate sections in the profile file.
0915   static const uint32_t GCOVTagAFDOFileNames = 0xaa000000;
0916   static const uint32_t GCOVTagAFDOFunction = 0xac000000;
0917 };
0918 
0919 } // end namespace sampleprof
0920 
0921 } // end namespace llvm
0922 
0923 #endif // LLVM_PROFILEDATA_SAMPLEPROFREADER_H
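
// Illustrative sketch, not part of LLVM. The comment on
// SampleProfileReaderExtBinaryBase describes a section table that precedes the
// sections and records each section's type, start and size, and notes that new
// sections can be added without breaking older readers. The standalone example
// below shows one way such a table could be walked. Every name in it
// (DemoSecEntry, DemoReadResult, readDemoSections, the type constants) is
// hypothetical, and the table is assumed to be already decoded into memory. It
// does not reproduce the actual on-disk encoding or the real readOneSection().

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical section kinds; the real format defines its own SecType values.
constexpr uint32_t DemoSecNameTable = 1;
constexpr uint32_t DemoSecFuncProfiles = 2;

// One decoded entry of the (assumed) section table.
struct DemoSecEntry {
  uint32_t Type;   // Section kind, possibly one this reader does not know.
  uint64_t Offset; // Start of the section, relative to the payload start.
  uint64_t Size;   // Section length in bytes.
};

struct DemoReadResult {
  bool Ok = true;                   // False if an entry is out of bounds.
  std::vector<std::string> Handled; // Sections the reader understood.
  unsigned Skipped = 0;             // Sections it stepped over.
};

// Walk the table in order. Known sections are dispatched, unknown ones are
// skipped, which is what lets an older reader tolerate newly added sections.
DemoReadResult readDemoSections(const std::vector<DemoSecEntry> &Table,
                                const std::vector<uint8_t> &Payload) {
  DemoReadResult R;
  for (const DemoSecEntry &E : Table) {
    // Reject entries that run past the end of the buffer.
    if (E.Offset > Payload.size() || E.Size > Payload.size() - E.Offset) {
      R.Ok = false;
      return R;
    }
    switch (E.Type) {
    case DemoSecNameTable:
      R.Handled.push_back("names");
      break;
    case DemoSecFuncProfiles:
      R.Handled.push_back("profiles");
      break;
    default:
      // Unknown section: advance past it without interpreting its bytes.
      ++R.Skipped;
      break;
    }
  }
  return R;
}
```

// In this sketch a table holding a name section, an unrecognized section and
// a profile section yields two handled sections and one skipped section, and
// an entry whose offset plus size exceeds the payload is reported as an error.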