# EPIC-ANALYSIS

General purpose analysis software for (SI)DIS at the EIC

This repository provides a set of common tools for the analysis of both full and
fast simulations, including the following features:

- General event loops for reading upstream data structures; for example,
  `src/AnalysisDelphes.cxx` for reading Delphes trees
- Kinematics reconstruction methods (e.g., leptonic, hadronic, Jacquet-Blondel)
  - see [Kinematics Documentation](doc/kinematics.md) for more information
  - see [Jet Kinematics Documentation](doc/kinematicsJets.md) for jet kinematics
- Calculations of SIDIS variables, such as `PhiH` and `qT`, for single
  particles, as well as jet variables
- Automation for downloading or streaming simulation data from S3, along with
  the capability to combine data from varying Q2 ranges using weights
- Ability to specify arbitrary multi-dimensional binning schemes and cuts
  using [Adage](https://github.com/c-dilks/adage)
- Output data structures include multi-dimensionally binned histogram sets,
  tables, and `TTrees`
- An analysis is primarily driven by macros, used to set up the binning and
  other settings

If you prefer to use your own analysis code, but would still like to make use of
the common tools provided in this repository (e.g., kinematics reconstruction),
this is also possible; you only need to stream the data structure you need, most
likely within the event loop. In this situation, it is recommended you fork the
repository (pull requests are also welcome).

Here is a flowchart showing the main classes (underlined) and the connections to
upstream simulation output:

![fig1](doc/img/flowchart.png)

---


# Setup and Dependencies

## Initial setup
First, clone this `epic-analysis` GitHub repository:
```bash
git clone git@github.com:eic/epic-analysis.git      # if you have SSH permission
git clone https://github.com/eic/epic-analysis.git  # if you do not have SSH permission
```
This will create the directory `epic-analysis`, which you can then `cd` into.

## Upstream Dependencies
These are common dependencies used for the upstream simulation, some of which
are needed for `epic-analysis` as well.

Follow the [EIC Software Environment Setup Guide](https://eic.github.io/tutorial-setting-up-environment/index.html)
to obtain and install the EIC software image.

- The `eic-shell` script is used to start a container shell
- This image contains all the upstream dependencies needed for EIC simulations
- All documentation below assumes you are running in `eic-shell`

If you upgrade your image (`eic-shell --upgrade`), you may need to do a clean build
of everything: `make all-clean && make`

## Local Dependencies
These are additional dependencies needed by `epic-analysis`; they will be built
locally and stored in the `deps/` directory (see [deps/README.md](deps/README.md)
for more details). This section documents how to obtain and build local dependencies:

[Delphes](https://github.com/delphes/delphes) is the only local dependency that
is not mirrored in `deps/`, so you must download and build it first:
```bash
deps/install_delphes.sh
```
- Alternatively, if you already have a `delphes` build elsewhere, symlink `deps/delphes` to it
- All other dependencies in `deps/` are mirrors, and are already included with `epic-analysis`;
  they will be built automatically later

While you are waiting for Delphes to build, you may want to:
- Prepare to analyze some data from S3, following [s3tools documentation](s3tools/README.md)
- Read through the `Kinematics` class [header](src/Kinematics.h) and [source](src/Kinematics.cxx), along
  with [documentation](doc/kinematics.md), to see what physics reconstruction methods are available
- Look through the tutorial macros in the `tutorial/` directory, to learn how to run `epic-analysis`

## Building
First, set environment variables:
```bash
source environ.sh
```
Then compile `epic-analysis` (and some other local dependencies):
```bash
make
```
- We have not yet upgraded to `cmake` in this repository, and still use `Makefiles`
- Build target locations are not yet configurable, and all will stay within `epic-analysis` (e.g.,
  libraries will be installed in `lib/`)
- Additional `make` targets are available (see `Makefile`), for more control during
  development:

```bash
make                     # builds dependencies, then `epic-analysis` (equivalent to `make all`)
make release             # build with optimization enabled
make debug               # build with debugging symbols
make clean               # clean `epic-analysis` (but not dependencies)

make deps                # builds only dependencies
make deps-clean          # clean dependencies
make all-clean           # clean `epic-analysis` and dependencies

make <dependency>        # build a particular `<dependency>`
make <dependency>-clean  # clean a particular `<dependency>`
```

Additional build options are available:
```bash
INCLUDE_CENTAURO=1 make  # build with fastjet plugin Centauro (not included in Delphes by default!)
EXCLUDE_DELPHES=1 make   # build without Delphes support; primarily used to expedite CI workflows
INCLUDE_PODIO=1 make     # build with support for reading data with PODIO
```

## Quick Start: Tutorial Macros
If you're ready to try the software hands-on, follow the [tutorials](tutorial/README.md) in
the `tutorial/` directory. Otherwise, continue reading below.

---


# Simulation

## Delphes Fast Simulation

### Delphes Wrapper
- for convenience, the wrapper script `deps/run_delphes.sh` is provided, which runs
  `delphes` on a given `hepmc` or `hepmc.gz` file, and sets the output file
  names and the appropriate configuration card
  - configuration cards are stored in the `deps/delphes_EIC/` directory,
    a mirror of [`eic/delphes_EIC`](https://github.com/eic/delphes_EIC/tree/master)
  - environment must be set first (`source environ.sh`)
  - run `deps/run_delphes.sh` with no arguments for usage guide
  - in the script, you may need to change `exeDelphes` to the proper
    executable, e.g., `DelphesHepMC2` or `DelphesHepMC3`, depending
    on the format of your generator input
  - if reading a gzipped file (`*.hepmc.gz`), this script will automatically
    stream it through `gunzip`, so there is no need to decompress beforehand
  - there are some `hepmc` files on S3; follow [s3tools documentation](s3tools/README.md)
    for scripts and guidance
- the output will be a `TTree` stored in a `root` file
  - output files will be placed in `datarec/`
  - input `hepmc(.gz)` files can be kept in `datagen/`

### AnalysisDelphes
- The class `AnalysisDelphes` contains the event loop for reading Delphes trees
  - There are several classes which derive from the base `Analysis` class;
    `Analysis` handles common setup and final output, whereas the derived
    classes are tuned to read the upstream data formats
- See the event loop in `src/AnalysisDelphes.cxx` for details of how the fast
  simulation data are read


## ePIC Full Simulation

- Full simulation files are stored on S3; follow [s3tools documentation](s3tools/README.md)
  for scripts and guidance
- In general, everything that can be done in fast simulation can also be done in
  full simulation; just replace your usage of `AnalysisDelphes` with
  `AnalysisEpic`
  - In practice, implementations may sometimes be a bit out of sync, where some
    features that exist in fast simulation do not exist in full simulation, or vice
    versa
- See the event loop in `src/AnalysisEpic.cxx` for details of how the full
  simulation data are read

## ATHENA and ECCE Full Simulations

- The implementation is similar to that of the ePIC full simulation, but uses
  `AnalysisEcce` or `AnalysisAthena`

---


# Analysis Procedure

After simulation, this repository separates the analysis procedure into two
stages: (1) the *Analysis* stage includes the event loop, which processes either
fast or full simulation output, kinematics reconstruction, and your specified
binning scheme, while (2) the *Post-processing* stage includes histogram
drawing, comparisons, table printouts, and any feature you would like to add.

The two stages are driven by macros. See examples in the `tutorial` directory,
and follow the [README](tutorial/README.md).

- **Note**: most macros stored in this repository must be executed from the
  `epic-analysis` top directory, not from within their subdirectory, e.g., run
  `root -b -q tutorial/analysis_template.C`; this is because certain library
  and data directory paths are given as relative paths

In general, these macros will run single-threaded. See [HPC documentation](hpc/README.md)
for guidance on how to run multi-threaded or on a High Performance Computing (HPC) cluster.

## Analysis Stage

### Analysis Macro and Class

- the `Analysis` class is the main class that performs the analysis; it is
  controlled at the macro level
  - a typical analysis macro must do the following:
    - instantiate an `Analysis` derived class (e.g., `AnalysisDelphes`)
    - set up bin schemes and bins (arbitrary specification, see below)
    - set any other settings (e.g., a maximum number of events to process,
      useful for quick tests)
    - execute the analysis
  - the input is a config file, which contains a list of files to analyze
    together with settings such as beam energy and Q2 ranges; see
    [doc/example.config](doc/example.config) for an example config file and
    more details
  - the output will be a `root` file, filled with `TObjArray`s of
    histograms
    - each `TObjArray` can be for a different subset of events (bin), e.g.,
      different minimum `y` cuts, so that their histograms can be compared and
      divided; you can open the `root` file in a `TBrowser` to browse the
      histograms
    - the `Histos` class is a container for the histograms, and instances of
      `Histos` will also be streamed to `root` files, along with the binning
      scheme (handled by the Adage `BinSet` class); downstream post-processing code
      makes use of these streamed objects, rather than the `TObjArray`s
  - derived classes are specific to upstream data structures:
    - `AnalysisDelphes` for Delphes trees (fast simulations)
    - `AnalysisEpic` for trees from the ePIC full simulations
    - `AnalysisAthena` for trees from the DD4hep+Juggler stack (ATHENA full simulations)
    - `AnalysisEcce` for trees from the Fun4all+EventEvaluator stack (ECCE full simulations)
  - the `Kinematics` class is used to calculate all kinematics
    - `Analysis`-derived classes have one instance of `Kinematics` for generated
      variables, and another for reconstructed variables, to allow quick
      comparison (e.g., for resolutions)
    - calculations are called by `Analysis`-derived classes, event-by-event,
      particle-by-particle, or jet-by-jet
    - see [Kinematics Documentation](doc/kinematics.md) for details of `Kinematics`

### Bin Specification

- The bins may be specified arbitrarily, using the Adage `BinSet` and `CutDef` classes
  - see example `analysis_*.C` macros in `tutorial/`
  - `CutDef` can store and apply an arbitrary cut for a single variable, such as:
    - ranges: `a<x<b` or `|x-a|<b`
    - minimum or maximum: `x>a` or `x<a`
    - no cut (useful for "full" bins)
  - The set of bins for a variable is defined by a `BinSet`
    - These bins can be defined arbitrarily, with the help of the `CutDef`
      class; you can either:
      - Automatically define a set of bins, e.g., `N` bins between `a` and `b`
        - Equal width in linear scale
        - Equal width in log scale (useful for `x` and `Q2`)
        - Any custom `TAxis`
      - Manually define each bin
        - example: specific bins in `z` and `pT`:
          - `|z-0.3|<0.1` and `|pT-0.2|<0.05`
          - `|z-0.7|<0.1` and `|pT-0.5|<0.05`
        - example: 3 different `y` minima:
          - `y>0.05`
          - `y>0.03`
          - `y>0` (no cut)
          - note that the arbitrary specification permits bins to overlap, e.g.,
            an event with `y=0.1` will appear in all three bins
- Multi-dimensional binning
  - Binning in multiple dimensions is allowed, e.g., 3D binning in `x`,`Q2`,`z`
  - See [Adage documentation](deps/adage/README.md) for more information on how multi-dimensional
    binning is handled, as well as the [Adage syntax reference](deps/adage/doc/syntax.md)
  - Be careful of the curse of dimensionality: the total number of bins is the
    product of the number of bins in each dimension

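The binning ideas above can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions: `Cut`, `Range`, `CenterDelta`, `Min`, `Max`, `Full`, `LogEdges`, `MatchingBins`, and `NumMultiDimBins` are hypothetical names invented for this sketch, and none of them is the Adage `CutDef` or `BinSet` API (see the tutorial macros for the real usage).

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a single-variable cut; NOT the Adage `CutDef` API
struct Cut {
  std::string type;
  std::function<bool(double)> pass;
};

Cut Range(double a, double b) {        // a < x < b
  return {"Range", [a, b](double x) { return a < x && x < b; }};
}
Cut CenterDelta(double a, double b) {  // |x - a| < b
  return {"CenterDelta", [a, b](double x) { return std::fabs(x - a) < b; }};
}
Cut Min(double a) { return {"Min", [a](double x) { return x > a; }}; }
Cut Max(double a) { return {"Max", [a](double x) { return x < a; }}; }
Cut Full()        { return {"Full", [](double) { return true; }}; }  // no cut

// Edges of n bins between a and b with equal width in log10 scale (n+1 edges);
// assumes 0 < a < b and n > 0
std::vector<double> LogEdges(int n, double a, double b) {
  std::vector<double> edges;
  const double step = (std::log10(b) - std::log10(a)) / n;
  for (int i = 0; i <= n; ++i) edges.push_back(a * std::pow(10.0, i * step));
  return edges;
}

// Indices of every bin that accepts `value`; since bins are arbitrary cuts,
// more than one index may be returned (overlapping bins)
std::vector<int> MatchingBins(const std::vector<Cut>& bins, double value) {
  std::vector<int> matched;
  for (std::size_t i = 0; i < bins.size(); ++i)
    if (bins[i].pass(value)) matched.push_back(static_cast<int>(i));
  return matched;
}

// Total number of multi-dimensional bins: the product over all dimensions
long NumMultiDimBins(const std::vector<int>& binsPerDimension) {
  long total = 1;
  for (int n : binsPerDimension) total *= n;
  return total;
}
```

With the three `y` minima from the example, `MatchingBins({Min(0.05), Min(0.03), Full()}, 0.1)` returns all three indices, whereas `y=0.04` matches only the last two bins. Likewise, `NumMultiDimBins({10, 10, 10})` is 1000, which shows how quickly a 3D scheme in `x`, `Q2`, `z` multiplies the number of histogram sets.
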
### Simple Tree

- The `Analysis` class is also capable of producing a simple `TTree`, handled by the
  `SidisTree` class, which can also be useful for analysis
  - As the name suggests, it is a flat tree with a minimal set of variables,
    specifically needed for SIDIS spin asymmetry analysis
  - The tree branches are configured to be compatible with
    [asymmetry analysis code](https://github.com/c-dilks/largex-eic-asym)
    built on the [BruFit](https://github.com/dglazier/brufit) framework
  - There is a switch in `Analysis` to enable or disable writing this tree


## Post-Processing Stage

### Post-Processing Macro and Class

- results processing is handled by the `PostProcessor` class, which does tasks
  such as printing tables of average values, and drawing ratios of histograms
  - this class is steered by `postprocess_*.C` macros, which do the
    following:
    - instantiate `PostProcessor`, with the specified `root` file that contains
      output from the analysis macro
    - loop over bins and perform actions, using Adage
- see `src/PostProcessor.h` and `src/PostProcessor.cxx` for available
  post-processing routines; you are welcome to add your own

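To make the two typical post-processing tasks concrete (ratios of histograms and tables of average values), here is a minimal sketch using plain `std::vector`s in place of ROOT histograms. The function names `Ratio` and `WeightedMean` are hypothetical and are not `PostProcessor` methods; uncertainties are deliberately left out, since the correct treatment depends on whether the compared bins overlap.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helpers; NOT part of the `PostProcessor` class.

// Bin-by-bin ratio num/den of two histograms given as bin contents;
// bins with an empty denominator are set to 0
std::vector<double> Ratio(const std::vector<double>& num, const std::vector<double>& den) {
  if (num.size() != den.size())
    throw std::invalid_argument("histograms must have the same binning");
  std::vector<double> ratio(num.size(), 0.0);
  for (std::size_t i = 0; i < num.size(); ++i)
    if (den[i] != 0.0) ratio[i] = num[i] / den[i];
  return ratio;
}

// Weighted average of a variable over the events of one bin, e.g., for a
// table row; returns 0 if the weights sum to zero
double WeightedMean(const std::vector<double>& values, const std::vector<double>& weights) {
  if (values.size() != weights.size())
    throw std::invalid_argument("one weight per value is required");
  double sumWX = 0.0, sumW = 0.0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    sumWX += weights[i] * values[i];
    sumW  += weights[i];
  }
  return sumW != 0.0 ? sumWX / sumW : 0.0;
}
```

For instance, dividing contents `{2, 4, 0}` by `{4, 4, 0}` gives `{0.5, 1, 0}`, and the weighted mean of values `{1, 3}` with weights `{1, 3}` is `2.5`.
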
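---


# Contributions

- Add your own analysis scripts (macros, etc.) in `macro/`, either in the main
  directory or in a subdirectory of `macro/`.
  - The `macro/ci` directory is for scripts used by the CI (see `.github/workflows/ci.yml`);
    you are welcome to add new analysis scripts to the CI
  - Make changes in classes such as `PostProcessor` as needed

- Git workflow:
  - Contributions are welcome via pull requests and issue reports; it is
    recommended to fork this repository or ask to be a contributor
  - Continuous Integration (CI) will trigger on pull requests, which will build
    and test your contribution
    - see the `Actions` tab for workflow details
    - many CI jobs will not work properly from forks (for security), but you
      may ask to be a contributor
  - It is recommended to keep up-to-date with developments by browsing the pull
    requests and issues, and by viewing the latest commits: go to the `Insights`
    tab and click `Network` to show the commit graph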
0303 
0304 - Git workflow:
0305   - Contributions are welcome via pull requests and issues reporting; it is
0306     recommended to fork this repository or ask to be a contributor
0307   - Continuous Integration (CI) will trigger on pull requests, which will build
0308     and test your contribution
0309     - see `Actions` tab for workflows for details
0310     - many CI jobs will not work properly from forks (for security), but you
0311       may ask to be a contributor
0312   - It is recommended to keep up-to-date with developments by browsing the pull
0313     requests, issues, and viewing the latest commits by going to the `Insights`
0314     tab, and clicking `Network` to show the commit graph