---
title: "Introduction"
teaching: 15
exercises: 5
questions:
- "How do I locate and access the simulation output?"
objectives:
- "Understand how the simulation output is organized"
- "Know how to access the simulation output using Jefferson Lab xrootd"
keypoints:
- "Use `xrdfs` from within the eic-shell to browse available files from simulation campaigns."
- "Use `xrdcp` from within eic-shell to copy files to your local environment."
- "Within eic-shell, you can also stream files directly in your ROOT macros."
---

More detailed information on the simulation productions, including the information presented below, can be found on the [Simulation Production Campaign Website](https://eic.github.io/epic-prod/).

## Simulation Files Organization

There are three broad classes of files stored on xrootd/S3, each in its own directory:
- EVGEN: The input HepMC3 datasets
    - E.g. files that have been supplied by a physics event generator
- FULL: The full GEANT4 output ROOT files (usually only saved for a fraction of runs)
    - If running a simulation yourself, this would be your output from processing npsim
- RECO: The output ROOT files from the reconstruction
    - And again, if running yourself, this would be your output from EICrecon (after you've used your awesome new reconstruction algorithm from the later tutorial, of course)

Most users will interact with the files in the RECO directory, and that is what we will focus on in this tutorial. Within the RECO directory, files are organized by campaign (24.04.0 for the April 2024 campaign, for example), then detector configuration, and then physics process. Each physics process will have different subdirectories, for example for generator version, energy, or Q2. The directory structure and number of reconstructed files for each campaign can be found on the Simulation Website [here](https://eic.github.io/epic-prod/campaigns/campaigns_reco.html).
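
As a minimal sketch of this layout, the directory for one physics process can be assembled from its components. The variable names are illustrative only; the values are those of the example dataset used in this tutorial.

```shell
# Components of the RECO layout (variable names are illustrative only)
campaign="24.04.0"                 # April 2024 campaign
detector="epic_craterlake"         # detector configuration
process="DIS/NC/18x275/minQ2=10"   # physics process with its energy and Q2 subdirectories

reco_dir="/work/eic2/EPIC/RECO/${campaign}/${detector}/${process}"
echo "${reco_dir}"
```

Changing one component at a time (for example the campaign) is then enough to point at the corresponding directory of another production, provided that production follows the same layout.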

> Note that S3 is being phased out. Simulation campaigns from Summer 2024 onwards will only be available on xrootd.
> Instructions for S3 access are provided for reference only at this point.
{: .callout}

## Access Simulation from Jefferson Lab xrootd

The preferred method for browsing the simulation output is to use xrootd from within the eic-shell. To browse the directory structure and exit, one can run the commands:
```console
./eic-shell
xrdfs root://dtn-eic.jlab.org
ls /work/eic2/EPIC/RECO/24.04.0
exit
```
It is also possible to copy a file and open it locally using the `xrdcp` command:
```console
./eic-shell
xrdcp root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.04.0/path-to-file .
exit
```

Files can also be copied locally by replacing `ls` with `cp`.
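
The double slash after the host name in the `xrdcp` example is intentional: the URL is the server address, a `/` separator, and then the absolute path on the server, which itself begins with `/`. A small helper makes this explicit. Note that `xrd_url` is a hypothetical function name used for illustration, not part of the xrootd tools.

```shell
# xrd_url is a hypothetical helper, not an xrootd command.
# $1: server host name, $2: absolute path on that server
xrd_url() {
    printf 'root://%s/%s\n' "$1" "$2"
}

url=$(xrd_url dtn-eic.jlab.org /work/eic2/EPIC/RECO/24.04.0/path-to-file)
echo "${url}"
# The result can then be passed to xrdcp, e.g.: xrdcp "${url}" .
```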

## Streaming Files

It is also possible to open a file directly in ROOT, without copying it first. Note that the following command should be run from within a ROOT session, and that `TFile::Open()` should be used rather than the `TFile` constructor:
```console
auto f = TFile::Open("root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/path-to-file");
```
or alternatively
```console
auto f = TFile::Open("s3https://eics3.sdcc.bnl.gov:9000/eictest/EPIC/RECO/path-to-file");
```

## Reminder - Download a file for the next step!

We will need a file to analyse going forward. If you have not done so already, download a file now!

Grab a file from -

```console
/work/eic2/EPIC/RECO/24.04.0/epic_craterlake/DIS/NC/18x275/minQ2=10/
```
For example -

```console
xrdcp root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.04.0/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_5.0001.eicrecon.tree.edm4eic.root ./
```
Note that the `./` at the end is the target location to copy to. Change this as desired.

You can also stream the file if you prefer; just copy the path of the file above. You will need to modify the scripts later in the tutorial accordingly to account for this.

## Advanced Use Case - Grabbing a whole bunch of files

I won't go through this in the tutorial, but this may be something you want to come back to as you get deeper into writing and using your own analysis code. This advanced use case involves copying/using a large number of processed files, something you might want to do once your analysis is out of the testing phase and onto the "Let's process ALL of the data!" stage.

If you're moving a lot of files around, you might normally resort to using a wildcard -

```console
cp File* My_Folder/
```

or similar. However, with xrdcp, this isn't so trivial. Some methods to test and try are included below.

One approach is to find the files in a given path that match a name pattern and copy each of them to the current directory.
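
A minimal sketch of that approach is given here. It assumes that `xrdfs ... ls` prints one absolute path per line, as the listing output shown in this section suggests. The listing is filtered with `grep`, and each match is turned into an `xrdcp` command. The function name `make_copy_commands` and the pattern are hypothetical, and the commands are only printed, not run, so that they can be checked first.

```shell
server="root://dtn-eic.jlab.org"
dir="/work/eic2/EPIC/RECO/24.04.0/epic_craterlake/DIS/NC/18x275/minQ2=10"
pattern='hiDiv_1\.000[0-9]\.eicrecon'   # hypothetical pattern; adjust to your files

# Hypothetical helper: reads absolute paths on stdin, keeps those matching
# the extended regular expression $2, and prints one xrdcp command per match.
# $1: server URL, $2: pattern
make_copy_commands() {
    grep -E -- "$2" | sed "s|^|xrdcp $1/|; s|\$| ./|"
}

# Only query the server when the xrootd client is available.
if command -v xrdfs >/dev/null 2>&1; then
    xrdfs "${server}" ls "${dir}" | make_copy_commands "${server}" "${pattern}"
fi
```

Once the printed commands look right, they can be executed by appending `| sh` to the pipeline. This is reasonable for file names like the ones in this campaign, which contain no spaces or shell metacharacters.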

Alternatively, you could grab a list of the files you want and write them to a file -

```console
xrdfs root://dtn-eic.jlab.org ls /work/eic2/EPIC/RECO/24.04.0/epic_craterlake/DIS/NC/18x275/minQ2=10 | sed 's|^|root://dtn-eic.jlab.org/|g' > list.txt
```

In this case, we're listing all files on the server in that path, piping them to sed to insert "root://dtn-eic.jlab.org/" at the front of each line, and then redirecting the output to the file "list.txt".

```console
more list.txt
root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.04.0/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0000.eicrecon.tree.edm4eic.root
root://dtn-eic.jlab.org//work/eic2/EPIC/RECO/24.04.0/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0001.eicrecon.tree.edm4eic.root
...
```
We could then, for example, feed this list to a TChain -

```console
TChain events("events");
std::ifstream in("list.txt");
std::string file("");
while (in >> file) events.Add(file.data());
events.Scan("@MCParticles.size()","","",10);
```
In the final line, we only scan over the first 10 events.

It should be noted that the best solution may just be to run over the files directly from the server, rather than copying them somewhere else and running over them there.

## OUTDATED - Access Simulation from BNL S3

> Note that S3 is being phased out. Simulation campaigns from Summer 2024 onwards will only be available on xrootd.
> Instructions for S3 access are provided for reference only at this point.
{: .callout}

The simulation files can also be accessed from S3 storage at BNL using the MinIO client, `mc`. It is included in eic-shell. To install it natively, you can issue the following commands:
```console
mkdir -p ~/bin
curl https://dl.min.io/client/mc/release/linux-amd64/mc --create-dirs -o ~/bin/mc
chmod +x ~/bin/mc
```
From here on out, we assume `mc` is in your PATH variable; otherwise you can use the full path, in the above example `~/bin/mc`.
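
If `~/bin` is not yet on your PATH, one possible way to add it for the current shell session is sketched below. This assumes a POSIX-style shell such as bash; making the change permanent would mean adding it to your shell start-up file.

```shell
# Prepend ~/bin to PATH for this shell session, unless it is already listed.
case ":${PATH}:" in
    *":${HOME}/bin:"*) ;;                 # already present, nothing to do
    *) PATH="${HOME}/bin:${PATH}" ;;
esac
export PATH
```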

After the client is installed, it needs to be configured for read access:
```console
export S3_ACCESS_KEY=<credential>; export S3_SECRET_KEY=<credential>
mc config host add S3 https://eics3.sdcc.bnl.gov:9000 $S3_ACCESS_KEY $S3_SECRET_KEY
```
The `<credential>` values for read access can be obtained by asking on Mattermost. Assuming the MinIO client is installed and configured as above, one can browse the file structure using the `mc ls` command:
```console
mc ls S3/eictest/EPIC/RECO
```