0001 ---
0002 title: "Introduction"
0003 teaching: 15
0004 exercises: 5
0005 questions:
0006 - "How do I locate and access the simulation output?"
0007 objectives:
0008 - "Understand how the simulation output is organized"
0009 - "Know how to access the simulation output using Jefferson Lab xrootd"
0010 keypoints:
0011 - "Use `xrdfs` from within eic-shell to browse available files from simulation campaigns."
0012 - "Use `xrdcp` from within eic-shell to copy files to your local environment."
0013 - "Within eic-shell, you can also stream files directly in your ROOT macros."
0014 ---
0015 
0016 More detailed information on the simulation productions, including the information presented below, can be found on the [Simulation Production Campaign Website](https://eic.github.io/epic-prod/). 
0017 
0018 ## Simulation Files Organization
0019 
0020 There are three broad classes of files stored on xrootd, each in their own directory:
0021 - EVGEN: The input HepMC3 datasets
0022     - E.g. files that have been supplied by a physics event generator
0023 - FULL: The full GEANT4 output ROOT files (usually only saved for a fraction of runs)
0024     - If running a simulation yourself, this would be your output from running npsim
0025 - RECO: The output ROOT files from the reconstruction
0026     - Again, if running yourself, this would be your output from EICrecon (after you've used your awesome new reconstruction algorithm from the later tutorial, of course)
0027 
0028 Most users will interact with the files in the RECO directory and that is what we will focus on in this tutorial. Within the RECO directory, files are organized by campaign (25.01.1 for the January 2025 campaign, for example), detector configuration and then physics process. Each physics process will have different sub directories, for example generator version, energy, or Q2. The directory structure and number of reconstructed files for each campaign can be found on the Simulation Website [here](https://eic.github.io/epic-prod/campaigns/campaigns_reco.html).
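
As a small illustration of that layout, the sketch below assembles a RECO directory path from its campaign, detector configuration and physics process components. The values are the ones used later in this episode; the variable names are only illustrative.

```shell
# Components of a RECO path, in the order described above.
campaign="25.01.1"                  # January 2025 campaign
detector="epic_craterlake"          # detector configuration
process="DIS/NC/18x275/minQ2=10"    # physics process and its sub directories

reco_dir="/volatile/eic/EPIC/RECO/${campaign}/${detector}/${process}"
echo "${reco_dir}"
```

Changing any one component (a different campaign, say) gives the matching directory for that production, provided it exists on the server.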
0029 
0030 ## Access Simulation from Jefferson Lab xrootd
0031 
0032 The preferred method for browsing the simulation output is to use xrootd from within the eic-shell. To browse the directory structure and then exit, run the commands:
0033 ```console
0034 ./eic-shell
0035 xrdfs root://dtn-eic.jlab.org
0036 ls /volatile/eic/EPIC/RECO/25.01.1
0037 exit
0038 ```
0039 It is also possible to copy a file and open it locally using the `xrdcp` command:
0040 ```console
0041 ./eic-shell
0042 xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/path-to-file .
0043 exit
0044 ```
0045 
0046 Files can also be copied locally by replacing `ls` with `cp`.
0047 
0048 ## Streaming Files
0049 
0050 It is also possible to open a file directly in ROOT. Note that the following command should be executed after starting ROOT, and that `TFile::Open()` should be used rather than the plain `TFile` constructor:
0051 ```console
0052 auto f = TFile::Open("root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/path-to-file")
0053 ```
0054 
0055 ## Reminder - Download a file for the next step!
0056 
0057 We will need a file to analyse going forward. If you have not already done so, download one now!
0058 
0059 Grab a file from -
0060 
0061 ```console
0062 /volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/
0063 ```
0064 For example -
0065 
0066 ```console
0067 xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_5.0001.eicrecon.tree.edm4eic.root ./
0068 ```
0069 Note that the ./ at the end is the target location to copy to. Change this as desired.
0070 
0071 > Note that we can also specify a different filename to copy to, as we could with a normal cp command. You might want to do this as the filename is a little cumbersome.
0072 > I called mine NC_DIS_18x275_JanCampaign.root; just replace ./ with your file name of choice.
0073 {: .callout}
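
A minimal sketch of that rename, using the shorter file name from the callout. The command is only printed here so it can be checked first; run the printed line from within eic-shell to perform the copy. The variable names are illustrative.

```shell
server="root://dtn-eic.jlab.org/"
dir="/volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10"
file="pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_5.0001.eicrecon.tree.edm4eic.root"
target="NC_DIS_18x275_JanCampaign.root"   # shorter local file name

# server ends in "/" and dir starts with "/", giving the "//" seen in the URLs above.
copy_cmd="xrdcp ${server}${dir}/${file} ${target}"
echo "${copy_cmd}"
```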
0074 
0075 You can also stream the file if you prefer; just copy the path of the file above. You will need to modify the scripts later in the tutorial to account for this.
0076 
0077 ## Advanced Use Case - Grabbing a whole bunch of files
0078 
0079 I won't go through this in the tutorial, but this may be something you want to come back to as you get deeper into writing and using your own analysis code. This advanced use case involves copying or using a large number of processed files, which is something you might want to do once your analysis is out of the testing phase and onto the "Let's process ALL of the data!" stage.
0080 
0081 If you're moving a lot of files around, you might normally resort to using a wildcard -
0082 
0083 ```console
0084 cp File* My_Folder/
0085 ```
0086 
0087 or similar. However, with xrdcp, this isn't so trivial. Some methods to try are included below.
0088 
0089 One approach is to find the files in the given path that match a name pattern and copy each of them to the current directory.
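
One possible way to find files matching a name pattern and copy them is sketched below. This is an assumption offered for illustration, not necessarily the command originally intended here. A listing is filtered with `grep` and turned into one `xrdcp` command per match. To keep the example self-contained, the listing is a hard-coded stand-in built from file names that appear elsewhere in this episode; within eic-shell it would instead come from `xrdfs root://dtn-eic.jlab.org ls` on the directory. The `pattern` value and the variable names are hypothetical.

```shell
server="root://dtn-eic.jlab.org/"
dir="/volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10"
stem="pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv"
suffix="eicrecon.tree.edm4eic.root"

# Stand-in for the output of: xrdfs root://dtn-eic.jlab.org ls "${dir}"
listing="${dir}/${stem}_1.0000.${suffix}
${dir}/${stem}_1.0001.${suffix}
${dir}/${stem}_5.0001.${suffix}"

pattern='hiDiv_1\.'   # hypothetical name pattern: keep only the hiDiv_1.* files

# Keep the matching paths and print one xrdcp command per file (dry run).
copy_cmds=$(printf '%s\n' "${listing}" | grep -- "${pattern}" |
  while IFS= read -r path; do
    echo "xrdcp ${server}${path} ."
  done)
echo "${copy_cmds}"
```

Printing the commands first makes it easy to check the selection before any data is transferred; the printed lines can then be run from within eic-shell.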
0090 
0091 Alternatively, you could grab a list of the files you want and write it to a file -
0092 
0093 ```console
0094 xrdfs root://dtn-eic.jlab.org ls /volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10 | sed 's|^|root://dtn-eic.jlab.org/|g' > list.txt
0095 ```
0096 
0097 In this case, we're listing all files on the server in that path, piping the listing to sed to insert "root://dtn-eic.jlab.org/" at the front of each line, and then redirecting the output to the file "list.txt".
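
To see what the `sed` step does without contacting the server, the sketch below applies the same kind of substitution to a single sample path. The sample line stands in for one line of the `xrdfs ... ls` output and is taken from the file listing shown in this episode.

```shell
# One line as it would appear in the listing (an absolute path on the server).
sample="/volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0000.eicrecon.tree.edm4eic.root"

# Insert the server prefix at the start of the line.
url=$(printf '%s\n' "${sample}" | sed 's|^|root://dtn-eic.jlab.org/|')
echo "${url}"
```

The prefix ends in a single `/` and the path starts with `/`, which is why the resulting URL contains `//` after the host name.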
0098 
0099 ```console
0100 more list.txt
0101 root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0000.eicrecon.tree.edm4eic.root
0102 root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0001.eicrecon.tree.edm4eic.root
0103 ...
0104 ```
0105 We could then, for example, feed this list to a TChain -
0106 
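0107 ```console
0108 TChain events("events");                        // chain the "events" tree from every file
0109 std::ifstream in("list.txt");                   // one file URL per line
0110 std::string file;
0111 while (in >> file) events.Add(file.c_str());    // add each URL to the chain
0112 events.Scan("@MCParticles.size()", "", "", 10); // print the MCParticles count for 10 entries
0113 ```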
0114 In the final line, the last argument limits the scan to the first 10 events.
0115 
0116 It should be noted that the best solution may just be to run the files from the server, rather than copying them to somewhere else and running them there.