---
title: "Introduction"
teaching: 15
exercises: 5
questions:
- "How do I locate and access the simulation output?"
objectives:
- "Understand how the simulation output is organized"
- "Know how to access the simulation output using Jefferson Lab xrootd"
keypoints:
- "Use `xrdfs` from within eic-shell to browse available files from simulation campaigns."
- "Use `xrdcp` from within eic-shell to copy files to your local environment."
- "Within eic-shell, you can also stream files directly in your ROOT macros."
---

More detailed information on the simulation productions, including the information presented below, can be found on the [Simulation Production Campaign Website](https://eic.github.io/epic-prod/).

## Simulation Files Organization

There are three broad classes of files stored on xrootd, each in its own directory:

- EVGEN: The input HepMC3 datasets
    - E.g. files that have been supplied by a physics event generator
- FULL: The full Geant4 output ROOT files (usually only saved for a fraction of runs)
    - If running a simulation yourself, this would be your output from processing with npsim
- RECO: The output ROOT files from the reconstruction
    - Again, if running yourself, this would be your output from EICrecon (after you've used your awesome new reconstruction algorithm from the later tutorial, of course)

Most users will interact with the files in the RECO directory, and that is what we will focus on in this tutorial. Within the RECO directory, files are organized by campaign (for example, 25.01.1 for the January 2025 campaign), then by detector configuration, and then by physics process. Each physics process has further subdirectories, for example for generator version, energy, or Q2. The directory structure and the number of reconstructed files for each campaign can be found on the [reconstruction campaigns page](https://eic.github.io/epic-prod/campaigns/campaigns_reco.html) of the Simulation Website.
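
To make this layout concrete, the short shell sketch below assembles the RECO directory of the example dataset used later in this episode from its individual levels. The variable names are hypothetical labels chosen for illustration; only the resulting path comes from this tutorial.

```shell
# Hypothetical variable names, one per level of the RECO directory tree.
base="/volatile/eic/EPIC/RECO"  # RECO area on the xrootd server
campaign="25.01.1"              # campaign (January 2025)
detector="epic_craterlake"      # detector configuration
process="DIS/NC"                # physics process (neutral-current DIS)
energy="18x275"                 # energy sub-directory
q2="minQ2=10"                   # Q2 sub-directory

# Join the levels with '/' to get the full directory path.
recodir="${base}/${campaign}/${detector}/${process}/${energy}/${q2}"
echo "${recodir}"
```

Changing one variable, for example `energy`, points to the corresponding directory for another configuration, assuming it exists in that campaign; the campaign pages linked above list what is actually available.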

## Access Simulation from Jefferson Lab xrootd

The preferred method for browsing the simulation output is to use xrootd from within eic-shell. To browse the directory structure and then exit, run:
```console
./eic-shell
xrdfs root://dtn-eic.jlab.org
ls /volatile/eic/EPIC/RECO/25.01.1
exit
```
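
`xrdfs` can also run a single command non-interactively when the command is given after the server URL, as the `list.txt` example later in this episode does. A minimal sketch wrapping that in a shell function follows; the function name `list_reco` is hypothetical, and it assumes you are inside eic-shell, where `xrdfs` is available. For example, `list_reco 25.01.1` lists the same directory as the interactive session above, and `list_reco 25.01.1 epic_craterlake` goes one level deeper.

```shell
# Hypothetical helper: list a directory below the RECO area without entering
# the interactive xrdfs prompt. Each argument is one sub-directory level.
list_reco() {
    local server="root://dtn-eic.jlab.org"
    local reco="/volatile/eic/EPIC/RECO"
    local subdir
    # Join all arguments with '/' (e.g. "25.01.1 epic_craterlake" -> "25.01.1/epic_craterlake")
    subdir="$(IFS=/; echo "$*")"
    xrdfs "$server" ls "${reco}/${subdir}"
}
```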

It is also possible to copy a file and open it locally using the `xrdcp` command:
```console
./eic-shell
xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/path-to-file .
exit
```

Files can also be copied locally by replacing `ls` with `cp`.

> For earlier simulation campaigns, the files are located under `/work/eic2/EPIC` rather than `/volatile/eic/EPIC`.
{: .callout}

## Streaming Files

It is also possible to open a remote file directly in ROOT without copying it first. Start ROOT from within eic-shell and open the file with `TFile::Open()`, which selects the appropriate remote-access plugin for `root://` URLs (the plain `TFile` constructor is meant for local files):
```console
auto f = TFile::Open("root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/path-to-file")
```

## Reminder - Download a file for the next step!

We will need a file to analyse in the rest of this tutorial. If you have not done so already, download one now!

Grab a file from -

```console
/volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/
```
For example -

```console
xrdcp root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_5.0001.eicrecon.tree.edm4eic.root ./
```
Note that the `./` at the end is the target location to copy to. Change this as desired.

> As with a normal `cp` command, we can also specify a different filename to copy to. You might want to do this, as the filename is a little cumbersome.
> I called mine `NC_DIS_18x275_JanCampaign.root`; just replace `./` with your filename of choice.
{: .callout}

You can also stream the file if you prefer; just use the full `root://` URL of the file above instead of a local path. You will need to modify the scripts later in the tutorial accordingly.

## Advanced Use Case - Grabbing a whole bunch of files

I won't go through this in the tutorial, but it may be something you want to come back to as you get deeper into writing and using your own analysis code. This advanced use case involves copying or using a large number of processed files, something you might want to do once your analysis is out of the testing phase and onto the "Let's process ALL of the data!" stage.

If you're moving a lot of files around, you might normally resort to using a wildcard -

```console
cp File* My_Folder/
```

or similar. However, with xrdcp, this isn't so trivial. Some methods to test and try are included below.
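
One possible approach, sketched here under stated assumptions rather than as an official recipe, is to list the remote directory with `xrdfs ... ls`, filter the result with `grep`, and pass each match to `xrdcp`. The function name `copy_matching` and the example pattern in its comments are hypothetical; the pattern is a `grep` regular expression matched against the full remote path. The sketch relies on `xrdfs ls` printing one absolute path per line, as the `list.txt` contents shown in this section suggest.

```shell
# Hypothetical helper: copy every file in a remote directory whose path
# matches a grep pattern. Run from within eic-shell.
#   $1 remote directory, e.g. /volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10
#   $2 grep pattern,     e.g. 'hiDiv_5\.000[0-4]'  (illustrative only)
#   $3 local target directory (optional, defaults to the current directory)
copy_matching() {
    local server="root://dtn-eic.jlab.org"
    local remote_dir="$1"
    local pattern="$2"
    local target="${3:-.}"

    # xrdfs ls prints one absolute path per line; keep only the matching ones
    xrdfs "$server" ls "$remote_dir" | grep -- "$pattern" |
        while IFS= read -r path; do
            # Server URL + absolute path gives root://host//path, the form used in this episode
            xrdcp "${server}/${path}" "$target"
        done
}
```

Files are copied one after another, so a large selection can take a while. It is worth checking the matches first by running only the `xrdfs ... ls` and `grep` part.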

where here we're finding things in the given path that match the name pattern provided, and copying them to our current directory.

Alternatively, you could grab a list of the files you want and write it to a file -

```console
xrdfs root://dtn-eic.jlab.org ls /volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10 | sed 's|^|root://dtn-eic.jlab.org/|g' > list.txt
```

In this case, we're listing all files on the server in that path, piping the list to `sed` to insert `root://dtn-eic.jlab.org/` at the start of each line, and then redirecting the output to the file `list.txt`.
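
To see what the `sed` step does without contacting the server, the sketch below feeds it two paths in the form printed by `xrdfs ls` (taken from the `list.txt` contents shown in this section, minus the prefix). The `^` anchors the substitution at the start of each line, and `|` is used as the `sed` delimiter because the inserted text contains `/`. Since `^` can match only once per line, the trailing `g` flag in the command above makes no difference, so it is omitted here.

```shell
# Two example paths as printed by xrdfs ls (one absolute path per line)
paths='/volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0000.eicrecon.tree.edm4eic.root
/volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0001.eicrecon.tree.edm4eic.root'

# Insert the server URL at the start of every line
urls="$(printf '%s\n' "$paths" | sed 's|^|root://dtn-eic.jlab.org/|')"
printf '%s\n' "$urls"
```

The two printed lines are the same as the first two entries of `list.txt`.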

```console
more list.txt
root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0000.eicrecon.tree.edm4eic.root
root://dtn-eic.jlab.org//volatile/eic/EPIC/RECO/25.01.1/epic_craterlake/DIS/NC/18x275/minQ2=10/pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0001.eicrecon.tree.edm4eic.root
...
```
We could then, for example, feed this list to a `TChain` in ROOT -

```console
// At the ROOT prompt, started from within eic-shell
TChain events("events");
std::ifstream in("list.txt");
std::string file;
while (in >> file) events.Add(file.c_str());
events.Scan("@MCParticles.size()", "", "", 10);
```
In the final line, we only skim over the first 10 events.

It should be noted that the best solution may simply be to stream the files directly from the server, rather than copying them somewhere else and running over them there.