tutorial-analysis/_episodes/03-analysis.md

0001 ---
0002 title: "Analyzing the Reconstruction Output"
0003 teaching: 20
0004 exercises: 40
0005 questions:
0006 - "How does one utilize the reconstruction output trees to do an analysis?"
0007 objectives:
0008 - "Become familiar with methods for reading the trees"
0009 - "Understand how to access truth/particle information"
0010 - "Find track efficiency and resolution"
0011 keypoints:
0012 - "Flat tree structure provides flexibility in analysis."
0013 - "The ReconstructedChargedParticles branch holds information on reconstructed tracks."
0014 ---
0015
0016 So far, we have only looked at (and plotted) some information from our file interactively. This is very useful and can help us identify the variables we want to deal with. However, we can't really use these techniques to conduct a full analysis of the data. To do so, we typically use a script or macro. In this part of the tutorial, we will create a script that we can use to do a relatively straightforward analysis of our file.
0017
0018 > Note:
0019 > - The [branch dictionary]({{ page.root }}{% link _extras/branch_dictionary.md %}) outlines all of the branches we will need to utilise in this section.
0020 > - If you want, you can prune the branches you don't need from the input file using the [TreePrune.C script]({{ page.root }}{% link _extras/tree_pruning_script.md %}).
0021 {: .callout}
0022
0023 ## Reading the Output Trees
0024
0025 The simulation output trees are "flat" in the sense that there is no event class structure embedded within the tree and no additional libraries are needed to handle the output. Therefore, the end user can simply read the values stored in each branch using whatever method/workflow they are most comfortable with. Examples of several common methods for reading the trees are provided below. We will see a ROOT TTreeReader based example using a ROOT macro and a python/uproot based version. There is also an example using the (relatively) new RDataFrame class of ROOT. During the tutorial, you should try the exercise using whichever language you feel most comfortable with.
0026
0027 ## Sample Analysis with ROOT TTreeReader: Track Efficiency and Resolution
0028
0029 As a sample exercise to become familiar with the simulation output and how to use it in a realistic setting, we will find the tracking eficiency and resolution. We will need to access the reconstructed track information and the truth particle information and we will have to associate the individual tracks and particles to one another.
0030
0031 Before we begin, we should create a skeleton macro to handle file I/O. For the `TTreeReader` example, we will use a simple ROOT macro. Using your favorite text editor, create a file with a name like `trackAnalysis.C` or something similar and copy in the following code:
0032
0033 ```c++
0034 void trackAnalysis(TString infile="path_to_your_simu_file")
0035   {
0036     // Set output file for the histograms
0037     TFile *ofile = TFile::Open("out.hist.root","RECREATE");
0038
0039     // Analysis code will go here
0040
0041     ofile->Write(); // Write histograms to file
0042     ofile->Close(); // Close output file
0043   }
0044 ```
0045
0046 We will need momentum, generator status, and particle species information for the truth particles and momentum information for the reconstructed tracks. The reconstructed track information can be accessed from two different branches: CentralCKFTrackParameters and ReconstructedChargedParticles. We can access these branches using a TTreeReaderArray.
0047
0048 > ROOT TTreeReaderArrays:
0049 >
0050 >TTreeReader and the associated TTreeReaderArray is a simple interface for reading data from a TTree. The class description and examples can be seen [here](https://root.cern/doc/v630/classTTreeReader.html). To instantiate the reader and access values from a given branch (e.g. the MCParticles branch), one would use the following calls:
0051 >
0052 > ```c++
0053 > // Set up input file chain
0054 > TChain *mychain = new TChain("events");
0055 > mychain->Add(infile);
0056 >
0057 > // Initialize reader
0058 > TTreeReader tree_reader(mychain);
0059 >
0060 > // Access whatever data-members you need
0061 > TTreeReaderArray<int> partGenStat(tree_reader, "MCParticles.generatorStatus");
0062 > TTreeReaderArray<float> partMomX(tree_reader, "MCParticles.momentum.x");
0063 > ...
0064 > ```
0065 >
0066 > The branches and their members can be viewed by opening a file with TBrowser (`new TBrowser()`) from within ROOT. Once you have defined the `TTreeReaderArray` objects for the data-members you want to look at, you can loop over the events and the members within that event:
0067 >
0068 > ```c++
0069 > while(tree_reader.Next()) { // Loop over events
0070 >  for(unsigned int i=0; i<partGenStat.GetSize(); i++) // Loop through particles in the event
0071 >    {
0072 >      int particleStatus = partGenStat[i]; // Access data-members as you would an array
0073 >      float particleXMomentum = partMomX[i]; // partMomX should have same number of entries as partGenStat because they are in the same branch
0074 >      ...
0075 >    }
0076 > }
0077 > ```
0078 > All members of the same branch should have the same number of entries, so it is sufficient to use any member of the branch to set the limit of your loop.
0079 {: .callout}
0080
0081 We will proceed using the ReconstructedChargedParticles branch as this will give us a chance to practice using associations, copy the following lines into your analysis macro.
0082
0083 ```c++
0084 // Set up input file chain
0085 TChain *mychain = new TChain("events");
0086 mychain->Add(infile);
0087
0088 // Initialize reader
0089 TTreeReader tree_reader(mychain);
0090
0091 // Get Particle Information
0092 TTreeReaderArray<int> partGenStat(tree_reader, "MCParticles.generatorStatus");
0093 TTreeReaderArray<float> partMomX(tree_reader, "MCParticles.momentum.x");
0094 TTreeReaderArray<float> partMomY(tree_reader, "MCParticles.momentum.y");
0095 TTreeReaderArray<float> partMomZ(tree_reader, "MCParticles.momentum.z");
0096 TTreeReaderArray<int> partPdg(tree_reader, "MCParticles.PDG");
0097
0098 // Get Reconstructed Track Information
0099 TTreeReaderArray<float> trackMomX(tree_reader, "ReconstructedChargedParticles.momentum.x");
0100 TTreeReaderArray<float> trackMomY(tree_reader, "ReconstructedChargedParticles.momentum.y");
0101 TTreeReaderArray<float> trackMomZ(tree_reader, "ReconstructedChargedParticles.momentum.z");
0102
0103 // Get Associations Between MCParticles and ReconstructedChargedParticles
0104 TTreeReaderArray<unsigned int> recoAssoc(tree_reader, "ReconstructedChargedParticleAssociations.recID");
0105 TTreeReaderArray<unsigned int> simuAssoc(tree_reader, "ReconstructedChargedParticleAssociations.simID");
0106 ```
0107
0108 The last two lines encode the association between a ReconstructedChargedParticle and a MCParticle where the matching is determined in the [ParticlesWithPID](https://github.com/eic/EICrecon/blob/main/src/algorithms/pid/ParticlesWithPID.cc) algorithm which generates the ReconstructedChargedParticle objects.
0109
0110 > Compiling ROOT Macros:
0111 > - If you are analysing a large number of events, you may wish to compile your macro to increase throughput. An example of how you can create and compile a root macro is included in the [Exercise Scripts section](https://eic.github.io/tutorial-analysis/exercise_scripts/index.html#compiled-root-scripts)
0112 {: .callout}
0113
0114 ### Efficiency Analysis
0115
0116 > Hint:
0117 > Refer to [the script template](https://eic.github.io/tutorial-analysis/exercise_scripts/index.html#efficiencyanalysisc) if you're having trouble putting things in the right place.
0118 {: .callout}
0119
0120 Now that we have access to the data we need we will begin constructing our efficiency plots, starting with efficiency as a function of the true particle pseudorapidity. The basic strategy is outlined below:
0121
0122 1. Loop over all events in the file
0123 2. Within each event, loop over all stable charged particles
0124 3. Identify the ReconstructedChargedParticle (if any) associated with the truth particle we are looking at
0125 4. Create and fill the necessary histograms
0126
0127 Here is the code to implement these steps:
0128
0129 ```c++
0130 // Define Histograms
0131 TH1D *partEta = new TH1D("partEta","Eta of Thrown Charged Particles;Eta",100,-5.,5.);
0132 TH1D *matchedPartEta = new TH1D("matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track",100,-5.,5.);
0133
0134 TH1D *matchedPartTrackDeltaR = new TH1D("matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charged Particle",5000,0.,5.);
0135
0136 while(tree_reader.Next()) { // Loop over events
0137
0138   for(unsigned int i=0; i<partGenStat.GetSize(); i++) // Loop over thrown particles
0139     {
0140             if(partGenStat[i] == 1) // Select stable thrown particles
0141               {
0142                 int pdg = TMath::Abs(partPdg[i]);
0143
0144                 if(pdg == 11 || pdg == 13 || pdg == 211 || pdg == 321 || pdg == 2212) // Look at charged particles (electrons, muons, pions, kaons, protons)
0145                   {
0146                           TVector3 trueMom(partMomX[i],partMomY[i],partMomZ[i]);
0147
0148                           float trueEta = trueMom.PseudoRapidity();
0149                           float truePhi = trueMom.Phi();
0150
0151                           partEta->Fill(trueEta);
0152
0153               // Loop over associations to find matching ReconstructedChargedParticle
0154                           for(unsigned int j=0; j<simuAssoc.GetSize(); j++)
0155                             {
0156                               if(simuAssoc[j] == i) // Find association index matching the index of the thrown particle we are looking at
0157                                 {
0158                                         TVector3 recMom(trackMomX[recoAssoc[j]],trackMomY[recoAssoc[j]],trackMomZ[recoAssoc[j]]); // recoAssoc[j] is the index of the matched ReconstructedChargedParticle
0159
0160                       // Check the distance between the thrown and reconstructed particle
0161                                         float deltaEta = trueEta - recMom.PseudoRapidity();
0162                                         float deltaPhi = TVector2::Phi_mpi_pi(truePhi - recMom.Phi());
0163                                         float deltaR = TMath::Sqrt(deltaEta*deltaEta + deltaPhi*deltaPhi);
0164
0165                                         matchedPartTrackDeltaR->Fill(deltaR);
0166
0167                                         matchedPartEta->Fill(trueEta); // Plot the thrown eta if a matched ReconstructedChargedParticle was found
0168                     }
0169                 }
0170             }
0171         }
0172     }
0173 }
0174 ```
0175
0176 We should now have everything we need to find the track efficiency as a function of pseudorapidity. To run the macro and produce an output file containing the histograms we defined, simply type `root -l -q trackAnalysis.C`. After the macro runs, you can open the root file to inspect the histograms. The efficiency can be found by taking the ratio of matchedPartEta over partEta.
0177
0178 > Question:
0179 > - Do the histogram ranges make sense?
0180 > - We plot the distance between thrown and reconstructed charged partices, does this distribution look reasonable?
0181 > - When filling the matchedPartEta histogram (the numerator in our efficiency), why do we use again the true thrown eta instead of the associated reconstructed eta?
0182 {: .callout}
0183
0184 > Exercise:
0185 > - Find the efficiency as a function of particle momentum. Are there cuts on any other quantities you should place to get a sensible result?
0186 > - Find the efficiency for some 2-D correlations: momentum vs eta; phi vs eta
0187 > - Plot some kinematic distributions (momentum, eta, etc) for all ReconstructedChargedParticles, not just those that are associated with a thrown particle
0188 {: .challenge}
0189
0190 ### Resolution Analysis
0191
0192 > Hint:
0193 > Refer to [the script template](https://eic.github.io/tutorial-analysis/exercise_scripts/index.html#resolutionanalysisc) if you're having trouble putting things in the right place.
0194 {: .callout}
0195
0196 Next, we will look at track momentum resolution, that is, how well the momentum of the reconstructed track matches that of the thrown particle. We should have all of the "infrastructure" we need in place to do the analysis, we just need to define the appropriate quantities and make the histograms. It only makes sense to define the resolution for tracks and particles which are associated with one another, so we will work within the loop over associations. Define the resolution expression and fill a simple histogram:
0197
0198 ```c++
0199 TH1D *trackMomentumRes = new TH1D("trackMomentumRes","Track Momentum Resolution",2000,-10.,10.);
0200 ...
0201 // Loop over associations to find matching ReconstructedChargedParticle
0202 for(unsigned int j=0; j<simuAssoc.GetSize(); j++)
0203   {
0204     if(simuAssoc[j] == i) // Find association index matching the index of the thrown particle we are looking at
0205       {
0206         ...
0207         double momRes = (recMom.Mag() - trueMom.Mag())/trueMom.Mag();
0208
0209         trackMomentumRes->Fill(momRes);
0210       }
0211   }
0212 ```
0213
0214 While this plot will give us a sense of what the tracking resolution is, we don't expect the resolution to be constant for all momenta or eta. We can get a more complete picture by plotting the resolution as a function of different kinematic quantities.
0215
0216 > Exercise:
0217 > - Make 2-D plots of resolution vs true momentum and vs true pseudorapidity.
0218 > - Break resolution plots down by particle species.
0219 {: .challenge}
0220
0221 > Question:
0222 > - Will the histogram ranges for each particle species be the same?
0223 > - Could we present the resolution values in a more understandable way?
0224 {: .callout}
0225
0226 ## Sample Analysis with Python/uproot: Track Efficiency and Resolution
0227
0228 > Comment:
0229 > Despite using python/uproot, I have written these in a very "ROOT"/C way. Uproot converts our branches to arrays, so you can manipulate them in various fun ways using more pythonic methods if you want.
0230 > Shujie Li utilises a more pythonic approach in the [Understanding the Simulation Output](https://eic.github.io/tutorial-understanding-sim-output/) tutorial. You can see some examples of this in the [Jupyter Notebook](https://colab.research.google.com/drive/1Wn9guq1aIJ8RUW36HHTkeR7-iPPcoBOw?usp=sharing) Shujie set up for that tutorial.
0231 {: .callout}
0232
0233 If you are more familiar with python than you are with C/C++, you might find that using a python based root macro is easier for you. Outlined below are sample blocks of code for creating and running a python based analysis script.
0234
0235 With python, some tasks become easier, e.g. string manipulation and writing to (non ROOT) files.
0236
0237 Before we begin, we should create a skeleton macro to handle file I/O. For this example, we will make a simple python script. Using your favorite editor, create a file with a name like `trackAnalysis.py` or something similar and copy in the following code:
0238
0239 ```python
0240 #! /usr/bin/python
0241
0242 #Import relevant packages
0243 import ROOT, math, array
0244 from ROOT import TH1F, TH2F, TMath, TTree, TVector3, TVector2
0245 import uproot as up
0246
0247 # Define and open files
0248 infile="PATH_TO_FILE"
0249 ofile=ROOT.TFile.Open("TrackAnalysis_OutPy.root", "RECREATE")
0250
0251 # Open input file and define branches we want to look at with uproot
0252 events_tree = up.open(infile)["events"]
0253
0254 # Define histograms below
0255
0256 # Add main analysis loop(s) below
0257
0258 # Write output histograms to file below
0259
0260 # Close files
0261 ofile.Close()
0262 ```
0263 Note that we are using the module uproot to access the data here. See [further documentation here](https://masonproffitt.github.io/uproot-tutorial/03-trees/index.html). You may also need some of the other included packages too.
0264
0265 > We will use uproot a little bit like we use the TTreeReader in the other example. We can define the branches we want and assign them to arrays with uproot.
0266 > We can do this via:
0267 >  ```python
0268 > # Open input file and define branches we want to look at with uproot
0269 > events_tree = up.open(infile)["events"]
0270 > # Get particle information# Get particle information
0271 > partGenStat = events_tree["MCParticles.generatorStatus"].array()
0272 > partMomX = events_tree["MCP articles.momentum.x"].array()
0273 > partMomY = events_tree["MCParticles.momentum.y"].array()
0274 > partMomZ = events_tree["MCParticles.momentum.z"].array()
0275 > partPdg = events_tree["MCParticles.PDG"].array()
0276 >
0277 > # Get reconstructed track information
0278 > trackMomX = events_tree["ReconstructedChargedParticles.momentum.x"].array()
0279 > trackMomY = events_tree["ReconstructedChargedParticles.momentum.y"].array()
0280 > trackMomZ = events_tree["ReconstructedChargedParticles.momentum.z"].array()
0281 >  ...
0282 > ```
0283 >  We can then access them as an array in a loop -
0284 > ```python
0285 > # Add main analysis loop(s) below
0286 > for i in range(0, len(partGenStat)): # Loop over all events
0287 >     for j in range(0, len(partGenStat[i])): # Loop over all thrown particles
0288 >         if partGenStat[i][j] == 1: # Select stable particles
0289 >             pdg = abs(partPdg[i][j]) # Get PDG for each stable particle
0290 >             ...
0291 > ```
0292 > Uproot effectively takes the information in the tree, and turns it into an array. We can then acces and manipulate this array in the same way that we can with any array in python.
0293 >
0294 > Note that if you are using an older version of uproot (v2.x.x), you will need to access the branches slightly differently via -
0295 > ```python
0296 > partGenStat = events_tree.array("MCParticles.generatorStatus")
0297 > ```
0298 {: .callout}
0299
0300 You can run this file with ``python3 trackAnalysis.py``. It should open your file and create an empty output root file as specified. We will add histograms to this script and fill them in the next step.
0301
0302 Note that depending upon your setup, ``python trackAnalysis.py`` may work too.
0303
0304 ### Efficiency Analysis
0305
0306 > Hint:
0307 > Refer to [the script template](https://eic.github.io/tutorial-analysis/exercise_scripts/index.html#efficiencyanalysispy) if you're having trouble putting things in the right place.
0308 {: .callout}
0309
0310 As with the ROOT TTreeReader example, we will find the tracking eficiency and resolution. We will need to access the reconstructed track information and the truth particle information and we will have to associate the individual tracks and particles to one another.
0311
0312 The basic strategy is the same:
0313
0314 1. Loop over all events in the file
0315 2. Within each event, loop over all stable charged particles
0316 3. Identify the ReconstructedChargedParticle (if any) associated with the truth particle we are looking at
0317 4. Create and fill the necessary histograms
0318
0319 Here is the sample code to implement these steps:
0320
0321 ```python
0322 # Get assocations between MCParticles and ReconstructedChargedParticles
0323 recoAssoc = events_tree["ReconstructedChargedParticleAssociations.recID"].array()
0324 simuAssoc = events_tree["ReconstructedChargedParticleAssociations.simID"].array()
0325
0326 # Define histograms below
0327 partEta = ROOT.TH1D("partEta","Eta of Thrown Charged Particles;Eta",100, -5 ,5 )
0328 matchedPartEta = ROOT.TH1D("matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track", 100, -5 ,5);
0329 matchedPartTrackDeltaR = ROOT.TH1D("matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charge Particle", 5000, 0, 5);
0330
0331 # Add main analysis loop(s) below
0332 for i in range(0, len(partGenStat)): # Loop over all events
0333     for j in range(0, len(partGenStat[i])): # Loop over all thrown particles
0334         if partGenStat[i][j] == 1: # Select stable particles
0335             pdg = abs(partPdg[i][j]) # Get PDG for each stable particle
0336             if(pdg == 11 or pdg == 13 or pdg == 211 or pdg == 321 or pdg == 2212):
0337                 trueMom = ROOT.TVector3(partMomX[i][j], partMomY[i][j], partMomZ[i][j])
0338                 trueEta = trueMom.PseudoRapidity()
0339                 truePhi = trueMom.Phi()
0340                 partEta.Fill(trueEta)
0341                 for k in range(0,len(simuAssoc[i])): # Loop over associations to find matching ReconstructedChargedParticle
0342                     if (simuAssoc[i][k] == j):
0343                         recMom = ROOT.TVector3(trackMomX[i][recoAssoc[i][k]], trackMomY[i][recoAssoc[i][k]], trackMomZ[i][recoAssoc[i][k]])
0344                         deltaEta = trueEta - recMom.PseudoRapidity()
0345                         deltaPhi = TVector2. Phi_mpi_pi(truePhi - recMom.Phi())
0346                         deltaR = math.sqrt((deltaEta*deltaEta) + (deltaPhi*deltaPhi))
0347                         matchedPartEta.Fill(trueEta)
0348                         matchedPartTrackDeltaR.Fill(deltaR)
0349
0350 # Write output histograms to file below
0351 partEta.Write()
0352 matchedPartEta.Write()
0353 matchedPartTrackDeltaR.Write()
0354
0355 # Close files
0356 ofile.Close()
0357 ```
0358 Insert this block of code appropriately. We should now have everything we need to find the track efficiency as a function of pseudorapidity. Run the script with `python3 trackAnalysis.py``. This should produce a root file with a few histograms in place. The efficiency can be found by taking the ratio of matchedPartEta over partEta.
0359
0360 > Question:
0361 > - Do the hisotgram ranges make sense?
0362 > - We plot the distance between thrown and reconstructed charged partices, does this distribution look reasonable?
0363 > - When filling the matchedPartEta histogram (the numerator in our efficiency), why do we use again the true thrown eta instead of the associated reconstructed eta?
0364 {: .callout}
0365
0366 > Exercise:
0367 > - Find the efficiency as a function of particle momentum. Are there cuts on any other quantities you should place to get a sensible result?
0368 > - Find the efficiency for some 2-D correlations: momentum vs eta; phi vs eta
0369 > - Plot some kinematic distributions (momentum, eta, etc) for all ReconstructedChargedParticles, not just those that are associated with a thrown particle
0370 {: .challenge}
0371
0372 ### Resolution Analysis
0373
0374 > Hint:
0375 > Refer to [the script template](https://eic.github.io/tutorial-analysis/exercise_scripts/index.html#resolutionanalysispy) if you're having trouble putting things in the right place.
0376 {: .callout}
0377
0378 Next, we will look at track momentum resolution, that is, how well the momentum of the reconstructed track matches that of the thrown particle. We should have all of the "infrastructure" we need in place to do the analysis, we just need to define the appropriate quantities and make the histograms. It only makes sense to define the resolution for tracks and particles which are associated with one another, so we will work within the loop over associations. Define the resolution expression and fill a simple histogram by inserting this block of code appropriately:
0379
0380 ```python
0381 trackMomentumRes = ROOT.TH1D("trackMomentumRes","Track Momentum Resolution",2000,-10.,10.);
0382 ...
0383                 for k in range(0,len(simuAssoc[i])): # Loop over associations to find matching ReconstructedChargedParticle
0384                     if (simuAssoc[i][k] == j):
0385                         recMom = ROOT.TVector3(trackMomX[i][recoAssoc[i][k]], trackMomY[i][recoAssoc[i][k]], trackMomZ[i][recoAssoc[i][k]])
0386                         momRes = (recMom.Mag() - trueMom.Mag())/trueMom.Mag()
0387
0388                         trackMomentumRes.Fill(momRes)
0389 ```
0390
0391 Remember to write this histogram to the output file too! While this plot will give us a sense of what the tracking resolution is, we don't expect the resolution to be constant for all momenta or eta. We can get a more complete picture by plotting the resolution as a function of different kinematic quantities.
0392
0393 > Exercise:
0394 > - Make 2-D plots of resolution vs true momentum and vs true pseudorapidity.
0395 > - Break resolution plots down by particle species.
0396 {: .challenge}
0397
0398 > Question:
0399 > - Will the histogram ranges for each particle species be the same?
0400 > - Could we present the resolution values in a more understandable way?
0401 {: .callout}
0402
0403 ## ROOT RDataFrames
0404
0405 > Note:
0406 > - This method does actually need you to be within eic-shell (or somewhere else with the correct EDM4hep/EDM4eic libraries installed).
0407 {: .callout}
0408
0409 Newer versions of root, such as the version in eic-shell, have access to a relatively new class, [RDataFrames](https://root.cern/doc/master/classROOT_1_1RDataFrame.html). These are similar to pythonic data frame style strucutres that you may be familiar with. Some people are moving towards utilising RDataFrames in their analysis. If you are more familiar with working with data frames, you may wish to investigate these further.
0410
0411 Included below is a quick script from [Simon Gardner](https://github.com/simonge/EIC_Analysis/blob/main/Analysis-Tutorial/EfficiencyAnalysisRDF.C) that utilises RDataFrames to analyse a data file. Copy the following into a new file called `EfficiencyAnalysisRDF.C` -
0412
0413 ```c++
0414 #include <edm4hep/utils/vector_utils.h>
0415 #include <edm4hep/MCParticle.h>
0416 #include <edm4eic/ReconstructedParticle.h>
0417 #include <ROOT/RDataFrame.hxx>
0418 #include <ROOT/RVec.hxx>
0419 #include <TFile.h>
0420
0421 // Define aliases for the data types
0422 using MCP = edm4hep::MCParticleData;
0423 using RecoP = edm4eic::ReconstructedParticleData;
0424
0425 // Define function to vectorize the edm4hep::utils methods
0426 template <typename T>
0427 auto getEta = [](ROOT::VecOps::RVec<T> momenta) {
0428   return ROOT::VecOps::Map(momenta, [](const T& p) { return edm4hep::utils::eta(p.momentum); });
0429 };
0430
0431 template <typename T>
0432 auto getPhi = [](ROOT::VecOps::RVec<T> momenta) {
0433   return ROOT::VecOps::Map(momenta, [](const T& p) { return edm4hep::utils::angleAzimuthal(p.momentum); });
0434 };
0435
0436 // Define the function to perform the efficiency analysis
0437 void EfficiencyAnalysisRDF(TString infile="PATH_TO_FILE"){
0438
0439   // Set up input file
0440   ROOT::RDataFrame df("events", infile);
0441
0442   // Define new dataframe node with additional columns
0443   auto df1 =  df.Define("statusFilter",  "MCParticles.generatorStatus == 1"    )
0444                 .Define("absPDG",        "abs(MCParticles.PDG)"                )
0445                 .Define("pdgFilter",     "absPDG == 11 || absPDG == 13 || absPDG == 211 || absPDG == 321 || absPDG == 2212")
0446                 .Define("particleFilter","statusFilter && pdgFilter"           )
0447                 .Define("filtMCParts",   "MCParticles[particleFilter]"         )
0448                 .Define("assoFilter",    "Take(particleFilter,ReconstructedChargedParticleAssociations.simID)") // Incase any of the associated particles happen to not be charged
0449                 .Define("assoMCParts",   "Take(MCParticles,ReconstructedChargedParticleAssociations.simID)[assoFilter]")
0450                 .Define("assoRecParts",  "Take(ReconstructedChargedParticles,ReconstructedChargedParticleAssociations.recID)[assoFilter]")
0451                 .Define("filtMCEta",     getEta<MCP>   , {"filtMCParts"} )
0452                 .Define("filtMCPhi",     getPhi<MCP>   , {"filtMCParts"} )
0453                 .Define("accoMCEta",     getEta<MCP>   , {"assoMCParts"} )
0454                 .Define("accoMCPhi",     getPhi<MCP>   , {"assoMCParts"} )
0455                 .Define("assoRecEta",    getEta<RecoP> , {"assoRecParts"})
0456                 .Define("assoRecPhi",    getPhi<RecoP> , {"assoRecParts"})
0457                 .Define("deltaR",        "ROOT::VecOps::DeltaR(assoRecEta, accoMCEta, assoRecPhi, accoMCPhi)");
0458
0459   // Define histograms
0460   auto partEta                = df1.Histo1D({"partEta","Eta of Thrown Charged Particles;Eta",100,-5.,5.},"filtMCEta");
0461   auto matchedPartEta         = df1.Histo1D({"matchedPartEta","Eta of Thrown Charged Particles That Have Matching Track",100,-5.,5.},"accoMCEta");
0462   auto matchedPartTrackDeltaR = df1.Histo1D({"matchedPartTrackDeltaR","Delta R Between Matching Thrown and Reconstructed Charged Particle",5000,0.,5.},"deltaR");
0463
0464   // Write histograms to file
0465   TFile *ofile = TFile::Open("EfficiencyAnalysis_Out_RDF.root","RECREATE");
0466
0467   // Booked Define and Histo1D lazy actions are only performed here
0468   partEta->Write();
0469   matchedPartEta->Write();
0470   matchedPartTrackDeltaR->Write();
0471
0472   ofile->Close(); // Close output file
0473 }
0474 ```
0475 > Note:
0476 > - You will need to run this script with the command `root -l -q EfficiencyAnalysisRDF.C++`, within eic-shell (or somewhere else with the correct EDM4hep/EDM4eic libraries installed).
0477 > - Remember to put in the correct file path.
0478 {: .callout}
0479
0480 If you like, you can try completing the exercises using this example to start from.
0481
0482 ## PODIO - Direct Analysis
0483
0484 If you want to avoid ROOT entirely, you can analyse the PODIO files directly in a variety of ways.
0485
0486 See [Wouter's example use cases](https://indico.cern.ch/event/1343984/contributions/5908856/attachments/2842958/4970156/2024-04-23%20-%20Examples%20for%20Data%20Model%20Usage.pdf) from 23/04/24. Wouter shows a few ways in which the PODIO file can be accssed and analysed directly.