{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2a3842f0",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Getting Started"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7fa16dd",
   "metadata": {},
   "source": [
    "The ePIC collaboration provides full-simulation data files in the ROOT format through the XRootD service at Jefferson Lab. Because XRootD streams the data over the network, you can analyze these files without downloading them first."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c372456b",
   "metadata": {},
   "source": [
    "In this notebook we show how to open a file from the XRootD service using the [uproot](https://pypi.org/project/uproot/) Python library. Working in Python with uproot allows seamless interfacing with many data-science and machine-learning tools."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd87e3d7",
   "metadata": {},
   "source": [
    "## Importing uproot"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "808f3de2",
   "metadata": {},
   "source": [
    "Depending on the versions of uproot and XRootD that you have installed, you may encounter a warning from uproot in the cells below. Because the ePIC ROOT files use a simple data format, this warning can safely be ignored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "crazy-lambda",
   "metadata": {},
   "outputs": [],
   "source": [
    "import uproot as ur\n",
    "print('Uproot version: ' + ur.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02a85d93",
   "metadata": {},
   "source": [
    "## Opening a file with uproot"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54f3e386",
   "metadata": {},
   "source": [
    "To test uproot, we open a sample file: a single reconstructed neutral-current (NC) deep-inelastic scattering (DIS) output file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "defensive-dressing",
   "metadata": {},
   "outputs": [],
   "source": [
    "# XRootD server at Jefferson Lab, directory of the simulation campaign, and file name\n",
    "server = 'root://dtn-eic.jlab.org//work/eic2/'\n",
    "directory = 'EPIC/RECO/23.06.1/epic_brycecanyon/DIS/NC/18x275/minQ2=10/'\n",
    "filename = 'pythia8NCDIS_18x275_minQ2=10_beamEffects_xAngle=-0.025_hiDiv_1.0000.eicrecon.tree.edm4eic.root'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "wicked-amsterdam",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The ':events' suffix selects the 'events' TTree inside the ROOT file\n",
    "events = ur.open(server + directory + filename + ':events')"
   ]
  },
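  {
   "cell_type": "markdown",
   "id": "check-num-entries-md",
   "metadata": {},
   "source": [
    "A quick sanity check after opening the tree is to ask how many entries (events) it contains. The sketch below uses the `num_entries` property of uproot's `TTree`. This number is stored in the tree's metadata, so no branch data is read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "check-num-entries",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Number of entries (events) in the 'events' tree; no branch data is read\n",
    "print('Number of events:', events.num_entries)"
   ]
  },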
  {
   "cell_type": "markdown",
   "id": "ab722977",
   "metadata": {},
   "source": [
    "## Exploring the file contents"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88d449e9",
   "metadata": {},
   "source": [
    "We can now look inside the `events` tree. Its `keys()` method lists the names of all its branches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "439355a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "events.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21ee8c79",
   "metadata": {},
   "source": [
    "That is a lot of branches!\n",
    "\n",
    "Usually we are interested in only a few of them. Let's look at the `ReconstructedChargedParticles` branches, which hold the charged particles reconstructed by the tracking algorithms. Passing a wildcard pattern selects only the matching branch names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c527b91",
   "metadata": {},
   "outputs": [],
   "source": [
    "events.keys(filter_name='ReconstructedChargedParticles.*')"
   ]
  },
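  {
   "cell_type": "markdown",
   "id": "show-branches-md",
   "metadata": {},
   "source": [
    "Branch names alone do not say what each branch holds. As a sketch, uproot's `show` method prints a table with the name, C++ type and uproot interpretation of each branch. Here it is restricted with the same wildcard pattern."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "show-branches",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the name, C++ type and interpretation of each matching branch\n",
    "events.show(filter_name='ReconstructedChargedParticles.*')"
   ]
  },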
  {
   "cell_type": "markdown",
   "id": "970fb7e1",
   "metadata": {},
   "source": [
    "## Making a simple plot"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62ee43fd",
   "metadata": {},
   "source": [
    "Of course, we came here to create plots, not just to look at branches. Uproot returns the data stored in branches as arrays (Awkward Arrays by default, or NumPy arrays on request). From there, we can use `matplotlib` to create a histogram. Let's do this with the energy of the reconstructed charged particles, starting by reading all `ReconstructedChargedParticles` branches into memory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3e18a64",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read every sub-branch of ReconstructedChargedParticles, for all events, into an Awkward Array\n",
    "reconstructed_charged_particles = events['ReconstructedChargedParticles'].arrays()"
   ]
  },
0165   },
0166   {
0167    "cell_type": "markdown",
0168    "id": "b1b4f7e0",
0169    "metadata": {},
0170    "source": [
0171     "If you are running this on a Jupyter instance that displays the memory use, then you will see that the previous step corresponds to an increase in memory use. This will be important to keep in mind. Since you are accessing files that are (in some cases) several GBs large, you will likely want to avoid reading all arrays from an entire file in memory, even on regular servers."
0172    ]
0173   },
0174   {
0175    "cell_type": "markdown",
0176    "id": "592a9a99-1e33-4c50-ae39-c8fe7bad1e3c",
0177    "metadata": {},
0178    "source": [
0179     "Let's start by taking a look at the `energy` variables in the array we just obtained."
0180    ]
0181   },
0182   {
0183    "cell_type": "code",
0184    "execution_count": null,
0185    "id": "be18cd70-57de-4080-a40c-e143d3e5c9b8",
0186    "metadata": {
0187     "tags": []
0188    },
0189    "outputs": [],
0190    "source": [
0191     "reconstructed_charged_particles['ReconstructedChargedParticles.energy']"
0192    ]
0193   },
0194   {
0195    "cell_type": "markdown",
0196    "id": "2eaf3638-0f29-47cc-b568-84f369b38971",
0197    "metadata": {},
0198    "source": [
0199     "As is very common in nuclear and high energy physics, these are not 'regular' numpy array, as indicated by the `var` in the dimension. This is because there are a varying number of reconstructed particles per event. We use a package `awkward` to deal with these 'awkward' arrays. In particular, we can 'regularize' these arrays using a `flatten` operation."
0200    ]
0201   },
0202   {
0203    "cell_type": "code",
0204    "execution_count": null,
0205    "id": "e7e9575a",
0206    "metadata": {
0207     "tags": []
0208    },
0209    "outputs": [],
0210    "source": [
0211     "import numpy as np\n",
0212     "import awkward as ak\n",
0213     "import matplotlib.pyplot as plt"
0214    ]
0215   },
0216   {
0217    "cell_type": "code",
0218    "execution_count": null,
0219    "id": "f5a8e4ee-d019-464a-ae35-8dc00a3584d0",
0220    "metadata": {},
0221    "outputs": [],
0222    "source": [
0223     "ak.flatten(reconstructed_charged_particles['ReconstructedChargedParticles.energy'])"
0224    ]
0225   },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c169aec1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Histogram the energies of all reconstructed charged particles in 50 bins of 1 GeV\n",
    "plt.hist(ak.flatten(reconstructed_charged_particles['ReconstructedChargedParticles.energy']), range=(0, 50), bins=50)\n",
    "plt.xlabel('Energy [GeV]')\n",
    "# Each entry is a particle (the array was flattened), not an event\n",
    "plt.ylabel('Charged particles / GeV')\n",
    "plt.yscale('log')\n",
    "plt.show()"
   ]
  },
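  {
   "cell_type": "markdown",
   "id": "momentum-plot-md",
   "metadata": {},
   "source": [
    "The same recipe works for other quantities. As a sketch, the momentum magnitude can be computed from the three momentum components and histogrammed in the same way. This assumes the components are stored in the branches `ReconstructedChargedParticles.momentum.x`, `.momentum.y` and `.momentum.z`. Check these names against the branch list printed earlier. The plotting range is an arbitrary choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "momentum-plot",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import awkward as ak\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Assumed branch names for the momentum components; verify them with events.keys()\n",
    "px = reconstructed_charged_particles['ReconstructedChargedParticles.momentum.x']\n",
    "py = reconstructed_charged_particles['ReconstructedChargedParticles.momentum.y']\n",
    "pz = reconstructed_charged_particles['ReconstructedChargedParticles.momentum.z']\n",
    "\n",
    "# NumPy ufuncs act element-wise on the jagged arrays, keeping the per-event structure\n",
    "p = np.sqrt(px**2 + py**2 + pz**2)\n",
    "\n",
    "plt.hist(ak.flatten(p), range=(0, 50), bins=50)\n",
    "plt.xlabel('Momentum [GeV]')\n",
    "plt.ylabel('Charged particles / GeV')\n",
    "plt.yscale('log')\n",
    "plt.show()"
   ]
  },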
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82a4dcb3",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}