---
title: "Exercise 3: Filling out your benchmark"
teaching: 20
exercises: 10
questions:
- "How do we fill in each stage of the benchmark pipeline?"
objectives:
- "Fill out the many steps of your benchmark"
- "Collect templates for the benchmark stages"
keypoints:
- "Create `setup.config` to switch between using the simulation campaign and re-simulating events"
- "Each stage of the benchmark pipeline is defined in `config.yml`"
- "`config.yml` takes normal bash scripts as input"
- "Copy resulting figures over to the `results` directory to turn them into artifacts"
---

In this lesson we will be beefing up our benchmark by filling out several of the pipeline stages.

## Setting up

Before filling out the stages for GitLab's CI and pipelines, we first want to create a file that contains some settings used by our benchmark.

Create a new file: [`benchmarks/your_benchmark/setup.config`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/setup.config) with the following contents:
```bash
#!/bin/bash
source strict-mode.sh

export ENV_MODE=eicweb

USE_SIMULATION_CAMPAIGN=true

N_EVENTS=100

FILE_BASE=sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree
INPUT_FILE=root://dtn-eic.jlab.org//volatile/eic/EPIC/EVGEN/EXCLUSIVE/UCHANNEL_RHO/10x100/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.root
OUTPUT_FILE=${FILE_BASE}.detectorsim.root

REC_FILE_BASE=${FILE_BASE}.detectorsim.edm4eic
REC_FILE=${REC_FILE_BASE}.root
```
The `export ENV_MODE=eicweb` line lets our Snakefile know to use the paths for running on eicweb.

Here we've defined a switch, `USE_SIMULATION_CAMPAIGN`, which allows us to alternate between using output from the simulation campaign and dynamically simulating new events.

When not using the simulation campaign, the `N_EVENTS` variable defines how many events the benchmark should run.
The rest of these variables define file names to be used in the benchmark.
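
If you want to see how the derived file names expand, and confirm that the input file is actually reachable, you can source the config in an interactive `eic-shell` session. This is just a sanity-check sketch: it assumes you run from the top of the benchmark repository (so that `strict-mode.sh` is found) and that the `xrdfs` client from XRootD is available, as it is inside `eic-shell`:

```bash
source benchmarks/your_benchmark/setup.config
echo ${OUTPUT_FILE}   # sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.detectorsim.root
echo ${REC_FILE}      # sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.detectorsim.edm4eic.root

# Ask the XRootD server for the input file's metadata; a successful "stat" means the path is readable
xrdfs root://dtn-eic.jlab.org stat /volatile/eic/EPIC/EVGEN/EXCLUSIVE/UCHANNEL_RHO/10x100/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.root
```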

Also create a new file [`benchmarks/your_benchmark/simulate.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/simulate.sh) with the following contents:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

if [ -f ${INPUT_FILE} ]; then
  echo "GOOD: Input simulation file ${INPUT_FILE} exists!"
else
  echo "ERROR: Input simulation file ${INPUT_FILE} does not exist."
fi

# Simulate the detector response with ddsim
ddsim --runType batch \
      -v WARNING \
      --numberOfEvents ${N_EVENTS} \
      --part.minimalKineticEnergy 100*GeV \
      --filter.tracker edep0 \
      --compactFile ${DETECTOR_PATH}/${DETECTOR_CONFIG}.xml \
      --inputFiles ${INPUT_FILE} \
      --outputFile ${OUTPUT_FILE}
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR running ddsim"
  exit 1
fi
```

This script uses `ddsim` to simulate the detector response to your benchmark events.
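
You can also try the script by hand before wiring it into CI. A hypothetical standalone run from the top of the repository, assuming an `eic-shell` session in which `DETECTOR_PATH` and `DETECTOR_CONFIG` are set (they are in the CI images):

```bash
# ddsim reads the compact geometry from these environment variables
echo "Using geometry: ${DETECTOR_PATH}/${DETECTOR_CONFIG}.xml"
bash benchmarks/your_benchmark/simulate.sh
```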

Create a script named [`benchmarks/your_benchmark/reconstruct.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/reconstruct.sh) to manage the reconstruction:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

# Reconstruct: RECO selects the reconstruction software ("eicrecon" or the legacy "juggler")
if [ "${RECO}" == "eicrecon" ] ; then
  eicrecon ${OUTPUT_FILE} -Ppodio:output_file=${REC_FILE}
  if [[ "$?" -ne "0" ]] ; then
    echo "ERROR running eicrecon"
    exit 1
  fi
fi

if [[ "${RECO}" == "juggler" ]] ; then
  gaudirun.py options/reconstruction.py || [ $? -eq 4 ]
  if [ "$?" -ne "0" ] ; then
    echo "ERROR running juggler"
    exit 1
  fi
fi

# Save the JANA plugin graph if one was produced
if [ -f jana.dot ] ; then cp jana.dot ${REC_FILE_BASE}.dot ; fi

# List the trees in the reconstructed output as a sanity check
rootls -t ${REC_FILE}
```
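
Note that `RECO` is not defined in the `setup.config` we wrote above, so it has to come from the environment. If you experiment with this script interactively, set it yourself. A hypothetical manual run using `eicrecon`, the standard ePIC reconstruction:

```bash
# Choose the reconstruction backend explicitly for a manual run
RECO=eicrecon bash benchmarks/your_benchmark/reconstruct.sh
```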

Create a file called [`benchmarks/your_benchmark/analyze.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/analyze.sh) which will run the analysis and plotting scripts:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

OUTPUT_PLOTS_DIR=sim_output/nocampaign
mkdir -p ${OUTPUT_PLOTS_DIR}
# Analyze
command time -v \
  root -l -b -q "benchmarks/your_benchmark/analysis/uchannelrho.cxx(\"${REC_FILE}\",\"${OUTPUT_PLOTS_DIR}/plots.root\")"
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR analysis failed"
  exit 1
fi

if [ ! -d "${OUTPUT_PLOTS_DIR}/plots_figures" ]; then
  mkdir "${OUTPUT_PLOTS_DIR}/plots_figures"
  echo "${OUTPUT_PLOTS_DIR}/plots_figures directory created successfully."
else
  echo "${OUTPUT_PLOTS_DIR}/plots_figures directory already exists."
fi
root -l -b -q "benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C(\"${OUTPUT_PLOTS_DIR}/plots.root\")"
cat benchmark_output/*.json
```
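
If a reconstructed file already exists at the `REC_FILE` path, you can exercise just this step on its own. A sketch of a manual run, assuming the same `eic-shell` setup as above:

```bash
bash benchmarks/your_benchmark/analyze.sh
# The benchmark figures land here:
ls sim_output/nocampaign/plots_figures/
```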

Let's copy over our analysis script, our plotting macro & header, and our Snakefile:
```bash
mkdir benchmarks/your_benchmark/analysis
mkdir benchmarks/your_benchmark/macros

cp ../starting_script/Snakefile benchmarks/your_benchmark/
cp ../starting_script/analysis/uchannelrho.cxx benchmarks/your_benchmark/analysis/
cp ../starting_script/macros/RiceStyle.h benchmarks/your_benchmark/macros/
cp ../starting_script/macros/plot_rho_physics_benchmark.C benchmarks/your_benchmark/macros/
```

Your benchmark directory should now look like this:
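```
benchmarks/your_benchmark/
├── config.yml
├── setup.config
├── simulate.sh
├── reconstruct.sh
├── analyze.sh
├── Snakefile
├── analysis/
│   └── uchannelrho.cxx
└── macros/
    ├── RiceStyle.h
    └── plot_rho_physics_benchmark.C
```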

In order to use your Snakefile, you need to let GitLab know it's there. Open the main `Snakefile`: not the one at `benchmarks/your_benchmark/Snakefile`, but the one at the same level as the `benchmarks` directory.

Go to the very end of the file and include the path to your own Snakefile:
```python
include: "benchmarks/diffractive_vm/Snakefile"
include: "benchmarks/dis/Snakefile"
include: "benchmarks/demp/Snakefile"
include: "benchmarks/your_benchmark/Snakefile"
```
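
To confirm the include worked, you can ask Snakemake to list every rule it can see; the exact flag depends on your Snakemake version (`--list` in older releases, `--list-rules` in Snakemake 8+):

```bash
# From the top of the repository: the rules from benchmarks/your_benchmark/Snakefile should appear
snakemake --list
```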

Once that's all set up, we can move on to actually adding these stages to our pipeline!

## The "simulate" pipeline stage
We now fill out the `simulate` stage in GitLab's pipelines. Currently the instructions for this rule in `benchmarks/your_benchmark/config.yml` should be just:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  script:
    - echo "I will simulate detector response here!"
```

In order to make sure the previous stages finish before this one starts, add a new line below `stage: simulate`: `needs: ["common:setup"]`.

This step can take a long time if you simulate too many events, so let's put a 10-hour upper limit on the allowed run time.
On a new line below `needs: ["common:setup"]`, add: `timeout: 10 hour`.
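
Putting these pieces together, the top of the rule now reads:

```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "I will simulate detector response here!"
```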

Now in the `script` section of the rule, add two new lines to source the `setup.config` file:
```yaml
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

Next, add instructions to skip the detector simulation when using the simulation campaign, and to run it otherwise:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign so skipping this step!"
    - else
    - echo "Grabbing raw events from XRootD and running Geant4"
    - bash benchmarks/your_benchmark/simulate.sh
    - echo "Geant4 simulations done! Starting eicrecon now!"
    - bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
```

Finally, add an instruction to retry the stage (at most twice) if it fails because of a runner system failure:
```yaml
  retry:
    max: 2
    when:
      - runner_system_failure
```
The final `simulate` rule should look like this:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "I will simulate detector response here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign so skipping this step!"
    - else
    - echo "Grabbing raw events from XRootD and running Geant4"
    - bash benchmarks/your_benchmark/simulate.sh
    - echo "Geant4 simulations done! Starting eicrecon now!"
    - bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure
```

## The "results" pipeline stage

The `results` stage in `config.yml` currently looks like this:
```yaml
your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  script:
    - echo "I will collect results here!"
```

Specify that the simulate stage must finish before this one starts by adding a line below `stage: collect`:
```yaml
  needs: ["your_benchmark:simulate"]
```

Now, in the `script` section, make two directories to contain output from the benchmark analysis and source `setup.config` again:
```yaml
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

If using the simulation campaign, we can request the rho-mass benchmark figure from Snakemake. Once Snakemake has finished creating the benchmark figures, we copy them over to `results/your_benchmark/` in order to turn them into artifacts:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign!"
    - snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    - cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
```

If not using the simulation campaign, we instead run the `analyze.sh` script and copy the resulting figures into `results/your_benchmark/`:
```yaml
    - else
    - echo "Not using simulation campaign!"
    - bash benchmarks/your_benchmark/analyze.sh
    - cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```

Your final `config.yml` should look like this:
```yaml
your_benchmark:compile:
  extends: .phy_benchmark
  stage: compile
  script:
    - echo "You can compile your code here!"

your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "I will simulate detector response here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign so skipping this step!"
    - else
    - echo "Grabbing raw events from XRootD and running Geant4"
    - bash benchmarks/your_benchmark/simulate.sh
    - echo "Geant4 simulations done! Starting eicrecon now!"
    - bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure

your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  needs: ["your_benchmark:simulate"]
  script:
    - echo "I will collect results here!"
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign!"
    - snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    - cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
    - else
    - echo "Not using simulation campaign!"
    - bash benchmarks/your_benchmark/analyze.sh
    - cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```
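
Indentation mistakes in `config.yml` will fail the pipeline before any job runs, so a quick local syntax check can save a round trip. A minimal sketch, assuming a Python interpreter with PyYAML is available in your environment:

```bash
python3 -c 'import yaml; yaml.safe_load(open("benchmarks/your_benchmark/config.yml")); print("config.yml parses OK")'
```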

## Testing Real Pipelines

We've set up our benchmark to do some real analysis! As a first test, let's make sure we're still running only over the simulation campaign: `USE_SIMULATION_CAMPAIGN` in `setup.config` should be set to `true`.
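
A quick way to double-check from the top of the repository:

```bash
grep USE_SIMULATION_CAMPAIGN benchmarks/your_benchmark/setup.config
# Expected output: USE_SIMULATION_CAMPAIGN=true
```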

Now let's add our changes and push them to GitHub!

```bash
git status
```
This should list the main `Snakefile` as modified and everything under `benchmarks/your_benchmark/` as new or modified.

Now add all our changes:
```bash
git add Snakefile
git add benchmarks/your_benchmark/config.yml
git add benchmarks/your_benchmark/Snakefile
git add benchmarks/your_benchmark/analysis/uchannelrho.cxx
git add benchmarks/your_benchmark/analyze.sh
git add benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C
git add benchmarks/your_benchmark/macros/RiceStyle.h
git add benchmarks/your_benchmark/reconstruct.sh
git add benchmarks/your_benchmark/setup.config
git add benchmarks/your_benchmark/simulate.sh

git commit -m "I'm beefing up my benchmark!"
git push origin pr/your_benchmark_<mylastname>
```

Now monitor the pipeline you created:
- [physics benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/pipelines)
- [detector benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/detector_benchmarks/-/pipelines)
0354