---
title: "Exercise 3: Filling out your benchmark"
teaching: 20
exercises: 10
questions:
- "How do we fill in each stage of the benchmark pipeline?"
objectives:
- "Fill out the many steps of your benchmark"
- "Collect templates for the benchmark stages"
keypoints:
- "Create `setup.config` to switch between using the simulation campaign and re-simulating events"
- "Each stage of the benchmark pipeline is defined in `config.yml`"
- "`config.yml` takes normal bash scripts as input"
- "Copy resulting figures over to the `results` directory to turn them into artifacts"
---

In this lesson we will be beefing up our benchmark by filling out several of the pipeline stages.

## Setting up

Before filling out the stages for GitLab's CI pipelines, we first want to create a file that contains some settings used by our benchmark.

Create a new file, [`benchmarks/your_benchmark/setup.config`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/setup.config), with the following contents:
```bash
#!/bin/bash
source strict-mode.sh

export ENV_MODE=eicweb

USE_SIMULATION_CAMPAIGN=true

N_EVENTS=100

FILE_BASE=sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree
INPUT_FILE=root://dtn-eic.jlab.org//work/eic2/EPIC/EVGEN/EXCLUSIVE/UCHANNEL_RHO/10x100/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.root
OUTPUT_FILE=${FILE_BASE}.detectorsim.root

REC_FILE_BASE=${FILE_BASE}.detectorsim.edm4eic
REC_FILE=${REC_FILE_BASE}.root
```
The `export ENV_MODE=eicweb` line lets our Snakefile know to use the paths for running on eicweb.

Here we've defined a switch, `USE_SIMULATION_CAMPAIGN`, which allows us to alternate between using output from the simulation campaign and dynamically simulating new events.

When not using the simulation campaign, the `N_EVENTS` variable defines how many events the benchmark should run.
The rest of the variables define file names to be used in the benchmark.
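
To see how the scripts below use these settings, here is a minimal sketch with the relevant values from `setup.config` inlined (rather than sourced from the file) so it can be run on its own:

```shell
#!/bin/bash
# Sketch: the campaign switch and the derived file names, inlined for illustration.
USE_SIMULATION_CAMPAIGN=true

FILE_BASE=sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree
REC_FILE_BASE=${FILE_BASE}.detectorsim.edm4eic
REC_FILE=${REC_FILE_BASE}.root

# The benchmark scripts branch on this switch:
if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
  echo "Using campaign output; skipping simulation"
else
  echo "Re-simulating events into ${REC_FILE}"
fi
echo "Reconstructed file: ${REC_FILE}"
```

Note how `REC_FILE` is built up from `FILE_BASE` purely by parameter expansion, so changing `FILE_BASE` renames every derived file consistently.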

Also create a new file, [`benchmarks/your_benchmark/simulate.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/simulate.sh), with the following contents:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

if [ ! -f ${INPUT_FILE} ]; then
  echo "ERROR: Input simulation file ${INPUT_FILE} does not exist."
else
  echo "GOOD: Input simulation file ${INPUT_FILE} exists!"
fi

# Simulate
ddsim --runType batch \
      -v WARNING \
      --numberOfEvents ${N_EVENTS} \
      --part.minimalKineticEnergy 100*GeV \
      --filter.tracker edep0 \
      --compactFile ${DETECTOR_PATH}/${DETECTOR_CONFIG}.xml \
      --inputFiles ${INPUT_FILE} \
      --outputFile ${OUTPUT_FILE}
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR running ddsim"
  exit 1
fi
```

This script uses `ddsim` to simulate the detector response to your benchmark events.

Create a script named [`benchmarks/your_benchmark/reconstruct.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/reconstruct.sh) to manage the reconstruction:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

# Reconstruct
if [ "${RECO}" == "eicrecon" ] ; then
  eicrecon ${OUTPUT_FILE} -Ppodio:output_file=${REC_FILE}
  if [[ "$?" -ne "0" ]] ; then
    echo "ERROR running eicrecon"
    exit 1
  fi
fi

if [[ "${RECO}" == "juggler" ]] ; then
  gaudirun.py options/reconstruction.py || [ $? -eq 4 ]
  if [ "$?" -ne "0" ] ; then
    echo "ERROR running juggler"
    exit 1
  fi
fi

if [ -f jana.dot ] ; then cp jana.dot ${REC_FILE_BASE}.dot ; fi

#rootls -t ${REC_FILE_BASE}.tree.edm4eic.root
rootls -t ${REC_FILE}
```

Create a file called [`benchmarks/your_benchmark/analyze.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/analyze.sh) which will run the analysis and plotting scripts:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

OUTPUT_PLOTS_DIR=sim_output/nocampaign
mkdir -p ${OUTPUT_PLOTS_DIR}
# Analyze
command time -v \
  root -l -b -q "benchmarks/your_benchmark/analysis/uchannelrho.cxx(\"${REC_FILE}\",\"${OUTPUT_PLOTS_DIR}/plots.root\")"
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR analysis failed"
  exit 1
fi

if [ ! -d "${OUTPUT_PLOTS_DIR}/plots_figures" ]; then
  mkdir "${OUTPUT_PLOTS_DIR}/plots_figures"
  echo "${OUTPUT_PLOTS_DIR}/plots_figures directory created successfully."
else
  echo "${OUTPUT_PLOTS_DIR}/plots_figures directory already exists."
fi
root -l -b -q "benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C(\"${OUTPUT_PLOTS_DIR}/plots.root\")"
cat benchmark_output/*.json
```

Let's copy over our analysis script, our plotting macro & header, and our Snakefile:
```bash
mkdir benchmarks/your_benchmark/analysis
mkdir benchmarks/your_benchmark/macros

cp ../starting_script/Snakefile benchmarks/your_benchmark/
cp ../starting_script/analysis/uchannelrho.cxx benchmarks/your_benchmark/analysis/
cp ../starting_script/macros/RiceStyle.h benchmarks/your_benchmark/macros/
cp ../starting_script/macros/plot_rho_physics_benchmark.C benchmarks/your_benchmark/macros/
```

Your benchmark directory should now look like this:
![Your benchmark directory structure]({{ page.root }}/fig/your_bench_dir_new.png)

In order to use your Snakefile, you need to let GitLab know it's there. Open the main `Snakefile` (not the one at `benchmarks/your_benchmark/Snakefile`, but the one at the same level as the `benchmarks` directory).

Go to the very end of the file and include a path to your own Snakefile:
```python
include: "benchmarks/diffractive_vm/Snakefile"
include: "benchmarks/dis/Snakefile"
include: "benchmarks/demp/Snakefile"
include: "benchmarks/your_benchmark/Snakefile"
```

Once that's all set up, we can move on to actually adding these stages to our pipeline!

## The "simulate" pipeline stage

We now fill out the `simulate` stage in GitLab's pipelines. Currently, the instructions for this job in `benchmarks/your_benchmark/config.yml` are just:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  script:
    - echo "I will simulate detector response here!"
```

In order to make sure the previous stages finish before this one starts, add a new line below `stage: simulate`: `needs: ["common:setup"]`.

This step can take a long time if you simulate too many events, so let's add an upper limit of 10 hours on the allowed run time:
in a new line below `needs: ["common:setup"]`, add this: `timeout: 10 hour`.

Now in the `script` section of the job, add two new lines to source the `setup.config` file:
```yaml
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

Then add instructions that, when using the simulation campaign, skip the detector simulation; otherwise, run it:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign so skipping this step!"
    - else
    - echo "Grabbing raw events from S3 and running Geant4"
    - bash benchmarks/your_benchmark/simulate.sh
    - echo "Geant4 simulations done! Starting eicrecon now!"
    - bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
```

Finally, add an instruction to retry the job up to twice when the runner itself fails:
```yaml
  retry:
    max: 2
    when:
      - runner_system_failure
```
The final `simulate` job should look like this:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "I will simulate detector response here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign!"
    - else
    - echo "Grabbing raw events from S3 and running Geant4"
    - bash benchmarks/your_benchmark/simulate.sh
    - echo "Geant4 simulations done! Starting eicrecon now!"
    - bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure
```

## The "results" pipeline stage

The `results` stage in `config.yml` is right now just this:
```yaml
your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  script:
    - echo "I will collect results here!"
```

Specify that we need to finish the `simulate` stage first:
```yaml
  needs:
    - "your_benchmark:simulate"
```

Now make two directories to contain output from the benchmark analysis, and source `setup.config` again:
```yaml
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

If using the simulation campaign, we can request the rho mass benchmark figure from Snakemake. Once Snakemake has finished creating the benchmark figures, we copy them over to `results/your_benchmark/` in order to turn them into artifacts:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign!"
    - snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    - cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
```

If not using the simulation campaign, we can just run the `analyze.sh` script and copy the results into `results/your_benchmark/` in order to turn them into artifacts:
```yaml
    - else
    - echo "Not using simulation campaign!"
    - bash benchmarks/your_benchmark/analyze.sh
    - cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```

Your final `config.yml` should look like this:
```yaml
your_benchmark:compile:
  extends: .phy_benchmark
  stage: compile
  script:
    - echo "You can compile your code here!"

your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "Simulating everything here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign!"
    - else
    - echo "Grabbing raw events from S3 and running Geant4"
    - bash benchmarks/your_benchmark/simulate.sh
    - echo "Geant4 simulations done! Starting eicrecon now!"
    - bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure

your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  needs:
    - "your_benchmark:simulate"
  script:
    - echo "I will collect results here!"
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    - echo "Using simulation campaign!"
    - snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    - cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
    - else
    - echo "Not using simulation campaign!"
    - bash benchmarks/your_benchmark/analyze.sh
    - cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```

## Testing Real Pipelines

We've set up our benchmark to do some real analysis! As a first test, let's make sure we're still running only over the simulation campaign: `USE_SIMULATION_CAMPAIGN` in `setup.config` should be set to `true`.
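
One quick way to double-check the switch before committing is to grep the config. This is just a sketch: here we grep a temporary file written inline so the example is self-contained, but in practice you would point it at `benchmarks/your_benchmark/setup.config` in your checkout:

```shell
#!/bin/bash
# Sketch: verify the campaign switch is on before pushing.
# A temporary file stands in for benchmarks/your_benchmark/setup.config here.
config=$(mktemp)
printf 'USE_SIMULATION_CAMPAIGN=true\n' > "$config"

if grep -q '^USE_SIMULATION_CAMPAIGN=true' "$config"; then
  echo "campaign mode enabled"
else
  echo "campaign mode disabled: pipeline will re-simulate events"
fi
rm -f "$config"
```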

Now let's add our changes and push them to GitHub!

```bash
git status
```
This command should show something like this:
![Example git status output]({{ page.root }}/fig/gitstatus_example.png)

Now add all our changes:
```bash
git add Snakefile
git add benchmarks/your_benchmark/config.yml
git add benchmarks/your_benchmark/Snakefile
git add benchmarks/your_benchmark/analysis/uchannelrho.cxx
git add benchmarks/your_benchmark/analyze.sh
git add benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C
git add benchmarks/your_benchmark/macros/RiceStyle.h
git add benchmarks/your_benchmark/reconstruct.sh
git add benchmarks/your_benchmark/setup.config
git add benchmarks/your_benchmark/simulate.sh

git commit -m "I'm beefing up my benchmark!"
git push origin pr/your_benchmark_<mylastname>
```

Now monitor the pipeline you created:
- [physics benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/pipelines)
- [detector benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/detector_benchmarks/-/pipelines)
0356