---
title: "Exercise 3: Filling out your benchmark"
teaching: 20
exercises: 10
questions:
- "How do we fill in each stage of the benchmark pipeline?"
objectives:
- "Fill out the many steps of your benchmark"
- "Collect templates for the benchmark stages"
keypoints:
- "Create `setup.config` to switch between using the simulation campaign and re-simulating events"
- "Each stage of the benchmark pipeline is defined in `config.yml`"
- "`config.yml` takes normal bash scripts as input"
- "Copy resulting figures over to the `results` directory to turn them into artifacts"
---

In this lesson we will be beefing up our benchmark by filling out several of the pipeline stages.

## Setting up

Before filling out the stages for GitLab's CI pipelines, we first want to create a file containing some settings used by our benchmark.

Create a new file, [`benchmarks/your_benchmark/setup.config`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/setup.config), with the following contents:
```bash
#!/bin/bash
source strict-mode.sh

export ENV_MODE=eicweb

USE_SIMULATION_CAMPAIGN=true

N_EVENTS=100

FILE_BASE=sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree
INPUT_FILE=root://dtn-eic.jlab.org//volatile/eic/EPIC/EVGEN/EXCLUSIVE/UCHANNEL_RHO/10x100/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.root
OUTPUT_FILE=${FILE_BASE}.detectorsim.root

REC_FILE_BASE=${FILE_BASE}.detectorsim.edm4eic
REC_FILE=${REC_FILE_BASE}.root
```
The `export ENV_MODE=eicweb` line lets our Snakefile know to use the paths for running on eicweb.

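What the Snakefile actually does with `ENV_MODE` lives in the repository's Snakefiles; as a rough, hypothetical illustration of the kind of path switch it enables (the branch and the directory names below are made up for this sketch, not the real Snakefile logic):

```bash
# Hypothetical sketch: selecting paths based on ENV_MODE.
# The real path selection happens inside the Snakefile.
export ENV_MODE=eicweb

if [ "${ENV_MODE}" = "eicweb" ]; then
  SIM_DIR=sim_output          # paths used when running on eicweb
else
  SIM_DIR=local_sim_output    # hypothetical local fallback
fi
echo "simulation output goes to: ${SIM_DIR}"
```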
Here we've defined a switch, `USE_SIMULATION_CAMPAIGN`, which allows us to alternate between using output from the simulation campaign and dynamically simulating new events.

When not using the simulation campaign, the `N_EVENTS` variable defines how many events the benchmark should run.
The rest of these variables define file names to be used in the benchmark.

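The later file names all build on `FILE_BASE`, with a suffix appended for each processing stage. A quick runnable sketch of how they compose, with the values copied from the `setup.config` above:

```bash
# How the names in setup.config compose: each later file name
# is derived from FILE_BASE by appending a processing-stage suffix.
FILE_BASE=sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree
OUTPUT_FILE=${FILE_BASE}.detectorsim.root        # Geant4 (ddsim) output
REC_FILE_BASE=${FILE_BASE}.detectorsim.edm4eic
REC_FILE=${REC_FILE_BASE}.root                   # eicrecon output
echo "${REC_FILE}"
```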
Also create a new file, [`benchmarks/your_benchmark/simulate.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/simulate.sh), with the following contents:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

if [ -f ${INPUT_FILE} ]; then
  echo "GOOD: Input simulation file ${INPUT_FILE} exists!"
else
  echo "ERROR: Input simulation file ${INPUT_FILE} does not exist."
fi

# Simulate
ddsim --runType batch \
      -v WARNING \
      --numberOfEvents ${N_EVENTS} \
      --part.minimalKineticEnergy 100*GeV  \
      --filter.tracker edep0 \
      --compactFile ${DETECTOR_PATH}/${DETECTOR_CONFIG}.xml \
      --inputFiles ${INPUT_FILE} \
      --outputFile  ${OUTPUT_FILE}
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR running ddsim"
  exit 1
fi
```

This script uses `ddsim` to simulate the detector response to your benchmark events.

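The exit-code check after `ddsim` is the error-handling idiom these scripts repeat after each heavy command. A self-contained sketch of that pattern, with a hypothetical `failing_step` function standing in for `ddsim`:

```bash
# Sketch of the error-check idiom; 'failing_step' is a
# hypothetical stand-in for a command like ddsim.
failing_step() { return 1; }

if failing_step ; then
  status=0
else
  status=$?                    # capture the nonzero exit code
  echo "ERROR running ddsim"   # the real script also does 'exit 1'
fi
```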
Create a script named [`benchmarks/your_benchmark/reconstruct.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/reconstruct.sh) to manage the reconstruction:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

# Reconstruct
if [ "${RECO}" == "eicrecon" ] ; then
  eicrecon ${OUTPUT_FILE} -Ppodio:output_file=${REC_FILE}
  if [[ "$?" -ne "0" ]] ; then
    echo "ERROR running eicrecon"
    exit 1
  fi
fi

if [[ "${RECO}" == "juggler" ]] ; then
  gaudirun.py options/reconstruction.py || [ $? -eq 4 ]
  if [ "$?" -ne "0" ] ; then
    echo "ERROR running juggler"
    exit 1
  fi
fi

if [ -f jana.dot ] ; then cp jana.dot ${REC_FILE_BASE}.dot ; fi

rootls -t ${REC_FILE}
```

Create a file called [`benchmarks/your_benchmark/analyze.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/analyze.sh), which will run the analysis and plotting scripts:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

OUTPUT_PLOTS_DIR=sim_output/nocampaign
mkdir -p ${OUTPUT_PLOTS_DIR}
# Analyze
command time -v \
root -l -b -q "benchmarks/your_benchmark/analysis/uchannelrho.cxx(\"${REC_FILE}\",\"${OUTPUT_PLOTS_DIR}/plots.root\")"
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR analysis failed"
  exit 1
fi

if [ ! -d "${OUTPUT_PLOTS_DIR}/plots_figures" ]; then
    mkdir "${OUTPUT_PLOTS_DIR}/plots_figures"
    echo "${OUTPUT_PLOTS_DIR}/plots_figures directory created successfully."
else
    echo "${OUTPUT_PLOTS_DIR}/plots_figures directory already exists."
fi
root -l -b -q "benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C(\"${OUTPUT_PLOTS_DIR}/plots.root\")"
cat benchmark_output/*.json
```

Let's copy over our analysis script, our plotting macro and header, and our Snakefile:
```bash
mkdir benchmarks/your_benchmark/analysis
mkdir benchmarks/your_benchmark/macros

cp ../starting_script/Snakefile benchmarks/your_benchmark/
cp ../starting_script/analysis/uchannelrho.cxx benchmarks/your_benchmark/analysis/
cp ../starting_script/macros/RiceStyle.h benchmarks/your_benchmark/macros/
cp ../starting_script/macros/plot_rho_physics_benchmark.C benchmarks/your_benchmark/macros/
```

Your benchmark directory should now look like this:
![Your benchmark directory structure]({{ page.root }}/fig/your_bench_dir_new.png)

In order to use your Snakefile, you need to let GitLab know it's there. Open the main `Snakefile`: not the one at `benchmarks/your_benchmark/Snakefile`, but the one at the same level as the `benchmarks` directory.

Go to the very end of the file and include the path to your own Snakefile:
```python
include: "benchmarks/diffractive_vm/Snakefile"
include: "benchmarks/dis/Snakefile"
include: "benchmarks/demp/Snakefile"
include: "benchmarks/your_benchmark/Snakefile"
```

Once that's all set up, we can move on to actually adding these stages to our pipeline!

## The "simulate" pipeline stage

We now fill out the `simulate` stage in GitLab's pipelines. Currently the instructions for this rule in `benchmarks/your_benchmark/config.yml` should be just:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  script:
    - echo "I will simulate detector response here!"
```

In order to make sure the previous stages finish before this one starts, add a new line below `stage: simulate`: `needs: ["common:setup"]`.

This step can take a long time if you simulate too many events, so let's cap the allowed run time at 10 hours. On a new line below `needs: ["common:setup"]`, add: `timeout: 10 hour`.

Now in the `script` section of the rule, add two new lines to source the `setup.config` file:
```yaml
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

Next, add instructions to skip the detector simulation when using the simulation campaign, and to otherwise run the simulation and reconstruction scripts:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    - else
    -     echo "Grabbing raw events from XRootD and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
```

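Stripped of the leading `- ` YAML list markers, the branch above is ordinary bash, so you can sanity-check the logic locally before pushing. A minimal runnable sketch (taking the campaign branch, with the heavy script calls replaced by a message):

```bash
# The same branch as plain bash; the campaign case is taken here.
USE_SIMULATION_CAMPAIGN=true

if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    msg="Using simulation campaign!"
else
    msg="Grabbing raw events from XRootD and running Geant4"
fi
echo "$msg"
```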
Finally, add an instruction to retry the simulation if it fails:
```yaml
  retry:
    max: 2
    when:
      - runner_system_failure
```
The final `simulate` rule should look like this:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "I will simulate detector response here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    - else
    -     echo "Grabbing raw events from XRootD and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure
```

## The "results" pipeline stage

The `results` stage in `config.yml` is currently just this:
```yaml
your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  script:
    - echo "I will collect results here!"
```

Specify that we need to finish the `simulate` stage first:
```yaml
  needs:
    - "your_benchmark:simulate"
```

Now make two directories to hold output from the benchmark analysis, and source `setup.config` again:
```yaml
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

If using the simulation campaign, we can request the rho mass benchmark figure from snakemake. Once snakemake has finished creating the benchmark figures, we copy them over to `results/your_benchmark/` in order to turn them into artifacts:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    -     snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    -     cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
```

If not using the simulation campaign, we instead run the `analyze.sh` script and copy its figures into `results/your_benchmark/`:
```yaml
    - else
    -     echo "Not using simulation campaign!"
    -     bash benchmarks/your_benchmark/analyze.sh
    -     cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```

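Copying figures under `results/` is what turns them into artifacts. A self-contained sketch of that final copy step, using a scratch directory and an empty stand-in PDF rather than real benchmark output:

```bash
# Self-contained sketch of the artifact-collection step.
# Directory names match the tutorial; the PDF is an empty stand-in.
workdir=$(mktemp -d)
cd "${workdir}"

mkdir -p results/your_benchmark sim_output/nocampaign/plots_figures
touch sim_output/nocampaign/plots_figures/benchmark_rho_mass.pdf

cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
ls results/your_benchmark/
```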
Your final `config.yml` should look like:
```yaml
your_benchmark:compile:
  extends: .phy_benchmark
  stage: compile
  script:
    - echo "You can compile your code here!"

your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "Simulating everything here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    - else
    -     echo "Grabbing raw events from XRootD and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure

your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  needs:
    - "your_benchmark:simulate"
  script:
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    -     snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    -     cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
    - else
    -     echo "Not using simulation campaign!"
    -     bash benchmarks/your_benchmark/analyze.sh
    -     cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```

## Testing Real Pipelines

We've set up our benchmark to do some real analysis! As a first test, let's make sure we're still running only over the simulation campaign: `USE_SIMULATION_CAMPAIGN` in `setup.config` should be set to `true`.

Now let's add our changes and push them to GitLab!

```bash
git status
```
This command should show something like this:
![Example git status output]({{ page.root }}/fig/gitstatus_example.png)

Now add all our changes:
```bash
git add Snakefile
git add benchmarks/your_benchmark/config.yml
git add benchmarks/your_benchmark/Snakefile
git add benchmarks/your_benchmark/analysis/uchannelrho.cxx
git add benchmarks/your_benchmark/analyze.sh
git add benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C
git add benchmarks/your_benchmark/macros/RiceStyle.h
git add benchmarks/your_benchmark/reconstruct.sh
git add benchmarks/your_benchmark/setup.config
git add benchmarks/your_benchmark/simulate.sh

git commit -m "I'm beefing up my benchmark!"
git push origin pr/your_benchmark_<mylastname>
```

Now monitor the pipeline you created:
- [physics benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/pipelines)
- [detector benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/detector_benchmarks/-/pipelines)
0354