---
title: "Exercise 3: Filling out your benchmark"
teaching: 20
exercises: 10
questions:
- "How do we fill in each stage of the benchmark pipeline?"
objectives:
- "Fill out the many steps of your benchmark"
- "Collect templates for the benchmark stages"
keypoints:
- "Create `setup.config` to switch between using the simulation campaign and re-simulating events"
- "Each stage of the benchmark pipeline is defined in `config.yml`"
- "`config.yml` takes normal bash scripts as input"
- "Copy resulting figures over to the `results` directory to turn them into artifacts"
---

In this lesson we will be beefing up our benchmark by filling out several of the pipeline stages.

## Setting up

Before filling out the stages for GitLab's CI and pipelines, we want to first create a file that contains some settings used by our benchmark.

Create a new file, [`benchmarks/your_benchmark/setup.config`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/setup.config), with the following contents:
```bash
#!/bin/bash
source strict-mode.sh

export ENV_MODE=eicweb

USE_SIMULATION_CAMPAIGN=true

N_EVENTS=100

FILE_BASE=sim_output/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree
INPUT_FILE=root://dtn-eic.jlab.org//work/eic2/EPIC/EVGEN/EXCLUSIVE/UCHANNEL_RHO/10x100/rho_10x100_uChannel_Q2of0to10_hiDiv.hepmc3.tree.root
OUTPUT_FILE=${FILE_BASE}.detectorsim.root

REC_FILE_BASE=${FILE_BASE}.detectorsim.edm4eic
REC_FILE=${REC_FILE_BASE}.root
```
The `export ENV_MODE=eicweb` line lets our Snakefile know to use the paths for running on eicweb.
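Because the variable is `export`ed, it is visible to every child process the CI job launches, which is how a separately invoked Snakemake workflow can see it. A minimal, self-contained sketch of that mechanism (the `bash -c` child below is a hypothetical stand-in for launching snakemake):

```bash
#!/bin/bash
# "export" marks the variable for the environment of child processes.
export ENV_MODE=eicweb

# Stand-in for launching snakemake: any child process inherits the value.
bash -c 'echo "child process sees ENV_MODE=${ENV_MODE}"'
```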

Here we've defined a switch `USE_SIMULATION_CAMPAIGN` which will allow us to alternate between using output from the simulation campaign, and dynamically simulating new events.

When not using the simulation campaign, the `N_EVENTS` variable defines how many events the benchmark should run.
The rest of these variables define file names to be used in the benchmark.
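The later stage scripts branch on this switch after sourcing `setup.config`. A self-contained sketch of the pattern (the variable is defined inline here rather than sourced, and the echo messages are placeholders):

```bash
#!/bin/bash
# In the real scripts these values come from:
#   source benchmarks/your_benchmark/setup.config
USE_SIMULATION_CAMPAIGN=true
N_EVENTS=100

if [ "$USE_SIMULATION_CAMPAIGN" = true ]; then
  echo "Using existing simulation campaign output"
else
  echo "Simulating ${N_EVENTS} fresh events"
fi
```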

Also create a new file, [`benchmarks/your_benchmark/simulate.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/simulate.sh), with the following contents:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

if [ ! -f ${INPUT_FILE} ]; then
  echo "ERROR: Input simulation file ${INPUT_FILE} does not exist."
else
  echo "GOOD: Input simulation file ${INPUT_FILE} exists!"
fi

# Simulate
ddsim --runType batch \
      -v WARNING \
      --numberOfEvents ${N_EVENTS} \
      --part.minimalKineticEnergy 100*GeV  \
      --filter.tracker edep0 \
      --compactFile ${DETECTOR_PATH}/${DETECTOR_CONFIG}.xml \
      --inputFiles ${INPUT_FILE} \
      --outputFile  ${OUTPUT_FILE}
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR running ddsim"
  exit 1
fi
```

This script uses ddsim to simulate the detector response to your benchmark events.

Create a script named [`benchmarks/your_benchmark/reconstruct.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/reconstruct.sh) to manage the reconstruction:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

# Reconstruct
if [ ${RECO} == "eicrecon" ] ; then
  eicrecon ${OUTPUT_FILE} -Ppodio:output_file=${REC_FILE}
  if [[ "$?" -ne "0" ]] ; then
    echo "ERROR running eicrecon"
    exit 1
  fi
fi

if [[ ${RECO} == "juggler" ]] ; then
  gaudirun.py options/reconstruction.py || [ $? -eq 4 ]
  if [ "$?" -ne "0" ] ; then
    echo "ERROR running juggler"
    exit 1
  fi
fi

if [ -f jana.dot ] ; then cp jana.dot ${REC_FILE_BASE}.dot ; fi

#rootls -t ${REC_FILE_BASE}.tree.edm4eic.root
rootls -t ${REC_FILE}
```
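The `gaudirun.py options/reconstruction.py || [ $? -eq 4 ]` line uses a bash idiom worth noting: treat one specific non-zero exit code as success. A self-contained sketch of the idiom, with a subshell standing in for `gaudirun.py`:

```bash
#!/bin/bash
# "cmd || [ $? -eq 4 ]" leaves an overall status of 0 if cmd exits 0 OR
# exits with code 4; any other non-zero code propagates as a failure
# that the next "$?" check can catch.
( exit 4 ) || [ $? -eq 4 ]
echo "status after tolerating exit code 4: $?"   # prints 0
```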

Create a file called [`benchmarks/your_benchmark/analyze.sh`](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/analyze.sh) which will run the analysis and plotting scripts:
```bash
#!/bin/bash
source strict-mode.sh
source benchmarks/your_benchmark/setup.config $*

OUTPUT_PLOTS_DIR=sim_output/nocampaign
mkdir -p ${OUTPUT_PLOTS_DIR}
# Analyze
command time -v \
root -l -b -q "benchmarks/your_benchmark/analysis/uchannelrho.cxx(\"${REC_FILE}\",\"${OUTPUT_PLOTS_DIR}/plots.root\")"
if [[ "$?" -ne "0" ]] ; then
  echo "ERROR analysis failed"
  exit 1
fi

if [ ! -d "${OUTPUT_PLOTS_DIR}/plots_figures" ]; then
    mkdir "${OUTPUT_PLOTS_DIR}/plots_figures"
    echo "${OUTPUT_PLOTS_DIR}/plots_figures directory created successfully."
else
    echo "${OUTPUT_PLOTS_DIR}/plots_figures directory already exists."
fi
root -l -b -q "benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C(\"${OUTPUT_PLOTS_DIR}/plots.root\")"
cat benchmark_output/*.json
```

Let's copy over our analysis script, our plotting macro & header, and our Snakefile:
```bash
mkdir benchmarks/your_benchmark/analysis
mkdir benchmarks/your_benchmark/macros

cp ../starting_script/Snakefile benchmarks/your_benchmark/
cp ../starting_script/analysis/uchannelrho.cxx benchmarks/your_benchmark/analysis/
cp ../starting_script/macros/RiceStyle.h benchmarks/your_benchmark/macros/
cp ../starting_script/macros/plot_rho_physics_benchmark.C benchmarks/your_benchmark/macros/
```

Your benchmark directory should now look like this:
![Your benchmark directory structure]({{ page.root }}/fig/your_bench_dir_new.png)

In order to use your Snakefile, you need to let GitLab know it's there. Open the main `Snakefile` (not `benchmarks/your_benchmark/Snakefile`, but the one at the same level as the `benchmarks` directory).

Go to the very end of the file and include the path to your own Snakefile:
```python
include: "benchmarks/diffractive_vm/Snakefile"
include: "benchmarks/dis/Snakefile"
include: "benchmarks/demp/Snakefile"
include: "benchmarks/your_benchmark/Snakefile"
```

Once that's all set up, we can move on to actually adding these to our pipeline!

## The "simulate" pipeline stage
We now fill out the `simulate` stage in GitLab's pipelines. Currently the instructions for this rule are contained in `benchmarks/your_benchmark/config.yml` as:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  script:
    - echo "I will simulate detector response here!"
```

To make sure the previous stages finish before this one starts, add a new line below `stage: simulate`: `needs: ["common:setup"]`.

This step can take a long time if you simulate too many events, so let's cap the allowed run time at 10 hours. On a new line below `needs: ["common:setup"]`, add: `timeout: 10 hour`.

Now in the `script` section of the rule, add two new lines to source the `setup.config` file:
```yaml
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

Next, add instructions: if using the simulation campaign, skip the detector simulation; otherwise, simulate:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign so skipping this step!"
    - else
    -     echo "Grabbing raw events from S3 and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
```

Finally, add an instruction to retry the simulation if it fails:
```yaml
  retry:
    max: 2
    when:
      - runner_system_failure
```
The final `simulate` rule should look like this:
```yaml
your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "I will simulate detector response here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    - else
    -     echo "Grabbing raw events from S3 and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure
```

## The "results" pipeline stage

The `results` stage in `config.yml` is right now just this:
```yaml
your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  script:
    - echo "I will collect results here!"
```

Specify that we need to finish the simulate stage first:
```yaml
  needs:
    - ["your_benchmark:simulate"]
```

Now make two directories to contain output from the benchmark analysis and source `setup.config` again:
```yaml
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
```

If using the simulation campaign, we can request the rho mass benchmark with snakemake. Once snakemake has finished creating the benchmark figures, we copy them over to `results/your_benchmark/` in order to make them into artifacts:
```yaml
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    -     snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    -     cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
```

If not using the simulation campaign, we can just run the `analyze.sh` script and copy the results into `results/your_benchmark/` in order to make them into artifacts:
```yaml
    - else
    -     echo "Not using simulation campaign!"
    -     bash benchmarks/your_benchmark/analyze.sh
    -     cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```

Your final `config.yml` should look like:
```yaml
your_benchmark:compile:
  extends: .phy_benchmark
  stage: compile
  script:
    - echo "You can compile your code here!"

your_benchmark:simulate:
  extends: .phy_benchmark
  stage: simulate
  needs: ["common:setup"]
  timeout: 10 hour
  script:
    - echo "Simulating everything here!"
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    - else
    -     echo "Grabbing raw events from S3 and running Geant4"
    -     bash benchmarks/your_benchmark/simulate.sh
    -     echo "Geant4 simulations done! Starting eicrecon now!"
    -     bash benchmarks/your_benchmark/reconstruct.sh
    - fi
    - echo "Finished simulating detector response"
  retry:
    max: 2
    when:
      - runner_system_failure

your_benchmark:results:
  extends: .phy_benchmark
  stage: collect
  needs:
    - ["your_benchmark:simulate"]
  script:
    - echo "I will collect results here!"
    - mkdir -p results/your_benchmark
    - mkdir -p benchmark_output
    - config_file=benchmarks/your_benchmark/setup.config
    - source $config_file
    - if [ "$USE_SIMULATION_CAMPAIGN" = true ] ; then
    -     echo "Using simulation campaign!"
    -     snakemake --cores 2 ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/benchmark_rho_mass.pdf
    -     cp ../../sim_output/campaign_24.07.0_combined_45files.eicrecon.tree.edm4eic.plots_figures/*.pdf results/your_benchmark/
    - else
    -     echo "Not using simulation campaign!"
    -     bash benchmarks/your_benchmark/analyze.sh
    -     cp sim_output/nocampaign/plots_figures/*.pdf results/your_benchmark/
    - fi
    - echo "Finished copying!"
```

## Testing Real Pipelines

We've set up our benchmark to do some real analysis! As a first test, let's make sure we're still running only over the simulation campaign. The `USE_SIMULATION_CAMPAIGN` in `setup.config` should be set to true.
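Before pushing, it's worth double-checking the switch. In the repository you would run something like `grep USE_SIMULATION_CAMPAIGN benchmarks/your_benchmark/setup.config`; the sketch below demonstrates the same check on a temporary stand-in file so it is self-contained:

```bash
#!/bin/bash
# Hypothetical stand-in for benchmarks/your_benchmark/setup.config:
demo_config=$(mktemp)
printf 'USE_SIMULATION_CAMPAIGN=true\n' > "$demo_config"

# The actual check: print the line defining the switch so you can eyeball it.
grep '^USE_SIMULATION_CAMPAIGN=' "$demo_config"

rm -f "$demo_config"
```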

Now let's add our changes and push them to GitHub!

```bash
git status
```
This command should show something like this:
![Example git status output]({{ page.root }}/fig/gitstatus_example.png)

Now add all our changes:
```bash
git add Snakefile
git add benchmarks/your_benchmark/config.yml
git add benchmarks/your_benchmark/Snakefile
git add benchmarks/your_benchmark/analysis/uchannelrho.cxx
git add benchmarks/your_benchmark/analyze.sh
git add benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C
git add benchmarks/your_benchmark/macros/RiceStyle.h
git add benchmarks/your_benchmark/reconstruct.sh
git add benchmarks/your_benchmark/setup.config
git add benchmarks/your_benchmark/simulate.sh

git commit -m "I'm beefing up my benchmark!"
git push origin pr/your_benchmark_<mylastname>
```

Now monitor the pipeline you created:
- [physics benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/pipelines)
- [detector benchmark pipelines](https://eicweb.phy.anl.gov/EIC/benchmarks/detector_benchmarks/-/pipelines)