---
title: "Exercise 4: Adding a Status Flag"
teaching: 10
exercises: 10
questions:
- "How can your benchmark indicate that there were detrimental changes to software or detector design?"
objectives:
- "Learn what a status flag is and how to add one to your benchmark"
keypoints:
- "Status flags are used to indicate detrimental changes to software/detector design"
- "Add a status flag to your benchmark to alert developers to changes in performance"
---

We've created a benchmark and tested it with GitLab's CI tools. Now let's explore one of the tools available to us for alerting fellow developers when a change has degraded the performance measured by your benchmark.

## What is a Status Flag?

A typical benchmark might have 5 to 20 figures, each of which may or may not be useful in understanding detector and algorithm performance. Developers need rapid feedback when making changes to the tracking software, changing the detector geometry, and so on.

As a benchmark developer, you can build this feedback into your benchmark with a status flag. Status flags are binary pass/fail flags which are summarized at the end of a pipeline. They allow other developers to quickly identify any detrimental changes they may have made to the EIC software environment.

At the completion of a GitLab pipeline, the status flags from each benchmark are gathered and summarized, as in this example:
<img src="{{ page.root }}/fig/example_status.png" alt="Status flag example" width="500">

Think about what quantities are relevant to monitor. For example, since the u-channel rho benchmark is used to evaluate the performance of the B0 trackers, it has a status flag assigned to the efficiency of rho reconstruction within the B0. In the April campaign, this efficiency was observed to be roughly 95%. A flag was set such that if the efficiency dropped below 90%, it would indicate notable degradation of the performance of far-forward tracking.

Depending on your observable, you might set a status flag on:
- the mass width of a reconstructed particle
- reconstructed momentum resolution
- energy resolution in a calorimeter

Just remember that a status flag that is raised too often stops being alarming to developers. So try to leave some margin for error, and check in on the benchmark's performance every so often.
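
As an illustration, here is a minimal sketch of how the first item might be monitored: fit the reconstructed mass peak, extract the width, and hand it to a flag-setting function like the one defined in the next section. The histogram name `h_mass`, the fit window, the 60 MeV target, and the function `setwidthstatus` are all hypothetical placeholders, not values from any campaign.

```c++
// Sketch only: all names, ranges, and targets below are placeholders.
TF1 fgaus("fgaus", "gaus", 0.6, 1.0);      // Gaussian fit around the peak
h_mass->Fit(&fgaus, "RQ");                 // "R" = respect the fit range, "Q" = quiet
double mass_width = fgaus.GetParameter(2); // parameter 2 is the Gaussian sigma

// Leave some margin above the value seen in a reference campaign,
// so routine fluctuations don't keep raising the flag.
double width_target = 0.060; // GeV
setwidthstatus(mass_width, width_target); // hypothetical, same pattern as setbenchstatus() below
```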

## Adding a Status Flag to Your Benchmark

To add a status flag, first define a function to set the benchmark status. In this example, the following function was added to the plotting macro `benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C`:
```c++
///////////// Set benchmark status!
int setbenchstatus(double eff){
  // create our test definition
  common_bench::Test rho_reco_eff_test{
    {
      {"name", "rho_reconstruction_efficiency"},
      {"title", "rho Reconstruction Efficiency for rho -> pi+pi- in the B0"},
      {"description", "u-channel rho->pi+pi- reconstruction efficiency"},
      {"quantity", "efficiency"},
      {"target", "0.9"}
    }
  };
  // this needs to be consistent with the target above
  double eff_target = 0.9;

  if(eff < 0 || eff > 1){
    // an unphysical efficiency is recorded as an error
    rho_reco_eff_test.error(-1);
  }else if(eff > eff_target){
    rho_reco_eff_test.pass(eff);
  }else{
    rho_reco_eff_test.fail(eff);
  }

  // write out our test data
  common_bench::write_test(rho_reco_eff_test, "./benchmark_output/u_rho_eff.json");
  return 0;
}
```
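
Note the three-way logic: an efficiency outside the physical range [0, 1] is recorded with `error(-1)` rather than `fail()`, which distinguishes a broken benchmark (for example, empty histograms) from a genuine drop in performance.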

We also have to include the appropriate header. At the top of `plot_rho_physics_benchmark.C`, please also add:
```c++
#include "common_bench/benchmark.h"
```
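
This header comes from the `common_bench` library used throughout the benchmark repositories; when your macro runs inside the standard benchmark pipeline, it should already be on the include path.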

In the main plotting function, the reconstruction efficiency is calculated, then compared against the target:
```c++
int minbineff = h_VM_mass_MC_etacut->FindBin(0.6);
int maxbineff = h_VM_mass_MC_etacut->FindBin(1.0);
double reconstructionEfficiency = (1.0*h_VM_mass_REC_etacut->Integral(minbineff,maxbineff))/(1.0*h_VM_mass_MC_etacut->Integral(minbineff,maxbineff));
//set the benchmark status:
setbenchstatus(reconstructionEfficiency);
```
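
If there is any chance the denominator histogram is empty (for example, when running over a very small test sample), you may want to guard the division; a minimal sketch, reusing the names above:

```c++
// Guard against an empty MC histogram before computing the ratio.
double denom = 1.0*h_VM_mass_MC_etacut->Integral(minbineff, maxbineff);
if(denom > 0){
  setbenchstatus(h_VM_mass_REC_etacut->Integral(minbineff, maxbineff)/denom);
}else{
  setbenchstatus(-1.0); // out of range, recorded as an error by setbenchstatus()
}
```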

Now every time the plotting macro is run, it will generate the `json` file `benchmark_output/u_rho_eff.json` containing this status flag. In order to propagate the flag through the pipeline, you also need to create a top-level `json` file which will collect all the status flags in your benchmark.

In your benchmark directory, create a file titled `benchmark.json`, or copy [this one](https://github.com/eic/tutorial-developing-benchmarks/blob/gh-pages/files/benchmark.json). The file should contain a name and title for your benchmark, as well as a description:
```json
{
  "name": "YOUR BENCHMARK NAME",
  "title": "YOUR BENCHMARK TITLE",
  "description": "Benchmark for ...",
  "target": "0.9"
}
```
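
The `target` value here should stay consistent with the `eff_target` used in the macro above (0.9 in this example).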

To keep the status flags as artifacts, also add these lines to the end of the `results` rule in your `config.yml`:
```yml
- echo "Finished, copying over json now"
- cp benchmark_output/u_rho_eff.json results/your_benchmark/
- echo "Finished copying!"
```
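
This copies the flag file into the `results` directory, which GitLab keeps as a job artifact, so the summary stage (and you) can retrieve it after the job finishes.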

The status flags from your benchmark should all be collected and summarized in this stage of the pipeline as well. To do this, include the following lines at the end of the stage:
```yml
- collect_tests.py your_benchmark
- echo "Finished collecting!"
```
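
Here `collect_tests.py` gathers the individual `json` test files for your benchmark into the combined summary shown at the end of the pipeline.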

Now push to GitHub!
```bash
git add benchmarks/your_benchmark/config.yml
git add benchmarks/your_benchmark/macros/plot_rho_physics_benchmark.C
git add benchmarks/your_benchmark/benchmark.json

git commit -m "added a status flag!"
git push origin pr/your_benchmark_<mylastname>
```

Check the pipelines:
- [physics benchmarks](https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/pipelines)
- [detector benchmarks](https://eicweb.phy.anl.gov/EIC/benchmarks/detector_benchmarks/-/pipelines)

> ## Exercise
> - Try to identify several places where the status flag information is kept. It may take a while for these to run, so [check this example pipeline](https://eicweb.phy.anl.gov/EIC/benchmarks/physics_benchmarks/-/pipelines/103909).
{: .challenge}