This repository contains the set of scripts used to train, generate, and validate the generative model used
in this example.

- root2h5_for_vae.py: translates a ROOT file with showers to h5 files usable in the VAE model.
- root2h5.py: translates a ROOT file with showers to h5 files; a more general version (recommended). It allows simulating non-discrete energies and stores showers in 3D tensors (R x phi x z).
- core/constants.py: defines the set of common variables.
- core/model.py: defines the VAE model class and a handler to construct the model.
- utils/preprocess.py: defines the data loading and preprocessing functions.
- utils/hyperparameter_tuner.py: defines the HyperparameterTuner class.
- utils/gpu_limiter.py: defines the logic responsible for GPU memory management.
- utils/observables.py: defines a set of observables that can be calculated from a shower.
- utils/plotter.py: defines plotting classes responsible for producing various plots of observables.
- train.py: performs model training.
- generate.py: generates showers using a saved VAE model.
- observables.py: defines a set of shower observables.
- validate.py: creates validation plots using shower observables.
- convert.py: defines the conversion function to an ONNX file.
- tune_model.py: performs hyperparameter optimization.

## Getting Started

The `setup.py` script creates the folders used to save model checkpoints, generated showers, and validation plots.

```
python3 setup.py
```

## Full simulation dataset

The full simulation dataset can be downloaded from [Zenodo](https://zenodo.org/record/6082201#.Ypo5UeDRaL4).

If a custom simulation is used, the output of the full simulation must be translated to h5 files. The `root2h5_for_vae.py` script is recommended for use with the provided VAE model. For all other use cases the `root2h5.py` script is recommended, as it does not assume that the simulation is run with discrete energies (e.g. GPS can be used within the Geant4 simulation instead of the particle gun).
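
For orientation, the resulting h5 files can be inspected with `h5py`; the file and dataset names below are hypothetical and should be checked against the actual layout:

```
# Minimal sketch: inspect an h5 shower file produced by root2h5.py.
# File and dataset names are hypothetical; list the keys to find the layout.
import h5py

with h5py.File("showers_SiW_64GeV_90deg.h5", "r") as f:
    print(list(f.keys()))       # discover the stored datasets
    showers = f["showers"][:]   # hypothetical dataset name
    print(showers.shape)        # expected: (events, R, phi, z)
```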

## Training

To launch the training:

```
python3 train.py
```

You may specify the following three flags; if you do not, default values will be used.

```--max-gpu-memory-allocation``` specifies the maximum memory allocation on a single logical GPU. Should be given as
an integer.

```--gpu-ids``` specifies the IDs of the physical GPUs. Should be given as a string of comma-separated values with no
spaces. If you specify more than one GPU, ```tf.distribute.MirroredStrategy``` will automatically be applied to the
training.

```--study-name``` specifies a study name. This name is used as the experiment name in the W&B dashboard and as the
name of the directory for saving models.
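
As an illustration only (the actual logic lives in `utils/gpu_limiter.py`), the two GPU flags plausibly translate into TensorFlow device configuration along these lines; the memory unit (MB) is an assumption:

```
# Illustrative sketch of what --gpu-ids and --max-gpu-memory-allocation
# could configure; see utils/gpu_limiter.py for the actual implementation.
import tensorflow as tf

gpu_ids = [0, 1]        # parsed from --gpu-ids="0,1"
memory_limit = 8192     # parsed from --max-gpu-memory-allocation (assumed MB)

gpus = tf.config.list_physical_devices("GPU")
visible = [gpus[i] for i in gpu_ids]
tf.config.set_visible_devices(visible, "GPU")
for gpu in visible:
    tf.config.set_logical_device_configuration(
        gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit)])

# With more than one visible GPU, training runs under MirroredStrategy:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    pass  # build and compile the model here
```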

See ```run.sh``` and ```condor.sub``` for training on HTCondor.
Note: the W&B API key is hardcoded and needs to be added manually in order to log stats to Weights & Biases.

## Hyperparameter tuning

If you want to tune hyperparameters, specify the parameters to be tuned in `tune_model.py`. There are three types of
parameters: discrete, continuous, and categorical. Discrete and continuous parameters require a range specification
(low, high), while a categorical parameter requires a list of possible values to choose from. Then run:

```
python3 tune_model.py
```
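
The three parameter types map naturally onto an Optuna-style search space, which matches the study/storage vocabulary used here; the parameter names and ranges below are hypothetical examples, not the actual ones in `tune_model.py`:

```
# Illustration of the three tunable parameter types with Optuna;
# parameter names and ranges are hypothetical.
import optuna

def objective(trial):
    # discrete: integer range (low, high)
    latent_dim = trial.suggest_int("latent_dim", 8, 32)
    # continuous: float range (low, high), sampled on a log scale
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    # categorical: explicit list of possible values
    activation = trial.suggest_categorical("activation", ["relu", "elu", "selu"])
    # a real objective would train the model and return the validation loss
    return 0.0

study = optuna.create_study(study_name="vae_tuning")
study.optimize(objective, n_trials=50)
```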

If you want to parallelize the tuning process, you need to specify a common storage (preferably a MySQL database) by
setting `--storage="URL_TO_MYSQL_DATABASE"`. Then you can run multiple processes with the same command:

```
python3 tune_model.py --storage="URL_TO_MYSQL_DATABASE"
```

As with the training procedure, you may specify ```--max-gpu-memory-allocation```, ```--gpu-ids``` and
```--study-name```.

## ML shower generation (MLFastSim)

To generate showers using the ML model, use the `generate.py` script and specify the geometry, energy, and angle of
the particle, as well as the epoch of the saved checkpoint model. The number of events to generate can also be
specified (set to 10,000 by default):

```
python3 generate.py --geometry=SiW --energy=64 --angle=90 --epoch=1000 --study-name=YOUR_STUDY_NAME
```

If you do not specify an epoch number, the best model (saved as ```VAEbest```) will be used for shower generation.
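
Conceptually, generation amounts to decoding samples drawn from the latent space; a minimal sketch, assuming the decoder part of the model can be loaded as a standalone Keras model and ignoring the conditioning on geometry, energy, and angle:

```
# Minimal sketch of VAE generation: decode Gaussian latent samples.
# The checkpoint path is hypothetical; conditioning inputs are omitted.
import numpy as np
from tensorflow import keras

decoder = keras.models.load_model("VAEbest")      # hypothetical path
latent_dim = 10                                   # must match the trained model
z = np.random.normal(size=(10_000, latent_dim))   # samples from the prior
showers = decoder.predict(z)                      # generated showers
```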

## Validation

To validate MLFastSim against the full simulation, use the `validate.py` script and specify the geometry, energy and
angle of the particle:

```
python3 validate.py --geometry=SiW --energy=64 --angle=90
```
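
Each validation plot compares a shower observable between the full simulation and MLFastSim; schematically, with placeholder arrays standing in for values computed by `utils/observables.py`:

```
# Schematic comparison of one observable; the arrays are placeholders,
# not real simulation output.
import matplotlib.pyplot as plt
import numpy as np

full_sim = np.random.normal(size=1000)  # placeholder for computed observable
ml_sim = np.random.normal(size=1000)    # placeholder for computed observable

bins = np.linspace(-4, 4, 41)
plt.hist(full_sim, bins=bins, histtype="step", label="FullSim")
plt.hist(ml_sim, bins=bins, histtype="step", label="MLFastSim")
plt.xlabel("observable")
plt.legend()
plt.savefig("validation_observable.png")
```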

## Conversion

After training and validation, the model can be converted into a format that can be used in C++, such as ONNX, using
the `convert.py` script:

```
python3 convert.py --epoch 1000
```
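
For reference, a Keras-to-ONNX conversion can be written with `tf2onnx`; a minimal sketch assuming a loadable Keras model and hypothetical paths (the actual logic is in `convert.py`):

```
# Minimal sketch of a Keras -> ONNX conversion with tf2onnx; paths hypothetical.
import tf2onnx
from tensorflow import keras

model = keras.models.load_model("VAEbest")
tf2onnx.convert.from_keras(model, output_path="model.onnx")
```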