# Working with development containers <!-- {docsify-ignore-all} -->

## Building a CUDA Singularity container

These instructions involve creating a Docker image locally and then
producing a Singularity image from that. They were developed on a
computer running RHEL 7.9 with Docker 20.10.13 installed. Singularity
version 3.9.5 was used.

Docker is used to pull the Nvidia-supplied CUDA image from Docker Hub
as a base for the new image. Several packages are installed on top of
it, mainly so that ROOT can be built inside the Singularity container.
If you do not need ROOT or ZeroMQ support in JANA2, then you can skip
building your own Docker image and pull the one supplied by Nvidia
into a Singularity image directly with:

```bash
singularity build cuda_11.4.2-devel-ubuntu20.04.sif docker://nvidia/cuda:11.4.2-devel-ubuntu20.04
```

The Dockerfile in this directory can be used to build an image that
will allow building JANA2 with support for both ROOT and ZeroMQ. To
build the Singularity image, execute the two commands below.

NOTE: This will result in a Singularity image that is about 3 GB;
Docker reports its image takes 5.5 GB. Make sure you have plenty of
disk space available for both. Also, be aware that by default
Singularity uses a subdirectory of your home directory for its cache,
so you may run into issues there if you have limited space.
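
If home-directory space is a concern, Singularity honors the
`SINGULARITY_CACHEDIR` environment variable, so the build cache can be
redirected somewhere roomier (the path below is only a placeholder):

```bash
# Optional: move the Singularity build cache off your home directory.
# /scratch/${USER} is just an example location; use any filesystem
# with enough free space.
export SINGULARITY_CACHEDIR=/scratch/${USER}/singularity-cache
mkdir -p ${SINGULARITY_CACHEDIR}
```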

NOTE: You do NOT need CUDA or a GPU installed on the computer where
you create these images. You can transfer the image to a computer
with one or more GPUs and the CUDA drivers installed in order to
use it.

```bash
docker build -f Dockerfile -t epsci/cuda:11.4.2-devel-ubuntu20.04 .

singularity build epsci_cuda_11.4.2-devel-ubuntu20.04.sif docker-daemon://epsci/cuda:11.4.2-devel-ubuntu20.04
```
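
As a quick, optional sanity check you can confirm the CUDA toolkit is
present in the image and, on a host that does have GPUs and drivers
installed, that they are visible through the `--nv` option:

```bash
# Toolkit inside the image (works on any host)
singularity exec epsci_cuda_11.4.2-devel-ubuntu20.04.sif nvcc --version

# GPU visibility (only meaningful on a host with GPUs and CUDA drivers)
singularity exec --nv epsci_cuda_11.4.2-devel-ubuntu20.04.sif nvidia-smi
```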

## Building ROOT

If you are interested in building ROOT with the container, here are
some instructions. If you don't need ROOT then skip this section.

The following commands will check out, build, and install ROOT
version 6.26.02 in the local working directory. Note that you may be
able to build this *much* faster on a ramdisk if you have enough
memory. Just make sure to adjust the install location to somewhere
more permanent.

```bash
singularity run epsci_cuda_11.4.2-devel-ubuntu20.04.sif
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake -DCMAKE_INSTALL_PREFIX=../root-6.26.02-install -DCMAKE_CXX_STANDARD=14 ../root-6.26.02
make -j48 install
```
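
To confirm the installation from inside the container, something like
the following should work (run from the root-build/ directory, using
the install prefix chosen above):

```bash
source ../root-6.26.02-install/bin/thisroot.sh
root -b -q    # should print the ROOT banner and exit cleanly
```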


### TROUBLESHOOTING

Some centralized installations of Singularity may be configured at a
system level to automatically mount one or more network drives. If
the system you are using does not have access to all of these,
Singularity may fail to launch properly. In that case you can tell it
not to bind any default directories with the "-c" option and then
explicitly bind the directories you need. For example, here is how
ROOT could be built using a ramdisk for the build and a network-mounted
directory, /gapps, for the install (this assumes a ramdisk is already
mounted under /media/ramdisk):

```bash
singularity run -c -B /media/ramdisk,/gapps epsci_cuda_11.4.2-devel-ubuntu20.04.sif
cd /media/ramdisk
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake \
    -DCMAKE_INSTALL_PREFIX=/gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/ \
    -DCMAKE_CXX_STANDARD=14 \
    ../root-6.26.02

make -j48 install
```


### X11

If you run this ROOT build from the container and can't open any
graphics windows, it may be because you ran Singularity with the "-c"
option and your ~/.Xauthority file is not available. Just start the
container again with it explicitly bound. For example:

```bash
singularity run -c -B /gapps,${HOME}/.Xauthority epsci_cuda_11.4.2-devel-ubuntu20.04.sif
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh
root
root [0] TCanvas c("c","", 400, 400)
```


## Building JANA2 with CUDA and libtorch

Singularity makes it easy to access Nvidia GPUs from within a
container by just adding the "--nv" option. Start a Singularity
container with something like the following command and then use it
for the rest of these instructions. Note that some of these
directories and versions represent a specific example, so adjust for
your system as appropriate.

```
# n.b. setenv is tcsh syntax; with a bash host shell use "export SINGIMG=..."
setenv SINGIMG /gapps/singularity/epsci_cuda_11.4.2-devel-ubuntu20.04.sif
singularity run -c -B /media/ramdisk,/gapps,/gluonwork1,${HOME}/.Xauthority --nv ${SINGIMG}

# Unpack libtorch and cuDNN (you must download these separately)
# n.b. Ubuntu 20.04 using gcc 9.3.0 needs the cxx11 ABI build:
#    libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
# while RHEL7 using gcc 9.3.0 from SCL needs the pre-cxx11 ABI build:
#    libtorch-shared-with-deps-1.11.0+cu113.zip
export WORKDIR=${PWD}    # top-level work directory used throughout; set it before first use
cd ${WORKDIR}
unzip libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
tar xzf cudnn-11.4-linux-x64-v8.2.4.15.tgz


# Setup environment (WORKDIR was already set above)
export CMAKE_PREFIX_PATH=${WORKDIR}/libtorch/share/cmake/Torch:${CMAKE_PREFIX_PATH}
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.4
export CUDACXX=${CUDA_TOOLKIT_ROOT_DIR}/bin/nvcc
export CUDNN_LIBRARY_PATH=${WORKDIR}/cuda/lib64
export CUDNN_INCLUDE_PATH=${WORKDIR}/cuda/include
export PATH=${CUDA_TOOLKIT_ROOT_DIR}/bin:${PATH}
export LD_LIBRARY_PATH=${WORKDIR}/libtorch/lib:${CUDA_TOOLKIT_ROOT_DIR}/lib64:${CUDNN_LIBRARY_PATH}:${LD_LIBRARY_PATH}
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh
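
# (optional) quick sanity checks that the pieces above are actually
# visible before building; exact version numbers will differ
which nvcc && nvcc --version
ls ${WORKDIR}/libtorch/share/cmake/Torch/TorchConfig.cmake
root-config --version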


# Clone and build JANA2
# (n.b. the git clone failed to work from within this singularity
# container on my internal system so I had to run that outside
# of the container.)
cd ${WORKDIR}
git clone https://github.com/JeffersonLab/JANA2
mkdir JANA2/build
cd JANA2/build
cmake -DCMAKE_INSTALL_PREFIX=../install \
    -DUSE_PYTHON=ON \
    -DUSE_ROOT=ON \
    -DUSE_ZEROMQ=ON \
    -DCMAKE_BUILD_TYPE=Debug \
    ../
make -j48 install
source ../install/bin/jana-this.sh


# Create and build a JANA2 plugin
cd ${WORKDIR}
jana-generate.py Plugin JANAGPUTest 1 0 1
mkdir JANAGPUTest/build
cd JANAGPUTest/build
cmake -DCMAKE_BUILD_TYPE=Debug ../
make -j48 install


# Test plugin works without GPU or libtorch before continuing
jana -PPLUGINS=JTestRoot,JANAGPUTest


# Add a factory which uses the GPU
cd ${WORKDIR}/JANAGPUTest
jana-generate.py JFactory GPUPID

< edit JANAGPUTest.cc to add the factory generator (see comments at top of JFactory_GPUPID.cc) >
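
# For reference, that edit typically looks something like the sketch
# below. This is illustrative only: the exact header and class names
# come from what jana-generate.py produced, so adjust to match your
# generated files.
#
#   // near the other includes at the top of JANAGPUTest.cc
#   #include <JANA/JFactoryGenerator.h>
#   #include "JFactory_GPUPID.h"
#
#   // inside InitPlugin(), after InitJANAPlugin(app)
#   app->Add(new JFactoryGeneratorT<JFactory_GPUPID>());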


# Add CUDA/libtorch to plugin's CMakeLists.txt
cd ${WORKDIR}/JANAGPUTest

< edit CMakeLists.txt and add the following in the appropriate places:

# At the top, right under the project(...) line
enable_language(CUDA)


# Right after the line with "find_package(ROOT REQUIRED)"
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

# This is just a modification of the existing target_link_libraries line
target_link_libraries(JANAGPUTest_plugin ${JANA_LIBRARY} ${ROOT_LIBRARIES} ${TORCH_LIBRARIES})

>


# Add code that uses libtorch

< edit JFactory_GPUPID.cc to include the following:

// Place this at the top of the file
#include <torch/torch.h>

// Place these in the Init() method
torch::Tensor tensor = torch::rand({2, 3});
std::cout << tensor << std::endl;
>

# A CUDA kernel can also be added and called without using libtorch.
# Create a file called tmp.cu with the following content:

#include <stdio.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int cuda_hello_world() {
    printf("%s:%d\n", __FILE__,__LINE__);
    cuda_hello<<<2,3>>>();
    cudaDeviceSynchronize();   // wait for the kernel so the device-side printf output is flushed
    printf("%s:%d\n", __FILE__,__LINE__);
    return 0;
}

# Add a call in JFactory_GPUPID::Init() to

cuda_hello_world();

# (the function is defined in tmp.cu, so also declare it near the top
#  of JFactory_GPUPID.cc with:  int cuda_hello_world(); )


# Rebuild the plugin
cd ${WORKDIR}/JANAGPUTest/build
rm CMakeCache.txt
cmake \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR} \
    -DCUDNN_LIBRARY_PATH=${CUDNN_LIBRARY_PATH} \
    -DCUDNN_INCLUDE_PATH=${CUDNN_INCLUDE_PATH} \
    -DCMAKE_BUILD_TYPE=Debug \
    ../
make -j48 install


# Test the plugin. You should see a message with values from the libtorch
# tensor followed by 6 Hello World messages from the CUDA kernel.
jana -PPLUGINS=JTestRoot,JANAGPUTest -PAUTOACTIVATE=GPUPID
```

Note: You can confirm that this is using the GPU by checking the
output of "nvidia-smi" while the test is running. The jana program
should be listed at the bottom of the output. For example:
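
```bash
# In a second terminal on the GPU host, watch nvidia-smi while the
# test above runs; "jana" should appear in the process list.
watch -n 1 nvidia-smi
```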