How to build a CUDA GPU-enabled Singularity container
that can be used with JANA2.


These are some pretty specific instructions for building
a Singularity container that can be used to build and run
a JANA2 program that can access Nvidia GPUs via CUDA on
the host computer. These have not been tested across OS
flavors/versions or other software versions.


Building the Singularity container
-----------------------------------
These instructions involve creating a Docker image locally and
then producing a Singularity image from that. They were developed
on a computer running RHEL7.9 with Docker 20.10.13 installed.
Singularity version 3.9.5 was used.

Docker is used to access the Nvidia-supplied CUDA image from
Docker Hub as the base for the image. Several packages are
installed in a new Docker image, mainly to allow building ROOT
with the Singularity container. If you do not need ROOT or
ZeroMQ support in JANA2, then you can skip building your own
Docker image and just pull the one supplied by Nvidia into a
Singularity image directly with:

singularity build cuda_11.4.2-devel-ubuntu20.04.sif docker://nvidia/cuda:11.4.2-devel-ubuntu20.04


The Dockerfile in this directory can be used to build an image
that will allow building JANA2 with support for both ROOT and
ZeroMQ. To build the Singularity image, execute the two commands
below.

NOTE: This will result in a Singularity image that is about
3GB. Docker claims its image takes 5.5GB. Make sure you have
plenty of disk space available for both. Also, be aware that by
default Singularity uses a subdirectory in your home directory
for its cache, so you may run into issues there if you have
limited space.
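
If home-directory space is a problem, you can point the cache
(and the temporary build area) somewhere with more room before
running the build commands. A minimal sketch, assuming a
/scratch filesystem is available (adjust the path for your
system):

# Relocate the Singularity cache and temp space (optional)
export SINGULARITY_CACHEDIR=/scratch/${USER}/singularity_cache
export SINGULARITY_TMPDIR=/scratch/${USER}/singularity_tmp
mkdir -p ${SINGULARITY_CACHEDIR} ${SINGULARITY_TMPDIR}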

NOTE: You do NOT actually need to have CUDA or a GPU installed
on the computer where you create the image(s). You can transfer
the image to a computer with one or more GPUs and the CUDA
drivers installed in order to actually use it.
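
Once the image is on the GPU machine, a quick way to confirm
that the GPUs are visible from inside the container is to run
nvidia-smi through it (the --nv option maps the host driver and
device files into the container). The image name below is the
one built in the next step:

# Run on the machine that actually has the GPU(s) and driver installed
singularity exec --nv epsci_cuda_11.4.2-devel-ubuntu20.04.sif nvidia-smi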


docker build -f Dockerfile -t epsci/cuda:11.4.2-devel-ubuntu20.04 .

singularity build epsci_cuda_11.4.2-devel-ubuntu20.04.sif docker-daemon://epsci/cuda:11.4.2-devel-ubuntu20.04
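
It can be worth a quick sanity check that the build tools the
Dockerfile was meant to install actually made it into the image
(the exact tool set depends on your Dockerfile):

# Spot-check the toolchain inside the new image
singularity exec epsci_cuda_11.4.2-devel-ubuntu20.04.sif gcc --version
singularity exec epsci_cuda_11.4.2-devel-ubuntu20.04.sif cmake --version
singularity exec epsci_cuda_11.4.2-devel-ubuntu20.04.sif nvcc --version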


Building ROOT
------------------
If you are interested in building ROOT with the container, then
here are some instructions. If you don't need ROOT, then skip
this section.


The following commands will check out, build, and install ROOT
version 6.26.02 in the local working directory. Note that you
may be able to build this *much* faster on a ramdisk if you have
enough memory. Just make sure to adjust the install location to
somewhere more permanent.


singularity run epsci_cuda_11.4.2-devel-ubuntu20.04.sif
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake -DCMAKE_INSTALL_PREFIX=../root-6.26.02-install -DCMAKE_CXX_STANDARD=14 ../root-6.26.02
make -j48 install
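
Once the install finishes, you can confirm the build is usable
(still inside the container) by sourcing its setup script and
checking the version:

# Verify the ROOT installation
source ../root-6.26.02-install/bin/thisroot.sh
root-config --version
root -b -q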


TROUBLESHOOTING:

Some centralized installations of Singularity may be configured
at a system level to automatically mount one or more network drives.
If the system you are using does not have access to all of these,
Singularity may fail to launch properly. In that case you can tell
it not to bind any default directories with the "-c" option and then
explicitly bind the directories you need. For example, here is how
ROOT could be built using a ramdisk for the build and a
network-mounted directory, /gapps, for the install (this assumes a
ramdisk is already mounted under /media/ramdisk):

singularity run -c -B /media/ramdisk,/gapps epsci_cuda_11.4.2-devel-ubuntu20.04.sif
cd /media/ramdisk
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake \
    -DCMAKE_INSTALL_PREFIX=/gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/ \
    -DCMAKE_CXX_STANDARD=14 \
    ../root-6.26.02
make -j48 install


X11:

If you run this ROOT from the container and can't open any graphics
windows, it may be because you ran Singularity with the "-c" option
and your ~/.Xauthority file is not available. Just start the
container again with this explicitly bound. For example:

singularity run -c -B /gapps,${HOME}/.Xauthority epsci_cuda_11.4.2-devel-ubuntu20.04.sif
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh
root
root [0] TCanvas c("c","", 400, 400)
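
If a canvas still does not appear, it is worth checking on the
host (before launching the container) that X forwarding is in
place, since Singularity passes DISPLAY through from your
environment:

# On the host, before starting the container
echo ${DISPLAY}              # should be set (e.g. :0 or localhost:10.0)
ls -l ${HOME}/.Xauthority    # must exist so it can be bound into the container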


Building JANA2 with CUDA and libtorch:
---------------------------------------

Singularity makes it really easy to access Nvidia GPUs from within
a container by just adding the "--nv" option. Start a Singularity
container with something like the following command and then use it
for the rest of these instructions. Note that some of these
directories and versions represent a specific example, so adjust for
your specific system as appropriate.


setenv SINGIMG /gapps/singularity/epsci_cuda_11.4.2-devel-ubuntu20.04.sif
singularity run -c -B /media/ramdisk,/gapps,/gluonwork1,${HOME}/.Xauthority --nv ${SINGIMG}

# cd to your work area inside the container, then record it as WORKDIR
# (an absolute path so the "cd ${WORKDIR}" commands below return here)
cd /media/ramdisk        # or wherever your work area is bound
export WORKDIR=${PWD}

# Unpack libtorch and cudnn (you must download these separately)
# n.b. Ubuntu 20.04 using gcc 9.3.0 needs to use:
#   libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
# while RHEL7 using SCL 9.3.0 needs to use:
#   libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
cd ${WORKDIR}
unzip libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
tar xzf cudnn-11.4-linux-x64-v8.2.4.15.tgz
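
A quick look at what was unpacked can save confusion later. The
libtorch zip should produce a libtorch/ directory and the cuDNN
tarball a cuda/ directory, which are the names assumed by the
environment variables set below:

# Confirm the expected directory layout
ls ${WORKDIR}/libtorch/lib | head
ls ${WORKDIR}/cuda/include | grep -i cudnn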


# Set up the build environment (WORKDIR was set above)
export CMAKE_PREFIX_PATH=${WORKDIR}/libtorch/share/cmake/Torch:${CMAKE_PREFIX_PATH}
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.4
export CUDACXX=${CUDA_TOOLKIT_ROOT_DIR}/bin/nvcc
export CUDNN_LIBRARY_PATH=${WORKDIR}/cuda/lib64
export CUDNN_INCLUDE_PATH=${WORKDIR}/cuda/include
export PATH=${CUDA_TOOLKIT_ROOT_DIR}/bin:${PATH}
export LD_LIBRARY_PATH=${WORKDIR}/libtorch/lib:${CUDA_TOOLKIT_ROOT_DIR}/lib64:${CUDNN_LIBRARY_PATH}:${LD_LIBRARY_PATH}
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh
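
At this point the CUDA toolkit, libtorch, and ROOT should all
resolve from the environment just set. A quick check:

# Quick sanity check of the environment
which nvcc
nvcc --version
root-config --version
echo ${CMAKE_PREFIX_PATH}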



# Clone and build JANA2
# (n.b. the git clone failed to work from within the Singularity
# container on my internal system, so I had to run that outside
# of the container.)
cd ${WORKDIR}
git clone https://github.com/JeffersonLab/JANA2
mkdir JANA2/build
cd JANA2/build
cmake -DCMAKE_INSTALL_PREFIX=../install \
    -DUSE_PYTHON=ON \
    -DUSE_ROOT=ON \
    -DUSE_ZEROMQ=ON \
    -DCMAKE_BUILD_TYPE=Debug \
    ../
make -j48 install
source ../install/bin/jana-this.sh
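
Before generating the plugin, it is worth confirming that the
freshly installed executables are the ones on your PATH
(jana-this.sh should have arranged this):

# Confirm the new install is what will be picked up
which jana
which jana-generate.py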


# Create and build a JANA2 plugin
cd ${WORKDIR}
jana-generate.py Plugin JANAGPUTest 1 0 1
mkdir JANAGPUTest/build
cd JANAGPUTest/build
cmake -DCMAKE_BUILD_TYPE=Debug ../
make -j48 install


# Test that the plugin works without GPU or libtorch before continuing
jana -PPLUGINS=JTestRoot,JANAGPUTest


# Add a factory that uses the GPU
cd ${WORKDIR}/JANAGPUTest
jana-generate.py JFactory GPUPID

< edit JANAGPUTest.cc to add the factory generator (see the comments at the top of JFactory_GPUPID.cc) >


# Add CUDA/libtorch to the plugin's CMakeLists.txt
cd ${WORKDIR}/JANAGPUTest

< edit CMakeLists.txt and add the following in the appropriate places:

# At the top, right under the project(...) line
enable_language(CUDA)


# Right after the line with "find_package(ROOT REQUIRED)"
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

# This is just a modification of the existing target_link_libraries line
target_link_libraries(JANAGPUTest_plugin ${JANA_LIBRARY} ${ROOT_LIBRARIES} ${TORCH_LIBRARIES})

>


# Add code that uses libtorch

< edit JFactory_GPUPID.cc to include the following:

// Place this at the top of the file
#include <torch/torch.h>

// Place these in the Init() method
torch::Tensor tensor = torch::rand({2, 3});
std::cout << tensor << std::endl;
>

# A CUDA kernel can also be added and called without using libtorch.
# Create a file called tmp.cu with the following content:

#include <stdio.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int cuda_hello_world() {
    printf("%s:%d\n", __FILE__, __LINE__);
    cuda_hello<<<2,3>>>();      // launch 2 blocks of 3 threads each
    cudaDeviceSynchronize();    // wait for the kernel so its printf output is flushed
    printf("%s:%d\n", __FILE__, __LINE__);
    return 0;
}

# Declare the function near the top of JFactory_GPUPID.cc and
# add a call to it in JFactory_GPUPID::Init():

int cuda_hello_world();   // declared here, defined in tmp.cu

cuda_hello_world();



# Rebuild the plugin
cd ${WORKDIR}/JANAGPUTest/build
rm CMakeCache.txt
cmake \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR} \
    -DCUDNN_LIBRARY_PATH=${CUDNN_LIBRARY_PATH} \
    -DCUDNN_INCLUDE_PATH=${CUDNN_INCLUDE_PATH} \
    -DCMAKE_BUILD_TYPE=Debug \
    ../
make -j48 install
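
Before running it, you can check that the rebuilt plugin really
links against libtorch and the CUDA libraries by inspecting its
dynamic dependencies. A sketch, assuming the shared object ends
up under the build directory (the exact name and location depend
on how jana-generate.py laid out the project):

# Look for torch/CUDA entries among the plugin's shared library dependencies
find ${WORKDIR}/JANAGPUTest/build -name "*.so" -exec ldd {} \; | grep -iE "torch|cudart|cudnn" | sort -u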


# Test the plugin. You should see a message with values from the libtorch
# tensor followed by 6 Hello World messages from the CUDA kernel.
jana -PPLUGINS=JTestRoot,JANAGPUTest -PAUTOACTIVATE=GPUPID


Note: You can confirm that this is using the GPU by checking the
output of "nvidia-smi" while the job is running. The jana program
should be listed at the bottom of the output.
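
For example, from a second terminal on the host (no container
needed) you can watch the GPU while the test runs:

# Refresh the GPU status every second; jana should appear in the process list
watch -n 1 nvidia-smi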
0268