How to build a CUDA GPU-enabled Singularity container
that can be used with JANA2.


These are some fairly specific instructions for building
a Singularity container that can be used to build and run
a JANA2 program that accesses Nvidia GPUs via CUDA on
the host computer. They have not been tested across OS
flavors/versions or other software versions.


Building the Singularity container
-----------------------------------
These instructions involve creating a Docker image locally
and then producing a Singularity image from that. They were
developed on a computer running RHEL7.9 with Docker 20.10.13
installed. Singularity version 3.9.5 was used.

Docker is used to pull the Nvidia-supplied cuda image
from Docker Hub as a base for the image. Several packages
are installed in a new Docker image, mainly to allow building
ROOT with the Singularity container. If you do not need to
use ROOT or ZeroMQ support in JANA2, then you can skip building
your own Docker image and just pull the one supplied by Nvidia
into a Singularity image directly with:

  singularity build cuda_11.4.2-devel-ubuntu20.04.sif docker://nvidia/cuda:11.4.2-devel-ubuntu20.04
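
If you go this route, a quick way to check that the image is usable is
to ask it for the CUDA compiler version (nvcc should be on the default
PATH inside the Nvidia cuda images):

  singularity exec cuda_11.4.2-devel-ubuntu20.04.sif nvcc --version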


The Dockerfile in this directory can be used to build an image
that will allow building JANA2 with support for both ROOT and
ZeroMQ. To build the Singularity image, execute the two commands
below.

NOTE: This will result in a Singularity image that is about
3GB. Docker claims its image takes 5.5GB. Make sure you have
plenty of disk space available for both. Also, be aware that by
default, Singularity uses a subdirectory in your home directory
for its cache so you may run into issues there if you have limited
space.
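
If home directory space is a concern, you can point the Singularity
cache (and temporary build space) somewhere larger before building.
The /scratch location below is just an example; use whatever directory
has space on your system:

  export SINGULARITY_CACHEDIR=/scratch/${USER}/singularity_cache
  export SINGULARITY_TMPDIR=/scratch/${USER}/singularity_tmp
  mkdir -p ${SINGULARITY_CACHEDIR} ${SINGULARITY_TMPDIR}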

NOTE: You do NOT actually need to have CUDA or a GPU installed
on the computer where you create the image(s). You can transfer
the image to a computer with one or more GPUs and the CUDA
drivers installed to actually use it.


docker build -f Dockerfile -t epsci/cuda:11.4.2-devel-ubuntu20.04 .

singularity build epsci_cuda_11.4.2-devel-ubuntu20.04.sif docker-daemon://epsci/cuda:11.4.2-devel-ubuntu20.04
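
To verify the resulting image is usable, a couple of quick checks like
the following should work (exactly which tools are present depends on
what the Dockerfile installs, so adjust as needed):

singularity exec epsci_cuda_11.4.2-devel-ubuntu20.04.sif nvcc --version
singularity exec epsci_cuda_11.4.2-devel-ubuntu20.04.sif cmake --version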



Building ROOT
------------------
If you are interested in building ROOT with the container, then
here are some instructions. If you don't need ROOT then skip
this section.


The following commands will check out, build, and install ROOT
version 6.26.02 in the local working directory. Note that you
may be able to build this *much* faster on a ramdisk if you have
enough memory. Just make sure to adjust the install location to
somewhere more permanent.


singularity run epsci_cuda_11.4.2-devel-ubuntu20.04.sif
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake -DCMAKE_INSTALL_PREFIX=../root-6.26.02-install -DCMAKE_CXX_STANDARD=14 ../root-6.26.02
make -j48 install
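
Once the install finishes, a minimal smoke test is to set up the freshly
installed ROOT (still inside the container, from the root-build/
directory used above) and start it in batch mode:

source ../root-6.26.02-install/bin/thisroot.sh
root -b -q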


TROUBLESHOOTING:

Some centralized installations of Singularity may be configured
at a system level to automatically mount one or more network drives.
If the system you are using does not have access to all of these,
singularity may fail to launch properly. In that case you can tell
it not to bind any default directories with the "-c" option and then
explicitly bind the directories you need. For example, here is how
it could be built using a ramdisk for the build and a network mounted
directory, /gapps, for the install (this assumes a ramdisk is already
mounted under /media/ramdisk):

singularity run -c -B /media/ramdisk,/gapps epsci_cuda_11.4.2-devel-ubuntu20.04.sif
cd /media/ramdisk
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake \
    -DCMAKE_INSTALL_PREFIX=/gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/ \
    -DCMAKE_CXX_STANDARD=14 \
    ../root-6.26.02

make -j48 install

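If the build fails right away, a quick thing to check from inside the
container is that both bind mounts actually showed up:

df -h /media/ramdisk /gapps
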

X11:

If you run this ROOT from the container and can't open any graphics
windows, it may be because you ran singularity with the "-c" option
and your ~/.Xauthority file is not available. Just start the container
again with this explicitly bound. For example:

singularity run -c -B /gapps,${HOME}/.Xauthority epsci_cuda_11.4.2-devel-ubuntu20.04.sif
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh
root
root [0] TCanvas c("c","", 400, 400)

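If the canvas still fails to appear, it can also help to confirm from
inside the container that DISPLAY is set and that the X authority file
really is visible:

echo ${DISPLAY}
ls -l ${HOME}/.Xauthority
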

Building JANA2 with CUDA and libtorch:
---------------------------------------

Singularity makes it really easy to access Nvidia GPUs from within
a container: just add the "--nv" option. Create a singularity
container with something like the following command and then use it
for the rest of these instructions. Note that some of these directories
and versions represent a specific example, so adjust for your specific
system as appropriate.


setenv SINGIMG /gapps/singularity/epsci_cuda_11.4.2-devel-ubuntu20.04.sif
singularity run -c -B /media/ramdisk,/gapps,/gluonwork1,${HOME}/.Xauthority --nv ${SINGIMG}
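
# Once inside the container the host GPU(s) should be visible thanks to
# the "--nv" option. A quick check before going further:
nvidia-smi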

# Unpack libtorch and cudnn (you must download these separately)
# n.b. Ubuntu 20.04 using gcc 9.3.0 needs to use:
# libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
# while RHEL7 using SCL gcc 9.3.0 needs the pre-cxx11 ABI build:
# libtorch-shared-with-deps-1.11.0+cu113.zip
export WORKDIR=${PWD}   # set this first; it is used throughout the steps below
cd ${WORKDIR}
unzip libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
tar xzf cudnn-11.4-linux-x64-v8.2.4.15.tgz


# Setup environment
export CMAKE_PREFIX_PATH=${WORKDIR}/libtorch/share/cmake/Torch:${CMAKE_PREFIX_PATH}
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.4
export CUDACXX=${CUDA_TOOLKIT_ROOT_DIR}/bin/nvcc
export CUDNN_LIBRARY_PATH=${WORKDIR}/cuda/lib64
export CUDNN_INCLUDE_PATH=${WORKDIR}/cuda/include
export PATH=${CUDA_TOOLKIT_ROOT_DIR}/bin:${PATH}
export LD_LIBRARY_PATH=${WORKDIR}/libtorch/lib:${CUDA_TOOLKIT_ROOT_DIR}/lib64:${CUDNN_LIBRARY_PATH}:${LD_LIBRARY_PATH}
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh
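
# Optional sanity check that the compiler and libraries are where the
# environment above expects them (nvcc, the cudnn libraries, and the
# libtorch cmake config should all be found)
nvcc --version
ls ${CUDNN_LIBRARY_PATH}/libcudnn*
ls ${WORKDIR}/libtorch/share/cmake/Torch/TorchConfig.cmake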



# Clone and build JANA2
# (n.b. the git clone failed to work from within this singularity
# container on my internal system so I had to run that outside
# of the container.)
cd ${WORKDIR}
git clone https://github.com/JeffersonLab/JANA2
mkdir JANA2/build
cd JANA2/build
cmake -DCMAKE_INSTALL_PREFIX=../install \
        -DUSE_PYTHON=ON \
        -DUSE_ROOT=ON \
        -DUSE_ZEROMQ=ON \
        -DCMAKE_BUILD_TYPE=Debug \
        ../
make -j48 install
source ../install/bin/jana-this.sh
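
# Quick check that the JANA2 tools are now on the PATH
# (jana-this.sh above should have taken care of this)
which jana jana-generate.py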


# Create and build a JANA2 plugin
cd ${WORKDIR}
jana-generate.py Plugin JANAGPUTest 1 0 1
mkdir JANAGPUTest/build
cd JANAGPUTest/build
cmake -DCMAKE_BUILD_TYPE=Debug ../
make -j48 install


# Test that the plugin works without the GPU or libtorch before continuing
jana -PPLUGINS=JTestRoot,JANAGPUTest


# Add a factory which uses the GPU
cd ${WORKDIR}/JANAGPUTest
jana-generate.py JFactory GPUPID

< edit JANAGPUTest.cc to add the factory generator (see the comments at the top of JFactory_GPUPID.cc) >


# Add CUDA/libtorch to the plugin's CMakeLists.txt
cd ${WORKDIR}/JANAGPUTest

< edit CMakeLists.txt and add the following in the appropriate places:

  # At the top, right under the project(...) line
  enable_language(CUDA)


  # Right after the line with "find_package(ROOT REQUIRED)"
  find_package(Torch REQUIRED)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

  # This is just a modification of the existing target_link_libraries line
  target_link_libraries(JANAGPUTest_plugin ${JANA_LIBRARY} ${ROOT_LIBRARIES} ${TORCH_LIBRARIES})

>


# Add code that uses libtorch

< edit JFactory_GPUPID.cc to include the following:

  // Place this at the top of the file
  #include <torch/torch.h>

  // Place these in the Init() method
  torch::Tensor tensor = torch::rand({2, 3});
  std::cout << tensor << std::endl;
>

# A CUDA kernel can also be added and called without using libtorch.
# Create a file called tmp.cu with the following content:

   #include <stdio.h>

   // Trivial kernel: every GPU thread prints a message
   __global__ void cuda_hello(){
       printf("Hello World from GPU!\n");
   }

   // Host-side wrapper that launches the kernel on 2 blocks of 3 threads
   int cuda_hello_world() {
       printf("%s:%d\n", __FILE__,__LINE__);
       cuda_hello<<<2,3>>>();
       cudaDeviceSynchronize();  // wait for the kernel and flush its printf output
       printf("%s:%d\n", __FILE__,__LINE__);
       return 0;
   }

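# Optionally check that tmp.cu compiles on its own before wiring it into
# the plugin build (run from the directory holding tmp.cu; the object
# file is just a throwaway)
nvcc -c tmp.cu -o /tmp/tmp_cu_test.o
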
# Add a call in JFactory_GPUPID::Init() to

   cuda_hello_world();

# (declare it near the top of JFactory_GPUPID.cc, e.g. "int cuda_hello_world();",
# since it is defined in tmp.cu, and make sure tmp.cu is included in the
# plugin's source file list in CMakeLists.txt if it is not picked up
# automatically)


# Rebuild the plugin
cd ${WORKDIR}/JANAGPUTest/build
rm CMakeCache.txt
cmake \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR} \
    -DCUDNN_LIBRARY_PATH=${CUDNN_LIBRARY_PATH} \
    -DCUDNN_INCLUDE_PATH=${CUDNN_INCLUDE_PATH} \
    -DCMAKE_BUILD_TYPE=Debug \
    ../
make -j48 install


# Test the plugin. You should see a message with values from the libtorch
# tensor followed by 6 Hello World messages from the CUDA kernel.
jana -PPLUGINS=JTestRoot,JANAGPUTest -PAUTOACTIVATE=GPUPID


Note: You can confirm that this is using the GPU by checking the
output of "nvidia-smi" while running. The jana program should be
listed at the bottom of the output.