# Working with development containers <!-- {docsify-ignore-all} -->

## Building a CUDA Singularity container

These instructions involve creating a Docker image locally and then
producing a Singularity image from that. They were developed on a
computer running RHEL7.9 with Docker 20.10.13 installed. Singularity
version 3.9.5 was used.

Docker is used to access the Nvidia-supplied CUDA image from Docker Hub
as a base for the image. Several packages are installed in a new Docker
image, mainly to allow building ROOT with the Singularity container. If
you do not need to use ROOT or ZeroMQ support in JANA2, then you can
skip building your own Docker image and just pull the one supplied by
Nvidia into a Singularity image directly with:

```bash
singularity build cuda_11.4.2-devel-ubuntu20.04.sif docker://nvidia/cuda:11.4.2-devel-ubuntu20.04
```

The Dockerfile in this directory can be used to build an image that
allows building JANA2 with support for both ROOT and ZeroMQ. To build
the Singularity image, execute the two commands below.

NOTE: This will result in a Singularity image that is about 3GB. Docker
claims its image takes 5.5GB. Make sure you have plenty of disk space
available for both. Also, be aware that by default Singularity uses a
subdirectory of your home directory for its cache, so you may run into
issues there if you have limited space.
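
If home-directory space is a concern, you can point the cache elsewhere
before building. A minimal sketch, assuming a roomier filesystem is
mounted at /scratch (the path is just an example; the
SINGULARITY_CACHEDIR variable and the `singularity cache` subcommands
are standard in Singularity 3.x):

```bash
# Keep the Singularity cache off the home directory (example path)
export SINGULARITY_CACHEDIR=/scratch/${USER}/singularity-cache
mkdir -p ${SINGULARITY_CACHEDIR}

# Inspect or clean the cache if space gets tight
singularity cache list
singularity cache clean
```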

NOTE: You do NOT actually need to have CUDA or a GPU installed on the
computer where you create these images. You can transfer the image to a
computer with one or more GPUs and the CUDA drivers installed in order
to actually use it.

```bash
docker build -f Dockerfile -t epsci/cuda:11.4.2-devel-ubuntu20.04 .

singularity build epsci_cuda_11.4.2-devel-ubuntu20.04.sif docker-daemon://epsci/cuda:11.4.2-devel-ubuntu20.04
```
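
As a quick sanity check of the resulting image, you can ask the CUDA
compiler inside it for its version (this assumes nvcc is on the PATH in
the Nvidia devel image, which it normally is):

```bash
# Confirm the container sees the CUDA 11.4 toolchain
singularity exec epsci_cuda_11.4.2-devel-ubuntu20.04.sif nvcc --version
```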

## Building ROOT

If you are interested in building ROOT with the container, then here
are some instructions. If you don't need ROOT, then skip this section.

The following commands will check out, build, and install ROOT version
6.26.02 in the local working directory. Note that you may be able to
build this *much* faster on a ramdisk if you have enough memory. Just
make sure to adjust the install location to somewhere more permanent.

```bash
singularity run epsci_cuda_11.4.2-devel-ubuntu20.04.sif
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake -DCMAKE_INSTALL_PREFIX=../root-6.26.02-install -DCMAKE_CXX_STANDARD=14 ../root-6.26.02
make -j48 install
```
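
To verify the install, you can source the generated setup script and ask
ROOT for its version (a quick check, assuming you are still in the
root-build/ directory and used the install prefix shown above):

```bash
source ../root-6.26.02-install/bin/thisroot.sh
root -b -q -e 'gROOT->GetVersion()'
```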


### TROUBLESHOOTING

Some centralized installations of Singularity may be configured at a
system level to automatically mount one or more network drives. If the
system you are using does not have access to all of these, Singularity
may fail to launch properly. In that case you can tell it not to bind
any default directories with the "-c" option and then explicitly bind
the directories you need. For example, here is how ROOT could be built
using a ramdisk for the build and a network-mounted directory, /gapps,
for the install (this assumes a ramdisk is already mounted under
/media/ramdisk):

```bash
singularity run -c -B /media/ramdisk,/gapps epsci_cuda_11.4.2-devel-ubuntu20.04.sif
cd /media/ramdisk
git clone --branch v6-26-02 https://github.com/root-project/root.git root-6.26.02
mkdir root-build/
cd root-build/
cmake \
    -DCMAKE_INSTALL_PREFIX=/gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/ \
    -DCMAKE_CXX_STANDARD=14 \
    ../root-6.26.02

make -j48 install
```


### X11

If you run this ROOT from the container and can't open any graphics
windows, it may be because you ran Singularity with the "-c" option and
your ~/.Xauthority file is not available. Just start the container
again with this explicitly bound. For example:

```bash
singularity run -c -B /gapps,${HOME}/.Xauthority epsci_cuda_11.4.2-devel-ubuntu20.04.sif
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh
root
root [0] TCanvas c("c","", 400, 400)
```


## Building JANA2 with CUDA and libtorch

Singularity makes it really easy to access Nvidia GPUs from within a
container by just adding the "--nv" option. Create a Singularity
container with something like the following command and then use it for
the rest of these instructions. Note that some of these directories and
versions represent a specific example, so adjust for your specific
system as appropriate.
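
To confirm that a GPU is actually visible from inside the container
before going further, you can run nvidia-smi through the image (a quick
check; SINGIMG here is the image path set in the block below):

```bash
singularity exec --nv ${SINGIMG} nvidia-smi
```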

```
setenv SINGIMG /gapps/singularity/epsci_cuda_11.4.2-devel-ubuntu20.04.sif
singularity run -c -B /media/ramdisk,/gapps,/gluonwork1,${HOME}/.Xauthority --nv ${SINGIMG}

# Unpack libtorch and cudnn (you must download these separately)
# n.b. Ubuntu 20.04 using gcc 9.3.0 needs to use:
# libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
# while RHEL7 using SCL gcc 9.3.0 needs the pre-cxx11 ABI build:
# libtorch-shared-with-deps-1.11.0+cu113.zip
# Set WORKDIR to a directory with enough space for the builds below
# (here it is just the current directory)
export WORKDIR=${PWD}
cd ${WORKDIR}
unzip libtorch-cxx11-abi-shared-with-deps-1.11.0+cu113.zip
tar xzf cudnn-11.4-linux-x64-v8.2.4.15.tgz


# Setup environment
export CMAKE_PREFIX_PATH=${WORKDIR}/libtorch/share/cmake/Torch:${CMAKE_PREFIX_PATH}
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.4
export CUDACXX=${CUDA_TOOLKIT_ROOT_DIR}/bin/nvcc
export CUDNN_LIBRARY_PATH=${WORKDIR}/cuda/lib64
export CUDNN_INCLUDE_PATH=${WORKDIR}/cuda/include
export PATH=${CUDA_TOOLKIT_ROOT_DIR}/bin:${PATH}
export LD_LIBRARY_PATH=${WORKDIR}/libtorch/lib:${CUDA_TOOLKIT_ROOT_DIR}/lib64:${CUDNN_LIBRARY_PATH}:${LD_LIBRARY_PATH}
source /gapps/root/Linux_Ubuntu20.04-x86_64-gcc9.3.0/root-6.26.02/bin/thisroot.sh


# Clone and build JANA2
# (n.b. the git clone failed to work from within the singularity
# container on my internal system so I had to run that outside
# of the container.)
cd ${WORKDIR}
git clone https://github.com/JeffersonLab/JANA2
mkdir JANA2/build
cd JANA2/build
cmake -DCMAKE_INSTALL_PREFIX=../install \
        -DUSE_PYTHON=ON \
        -DUSE_ROOT=ON \
        -DUSE_ZEROMQ=ON \
        -DCMAKE_BUILD_TYPE=Debug \
        ../
make -j48 install
source ../install/bin/jana-this.sh


# Create and build a JANA2 plugin
cd ${WORKDIR}
jana-generate.py Plugin JANAGPUTest 1 0 1
mkdir JANAGPUTest/build
cd JANAGPUTest/build
cmake -DCMAKE_BUILD_TYPE=Debug ../
make -j48 install


# Test that the plugin works without GPU or libtorch before continuing
jana -PPLUGINS=JTestRoot,JANAGPUTest


# Add a factory which uses the GPU
cd ${WORKDIR}/JANAGPUTest
jana-generate.py JFactory GPUPID

< edit JANAGPUTest.cc to add the factory generator (see comments at top of JFactory_GPUPID.cc) >

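# For reference, the generated plugin entry point (JANAGPUTest.cc) typically
# registers factories in InitPlugin() via a JFactoryGenerator. The exact
# skeleton can differ between JANA2 versions, so follow the comments at the
# top of the generated JFactory_GPUPID.cc; a rough sketch of the addition is:
#
#   // near the top of JANAGPUTest.cc
#   #include "JFactory_GPUPID.h"
#   #include <JANA/JFactoryGenerator.h>
#
#   // inside InitPlugin(), after InitJANAPlugin(app)
#   app->Add(new JFactoryGeneratorT<JFactory_GPUPID>());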

# Add CUDA/libtorch to the plugin's CMakeLists.txt
cd ${WORKDIR}/JANAGPUTest

< edit CMakeLists.txt and add the following in the appropriate places:

  # At the top, right under the project(...) line
  enable_language(CUDA)


  # Right after the line with "find_package(ROOT REQUIRED)"
  find_package(Torch REQUIRED)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

  # This is just a modification of the existing target_link_libraries line
  target_link_libraries(JANAGPUTest_plugin ${JANA_LIBRARY} ${ROOT_LIBRARIES} ${TORCH_LIBRARIES})

>


# Add code that uses libtorch

< edit JFactory_GPUPID.cc to include the following:

  // Place these at the top of the file
  #include <torch/torch.h>
  #include <iostream>

  // Place these in the Init() method
  torch::Tensor tensor = torch::rand({2, 3});
  std::cout << tensor << std::endl;
>

# A CUDA kernel can also be added and called without using libtorch.
# Create a file called tmp.cu with the following content:

   #include <stdio.h>

   __global__ void cuda_hello(){
       printf("Hello World from GPU!\n");
   }

   int cuda_hello_world() {
       printf("%s:%d\n", __FILE__,__LINE__);
       cuda_hello<<<2,3>>>();
       cudaDeviceSynchronize();  // wait for the kernel and flush its printf output
       printf("%s:%d\n", __FILE__,__LINE__);
       return 0;
   }

# Add a call in JFactory_GPUPID::Init() to

   cuda_hello_world();
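
# (Since cuda_hello_world() is defined in tmp.cu, a declaration must also be
# visible in JFactory_GPUPID.cc, e.g. near the top of that file:
#
#    extern int cuda_hello_world();
#
# and tmp.cu needs to be listed among the plugin sources in CMakeLists.txt
# if the generated file does not already pick up .cu files.)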



# Rebuild the plugin
cd ${WORKDIR}/JANAGPUTest/build
rm CMakeCache.txt
cmake \
    -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR} \
    -DCUDNN_LIBRARY_PATH=${CUDNN_LIBRARY_PATH} \
    -DCUDNN_INCLUDE_PATH=${CUDNN_INCLUDE_PATH} \
    -DCMAKE_BUILD_TYPE=Debug \
    ../
make -j48 install


# Test the plugin. You should see a message with values from the libtorch
# tensor followed by 6 Hello World messages from the CUDA kernel.
jana -PPLUGINS=JTestRoot,JANAGPUTest -PAUTOACTIVATE=GPUPID
```

Note: You can confirm that this is using the GPU by checking the output
of "nvidia-smi" while running. The jana program should be listed at the
bottom of the output.
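
For continuous monitoring while the job runs, something like the
following works (the 1-second refresh interval is just an example):

```bash
# Refresh the GPU status once per second while jana is running
nvidia-smi -l 1
```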