This repository contains a patched version of NCCL specifically tuned for NVIDIA A100 MIG (Multi-Instance GPU) environments and heterogeneous Ampere systems (e.g., A40).
On certain A100 MIG configurations, the CUDA driver rejects the `cudaMemPoolCreate` call made in `src/init.cc` during NCCL initialization, with the error:

cuda failure 'operation not supported'

This patch skips memory pool creation by forcing `comm->memPool = nullptr`, so NCCL falls back to its standard memory allocation paths, which the MIG driver supports.
- Modified `src/init.cc` to remove the `cudaMemPoolCreate` block.
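
To check that a checkout actually carries this change, the source file can be scanned for remaining `cudaMemPoolCreate` calls. The following is a minimal sketch and not part of the patch: the function name `check_mempool_patch` is hypothetical, and it assumes the patched `src/init.cc` has no reference to `cudaMemPoolCreate` outside of `//` line comments (block comments are not handled).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NCCL or of this patch).
# Returns 0 when the file has no cudaMemPoolCreate reference outside
# of // line comments, 1 when one remains, 2 when the file is missing.
check_mempool_patch() {
    local file="${1:-src/init.cc}"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 2
    fi
    # Strip everything after // on each line, then search what is left.
    if sed 's|//.*$||' "$file" | grep -q 'cudaMemPoolCreate'; then
        echo "unpatched: cudaMemPoolCreate still present in $file"
        return 1
    fi
    echo "patched: no cudaMemPoolCreate call in $file"
    return 0
}
```

Run from the repository root, `check_mempool_patch` with no argument inspects `src/init.cc`; a non-zero exit status suggests the build would still hit the failing call.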
To build for Ampere architecture (SM80):
make clean
make -j$(nproc) NVCC_GENCODE="-gencode=arch=compute_80,code=sm_80"

When running benchmarks (e.g., nccl-tests) across MIG partitions and remote A40 nodes, the following environment variables are recommended for stability and to work around driver restrictions:

export NCCL_P2P_DISABLE=1 # Disable Peer-to-Peer (MIG restriction)
export NCCL_SHM_DISABLE=1 # Disable Shared Memory (MIG restriction)
export NCCL_NET_GDR_LEVEL=0 # Disable GPU Direct RDMA
export NCCL_MIG_MODE=1 # Enable MIG mode
export NCCL_SOCKET_IFNAME=ens33 # Network interface for socket traffic (replace ens33 with your own)
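
The settings above can be wrapped in one function so they are always applied together before a run. This is a minimal sketch: the function name `set_nccl_mig_env` is hypothetical, the interface defaults to `ens33` only because that is the value used above, and the `/sys/class/net` check assumes a Linux host.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the variables listed above.
# Usage: set_nccl_mig_env [interface-name]
set_nccl_mig_env() {
    local ifname="${1:-ens33}"

    # Warn, but do not fail, if the interface is not visible on this host.
    if [ ! -d "/sys/class/net/$ifname" ]; then
        echo "warning: network interface '$ifname' not found" >&2
    fi

    export NCCL_P2P_DISABLE=1      # Disable Peer-to-Peer
    export NCCL_SHM_DISABLE=1      # Disable Shared Memory
    export NCCL_NET_GDR_LEVEL=0    # Disable GPU Direct RDMA
    export NCCL_MIG_MODE=1         # Enable MIG mode
    export NCCL_SOCKET_IFNAME="$ifname"
    return 0
}
```

Sourcing the file and calling `set_nccl_mig_env` with the interface name of each node before launching the benchmark keeps the configuration identical across MIG partitions and remote hosts.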