An optimized implementation of the ORB-SLAM3 visual SLAM system, designed for embedded edge computing platforms.
This project addresses the high computational latency of visual feature extraction on mobile processors by implementing a Hybrid CPU-GPU Architecture.
- Replaced the sequential descriptor extraction with a massively parallel CUDA kernel.
- Stores the ORB sampling patterns in constant memory to minimize global memory latency.
- Achieves ~5x speedup in the descriptor calculation stage (from 23ms to 4ms).
- Replaced the recursive QuadTree algorithm (standard in ORB-SLAM3) with a linear Grid-based filtering approach.
- Ensures uniform feature distribution with O(N) complexity.
- Reduces CPU overhead and branch mispredictions.
- Implemented Static Memory Pooling on the GPU to avoid cudaMalloc overhead per frame.
- Zero-copy data transfer where applicable (Unified Memory architecture).
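The grid-based filtering mentioned in the list above can be sketched in a few lines of C++. This is a minimal illustration, not the code in src/ORBextractor.cc: the `KeyPoint` struct, the `filterByGrid` name, the `cellSize` parameter, and the rule of keeping only the strongest keypoint per cell are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical keypoint type for illustration (stand-in for cv::KeyPoint).
struct KeyPoint {
    float x;
    float y;
    float response;  // corner strength; higher is better
};

// Keep at most one keypoint (the strongest) per grid cell.
// One pass over the keypoints plus one pass over the cells: O(N + cells),
// with no recursion and no per-node allocation.
std::vector<KeyPoint> filterByGrid(const std::vector<KeyPoint>& keypoints,
                                   int imageWidth, int imageHeight, int cellSize) {
    assert(imageWidth > 0 && imageHeight > 0 && cellSize > 0);
    const int cols = (imageWidth + cellSize - 1) / cellSize;
    const int rows = (imageHeight + cellSize - 1) / cellSize;

    // best[c] is the index of the strongest keypoint seen so far in cell c, or -1.
    std::vector<int> best(static_cast<std::size_t>(cols) * rows, -1);

    for (std::size_t i = 0; i < keypoints.size(); ++i) {
        const KeyPoint& kp = keypoints[i];
        // Skip keypoints that fall outside the image.
        if (kp.x < 0 || kp.y < 0 || kp.x >= imageWidth || kp.y >= imageHeight) continue;
        const int cx = static_cast<int>(kp.x) / cellSize;
        const int cy = static_cast<int>(kp.y) / cellSize;
        int& slot = best[static_cast<std::size_t>(cy) * cols + cx];
        if (slot < 0 || kp.response > keypoints[slot].response) {
            slot = static_cast<int>(i);
        }
    }

    // Collect the survivors in row-major cell order.
    std::vector<KeyPoint> kept;
    for (int idx : best) {
        if (idx >= 0) kept.push_back(keypoints[idx]);
    }
    return kept;
}
```

Because each keypoint is visited once and mapped to its cell by integer division, the loop body has a single data-dependent branch, which is the property the O(N) and branch-misprediction claims above rely on.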
Tested on NVIDIA Jetson Orin Nano using the KITTI Odometry Benchmark (Sequence 00).
| Processing Stage | Original (CPU) | Accelerated (CPU+GPU) | Speedup |
|---|---|---|---|
| Pyramid Building | 1.52 ms | 2.45 ms | - |
| Feature Distribution | 20.85 ms | 11.20 ms | 1.86x |
| Gaussian Blur | 2.65 ms | 4.10 ms | - |
| ORB Descriptor | 21.40 ms | 4.35 ms | 4.92x |
| Data Transfer | 2.45 ms | 0.45 ms | 5.44x |
| TOTAL FRAME TIME | ~49 ms | ~22 ms | ~2.2x |
| FPS | 20 FPS | 45 FPS | Real-Time |
Note: While the GPU kernel is 5x faster, the total system speedup is governed by Amdahl's Law, resulting in a 2.2x overall improvement.
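The arithmetic behind this note can be checked with a short C++ sketch. It uses the stage timings from the table above; `Stage`, `tableStages`, `overallSpeedup`, and `amdahl` are hypothetical helper names introduced only for this example.

```cpp
#include <cassert>
#include <vector>

// Per-stage timings in milliseconds (hypothetical helper type).
struct Stage {
    double originalMs;
    double acceleratedMs;
};

// Stage timings copied from the benchmark table.
std::vector<Stage> tableStages() {
    return {
        {1.52, 2.45},    // Pyramid Building
        {20.85, 11.20},  // Feature Distribution
        {2.65, 4.10},    // Gaussian Blur
        {21.40, 4.35},   // ORB Descriptor
        {2.45, 0.45},    // Data Transfer
    };
}

// Whole-frame speedup: total original time divided by total accelerated time.
double overallSpeedup(const std::vector<Stage>& stages) {
    double before = 0.0;
    double after = 0.0;
    for (const Stage& s : stages) {
        before += s.originalMs;
        after += s.acceleratedMs;
    }
    assert(after > 0.0);
    return before / after;
}

// Amdahl's law: speedup when a fraction p of the runtime is made s times faster
// and the remaining (1 - p) is left unchanged.
double amdahl(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With these numbers, `amdahl(21.40 / 48.87, 21.40 / 4.35)` evaluates to roughly 1.54, so the descriptor kernel on its own would give about a 1.5x frame-level gain. `overallSpeedup(tableStages())` evaluates to roughly 2.17; the difference comes from the faster feature distribution and data transfer stages, partly offset by the slower pyramid building and Gaussian blur.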
- Device: NVIDIA Jetson Orin Nano / Xavier NX / AGX Orin.
- Storage: NVMe SSD (Recommended for high-bandwidth dataset reading).
- OS: Ubuntu 20.04 (JetPack 5.x) or Ubuntu 22.04/24.04.
- Compilers: GCC 9+, NVCC 11.4+.
- Libraries:
  - OpenCV 4.4+ (must be compiled with CUDA and Eigen support).
  - Eigen3.
  - Pangolin (for visualization).
- Clone the repository:
git clone https://github.com/tothantonio/ORB_SLAM3_Accelerated.git
cd ORB_SLAM3_Accelerated
- Build the project: We provide a script to handle the mixed C++/CUDA compilation.
chmod +x build.sh
./build.sh
- Download KITTI Dataset: Download the grayscale odometry sequences from the KITTI Website.
To run the Monocular SLAM on KITTI Sequence 00:
Using the helper script:
chmod +x run.sh
./run.sh
Manual execution:
./Examples/Monocular/mono_kitti \
Vocabulary/ORBvoc.txt \
Examples/Monocular/KITTI00-02.yaml \
/path/to/dataset/sequences/00
Controls:
Map Viewer: Use the mouse to rotate/zoom the 3D map.
Terminal: Real-time profiling logs are printed to stdout.
Exit: Press Ctrl+C.
src/ORBextractor.cc: Modified class with Grid Distribution logic.
src/OrbCuda.cu: CUDA Kernels and device memory management.
include/OrbCuda.h: Header for C++/CUDA interfacing.
Examples/: Executables for Monocular mode.
This project is based on ORB-SLAM3 by Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel and Juan D. Tardós.
Modifications by: Toth Antonio-Roberto
Technical University of Cluj-Napoca