3D Reconstruction for Robotics on NVIDIA Jetson: A Comprehensive Technical Guide

Published: January 2026 | Author: Koca Ventures Technical Research | Reading Time: 25 minutes


Plain English Summary

What is 3D Reconstruction?

Imagine a robot trying to navigate a warehouse. Unlike humans who can instantly see depth and obstacles, robots need to build a 3D map of their surroundings from camera and sensor data. This process of creating a 3D model of the real world is called 3D reconstruction.

Why do robots need this?

Robot Task          | Why 3D Matters
Navigation          | Know where walls, shelves, and people are
Grasping Objects    | Understand exact shape and position of items
Collision Avoidance | Don't crash into things!
Mapping             | Remember where everything is for next time

The main technologies:

Technology         | What It Does                                     | Best For
Stereo Vision      | Two cameras = depth perception (like human eyes) | Indoor robots
LiDAR              | Laser scanning = precise distance                | Outdoor vehicles
SLAM               | Simultaneous mapping and self-location           | Mobile robots
Gaussian Splatting | New AI-based 3D reconstruction                   | High-quality scenes

Real-world performance on Jetson:

Feature          | Speed     | Device
Depth Estimation | 30 FPS    | Orin NX
Visual SLAM      | 60 FPS    | AGX Orin
3D Mapping       | Real-time | AGX Orin

What will you learn?

  1. How stereo cameras calculate depth (like your two eyes do)
  2. Processing LiDAR point clouds with GPU acceleration
  3. Building maps while the robot moves (SLAM)
  4. Using NVIDIA's Isaac ROS for robotics
  5. Creating detailed 3D models with Gaussian Splatting

The bottom line: For any robot that needs to understand its environment, 3D reconstruction is essential. This guide covers everything from basic depth sensing to cutting-edge neural 3D methods.


Executive Summary

3D reconstruction has become the cornerstone of modern robotic perception, enabling autonomous systems to understand, navigate, and interact with complex real-world environments. NVIDIA's Jetson platform family—from the compact Orin Nano to the powerhouse Jetson AGX Thor—provides a strong balance of computational capability, power efficiency, and edge-deployment readiness for these demanding applications.

This comprehensive guide explores the state-of-the-art in 3D reconstruction for robotics, covering stereo vision, LiDAR processing, Visual SLAM, Neural Radiance Fields (NeRF), Gaussian Splatting, multi-sensor fusion, and NVIDIA's Isaac ROS ecosystem. We provide implementation insights, performance benchmarks, and practical code examples for deploying these technologies on Jetson hardware.


Table of Contents

  1. Jetson Platform Overview for 3D Perception
  2. Stereo Vision and Depth Estimation
  3. LiDAR Point Cloud Processing
  4. Visual SLAM: ORB-SLAM3 and RTAB-Map
  5. Neural Radiance Fields (NeRF) on Edge
  6. Gaussian Splatting for Real-Time 3D
  7. Multi-Sensor Fusion
  8. Real-Time Mesh Reconstruction
  9. CUDA-Accelerated 3D Processing
  10. Isaac ROS Integration
  11. Occupancy Grids and Voxel Mapping
  12. Implementation Pipeline

1. Jetson Platform Overview for 3D Perception

Hardware Specifications Comparison

The NVIDIA Jetson family provides a spectrum of compute capabilities for robotics applications:

Specification  | Jetson Orin Nano | Jetson Orin NX | Jetson AGX Orin | Jetson AGX Thor
CUDA Cores     | 512-1024         | 1024           | 2048            | 2560
Tensor Cores   | 16               | 32             | 64              | 96
AI Performance | 40-67 TOPS       | 100 TOPS       | 275 TOPS        | 2070 TFLOPS (FP4)
Memory         | 4-8 GB           | 8-16 GB        | 32-64 GB        | 128 GB
Memory BW      | ~68 GB/s         | ~102 GB/s      | 204.8 GB/s      | 273 GB/s
Power          | 7-15 W           | 10-25 W        | 15-60 W         | 130 W

The newly released Jetson AGX Thor represents a paradigm shift, offering 7.5x higher AI compute and 3.5x better energy efficiency compared to AGX Orin. Built on the Blackwell GPU architecture, Thor delivers 2070 FP4 TFLOPS—sufficient for running advanced generative AI models and multi-sensor perception pipelines simultaneously.
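
When targeting a specific module, it can help to confirm at runtime what the device actually exposes to CUDA applications. A minimal sketch using PyTorch's device-property query (assuming a JetPack installation with PyTorch available):

import torch

# Report the GPU resources visible on the current Jetson module
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}")
print(f"SM count: {props.multi_processor_count}")
print(f"Total memory: {props.total_memory / 1e9:.1f} GB (shared CPU/GPU on Jetson)")
print(f"Compute capability: {props.major}.{props.minor}")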

Architecture Overview

JETSON 3D PERCEPTION STACK (conceptual flow): sensor inputs (stereo camera, LiDAR scanner, IMU) feed a sensor fusion layer (cuVSLAM, CUDA-PCL, Kalman filtering), which drives the 3D reconstruction engine (nvblox voxels, GS-SLAM splats, NeRF neural fields) and produces the output representations: ESDF maps, meshes, occupancy grids, point clouds, costmaps, and semantics.

2. Stereo Vision and Depth Estimation

FoundationStereo: CVPR 2025 Best Paper Nomination

NVIDIA's FoundationStereo represents the state-of-the-art in zero-shot stereo matching, achieving first place on both the Middlebury and ETH3D leaderboards. This foundation model is designed for strong generalization across diverse real-world scenarios.

Key Architecture Components:

  1. Side-Tuning Adapter (STA): Adapts rich monocular priors from a frozen DepthAnythingV2 model while combining fine-grained high-frequency features from multi-level CNNs.

  2. Attentive Hybrid Cost Filtering (AHCF): Combines Axial-Planar Convolution (APC) filtering with a Disparity Transformer module for effective feature aggregation over the 4D hybrid cost volume.

  3. GRU Refinement: Initial disparity predictions are refined through GRU blocks for sub-pixel accuracy.

# FoundationStereo inference on Jetson
import torch
from foundation_stereo import FoundationStereo

# Load model (tested on Jetson Orin)
model = FoundationStereo.from_pretrained('nvidia/foundation-stereo')
model = model.cuda().eval()

# Process stereo pair
with torch.no_grad():
    left_img = load_image('left.png').cuda()
    right_img = load_image('right.png').cuda()

    # Output: dense disparity map
    disparity = model(left_img, right_img)

    # Convert to metric depth (requires baseline and focal length)
    depth = (baseline * focal_length) / disparity

    # Generate 3D point cloud
    point_cloud = depth_to_pointcloud(depth, camera_intrinsics)
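
The snippet above calls a depth_to_pointcloud helper that is not defined there; a minimal sketch of such a helper for a pinhole camera, assuming the depth map has been converted to a NumPy array and camera_intrinsics carries fx, fy, cx, cy, could look like this:

import numpy as np

def depth_to_pointcloud(depth, intrinsics):
    """Back-project a metric depth map (H x W, metres) into an N x 3 point cloud."""
    fx, fy, cx, cy = intrinsics          # pinhole parameters in pixels
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]      # drop invalid (zero-depth) pixels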

Real-Time Stereo with VPI

NVIDIA's Vision Programming Interface (VPI) provides CUDA-accelerated stereo matching optimized for Jetson:

import vpi
import numpy as np

# Wrap rectified grayscale frames (NumPy uint8 arrays) as VPI images
left_vpi = vpi.asimage(left_array)
right_vpi = vpi.asimage(right_array)

with vpi.Backend.CUDA:
    # The stereo estimator expects 16-bit grayscale input
    left16 = left_vpi.convert(vpi.Format.Y16_ER)
    right16 = right_vpi.convert(vpi.Format.Y16_ER)

    # CUDA-accelerated semi-global matching
    disparity = vpi.stereodisp(left16, right16, window=5, maxdisp=128)

# Disparity is returned as signed 16-bit fixed point (Q10.5);
# convert to floating-point pixels for further processing
disparity_np = disparity.cpu().astype(np.float32) / 32.0

ZED SDK Integration

Stereolabs ZED cameras provide optimized SDK support for Jetson:

import pyzed.sl as sl

# Initialize ZED camera
zed = sl.Camera()
init_params = sl.InitParameters()
init_params.depth_mode = sl.DEPTH_MODE.ULTRA  # High accuracy
init_params.coordinate_units = sl.UNIT.METER
init_params.depth_minimum_distance = 0.3

zed.open(init_params)

# Runtime parameters
runtime_params = sl.RuntimeParameters()
runtime_params.confidence_threshold = 100

# Capture depth and point cloud
depth_map = sl.Mat()
point_cloud = sl.Mat()

while True:
    if zed.grab(runtime_params) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_measure(depth_map, sl.MEASURE.DEPTH)
        zed.retrieve_measure(point_cloud, sl.MEASURE.XYZRGBA)

        # Access point cloud data
        pc_data = point_cloud.get_data()

3. LiDAR Point Cloud Processing

CUDA-PCL: 10x Acceleration

NVIDIA's CUDA-PCL library provides GPU-accelerated point cloud processing, achieving up to 10x speedup over CPU implementations:

Core Algorithms:

  • CUDA-ICP: Iterative Closest Point for point cloud alignment
  • CUDA-Segmentation: Ground removal using RANSAC fitting
  • CUDA-Filter: PassThrough and voxel grid filtering
// CUDA-PCL ICP Example
#include <cuda_icp/icp.h>

// Initialize CUDA ICP
cudaICP::ICP icp_solver;
icp_solver.setMaxIterations(50);
icp_solver.setTransformationEpsilon(1e-8);

// Input point clouds (GPU memory)
cudaICP::PointCloud::Ptr source_cloud(new cudaICP::PointCloud);
cudaICP::PointCloud::Ptr target_cloud(new cudaICP::PointCloud);

// Load point clouds to GPU
source_cloud->loadFromHost(source_points, num_source_points);
target_cloud->loadFromHost(target_points, num_target_points);

// Perform ICP alignment
Eigen::Matrix4f transformation;
icp_solver.align(source_cloud, target_cloud, transformation);

// Apply transformation
source_cloud->transform(transformation);

CUDA-PointPillars for 3D Object Detection

import cuda_pointpillars as cpp

# Initialize PointPillars model
detector = cpp.PointPillarsDetector(
    model_path="pointpillars.onnx",
    nms_threshold=0.5,
    score_threshold=0.3
)

# Process LiDAR point cloud
points = load_lidar_scan("velodyne_scan.bin")  # N x 4 (x, y, z, intensity)

# Run inference
detections = detector.detect(points)

for det in detections:
    print(f"Class: {det.class_name}, "
          f"Position: ({det.x:.2f}, {det.y:.2f}, {det.z:.2f}), "
          f"Dimensions: ({det.length:.2f}, {det.width:.2f}, {det.height:.2f}), "
          f"Confidence: {det.score:.2f}")

LiDAR Processing Pipeline

LIDAR PROCESSING PIPELINE (conceptual flow): raw LiDAR points are voxelized by CUDA-Filter and passed to ground segmentation, which separates ground points from obstacle points; scans are aligned with CUDA-ICP, and obstacle points are clustered with DBSCAN.
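
The CUDA-PCL bindings above are C++-oriented; as a library-agnostic reference for the same stages, here is a minimal Python sketch using Open3D (not CUDA-PCL) that voxelizes a scan, removes the ground plane with RANSAC, and clusters the remaining obstacle points with DBSCAN:

import numpy as np
import open3d as o3d

def segment_scan(points, voxel_size=0.1):
    """points: N x 3 array from a single LiDAR scan."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))

    # Voxel-grid downsampling (analogous to CUDA-Filter)
    pcd = pcd.voxel_down_sample(voxel_size)

    # RANSAC plane fit for ground removal
    plane, ground_idx = pcd.segment_plane(distance_threshold=0.2,
                                          ransac_n=3, num_iterations=200)
    ground = pcd.select_by_index(ground_idx)
    obstacles = pcd.select_by_index(ground_idx, invert=True)

    # DBSCAN clustering of the obstacle points
    labels = np.array(obstacles.cluster_dbscan(eps=0.5, min_points=10))
    return ground, obstacles, labels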

4. Visual SLAM: ORB-SLAM3 and RTAB-Map

ORB-SLAM3 on Jetson

ORB-SLAM3 is the first real-time SLAM library capable of Visual, Visual-Inertial, and Multi-Map SLAM with monocular, stereo, and RGB-D cameras.

Key Features:

  • ORB feature extraction for robust tracking
  • Loop closure and relocalization
  • Multi-map session support
  • Visual-Inertial fusion
// ORB-SLAM3 initialization for Jetson
#include <System.h>

// Create SLAM system (RGBD mode)
ORB_SLAM3::System SLAM(
    "ORBvoc.txt",           // Vocabulary file
    "RealSense_D435i.yaml", // Camera calibration
    ORB_SLAM3::System::RGBD,
    true                     // Enable viewer
);

// Main tracking loop
while (true) {
    // Capture RGB-D frame
    cv::Mat rgb, depth;
    camera.capture(rgb, depth);

    double timestamp = get_timestamp();

    // Track and map
    cv::Mat Tcw = SLAM.TrackRGBD(rgb, depth, timestamp);

    // Get current map points
    std::vector<ORB_SLAM3::MapPoint*> map_points =
        SLAM.GetTrackedMapPoints();

    // Access pose
    if (!Tcw.empty()) {
        Eigen::Matrix4f pose = cv_to_eigen(Tcw);
        publish_pose(pose);
    }
}

// Save map on shutdown
SLAM.SaveTrajectoryTUM("trajectory.txt");
SLAM.Shutdown();

RTAB-Map Integration

RTAB-Map (Real-Time Appearance-Based Mapping) provides dense 3D reconstruction with memory management for long-term operation:

<!-- ROS 2 launch for RTAB-Map on Jetson -->
<launch>
  <node pkg="rtabmap_slam" exec="rtabmap" name="rtabmap">
    <param name="subscribe_depth" value="true"/>
    <param name="subscribe_rgb" value="true"/>
    <param name="subscribe_scan" value="true"/>

    <!-- Memory management for Jetson -->
    <param name="Mem/IncrementalMemory" value="true"/>
    <param name="Mem/STMSize" value="30"/>

    <!-- Visual odometry strategy (1 = frame-to-frame) -->
    <param name="Odom/Strategy" value="1"/>

    <!-- Output dense point cloud -->
    <param name="Grid/FromDepth" value="true"/>
    <param name="Grid/3D" value="true"/>
  </node>
</launch>

Performance on Jetson (EuRoC Dataset)

Metric       | Jetson Nano | Jetson Orin NX | Jetson AGX Orin
Tracking FPS | 15-20       | 30-45          | 60+
Loop Closure | 2-3 s       | 0.5-1 s        | <0.3 s
ATE Error    | 3-11 cm     | 2-8 cm         | 1-5 cm
Memory Usage | 2.5 GB      | 3.5 GB         | 4.5 GB
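
The ATE figures above are root-mean-square translational errors after rigid alignment of the estimated trajectory to ground truth. A compact reference implementation of that metric (a sketch of the standard computation, not the evaluation script behind these numbers):

import numpy as np

def absolute_trajectory_error(gt, est):
    """RMSE of translational error after rigid (Kabsch/Umeyama, no scale) alignment.
    gt, est: (N, 3) arrays of time-associated positions."""
    mu_gt, mu_est = gt.mean(0), est.mean(0)
    H = (est - mu_est).T @ (gt - mu_gt)      # cross-covariance of centred trajectories
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:            # reflection correction
        S[2, 2] = -1
    R = Vt.T @ S @ U.T                        # rotation aligning est onto gt
    t = mu_gt - R @ mu_est
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))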

5. Neural Radiance Fields (NeRF) on Edge

Challenges and Solutions

NeRF rendering on edge devices faces significant computational challenges. On Jetson AGX Orin, rendering a single 424x240 image takes approximately 660 ms (roughly 1.5 Hz), far from real-time requirements.

Optimization Strategies:

  1. Two-Step Rendering: Generate renders at half resolution, then use FSRCNN for upsampling
  2. Baked NeRFs: Pre-compute and store intermediate representations
  3. Hybrid Approaches: Combine NeRF with faster representations
# NeRF-VINS: NeRF-based Visual-Inertial Navigation
import torch
from nerf_vins import NeRFVINS

# Initialize NeRF-VINS system
system = NeRFVINS(
    nerf_model_path="scene_nerf.pth",
    imu_noise_model="d435i_imu.yaml",
    render_resolution=(212, 120),  # Half resolution
    use_fsrcnn_upscale=True
)

# Main navigation loop
while running:
    # Get current IMU and image
    imu_data = imu.read()
    image = camera.capture()

    # NeRF-based localization with IMU fusion
    pose, uncertainty = system.localize(image, imu_data)

    # Render expected view from estimated pose
    expected_view = system.render_view(pose)

    # Compute photometric error for loop closure
    error = photometric_loss(image, expected_view)

    if error > threshold:
        system.trigger_relocalization()

RT-NeRF: Algorithm-Hardware Co-Design

RT-NeRF addresses two primary inefficiencies:

  1. Uniform point sampling along rays
  2. Dense embedding access patterns

RT-NeRF ACCELERATION (conceptual comparison): a standard NeRF marches roughly 192 uniformly spaced samples per ray through the MLP before rendering, whereas RT-NeRF applies importance sampling to reduce this to 32-64 adaptive samples per ray and uses a sparse embedding access pattern.
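
The adaptive sampling idea can be made concrete with a small inverse-CDF sampler that concentrates new samples where a coarse pass assigned high weights; this is an illustrative sketch of the general technique, not the RT-NeRF kernel itself:

import torch

def importance_sample(bin_edges, weights, n_samples):
    """Draw depths along a ray, biased toward bins with high coarse-pass weight.
    bin_edges: (n_bins + 1,) depths; weights: (n_bins,) coarse weights."""
    pdf = weights / (weights.sum() + 1e-8)
    cdf = torch.cat([torch.zeros(1), torch.cumsum(pdf, dim=0)])
    u = torch.rand(n_samples)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, len(bin_edges) - 1)
    cdf_lo, cdf_hi = cdf[idx - 1], cdf[idx]
    lo, hi = bin_edges[idx - 1], bin_edges[idx]
    t = (u - cdf_lo) / (cdf_hi - cdf_lo + 1e-8)
    return lo + t * (hi - lo)     # sampled depths, concentrated in high-weight bins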

6. Gaussian Splatting for Real-Time 3D

The 3DGS Revolution

3D Gaussian Splatting has emerged as the breakthrough technology for real-time 3D reconstruction, achieving rendering speeds of 30+ FPS at high quality—compared to NeRF's sub-1 FPS performance on equivalent hardware.

Key Advantages:

  • No neural network inference during rendering
  • Explicit, editable representation
  • Real-time rasterization
  • Compact storage (compared to dense voxels)

SplaTAM and GS-SLAM

SplaTAM introduced 3D Gaussian Splatting to SLAM, enabling dense RGB-D SLAM with real-time rendering:

# SplaTAM-style Gaussian SLAM
import torch
from gsplat import rasterize_gaussians

class GaussianSLAM:
    def __init__(self, config):
        self.config = config
        self.gaussians = GaussianModel()
        self.optimizer = torch.optim.Adam(
            self.gaussians.parameters(),
            lr=config.lr
        )

    def track(self, rgb, depth, K):
        """Estimate camera pose via differentiable rendering"""
        pose = torch.eye(4, requires_grad=True)
        pose_optimizer = torch.optim.Adam([pose], lr=0.01)

        for _ in range(self.config.tracking_iters):
            # Render from current pose estimate
            rendered_rgb, rendered_depth = self.render(pose, K)

            # Photometric + geometric loss
            loss = (
                photometric_loss(rgb, rendered_rgb) +
                geometric_loss(depth, rendered_depth)
            )

            pose_optimizer.zero_grad()
            loss.backward()
            pose_optimizer.step()

        return pose.detach()

    def map(self, rgb, depth, pose, K):
        """Update Gaussian map with new observation"""
        for _ in range(self.config.mapping_iters):
            rendered_rgb, rendered_depth = self.render(pose, K)

            loss = reconstruction_loss(rgb, depth,
                                       rendered_rgb, rendered_depth)

            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

            # Adaptive density control
            if should_densify():
                self.gaussians.densify_and_prune()
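
A typical per-frame loop using the class above might look like the following (rgbd_stream and config are placeholders, and the helper losses and GaussianModel are the sketched components from the listing):

slam = GaussianSLAM(config)

for rgb, depth, K in rgbd_stream:        # RGB-D frames plus camera intrinsics
    pose = slam.track(rgb, depth, K)     # estimate pose via differentiable rendering
    slam.map(rgb, depth, pose, K)        # integrate the frame into the Gaussian map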

2025 State-of-the-Art: GSORB-SLAM

GSORB-SLAM integrates 3DGS with ORB features through a tightly coupled optimization pipeline:

GSORB-SLAM ARCHITECTURE (conceptual flow): RGB-D input passes through ORB feature extraction and splits into a tracking thread, local bundle adjustment, and loop closure; their results merge into a 3D Gaussian map with transmittance (Gaussians G_1, G_2, G_3, ...).

Robotics Applications in 2025

  • GaussNav (TPAMI 2025): Gaussian Splatting for visual navigation
  • HAMMER: Multi-robot collaborative Gaussian Splatting mapping
  • AgriGS-SLAM: Orchard mapping across seasons
  • GS-LIVM (ICCV 2025): LiDAR-Inertial-Visual Mapping with Gaussians

7. Multi-Sensor Fusion (Camera + LiDAR + IMU)

Why Fusion is Essential

Each sensor has inherent limitations:

  • Cameras: Fail in low light, under glare, and on texture-poor surfaces
  • LiDAR: Struggles in structureless environments (open water, fog)
  • IMU: Subject to drift and bias accumulation
  • GPS: Unavailable indoors, tunnels, urban canyons

Multi-sensor fusion provides redundancy and robustness across diverse conditions.

Fusion Architectures

MULTI-SENSOR FUSION ARCHITECTURE (conceptual flow): the stereo camera drives visual odometry, the LiDAR scanner drives LiDAR odometry, and the IMU is pre-integrated; the three streams merge in an Extended Kalman Filter or factor-graph optimization, which outputs the fused pose, a 3D map, and object tracks.
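
Before the full factor-graph example below, the core benefit of fusion can be shown in a few lines: combining two independent estimates weighted by their inverse covariances yields a tighter estimate than either alone. This is a deliberately simplified information-form sketch, not an EKF and not the LVIO-SAM pipeline:

import numpy as np

def fuse(x1, P1, x2, P2):
    """Fuse two independent Gaussian estimates (mean, covariance) of the same state."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)   # information matrices
    P = np.linalg.inv(I1 + I2)                       # fused covariance (never larger)
    x = P @ (I1 @ x1 + I2 @ x2)                      # covariance-weighted mean
    return x, P

# Example: visual odometry (uncertain in z) fused with LiDAR odometry (uncertain in x)
x_vo,  P_vo  = np.array([1.00, 2.00, 0.90]), np.diag([0.01, 0.01, 0.25])
x_lid, P_lid = np.array([1.10, 2.02, 1.00]), np.diag([0.25, 0.04, 0.01])
x_fused, P_fused = fuse(x_vo, P_vo, x_lid, P_lid)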

LVIO-SAM Implementation

// LVIO-SAM: LiDAR-Visual-Inertial Odometry via Smoothing and Mapping
#include <gtsam/navigation/CombinedImuFactor.h>
#include <gtsam/slam/BetweenFactor.h>

class LVIO_SAM {
public:
    void processIMU(const ImuMsg& imu_msg) {
        // Pre-integrate IMU measurements
        preintegrator_->integrateMeasurement(
            imu_msg.linear_acceleration,
            imu_msg.angular_velocity,
            imu_msg.dt
        );
    }

    void processLiDAR(const PointCloud& cloud) {
        // Deskew point cloud using IMU data
        PointCloud deskewed = deskewCloud(cloud, preintegrator_);

        // Extract features and compute LiDAR odometry
        lidar_odom_ = computeLiDARFeatures(deskewed);
    }

    void processImage(const cv::Mat& image) {
        // Visual feature tracking
        visual_odom_ = trackFeatures(image);
    }

    void optimize() {
        // Build factor graph
        gtsam::NonlinearFactorGraph graph;

        // Add IMU factor
        graph.add(gtsam::CombinedImuFactor(
            X(i-1), V(i-1), X(i), V(i), B(i-1), B(i),
            preintegrator_->preintegratedMeasurements()
        ));

        // Add LiDAR odometry factor
        graph.add(gtsam::BetweenFactor<Pose3>(
            X(i-1), X(i), lidar_odom_, lidar_noise_
        ));

        // Add visual odometry factor
        graph.add(gtsam::BetweenFactor<Pose3>(
            X(i-1), X(i), visual_odom_, visual_noise_
        ));

        // Optimize
        gtsam::LevenbergMarquardtOptimizer optimizer(graph, initial_);
        result_ = optimizer.optimize();
    }
};

8. Real-Time Mesh Reconstruction

Voxel Hashing for Scalable Reconstruction

Voxel hashing enables scalable, spatially unbounded real-time reconstruction by allocating storage only for occupied voxel blocks:

// Voxel hashing for real-time reconstruction
struct VoxelBlock {
    float tsdf[8][8][8];       // Truncated SDF values
    float weight[8][8][8];     // Integration weights
    uchar3 color[8][8][8];     // Color per voxel
};

class VoxelHashMap {
public:
    __device__ void integrate(
        const float* depth_image,
        const float* K,
        const float* pose,
        int width, int height
    ) {
        // Get voxel position
        int3 voxel_pos = make_int3(blockIdx.x * blockDim.x + threadIdx.x,
                                   blockIdx.y * blockDim.y + threadIdx.y,
                                   blockIdx.z * blockDim.z + threadIdx.z);
        float3 world_pos = voxelToWorld(voxel_pos);

        // Project to image
        float3 cam_pos = transformPoint(world_pos, pose);
        float2 pixel = project(cam_pos, K);

        if (isValid(pixel, width, height)) {
            float depth = depth_image[int(pixel.y) * width + int(pixel.x)];
            float sdf = depth - cam_pos.z;

            // Truncate SDF
            float tsdf = fminf(1.0f, sdf / truncation_distance);
            tsdf = fmaxf(-1.0f, tsdf);

            // Weighted running average
            float old_tsdf = voxel_block->tsdf[local_idx];
            float old_weight = voxel_block->weight[local_idx];

            float new_weight = old_weight + 1.0f;
            float new_tsdf = (old_tsdf * old_weight + tsdf) / new_weight;

            voxel_block->tsdf[local_idx] = new_tsdf;
            voxel_block->weight[local_idx] = fminf(new_weight, max_weight);
        }
    }

    void extractMesh(std::vector<Triangle>& mesh) {
        // Marching cubes on GPU
        launchMarchingCubes(voxel_blocks_, mesh);
    }
};
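
The VoxelHashMap above leaves the hash lookup itself implicit. The spatial hash commonly used in the voxel-hashing literature multiplies the integer block coordinates by large primes and XORs the results; a minimal sketch (Python for brevity, assuming 8x8x8-voxel blocks and an arbitrary table size):

import math

def block_hash(bx, by, bz, table_size=2**20):
    """Spatial hash of integer voxel-block coordinates (Niessner et al. style)."""
    p1, p2, p3 = 73856093, 19349669, 83492791    # large primes
    return ((bx * p1) ^ (by * p2) ^ (bz * p3)) % table_size

def world_to_block(x, y, z, voxel_size=0.05, block_dim=8):
    """Map a world-space point to the integer coordinates of its voxel block."""
    block_extent = voxel_size * block_dim
    return (math.floor(x / block_extent),
            math.floor(y / block_extent),
            math.floor(z / block_extent))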

Nvblox Mesh Generation

# Nvblox mesh extraction with ROS 2
import rclpy
from rclpy.node import Node
from nvblox_msgs.srv import FilePath

class MeshExtractor(Node):
    def __init__(self):
        super().__init__('mesh_extractor')

        # Service client for mesh saving
        self.save_mesh_client = self.create_client(
            FilePath, '/nvblox_node/save_mesh'
        )

    def save_mesh(self, filepath):
        request = FilePath.Request()
        request.file_path = filepath

        future = self.save_mesh_client.call_async(request)
        rclpy.spin_until_future_complete(self, future)

        return future.result().success
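
Invoking the node is straightforward; a minimal usage sketch (the output mesh path is an arbitrary example):

def main():
    rclpy.init()
    node = MeshExtractor()

    # Wait for the nvblox service, then request a mesh dump to disk
    node.save_mesh_client.wait_for_service()
    ok = node.save_mesh('/tmp/reconstruction.ply')
    node.get_logger().info(f'Mesh saved: {ok}')

    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()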

9. CUDA-Accelerated 3D Processing

Cupoch: GPU-Accelerated Robotics Library

Cupoch provides CUDA-accelerated 3D data processing specifically designed for robotics:

import cupoch as cph

# Load point cloud
pcd = cph.io.read_point_cloud("scene.ply")

# GPU-accelerated voxel downsampling
pcd_downsampled = pcd.voxel_down_sample(voxel_size=0.02)

# GPU-accelerated ICP registration
source = cph.io.read_point_cloud("source.ply")
target = cph.io.read_point_cloud("target.ply")

# Compute normals on GPU
source.estimate_normals()
target.estimate_normals()

# Point-to-plane ICP
reg_result = cph.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.05,
    estimation_method=cph.registration.TransformationEstimationPointToPlane()
)

print(f"Fitness: {reg_result.fitness}")
print(f"RMSE: {reg_result.inlier_rmse}")
print(f"Transformation:\n{reg_result.transformation}")

# GPU-accelerated collision detection
collision_manager = cph.collision.CollisionManager()
collision_manager.add_object("robot", robot_mesh)
collision_manager.add_object("obstacle", obstacle_cloud)

is_collision = collision_manager.check_collision("robot", "obstacle")

Custom CUDA Kernels for 3D Processing

// Custom CUDA kernel for point cloud transformation
__global__ void transformPointCloud(
    float3* points,
    int num_points,
    const float* transformation  // 4x4 matrix
) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_points) return;

    float3 p = points[idx];

    // Apply transformation
    float x = transformation[0] * p.x + transformation[1] * p.y +
              transformation[2] * p.z + transformation[3];
    float y = transformation[4] * p.x + transformation[5] * p.y +
              transformation[6] * p.z + transformation[7];
    float z = transformation[8] * p.x + transformation[9] * p.y +
              transformation[10] * p.z + transformation[11];

    points[idx] = make_float3(x, y, z);
}

// CUDA kernel for radius search
__global__ void radiusSearch(
    const float3* query_points,
    const float3* target_points,
    int* neighbor_indices,
    int num_queries,
    int num_targets,
    float radius
) {
    int query_idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (query_idx >= num_queries) return;

    float3 query = query_points[query_idx];
    float radius_sq = radius * radius;
    int neighbor_count = 0;

    for (int i = 0; i < num_targets; i++) {
        float3 target = target_points[i];
        float dx = query.x - target.x;
        float dy = query.y - target.y;
        float dz = query.z - target.z;
        float dist_sq = dx*dx + dy*dy + dz*dz;

        if (dist_sq < radius_sq) {
            neighbor_indices[query_idx * MAX_NEIGHBORS + neighbor_count] = i;
            neighbor_count++;

            if (neighbor_count >= MAX_NEIGHBORS) break;
        }
    }
}

10. Isaac ROS Integration for 3D Perception

Isaac ROS Architecture

Isaac ROS provides a complete perception stack optimized for Jetson:

ISAAC ROS PERCEPTION STACK (conceptual layers): Isaac Perceptor bundles cuVSLAM (visual SLAM), nvblox (3D reconstruction), and FoundationStereo; components exchange data over NITROS transport (zero-copy, GPU-accelerated messaging) and feed the Nav2 integration layer (costmap server, planner, controller).

cuVSLAM Configuration

# isaac_ros_visual_slam configuration
/**:
  ros__parameters:
    # Input configuration
    enable_image_denoising: false
    rectified_images: true

    # SLAM parameters
    enable_slam_visualization: true
    enable_observations_view: true
    enable_landmarks_view: true

    # Performance tuning for Jetson
    num_cameras: 2
    min_num_images: 2
    enable_imu_fusion: true
    gyro_noise_density: 0.000244
    gyro_random_walk: 0.000019393
    accel_noise_density: 0.001862
    accel_random_walk: 0.003

    # Output
    map_frame: "map"
    odom_frame: "odom"
    base_frame: "base_link"

nvblox for 3D Reconstruction

# ROS 2 Python node using nvblox
import rclpy
from rclpy.node import Node
import numpy as np
from sensor_msgs.msg import Image, CameraInfo
from geometry_msgs.msg import PoseStamped
from nav_msgs.msg import OccupancyGrid
from nvblox_msgs.msg import DistanceMapSlice

class NvbloxNavigator(Node):
    def __init__(self):
        super().__init__('nvblox_navigator')

        # Subscribe to nvblox outputs
        self.esdf_sub = self.create_subscription(
            DistanceMapSlice,
            '/nvblox_node/static_esdf_slice',
            self.esdf_callback,
            10
        )

        self.costmap_sub = self.create_subscription(
            OccupancyGrid,
            '/nvblox_node/static_map_slice',
            self.costmap_callback,
            10
        )

    def esdf_callback(self, msg):
        """Process Euclidean Signed Distance Field"""
        # ESDF provides distance to nearest obstacle
        # Use for gradient-based path planning
        esdf_data = np.array(msg.data).reshape(
            msg.height, msg.width
        )

        # Find collision-free paths
        gradient = np.gradient(esdf_data)

    def costmap_callback(self, msg):
        """Process 2D costmap for Nav2"""
        # Direct integration with Nav2 costmap
        pass
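
One simple way to exploit the ESDF gradient computed in esdf_callback is to push candidate waypoints away from obstacles until they satisfy a clearance threshold. The following is an illustrative sketch only; the grid-index interface, the 50-step cap, and the 0.5 m clearance are arbitrary choices:

import numpy as np

def push_from_obstacles(i, j, esdf, min_clearance=0.5, max_steps=50):
    """Nudge grid cell (i, j) along the ESDF gradient until it is at least
    min_clearance metres from the nearest obstacle (or max_steps is reached)."""
    gi, gj = np.gradient(esdf)        # gradient points toward increasing clearance
    for _ in range(max_steps):
        if esdf[i, j] >= min_clearance:
            break
        i = int(np.clip(i + np.sign(gi[i, j]), 0, esdf.shape[0] - 1))
        j = int(np.clip(j + np.sign(gj[i, j]), 0, esdf.shape[1] - 1))
    return i, j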

Performance Benchmarks

Component         | Jetson Orin NX | Jetson AGX Orin | CPU Baseline
cuVSLAM           | 60 FPS         | 90+ FPS         | 15 FPS
nvblox            | 30 FPS         | 60 FPS          | 0.6 FPS
FoundationStereo  | 15 FPS         | 30 FPS          | 2 FPS
Combined Pipeline | 20 FPS         | 45 FPS          | N/A

11. Occupancy Grids and Voxel Mapping

OHM Library: GPU Occupancy Mapping

The OHM library provides probabilistic occupancy mapping with GPU acceleration:

#include <ohm/OccupancyMap.h>
#include <ohmgpu/GpuMap.h>

// Initialize GPU-accelerated occupancy map
ohm::OccupancyMap map(0.1);  // 10cm resolution
ohm::GpuMap gpu_map(&map);

// Configure ray casting
ohm::RayMapperConfig config;
config.miss_value = -1.0f;  // Log-odds decrease for miss
config.hit_value = 2.0f;    // Log-odds increase for hit

// Process LiDAR rays on GPU
std::vector<ohm::Ray> rays;
for (const auto& point : lidar_scan) {
    rays.push_back(ohm::Ray(sensor_origin, point));
}

// GPU-accelerated ray integration
gpu_map.integrateRays(rays.data(), rays.size(), config);

// Extract occupied voxels
std::vector<Eigen::Vector3d> occupied_voxels;
for (auto iter = map.begin(); iter != map.end(); ++iter) {
    if (map.occupancy(*iter) > 0.5f) {
        occupied_voxels.push_back(map.voxelCentre(*iter));
    }
}
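
The hit_value and miss_value above are log-odds increments; the underlying update rule is only a few lines, shown here as a scalar sketch of standard Bayesian occupancy mapping (not the OHM internals):

import math

def update_cell(log_odds, hit, hit_value=2.0, miss_value=-1.0,
                l_min=-5.0, l_max=5.0):
    """Update one voxel's log-odds occupancy given a hit or miss observation."""
    log_odds += hit_value if hit else miss_value
    return max(l_min, min(l_max, log_odds))    # clamp to avoid saturation

def occupancy_probability(log_odds):
    """Convert log-odds back to an occupancy probability in [0, 1]."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))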

3D Occupancy Prediction with Neural Networks

# Neural network-based 3D occupancy prediction
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet  # pretrained image backbone

class OccupancyNetwork(nn.Module):
    def __init__(self, voxel_size=0.2, grid_size=(200, 200, 16)):
        super().__init__()
        self.voxel_size = voxel_size
        self.grid_size = grid_size

        # Image backbone
        self.backbone = EfficientNet.from_pretrained('efficientnet-b4')

        # View transformer (image to BEV)
        self.view_transform = LSSViewTransform(
            in_channels=1792,
            out_channels=64,
            grid_size=grid_size
        )

        # 3D occupancy head
        self.occupancy_head = nn.Sequential(
            nn.Conv3d(64, 128, 3, padding=1),
            nn.BatchNorm3d(128),
            nn.ReLU(),
            nn.Conv3d(128, 64, 3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(),
            nn.Conv3d(64, 17, 1)  # 16 classes + free space
        )

    def forward(self, images, intrinsics, extrinsics):
        # Extract image features
        B, N, C, H, W = images.shape
        features = self.backbone.extract_features(images.view(B*N, C, H, W))
        features = features.view(B, N, -1, H//32, W//32)

        # Transform to 3D voxel space
        voxel_features = self.view_transform(
            features, intrinsics, extrinsics
        )

        # Predict occupancy
        occupancy = self.occupancy_head(voxel_features)

        return occupancy

Occupancy Grid Integration with Nav2

# Nav2 costmap configuration with nvblox
costmap:
  ros__parameters:
    global_frame: map
    robot_base_frame: base_link
    update_frequency: 10.0
    publish_frequency: 5.0

    # Use nvblox as primary obstacle source
    plugins: ["nvblox_layer", "inflation_layer"]

    nvblox_layer:
      plugin: "nvblox::NvbloxCostmapLayer"
      enabled: true
      nvblox_map_slice_topic: "/nvblox_node/static_map_slice"

    inflation_layer:
      plugin: "nav2_costmap_2d::InflationLayer"
      cost_scaling_factor: 3.0
      inflation_radius: 0.55

12. Complete Implementation Pipeline

End-to-End 3D Perception System

COMPLETE 3D PERCEPTION PIPELINE FOR JETSON (conceptual flow): the stereo camera, RGB-D camera, LiDAR scanner, and IMU are preprocessed by FoundationStereo, depth denoising, CUDA-PCL filtering, and IMU pre-integration respectively; the streams merge in cuVSLAM (visual-inertial SLAM) and branch into nvblox (ESDF), GS-SLAM (splats), and mesh reconstruction. Outputs: a 2D costmap for Nav2, 3D point cloud, ESDF heatmap, triangle mesh, semantic labels, object detections, robot pose, and loop-closure events.

ROS 2 Launch File

# complete_perception.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node
from launch.actions import IncludeLaunchDescription

def generate_launch_description():
    return LaunchDescription([
        # cuVSLAM for visual SLAM
        Node(
            package='isaac_ros_visual_slam',
            executable='visual_slam_node',
            parameters=[{
                'enable_imu_fusion': True,
                'num_cameras': 2,
            }],
            remappings=[
                ('stereo_camera/left/image', '/zed/left/image_rect_color'),
                ('stereo_camera/right/image', '/zed/right/image_rect_color'),
                ('imu', '/zed/imu/data'),
            ]
        ),

        # nvblox for 3D reconstruction
        Node(
            package='isaac_ros_nvblox',
            executable='nvblox_node',
            parameters=[{
                'global_frame': 'map',
                'voxel_size': 0.05,
                'esdf_2d_min_height': 0.1,
                'esdf_2d_max_height': 1.5,
            }],
            remappings=[
                ('depth/image', '/zed/depth/image_rect'),
                ('depth/camera_info', '/zed/depth/camera_info'),
                ('pose', '/visual_slam/tracking/odometry'),
            ]
        ),

        # CUDA-PCL for LiDAR processing
        Node(
            package='cuda_pcl_ros',
            executable='cuda_pcl_node',
            parameters=[{
                'voxel_size': 0.1,
                'ground_removal': True,
                'ransac_threshold': 0.1,
            }],
            remappings=[
                ('points_in', '/velodyne_points'),
                ('points_out', '/filtered_points'),
            ]
        ),

        # Point cloud to occupancy grid
        Node(
            package='pointcloud_to_laserscan',
            executable='pointcloud_to_laserscan_node',
            parameters=[{
                'min_height': 0.1,
                'max_height': 2.0,
            }],
            remappings=[
                ('cloud_in', '/filtered_points'),
                ('scan', '/scan'),
            ]
        ),
    ])

Performance Optimization Tips

  1. Use NITROS for zero-copy transfer: Avoid unnecessary CPU-GPU memory copies
  2. Batch processing: Process multiple frames together when possible
  3. Resolution management: Use adaptive resolution based on computational budget
  4. Memory pooling: Pre-allocate GPU memory to avoid allocation overhead
  5. Async processing: Pipeline different stages to maximize throughput
# Example: Optimized async processing pipeline
import asyncio
from concurrent.futures import ThreadPoolExecutor

class OptimizedPipeline:
    def __init__(self):
        self.executor = ThreadPoolExecutor(max_workers=4)
        self.depth_queue = asyncio.Queue(maxsize=3)
        self.slam_queue = asyncio.Queue(maxsize=3)

    async def process_frame(self, rgb, depth, imu):
        # Run depth processing and SLAM tracking in parallel
        loop = asyncio.get_running_loop()
        depth_task = loop.run_in_executor(
            self.executor, self.process_depth, depth
        )
        slam_task = loop.run_in_executor(
            self.executor, self.track_slam, rgb, imu
        )

        processed_depth, pose = await asyncio.gather(
            depth_task, slam_task
        )

        # Update map with processed data
        await self.update_map(processed_depth, pose)

Conclusion

3D reconstruction for robotics on NVIDIA Jetson platforms has reached an inflection point in 2025-2026. The convergence of powerful edge hardware (Jetson AGX Thor with 2070 TFLOPS), advanced algorithms (FoundationStereo, Gaussian Splatting SLAM), and comprehensive software stacks (Isaac ROS, nvblox, cuVSLAM) enables capabilities previously restricted to high-end workstations.

Key takeaways:

  1. Stereo depth estimation has been revolutionized by foundation models like FoundationStereo, providing zero-shot generalization across diverse environments.

  2. Gaussian Splatting is emerging as the preferred representation for dense SLAM, offering real-time rendering and explicit editability that NeRF cannot match on edge devices.

  3. Multi-sensor fusion remains essential for robust operation, with tight integration of cameras, LiDAR, and IMUs providing redundancy and accuracy.

  4. CUDA acceleration is non-negotiable for real-time performance, with libraries like CUDA-PCL, nvblox, and cuVSLAM providing 10-100x speedups over CPU implementations.

  5. Isaac ROS provides the production-ready framework for deploying these technologies, with optimizations specifically targeting Jetson hardware.

The future points toward even tighter integration of perception with action, as embodied AI systems leverage these 3D understanding capabilities for manipulation, navigation, and human-robot interaction.


References and Further Reading

Research Papers

  • FoundationStereo: Zero-Shot Stereo Matching (CVPR 2025 Best Paper Nomination)
  • ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering (SIGGRAPH 2023)
  • SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
  • GSORB-SLAM: Gaussian Splatting SLAM with ORB Features

Industry Applications

  • Boston Dynamics - VPI on Jetson for perception algorithms
  • Leopard Imaging - Eagle series stereo cameras with Jetson Thor
  • Amazon Robotics - Early Jetson Thor evaluation pilots

This technical guide is part of Koca Ventures' research series on edge AI and robotics. For consulting or implementation support, contact our technical team.
