3D Reconstruction for Robotics on NVIDIA Jetson: A Comprehensive Technical Guide

Published: January 2026 | Author: Koca Ventures Technical Research | Reading Time: 25 minutes


Plain English Summary

What is 3D Reconstruction?

Imagine a robot trying to navigate a warehouse. Unlike humans who can instantly see depth and obstacles, robots need to build a 3D map of their surroundings from camera and sensor data. This process of creating a 3D model of the real world is called 3D reconstruction.

Why do robots need this?

Robot Task          | Why 3D Matters
Navigation          | Know where walls, shelves, and people are
Grasping Objects    | Understand exact shape and position of items
Collision Avoidance | Don't crash into things!
Mapping             | Remember where everything is for next time

The main technologies:

Technology         | What It Does                                     | Best For
Stereo Vision      | Two cameras = depth perception (like human eyes) | Indoor robots
LiDAR              | Laser scanning = precise distance                | Outdoor vehicles
SLAM               | Simultaneous mapping and self-location           | Mobile robots
Gaussian Splatting | New AI-based 3D reconstruction                   | High-quality scenes

Real-world performance on Jetson:

Feature          | Speed     | Device
Depth Estimation | 30 FPS    | Orin NX
Visual SLAM      | 60 FPS    | AGX Orin
3D Mapping       | Real-time | AGX Orin

What will you learn?

  1. How stereo cameras calculate depth (like your two eyes do)
  2. Processing LiDAR point clouds with GPU acceleration
  3. Building maps while the robot moves (SLAM)
  4. Using NVIDIA's Isaac ROS for robotics
  5. Creating detailed 3D models with Gaussian Splatting

The bottom line: For any robot that needs to understand its environment, 3D reconstruction is essential. This guide covers everything from basic depth sensing to cutting-edge neural 3D methods.


Executive Summary

3D reconstruction has become the cornerstone of modern robotic perception, enabling autonomous systems to understand, navigate, and interact with complex real-world environments. NVIDIA's Jetson platform family—from the compact Orin Nano to the powerhouse Jetson AGX Thor—provides a strong balance of computational capability, power efficiency, and edge-deployment readiness for these demanding applications.

This comprehensive guide explores the state-of-the-art in 3D reconstruction for robotics, covering stereo vision, LiDAR processing, Visual SLAM, Neural Radiance Fields (NeRF), Gaussian Splatting, multi-sensor fusion, and NVIDIA's Isaac ROS ecosystem. We provide implementation insights, performance benchmarks, and practical code examples for deploying these technologies on Jetson hardware.


Table of Contents

  1. Jetson Platform Overview for 3D Perception
  2. Stereo Vision and Depth Estimation
  3. LiDAR Point Cloud Processing
  4. Visual SLAM: ORB-SLAM3 and RTAB-Map
  5. Neural Radiance Fields (NeRF) on Edge
  6. Gaussian Splatting for Real-Time 3D
  7. Multi-Sensor Fusion
  8. Real-Time Mesh Reconstruction
  9. CUDA-Accelerated 3D Processing
  10. Isaac ROS Integration
  11. Occupancy Grids and Voxel Mapping
  12. Implementation Pipeline

1. Jetson Platform Overview for 3D Perception

Hardware Specifications Comparison

The NVIDIA Jetson family provides a spectrum of compute capabilities for robotics applications:

Specification  | Jetson Orin Nano | Jetson Orin NX | Jetson AGX Orin | Jetson AGX Thor
CUDA Cores     | 512-1024         | 1024           | 2048            | 2560
Tensor Cores   | 16               | 32             | 64              | 96
AI Performance | 40-67 TOPS       | 100 TOPS       | 275 TOPS        | 2070 TFLOPS (FP4)
Memory         | 4-8 GB           | 8-16 GB        | 32-64 GB        | 128 GB
Memory BW      | ~68 GB/s         | ~102 GB/s      | 204.8 GB/s      | 273 GB/s
Power          | 7-15 W           | 10-25 W        | 15-60 W         | 130 W

The newly released Jetson AGX Thor represents a paradigm shift, offering 7.5x higher AI compute and 3.5x better energy efficiency compared to AGX Orin. Built on the Blackwell GPU architecture, Thor delivers 2070 FP4 TFLOPS—sufficient for running advanced generative AI models and multi-sensor perception pipelines simultaneously.
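
When targeting a specific module, it can help to confirm at runtime what the device actually exposes to CUDA applications. A minimal sketch using PyTorch's device-property query (assuming a JetPack installation with PyTorch available):

import torch

# Report the GPU resources visible on the current Jetson module
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}")
print(f"SM count: {props.multi_processor_count}")
print(f"Total memory: {props.total_memory / 1e9:.1f} GB (shared CPU/GPU on Jetson)")
print(f"Compute capability: {props.major}.{props.minor}")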

Architecture Overview

JETSON 3D PERCEPTION STACK (conceptual flow): sensor inputs (stereo camera, LiDAR scanner, IMU) feed a sensor fusion layer (cuVSLAM, CUDA-PCL, Kalman filtering), which drives the 3D reconstruction engine (nvblox voxels, GS-SLAM splats, NeRF neural fields) and produces the output representations: ESDF maps, meshes, occupancy grids, point clouds, costmaps, and semantics.

2. Stereo Vision and Depth Estimation

FoundationStereo: CVPR 2025 Best Paper Nomination

NVIDIA's FoundationStereo represents the state-of-the-art in zero-shot stereo matching, achieving first place on both the Middlebury and ETH3D leaderboards. This foundation model is designed for strong generalization across diverse real-world scenarios.

Key Architecture Components:

  1. Side-Tuning Adapter (STA): Adapts rich monocular priors from a frozen DepthAnythingV2 model while combining fine-grained high-frequency features from multi-level CNNs.

  2. Attentive Hybrid Cost Filtering (AHCF): Combines Axial-Planar Convolution (APC) filtering with a Disparity Transformer module for effective feature aggregation over the 4D hybrid cost volume.

  3. GRU Refinement: Initial disparity predictions are refined through GRU blocks for sub-pixel accuracy.

# FoundationStereo inference on Jetson
import torch
from foundation_stereo import FoundationStereo

# Load model (tested on Jetson Orin)
model = FoundationStereo.from_pretrained('nvidia/foundation-stereo')
model = model.cuda().eval()

# Process stereo pair
with torch.no_grad():
    left_img = load_image('left.png').cuda()
    right_img = load_image('right.png').cuda()

    # Output: dense disparity map
    disparity = model(left_img, right_img)

    # Convert to metric depth (requires baseline and focal length)
    depth = (baseline * focal_length) / disparity

    # Generate 3D point cloud
    point_cloud = depth_to_pointcloud(depth, camera_intrinsics)
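
The snippet above calls a depth_to_pointcloud helper that is not defined there; a minimal sketch of such a helper for a pinhole camera, assuming the depth map has been converted to a NumPy array and camera_intrinsics carries fx, fy, cx, cy, could look like this:

import numpy as np

def depth_to_pointcloud(depth, intrinsics):
    """Back-project a metric depth map (H x W, metres) into an N x 3 point cloud."""
    fx, fy, cx, cy = intrinsics          # pinhole parameters in pixels
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]      # drop invalid (zero-depth) pixels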

Real-Time Stereo with VPI

NVIDIA's Vision Programming Interface (VPI) provides CUDA-accelerated stereo matching optimized for Jetson:

import vpi
import numpy as np

# Wrap rectified grayscale frames (NumPy uint8 arrays) as VPI images
left_vpi = vpi.asimage(left_array)
right_vpi = vpi.asimage(right_array)

with vpi.Backend.CUDA:
    # The stereo estimator expects 16-bit grayscale input
    left16 = left_vpi.convert(vpi.Format.Y16_ER)
    right16 = right_vpi.convert(vpi.Format.Y16_ER)

    # CUDA-accelerated semi-global matching
    disparity = vpi.stereodisp(left16, right16, window=5, maxdisp=128)

# Disparity is returned as signed 16-bit fixed point (Q10.5);
# convert to floating-point pixels for further processing
disparity_np = disparity.cpu().astype(np.float32) / 32.0

ZED SDK Integration

Stereolabs ZED cameras provide optimized SDK support for Jetson:

import pyzed.sl as sl

# Initialize ZED camera
zed = sl.Camera()
init_params = sl.InitParameters()
init_params.depth_mode = sl.DEPTH_MODE.ULTRA  # High accuracy
init_params.coordinate_units = sl.UNIT.METER
init_params.depth_minimum_distance = 0.3

zed.open(init_params)

# Runtime parameters
runtime_params = sl.RuntimeParameters()
runtime_params.confidence_threshold = 100

# Capture depth and point cloud
depth_map = sl.Mat()
point_cloud = sl.Mat()

while True:
    if zed.grab(runtime_params) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_measure(depth_map, sl.MEASURE.DEPTH)
        zed.retrieve_measure(point_cloud, sl.MEASURE.XYZRGBA)

        # Access point cloud data
        pc_data = point_cloud.get_data()

3. LiDAR Point Cloud Processing

CUDA-PCL: 10x Acceleration

NVIDIA's CUDA-PCL library provides GPU-accelerated point cloud processing, achieving up to 10x speedup over CPU implementations:

Core Algorithms:

  • CUDA-ICP: Iterative Closest Point for point cloud alignment
  • CUDA-Segmentation: Ground removal using RANSAC fitting
  • CUDA-Filter: PassThrough and voxel grid filtering
// CUDA-PCL ICP Example
#include <cuda_icp/icp.h>

// Initialize CUDA ICP
cudaICP::ICP icp_solver;
icp_solver.setMaxIterations(50);
icp_solver.setTransformationEpsilon(1e-8);

// Input point clouds (GPU memory)
cudaICP::PointCloud::Ptr source_cloud(new cudaICP::PointCloud);
cudaICP::PointCloud::Ptr target_cloud(new cudaICP::PointCloud);

// Load point clouds to GPU
source_cloud->loadFromHost(source_points, num_source_points);
target_cloud->loadFromHost(target_points, num_target_points);

// Perform ICP alignment
Eigen::Matrix4f transformation;
icp_solver.align(source_cloud, target_cloud, transformation);

// Apply transformation
source_cloud->transform(transformation);

CUDA-PointPillars for 3D Object Detection

import cuda_pointpillars as cpp

# Initialize PointPillars model
detector = cpp.PointPillarsDetector(
    model_path="pointpillars.onnx",
    nms_threshold=0.5,
    score_threshold=0.3
)

# Process LiDAR point cloud
points = load_lidar_scan("velodyne_scan.bin")  # N x 4 (x, y, z, intensity)

# Run inference
detections = detector.detect(points)

for det in detections:
    print(f"Class: {det.class_name}, "
          f"Position: ({det.x:.2f}, {det.y:.2f}, {det.z:.2f}), "
          f"Dimensions: ({det.length:.2f}, {det.width:.2f}, {det.height:.2f}), "
          f"Confidence: {det.score:.2f}")

LiDAR Processing Pipeline

LIDAR PROCESSING PIPELINE (conceptual flow): raw LiDAR points are voxelized by CUDA-Filter and passed to ground segmentation, which separates ground points from obstacle points; scans are aligned with CUDA-ICP, and obstacle points are clustered with DBSCAN.
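
The CUDA-PCL bindings above are C++-oriented; as a library-agnostic reference for the same stages, here is a minimal Python sketch using Open3D (not CUDA-PCL) that voxelizes a scan, removes the ground plane with RANSAC, and clusters the remaining obstacle points with DBSCAN:

import numpy as np
import open3d as o3d

def segment_scan(points, voxel_size=0.1):
    """points: N x 3 array from a single LiDAR scan."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))

    # Voxel-grid downsampling (analogous to CUDA-Filter)
    pcd = pcd.voxel_down_sample(voxel_size)

    # RANSAC plane fit for ground removal
    plane, ground_idx = pcd.segment_plane(distance_threshold=0.2,
                                          ransac_n=3, num_iterations=200)
    ground = pcd.select_by_index(ground_idx)
    obstacles = pcd.select_by_index(ground_idx, invert=True)

    # DBSCAN clustering of the obstacle points
    labels = np.array(obstacles.cluster_dbscan(eps=0.5, min_points=10))
    return ground, obstacles, labels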

4. Visual SLAM: ORB-SLAM3 and RTAB-Map

ORB-SLAM3 on Jetson

ORB-SLAM3 is the first real-time SLAM library capable of Visual, Visual-Inertial, and Multi-Map SLAM with monocular, stereo, and RGB-D cameras.

Key Features:

  • ORB feature extraction for robust tracking
  • Loop closure and relocalization
  • Multi-map session support
  • Visual-Inertial fusion
// ORB-SLAM3 initialization for Jetson
#include <System.h>

// Create SLAM system (RGBD mode)
ORB_SLAM3::System SLAM(
    "ORBvoc.txt",           // Vocabulary file
    "RealSense_D435i.yaml", // Camera calibration
    ORB_SLAM3::System::RGBD,
    true                     // Enable viewer
);

// Main tracking loop
while (true) {
    // Capture RGB-D frame
    cv::Mat rgb, depth;
    camera.capture(rgb, depth);

    double timestamp = get_timestamp();

    // Track and map
    cv::Mat Tcw = SLAM.TrackRGBD(rgb, depth, timestamp);

    // Get current map points
    std::vector<ORB_SLAM3::MapPoint*> map_points =
        SLAM.GetTrackedMapPoints();

    // Access pose
    if (!Tcw.empty()) {
        Eigen::Matrix4f pose = cv_to_eigen(Tcw);
        publish_pose(pose);
    }
}

// Save map on shutdown
SLAM.SaveTrajectoryTUM("trajectory.txt");
SLAM.Shutdown();

RTAB-Map Integration

RTAB-Map (Real-Time Appearance-Based Mapping) provides dense 3D reconstruction with memory management for long-term operation:

<!-- ROS 2 launch for RTAB-Map on Jetson -->
<launch>
  <node pkg="rtabmap_slam" exec="rtabmap" name="rtabmap">
    <param name="subscribe_depth" value="true"/>
    <param name="subscribe_rgb" value="true"/>
    <param name="subscribe_scan" value="true"/>

    <!-- Memory management for Jetson -->
    <param name="Mem/IncrementalMemory" value="true"/>
    <param name="Mem/STMSize" value="30"/>

    <!-- Visual odometry strategy (1 = frame-to-frame) -->
    <param name="Odom/Strategy" value="1"/>

    <!-- Output dense point cloud -->
    <param name="Grid/FromDepth" value="true"/>
    <param name="Grid/3D" value="true"/>
  </node>
</launch>

Performance on Jetson (EuRoC Dataset)

Metric       | Jetson Nano | Jetson Orin NX | Jetson AGX Orin
Tracking FPS | 15-20       | 30-45          | 60+
Loop Closure | 2-3 s       | 0.5-1 s        | <0.3 s
ATE Error    | 3-11 cm     | 2-8 cm         | 1-5 cm
Memory Usage | 2.5 GB      | 3.5 GB         | 4.5 GB
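
The ATE figures above are root-mean-square translational errors after rigid alignment of the estimated trajectory to ground truth. A compact reference implementation of that metric (a sketch of the standard computation, not the evaluation script behind these numbers):

import numpy as np

def absolute_trajectory_error(gt, est):
    """RMSE of translational error after rigid (Kabsch/Umeyama, no scale) alignment.
    gt, est: (N, 3) arrays of time-associated positions."""
    mu_gt, mu_est = gt.mean(0), est.mean(0)
    H = (est - mu_est).T @ (gt - mu_gt)      # cross-covariance of centred trajectories
    U, _, Vt = np.linalg.svd(H)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:            # reflection correction
        S[2, 2] = -1
    R = Vt.T @ S @ U.T                        # rotation aligning est onto gt
    t = mu_gt - R @ mu_est
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))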

5. Neural Radiance Fields (NeRF) on Edge

Challenges and Solutions

NeRF rendering on edge devices faces significant computational challenges. On Jetson AGX Orin, rendering a single 424x240 image takes approximately 660 ms (roughly 1.5 Hz), far from real-time requirements.

Optimization Strategies:

  1. Two-Step Rendering: Generate renders at half resolution, then use FSRCNN for upsampling
  2. Baked NeRFs: Pre-compute and store intermediate representations
  3. Hybrid Approaches: Combine NeRF with faster representations
# NeRF-VINS: NeRF-based Visual-Inertial Navigation
import torch
from nerf_vins import NeRFVINS

# Initialize NeRF-VINS system
system = NeRFVINS(
    nerf_model_path="scene_nerf.pth",
    imu_noise_model="d435i_imu.yaml",
    render_resolution=(212, 120),  # Half resolution
    use_fsrcnn_upscale=True
)

# Main navigation loop
while running:
    # Get current IMU and image
    imu_data = imu.read()
    image = camera.capture()

    # NeRF-based localization with IMU fusion
    pose, uncertainty = system.localize(image, imu_data)

    # Render expected view from estimated pose
    expected_view = system.render_view(pose)

    # Compute photometric error for loop closure
    error = photometric_loss(image, expected_view)

    if error > threshold:
        system.trigger_relocalization()

RT-NeRF: Algorithm-Hardware Co-Design

RT-NeRF addresses two primary inefficiencies:

  1. Uniform point sampling along rays
  2. Dense embedding access patterns

RT-NeRF ACCELERATION (conceptual comparison): a standard NeRF marches roughly 192 uniformly spaced samples per ray through the MLP before rendering, whereas RT-NeRF applies importance sampling to reduce this to 32-64 adaptive samples per ray and uses a sparse embedding access pattern.
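
The adaptive sampling idea can be made concrete with a small inverse-CDF sampler that concentrates new samples where a coarse pass assigned high weights; this is an illustrative sketch of the general technique, not the RT-NeRF kernel itself:

import torch

def importance_sample(bin_edges, weights, n_samples):
    """Draw depths along a ray, biased toward bins with high coarse-pass weight.
    bin_edges: (n_bins + 1,) depths; weights: (n_bins,) coarse weights."""
    pdf = weights / (weights.sum() + 1e-8)
    cdf = torch.cat([torch.zeros(1), torch.cumsum(pdf, dim=0)])
    u = torch.rand(n_samples)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, len(bin_edges) - 1)
    cdf_lo, cdf_hi = cdf[idx - 1], cdf[idx]
    lo, hi = bin_edges[idx - 1], bin_edges[idx]
    t = (u - cdf_lo) / (cdf_hi - cdf_lo + 1e-8)
    return lo + t * (hi - lo)     # sampled depths, concentrated in high-weight bins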

6. Gaussian Splatting for Real-Time 3D

The 3DGS Revolution

3D Gaussian Splatting has emerged as the breakthrough technology for real-time 3D reconstruction, achieving rendering speeds of 30+ FPS at high quality—compared to NeRF's sub-1 FPS performance on equivalent hardware.

Key Advantages:

  • No neural network inference during rendering
  • Explicit, editable representation
  • Real-time rasterization
  • Compact storage (compared to dense voxels)

SplaTAM and GS-SLAM

SplaTAM introduced 3D Gaussian Splatting to SLAM, enabling dense RGB-D SLAM with real-time rendering:

# SplaTAM-style Gaussian SLAM
import torch
from gsplat import rasterize_gaussians

class GaussianSLAM:
    def __init__(self, config):
        self.config = config
        self.gaussians = GaussianModel()
        self.optimizer = torch.optim.Adam(
            self.gaussians.parameters(),
            lr=config.lr
        )

    def track(self, rgb, depth, K):
        """Estimate camera pose via differentiable rendering"""
        pose = torch.eye(4, requires_grad=True)
        pose_optimizer = torch.optim.Adam([pose], lr=0.01)

        for _ in range(self.config.tracking_iters):
            # Render from current pose estimate
            rendered_rgb, rendered_depth = self.render(pose, K)

            # Photometric + geometric loss
            loss = (
                photometric_loss(rgb, rendered_rgb) +
                geometric_loss(depth, rendered_depth)
            )

            pose_optimizer.zero_grad()
            loss.backward()
            pose_optimizer.step()

        return pose.detach()

    def map(self, rgb, depth, pose, K):
        """Update Gaussian map with new observation"""
        for _ in range(self.config.mapping_iters):
            rendered_rgb, rendered_depth = self.render(pose, K)

            loss = reconstruction_loss(rgb, depth,
                                       rendered_rgb, rendered_depth)

            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

            # Adaptive density control
            if should_densify():
                self.gaussians.densify_and_prune()
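
A typical per-frame loop using the class above might look like the following (rgbd_stream and config are placeholders, and the helper losses and GaussianModel are the sketched components from the listing):

slam = GaussianSLAM(config)

for rgb, depth, K in rgbd_stream:        # RGB-D frames plus camera intrinsics
    pose = slam.track(rgb, depth, K)     # estimate pose via differentiable rendering
    slam.map(rgb, depth, pose, K)        # integrate the frame into the Gaussian map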

2025 State-of-the-Art: GSORB-SLAM

GSORB-SLAM integrates 3DGS with ORB features through a tightly coupled optimization pipeline:

GSORB-SLAM ARCHITECTURE (conceptual flow): RGB-D input passes through ORB feature extraction and splits into a tracking thread, local bundle adjustment, and loop closure; their results merge into a 3D Gaussian map with transmittance (Gaussians G_1, G_2, G_3, ...).

Robotics Applications in 2025

  • GaussNav (TPAMI 2025): Gaussian Splatting for visual navigation
  • HAMMER: Multi-robot collaborative Gaussian Splatting mapping
  • AgriGS-SLAM: Orchard mapping across seasons
  • GS-LIVM (ICCV 2025): LiDAR-Inertial-Visual Mapping with Gaussians

7. Multi-Sensor Fusion (Camera + LiDAR + IMU)

Why Fusion is Essential

Each sensor has inherent limitations:

  • Cameras: Fail in low light, under glare, and on texture-poor surfaces
  • LiDAR: Struggles in structureless environments (open water, fog)
  • IMU: Subject to drift and bias accumulation
  • GPS: Unavailable indoors, tunnels, urban canyons

Multi-sensor fusion provides redundancy and robustness across diverse conditions.

Fusion Architectures

MULTI-SENSOR FUSION ARCHITECTURE (conceptual flow): the stereo camera drives visual odometry, the LiDAR scanner drives LiDAR odometry, and the IMU is pre-integrated; the three streams merge in an Extended Kalman Filter or factor-graph optimization, which outputs the fused pose, a 3D map, and object tracks.
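
Before the full factor-graph example below, the core benefit of fusion can be shown in a few lines: combining two independent estimates weighted by their inverse covariances yields a tighter estimate than either alone. This is a deliberately simplified information-form sketch, not an EKF and not the LVIO-SAM pipeline:

import numpy as np

def fuse(x1, P1, x2, P2):
    """Fuse two independent Gaussian estimates (mean, covariance) of the same state."""
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)   # information matrices
    P = np.linalg.inv(I1 + I2)                       # fused covariance (never larger)
    x = P @ (I1 @ x1 + I2 @ x2)                      # covariance-weighted mean
    return x, P

# Example: visual odometry (uncertain in z) fused with LiDAR odometry (uncertain in x)
x_vo,  P_vo  = np.array([1.00, 2.00, 0.90]), np.diag([0.01, 0.01, 0.25])
x_lid, P_lid = np.array([1.10, 2.02, 1.00]), np.diag([0.25, 0.04, 0.01])
x_fused, P_fused = fuse(x_vo, P_vo, x_lid, P_lid)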

LVIO-SAM Implementation

// LVIO-SAM: LiDAR-Visual-Inertial Odometry via Smoothing and Mapping
#include <gtsam/navigation/CombinedImuFactor.h>
#include <gtsam/slam/BetweenFactor.h>

class LVIO_SAM {
public:
    void processIMU(const ImuMsg& imu_msg) {
        // Pre-integrate IMU measurements
        preintegrator_->integrateMeasurement(
            imu_msg.linear_acceleration,
            imu_msg.angular_velocity,
            imu_msg.dt
        );
    }

    void processLiDAR(const PointCloud& cloud) {
        // Deskew point cloud using IMU data
        PointCloud deskewed = deskewCloud(cloud, preintegrator_);

        // Extract features and compute LiDAR odometry
        lidar_odom_ = computeLiDARFeatures(deskewed);
    }

    void processImage(const cv::Mat& image) {
        // Visual feature tracking
        visual_odom_ = trackFeatures(image);
    }

    void optimize() {
        // Build factor graph
        gtsam::NonlinearFactorGraph graph;

        // Add IMU factor
        graph.add(gtsam::CombinedImuFactor(
            X(i-1), V(i-1), X(i), V(i), B(i-1), B(i),
            preintegrator_->preintegratedMeasurements()
        ));

        // Add LiDAR odometry factor
        graph.add(gtsam::BetweenFactor<Pose3>(
            X(i-1), X(i), lidar_odom_, lidar_noise_
        ));

        // Add visual odometry factor
        graph.add(gtsam::BetweenFactor<Pose3>(
            X(i-1), X(i), visual_odom_, visual_noise_
        ));

        // Optimize
        gtsam::LevenbergMarquardtOptimizer optimizer(graph, initial_);
        result_ = optimizer.optimize();
    }
};

8. Real-Time Mesh Reconstruction

Voxel Hashing for Scalable Reconstruction

Voxel hashing enables scalable, spatially unbounded real-time reconstruction by allocating storage only for occupied voxel blocks:

// Voxel hashing for real-time reconstruction
struct VoxelBlock {
    float tsdf[8][8][8];       // Truncated SDF values
    float weight[8][8][8];     // Integration weights
    uchar3 color[8][8][8];     // Color per voxel
};

class VoxelHashMap {
public:
    __device__ void integrate(
        const float* depth_image,
        const float* K,
        const float* pose,
        int width, int height
    ) {
        // Get voxel position
        int3 voxel_pos = make_int3(blockIdx.x * blockDim.x + threadIdx.x,
                                   blockIdx.y * blockDim.y + threadIdx.y,
                                   blockIdx.z * blockDim.z + threadIdx.z);
        float3 world_pos = voxelToWorld(voxel_pos);

        // Project to image
        float3 cam_pos = transformPoint(world_pos, pose);
        float2 pixel = project(cam_pos, K);

        if (isValid(pixel, width, height)) {
            float depth = depth_image[int(pixel.y) * width + int(pixel.x)];
            float sdf = depth - cam_pos.z;

            // Truncate SDF
            float tsdf = fminf(1.0f, sdf / truncation_distance);
            tsdf = fmaxf(-1.0f, tsdf);

            // Weighted running average
            float old_tsdf = voxel_block->tsdf[local_idx];
            float old_weight = voxel_block->weight[local_idx];

            float new_weight = old_weight + 1.0f;
            float new_tsdf = (old_tsdf * old_weight + tsdf) / new_weight;

            voxel_block->tsdf[local_idx] = new_tsdf;
            voxel_block->weight[local_idx] = fminf(new_weight, max_weight);
        }
    }

    void extractMesh(std::vector<Triangle>& mesh) {
        // Marching cubes on GPU
        launchMarchingCubes(voxel_blocks_, mesh);
    }
};
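
The VoxelHashMap above leaves the hash lookup itself implicit. The spatial hash commonly used in the voxel-hashing literature multiplies the integer block coordinates by large primes and XORs the results; a minimal sketch (Python for brevity, assuming 8x8x8-voxel blocks and an arbitrary table size):

import math

def block_hash(bx, by, bz, table_size=2**20):
    """Spatial hash of integer voxel-block coordinates (Niessner et al. style)."""
    p1, p2, p3 = 73856093, 19349669, 83492791    # large primes
    return ((bx * p1) ^ (by * p2) ^ (bz * p3)) % table_size

def world_to_block(x, y, z, voxel_size=0.05, block_dim=8):
    """Map a world-space point to the integer coordinates of its voxel block."""
    block_extent = voxel_size * block_dim
    return (math.floor(x / block_extent),
            math.floor(y / block_extent),
            math.floor(z / block_extent))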

Nvblox Mesh Generation

# Nvblox mesh extraction with ROS 2
import rclpy
from rclpy.node import Node
from nvblox_msgs.srv import FilePath

class MeshExtractor(Node):
    def __init__(self):
        super().__init__('mesh_extractor')

        # Service client for mesh saving
        self.save_mesh_client = self.create_client(
            FilePath, '/nvblox_node/save_mesh'
        )

    def save_mesh(self, filepath):
        request = FilePath.Request()
        request.file_path = filepath

        future = self.save_mesh_client.call_async(request)
        rclpy.spin_until_future_complete(self, future)

        return future.result().success
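
Invoking the node is straightforward; a minimal usage sketch (the output mesh path is an arbitrary example):

def main():
    rclpy.init()
    node = MeshExtractor()

    # Wait for the nvblox service, then request a mesh dump to disk
    node.save_mesh_client.wait_for_service()
    ok = node.save_mesh('/tmp/reconstruction.ply')
    node.get_logger().info(f'Mesh saved: {ok}')

    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()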

9. CUDA-Accelerated 3D Processing

Cupoch: GPU-Accelerated Robotics Library

Cupoch provides CUDA-accelerated 3D data processing specifically designed for robotics:

import cupoch as cph

# Load point cloud
pcd = cph.io.read_point_cloud("scene.ply")

# GPU-accelerated voxel downsampling
pcd_downsampled = pcd.voxel_down_sample(voxel_size=0.02)

# GPU-accelerated ICP registration
source = cph.io.read_point_cloud("source.ply")
target = cph.io.read_point_cloud("target.ply")

# Compute normals on GPU
source.estimate_normals()
target.estimate_normals()

# Point-to-plane ICP
reg_result = cph.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.05,
    estimation_method=cph.registration.TransformationEstimationPointToPlane()
)

print(f"Fitness: {reg_result.fitness}")
print(f"RMSE: {reg_result.inlier_rmse}")
print(f"Transformation:\n{reg_result.transformation}")

# GPU-accelerated collision detection
collision_manager = cph.collision.CollisionManager()
collision_manager.add_object("robot", robot_mesh)
collision_manager.add_object("obstacle", obstacle_cloud)

is_collision = collision_manager.check_collision("robot", "obstacle")

Custom CUDA Kernels for 3D Processing

// Custom CUDA kernel for point cloud transformation
__global__ void transformPointCloud(
    float3* points,
    int num_points,
    const float* transformation  // 4x4 matrix
) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_points) return;

    float3 p = points[idx];

    // Apply transformation
    float x = transformation[0] * p.x + transformation[1] * p.y +
              transformation[2] * p.z + transformation[3];
    float y = transformation[4] * p.x + transformation[5] * p.y +
              transformation[6] * p.z + transformation[7];
    float z = transformation[8] * p.x + transformation[9] * p.y +
              transformation[10] * p.z + transformation[11];

    points[idx] = make_float3(x, y, z);
}

// CUDA kernel for radius search
__global__ void radiusSearch(
    const float3* query_points,
    const float3* target_points,
    int* neighbor_indices,
    int num_queries,
    int num_targets,
    float radius
) {
    int query_idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (query_idx >= num_queries) return;

    float3 query = query_points[query_idx];
    float radius_sq = radius * radius;
    int neighbor_count = 0;

    for (int i = 0; i < num_targets; i++) {
        float3 target = target_points[i];
        float dx = query.x - target.x;
        float dy = query.y - target.y;
        float dz = query.z - target.z;
        float dist_sq = dx*dx + dy*dy + dz*dz;

        if (dist_sq < radius_sq) {
            neighbor_indices[query_idx * MAX_NEIGHBORS + neighbor_count] = i;
            neighbor_count++;

            if (neighbor_count >= MAX_NEIGHBORS) break;
        }
    }
}

10. Isaac ROS Integration for 3D Perception

Isaac ROS Architecture

Isaac ROS provides a complete perception stack optimized for Jetson:

ISAAC ROS PERCEPTION STACK (conceptual layers): Isaac Perceptor bundles cuVSLAM (visual SLAM), nvblox (3D reconstruction), and FoundationStereo; components exchange data over NITROS transport (zero-copy, GPU-accelerated messaging) and feed the Nav2 integration layer (costmap server, planner, controller).

cuVSLAM Configuration

# isaac_ros_visual_slam configuration
/**:
  ros__parameters:
    # Input configuration
    enable_image_denoising: false
    rectified_images: true

    # SLAM parameters
    enable_slam_visualization: true
    enable_observations_view: true
    enable_landmarks_view: true

    # Performance tuning for Jetson
    num_cameras: 2
    min_num_images: 2
    enable_imu_fusion: true
    gyro_noise_density: 0.000244
    gyro_random_walk: 0.000019393
    accel_noise_density: 0.001862
    accel_random_walk: 0.003

    # Output
    map_frame: "map"
    odom_frame: "odom"
    base_frame: "base_link"

nvblox for 3D Reconstruction

# ROS 2 Python node using nvblox
import rclpy
from rclpy.node import Node
import numpy as np
from sensor_msgs.msg import Image, CameraInfo
from geometry_msgs.msg import PoseStamped
from nav_msgs.msg import OccupancyGrid
from nvblox_msgs.msg import DistanceMapSlice

class NvbloxNavigator(Node):
    def __init__(self):
        super().__init__('nvblox_navigator')

        # Subscribe to nvblox outputs
        self.esdf_sub = self.create_subscription(
            DistanceMapSlice,
            '/nvblox_node/static_esdf_slice',
            self.esdf_callback,
            10
        )

        self.costmap_sub = self.create_subscription(
            OccupancyGrid,
            '/nvblox_node/static_map_slice',
            self.costmap_callback,
            10
        )

    def esdf_callback(self, msg):
        """Process Euclidean Signed Distance Field"""
        # ESDF provides distance to nearest obstacle
        # Use for gradient-based path planning
        esdf_data = np.array(msg.data).reshape(
            msg.height, msg.width
        )

        # Find collision-free paths
        gradient = np.gradient(esdf_data)

    def costmap_callback(self, msg):
        """Process 2D costmap for Nav2"""
        # Direct integration with Nav2 costmap
        pass
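
One simple way to exploit the ESDF gradient computed in esdf_callback is to push candidate waypoints away from obstacles until they satisfy a clearance threshold. The following is an illustrative sketch only; the grid-index interface, the 50-step cap, and the 0.5 m clearance are arbitrary choices:

import numpy as np

def push_from_obstacles(i, j, esdf, min_clearance=0.5, max_steps=50):
    """Nudge grid cell (i, j) along the ESDF gradient until it is at least
    min_clearance metres from the nearest obstacle (or max_steps is reached)."""
    gi, gj = np.gradient(esdf)        # gradient points toward increasing clearance
    for _ in range(max_steps):
        if esdf[i, j] >= min_clearance:
            break
        i = int(np.clip(i + np.sign(gi[i, j]), 0, esdf.shape[0] - 1))
        j = int(np.clip(j + np.sign(gj[i, j]), 0, esdf.shape[1] - 1))
    return i, j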

Performance Benchmarks

Component         | Jetson Orin NX | Jetson AGX Orin | CPU Baseline
cuVSLAM           | 60 FPS         | 90+ FPS         | 15 FPS
nvblox            | 30 FPS         | 60 FPS          | 0.6 FPS
FoundationStereo  | 15 FPS         | 30 FPS          | 2 FPS
Combined Pipeline | 20 FPS         | 45 FPS          | N/A

11. Occupancy Grids and Voxel Mapping

OHM Library: GPU Occupancy Mapping

The OHM library provides probabilistic occupancy mapping with GPU acceleration:

#include <ohm/OccupancyMap.h>
#include <ohmgpu/GpuMap.h>

// Initialize GPU-accelerated occupancy map
ohm::OccupancyMap map(0.1);  // 10cm resolution
ohm::GpuMap gpu_map(&map);

// Configure ray casting
ohm::RayMapperConfig config;
config.miss_value = -1.0f;  // Log-odds decrease for miss
config.hit_value = 2.0f;    // Log-odds increase for hit

// Process LiDAR rays on GPU
std::vector<ohm::Ray> rays;
for (const auto& point : lidar_scan) {
    rays.push_back(ohm::Ray(sensor_origin, point));
}

// GPU-accelerated ray integration
gpu_map.integrateRays(rays.data(), rays.size(), config);

// Extract occupied voxels
std::vector<Eigen::Vector3d> occupied_voxels;
for (auto iter = map.begin(); iter != map.end(); ++iter) {
    if (map.occupancy(*iter) > 0.5f) {
        occupied_voxels.push_back(map.voxelCentre(*iter));
    }
}
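
The hit_value and miss_value above are log-odds increments; the underlying update rule is only a few lines, shown here as a scalar sketch of standard Bayesian occupancy mapping (not the OHM internals):

import math

def update_cell(log_odds, hit, hit_value=2.0, miss_value=-1.0,
                l_min=-5.0, l_max=5.0):
    """Update one voxel's log-odds occupancy given a hit or miss observation."""
    log_odds += hit_value if hit else miss_value
    return max(l_min, min(l_max, log_odds))    # clamp to avoid saturation

def occupancy_probability(log_odds):
    """Convert log-odds back to an occupancy probability in [0, 1]."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))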

3D Occupancy Prediction with Neural Networks

# Neural network-based 3D occupancy prediction
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet  # pretrained image backbone

class OccupancyNetwork(nn.Module):
    def __init__(self, voxel_size=0.2, grid_size=(200, 200, 16)):
        super().__init__()
        self.voxel_size = voxel_size
        self.grid_size = grid_size

        # Image backbone
        self.backbone = EfficientNet.from_pretrained('efficientnet-b4')

        # View transformer (image to BEV)
        self.view_transform = LSSViewTransform(
            in_channels=1792,
            out_channels=64,
            grid_size=grid_size
        )

        # 3D occupancy head
        self.occupancy_head = nn.Sequential(
            nn.Conv3d(64, 128, 3, padding=1),
            nn.BatchNorm3d(128),
            nn.ReLU(),
            nn.Conv3d(128, 64, 3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(),
            nn.Conv3d(64, 17, 1)  # 16 classes + free space
        )

    def forward(self, images, intrinsics, extrinsics):
        # Extract image features
        B, N, C, H, W = images.shape
        features = self.backbone.extract_features(images.view(B*N, C, H, W))
        features = features.view(B, N, -1, H//32, W//32)

        # Transform to 3D voxel space
        voxel_features = self.view_transform(
            features, intrinsics, extrinsics
        )

        # Predict occupancy
        occupancy = self.occupancy_head(voxel_features)

        return occupancy

Occupancy Grid Integration with Nav2

# Nav2 costmap configuration with nvblox
costmap:
  ros__parameters:
    global_frame: map
    robot_base_frame: base_link
    update_frequency: 10.0
    publish_frequency: 5.0

    # Use nvblox as primary obstacle source
    plugins: ["nvblox_layer", "inflation_layer"]

    nvblox_layer:
      plugin: "nvblox::NvbloxCostmapLayer"
      enabled: true
      nvblox_map_slice_topic: "/nvblox_node/static_map_slice"

    inflation_layer:
      plugin: "nav2_costmap_2d::InflationLayer"
      cost_scaling_factor: 3.0
      inflation_radius: 0.55

12. Complete Implementation Pipeline

End-to-End 3D Perception System

COMPLETE 3D PERCEPTION PIPELINE FOR JETSON (conceptual flow): the stereo camera, RGB-D camera, LiDAR scanner, and IMU are preprocessed by FoundationStereo, depth denoising, CUDA-PCL filtering, and IMU pre-integration respectively; the streams merge in cuVSLAM (visual-inertial SLAM) and branch into nvblox (ESDF), GS-SLAM (splats), and mesh reconstruction. Outputs: a 2D costmap for Nav2, 3D point cloud, ESDF heatmap, triangle mesh, semantic labels, object detections, robot pose, and loop-closure events.

ROS 2 Launch File

# complete_perception.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node
from launch.actions import IncludeLaunchDescription

def generate_launch_description():
    return LaunchDescription([
        # cuVSLAM for visual SLAM
        Node(
            package='isaac_ros_visual_slam',
            executable='visual_slam_node',
            parameters=[{
                'enable_imu_fusion': True,
                'num_cameras': 2,
            }],
            remappings=[
                ('stereo_camera/left/image', '/zed/left/image_rect_color'),
                ('stereo_camera/right/image', '/zed/right/image_rect_color'),
                ('imu', '/zed/imu/data'),
            ]
        ),

        # nvblox for 3D reconstruction
        Node(
            package='isaac_ros_nvblox',
            executable='nvblox_node',
            parameters=[{
                'global_frame': 'map',
                'voxel_size': 0.05,
                'esdf_2d_min_height': 0.1,
                'esdf_2d_max_height': 1.5,
            }],
            remappings=[
                ('depth/image', '/zed/depth/image_rect'),
                ('depth/camera_info', '/zed/depth/camera_info'),
                ('pose', '/visual_slam/tracking/odometry'),
            ]
        ),

        # CUDA-PCL for LiDAR processing
        Node(
            package='cuda_pcl_ros',
            executable='cuda_pcl_node',
            parameters=[{
                'voxel_size': 0.1,
                'ground_removal': True,
                'ransac_threshold': 0.1,
            }],
            remappings=[
                ('points_in', '/velodyne_points'),
                ('points_out', '/filtered_points'),
            ]
        ),

        # Point cloud to occupancy grid
        Node(
            package='pointcloud_to_laserscan',
            executable='pointcloud_to_laserscan_node',
            parameters=[{
                'min_height': 0.1,
                'max_height': 2.0,
            }],
            remappings=[
                ('cloud_in', '/filtered_points'),
                ('scan', '/scan'),
            ]
        ),
    ])

Performance Optimization Tips

  1. Use NITROS for zero-copy transfer: Avoid unnecessary CPU-GPU memory copies
  2. Batch processing: Process multiple frames together when possible
  3. Resolution management: Use adaptive resolution based on computational budget
  4. Memory pooling: Pre-allocate GPU memory to avoid allocation overhead
  5. Async processing: Pipeline different stages to maximize throughput
# Example: Optimized async processing pipeline
import asyncio
from concurrent.futures import ThreadPoolExecutor

class OptimizedPipeline:
    def __init__(self):
        self.executor = ThreadPoolExecutor(max_workers=4)
        self.depth_queue = asyncio.Queue(maxsize=3)
        self.slam_queue = asyncio.Queue(maxsize=3)

    async def process_frame(self, rgb, depth, imu):
        # Run depth processing and SLAM tracking in parallel
        loop = asyncio.get_running_loop()
        depth_task = loop.run_in_executor(
            self.executor, self.process_depth, depth
        )
        slam_task = loop.run_in_executor(
            self.executor, self.track_slam, rgb, imu
        )

        processed_depth, pose = await asyncio.gather(
            depth_task, slam_task
        )

        # Update map with processed data
        await self.update_map(processed_depth, pose)

Conclusion

3D reconstruction for robotics on NVIDIA Jetson platforms has reached an inflection point in 2025-2026. The convergence of powerful edge hardware (Jetson AGX Thor with 2070 TFLOPS), advanced algorithms (FoundationStereo, Gaussian Splatting SLAM), and comprehensive software stacks (Isaac ROS, nvblox, cuVSLAM) enables capabilities previously restricted to high-end workstations.

Key takeaways:

  1. Stereo depth estimation has been revolutionized by foundation models like FoundationStereo, providing zero-shot generalization across diverse environments.

  2. Gaussian Splatting is emerging as the preferred representation for dense SLAM, offering real-time rendering and explicit editability that NeRF cannot match on edge devices.

  3. Multi-sensor fusion remains essential for robust operation, with tight integration of cameras, LiDAR, and IMUs providing redundancy and accuracy.

  4. CUDA acceleration is non-negotiable for real-time performance, with libraries like CUDA-PCL, nvblox, and cuVSLAM providing 10-100x speedups over CPU implementations.

  5. Isaac ROS provides the production-ready framework for deploying these technologies, with optimizations specifically targeting Jetson hardware.

The future points toward even tighter integration of perception with action, as embodied AI systems leverage these 3D understanding capabilities for manipulation, navigation, and human-robot interaction.


References and Further Reading

Research Papers

  • FoundationStereo: Zero-Shot Stereo Matching (CVPR 2025 Best Paper Nomination)
  • ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
  • 3D Gaussian Splatting for Real-Time Radiance Field Rendering (SIGGRAPH 2023)
  • SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
  • GSORB-SLAM: Gaussian Splatting SLAM with ORB Features

Industry Applications

  • Boston Dynamics - VPI on Jetson for perception algorithms
  • Leopard Imaging - Eagle series stereo cameras with Jetson Thor
  • Amazon Robotics - Early Jetson Thor evaluation pilots

This technical guide is part of Koca Ventures' research series on edge AI and robotics. For consulting or implementation support, contact our technical team.
