# ros2-depth-anything-v3-trt

A ROS 2 node for Depth Anything V3 depth estimation using TensorRT for real-time inference. The node subscribes to camera image and camera info topics and publishes both a metric depth image and a PointCloud2 point cloud.
- Features
- Dependencies
- Topics
- Parameters
- Usage
- Model Preparation
- Docker Image
- Building
- Performance
- Architecture
- Depth Postprocessing Pipeline
- Troubleshooting
- License
- Citation
- Acknowledgements
## Features

- Real-time metric depth estimation using Depth Anything V3 with TensorRT acceleration
- Point cloud generation from metric depth image
- Debug visualization with colormap options
- Configurable precision (FP16/FP32)
> [!IMPORTANT]
> This repository is open-sourced and maintained by the Institute for Automotive Engineering (ika) at RWTH Aachen University.
> We cover a wide variety of research topics within our Vehicle Intelligence & Automated Driving domain.
> If you would like to learn more about how we can support your automated driving or robotics efforts, feel free to reach out to us!
> 📧 opensource@ika.rwth-aachen.de
## Dependencies

- Tested with image `nvcr.io/nvidia/tensorrt:25.08-py3`
  - Ubuntu 24.04, ROS 2 Jazzy
  - CUDA 13
  - TensorRT 10.9
- Tested with image `nvcr.io/nvidia/tensorrt:25.03-py3`
  - Ubuntu 24.04, ROS 2 Jazzy
  - CUDA 12.8.1
  - TensorRT 10.9

Depending on your driver and CUDA version, you need to select the appropriate base image.
## Topics

Subscribed:

- `~/input/image` (`sensor_msgs/msg/Image`): Input camera image (supports raw and compressed transport via image_transport)
- `~/input/camera_info` (`sensor_msgs/msg/CameraInfo`): Camera calibration info

Published:

- `~/output/depth_image` (`sensor_msgs/msg/Image`): Depth image (32FC1 format)
- `~/output/point_cloud` (`sensor_msgs/msg/PointCloud2`): Generated point cloud
- `~/output/depth_image_debug` (`sensor_msgs/msg/Image`): Debug depth visualization (if enabled)
## Parameters

All parameters can be configured via `config/depth_anything_v3.param.yaml` or passed at launch time.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `onnx_path` | string | `"models/DA3METRIC-LARGE.onnx"` | Path to Depth Anything V3 ONNX or TensorRT engine file |
| `precision` | string | `"fp16"` | Inference precision (`"fp16"` or `"fp32"`) |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `sky_threshold` | double | `0.3` | Threshold for sky classification from model's sky output (lower = more sky detected) |
| `sky_depth_cap` | double | `200.0` | Maximum depth value (meters) to assign to sky pixels |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `point_cloud_downsample_factor` | int | `2` | Publish every Nth point (1 = full resolution, 10 = every 10th point) |
| `colorize_point_cloud` | bool | `true` | Add RGB colors from input image to point cloud |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enable_debug` | bool | `false` | Enable debug visualization output |
| `debug_colormap` | string | `"JET"` | Colormap for depth visualization (see below) |
| `debug_filepath` | string | `"/tmp/depth_anything_v3_debug/"` | Directory to save debug images |
| `write_colormap` | bool | `false` | Save colorized debug images to disk |
| `debug_colormap_min_depth` | double | `0.0` | Minimum depth value for colormap normalization (meters) |
| `debug_colormap_max_depth` | double | `50.0` | Maximum depth value for colormap normalization (meters) |
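Putting the parameters together, a parameter file might look like the following. This is a hypothetical excerpt in the standard ROS 2 parameter file format; the node name used here is an assumption — check the shipped `config/depth_anything_v3.param.yaml` for the actual structure.

```yaml
# Hypothetical excerpt of config/depth_anything_v3.param.yaml;
# the node name "depth_anything_v3" is an assumption.
depth_anything_v3:
  ros__parameters:
    onnx_path: "models/DA3METRIC-LARGE.onnx"
    precision: "fp16"
    sky_threshold: 0.3
    sky_depth_cap: 200.0
    point_cloud_downsample_factor: 2
    colorize_point_cloud: true
    enable_debug: false
```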
Supported values for `debug_colormap`:

`JET`, `HOT`, `COOL`, `SPRING`, `SUMMER`, `AUTUMN`, `WINTER`, `BONE`, `GRAY`, `HSV`, `PARULA`, `PLASMA`, `INFERNO`, `VIRIDIS`, `MAGMA`, `CIVIDIS`
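Before a colormap is applied, depth is clipped to the `debug_colormap_min_depth`/`debug_colormap_max_depth` range and scaled to 8 bit. A minimal NumPy sketch of this normalization step (the function name is illustrative, not the node's API):

```python
import numpy as np

def normalize_depth(depth, d_min=0.0, d_max=50.0):
    """Clip depth (meters) to [d_min, d_max] and scale to uint8 [0, 255]
    so a colormap can be applied to the result."""
    d = np.clip(depth, d_min, d_max)
    return ((d - d_min) / (d_max - d_min) * 255.0).astype(np.uint8)
```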
## Usage

Launch with default parameters:

```bash
ros2 launch depth_anything_v3 depth_anything_v3.launch.py
```

Remap topics at launch time:

```bash
ros2 launch depth_anything_v3 depth_anything_v3.launch.py \
  input_image_topic:=/your_camera/image_raw \
  input_camera_info_topic:=/your_camera/camera_info \
  output_depth_topic:=/depth_anything_v3/depth \
  output_point_cloud_topic:=/depth_anything_v3/points
```

Use a custom parameter file:

```bash
ros2 launch depth_anything_v3 depth_anything_v3.launch.py \
  params_file:=src/depth_anything_v3/config/debug.param.yaml
```

## Model Preparation

1. Obtain the ONNX model (two options):
   - A. Download the ONNX file from Hugging Face: https://huggingface.co/TillBeemelmanns/Depth-Anything-V3-ONNX
   - B. Generate the ONNX file following the instructions here
2. Place the model file: put the ONNX/engine file in the `models/` directory
3. Update the configuration: modify `config/depth_anything_v3.param.yaml` with the correct model path
Expected model input format:
- Input shape: [1, 3, 280, 504] (batch, channels, height, width)
- Input type: float32
- Value range: [0, 1] (normalized with ImageNet mean/std)
Expected model outputs:
- `depth`: [1, 1, 280, 504] - metric depth map (float32)
- `sky`: [1, 1, 280, 504] - sky classification logits (float32, lower values = sky)
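The input contract above can be sketched in NumPy. This is an illustrative reference for the expected preprocessing, not the node's implementation (which runs in C++); it assumes the image has already been resized to the model resolution of 280x504.

```python
import numpy as np

# Standard ImageNet normalization constants, as referenced above.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_hwc_uint8):
    """Convert an HxWx3 RGB uint8 image (already resized to 280x504)
    into a [1, 3, 280, 504] float32 NCHW tensor."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD          # per-channel normalization
    x = np.transpose(x, (2, 0, 1))[None, ...]       # HWC -> NCHW with batch dim
    return x
```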
## Docker Image

We precompile and provide a Docker image with all dependencies installed and ready to use. You can pull it from Docker Hub:

```bash
docker pull tillbeemelmanns/ros2-depth-anything-v3:latest-dev
```

If you want to play rosbags and visualize the output in rviz2, make sure to install them:

```bash
apt update && apt install -y \
    ros-jazzy-rviz2 \
    ros-jazzy-rosbag2 \
    ros-jazzy-rosbag2-storage-mcap
```

We recommend using the Docker image in combination with our other tools for Docker and ROS:

- docker-ros automatically builds minimal container images of ROS applications
- docker-run is a CLI tool for simplified interaction with Docker images
## Building

```bash
# From your ROS 2 workspace
colcon build --packages-select depth_anything_v3 --cmake-args -DCMAKE_BUILD_TYPE=Release
source install/setup.bash

# Optional: Pre-build TensorRT engines
./generate_engines.sh
```

## Performance

The node is optimized for real-time performance. On a Quadro RTX 6000:

- DA3METRIC-LARGE: 50 FPS
## Architecture

```
Input Image + Camera Info
          ↓
 Preprocessing (CPU/GPU)
          ↓
TensorRT Inference (GPU)
          ↓
  Postprocessing (CPU)
          ↓
Depth Image + Point Cloud
```
## Depth Postprocessing Pipeline

Depth Anything V3 predicts a dense metric depth map. After TensorRT inference, the node performs the following postprocessing steps:

1. Tensor extraction: the single-channel NCHW tensor is reshaped into a `cv::Mat`, and negative depth values are clamped to zero.
2. Focal length scaling: the raw depth output is scaled by `focal_pixels / 300.0`, where `focal_pixels = (fx + fy) / 2`. This aligns the model's internal focal normalization with your actual camera intrinsics.
3. Sky masking: the model outputs a separate sky classification tensor. Pixels with sky confidence below `sky_threshold` are classified as sky and filled with a depth value derived from the 99th percentile of valid (non-sky) depths, capped at `sky_depth_cap` meters to avoid unrealistically far distances.
4. Resizing: the depth map is resized from model resolution back to the original camera resolution using cubic interpolation.
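The clamping, focal scaling, and sky masking steps can be sketched as follows. This is a minimal NumPy sketch under the parameter defaults above; the node itself implements these steps in C++ on `cv::Mat`.

```python
import numpy as np

def postprocess(depth, sky_logits, fx, fy, sky_threshold=0.3, sky_depth_cap=200.0):
    """Clamp negative depths, scale by camera focal length, and fill sky pixels."""
    depth = np.maximum(depth, 0.0)             # clamp negative depth values to zero
    focal_pixels = (fx + fy) / 2.0
    depth = depth * (focal_pixels / 300.0)     # align model's focal normalization
    sky = sky_logits < sky_threshold           # lower logit values = sky
    if np.any(~sky):
        # fill sky with the 99th percentile of valid depths, capped at sky_depth_cap
        fill = min(np.percentile(depth[~sky], 99), sky_depth_cap)
        depth[sky] = fill
    return depth
```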
For each pixel (u, v) in the depth map:

```
X = (u - cx) * depth / fx
Y = (v - cy) * depth / fy
Z = depth
```

where (cx, cy) is the principal point and (fx, fy) are the focal lengths from CameraInfo.

Optional features:

- Downsampling: use `point_cloud_downsample_factor` to reduce point count for visualization or bandwidth
- Colorization: when `colorize_point_cloud` is enabled, RGB values from the input image are added to each 3D point
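The back-projection, including the downsampling option, can be sketched in NumPy. Function and variable names are illustrative, not the node's API.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, downsample=1):
    """Back-project a metric depth map (HxW, meters) into an Nx3 point array
    using the pinhole model X=(u-cx)*Z/fx, Y=(v-cy)*Z/fy, Z=depth."""
    h, w = depth.shape
    # pixel grid, keeping every `downsample`-th row and column
    v, u = np.mgrid[0:h:downsample, 0:w:downsample]
    z = depth[::downsample, ::downsample]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```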
## Troubleshooting

- TensorRT engine building fails:
  - Check CUDA/TensorRT compatibility
  - Verify the ONNX model format
  - Increase the workspace size
- Point cloud appears incorrect:
  - Verify the camera_info calibration
  - Check coordinate frame conventions
  - Validate depth value units
- Performance issues:
  - Enable FP16 precision
  - Check GPU memory usage
Enable debug mode to troubleshoot:

```yaml
enable_debug: true
debug_colormap: "JET"
debug_colormap_min_depth: 0.0
debug_colormap_max_depth: 50.0
write_colormap: true
```

This will publish colorized depth images and save them to disk for inspection.
## License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
## Citation

If you use this code in your research, please cite the following:

```bibtex
@misc{beemelmanns2025depth,
  author    = {Till Beemelmanns},
  title     = {ros2-depth-anything-v3-trt: ROS2 TensorRT Node for Monocular Metric Depth Estimation and Point Cloud Generation with Depth Anything V3},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/ika-rwth-aachen/ros2-depth-anything-v3-trt}
}
```

## Acknowledgements

Thanks to the following repositories for inspiration: