FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention

Abstract

In this paper, we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full 360° depth map together with per-camera depth, fused depth, and confidence estimates.
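The page does not state the lens model used to convert each fisheye image into an equirectangular (ERP) view. The sketch below assumes an ideal equidistant fisheye (r = f·θ) whose optical axis is +z in a camera frame with x right and y down, plus a 220° FOV. The function name and the parameters `f`, `cx`, `cy` are hypothetical. For every ERP pixel, the function computes the viewing ray, the ray's angle θ from the optical axis, and the fisheye pixel to sample. It returns `None` when the ray falls outside the lens FOV.

```python
import math

def erp_to_fisheye_map(erp_w, erp_h, f, cx, cy, fov_deg=220.0):
    """Lookup table from ERP pixels to fisheye pixel coordinates.

    Assumption: ideal equidistant fisheye (r = f * theta), optical axis +z,
    camera frame x right / y down. Entries are (u_f, v_f) or None when the
    ray lies outside the lens FOV.
    """
    max_theta = math.radians(fov_deg) / 2.0
    table = []
    for v in range(erp_h):
        lat = math.pi / 2 - (v + 0.5) / erp_h * math.pi  # +pi/2 at the top row
        row = []
        for u in range(erp_w):
            lon = (u + 0.5) / erp_w * 2 * math.pi - math.pi  # 0 at the image center
            # Unit viewing ray in the camera frame (z forward, y down).
            x = math.cos(lat) * math.sin(lon)
            y = -math.sin(lat)
            z = math.cos(lat) * math.cos(lon)
            theta = math.acos(max(-1.0, min(1.0, z)))  # angle from the optical axis
            if theta > max_theta:
                row.append(None)  # outside the assumed FOV
                continue
            phi = math.atan2(y, x)  # azimuth around the optical axis
            r = f * theta           # equidistant projection radius in pixels
            row.append((cx + r * math.cos(phi), cy + r * math.sin(phi)))
        table.append(row)
    return table
```

Sampling the fisheye image (for example, bilinearly) at these coordinates would produce one ERP view per camera. A calibrated lens model would replace the equidistant assumption in practice.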

Our main contributions are: (1) We introduce Alternative Hierarchical Attention (AHA), which efficiently fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel equirectangular projection (ERP) fusion approach that projects multi-view depth estimates into a shared equirectangular coordinate system to obtain the final fused depth. (3) We generate ERP image–depth pairs from the HM3D and Stanford 2D-3D-S datasets for comprehensive evaluation, and we demonstrate competitive zero-shot performance on real datasets while running at up to 20 FPS on NVIDIA Orin NX embedded hardware.
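Contribution (2) says only that multi-view depth is projected into a shared equirectangular frame. The sketch below shows one plausible version under several assumptions: known camera-to-rig extrinsics (`R`, `t`), a rig frame with x right, y down, and z forward, ERP cells that store Euclidean range, and a confidence-weighted average as the fusion rule. All of these assumptions, and the name `fuse_to_erp`, are illustrative. They are not the paper's method.

```python
import math

def fuse_to_erp(views, erp_w, erp_h):
    """Fuse per-camera 3D points into one ERP range map.

    views: list of (R, t, points). R (3x3, list of rows) and t (length 3) map
    camera coordinates into the shared rig frame; points is a list of
    (x, y, z, conf) in camera coordinates. Rig frame: x right, y down, z forward.
    Returns an erp_h x erp_w grid of fused ranges (None where nothing landed).
    """
    num = [[0.0] * erp_w for _ in range(erp_h)]
    den = [[0.0] * erp_w for _ in range(erp_h)]
    for R, t, points in views:
        for x, y, z, conf in points:
            # Camera -> rig frame: p_rig = R @ p_cam + t
            px = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
            py = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
            pz = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
            rng = math.sqrt(px * px + py * py + pz * pz)
            if rng == 0.0:
                continue  # a point at the rig origin has no direction
            lon = math.atan2(px, pz)                            # 0 along +z
            lat = math.asin(max(-1.0, min(1.0, -py / rng)))     # > 0 above horizon
            u = min(int((lon + math.pi) / (2 * math.pi) * erp_w), erp_w - 1)
            v = min(int((math.pi / 2 - lat) / math.pi * erp_h), erp_h - 1)
            # Hypothetical fusion rule: confidence-weighted mean range per cell.
            num[v][u] += conf * rng
            den[v][u] += conf
    return [[num[v][u] / den[v][u] if den[v][u] > 0 else None
             for u in range(erp_w)] for v in range(erp_h)]
```

Averaging in range space is the simplest choice. Alternatives include keeping the highest-confidence sample or doing occlusion-aware z-buffering, and these may behave better near object boundaries where cameras disagree.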

FastViDAR achieves strong accuracy–throughput trade-offs and robust omnidirectional perception in real-world robotics scenarios.

Framework

FastViDAR combines Alternative Hierarchical Attention (AHA) with ERP-centric processing and ERP fusion to produce seamless 360° depth. The pipeline alternates local window attention with per-frame/global summary attention, then fuses the multi-view depth predictions on a shared ERP grid.
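To illustrate the alternation described above, the NumPy sketch below interleaves two operations. The first is self-attention restricted to non-overlapping windows inside each view. The second is attention across views, where each view is summarized by its mean token. The sketch makes several hypothetical simplifications: a single head, identity query/key/value projections (no learned weights), mean pooling as the per-frame summary, and made-up function names. It is not the authors' AHA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Scaled dot-product attention; batched over all leading axes.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def local_window_attention(x, window):
    """Self-attention within non-overlapping windows of each view.
    x: (V, N, C) tokens for V views; N must be divisible by `window`.
    Simplification: Q = K = V = tokens (no learned projections)."""
    V, N, C = x.shape
    w = x.reshape(V, N // window, window, C)      # (V, num_windows, window, C)
    return x + attend(w, w, w).reshape(V, N, C)   # residual connection

def global_summary_attention(x):
    """Per-frame mean tokens attend across views; the mixed summary is
    broadcast back to every token of its view."""
    s = x.mean(axis=1)            # (V, C) per-frame summaries
    g = attend(s, s, s)           # (V, C) cross-view mixing
    return x + g[:, None, :]      # residual, broadcast over tokens

def alternating_block(x, window, depth=2):
    # Alternate local (intra-frame) and global (inter-frame) mixing.
    for _ in range(depth):
        x = local_window_attention(x, window)
        x = global_summary_attention(x)
    return x
```

The local step costs roughly O(V·N·window) rather than O((V·N)²) for full joint attention. Cross-view information enters only through the V summary tokens, which is the kind of overhead reduction the framework description points to.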

Benchmark Results

Method	AbsRel ↓	RMSE ↓	Log10 ↓	δ<1.25 ↑	Time (ms)
VGGT	0.557	1.934	0.396	0.043	120
OmniStereo	0.619	1.450	0.154	0.554	66
LightStereo-M	0.125	0.667	0.050	0.851	33
FastViDAR	0.119	0.433	0.046	0.929	36

Zero-shot results on Stanford 2D-3D-S. FastViDAR uses 4 ERP views (220° FOV). LightStereo is evaluated via ERP-split stereo, and VGGT is run on 8 pinhole projections whose predictions are projected back to ERP.
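The table reports standard depth-estimation metrics. The page does not specify the evaluation protocol (valid-pixel mask, depth cap, or any scale alignment). The sketch below therefore uses the common definitions over pixels with positive ground truth and positive prediction, with no scaling applied. `depth_metrics` is a hypothetical helper.

```python
import math

def depth_metrics(pred, gt):
    """Common depth metrics over pixels where both gt > 0 and pred > 0
    (pred > 0 is required for the log10 term)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    # Fraction of pixels whose ratio to ground truth is within 1.25x.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"AbsRel": abs_rel, "RMSE": rmse, "Log10": log10, "delta<1.25": delta1}
```

Lower is better for AbsRel, RMSE, and Log10, and higher is better for δ<1.25, which matches the arrows in the table header.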

In-domain Results

This video shows results from a model trained on the Sunny subset of the Omnidirectional Stereo Dataset training split and evaluated on its test split.

Layout: top left, ERP-converted inputs; bottom left, omnidirectional depth maps (ERP); top right, colored bird's-eye-view (BEV) point cloud; bottom right, colored first-person view (FPV).
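The point-cloud panels are derived from ERP depth. The sketch below back-projects an ERP map to 3D points and bins them for a bird's-eye view. It assumes each ERP pixel stores Euclidean range and uses a frame with x right, y down, and z forward. Colors are omitted, and both helper names are hypothetical.

```python
import math

def erp_to_points(depth):
    """Back-project an ERP range map (list of rows, H x W) to 3D points.
    Assumption: each pixel stores Euclidean distance along its viewing ray;
    frame is x right, y down, z forward. Pixels with range <= 0 are skipped."""
    h, w = len(depth), len(depth[0])
    points = []
    for v in range(h):
        lat = math.pi / 2 - (v + 0.5) / h * math.pi  # +pi/2 at the top row
        for u in range(w):
            r = depth[v][u]
            if r <= 0:
                continue  # invalid or missing depth
            lon = (u + 0.5) / w * 2 * math.pi - math.pi
            points.append((r * math.cos(lat) * math.sin(lon),
                           -r * math.sin(lat),
                           r * math.cos(lat) * math.cos(lon)))
    return points

def bev_cells(points, cell_size):
    """Bird's-eye-view occupancy: drop the vertical (y) axis and bin x/z."""
    return {(math.floor(x / cell_size), math.floor(z / cell_size))
            for x, _, z in points}
```

A renderer would color each point with the RGB value sampled at the same ERP pixel. Real pipelines typically also filter points by height before the BEV projection.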

BibTeX

@inproceedings{fastvidar_icra2026,
  title     = {FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention},
  author    = {Author, A. and Author, B. and Author, C.},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2026}
}
