In this paper we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full 360° depth map along with per-camera depth, fusion depth, and confidence estimates.
Our main contributions are: (1) We introduce Alternative Hierarchical Attention (AHA), which fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel ERP fusion approach that projects multi-view depth estimates onto a shared equirectangular coordinate system to obtain the final fusion depth. (3) We generate ERP image–depth pairs from the HM3D and 2D3D-S datasets for comprehensive evaluation, demonstrating competitive zero-shot performance on real datasets while running at up to 20 FPS on NVIDIA Orin NX embedded hardware.
FastViDAR achieves strong accuracy–throughput trade-offs and robust omnidirectional perception in real-world robotics scenarios.
FastViDAR employs Alternative Hierarchical Attention (AHA) with ERP-centric processing and ERP fusion for seamless 360° depth. The pipeline alternates local window attention with per-frame/global summary attention and fuses multi-view depth on a shared ERP grid.
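The idea of fusing on a shared ERP grid can be illustrated with a minimal sketch. It assumes a common equirectangular convention (longitude across the width, latitude down the height, x right, y down, z forward) and a confidence-weighted average as the fusion rule; the page does not state either choice, and the function names `erp_pixel_to_ray`, `ray_to_erp_pixel`, and `fuse_depths` are hypothetical.

```python
import math


def erp_pixel_to_ray(u, v, width, height):
    """Map an ERP pixel (u, v) to a unit ray (x right, y down, z forward).

    Assumed convention: pixel centres sit at integer coordinates, longitude
    spans [-pi, pi) across the width, latitude spans [pi/2, -pi/2] downwards.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = -math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def ray_to_erp_pixel(ray, width, height):
    """Inverse of erp_pixel_to_ray: project a 3D direction onto the ERP grid."""
    x, y, z = ray
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, -y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)


def fuse_depths(depths, confidences, eps=1e-6):
    """Fuse per-view depth maps that were already resampled to one ERP grid.

    depths, confidences: one flat list of pixels per view, all the same length.
    A depth <= 0 marks a pixel the view does not observe (an assumption).
    Returns the confidence-weighted mean per pixel, or 0.0 where no view is valid.
    """
    fused = []
    for pixel_depths, pixel_confs in zip(zip(*depths), zip(*confidences)):
        num = 0.0
        den = 0.0
        for d, c in zip(pixel_depths, pixel_confs):
            if d > 0.0:
                num += c * d
                den += c
        fused.append(num / den if den > eps else 0.0)
    return fused
```

In this sketch each camera's depth would first be resampled to the shared grid by casting `erp_pixel_to_ray` for every output pixel into that camera's frame; pixels outside a camera's field of view get depth 0 and fall out of the weighted average.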
| Method | AbsRel ↓ | RMSE ↓ | Log10 ↓ | δ<1.25 ↑ | Time (ms) |
|---|---|---|---|---|---|
| VGGT | 0.557 | 1.934 | 0.396 | 0.043 | 120 |
| OmniStereo | 0.619 | 1.450 | 0.154 | 0.554 | 66 |
| LightStereo-M | 0.125 | 0.667 | 0.050 | 0.851 | 33 |
| FastViDAR | 0.119 | 0.433 | 0.046 | 0.929 | 36 |
Zero-shot on Stanford 2D-3D-S. FastViDAR: 4 ERP views (FOV 220°). LightStereo via ERP-split stereo; VGGT via 8 pinhole projections projected back to ERP.
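For readers unfamiliar with the column headers, the four accuracy metrics follow the standard depth-estimation definitions, sketched below. The validity threshold `min_depth` and the function name `depth_metrics` are assumptions for illustration; the page does not describe its masking or any scale alignment, so this will not necessarily reproduce the numbers in the table.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3):
    """Compute AbsRel, RMSE, Log10 and delta<1.25 over valid pixels.

    pred, gt: flat sequences of predicted and ground-truth depths.
    Pixels where either value is <= min_depth are skipped (assumed mask).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > min_depth and g > min_depth]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in depth units.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Mean absolute difference of base-10 log depths.
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    # Fraction of pixels whose ratio to ground truth is within 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "log10": log10, "delta1": delta1}
```

Lower is better for the first three values and higher is better for `delta1`, matching the arrows in the table header.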
This video shows results from a model trained on the Sunny subset of the Omnidirectional Stereo Dataset training set and evaluated on its test set.
Layout: top left, ERP-converted inputs; bottom left, omnidirectional depth maps (ERP); top right, colored BEV point cloud; bottom right, colored FPV.

@inproceedings{fastvidar_icra2026,
title = {FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention},
author = {Author, A. and Author, B. and Author, C.},
booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
year = {2026}
}