Research Article

MobileDepth: Monocular Depth Estimation Based on Lightweight Vision Transformer

Article: 2364159 | Received 21 May 2023, Accepted 04 May 2024, Published online: 01 Jul 2024

References

  • Bae, J.-H., S. Moon, and S. Im. 2022. MonoFormer: Towards generalization of self-supervised monocular depth estimation with transformers. arXiv preprint arXiv:2205.11083.
  • Bhat, S., I. Alhashim, and P. Wonka. 2020. AdaBins: Depth estimation using adaptive bins. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4008–20. Nashville, TN, USA. June 20–25, 2021.
  • Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–55. IEEE: Miami, FL, USA. June 20–25, 2009.
  • Eigen, D., C. Puhrsch, and R. Fergus. 2014. Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems (NIPS) 27. http://arxiv.org/abs/1406.2283.
  • Fu, H., M. Gong, C. Wang, K. Batmanghelich, and D. Tao. 2018. Deep ordinal regression network for monocular depth estimation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2002–11. Salt Lake City, UT, USA. June 18–22, 2018.
  • Gan, Y., X. Xu, W. Sun, and L. Lin. 2018. Monocular depth estimation with affinity, vertical pooling, and label enhancement. In Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany. September 8–14, 2018.
  • Godard, C., O. Mac Aodha, M. Firman, and G. J. Brostow. 2018. Digging into self-supervised monocular depth estimation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 3827–37. Seoul, South Korea. October 27–November 2, 2019.
  • Guizilini, V. C., R. Ambrus, S. Pillai, A. Raventos, and A. Gaidon. 2019. 3D Packing for self-supervised monocular depth estimation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2482–91. Seattle, WA, USA. June 13–19, 2020.
  • Ibrahem, H., A. Salem, and H.-S. Kang. 2022. RT-ViT: Real-time monocular depth estimation using lightweight vision transformers. Sensors (Basel, Switzerland) 22 (10):3849. doi:10.3390/s22103849.
  • Kaur, A., A. P. S. Chauhan, and A. K. Aggarwal. 2019. Machine learning based comparative analysis of methods for enhancer prediction in genomic data. In 2019 2nd International Conference on Intelligent Communication and Computational Techniques (ICCT), 142–45. IEEE: Jaipur, India. September 28–29, 2019.
  • Kim, Y. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar. October 25–29, 2014.
  • Kundu, J. N. 2018. AdaDepth: Unsupervised content congruent adaptation for depth estimation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2656–65. Salt Lake City, UT, USA. June 18–22, 2018.
  • Lee, J. H., M. K. Han, D. W. Ko, and I. H. Suh. 2019. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326. http://arxiv.org/abs/1907.10326.
  • Li, B., C. Shen, Y. Dai, A. Van Den Hengel, and M. He. 2015. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1119–27. Boston, MA, USA. June 7–12, 2015.
  • Lin, G., A. Milan, C. Shen, and I. Reid. 2016. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5168–77. Honolulu, HI, USA. July 21–26, 2017.
  • Liu, F., C. Shen, G. Lin, and I. Reid. 2015. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 38:2024–39.
  • Long, J., E. Shelhamer, and T. Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA. June 7–12, 2015.
  • Lyu, X., L. Liu, M. Wang, X. Kong, L. Liu, Y. Liu, X. Chen, and Y. Yuan. 2020. HR-Depth: High resolution self-supervised monocular depth estimation. arXiv preprint arXiv:2012.07356.
  • Poggi, M., F. Tosi, and S. Mattoccia. 2018. Learning monocular depth estimation with unsupervised trinocular assumptions. In 2018 International Conference on 3D Vision (3DV), 324–33. Verona, Italy. September 5–8, 2018.
  • Ranftl, R., A. Bochkovskiy, and V. Koltun. 2021. Vision transformers for dense prediction. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 12159–68. Montreal, Canada. October 10–17, 2021.
  • Sandler, M., A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4510–20. Salt Lake City, UT, USA. June 18–22, 2018.
  • Wang, C., J. M. Buenaposada, R. Zhu, and S. Lucey. 2017. Learning depth from monocular videos using direct methods. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022–30. Salt Lake City, UT, USA. June 18–22, 2018.
  • Wang, L., J. Zhang, O. Wang, Z. Lin, and H. Lu. 2020. SDC-Depth: Semantic divide-and-conquer network for monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 541–50. Seattle, WA, USA. June 13–19, 2020.
  • Xu, D., E. Ricci, W. Ouyang, X. Wang, and N. Sebe. 2017. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5354–62. Honolulu, HI, USA. July 21–26, 2017.
  • Yan, J., H. Zhao, P. Bu, and Y. Jin. 2021. Channel-wise attention-based network for self-supervised monocular depth estimation. In 2021 International Conference on 3D Vision (3DV), 464–73. December 1–3, 2021.
  • Zhan, H., R. Garg, C. S. Weerasekera, K. Li, H. Agarwal, and I. Reid. 2018. Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 340–49. Salt Lake City, UT, USA. June 18–22, 2018.
  • Zhang, N., F. Nex, G. Vosselman, and N. Kerle. 2023. Lite-Mono: A lightweight CNN and transformer architecture for self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18537–46. Vancouver, Canada. June 18–22, 2023.
  • Zhao, C., Y. Zhang, M. Poggi, F. Tosi, X. Guo, Z. Zhu, G. Huang, Y. Tang, and S. Mattoccia. 2022. MonoViT: Self-supervised monocular depth estimation with a vision transformer. In 2022 International Conference on 3D Vision (3DV), 668–78. IEEE: Prague, Czech Republic. September 12–16, 2022.
  • Zhou, T., M. Brown, N. Snavely, and D. G. Lowe. 2017. Unsupervised learning of depth and ego-motion from video. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6612–19. Honolulu, HI, USA. July 21–26, 2017.
  • Zhou, Z., X. Fan, P. Shi, and Y. Xin. 2021. R-MSFM: Recurrent multi-scale feature modulation for monocular depth estimating. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 12777–86. Montreal, Canada. October 10–17, 2021.