Applied Artificial Intelligence

An International Journal

Volume 38, 2024 - Issue 1

Open access

Research Article

MobileDepth: Monocular Depth Estimation Based on Lightweight Vision Transformer

Yundong Li (corresponding author), School of Information Science and Technology, North China University of Technology, Beijing, China

Xiaokun Wei, School of Information Science and Technology, North China University of Technology, Beijing, China

Article: 2364159 | Received 21 May 2023, Accepted 04 May 2024, Published online: 01 Jul 2024

https://doi.org/10.1080/08839514.2024.2364159

Figures & data

Figure 1. Multi-head self-attention mechanism.
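Figure 1 is not reproduced in this text. For reference, here is a minimal pure-Python sketch of standard multi-head self-attention: each head applies scaled dot-product attention, softmax(QKᵀ/√d_h)V, to its own slice of the projected features, and the heads are concatenated and passed through an output projection. The weight matrices and shapes are illustrative assumptions, not MobileDepth's actual parameters.

```python
import math


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: N tokens x d features; w_q, w_k, w_v, w_o: d x d (hypothetical weights). Returns N x d."""
    d = len(x[0])
    if d % num_heads:
        raise ValueError("feature size must be divisible by num_heads")
    d_h = d // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_h, (h + 1) * d_h)  # this head's slice of the features
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        # scaled dot-product attention: softmax(Q K^T / sqrt(d_h)) V
        scores = [[s / math.sqrt(d_h) for s in row] for row in matmul(q_h, transpose(k_h))]
        heads.append(matmul(softmax_rows(scores), v_h))
    # concatenate the heads along the feature axis, then apply the output projection
    concat = [[val for head in heads for val in head[i]] for i in range(len(x))]
    return matmul(concat, w_o)
```

Each head builds an N × N score matrix, so the cost grows quadratically with the number of tokens N; this is the usual reason lightweight vision transformers restrict or sparsify attention.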

Figure 2. Overview of the proposed MobileDepth.

Table 1. Comparison of CNN-based and transformer-based models.

Figure 3. Processing of vision transformer.
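Figure 3 outlines how a vision transformer processes an image. In the standard ViT design, the image is cut into non-overlapping P × P patches, each patch is flattened and linearly projected to the embedding size, and a position embedding is added. A pure-Python sketch of that front end follows; the patch size, the projection matrix, and the position table are hypothetical inputs, since MobileDepth's own settings are not given here.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Returns (H/patch) * (W/patch) tokens in row-major order, each of length
    patch * patch * C.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            token = []
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    token.extend(image[i][j])  # append all C channels of pixel (i, j)
            tokens.append(token)
    return tokens


def embed(tokens, weight, pos):
    """Linear projection (weight: patch_dim x d) plus a position embedding (pos: N x d)."""
    return [[sum(t[k] * weight[k][j] for k in range(len(t))) + pos[n][j]
             for j in range(len(weight[0]))]
            for n, t in enumerate(tokens)]
```

The resulting N × d token sequence is what the self-attention layers operate on; halving the patch size quadruples N.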

Figure 4. Dilated self-attention block (DSAB) (top) and standard self-attention block (bottom).

Figure 5. Local and global feature extraction block (LGFE).

Figure 6. MobileNetV2 block.
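The MobileNetV2 block in Figure 6 is an inverted residual: a 1 × 1 convolution expands the channels by a factor t, a 3 × 3 depthwise convolution filters each channel separately, and a linear 1 × 1 convolution projects back down, with a skip connection only when the stride is 1 and the input and output widths match. The sketch below counts weights (biases and batch normalization ignored) to show where the saving comes from; t = 6 is the default from the original MobileNetV2 design and is assumed here, not taken from MobileDepth.

```python
def mv2_block_params(c_in, c_out, expansion=6, kernel=3, depthwise=True):
    """Weight count of a MobileNetV2-style inverted residual block (no biases or BN)."""
    hidden = c_in * expansion
    expand = c_in * hidden  # 1x1 pointwise expansion
    if depthwise:
        middle = hidden * kernel * kernel  # one k x k filter per channel
    else:
        middle = hidden * hidden * kernel * kernel  # dense k x k conv, for comparison
    project = hidden * c_out  # 1x1 linear projection (no activation afterwards)
    return expand + middle + project


def uses_skip(c_in, c_out, stride):
    """The residual connection is only added when the block preserves the tensor shape."""
    return stride == 1 and c_in == c_out


if __name__ == "__main__":
    print(mv2_block_params(64, 64))                   # 52608
    print(mv2_block_params(64, 64, depthwise=False))  # 1376256
```

With 64 input and output channels, the depthwise block needs under 4% of the weights of the same block built around a dense 3 × 3 convolution.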

Figure 7. Residual convolution unit (RCU) (left) and fusion block (right).
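Figure 7 pairs a residual convolution unit with a fusion block. The sketch below assumes the DPT-style design of Ranftl, Bochkovskiy, and Koltun (2021): the RCU computes x + conv(ReLU(conv(ReLU(x)))), and the fusion block adds an RCU-refined skip feature to the incoming decoder path, refines the sum with a second RCU, and upsamples by 2. Feature maps are single-channel lists, the kernels are hypothetical, nearest-neighbour upsampling stands in for the bilinear interpolation DPT uses, and any output projection is omitted; MobileDepth's exact layer settings may differ.

```python
def conv3x3(x, w):
    """Single-channel 3x3 convolution (cross-correlation) with zero padding."""
    h, wd = len(x), len(x[0])
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < wd:  # zero padding outside the map
                        acc += w[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def rcu(x, w1, w2):
    """Residual convolution unit: x + conv(relu(conv(relu(x))))."""
    return add(x, conv3x3(relu(conv3x3(relu(x), w1)), w2))


def upsample2x_nearest(x):
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def fusion_block(prev, skip, w):
    """prev: incoming decoder map; skip: same-size encoder map; w: four hypothetical 3x3 kernels."""
    fused = add(prev, rcu(skip, w["a1"], w["a2"]))
    fused = rcu(fused, w["b1"], w["b2"])
    return upsample2x_nearest(fused)
```

Because each RCU is residual, a unit whose convolutions output zero passes its input through unchanged, which keeps the stacked decoder easy to optimize.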

Figure 8. (a) Input RGB image. (b) Relative depth map predicted by Monodepth2. (c) Metric depth map predicted by MobileDepth.
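Figure 8 contrasts relative and metric depth. Self-supervised monocular methods such as Monodepth2 (Godard et al.) predict depth only up to an unknown scale, and their KITTI evaluations commonly rescale each prediction by the ratio of the ground-truth median to the predicted median. A metric prediction like MobileDepth's should not need this step. A minimal sketch of median scaling, assuming flat lists of depths with 0 marking pixels that have no ground truth:

```python
from statistics import median


def median_scale(pred, gt):
    """Rescale an up-to-scale depth prediction so its median matches the ground truth's."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]  # skip pixels without ground truth
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    ratio = median(g for _, g in valid) / median(p for p, _ in valid)
    return [p * ratio for p in pred], ratio
```

Reporting the ratio alongside the scaled map makes it easy to check how far a relative predictor is from metric scale on a given image.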

Figure 9. Qualitative comparison with ground truth.

Table 2. Quantitative results: comparison of our method with existing methods on KITTI.

Table 3. Quantitative results: comparison of our method with existing methods on NYU.
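Tables 2 and 3 are not reproduced here, so their exact columns are unknown. KITTI and NYU results are conventionally reported with the error and accuracy measures used by Eigen, Puhrsch, and Fergus (2014): absolute and squared relative error, RMSE, log RMSE, and threshold accuracies δ < 1.25^k. A sketch of those standard definitions over valid pixels follows. It assumes flat lists of depths and omits the cropping and depth capping that benchmark protocols add.

```python
import math


def depth_metrics(pred, gt):
    """Standard monocular-depth metrics over pixels with positive prediction and ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)
    ratios = [max(p / g, g / p) for p, g in pairs]  # >= 1; equals 1 for a perfect pixel
    metrics = {
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,   # lower is better
        "sq_rel": sum((p - g) ** 2 / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n),
    }
    for k in (1, 2, 3):  # higher is better
        metrics[f"delta{k}"] = sum(r < 1.25 ** k for r in ratios) / n
    return metrics
```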

Table 4. Comparison of MobileDepth with existing models in terms of parameter count.

Table 5. Complexity of models.

Table 6. Ablation study for components and loss function.

Figure 10. Qualitative ablation comparison. (a) Input image. (b) Ground truth. (c) Depth map predicted with the MSE loss function. (d) Depth map predicted by the full model. (e) Depth map predicted without the DSAB block. (f) Depth map predicted without the MV2 block.
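The loss used by the full model is not named in this excerpt; the ablation only shows that one variant was trained with MSE. For contrast, here is a sketch of MSE next to the scale-invariant log error of Eigen, Puhrsch, and Fergus (2014), a common alternative in monocular depth estimation. With d_i = log y_i − log y_i*, it computes (1/n) Σ d_i² − (λ/n²)(Σ d_i)². Treat it as an illustration, not as MobileDepth's loss; λ = 0.5 is the value Eigen et al. trained with.

```python
import math


def mse_loss(pred, gt):
    """Mean squared error in linear depth."""
    return sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)


def scale_invariant_log_loss(pred, gt, lam=0.5):
    """(1/n) sum d_i^2 - lam * ((1/n) sum d_i)^2 with d_i = log(pred_i) - log(gt_i)."""
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(d)
    return sum(di * di for di in d) / n - lam * (sum(d) / n) ** 2
```

With λ = 1 the second loss ignores a global scale error entirely: a prediction that is exactly twice the ground truth everywhere scores 0, while its MSE grows with the depth values.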

References

Eigen, D., C. Puhrsch, and R. Fergus. 2014. Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems 27 (NIPS 2014). http://arxiv.org/abs/1406.2283.

Zhou, T., M. Brown, N. Snavely, and D. G. Lowe. 2017. Unsupervised learning of depth and ego-motion from video. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6612–19. Honolulu, HI, USA. July 21–26, 2017.

Liu, F., C. Shen, G. Lin, and I. Reid. 2015. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:2024–39. NW Washington, DC, United States.

Poggi, M., F. Tosi, and S. Mattoccia. 2018. Learning monocular depth estimation with unsupervised trinocular assumptions. In 2018 International Conference on 3D Vision (3DV), 324–33. Verona, Italy. September 5–8, 2018.

Godard, C., O. Mac Aodha, M. Firman, and G. J. Brostow. 2018. Digging into self-supervised monocular depth estimation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 3827–37. Seoul, Korea (South). Oct 27–Nov 2, 2019.

Ranftl, R., A. Bochkovskiy, and V. Koltun. 2021. Vision transformers for dense prediction. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 12159–68. Montreal, Canada. Oct 10–17, 2021.

Bhat, S., I. Alhashim, and P. Wonka. 2020. AdaBins: Depth estimation using adaptive bins. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4008–20. Nashville, TN, USA. June 10–25, 2021.

Zhan, H., R. Garg, C. S. Weerasekera, K. Li, H. Agarwal, and I. Reid. 2018. Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 340–49. Salt Lake City, UT, USA. June 18–22, 2018.

Kundu, J. N. 2018. AdaDepth: Unsupervised content congruent adaptation for depth estimation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2656–65. Salt Lake City, UT, USA. June 18–22, 2018.

Guizilini, V. C., R. Ambrus, S. Pillai, A. Raventos, and A. Gaidon. 2019. 3D Packing for self-supervised monocular depth estimation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2482–91. Seattle, WA, USA. June 13–19, 2020.

Wang, L., J. Zhang, O. Wang, Z. Lin, and H. Lu. 2020. SDC-Depth: Semantic divide-and-conquer network for monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 541–50. Seattle, WA, USA. June 13–19, 2020.

Lyu, X., L. Liu, M. Wang, X. Kong, L. Liu, Y. Liu, X. Chen, and Y. Yuan. 2020. HR-Depth: High resolution self-supervised monocular depth estimation. arXiv preprint arXiv:2012.07356.

Bae, J.-H., S. Moon, and S. Im. 2022. MonoFormer: Towards generalization of self-supervised monocular depth estimation with transformers. arXiv preprint arXiv:2205.11083.

Xu, D., E. Ricci, W. Ouyang, X. Wang, and N. Sebe. 2017. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5354–62. Honolulu, HI, USA. July 21–26, 2017.

Fu, H., M. Gong, C. Wang, K. Batmanghelich, and D. Tao. 2018. Deep ordinal regression network for monocular depth estimation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2002–11. Salt Lake City, UT, USA. June 18–22, 2018.

Lee, J. H., M. K. Han, D. W. Ko, and I. H. Suh. 2019. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326. http://arxiv.org/abs/1907.10326.

Wang, C., J. M. Buenaposada, R. Zhu, and S. Lucey. 2017. Learning depth from monocular videos using direct methods. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022–30. Salt Lake City, UT, USA. June 18–22, 2018.

Yan, J., H. Zhao, P. Bu, and Y. Jin. 2021. Channel-wise attention-based network for self-supervised monocular depth estimation. In 2021 International Conference on 3D Vision (3DV), 464–73. December 1–3, 2021.

Zhao, C., Y. Zhang, M. Poggi, F. Tosi, X. Guo, Z. Zhu, G. Huang, Y. Tang, and S. Mattoccia. 2022. MonoViT: Self-supervised monocular depth estimation with a vision transformer. In 2022 International Conference on 3D Vision (3DV), 668–78. Prague, Czech Republic. September 12–16, 2022.
