Research Article

Accelerating AI performance with the incorporation of TVM and MediaTek NeuroPilot

Article: 2272586 | Received 01 Jan 2023, Accepted 13 Oct 2023, Published online: 30 Oct 2023

References

  • Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., … Yu, Y. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283). USA: USENIX Association.
  • Bai, J., Lu, F., & Zhang, K. (2019). ONNX: Open Neural Network Exchange. GitHub. https://github.com/onnx/onnx.
  • Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., & Zhang, Z. (2015). MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.
  • Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., Wang, L., Hu, Y., Ceze, L., Guestrin, C., & Krishnamurthy, A. (2018). TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (pp. 578–594). USA: USENIX Association.
  • Chen, Y. H., Krishna, T., Emer, J. S., & Sze, V. (2016). Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127–138. https://doi.org/10.1109/JSSC.2016.2616357.
  • George, A., & Marcel, S. (2019). Deep pixel-wise binary supervision for face presentation attack detection. In International Conference on Biometrics (ICB). Crete, Greece.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139–144. https://doi.org/10.1145/3422622.
  • Gulli, A., & Pal, S. (2017). Deep learning with Keras. Packt Publishing Ltd.
  • Guo, J., He, H., He, T., Lausen, L., Li, M., Lin, H., Shi, X., Wang, C., Xie, J., Zha, S., Zhang, A., Zhang, H., Zhang, Z., Zhang, Z., Zheng, S., & Zhu, Y. (2020). GluonCV and GluonNLP: Deep learning in computer vision and natural language processing. Journal of Machine Learning Research, 21(23), 1–7.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
  • Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., & Keutzer, K. (2014). DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90.
  • Lai, M. Y., Sung, C. Y., Lee, J. K., & Hung, M. Y. (2020). Enabling Android NNAPI flow for TVM runtime. In 49th International Conference on Parallel Processing (ICPP): Workshops (pp. 1–8). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3409390.3409393.
  • Liao, H. H., Lee, C. L., Lee, J. K., Lai, W. C., Hung, M. Y., & Huang, C. W. (2021). Support convolution of CNN with compression sparse matrix multiplication flow in TVM. In 50th International Conference on Parallel Processing Workshop (pp. 1–7). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3458744.3473352.
  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.
  • PyTorch (2021). Writing custom C++/CUDA extensions for PyTorch. Retrieved from https://pytorch.org/tutorials/advanced/cpp_extension.html#binding-to-python.
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation (Tech. Rep.). University of California, San Diego, Institute for Cognitive Science.
  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. arXiv preprint arXiv:1801.04381.
  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. S. (2017). Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12), 2295–2329. https://doi.org/10.1109/JPROC.2017.2761740.
  • Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11231.
  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the Inception architecture for computer vision. arXiv preprint arXiv:1512.00567.
  • TensorFlow Lite: ML for mobile and edge devices (n.d.). Retrieved from https://www.tensorflow.org/lite.
  • tflite-neuron-delegate (n.d.). Retrieved from https://github.com/MediaTek-NeuroPilot/tflite-neuron-delegate.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems. arXiv preprint arXiv:1706.03762.
  • Wang, S. C., Kan, L. C., Lee, C. L., Hwang, Y. S., & Lee, J. K. (2017). Architecture and compiler support for GPUs using energy-efficient affine register files. ACM Transactions on Design Automation of Electronic Systems (TODAES), 23(2), 1–25. https://doi.org/10.1145/3133218.
  • Yang, C. C., Chen, Y. R., Liao, H. H., Chang, Y. M., & Lee, J. K. (2023). Auto-tuning fixed-point precision with TVM on RISC-V Packed SIMD extension. ACM Transactions on Design Automation of Electronic Systems, 28, 21. https://doi.org/10.1145/3569939.
  • Zheng, L., Jia, C., Sun, M., Wu, Z., Yu, C. H., Haj-Ali, A., Wang, Y., Yang, J., Zhuo, D., Sen, K., Gonzalez, J. E., & Stoica, I. (2020). Ansor: Generating high-performance tensor programs for deep learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) (pp. 863–879). USENIX Association.
  • Zmora, N., Jacob, G., Zlotnik, L., Elharar, B., & Novik, G. (2019). Neural Network Distiller: A Python package for DNN compression research. arXiv preprint arXiv:1910.12232.
  • Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2017). Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012.