Research Article

Accelerating AI performance with the incorporation of TVM and MediaTek NeuroPilot

Article: 2272586 | Received 01 Jan 2023, Accepted 13 Oct 2023, Published online: 30 Oct 2023

References

  • Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., … Yu, Y. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283). USA: USENIX Association.
  • Bai, J., Lu, F., & Zhang, K. (2019). ONNX: Open Neural Network Exchange. GitHub. https://github.com/onnx/onnx.
  • Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., & Zhang, Z. (2015). MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.
  • Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., Wang, L., Hu, Y., Ceze, L., Guestrin, C., & Krishnamurthy, A. (2018). TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (pp. 578–594). USA: USENIX Association.
  • Chen, Y. H., Krishna, T., Emer, J. S., & Sze, V. (2016). Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127–138. https://doi.org/10.1109/JSSC.2016.2616357.
  • George, A., & Marcel, S. (2019). Deep pixel-wise binary supervision for face presentation attack detection. In International Conference on Biometrics (ICB). Crete, Greece.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139–144. https://doi.org/10.1145/3422622.
  • Gulli, A., & Pal, S. (2017). Deep learning with Keras. Packt Publishing Ltd.
  • Guo, J., He, H., He, T., Lausen, L., Li, M., Lin, H., Shi, X., Wang, C., Xie, J., Zha, S., Zhang, A., Zhang, H., Zhang, Z., Zhang, Z., Zheng, S., & Zhu, Y. (2020). GluonCV and GluonNLP: Deep learning in computer vision and natural language processing. Journal of Machine Learning Research, 21(23), 1–7.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
  • Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., & Keutzer, K. (2014). DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv preprint arXiv:1404.1869.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90.
  • Lai, M. Y., Sung, C. Y., Lee, J. K., & Hung, M. Y. (2020). Enabling Android NNAPI flow for TVM runtime. In 49th International Conference on Parallel Processing (ICPP): Workshops (pp. 1–8). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3409390.3409393.
  • Liao, H. H., Lee, C. L., Lee, J. K., Lai, W. C., Hung, M. Y., & Huang, C. W. (2021). Support convolution of CNN with compression sparse matrix multiplication flow in TVM. In 50th International Conference on Parallel Processing Workshop (pp. 1–7). New York, NY: Association for Computing Machinery. https://doi.org/10.1145/3458744.3473352.
  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703.
  • PyTorch (2021). Writing custom C++/CUDA extensions for PyTorch. Retrieved from https://pytorch.org/tutorials/advanced/cpp_extension.html#binding-to-python.
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation (Tech. Rep.). University of California, San Diego, Institute for Cognitive Science.
  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. arXiv preprint arXiv:1801.04381.
  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. S. (2017). Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE, 105(12), 2295–2329. https://doi.org/10.1109/JPROC.2017.2761740.
  • Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11231.
  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the Inception architecture for computer vision. arXiv preprint arXiv:1512.00567.
  • TensorFlow Lite: ML for mobile and edge devices (n.d.). Retrieved from https://www.tensorflow.org/lite.
  • tflite-neuron-delegate (n.d.). Retrieved from https://github.com/MediaTek-NeuroPilot/tflite-neuron-delegate.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems. arXiv preprint arXiv:1706.03762.
  • Wang, S. C., Kan, L. C., Lee, C. L., Hwang, Y. S., & Lee, J. K. (2017). Architecture and compiler support for GPUs using energy-efficient affine register files. ACM Transactions on Design Automation of Electronic Systems (TODAES), 23(2), 1–25. https://doi.org/10.1145/3133218.
  • Yang, C. C., Chen, Y. R., Liao, H. H., Chang, Y. M., & Lee, J. K. (2023). Auto-tuning fixed-point precision with TVM on RISC-V Packed SIMD extension. ACM Transactions on Design Automation of Electronic Systems, 28, 21. https://doi.org/10.1145/3569939.
  • Zheng, L., Jia, C., Sun, M., Wu, Z., Yu, C. H., Haj-Ali, A., Wang, Y., Yang, J., Zhuo, D., Sen, K., Gonzalez, J. E., & Stoica, I. (2020). Ansor: Generating high-performance tensor programs for deep learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) (pp. 863–879). USENIX Association.
  • Zmora, N., Jacob, G., Zlotnik, L., Elharar, B., & Novik, G. (2019). Neural Network Distiller: A Python package for DNN compression research. arXiv preprint arXiv:1910.12232.
  • Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2017). Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012.