
Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution

Pages 990-1004 | Received 12 Aug 2021, Accepted 23 Dec 2021, Published online: 17 Jan 2022

References

  • Bai, X., Wang, X., Liu, X., Liu, Q., Song, J., Sebe, N., & Kim, B. (2021). Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments. Pattern Recognition, 120, 108102. https://doi.org/10.1016/j.patcog.2021.108102
  • Bai, X., Yan, C., Yang, H., Bai, L., Zhou, J., & Hancock, E. R. (2018). Adaptive hash retrieval with kernel based similarity. Pattern Recognition, 75(9), 136–148. https://doi.org/10.1016/j.patcog.2017.03.020
  • Bhandare, A., Sripathi, V., Karkada, D., Menon, V., Choi, S., Datta, K., & Saletore, V. (2019). Efficient 8-bit quantization of transformer neural machine language translation model. CoRR abs/1906.00532. http://arxiv.org/abs/1906.00532
  • Bie, A., Venkitesh, B., Monteiro, J., Haidar, M. A., & Rezagholizadeh, M. (2019). Fully quantizing a simplified transformer for end-to-end speech recognition. CoRR abs/1911.03604. http://arxiv.org/abs/1911.03604
  • Boo, Y., & Sung, W. (2020, May 4–8). Fixed-point optimization of transformer neural network. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020 (pp. 1753–1757). IEEE. https://doi.org/10.1109/ICASSP40776.2020.9054724
  • Bradbury, J., Merity, S., Xiong, C., & Socher, R. (2016). Quasi-recurrent neural networks. CoRR abs/1611.01576. https://arxiv.org/abs/1611.01576
  • Brock, A., De, S., Smith, S. L., & Simonyan, K. (2021, July 18–24). High-performance large-scale image recognition without normalization. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event (Vol. 139, pp. 1059–1071). PMLR. http://proceedings.mlr.press/v139/brock21a.html
  • Carreira-Perpiñán, M. Á., & Idelbayev, Y. (2017). Model compression as constrained optimization, with application to neural nets. Part II: Quantization. CoRR abs/1707.04319. http://arxiv.org/abs/1707.04319
  • Chung, I., Kim, B., Choi, Y., Kwon, S. J., Jeon, Y., Park, B., Kim, S., & Lee, D. (2020, November 16–20). Extremely low bit transformer quantization for on-device neural machine translation. In T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event (Vol. EMNLP 2020, pp. 4812–4826). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.433
  • Défossez, A., Adi, Y., & Synnaeve, G. (2021). Differentiable model compression via pseudo quantization noise. CoRR abs/2104.09987. https://arxiv.org/abs/2104.09987
  • Dong, X., & Yang, Y. (2019, December 8–14). Network pruning via transformable architecture search. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, & R. Garnett (Eds.), Advances in neural information processing systems 32: Annual conference on Neural Information Processing Systems, Vancouver, BC, Canada (pp. 759–770). Curran Associates, Inc.
  • Ge, T., He, K., Ke, Q., & Sun, J. (2013). Optimized product quantization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(4), 744–755. https://doi.org/10.1109/TPAMI.2013.240
  • Grachev, A. M., Ignatov, D. I., & Savchenko, A. V. (2019). Compression of recurrent neural networks for efficient language modeling. Applied Soft Computing, 79(8), 354–362. https://doi.org/10.1016/j.asoc.2019.03.057
  • Han, S., Mao, H., & Dally, W. J. (2016, May 2–4). Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding. In Y. Bengio & Y. LeCun (Eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, Conference Track Proceedings. http://arxiv.org/abs/1510.00149
  • Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A. G., Adam, H., & Kalenichenko, D. (2018, June 18–22). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018 (pp. 2704–2713). IEEE Computer Society.
  • Jégou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128. https://doi.org/10.1109/TPAMI.2010.57
  • Karayiannis, N. B. (1999). An axiomatic approach to soft learning vector quantization and clustering. IEEE Transactions on Neural Networks, 10(5), 1153–1165. https://doi.org/10.1109/72.788654
  • Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L., & Lewis, M. (2020, April 26–30). Generalization through memorization: Nearest neighbor language models. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=HklBjCEKvH
  • Krause, B., Kahembwe, E., Murray, I., & Renals, S. (2019). Dynamic evaluation of transformer language models. CoRR abs/1904.08378. http://arxiv.org/abs/1904.08378
  • Lee, J. H., Ha, S., Choi, S., Lee, W., & Lee, S. (2018). Quantization for rapid deployment of deep neural networks. CoRR abs/1810.05488. http://arxiv.org/abs/1810.05488
  • Li, Y., Dong, X., & Wang, W. (2020, April 26–30). Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=BkgXT24tDS
  • Marcus, M., Kim, G., Marcinkiewicz, M. A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., & Schasberger, B. (1994, March 8–11). The Penn treebank: Annotating predicate argument structure. In Human Language Technology: Proceedings of a Workshop, Plainsboro. Morgan Kaufmann.
  • Melis, G., Kočiský, T., & Blunsom, P. (2020, April 26–30). Mogrifier LSTM. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=SJe5P6EYvS
  • Miyashita, D., Lee, E. H., & Murmann, B. (2016). Convolutional neural networks using logarithmic data representation. CoRR abs/1603.01025. http://arxiv.org/abs/1603.01025
  • Nagel, M., van Baalen, M., Blankevoort, T., & Welling, M. (2019). Data-free quantization through weight equalization and bias correction. CoRR abs/1906.04721. http://arxiv.org/abs/1906.04721
  • Ning, X., Duan, P., Li, W., Shi, Y., & Li, S. (2020). A CPU real-time face alignment for mobile platform. IEEE Access, 8, 8834–8843.
  • Ning, X., Duan, P., Li, W., & Zhang, S. (2020). Real-time 3D face alignment using an encoder-decoder network with an efficient deconvolution layer. IEEE Signal Processing Letters, 27, 1944–1948.
  • Ning, X., Gong, K., Li, W., & Zhang, L. (2021). JWSAA: Joint weak saliency and attention aware for person re-identification. Neurocomputing, 453(9), 801–811. https://doi.org/10.1016/j.neucom.2020.05.106
  • Ning, X., Gong, K., Li, W., Zhang, L., Bai, X., & Tian, S. (2021). Feature refinement and filter network for person re-identification. IEEE Transactions on Circuits and Systems for Video Technology, 31(9), 3391–3402. https://doi.org/10.1109/TCSVT.2020.3043026
  • Ning, X., Nan, F., Xu, S., Yu, L., & Zhang, L. (2020). Multi-view frontal face image generation: A survey. Concurrency and Computation: Practice and Experience, e6147. https://doi.org/10.1002/cpe.6147
  • Norouzi, M., & Fleet, D. J. (2013). Cartesian k-means. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3017–3024). IEEE Computer Society.
  • Ott, M., Edunov, S., Baevski, A., Fan, A., Gross, S., Ng, N., Grangier, D., & Auli, M. (2019). fairseq: A fast, extensible toolkit for sequence modeling. In W. Ammar, A. Louis, & N. Mostafazadeh (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Demonstrations (pp. 48–53). Association for Computational Linguistics. https://doi.org/10.18653/v1/n19-4009
  • Prato, G., Charlaix, E., & Rezagholizadeh, M. (2020, November 16–20). Fully quantized transformer for machine translation. In T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event (Vol. EMNLP 2020, pp. 1–14). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.1
  • Qi, S., Ning, X., Yang, G., Zhang, L., Long, P., Cai, W., & Li, W. (2021). Review of multi-view 3D object recognition methods based on deep learning. Displays, 69, 102053. https://doi.org/10.1016/j.displa.2021.102053
  • Riquelme, C., Puigcerver, J., Mustafa, B., Neumann, M., Jenatton, R., Pinto, A. S., Keysers, D., & Houlsby, N. (2021). Scaling vision with sparse mixture of experts. CoRR abs/2106.05974. https://arxiv.org/abs/2106.05974
  • Santoro, A., Faulkner, R., Raposo, D., Rae, J. W., Chrzanowski, M., Weber, T., Wierstra, D., Vinyals, O., Pascanu, R., & Lillicrap, T. P. (2018). Relational recurrent neural networks. CoRR abs/1806.01822. http://arxiv.org/abs/1806.01822
  • Simonyan, K., & Zisserman, A. (2015, May 7–9). Very deep convolutional networks for large-scale image recognition. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings. http://arxiv.org/abs/1409.1556
  • Stock, P., Fan, A., Graham, B., Grave, E., Gribonval, R., Jégou, H., & Joulin, A. (2021, May 3–7). Training with quantization noise for extreme model compression. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event. OpenReview.net. https://openreview.net/forum?id=dV19Yyi1fS3
  • Stock, P., Joulin, A., Gribonval, R., Graham, B., & Jégou, H. (2020, April 26–30). And the bit goes down: Revisiting the quantization of neural networks. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=rJehVyrKwH
  • Takase, S., Suzuki, J., & Nagata, M. (2019). Character n-gram embeddings to improve RNN language models. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 5074–5082). AAAI Press.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008). Curran Associates, Inc.
  • Wang, C., Bai, X., Wang, X., Liu, X., Zhou, J., Wu, X., Li, H., & Tao, D. (2020). Self-supervised multiscale adversarial regression network for stereo disparity estimation. IEEE Transactions on Cybernetics, 51(10), 4770–4783. https://doi.org/10.1109/TCYB.2020.2999492
  • Wang, C., Li, M., & Smola, A. J. (2019). Language models with transformers. CoRR abs/1904.09408. https://arxiv.org/abs/1904.09408
  • Wang, D., Gong, C., & Liu, Q. (2019, June 9–15). Improving neural language modeling via adversarial training. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA (Vol. 97, pp. 6555–6565). PMLR. http://proceedings.mlr.press/v97/wang19f.html
  • Wang, G., Li, W., Zhang, L., Sun, L., Chen, P., Yu, L., & Ning, X. (2021). Encoder-X: Solving unknown coefficients automatically in polynomial fitting by using an autoencoder. IEEE Transactions on Neural Networks and Learning Systems. Advance online publication. https://doi.org/10.1109/TNNLS.2021.3051430
  • Wang, X., Wang, C., Liu, B., Zhou, X., Zhang, L., Zheng, J., & Bai, X. (2021). Multi-view stereo in the deep learning era: A comprehensive review. Displays, 70, 102102. https://doi.org/10.1016/j.displa.2021.102102
  • Wang, Z., Wohlwend, J., & Lei, T. (2020, November 16–20). Structured pruning of large language models. In B. Webber, T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online (pp. 6151–6162). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.496
  • Williams, J. (1993). Narrow-band analyzer [Unpublished doctoral dissertation]. Dept. Elect. Eng., Harvard Univ.
  • Xu, S., Chang, C. C., & Liu, Y. (2021). A novel image compression technology based on vector quantisation and linear regression prediction. Connection Science, 33(2), 219–236. https://doi.org/10.1080/09540091.2020.1806206
  • Yan, C., Pang, G., Bai, X., Liu, C., Xin, N., Gu, L., & Zhou, J. (2021). Beyond triplet loss: Person re-identification with fine-grained difference-aware pairwise loss. IEEE Transactions on Multimedia. Advance online publication. https://arxiv.org/pdf/2009.10295.pdf
  • Zhai, X., Kolesnikov, A., Houlsby, N., & Beyer, L. (2021). Scaling vision transformers. CoRR abs/2106.04560. https://arxiv.org/abs/2106.04560
  • Zhang, D., Yang, J., Ye, D., & Hua, G. (2018, September 8–14). LQ-Nets: Learned quantization for highly accurate and compact deep neural networks. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer vision – ECCV 2018 – 15th European Conference, Proceedings, Part VIII (Vol. 11212, pp. 373–390). Springer. https://doi.org/10.1007/978-3-030-01237-3_23
  • Zhang, W., Hou, L., Yin, Y., Shang, L., Chen, X., Jiang, X., & Liu, Q. (2020, November 16–20). TernaryBERT: Distillation-aware ultra-low bit BERT. In B. Webber, T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online (pp. 509–521). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.37
  • Zhou, L., Bai, X., Liu, X., Zhou, J., & Hancock, E. R. (2020). Learning binary code for fast nearest subspace search. Pattern Recognition, 98(1), 107040. https://doi.org/10.1016/j.patcog.2019.107040
  • Zhu, M., Han, K., Tang, Y., & Wang, Y. (2021). Visual transformer pruning. CoRR abs/2104.08500. https://arxiv.org/abs/2104.08500