
Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution

Pages 990-1004 | Received 12 Aug 2021, Accepted 23 Dec 2021, Published online: 17 Jan 2022

References

  • Bai, X., Wang, X., Liu, X., Liu, Q., Song, J., Sebe, N., & Kim, B. (2021). Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments. Pattern Recognition, 120, 108102. https://doi.org/10.1016/j.patcog.2021.108102
  • Bai, X., Yan, C., Yang, H., Bai, L., Zhou, J., & Hancock, E. R. (2018). Adaptive hash retrieval with kernel based similarity. Pattern Recognition, 75(9), 136–148. https://doi.org/10.1016/j.patcog.2017.03.020
  • Bhandare, A., Sripathi, V., Karkada, D., Menon, V., Choi, S., Datta, K., & Saletore, V. (2019). Efficient 8-bit quantization of transformer neural machine language translation model. CoRR abs/1906.00532. http://arxiv.org/abs/1906.00532
  • Bie, A., Venkitesh, B., Monteiro, J., Haidar, M. A., & Rezagholizadeh, M. (2019). Fully quantizing a simplified transformer for end-to-end speech recognition. CoRR abs/1911.03604. http://arxiv.org/abs/1911.03604
  • Boo, Y., & Sung, W. (2020, May 4–8). Fixed-point optimization of transformer neural network. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020 (pp. 1753–1757). IEEE. https://doi.org/10.1109/ICASSP40776.2020.9054724
  • Bradbury, J., Merity, S., Xiong, C., & Socher, R. (2016). Quasi-recurrent neural networks. CoRR abs/1611.01576. https://arxiv.org/abs/1611.01576
  • Brock, A., De, S., Smith, S. L., & Simonyan, K. (2021, July 18–24). High-performance large-scale image recognition without normalization. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event (Vol. 139, pp. 1059–1071). PMLR. http://proceedings.mlr.press/v139/brock21a.html
  • Carreira-Perpiñán, M. Á., & Idelbayev, Y. (2017). Model compression as constrained optimization, with application to neural nets. Part II: Quantization. CoRR abs/1707.04319. http://arxiv.org/abs/1707.04319
  • Chung, I., Kim, B., Choi, Y., Kwon, S. J., Jeon, Y., Park, B., Kim, S., & Lee, D. (2020, November 16–20). Extremely low bit transformer quantization for on-device neural machine translation. In T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event (Vol. EMNLP 2020, pp. 4812–4826). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.433
  • Défossez, A., Adi, Y., & Synnaeve, G. (2021). Differentiable model compression via pseudo quantization noise. CoRR abs/2104.09987. https://arxiv.org/abs/2104.09987
  • Dong, X., & Yang, Y. (2019, December 8–14). Network pruning via transformable architecture search. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, & R. Garnett (Eds.), Advances in neural information processing systems 32: Annual conference on Neural Information Processing Systems, Vancouver, BC, Canada (pp. 759–770). Curran Associates, Inc.
  • Ge, T., He, K., Ke, Q., & Sun, J. (2013). Optimized product quantization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(4), 744–755. https://doi.org/10.1109/TPAMI.2013.240
  • Grachev, A. M., Ignatov, D. I., & Savchenko, A. V. (2019). Compression of recurrent neural networks for efficient language modeling. Applied Soft Computing, 79(8), 354–362. https://doi.org/10.1016/j.asoc.2019.03.057
  • Han, S., Mao, H., & Dally, W. J. (2016, May 2–4). Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding. In Y. Bengio & Y. LeCun (Eds.), 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, Conference Track Proceedings. http://arxiv.org/abs/1510.00149
  • Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A. G., Adam, H., & Kalenichenko, D. (2018, June 18–22). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018 (pp. 2704–2713). IEEE Computer Society.
  • Jégou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128. https://doi.org/10.1109/TPAMI.2010.57
  • Karayiannis, N. B. (1999). An axiomatic approach to soft learning vector quantization and clustering. IEEE Transactions on Neural Networks, 10(5), 1153–1165. https://doi.org/10.1109/72.788654
  • Khandelwal, U., Levy, O., Jurafsky, D., Zettlemoyer, L., & Lewis, M. (2020, April 26–30). Generalization through memorization: Nearest neighbor language models. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=HklBjCEKvH
  • Krause, B., Kahembwe, E., Murray, I., & Renals, S. (2019). Dynamic evaluation of transformer language models. CoRR abs/1904.08378. http://arxiv.org/abs/1904.08378
  • Lee, J. H., Ha, S., Choi, S., Lee, W., & Lee, S. (2018). Quantization for rapid deployment of deep neural networks. CoRR abs/1810.05488. http://arxiv.org/abs/1810.05488
  • Li, Y., Dong, X., & Wang, W. (2020, April 26–30). Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=BkgXT24tDS
  • Marcus, M., Kim, G., Marcinkiewicz, M. A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., & Schasberger, B. (1994, March 8–11). The Penn treebank: Annotating predicate argument structure. In Human Language Technology: Proceedings of a Workshop, Plainsboro. Morgan Kaufmann.
  • Melis, G., Kočiský, T., & Blunsom, P. (2020, April 26–30). Mogrifier LSTM. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=SJe5P6EYvS
  • Miyashita, D., Lee, E. H., & Murmann, B. (2016). Convolutional neural networks using logarithmic data representation. CoRR abs/1603.01025. http://arxiv.org/abs/1603.01025
  • Nagel, M., van Baalen, M., Blankevoort, T., & Welling, M. (2019). Data-free quantization through weight equalization and bias correction. CoRR abs/1906.04721. http://arxiv.org/abs/1906.04721
  • Ning, X., Duan, P., Li, W., Shi, Y., & Li, S. (2020). A CPU real-time face alignment for mobile platform. IEEE Access, 8, 8834–8843.
  • Ning, X., Duan, P., Li, W., & Zhang, S. (2020). Real-time 3D face alignment using an encoder-decoder network with an efficient deconvolution layer. IEEE Signal Processing Letters, 27, 1944–1948.
  • Ning, X., Gong, K., Li, W., & Zhang, L. (2021). JWSAA: Joint weak saliency and attention aware for person re-identification. Neurocomputing, 453(9), 801–811. https://doi.org/10.1016/j.neucom.2020.05.106
  • Ning, X., Gong, K., Li, W., Zhang, L., Bai, X., & Tian, S. (2021). Feature refinement and filter network for person re-identification. IEEE Transactions on Circuits and Systems for Video Technology, 31(9), 3391–3402. https://doi.org/10.1109/TCSVT.2020.3043026
  • Ning, X., Nan, F., Xu, S., Yu, L., & Zhang, L. (2020). Multi-view frontal face image generation: A survey. Concurrency and Computation: Practice and Experience, e6147. https://doi.org/10.1002/cpe.6147
  • Norouzi, M., & Fleet, D. J. (2013). Cartesian k-means. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3017–3024). IEEE Computer Society.
  • Ott, M., Edunov, S., Baevski, A., Fan, A., Gross, S., Ng, N., Grangier, D., & Auli, M. (2019). fairseq: A fast, extensible toolkit for sequence modeling. In W. Ammar, A. Louis, & N. Mostafazadeh (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Demonstrations (pp. 48–53). Association for Computational Linguistics. https://doi.org/10.18653/v1/n19-4009
  • Prato, G., Charlaix, E., & Rezagholizadeh, M. (2020, November 16–20). Fully quantized transformer for machine translation. In T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event (Vol. EMNLP 2020, pp. 1–14). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.1
  • Qi, S., Ning, X., Yang, G., Zhang, L., Long, P., Cai, W., & Li, W. (2021). Review of multi-view 3D object recognition methods based on deep learning. Displays, 69, 102053. https://doi.org/10.1016/j.displa.2021.102053
  • Riquelme, C., Puigcerver, J., Mustafa, B., Neumann, M., Jenatton, R., Pinto, A. S., Keysers, D., & Houlsby, N. (2021). Scaling vision with sparse mixture of experts. CoRR abs/2106.05974. https://arxiv.org/abs/2106.05974
  • Santoro, A., Faulkner, R., Raposo, D., Rae, J. W., Chrzanowski, M., Weber, T., Wierstra, D., Vinyals, O., Pascanu, R., & Lillicrap, T. P. (2018). Relational recurrent neural networks. CoRR abs/1806.01822. http://arxiv.org/abs/1806.01822
  • Simonyan, K., & Zisserman, A. (2015, May 7–9). Very deep convolutional networks for large-scale image recognition. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings. http://arxiv.org/abs/1409.1556
  • Stock, P., Fan, A., Graham, B., Grave, E., Gribonval, R., Jégou, H., & Joulin, A. (2021, May 3–7). Training with quantization noise for extreme model compression. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event. OpenReview.net. https://openreview.net/forum?id=dV19Yyi1fS3
  • Stock, P., Joulin, A., Gribonval, R., Graham, B., & Jégou, H. (2020, April 26–30). And the bit goes down: Revisiting the quantization of neural networks. In 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net. https://openreview.net/forum?id=rJehVyrKwH
  • Takase, S., Suzuki, J., & Nagata, M. (2019). Character n-gram embeddings to improve RNN language models. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 5074–5082). AAAI Press.
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008). Curran Associates, Inc.
  • Wang, C., Bai, X., Wang, X., Liu, X., Zhou, J., Wu, X., Li, H., & Tao, D. (2020). Self-supervised multiscale adversarial regression network for stereo disparity estimation. IEEE Transactions on Cybernetics, 51(10), 4770–4783. https://doi.org/10.1109/TCYB.2020.2999492
  • Wang, C., Li, M., & Smola, A. J. (2019). Language models with transformers. CoRR abs/1904.09408. https://arxiv.org/abs/1904.09408
  • Wang, D., Gong, C., & Liu, Q. (2019, June 9–15). Improving neural language modeling via adversarial training. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA (Vol. 97, pp. 6555–6565). PMLR. http://proceedings.mlr.press/v97/wang19f.html
  • Wang, G., Li, W., Zhang, L., Sun, L., Chen, P., Yu, L., & Ning, X. (2021). Encoder-X: Solving unknown coefficients automatically in polynomial fitting by using an autoencoder. IEEE Transactions on Neural Networks and Learning Systems. Advance online publication. https://doi.org/10.1109/TNNLS.2021.3051430
  • Wang, X., Wang, C., Liu, B., Zhou, X., Zhang, L., Zheng, J., & Bai, X. (2021). Multi-view stereo in the deep learning era: A comprehensive review. Displays, 70, 102102. https://doi.org/10.1016/j.displa.2021.102102
  • Wang, Z., Wohlwend, J., & Lei, T. (2020, November 16–20). Structured pruning of large language models. In B. Webber, T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online (pp. 6151–6162). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.496
  • Williams, J. (1993). Narrow-band analyzer [Unpublished doctoral dissertation]. Dept. Elect. Eng., Harvard Univ.
  • Xu, S., Chang, C. C., & Liu, Y. (2021). A novel image compression technology based on vector quantisation and linear regression prediction. Connection Science, 33(2), 219–236. https://doi.org/10.1080/09540091.2020.1806206
  • Yan, C., Pang, G., Bai, X., Liu, C., Xin, N., Gu, L., & Zhou, J. (2021). Beyond triplet loss: Person re-identification with fine-grained difference-aware pairwise loss. IEEE Transactions on Multimedia. Advance online publication. https://arxiv.org/pdf/2009.10295.pdf
  • Zhai, X., Kolesnikov, A., Houlsby, N., & Beyer, L. (2021). Scaling vision transformers. CoRR abs/2106.04560. https://arxiv.org/abs/2106.04560
  • Zhang, D., Yang, J., Ye, D., & Hua, G. (2018, September 8–14). LQ-Nets: Learned quantization for highly accurate and compact deep neural networks. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer vision – ECCV 2018 – 15th European Conference, Proceedings, Part VIII (Vol. 11212, pp. 373–390). Springer. https://doi.org/10.1007/978-3-030-01237-3_23
  • Zhang, W., Hou, L., Yin, Y., Shang, L., Chen, X., Jiang, X., & Liu, Q. (2020, November 16–20). TernaryBERT: Distillation-aware ultra-low bit BERT. In B. Webber, T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online (pp. 509–521). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.37
  • Zhou, L., Bai, X., Liu, X., Zhou, J., & Hancock, E. R. (2020). Learning binary code for fast nearest subspace search. Pattern Recognition, 98(1), 107040. https://doi.org/10.1016/j.patcog.2019.107040
  • Zhu, M., Han, K., Tang, Y., & Wang, Y. (2021). Visual transformer pruning. CoRR abs/2104.08500. https://arxiv.org/abs/2104.08500