Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution

Pages 990-1004 | Received 12 Aug 2021, Accepted 23 Dec 2021, Published online: 17 Jan 2022
Abstract

With the development of deep learning, neural networks are widely used in various fields, and the improved model performance comes with a considerable number of parameters and computations. Model quantisation is a technique that converts floating-point computation into low-bit fixed-point computation; it can effectively reduce computational intensity, parameter size, and memory consumption, but often brings a considerable loss of accuracy. This paper mainly addresses the problem of overly concentrated parameter distributions during quantisation-aware training (QAT). In the QAT process, we use a piecewise function to gather statistics on the parameter distribution and, based on the statistical results, simulate the effect of quantisation noise in each round of training. Experimental results show that quantising the Transformer network with our scheme loses little precision while significantly reducing the storage cost of the model; compared with a full-precision LSTM network, our model achieves higher accuracy at a similar storage cost. Meanwhile, compared with other quantisation methods on the language modelling task, our approach is more accurate. We validated the effectiveness of our policy on the WikiText-103 and Penn Treebank datasets. The experiments show that our method drastically compresses storage cost while maintaining high model performance.
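The abstract describes injecting distribution-aware quantisation noise into the weights during each training round. The sketch below is a minimal illustration of that general idea in PyTorch, not the paper's actual method: the piecewise split at a fixed threshold, the per-region statistics, and the noise scale derived from a uniform quantiser step are all assumptions introduced here for illustration.

```python
import torch

def piecewise_stats(w: torch.Tensor, threshold: float = 0.1):
    """Split weights into a dense near-zero region and a sparse tail,
    returning the standard deviation of each region.
    (Hypothetical piecewise statistic; the paper's exact function is not given here.)"""
    near = w[w.abs() <= threshold]
    tail = w[w.abs() > threshold]
    return near.std(unbiased=False), tail.std(unbiased=False)

def add_quantisation_noise(w: torch.Tensor, n_bits: int = 2, threshold: float = 0.1):
    """Add uniform noise whose magnitude mimics the rounding error of a low-bit
    quantiser in each region, so full-precision weights experience the effect of
    quantisation during training (QAT-style noise simulation)."""
    noisy = w.clone()
    near_std, tail_std = piecewise_stats(w, threshold)
    regions = [(w.abs() <= threshold, near_std), (w.abs() > threshold, tail_std)]
    for mask, std in regions:
        if mask.sum() == 0 or torch.isnan(std):
            continue
        # Step size of a uniform quantiser covering roughly +/- 3 std of this region.
        step = (6.0 * std) / (2 ** n_bits - 1)
        # Uniform rounding noise in [-step/2, step/2].
        noisy[mask] = noisy[mask] + (torch.rand_like(noisy[mask]) - 0.5) * step
    return noisy

# Usage sketch: perturb a layer's weights before the forward pass of each training step.
layer = torch.nn.Linear(512, 512)
noisy_weight = add_quantisation_noise(layer.weight.data, n_bits=2)
```

In an actual QAT loop, the noisy weights would be used in the forward pass while gradients update the underlying full-precision parameters (a straight-through style arrangement); this sketch only shows the noise-injection step.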

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work is supported by the National Natural Science Foundation of China [grant number 61901436].