Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution

Pages 990-1004 | Received 12 Aug 2021, Accepted 23 Dec 2021, Published online: 17 Jan 2022
Abstract

With the development of deep learning, neural networks are widely used in various fields, and the improved model performance comes with a considerable number of parameters and computations. Model quantisation is a technique that converts floating-point computation into low-bit fixed-point computation; it can effectively reduce computational intensity, parameter size, and memory consumption, but often brings a considerable loss of accuracy. This paper mainly addresses the problem of overly concentrated parameter distributions during quantisation-aware training (QAT). In the QAT process, we use a piecewise function to gather statistics on the parameter distribution and, based on the statistical results, simulate the effect of quantisation noise in each round of training. Experimental results show that quantising the Transformer network with our scheme loses little precision while significantly reducing the storage cost of the model; compared with a full-precision LSTM network, our model achieves higher accuracy at a similar storage cost. Meanwhile, compared with other quantisation methods on the language modelling task, our approach is more accurate. We validated the effectiveness of our policy on the WikiText-103 and Penn Treebank datasets. The experiments show that our method drastically compresses storage cost while maintaining high model performance.
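The abstract describes injecting distribution-aware quantisation noise into the weights during each training round. The sketch below is a minimal illustration of that general idea in PyTorch, not the paper's actual method: the piecewise split at a fixed threshold, the per-region statistics, and the noise scale derived from a uniform quantiser step are all assumptions introduced here for illustration.

```python
import torch

def piecewise_stats(w: torch.Tensor, threshold: float = 0.1):
    """Split weights into a dense near-zero region and a sparse tail,
    returning the standard deviation of each region.
    (Hypothetical piecewise statistic; the paper's exact function is not given here.)"""
    near = w[w.abs() <= threshold]
    tail = w[w.abs() > threshold]
    return near.std(unbiased=False), tail.std(unbiased=False)

def add_quantisation_noise(w: torch.Tensor, n_bits: int = 2, threshold: float = 0.1):
    """Add uniform noise whose magnitude mimics the rounding error of a low-bit
    quantiser in each region, so full-precision weights experience the effect of
    quantisation during training (QAT-style noise simulation)."""
    noisy = w.clone()
    near_std, tail_std = piecewise_stats(w, threshold)
    regions = [(w.abs() <= threshold, near_std), (w.abs() > threshold, tail_std)]
    for mask, std in regions:
        if mask.sum() == 0 or torch.isnan(std):
            continue
        # Step size of a uniform quantiser covering roughly +/- 3 std of this region.
        step = (6.0 * std) / (2 ** n_bits - 1)
        # Uniform rounding noise in [-step/2, step/2].
        noisy[mask] = noisy[mask] + (torch.rand_like(noisy[mask]) - 0.5) * step
    return noisy

# Usage sketch: perturb a layer's weights before the forward pass of each training step.
layer = torch.nn.Linear(512, 512)
noisy_weight = add_quantisation_noise(layer.weight.data, n_bits=2)
```

In an actual QAT loop, the noisy weights would be used in the forward pass while gradients update the underlying full-precision parameters (a straight-through style arrangement); this sketch only shows the noise-injection step.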

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work is supported by the National Natural Science Foundation of China [grant number 61901436].