Deep neural networks have achieved great success on a wide spectrum of applications. However, neural network (NN) models often include a massive number of weights and consume much memory. To reduce the NN model size, we observe that the structure of the weight matrix can be further re-organized for better compression, i.e., by converting the weight matrix to a block diagonal structure. Therefore, in this paper, we formulate a new research problem that considers the structural factor of the weight matrix, named Compression with Difference-Minimized Block Diagonal Structure (COMIS), and propose a new algorithm, Memory-Efficient and Structure-Aware Compression (MESA), which effectively prunes the weights into a block diagonal structure to significantly boost the compression rate. Extensive experiments show that MESA achieves 135× to 392× compression rates across different models, 1.8 to 3.03 times the compression rates of the state-of-the-art approaches. In addition, our approach provides an inference speed-up of 2.6× to 5.1×, up to 44% faster than the state-of-the-art approaches.
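To make the core idea concrete, the following is a minimal sketch (not the authors' MESA implementation) of pruning a dense weight matrix into a block diagonal structure; the `block_diagonal_mask` helper and the `num_blocks` parameter are hypothetical illustration choices, not names from the paper.

```python
# Sketch: zero out all weights outside the diagonal blocks, so only
# the blocks themselves need to be stored. This is an assumption-laden
# illustration of block-diagonal pruning, not the MESA algorithm.
import numpy as np

def block_diagonal_mask(rows: int, cols: int, num_blocks: int) -> np.ndarray:
    """Build a boolean mask that keeps only the diagonal blocks."""
    mask = np.zeros((rows, cols), dtype=bool)
    row_edges = np.linspace(0, rows, num_blocks + 1, dtype=int)
    col_edges = np.linspace(0, cols, num_blocks + 1, dtype=int)
    for b in range(num_blocks):
        mask[row_edges[b]:row_edges[b + 1],
             col_edges[b]:col_edges[b + 1]] = True
    return mask

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))           # dense weight matrix
mask = block_diagonal_mask(8, 8, num_blocks=4)
W_pruned = np.where(mask, W, 0.0)         # off-block weights set to zero

# For a square matrix split into k blocks, roughly 1/k of the weights
# survive, so the stored model shrinks accordingly.
kept = mask.sum() / mask.size
print(f"fraction of weights kept: {kept:.2f}")  # 0.25 for 4 blocks
```

A block diagonal layout also helps inference: each output block depends only on its matching input block, so the large matrix multiply decomposes into several small, independent ones, which is consistent with the speed-ups the abstract reports.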