Deep neural networks have achieved great success on a wide spectrum of applications. However, neural network (NN) models often include a massive number of weights and consume much memory. To reduce the NN model size, we observe that the structure of the weight matrix can be further re-organized for better compression, i.e., by converting the weight matrix to a block diagonal structure. Therefore, in this paper, we formulate a new research problem that considers the structural factor of the weight matrix, named Compression with Difference-Minimized Block Diagonal Structure (COMIS), and propose a new algorithm, Memory-Efficient and Structure-Aware Compression (MESA), which effectively prunes the weights into a block diagonal structure to significantly boost the compression rate. Extensive experiments show that MESA achieves 135× to 392× compression rates across different models, 1.8 to 3.03 times the compression rates of the state-of-the-art approaches. In addition, our approach provides an inference speed-up of 2.6× to 5.1×, up to 44% faster than the state-of-the-art approaches.
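To make the core idea concrete, the following is a minimal sketch (not the authors' MESA implementation) of pruning a dense weight matrix into a block diagonal structure; the `block_diagonal_mask` helper and the `num_blocks` parameter are hypothetical illustration choices, not names from the paper.

```python
# Sketch: zero out all weights outside the diagonal blocks, so only
# the blocks themselves need to be stored. This is an assumption-laden
# illustration of block-diagonal pruning, not the MESA algorithm.
import numpy as np

def block_diagonal_mask(rows: int, cols: int, num_blocks: int) -> np.ndarray:
    """Build a boolean mask that keeps only the diagonal blocks."""
    mask = np.zeros((rows, cols), dtype=bool)
    row_edges = np.linspace(0, rows, num_blocks + 1, dtype=int)
    col_edges = np.linspace(0, cols, num_blocks + 1, dtype=int)
    for b in range(num_blocks):
        mask[row_edges[b]:row_edges[b + 1],
             col_edges[b]:col_edges[b + 1]] = True
    return mask

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))           # dense weight matrix
mask = block_diagonal_mask(8, 8, num_blocks=4)
W_pruned = np.where(mask, W, 0.0)         # off-block weights set to zero

# For a square matrix split into k blocks, roughly 1/k of the weights
# survive, so the stored model shrinks accordingly.
kept = mask.sum() / mask.size
print(f"fraction of weights kept: {kept:.2f}")  # 0.25 for 4 blocks
```

A block diagonal layout also helps inference: each output block depends only on its matching input block, so the large matrix multiply decomposes into several small, independent ones, which is consistent with the speed-ups the abstract reports.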