

Model compression and acceleration have recently received increasing research attention. Among existing approaches, filter pruning shows promising effectiveness, as it delivers significant inference speedup and is readily supported on off-the-shelf computing platforms. Most existing works prune filters in a layer-wise manner, where networks are pruned and fine-tuned layer by layer. However, these methods require intensive computation for per-layer sensitivity analysis and suffer from the accumulation of pruning errors. To address these challenges, we propose a novel pruning method, namely Tutor-Instructing global Pruning (TIP), which prunes redundant filters in a global manner. TIP introduces Information Gain (IG) to estimate the contribution of each filter to the class probability distributions of the network output. The key idea of TIP is to formulate filter pruning as the minimization of the IG with respect to a group of pruned filters, under a constraint on the size of the pruned network. To solve this problem, we propose a Taylor-based approximation algorithm that efficiently obtains the IG of each filter via backpropagation. We comprehensively evaluate TIP on CIFAR-10 and ILSVRC-12. On ILSVRC-12, TIP reduces the FLOPs of ResNet-50 by 54.13% with only a 0.1% drop in top-5 accuracy, significantly outperforming state-of-the-art methods.
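The abstract does not spell out the exact form of the IG criterion, so the following PyTorch-style sketch only illustrates the general idea of a Taylor-based, backpropagation-driven filter score accumulated over data and ranked globally across all layers. It uses the common first-order criterion |∂L/∂W · W| per filter; the function name, the use of a generic task loss in place of the IG objective, and the global-ranking scheme are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn

def taylor_filter_scores(model, data_loader, criterion, device="cpu"):
    """First-order Taylor importance for each convolutional filter (illustrative).

    For every Conv2d filter (one output channel), the score is
    |sum(dL/dW * W)| over that filter's weights, i.e. a first-order estimate
    of the change in the objective if the filter were removed. Scores from
    all layers are merged into one global ranking, so pruning can proceed
    network-wide rather than layer by layer.
    """
    model.to(device).eval()
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    scores = [torch.zeros(m.out_channels, device=device) for m in convs]

    for inputs, targets in data_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        model.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()  # gradients obtained by a single backpropagation pass
        for s, m in zip(scores, convs):
            # (dL/dW * W) summed over each filter's parameters -> per-filter score
            contrib = (m.weight.grad * m.weight).detach()
            s += contrib.flatten(1).sum(dim=1).abs()

    # Global ranking of (layer index, filter index, score), lowest score first;
    # the lowest-ranked filters are candidates for pruning under a size budget.
    ranking = sorted(
        ((i, j, s[j].item()) for i, s in enumerate(scores) for j in range(len(s))),
        key=lambda t: t[2],
    )
    return ranking
```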