Variance-Based Pruning for Accelerating and Compressing Trained Networks
Abstract
Variance-Based Pruning is a one-shot pruning technique that efficiently compresses networks with minimal fine-tuning, maintaining high performance and reducing computational costs.
Increasingly expensive training of ever larger models such as Vision Transformers motivates reusing the vast library of already trained state-of-the-art networks. However, their latency, high computational costs, and memory demands pose significant challenges for deployment, especially on resource-constrained hardware. While structured pruning methods can reduce these factors, they often require costly retraining, sometimes for up to hundreds of epochs, or even training from scratch to recover the accuracy lost through the structural modifications. Maintaining the performance of trained models after structured pruning, and thereby avoiding extensive retraining, remains a challenge. To solve this, we introduce Variance-Based Pruning, a simple, structured one-shot pruning technique for efficiently compressing networks with minimal fine-tuning. Our approach first gathers activation statistics, which are used to select neurons for pruning. Simultaneously, the mean activations are integrated back into the model to preserve a high degree of performance. On ImageNet-1k recognition tasks, we demonstrate that directly after pruning, DeiT-Base retains over 70% of its original performance and requires only 10 epochs of fine-tuning to regain 99% of the original accuracy, while simultaneously reducing MACs by 35% and model size by 36%, thus speeding up the model by 1.44x.
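To make the idea concrete, the sketch below illustrates one plausible reading of the abstract: hidden neurons of a single MLP block (two linear layers with an activation, as in a ViT feed-forward block) are ranked by their activation variance over a calibration set, low-variance neurons are removed, and their mean activations are folded into the following layer's bias so the block's expected output is preserved. All names (`variance_based_prune_mlp`, `keep_ratio`, the calibration tensor) and the 0.65 keep ratio are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def variance_based_prune_mlp(fc1: nn.Linear, fc2: nn.Linear,
                             calib_inputs: torch.Tensor,
                             act: nn.Module = nn.GELU(),
                             keep_ratio: float = 0.65):
    """Hypothetical sketch of variance-based pruning for an fc1 -> act -> fc2 block.

    Neurons with the lowest post-activation variance on the calibration data
    are pruned; their mean activation is absorbed into fc2's bias.
    """
    # 1. Gather activation statistics on calibration data.
    hidden = act(fc1(calib_inputs))        # (num_samples, hidden_dim)
    mean = hidden.mean(dim=0)              # per-neuron mean activation
    var = hidden.var(dim=0)                # per-neuron activation variance

    # 2. Keep the highest-variance neurons, prune the rest.
    hidden_dim = fc1.out_features
    num_keep = int(keep_ratio * hidden_dim)
    keep_idx = torch.topk(var, num_keep).indices
    prune_mask = torch.ones(hidden_dim, dtype=torch.bool)
    prune_mask[keep_idx] = False           # True for pruned neurons

    # 3. Approximate each pruned neuron by its mean activation and fold
    #    that constant contribution into fc2's bias.
    if fc2.bias is None:
        fc2.bias = nn.Parameter(torch.zeros(fc2.out_features))
    fc2.bias += fc2.weight[:, prune_mask] @ mean[prune_mask]

    # 4. Build smaller layers containing only the surviving neurons.
    new_fc1 = nn.Linear(fc1.in_features, num_keep, bias=fc1.bias is not None)
    new_fc1.weight.copy_(fc1.weight[keep_idx])
    if fc1.bias is not None:
        new_fc1.bias.copy_(fc1.bias[keep_idx])

    new_fc2 = nn.Linear(num_keep, fc2.out_features, bias=True)
    new_fc2.weight.copy_(fc2.weight[:, keep_idx])
    new_fc2.bias.copy_(fc2.bias)

    return new_fc1, new_fc2


# Usage example with a toy block and random calibration activations.
if __name__ == "__main__":
    fc1, fc2 = nn.Linear(192, 768), nn.Linear(768, 192)
    calib = torch.randn(1024, 192)
    small_fc1, small_fc2 = variance_based_prune_mlp(fc1, fc2, calib)
    print(small_fc1, small_fc2)
```

This one-shot procedure leaves the rest of the network untouched, which is consistent with the abstract's claim that most of the original accuracy is retained before any fine-tuning; a short fine-tuning run would then recover the remainder.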
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression (2025)
- NIRVANA: Structured pruning reimagined for large language models compression (2025)
- Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression (2025)
- DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning (2025)
- Study of Training Dynamics for Memory-Constrained Fine-Tuning (2025)
- A Free Lunch in LLM Compression: Revisiting Retraining after Pruning (2025)
- Elastic ViTs from Pretrained Models without Retraining (2025)