---
license: apache-2.0
---

# ReplaceMe: Training-Free Transformer Pruning via Layer Removal & Linear Transformations
[](https://opensource.org/licenses/Apache-2.0)


![ReplaceMe diagram]()

## Model Description
ReplaceMe is a novel method for transformer model compression that enables **training-free** block/layer pruning while maintaining model performance through linear transformations. The approach:

- Identifies and removes contiguous blocks of layers
- Applies mathematically derived linear transformations (LTs) to preserve information flow
- Requires no fine-tuning or retraining
- Works with standard transformer architectures (the LTs are merged into the original model weights), as sketched below

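To make the LT estimation concrete, here is a minimal sketch of the least-squares variant, assuming activations have already been collected from a small calibration set; the function name and shapes are our illustration, not the project's actual API:

```python
import torch

def estimate_linear_transform(h_in: torch.Tensor, h_out: torch.Tensor) -> torch.Tensor:
    """Minimal sketch (not the official implementation): solve min_T ||h_in @ T - h_out||_F.

    h_in, h_out: (num_tokens, hidden_dim) hidden states captured at the input and
    output boundaries of the block selected for removal.
    """
    # torch.linalg.lstsq returns the least-squares solution T of h_in @ T = h_out
    return torch.linalg.lstsq(h_in, h_out).solution  # (hidden_dim, hidden_dim)
```

The resulting matrix can then be folded into the weights of the layer preceding the pruned block, which is why the final model keeps a standard architecture, simply with fewer layers.
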
## Key Features
- **Zero-Training Pruning**: remove layers without any fine-tuning
- **Performance Preservation**: under 8% average accuracy drop in most cases
- **Instant Speedup**: fewer blocks mean faster inference and lower memory use
- **Plug-and-Play**: works with existing HuggingFace models

## Performance Comparison of Pruning Methods (Llama 3.1 8B, 25% Compression)

| Method | LT method | num_pruned_layers | Dataset | State | race (acc) | winogrande (acc) | piqa (acc_norm) | boolq (acc) | openbookqa (acc_norm) | sciq (acc_norm) | lambada_openai (acc) | lambada_openai (ppl) | Avg-acc |
|--------|-----------|-------------------|---------|-------|------------|------------------|-----------------|-------------|-----------------------|-----------------|----------------------|----------------------|---------|
| **Llama 3.1** (baseline) | - | - | - | - | 0.449761 | 0.779006 | 0.809576 | 0.84159 | 0.43 | 0.961 | 0.732195 | 3.403683 | **0.711822** |
| **UIDL*** | - | 8 | slim_orca | no training | 0.34067 | 0.719021 | 0.68988 | 0.773394 | 0.31 | 0.719 | 0.087328 | 932.0 | 0.591994 |
| **ReplaceMe** (Ours) ✅ | Cosine | 8 | slim_orca | no training | **0.405742** | **0.74191** | **0.705658** | **0.830275** | **0.338** | **0.901** | **0.470794** | **16.759605** | **0.653764** |

**Key:**
- **Bold**: best performance in column among training-free pruned methods
- ✅ Training-free (our method)

**Metrics Explained:**
- All task columns report accuracy; the lambada_openai (ppl) column reports perplexity, where lower is better

> **Our training-free method retains about 92% of baseline average accuracy, while other approaches require expensive retraining!**

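As a quick check, the headline number follows directly from the Avg-acc column of the table above:

```python
# Ratio of ReplaceMe's average accuracy to the unpruned baseline (values from the table)
print(0.653764 / 0.711822)  # ~0.918, i.e. about 92% of baseline performance
```
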
## Installation
```bash
pip install replaceme
# or
git clone https://github.com/mts-ai/ReplaceMe
cd ReplaceMe
pip install -e .
```
## Basic Usage
```bash
# LSTSQ method (recommended)
run_replaceme --config ./reproduce/Replace_Me_pipeline_lstsq.yaml

# Cosine similarity method
run_replaceme --config ./reproduce/Replace_Me_pipeline_cosine.yaml
```
There are many parameters you can experiment with; visit our repo to discover them.
## Load Model
Because the linear transformations are merged into the original transformer weights, the pruned model loads like any other HuggingFace model:
```python
# Example usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MTSAIR/Llama3.1-6B-ReplaceMe"

# Load the pruned model; the linear transformations are already folded in
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is the ReplaceMe pruning method?"
messages = [
    {"role": "user", "content": prompt}
]
# Format the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(
    **model_inputs,
    max_new_tokens=512
)
response = tokenizer.batch_decode(output, skip_special_tokens=True)[0]
print(response)
```
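As a quick sanity check, the pruned checkpoint should report fewer transformer layers than the 32-layer Llama 3.1 8B base model; the expected count below is our inference from the 8 pruned layers, not a documented value:

```python
# 8 layers pruned from the 32-layer base would leave 24 (assumption, not documented)
print(model.config.num_hidden_layers)
```
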
## Citation
If you use ReplaceMe in your research, please cite our paper:

```bibtex
@article{replaceme2025,
  title={Replace Me: Network Simplification via Block Pruning and Linear Transformations},
  author={Shopkhoev, D. and Ali, A. and Zhussip, M. and Malykh, V. and Lefkimmiatis, S. and Komodakis, N. and Zagoruyko, S.},
  journal={},
  year={2025}
}
```