# MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Paper: [arXiv:2412.19509](https://arxiv.org/abs/2412.19509)
## Installation

Clone this repo:

```bash
git clone --recurse-submodules git@github.com:thu-nics/MBQ.git
```

Create a conda env:

```bash
conda create -n qmllm python=3.10
```

Install packages and third-party repos:

```bash
# Install LLaVA-NeXT
cd ./3rdparty/LLaVA-NeXT
pip install -e .
cd ../..

# Install lmms-eval
cd ./3rdparty/lmms-eval
pip install -e .
cd ../..

# Install the qmllm package
pip install -r requirements.txt
pip install -e .
```
## Quantization

Quantization search for MLLMs is executed via `main_quant.py`. A variety of arguments are available to configure the quantization search process. We also support YAML files for parameter configuration: you can refer to the provided YAML configs to use and adjust parameters directly, or create your own custom configuration.
### Model arguments
- `--model`: Select which model type is processed during quantization search. Must be a string corresponding to the name of the model type; `internvl2`, `llava_onevision`, `llava`, and `qwen2_vl` are supported now.
- `--model_args`: Control parameters passed to the model constructor. Accepts a string containing the model path, for example `--model_args pretrained=OpenGVLab/InternVL2-8B`.

### Calibration arguments
- `--calib_data`: Select which calibration data type is used during quantization search; `pileval` and `coco` are supported now.
- `--n_samples`: The number of samples used in quantization search.
- `--data_path`: Accepts a string with the dataset path. For `pileval`, we use `mit-han-lab/pile-val-backup`; for `coco`, the data must be a JSON or JSONL file, and you can refer to sharegpt4v for data preparation (see the sketch after this list).
- `--image_folder`: Accepts a string with the image folder; you can refer to sharegpt4v for data preparation.
- `--few_shot_format`: Organize the calibration data in an interleaved format, currently by simply concatenating two samples. Only valid with `--calib_data=coco`.
- `--interleave_format`: Organize the calibration data with image-text pairs and pure text data, currently by simply inserting 512 pure-text tokens between two image-text pairs. Only valid with `--calib_data=coco`.
- `--text_data_path`: Accepts a string with the pure-text dataset path; this dataset is used with `interleave_format`, and we use `mit-han-lab/pile-val-backup`.
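For `coco` calibration, the data file holds image-conversation records in the ShareGPT4V style. The entry below is only an illustrative sketch (field names follow the common ShareGPT convention; verify against the sharegpt4v preparation guide):

```json
[
  {
    "id": "000000000001",
    "image": "coco/train2017/000000000001.jpg",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe this image in detail."},
      {"from": "gpt", "value": "A cyclist rides down a tree-lined street..."}
    ]
  }
]
```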
### Quantization arguments

- `--method`: Select the quantization search type; `mbq`, `awq`, `smoothquant`, and `rtn` are supported.
- `--run_process`: Specify this flag to run the quantization search.
- `--w_bit`: Specify the weight bit-width.
- `--w_group`: Specify the group size for weight-only per-group quantization.
- `--a_bit`: Specify the activation bit-width.
- `--alpha`: The hyperparameter of SmoothQuant.
- `--reweight`: Specify this flag to use gradients to reweight the loss during quantization search.
- `--distort`: Specify this flag to use distorted feature maps during quantization search.
- `--loss_mode`: Select the loss type used during quantization search; `mae` and `mse` are supported.
- `--scale_path`: The path for saving quantization search results.
- `--pseudo_quant`: Specify this flag to perform pseudo quantization of the model.

To run a quantization search with a YAML config:

```bash
python3 -W ignore main_quant.py \
--config configs/internvl2/MBQ_search/8b_weight_only.yaml
```
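For reference, a config file mirrors the CLI arguments. The sketch below is an assumption about the schema, not the shipped file; check `configs/internvl2/MBQ_search/8b_weight_only.yaml` for the authoritative key names:

```yaml
# Illustrative sketch only; keys are assumed to mirror the CLI flags above.
model: internvl2
model_args: pretrained=OpenGVLab/InternVL2-8B
calib_data: coco
data_path: your/data/path/
image_folder: your/image/folder
n_samples: 128
interleave_format: true
method: mbq
run_process: true
w_bit: 4
w_group: 128
reweight: true
loss_mode: mae
scale_path: scale_cache/mbq/internvl2_w4g128.pt
```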
Remember to specify `--run_process` in the command and provide the appropriate data path and quantization config. The quantization search results are saved to `scale_path`, and we use these results to perform quantization.

Weight-only quantization search:

```bash
python3 -W ignore main_quant.py \
--model internvl2 \
--model_args pretrained="OpenGVLab/InternVL2-8B" \
--calib_data coco \
--data_path "your/data/path/" \
--image_folder "your/image/folder" \
--n_samples 128 \
--interleave_format \
--method mbq \
--run_process \
--w_bit 4 \
--w_group 128 \
--reweight \
--loss_mode mae \
--scale_path "scale_cache/mbq/internvl2_w4g128.pt"
python3 -W ignore main_quant.py \
--model internvl2
--model_args pretrained="OpenGVLab/InternVL2-8B" \
--calib_data coco \
--data_path "your/data/path/" \
--image_folder "your/image/folder" \
--n_samples 128 \
--method mbq \
--run_process \
--w_bit 4 \
--a_bit 8 \
--reweight \
--distort \
--loss_mode mae \
--scale_path "scale_cache/mbq/internvl2_w4a8.pt"
## Evaluation

Evaluation is executed based on `main.py` and also supports YAML configs:

```bash
python3 -W ignore main.py \
--config configs/internvl2/Eval/eval.yaml
```

Remember to specify `--pseudo_quant` in the command and provide the appropriate scale path and quantization config.

### Evaluation with weight-only quantization

```bash
python3 -W ignore main.py \
--model internvl2 \
--model_args pretrained="OpenGVLab/InternVL2-8B" \
--tasks mmmu \
--batch_size 1 \
--log_samples \
--log_samples_suffix mmmu \
--method mbq \
--pseudo_quant \
--w_bit 4 \
--w_group 128 \
--output_path "your/output/path" \
--scale_path "scale_cache/mbq/internvl2_w4g128.pt"
```

### Evaluation with weight-activation quantization

```bash
python3 -W ignore main.py \
--model internvl2 \
--model_args pretrained="OpenGVLab/InternVL2-8B" \
--tasks mmmu \
--batch_size 1 \
--log_samples \
--log_samples_suffix mmmu \
--method mbq \
--pseudo_quant \
--w_bit 4 \
--a_bit 8 \
--output_path "your/output/path" \
--scale_path "scale_cache/mbq/internvl2_w4a8.pt"
```

## Citation

```bibtex
@misc{li2024mbq,
title={MBQ: Modality-Balanced Quantization for Large Vision-Language Models},
author={Shiyao Li and Yingchun Hu and Xuefei Ning and Xihui Liu and Ke Hong and Xiaotao Jia and Xiuhong Li and Yaqi Yan and Pei Ran and Guohao Dai and Shengen Yan and Huazhong Yang and Yu Wang},
year={2024},
eprint={2412.19509},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.19509},
}
```

This work is maintained by the NICS-EFC Lab (Tsinghua University) and Infinigence-AI (Beijing, China).