initial commit
- README.md +17 -0
- adapter_config.json +31 -0
- adapter_model.bin +3 -0
README.md
ADDED
@@ -0,0 +1,17 @@
+# OPA-DPO LoRA for LLaVA-v1.5-7B
+
+## Introduction
+
+Hallucination remains a major challenge for Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) has gained increasing attention as a simple solution to hallucination issues. Nonetheless, different data construction methods in existing works bring notable performance variations. We identify a crucial factor: outcomes are largely contingent on whether the constructed data aligns on-policy w.r.t. the initial (reference) policy of DPO. Due to the implicit KL-divergence constraint, off-policy data cannot be effectively learned.
+
+We propose the On-Policy Alignment (OPA)-DPO framework, which uniquely leverages expert feedback to correct hallucinated responses and aligns both the original and expert-revised responses in an on-policy manner. Compared with DPO without the OPA operations, OPA-DPO significantly enhances performance. It achieves SOTA performance with only 4.8k training samples, while most DPO-based algorithms require over 10k.
+
+## Usage
+
+Please refer to our [GitHub Repository](https://github.com/zhyang2226/OPA-DPO) for more details. If you wish to use our model outside of our code, make sure to update the `base_model_name_or_path` in the `adapter_config.json` file to `liuhaotian/llava-v1.5-7b` (see the sketch after this diff).
+
+Please note that the LoRA modules are also added on top of the vision tower. **Ensure that the vision tower is loaded before loading the LoRA module.**
+
+## Acknowledgements
+
+We would like to express our gratitude for the code snippets provided in [LLaVA](https://github.com/haotian-liu/LLaVA), [LLaVA-RLHF](https://github.com/llava-rlhf/LLaVA-RLHF), [FastChat](https://github.com/lm-sys/FastChat) and [TRL](https://github.com/huggingface/trl), and the datasets provided in [RLAIF-V](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). These resources have significantly contributed to the development of our project.
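The usage note about `base_model_name_or_path` amounts to a one-line JSON edit. Below is a minimal sketch using only the standard library; the local path `./opa-dpo-lora-llava-v1.5-7b` is a placeholder for wherever this adapter has been downloaded, not a path defined by this repository.

```python
import json
from pathlib import Path

# Placeholder location of the downloaded adapter (not defined by this repo).
adapter_dir = Path("./opa-dpo-lora-llava-v1.5-7b")
config_path = adapter_dir / "adapter_config.json"

# Point the adapter at the public base checkpoint instead of the
# training-time local path "base_models/llava-v1.5-7b".
config = json.loads(config_path.read_text())
config["base_model_name_or_path"] = "liuhaotian/llava-v1.5-7b"
config_path.write_text(json.dumps(config, indent=2))
```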
adapter_config.json
ADDED
@@ -0,0 +1,31 @@
+{
+  "auto_mapping": null,
+  "base_model_name_or_path": "base_models/llava-v1.5-7b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 512,
+  "lora_dropout": 0.0,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 256,
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj",
+    "k_proj",
+    "up_proj",
+    "o_proj",
+    "down_proj",
+    "out_proj",
+    "gate_proj",
+    "fc2",
+    "mm_projector.0",
+    "fc1",
+    "mm_projector.2"
+  ],
+  "task_type": "CAUSAL_LM"
+}
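The `target_modules` list above covers not only the language-model projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `up_proj`, `down_proj`, `gate_proj`) but also the multimodal projector (`mm_projector.0`, `mm_projector.2`) and vision-tower layers (`fc1`, `fc2`, `out_proj`), which is why the README insists the vision tower be loaded before the adapter is attached. The sketch below illustrates that ordering using the upstream LLaVA codebase and `peft`; the exact class and method names are assumptions based on the LLaVA repository and may differ across versions, so consult the OPA-DPO repository for the authoritative loading code.

```python
# Sketch only: assumes the LLaVA codebase (https://github.com/haotian-liu/LLaVA)
# and peft are installed; entry points may differ across versions.
import torch
from llava.model import LlavaLlamaForCausalLM
from peft import PeftModel

base_name = "liuhaotian/llava-v1.5-7b"
adapter_dir = "./opa-dpo-lora-llava-v1.5-7b"  # placeholder adapter location

# 1) Load the base LLaVA model.
model = LlavaLlamaForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)

# 2) Materialize the vision tower BEFORE attaching the adapter, since the
#    LoRA target_modules include vision-tower and mm_projector layers.
vision_tower = model.get_vision_tower()
if not vision_tower.is_loaded:
    vision_tower.load_model()
vision_tower.to(dtype=torch.float16)

# 3) Attach the OPA-DPO LoRA adapter on top of the fully built model.
model = PeftModel.from_pretrained(model, adapter_dir)
model.eval()
```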
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:568767e7162bb9f073c1ba84f5e17091f2a7192dc54c4e30375ee4e432b0d67b
+size 1512852253
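The `adapter_model.bin` entry is a Git LFS pointer (the real weights are roughly 1.5 GB), so cloning without LFS support fetches only the three-line pointer shown above. One way to obtain the actual file is through `huggingface_hub`; the repository id below is a placeholder, not one stated in this commit.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the actual Hub id of this adapter.
local_dir = snapshot_download(repo_id="<user>/<opa-dpo-lora-llava-v1.5-7b>")
print(local_dir)  # directory containing adapter_config.json and adapter_model.bin
```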