zhyang2226 committed
Commit f589492 · verified · 1 Parent(s): c365a43

initial commit

Files changed (3)
  1. README.md +17 -0
  2. adapter_config.json +31 -0
  3. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,17 @@
+ # OPA-DPO LoRA for LLaVA-v1.5-7B
+
+ ## Introduction
+
+ Hallucination remains a major challenge for Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) has gained increasing attention as a simple solution to hallucination issues. However, the different data-construction methods used in existing works lead to notable performance variations. We identify a crucial factor: outcomes largely depend on whether the constructed data are on-policy with respect to DPO's initial (reference) policy. Because of DPO's implicit KL-divergence constraint, off-policy data cannot be learned effectively.
+
+ We propose the On-Policy Alignment (OPA)-DPO framework, which uniquely leverages expert feedback to correct hallucinated responses and aligns both the original and expert-revised responses in an on-policy manner. Compared with DPO without the OPA operations, OPA-DPO significantly improves performance: it achieves state-of-the-art results with only 4.8k training samples, whereas most DPO-based algorithms require more than 10k.
+
+ ## Usage
+
+ Please refer to our [GitHub repository](https://github.com/zhyang2226/OPA-DPO) for more details. If you wish to use our model outside of our code base, make sure to update `base_model_name_or_path` in `adapter_config.json` to `liuhaotian/llava-v1.5-7b`.
+
+ Please note that the LoRA modules are also added on top of the vision tower. **Ensure that the vision tower is loaded before loading the LoRA module.**
+
+ ## Acknowledgements
+
+ We would like to express our gratitude for the code snippets provided in [LLaVA](https://github.com/haotian-liu/LLaVA), [LLaVA-RLHF](https://github.com/llava-rlhf/LLaVA-RLHF), [FastChat](https://github.com/lm-sys/FastChat), and [TRL](https://github.com/huggingface/trl), and for the dataset provided in [RLAIF-V](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). These resources have significantly contributed to the development of our project.
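
For convenience, the usage steps above can be combined into a minimal loading sketch. This is not the authors' official code (see their GitHub repository for that); it assumes the adapter has been downloaded to a hypothetical local directory `opa-dpo-lora/`, that `adapter_config.json` already points at `liuhaotian/llava-v1.5-7b`, and that the LLaVA and PEFT packages are installed.

```python
from llava.model.builder import load_pretrained_model
from peft import PeftModel

# Load the base LLaVA-v1.5-7B model. The builder also constructs the CLIP
# vision tower, which the README requires to be in place before the LoRA
# weights are attached.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="liuhaotian/llava-v1.5-7b",
    model_base=None,
    model_name="llava-v1.5-7b",
)

# Defensive check: make sure the vision tower weights are actually loaded,
# since some of the LoRA modules target vision-tower layers.
vision_tower = model.get_vision_tower()
if not vision_tower.is_loaded:
    vision_tower.load_model()

# Attach the OPA-DPO LoRA adapter on top of the full model.
# "opa-dpo-lora" is an assumed local download path for this repository.
model = PeftModel.from_pretrained(model, "opa-dpo-lora")
model.eval()
```

Loading the base model (including its vision tower) first matters because the `target_modules` list in `adapter_config.json` below includes vision-tower layers such as `fc1`, `fc2`, and `out_proj` in addition to the language-model projections.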
adapter_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "auto_mapping": null,
+   "base_model_name_or_path": "base_models/llava-v1.5-7b",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "lora_alpha": 512,
+   "lora_dropout": 0.0,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 256,
+   "revision": null,
+   "target_modules": [
+     "q_proj",
+     "v_proj",
+     "k_proj",
+     "up_proj",
+     "o_proj",
+     "down_proj",
+     "out_proj",
+     "gate_proj",
+     "fc2",
+     "mm_projector.0",
+     "fc1",
+     "mm_projector.2"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
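
Note that the shipped `base_model_name_or_path` points at the authors' local checkout (`base_models/llava-v1.5-7b`). A small sketch of the one-line patch the README asks for, assuming the adapter files live in a hypothetical local `opa-dpo-lora/` directory:

```python
import json

# Rewrite base_model_name_or_path so the adapter resolves against the
# Hugging Face Hub model instead of the authors' local path.
config_path = "opa-dpo-lora/adapter_config.json"
with open(config_path) as f:
    config = json.load(f)

config["base_model_name_or_path"] = "liuhaotian/llava-v1.5-7b"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```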
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:568767e7162bb9f073c1ba84f5e17091f2a7192dc54c4e30375ee4e432b0d67b
+ size 1512852253
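
The file above is a Git LFS pointer, so the actual weights must be fetched via LFS or the Hub. A sketch for verifying a downloaded copy against the digest and size recorded in the pointer, again assuming the hypothetical `opa-dpo-lora/` download directory:

```python
import hashlib

# Values recorded in the LFS pointer above.
EXPECTED_SHA256 = "568767e7162bb9f073c1ba84f5e17091f2a7192dc54c4e30375ee4e432b0d67b"
EXPECTED_SIZE = 1512852253  # bytes (~1.4 GiB)

sha256 = hashlib.sha256()
size = 0
with open("opa-dpo-lora/adapter_model.bin", "rb") as f:
    # Hash in 1 MiB chunks to keep memory usage flat.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)
        size += len(chunk)

assert size == EXPECTED_SIZE, f"unexpected size: {size}"
assert sha256.hexdigest() == EXPECTED_SHA256, "sha256 mismatch"
print("adapter_model.bin verified")
```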