initial commit
- README.md +17 -0
- adapter_config.json +31 -0
- adapter_model.bin +3 -0
README.md
ADDED
@@ -0,0 +1,17 @@
+# OPA-DPO LoRA for LLaVA-v1.5-7B
+
+## Introduction
+
+Hallucination remains a major challenge for Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) has gained increasing attention as a simple solution to hallucination issues. Nonetheless, different data construction methods in existing works bring notable performance variations. We identify a crucial factor: outcomes are largely contingent on whether the constructed data aligns on-policy w.r.t. the initial (reference) policy of DPO. Due to the implicit KL-divergence constraint, off-policy data cannot be effectively learned.
+
+We propose the On-Policy Alignment (OPA)-DPO framework, which uniquely leverages expert feedback to correct hallucinated responses and aligns both the original and expert-revised responses in an on-policy manner. Compared with DPO without the OPA operations, OPA-DPO significantly enhances performance. It achieves SOTA performance with only 4.8k training samples, while most DPO-based algorithms require over 10k.
+
+## Usage
+
+Please refer to our [GitHub Repository](https://github.com/zhyang2226/OPA-DPO) for more details. If you wish to use our model outside of our code, make sure to update the `base_model_name_or_path` in the `adapter_config.json` file to `liuhaotian/llava-v1.5-7b` (see the sketch after this diff).
+
+Please note that the LoRA modules are also added on top of the vision tower. **Ensure that the vision tower is loaded before loading the LoRA module.**
+
+## Acknowledgements
+
+We would like to express our gratitude for the code snippets provided in [LLaVA](https://github.com/haotian-liu/LLaVA), [LLaVA-RLHF](https://github.com/llava-rlhf/LLaVA-RLHF), [FastChat](https://github.com/lm-sys/FastChat) and [TRL](https://github.com/huggingface/trl), and the datasets provided in [RLAIF-V](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset). These resources have significantly contributed to the development of our project.
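The usage note about `base_model_name_or_path` amounts to a one-line JSON edit. Below is a minimal sketch using only the standard library; the local path `./opa-dpo-lora-llava-v1.5-7b` is a placeholder for wherever this adapter has been downloaded, not a path defined by this repository.

```python
import json
from pathlib import Path

# Placeholder location of the downloaded adapter (not defined by this repo).
adapter_dir = Path("./opa-dpo-lora-llava-v1.5-7b")
config_path = adapter_dir / "adapter_config.json"

# Point the adapter at the public base checkpoint instead of the
# training-time local path "base_models/llava-v1.5-7b".
config = json.loads(config_path.read_text())
config["base_model_name_or_path"] = "liuhaotian/llava-v1.5-7b"
config_path.write_text(json.dumps(config, indent=2))
```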
adapter_config.json
ADDED
@@ -0,0 +1,31 @@
+{
+  "auto_mapping": null,
+  "base_model_name_or_path": "base_models/llava-v1.5-7b",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 512,
+  "lora_dropout": 0.0,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 256,
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj",
+    "k_proj",
+    "up_proj",
+    "o_proj",
+    "down_proj",
+    "out_proj",
+    "gate_proj",
+    "fc2",
+    "mm_projector.0",
+    "fc1",
+    "mm_projector.2"
+  ],
+  "task_type": "CAUSAL_LM"
+}
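The `target_modules` list above covers not only the language-model projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `up_proj`, `down_proj`, `gate_proj`) but also the multimodal projector (`mm_projector.0`, `mm_projector.2`) and vision-tower layers (`fc1`, `fc2`, `out_proj`), which is why the README insists the vision tower be loaded before the adapter is attached. The sketch below illustrates that ordering using the upstream LLaVA codebase and `peft`; the exact class and method names are assumptions based on the LLaVA repository and may differ across versions, so consult the OPA-DPO repository for the authoritative loading code.

```python
# Sketch only: assumes the LLaVA codebase (https://github.com/haotian-liu/LLaVA)
# and peft are installed; entry points may differ across versions.
import torch
from llava.model import LlavaLlamaForCausalLM
from peft import PeftModel

base_name = "liuhaotian/llava-v1.5-7b"
adapter_dir = "./opa-dpo-lora-llava-v1.5-7b"  # placeholder adapter location

# 1) Load the base LLaVA model.
model = LlavaLlamaForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)

# 2) Materialize the vision tower BEFORE attaching the adapter, since the
#    LoRA target_modules include vision-tower and mm_projector layers.
vision_tower = model.get_vision_tower()
if not vision_tower.is_loaded:
    vision_tower.load_model()
vision_tower.to(dtype=torch.float16)

# 3) Attach the OPA-DPO LoRA adapter on top of the fully built model.
model = PeftModel.from_pretrained(model, adapter_dir)
model.eval()
```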
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:568767e7162bb9f073c1ba84f5e17091f2a7192dc54c4e30375ee4e432b0d67b
+size 1512852253
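The `adapter_model.bin` entry is a Git LFS pointer (the real weights are roughly 1.5 GB), so cloning without LFS support fetches only the three-line pointer shown above. One way to obtain the actual file is through `huggingface_hub`; the repository id below is a placeholder, not one stated in this commit.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the actual Hub id of this adapter.
local_dir = snapshot_download(repo_id="<user>/<opa-dpo-lora-llava-v1.5-7b>")
print(local_dir)  # directory containing adapter_config.json and adapter_model.bin
```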