Tags: Image-Text-to-Text · Transformers · Safetensors · English · qwen2_vl · image-to-text · conversational · text-generation-inference
Michael4933 committed · Commit a1f6c6e · verified · 1 Parent(s): d2b2b61

Update README.md

Files changed (1):
  1. README.md (+4 −4)

README.md CHANGED
@@ -14,7 +14,7 @@ base_model:
 
 
 <p align="center">
- <img src=![image/png](https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/RrciC01LCU7QUqh9kEAp-.png) style="width: 30%">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/RrciC01LCU7QUqh9kEAp-.png" style="width: 30%; max-width: 600px;">
 </p>
 
 # Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
@@ -46,10 +46,10 @@ The recent advancement of Multimodal Large Language Models (MLLMs) has significa
 
 ## 😮 Top Multi-Image Grounding Capacity
 <p align="center">
- <img src="figs/radar.png" width=100%>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/ZZTdrJvSJ9x637ochqf8x.png" width=100%>
 </p>
 <p align="center">
- <img src="figs/multi_general.png" width=100%>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/taqiE_6t7ilwrzIGB71ok.png" width=100%>
 </p>
 As shown in the radar chart above, Migician surpasses much larger 70B-scale models by a wide margin across all tasks on MIG-Bench. It also demonstrates strong competitiveness on several general multi-image understanding benchmarks. We look forward to promising applications of Migician across a broad spectrum of real-world scenarios.
 
@@ -180,7 +180,7 @@ An example structure for training data:
 As mentioned in the paper, 🎩Migician is fine-tuned from [Qwen2-vl-7B](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) through a progressive two-stage training process with a massive amount of data on 8×A100-80G GPUs. You can feel the 🪄magic of multi-image grounding through the following code.
 
 <p align="center">
- <img src="figs/multi_view_all.png" width=100%>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/3MgtMW_LOQwODDtoRAbY3.png" width=100%>
 </p>
 
 ```
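
The README promises an inference snippet right after this image, but the diff view truncates at the opening code fence. As a rough sketch of what multi-image inference with this model looks like, the snippet below follows the standard Qwen2-VL `transformers` recipe; the model ID `Michael4933/Migician`, the image paths, and the grounding prompt are illustrative assumptions, not the README's actual code.

```python
# Minimal multi-image inference sketch using the standard Qwen2-VL API in
# transformers. The model ID, image paths, and prompt below are illustrative
# assumptions; the README's own snippet is truncated in this diff view.
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped with Qwen2-VL examples

MODEL_ID = "Michael4933/Migician"  # assumed repo ID, based on the committer's namespace

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Two images plus a free-form grounding instruction in a single user turn.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "./figs/example_1.png"},  # hypothetical paths
        {"type": "image", "image": "./figs/example_2.png"},
        {"type": "text", "text": "Locate the object in image 2 that matches "
                                 "the one shown in image 1, and give its bounding box."},
    ],
}]

# Build the chat prompt and gather the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate, then strip the prompt tokens before decoding.
output_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

The `qwen_vl_utils` helper used above is the one distributed alongside Qwen2-VL's official examples (`pip install qwen-vl-utils`).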