Tags: Image-Text-to-Text · Transformers · Safetensors · English · qwen2_vl · image-to-text · conversational · text-generation-inference
Michael4933 committed · Commit a1f6c6e · verified · 1 Parent(s): d2b2b61

Update README.md

Files changed (1):
  1. README.md (+4 −4)

README.md CHANGED
@@ -14,7 +14,7 @@ base_model:
 
 
 <p align="center">
- <img src=![image/png](https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/RrciC01LCU7QUqh9kEAp-.png) style="width: 30%">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/RrciC01LCU7QUqh9kEAp-.png" style="width: 30%; max-width: 600px;">
 </p>
 
 # Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
@@ -46,10 +46,10 @@ The recent advancement of Multimodal Large Language Models (MLLMs) has significa
 
 ## 😮 Top Multi-Image Grounding Capacity
 <p align="center">
- <img src="figs/radar.png" width=100%>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/ZZTdrJvSJ9x637ochqf8x.png" width=100%>
 </p>
 <p align="center">
- <img src="figs/multi_general.png" width=100%>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/taqiE_6t7ilwrzIGB71ok.png" width=100%>
 </p>
 As shown in the radar chart above, Migician surpasses much larger 70B-scale models by a wide margin across all tasks on MIG-Bench. It also demonstrates strong competitiveness on several general multi-image understanding benchmarks. We look forward to promising applications of Migician across a broad spectrum of real-world scenarios.
 
@@ -180,7 +180,7 @@ An example structure for training data:
 As mentioned in the paper, 🎩Migician is fine-tuned from [Qwen2-vl-7B](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) through a progressive two-stage training process with a massive amount of data on 8×A100-80G GPUs. You can feel the 🪄magic of multi-image grounding through the following code.
 
 <p align="center">
- <img src="figs/multi_view_all.png" width=100%>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/3MgtMW_LOQwODDtoRAbY3.png" width=100%>
 </p>
 
 ```
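
The README promises an inference snippet right after this image, but the diff view truncates at the opening code fence. As a rough sketch of what multi-image inference with this model looks like, the snippet below follows the standard Qwen2-VL `transformers` recipe; the model ID `Michael4933/Migician`, the image paths, and the grounding prompt are illustrative assumptions, not the README's actual code.

```python
# Minimal multi-image inference sketch using the standard Qwen2-VL API in
# transformers. The model ID, image paths, and prompt below are illustrative
# assumptions; the README's own snippet is truncated in this diff view.
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped with Qwen2-VL examples

MODEL_ID = "Michael4933/Migician"  # assumed repo ID, based on the committer's namespace

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Two images plus a free-form grounding instruction in a single user turn.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "./figs/example_1.png"},  # hypothetical paths
        {"type": "image", "image": "./figs/example_2.png"},
        {"type": "text", "text": "Locate the object in image 2 that matches "
                                 "the one shown in image 1, and give its bounding box."},
    ],
}]

# Build the chat prompt and gather the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# Generate, then strip the prompt tokens before decoding.
output_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

The `qwen_vl_utils` helper used above is the one distributed alongside Qwen2-VL's official examples (`pip install qwen-vl-utils`).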