Update README.md

README.md
<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/RrciC01LCU7QUqh9kEAp-.png" style="width: 30%; max-width: 600px;">
</p>

# Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
## 😮 Top Multi-Image Grounding Capacity

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/ZZTdrJvSJ9x637ochqf8x.png" width="100%">
</p>
<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/taqiE_6t7ilwrzIGB71ok.png" width="100%">
</p>
As shown in the radar chart above, Migician surpasses much larger 70B-scale models by a wide margin across all tasks on MIG-Bench. It also demonstrates strong competitiveness on several general multi-image understanding benchmarks. We look forward to promising applications of Migician across a broad spectrum of real-world scenarios.
As described in the paper, 🎩Migician is finetuned from [Qwen2-VL-7B](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) through a progressive two-stage training process on a massive amount of data, using 8×A100-80G GPUs. You can feel the 🪄magic of multi-image grounding through the following code.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/654f3e104c8874c64d43aafa/3MgtMW_LOQwODDtoRAbY3.png" width="100%">
</p>
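The inference code block itself is cut off in this excerpt. As a minimal sketch of the surrounding plumbing, the snippet below assembles a multi-image query in the standard Qwen2-VL chat-message schema and parses a bounding box from a reply. The image paths, the query text, and the `[x1, y1, x2, y2]` reply format are illustrative assumptions, not Migician's documented interface; adapt them to the actual model output.

```python
import re

def build_messages(image_paths, query):
    # Assemble a multi-image chat message in the Qwen2-VL format:
    # one image entry per input picture, followed by the text query.
    content = [{"type": "image", "image": p} for p in image_paths]
    content.append({"type": "text", "text": query})
    return [{"role": "user", "content": content}]

def parse_box(reply):
    # Extract the first [x1, y1, x2, y2] bounding box from a model reply.
    # NOTE: this output format is an assumption; adjust the pattern to
    # whatever coordinate format the checkpoint actually emits.
    m = re.search(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]", reply)
    return tuple(int(v) for v in m.groups()) if m else None

# Hypothetical two-image grounding query (placeholder paths).
messages = build_messages(
    ["image_1.jpg", "image_2.jpg"],
    "Locate the object in image 2 that matches the one highlighted in image 1.",
)
print(parse_box("The target is at [102, 45, 310, 228]."))
```

From here, generation would follow the usual `transformers` Qwen2-VL pipeline (`AutoProcessor` plus `Qwen2VLForConditionalGeneration`), which is presumably what the elided code block in the full README demonstrates.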