MedM-VL-2D-3B-en

Introduction

MedM-VL-2D-3B-en is a 2D medical large vision-language model (LVLM) trained on 2D medical images and English medical texts. It supports tasks such as report generation, visual question answering (VQA), referring expression comprehension (REC), referring expression generation (REG), and image classification.

Component	Configuration
Image encoder	google/siglip-base-patch16-256-multilingual
Connector	MLP (2-layer)
LLM	Qwen/Qwen2.5-3B-Instruct
Image resolution	256×256
Sequence length	2048 tokens
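The configuration above follows the common LLaVA-style layout: the image encoder turns an image into patch features, and the connector projects them into the LLM's embedding space. The NumPy sketch below illustrates that data flow under assumptions this card does not confirm: one visual token per 16×16 patch (no pooling), a SigLIP-base feature width of 768, a Qwen2.5-3B hidden size of 2048, and a GELU between the two linear layers. `MLPConnector` is a hypothetical name and its weights are random placeholders, not the released ones.

```python
import numpy as np

# Assumed dimensions; only IMAGE_SIZE, PATCH_SIZE and SEQ_LEN come from this card.
IMAGE_SIZE = 256   # input resolution (256×256)
PATCH_SIZE = 16    # from the encoder name "...-patch16-256-..."
VISION_DIM = 768   # assumption: SigLIP-base feature width
LLM_DIM = 2048     # assumption: Qwen2.5-3B hidden size
SEQ_LEN = 2048     # sequence length from the config table


def num_visual_tokens(image_size: int, patch_size: int) -> int:
    """Patches per image, assuming one visual token per patch and no pooling."""
    per_side = image_size // patch_size
    return per_side * per_side


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU (assumed activation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class MLPConnector:
    """Hypothetical 2-layer MLP connector: Linear -> GELU -> Linear, random weights."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, out_dim))
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.normal(0.0, out_dim ** -0.5, size=(out_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, patch_features: np.ndarray) -> np.ndarray:
        # (num_patches, in_dim) -> (num_patches, out_dim)
        hidden = gelu(patch_features @ self.w1 + self.b1)
        return hidden @ self.w2 + self.b2


n_vis = num_visual_tokens(IMAGE_SIZE, PATCH_SIZE)
# Stand-in for the image encoder's patch features.
features = np.random.default_rng(1).normal(size=(n_vis, VISION_DIM))
visual_embeddings = MLPConnector(VISION_DIM, LLM_DIM)(features)
text_budget = SEQ_LEN - n_vis  # tokens left for the prompt and the answer
print(n_vis, visual_embeddings.shape, text_budget)  # 256 (256, 2048) 1792
```

If the actual connector pools or merges patches, the visual token count and the remaining text budget change accordingly.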

Evaluation

Benchmark	Med-Flamingo	LLaVA-Med	RadFM	MedM-VL-2D-3B-en
MedMNIST_derma	0.012	0.258	0.051	0.810
MedMNIST_organ	0.089	0.668	0.189	0.791
MedPix	0.081	0.151	-	0.087
MIMIC-CXR	0.233	0.204	0.068	0.222
PathVQA	0.334	0.378	0.248	0.634
SAMed_identify	-	0.458	-	0.637
SAMed_refer	-	0.086	-	0.225
SLAKE_identify	-	0.272	-	0.349
SLAKE_refer	-	0.041	-	0.261
SLAKE_vqa	0.215	0.337	0.817	0.812
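To read the table row by row, the short Python snippet below records the reported scores, treats "-" as not reported, and finds the top-scoring model for each benchmark. It assumes that a higher score is better on every benchmark; the card does not name the metrics.

```python
# Scores copied from the evaluation table; None marks "-" (not reported).
MODELS = ["Med-Flamingo", "LLaVA-Med", "RadFM", "MedM-VL-2D-3B-en"]
SCORES = {
    "MedMNIST_derma": [0.012, 0.258, 0.051, 0.810],
    "MedMNIST_organ": [0.089, 0.668, 0.189, 0.791],
    "MedPix": [0.081, 0.151, None, 0.087],
    "MIMIC-CXR": [0.233, 0.204, 0.068, 0.222],
    "PathVQA": [0.334, 0.378, 0.248, 0.634],
    "SAMed_identify": [None, 0.458, None, 0.637],
    "SAMed_refer": [None, 0.086, None, 0.225],
    "SLAKE_identify": [None, 0.272, None, 0.349],
    "SLAKE_refer": [None, 0.041, None, 0.261],
    "SLAKE_vqa": [0.215, 0.337, 0.817, 0.812],
}


def best_model(row: list) -> str:
    """Return the model with the highest reported score (assumes higher is better)."""
    reported = [(score, model) for model, score in zip(MODELS, row) if score is not None]
    return max(reported)[1]


winners = {bench: best_model(row) for bench, row in SCORES.items()}
wins = sum(1 for w in winners.values() if w == "MedM-VL-2D-3B-en")
print(wins, len(winners))  # 7 10
```

On these numbers, MedM-VL-2D-3B-en scores highest on 7 of the 10 benchmarks; LLaVA-Med leads on MedPix, Med-Flamingo on MIMIC-CXR, and RadFM on SLAKE_vqa. Rows with "-" compare only the models that report a score.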

Quickstart

Please refer to the MedM-VL repository for setup and usage instructions.

Citation

@inproceedings{shi2025medm,
  title={{MedM-VL}: What Makes a Good Medical {LVLM}?},
  author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Fu, Xiangling and Li, Miao and Wu, Ji},
  booktitle={International Workshop on Agentic AI for Medicine},
  pages={290--299},
  year={2025},
  organization={Springer}
}


Model details

Weights format	Safetensors
Model size	3B params
Tensor type	BF16
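BF16 stores each parameter in 2 bytes, so the weight footprint can be estimated from the parameter count. The sketch below uses the rounded "3B params" figure from this card; the exact count (encoder, connector, and LLM together) will shift the result somewhat, and activations and the KV cache need memory on top of this. `weight_memory_gib` is a hypothetical helper for illustration.

```python
# Bytes per parameter for common storage types.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gib(n_params: float, dtype: str = "bf16") -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


print(f"{weight_memory_gib(3e9):.2f} GiB")  # 5.59 GiB
```

Loading the same weights in FP32 would roughly double this figure.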

Model tree for shiym2000/MedM-VL-2D-3B-en

Qwen/Qwen2.5-3B (base model) → Qwen/Qwen2.5-3B-Instruct (finetuned) → shiym2000/MedM-VL-2D-3B-en (this model, finetuned)

Collection

This model is part of the MedM-VL collection, which provides model weights for 2D/3D medical LVLMs.