MedM-VL-2D-3B-en

Introduction

A 2D medical large vision-language model (LVLM) trained on 2D medical images and English medical text. It supports report generation, visual question answering (VQA), referring expression comprehension (REC), referring expression generation (REG), and image classification.

Config
| Component | Setting |
|---|---|
| Image encoder | google/siglip-base-patch16-256-multilingual |
| Connector | MLP (2-layer) |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 256 × 256 |
| Sequence length | 2048 |
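
As a rough illustration of how these settings interact, the sketch below estimates how many tokens one image occupies in the LLM context and how much of the 2048-token sequence remains for text. It assumes the 2-layer MLP connector maps each image patch to exactly one LLM token (no pooling or extra special tokens) and that a sample contains a single image; neither assumption is stated in this card. The helper names are illustrative only.

```python
# Token-budget arithmetic for the configuration above.
# Assumption: the MLP connector emits one LLM token per image patch.

IMAGE_SIZE = 256      # image resolution (pixels per side)
PATCH_SIZE = 16       # from "patch16" in the image encoder name
MAX_SEQ_LEN = 2048    # sequence length from the config


def visual_token_count(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


def text_token_budget(max_seq_len: int, n_images: int = 1) -> int:
    """Tokens left for prompt and response after reserving image tokens."""
    used = n_images * visual_token_count(IMAGE_SIZE, PATCH_SIZE)
    return max_seq_len - used


if __name__ == "__main__":
    print(visual_token_count(IMAGE_SIZE, PATCH_SIZE))  # 256
    print(text_token_budget(MAX_SEQ_LEN))              # 1792
```

Under these assumptions a single 256 × 256 image takes 256 of the 2048 positions, leaving 1792 for the prompt and the generated text.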

Evaluation

| Benchmark | Med-Flamingo | LLaVA-Med | RadFM | MedM-VL-2D-3B-en |
|---|---|---|---|---|
| MedMNIST (derma) | 0.012 | 0.258 | 0.051 | 0.810 |
| MedMNIST (organ) | 0.089 | 0.668 | 0.189 | 0.791 |
| MedPix | 0.081 | 0.151 | - | 0.087 |
| MIMIC-CXR | 0.233 | 0.204 | 0.068 | 0.222 |
| PathVQA | 0.334 | 0.378 | 0.248 | 0.634 |
| SAMed (identify) | - | 0.458 | - | 0.637 |
| SAMed (refer) | - | 0.086 | - | 0.225 |
| SLAKE (identify) | - | 0.272 | - | 0.349 |
| SLAKE (refer) | - | 0.041 | - | 0.261 |
| SLAKE (vqa) | 0.215 | 0.337 | 0.817 | 0.812 |
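
To make the comparison easier to read, the following sketch tallies which model scores highest on each benchmark. It copies the numbers from the table, treats "-" as a missing result, and assumes that a higher score is better on every benchmark; the card does not name the metrics, so that assumption should be checked against the paper.

```python
from collections import Counter

MODELS = ("Med-Flamingo", "LLaVA-Med", "RadFM", "MedM-VL-2D-3B-en")

# Scores copied from the evaluation table; None marks a "-" entry.
SCORES = {
    "MedMNIST (derma)": (0.012, 0.258, 0.051, 0.810),
    "MedMNIST (organ)": (0.089, 0.668, 0.189, 0.791),
    "MedPix":           (0.081, 0.151, None, 0.087),
    "MIMIC-CXR":        (0.233, 0.204, 0.068, 0.222),
    "PathVQA":          (0.334, 0.378, 0.248, 0.634),
    "SAMed (identify)": (None, 0.458, None, 0.637),
    "SAMed (refer)":    (None, 0.086, None, 0.225),
    "SLAKE (identify)": (None, 0.272, None, 0.349),
    "SLAKE (refer)":    (None, 0.041, None, 0.261),
    "SLAKE (vqa)":      (0.215, 0.337, 0.817, 0.812),
}


def best_model(row):
    """Return the model with the highest reported score in one table row."""
    reported = [(score, model) for model, score in zip(MODELS, row)
                if score is not None]
    return max(reported)[1]


# Count how many benchmarks each model leads (assuming higher is better).
wins = Counter(best_model(row) for row in SCORES.values())

if __name__ == "__main__":
    for model, count in wins.most_common():
        print(f"{model}: best on {count} of {len(SCORES)} benchmarks")
```

Under the higher-is-better assumption, MedM-VL-2D-3B-en leads on 7 of the 10 benchmarks; LLaVA-Med leads on MedPix, Med-Flamingo on MIMIC-CXR, and RadFM on SLAKE (vqa).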

Quickstart

For installation and inference instructions, please refer to the MedM-VL project.
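
As a first step before following those instructions, the checkpoint files can be fetched from the Hugging Face Hub. The sketch below is a minimal example, assuming the `huggingface_hub` package is installed and that the repository ID is `shiym2000/MedM-VL-2D-3B-en`, as shown on the model page. The helper name and the default local directory are illustrative. Loading the weights and running inference depend on the MedM-VL code and are not shown here.

```python
REPO_ID = "shiym2000/MedM-VL-2D-3B-en"


def download_checkpoint(local_dir: str = "./MedM-VL-2D-3B-en") -> str:
    """Download all files of the model repository and return the local path."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoint()` requires network access and stores the files (BF16 Safetensors weights, per the model page) in the given directory.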

Citation

@inproceedings{shi2025medm,
  title={{MedM-VL}: What Makes a Good Medical {LVLM}?},
  author={Shi, Yiming and Yang, Shaoshuai and Zhu, Xun and Wang, Haoyu and Fu, Xiangling and Li, Miao and Wu, Ji},
  booktitle={International Workshop on Agentic AI for Medicine},
  pages={290--299},
  year={2025},
  organization={Springer}
}