---
license: apache-2.0
papers:
- arxiv:2509.16944
pipeline_tag: image-text-to-text
library_name: transformers
---
# llava-v1.5-13b-roi-K15T3-152k-v1bf16Mheads-twiginit
This model is associated with the paper [Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception](https://huggingface.co/papers/2509.16944).
## Introduction
While recent methods leverage a Region-of-Interest (RoI) mechanism to focus on salient areas, they typically face a difficult trade-off: training-based approaches depend on large-scale annotated datasets, while training-free methods that exploit the model's internal attention are computationally inefficient, requiring either multi-pass prefill stages or the slow auto-regressive decoding process to identify the RoI.
We propose an efficient, annotation-free **S**elf-**D**istilled **R**egion **P**roposal **N**etwork (SD-RPN) that resolves this trade-off. Our core innovation is a pipeline that processes and denoises the noisy cross-attention maps from the MLLM's middle layers to generate pseudo-RoI labels. We then use these labels to train a lightweight and tunable Region Proposal Network (RPN) that is built upon the frozen MLLM backbone. Our RPN predicts the RoI in a single forward pass using features available from the MLLM's middle layers, completely decoupling RoI identification from the auto-regressive generation process and avoiding costly multi-pass operations.
<p align="center">
<img src="https://github.com/YuHengsss/SD-RPN/raw/main/assets/framework.png" width="800" />
</p>
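To make the idea concrete, here is a minimal, illustrative sketch (not the released implementation; all class names, layer choices, and tensor shapes are hypothetical) of how a lightweight scoring head over a frozen MLLM's middle-layer image-token states could be trained against pseudo-RoI labels with a simple binary objective:

```python
# Illustrative sketch only: a toy per-token RoI scorer on top of frozen
# middle-layer hidden states. It is NOT the authors' code; see the GitHub
# repository for the actual SD-RPN architecture and training recipe.
import torch
import torch.nn as nn

class ToyRoIHead(nn.Module):
    def __init__(self, hidden_dim: int = 5120):  # 5120 = LLaMA-13B hidden size
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 4),
            nn.GELU(),
            nn.Linear(hidden_dim // 4, 1),
        )

    def forward(self, image_token_states: torch.Tensor) -> torch.Tensor:
        # image_token_states: (batch, num_image_tokens, hidden_dim), taken from a
        # middle decoder layer during the single prefill pass.
        return self.scorer(image_token_states).squeeze(-1)  # (batch, num_image_tokens)

# Training against pseudo-RoI labels distilled from denoised cross-attention maps
# could use a standard binary cross-entropy objective:
head = ToyRoIHead()
states = torch.randn(2, 576, 5120)               # 576 = 24x24 image tokens in LLaVA-1.5
pseudo_labels = torch.randint(0, 2, (2, 576)).float()
loss = nn.functional.binary_cross_entropy_with_logits(head(states), pseudo_labels)
```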
For more details, code, and training instructions, visit the [GitHub repository](https://github.com/YuHengsss/SD-RPN).
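As a rough orientation for inference, the following is a hypothetical usage sketch assuming the weights are in (or converted to) the Hugging Face `transformers` LLaVA format; the repository id and prompt template below are placeholders, and this path would only exercise the base LLaVA model. The RoI prediction heads require the custom code in the GitHub repository, which is the authoritative inference pipeline.

```python
# Hypothetical sketch: standard transformers LLaVA loading, assuming an
# HF-format checkpoint. The SD-RPN heads are not used by this code path.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "YuhengSSS/llava-v1.5-13b-roi-K15T3-152k-v1bf16Mheads-twiginit"  # placeholder repo id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is written on the small sign? ASSISTANT:"  # LLaVA-1.5 style template
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```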
## Citation
If you use this model, please cite the original paper:
```bibtex
@misc{shi2025catching,
      title={Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception},
      author={Yuheng Shi and Xiaohuan Pei and Minjing Dong and Chang Xu},
      year={2025},
      eprint={2509.16944},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```