You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

RareSeek-R1: A specialized language model for rare disease diagnosis and reasoning

RareSeek-R1 is a domain-specialized large language model for rare-disease diagnostic reasoning, developed through a Progressive Parameter-Efficient Transfer Learning framework. The model is first instruction-tuned on the clinically grounded RareMed-Corpus, a large, multi-source dataset deeply integrated from medical textbooks, guidelines, biomedical literature, and real-world EHR narratives. It is then fine-tuned on RareMed-CoT, a high-fidelity corpus designed to instill explicit, stepwise clinical reasoning aligned with real diagnostic workflows. To further enhance factual reliability, GraphRAG is incorporated to anchor the model’s inference to up-to-date variant–gene–phenotype–disease relationships. This retrieval augmentation substantially reduces hallucinations, improves factual calibration, and yields notable performance gains—particularly when EHR narratives are combined with prioritized genetic variants. Together, RareSeek-R1 performs direct reasoning over full-length EHRs, leverages graph-grounded retrieval, and demonstrably augments clinician-level diagnostic accuracy, advancing a reliable and scalable AI paradigm for rare-disease diagnosis.

RareSeek-R1 Teaser Image

RareMedData: https://huggingface.co/datasets/TaoMedAI/RareMedData

Downloads last month: -

Safetensors

Model size

71B params

Tensor type

BF16

Model tree for TaoMedAI/RareSeek-R1

Base model

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

Finetuned

(17)

this model