TaoMedAI
/

RareSeek-R1

Text Generation

text-generation-inference

Model card Files Files and versions

RareSeek-R1 / README.md

TaoMedAI's picture

Update README.md

6e3a716 verified 22 days ago

|

history blame contribute delete

1.78 kB

	---
	license: afl-3.0
	language:
	- en
	- zh
	metrics:
	- accuracy
	base_model:
	- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- medical
	- deepseek-r1
	- health
	- ehr
	- reasoning
	---

	# RareSeek-R1: A specialized language model for rare disease diagnosis and reasoning

	RareSeek-R1 is a domain-specialized large language model for rare-disease diagnostic reasoning, developed through a Progressive Parameter-Efficient Transfer Learning framework. The model is first instruction-tuned on the clinically grounded RareMed-Corpus, a large, multi-source dataset deeply integrated from medical textbooks, guidelines, biomedical literature, and real-world EHR narratives. It is then fine-tuned on RareMed-CoT, a high-fidelity corpus designed to instill explicit, stepwise clinical reasoning aligned with real diagnostic workflows. To further enhance factual reliability, GraphRAG is incorporated to anchor the model’s inference to up-to-date variant–gene–phenotype–disease relationships. This retrieval augmentation substantially reduces hallucinations, improves factual calibration, and yields notable performance gains—particularly when EHR narratives are combined with prioritized genetic variants. Together, RareSeek-R1 performs direct reasoning over full-length EHRs, leverages graph-grounded retrieval, and demonstrably augments clinician-level diagnostic accuracy, advancing a reliable and scalable AI paradigm for rare-disease diagnosis.

	<p align="center">
	<img src="https://github.com/yangtao1025/RareSeek-R1/raw/main/RareSeek-R1.png" alt="RareSeek-R1 Teaser Image" width="800">
	</p>

	# RareMedData: [https://huggingface.co/datasets/TaoMedAI/RareMedData](https://huggingface.co/datasets/TaoMedAI/RareMedData)