mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.0

Text Generation


mobicham committed on Jan 24

Commit 2edfda3 (verified) · Parent: 5ba05de

Update README.md

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -8,7 +8,7 @@ This is a version of the <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1
 ## Performance
-| Models            | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B">DeepSeek-R1-Distill-Qwen-1.5B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1">DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1</a> |
+| Models            | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B">DeepSeek-R1-Distill-Qwen-1.5B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.0">DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1</a> |
 |:-------------------:|:--------:|:----------------:|
 | ARC (25-shot)      | 40.96 | <b>41.3</b>  |
 | HellaSwag (10-shot)| 44    | <b>45.22</b> |
@@ -18,7 +18,7 @@ This is a version of the <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1
 | GSM8K (5-shot)     | 69.9  | <b>73.24</b> |
 | Average            | 49.13 | <b>50.86</b> |
-| Models            | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B">DeepSeek-R1-Distill-Qwen-1.5B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1">DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1</a>  |
+| Models            | <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B">DeepSeek-R1-Distill-Qwen-1.5B</a> | <a href="https://huggingface.co/mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.0">DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1</a>  |
 |:-------------------:|:--------:|:----------------:|
 | GPQA (0-shot)     | 26.96 | <b>27.8</b>  |
 | MMLU PRO (5-shot) | 16.74 | <b>19.44</b> |
@@ -32,7 +32,7 @@ import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 compute_dtype = torch.bfloat16
 device   = 'cuda'
-model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1"
+model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.0"
 model     = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
 tokenizer = AutoTokenizer.from_pretrained(model_id)
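All three changed lines retarget the same repository ID: the `href` targets and the `model_id` string move from `-v1.1` to `-v1.0`, while the visible link text still reads `DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1`. The following is a minimal Python sketch of that substitution, assuming the README is handled as a plain string; `retarget_repo_id` is a hypothetical helper for illustration, not part of `transformers` or `huggingface_hub`.

```python
import re

# Repository IDs taken from the diff; the helper below is a hypothetical illustration.
OLD_ID = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1"
NEW_ID = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.0"


def retarget_repo_id(readme: str, old_id: str, new_id: str) -> str:
    """Point Hugging Face link targets and the model_id assignment at new_id.

    Visible link text is left unchanged, mirroring the +3 -3 diff.
    """
    # 1) Rewrite href targets only; anchor text between <a ...> and </a> is untouched.
    out = readme.replace(
        f'href="https://huggingface.co/{old_id}"',
        f'href="https://huggingface.co/{new_id}"',
    )
    # 2) Rewrite a line-initial model_id = "<old_id>" assignment, keeping its spacing.
    pattern = re.compile(r'^(model_id\s*=\s*)"' + re.escape(old_id) + '"', re.MULTILINE)
    return pattern.sub(lambda m: f'{m.group(1)}"{new_id}"', out)


before = (
    '| Models | <a href="https://huggingface.co/' + OLD_ID + '">'
    "DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1</a> |\n"
    'model_id = "' + OLD_ID + '"\n'
)
after = retarget_repo_id(before, OLD_ID, NEW_ID)
print(after)
```

Replacing only the quoted `href` value and the `model_id` assignment, rather than every occurrence of `v1.1`, keeps the substitution as narrow as the three lines this commit touches.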