David-A-Reiss commited on
Commit
934649a
·
verified ·
1 Parent(s): 517cc7c

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +66 -3
README.md CHANGED
@@ -1,6 +1,69 @@
1
- Here we present the DeepSeek-TNG R1T2 Chimera, the successor our original Chimera released on April 26th. The new Chimera is a Trimind, i.e. a child model constructed from three parent models, namely DeepSeek V3-0324, R1 and R1-0528.
2
- using the Assembly of Experts-method
3
-
4
  ---
5
  license: mit
 
 
 
 
 
 
6
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: mit
3
+ library_name: transformers
4
+ base_model:
5
+ - deepseek-ai/DeepSeek-V3-0324
6
+ - deepseek-ai/DeepSeek-R1
7
+ - deepseek-ai/DeepSeek-R1-0528
8
+ pipeline_tag: text-generation
9
  ---
10
+ # DeepSeek-TNG-R1T2-Chimera
11
+
12
+ <div align="center">
13
+ <img src="https://354918363417-runtime-assets.s3.eu-central-1.amazonaws.com/company_logo_light.svg"
14
+ alt="TNG Logo"
15
+ width="400"
16
+ style="display: inline-block; vertical-align: middle;"/>
17
+ </div>
18
+ <br>
19
+ <div align="center">
20
+ <a href="LICENSE" style="margin: 2px;">
21
+ <img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
22
+ </a>
23
+ </div>
24
+ <br>
25
+ <div align="center">
26
+ <a href="https://x.com/tngtech/status/1916284566127444468" style="margin: 2px;">
27
+ <img alt="Benchmarks" src="R1T-Chimera_Benchmarks_20250427_V1.jpg" style="display: inline-block; vertical-align: middle;"/>
28
+ </a>
29
+ </div>
30
+
31
+ **Model Merge of DeepSeek-R1-0528, DeepSeek-R1 and DeepSeek-V3-0324**
32
+
33
+ Here we present the DeepSeek-TNG-R1T2-Chimera, the successor our original Chimera released on April 26th. The new Chimera is a Trimind, i.e., a child model constructed from three parent models, namely DeepSeek-V3-0324, R1 and R1-0528,
34
+ using the Assembly-of-Experts-method.
35
+
36
+ For details on the construction process, please [read our paper](https://arxiv.org/abs/2506.14794).
37
+
38
+ [Paper on arXiV](https://arxiv.org/abs/2506.14794) | [Announcement on X](https://x.com/tngtech/status/1916284566127444468) | [LinkedIn post](https://www.linkedin.com/posts/tng-technology-consulting_on-the-weekend-we-released-deepseek-r1t-chimera-activity-7323008947236290560-Cf2m))
39
+
40
+
41
+ ## Model Details
42
+
43
+ - **Architecture**: DeepSeek-MoE transformer-based language model
44
+ - **Combination Method**: Merged model weights from DeepSeek-R1-0528, DeepSeek-R1 and DeepSeek-V3-0324
45
+ - **Release Date**: 2025-07-0x
46
+
47
+ ## Use, Out-of-scope Use, Limitations, Risks, Recommendations et al.
48
+ Regarding R1T2-Chimera, we ask you to follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model.
49
+
50
+ These guidelines are available [here on Hugging Face](https://huggingface.co/microsoft/MAI-DS-R1).
51
+
52
+ ## Contact
53
+
54
+ - Email: [email protected]
55
+ - X.com: @tngtech
56
+
57
+ ## Citation
58
+
59
+ ```
60
+ @misc{tng_technology_consulting_gmbh_2025_07_0x,
61
+ author = { TNG Technology Consulting GmbH },
62
+ title = { DeepSeek-TNG-R1T2-Chimera },
63
+ year = 2025,
64
+ month = { April },
65
+ url = { https://huggingface.co/tngtech/DeepSeek-TNG-R1T2-Chimera },
66
+ doi = { xxx },
67
+ publisher = { Hugging Face }
68
+ }
69
+ ```