danielhanchen commited on
Commit
d92bdcb
·
verified ·
1 Parent(s): 9d649f8

Upload folder using huggingface_hub

Browse files
README.md ADDED
@@ -0,0 +1,457 @@
1
+ ---
2
+ base_model:
3
+ - ibm-granite/granite-4.0-350m-base
4
+ license: apache-2.0
5
+ library_name: transformers
6
+ tags:
7
+ - language
8
+ - unsloth
9
+ - granite-4.0
10
+ ---
11
+ <div>
12
+ <p style="margin-top: 0;margin-bottom: 0;">
13
+ <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
14
+ </p>
15
+ <div style="display: flex; gap: 5px; align-items: center; ">
16
+ <a href="https://github.com/unslothai/unsloth/">
17
+ <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
18
+ </a>
19
+ <a href="https://discord.gg/unsloth">
20
+ <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
21
+ </a>
22
+ <a href="https://docs.unsloth.ai/">
23
+ <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
24
+ </a>
25
+ </div>
26
+ </div>
27
+
28
+
29
+ # Granite-4.0-350M-Base
30
+
31
+ **Model Summary:**
32
+ Granite-4.0-350M-Base is a lightweight decoder-only language model designed for scenarios where efficiency and speed are critical. It can run on resource-constrained devices such as smartphones or IoT hardware, enabling offline and privacy-preserving applications. It also supports Fill-in-the-Middle (FIM) code completion through specialized prefix and suffix tokens. The model is trained from scratch on approximately 15 trillion tokens following a four-stage training strategy: 10 trillion tokens in the first stage, 2 trillion in the second, another 2 trillion in the third, and 0.5 trillion in the final stage.
33
+
34
+ - **Developers:** Granite Team, IBM
35
+ - **HF Collection:** [Granite 4.0 Nano Language Models HF Collection](https://huggingface.co/collections/ibm-granite/granite-40-nano-language-models-68e5775c80b60e43b72cfa16)
36
+ - **GitHub Repository:** [ibm-granite/granite-4.0-nano-language-models](https://github.com/ibm-granite/granite-4.0-nano-language-models)
37
+ - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
38
+ - **Release Date**: October 28, 2025
39
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
40
+
41
+ **Supported Languages:**
42
+ English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 4.0 Nano models to support languages beyond those included in this list.
43
+
44
+ **Intended Use:**
45
+ Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question answering, and code completion (including FIM). Moreover, these lightweight models can serve as baselines for creating task-specific models for different applications.
46
+
47
+ **Generation:**
48
+ This is a simple example of how to use the Granite-4.0-350M-Base model.
49
+
50
+ Install the following libraries:
51
+
52
+ ```shell
53
+ pip install torch torchvision torchaudio
54
+ pip install accelerate
55
+ pip install transformers
56
+ ```
57
+ Then, copy the code snippet below to run the example.
58
+
59
+ ```python
60
+ from transformers import AutoModelForCausalLM, AutoTokenizer
61
+ device = "cuda"
62
+
63
+ model_path = "ibm-granite/granite-4.0-350m-base"
64
+
65
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
66
+ # drop device_map if running on CPU
67
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
68
+ model.eval()
69
+ # change input text as desired
70
+ input_text = "The capital of France is"
71
+ # tokenize the text
72
+ input_tokens = tokenizer(input_text, return_tensors="pt").to(device)
73
+ # generate output tokens
74
+ output = model.generate(**input_tokens, max_length=10)
75
+ # decode output tokens into text
76
+ output = tokenizer.batch_decode(output)
77
+ # print output
78
+ print(output[0])
79
+ ```
80
+
81
+ Expected output:
82
+ ```shell
83
+ The capital of France is Paris.
84
+ ```
85
+
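As noted in the model summary, the base model also supports Fill-in-the-Middle (FIM) completion via specialized prefix and suffix tokens. Below is a minimal sketch of how such a prompt is typically assembled; the exact token strings used here (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are an assumption — verify them against the tokenizer's vocabulary before use.

```python
# Hypothetical FIM prompt construction: the model generates the code that
# belongs between the prefix and the suffix. The token names below are
# assumptions; check tokenizer.get_vocab() for the actual special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # Prefix-suffix-middle ordering: generation starts after the middle
    # token, and the completion slots in between prefix and suffix.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))",
)
print(prompt)
```

The resulting string would then be tokenized and passed to `model.generate` exactly like the plain prompt in the example above.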
86
+ **Evaluation Results:**
87
+
88
+ <table>
89
+ <thead>
90
+ <tr>
91
+ <th style="text-align:left; background-color: #001d6c; color: white;">Benchmarks</th>
92
+ <th style="text-align:left; background-color: #001d6c; color: white;">Metric</th>
93
+ <th style="text-align:center; background-color: #001d6c; color: white;">350M Dense</th>
94
+ <th style="text-align:center; background-color: #001d6c; color: white;">H 350M Dense</th>
95
+ <th style="text-align:center; background-color: #001d6c; color: white;">1B Dense</th>
96
+ <th style="text-align:center; background-color: #001d6c; color: white;">H 1B Dense</th>
97
+ </tr>
98
+ </thead>
99
+ <tbody>
100
+ <tr>
101
+ <td colspan="6" style="text-align:center; background-color: #FFFFFF; color: #2D2D2D; font-style:italic;">
102
+ General Tasks
103
+ </td>
104
+ </tr>
105
+ <tr>
106
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MMLU</td>
107
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">5-shot</td>
108
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">33.08</td>
109
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">36.07</td>
110
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">59.82</td>
111
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">58.71</td>
112
+ </tr>
113
+ <tr>
114
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MMLU-Pro</td>
115
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">5-shot,CoT</td>
116
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">11.29</td>
117
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">10.08</td>
118
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">29.96</td>
119
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">23.45</td>
120
+ </tr>
121
+ <tr>
122
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">BBH</td>
123
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">3-shot, CoT</td>
124
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">32.19</td>
125
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">29.96</td>
126
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">57.73</td>
127
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">48.45</td>
128
+ </tr>
129
+ <tr>
130
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">AGI EVAL</td>
131
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">3-shot</td>
132
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">28.97</td>
133
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">29.2</td>
134
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">48.95</td>
135
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">47.46</td>
136
+ </tr>
137
+ <tr>
138
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">DROP</td>
139
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">5-shot</td>
140
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">29.77</td>
141
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">28.56</td>
142
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">58.18</td>
143
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">57.18</td>
144
+ </tr>
145
+ <tr>
146
+ <td colspan="6" style="text-align:center; background-color: #FFFFFF; color: #2D2D2D; font-style:italic;">
147
+ Math Tasks
148
+ </td>
149
+ </tr>
150
+ <tr>
151
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">GSM8K</td>
152
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">8-shot</td>
153
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">24.11</td>
154
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">24.41</td>
155
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">62.4</td>
156
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">57.39</td>
157
+ </tr>
158
+ <tr>
159
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Minerva Math</td>
160
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">4-shot</td>
161
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">9.96</td>
162
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">11.5</td>
163
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">30.3</td>
164
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">21.3</td>
165
+ </tr>
166
+ <tr>
167
+ <td colspan="6" style="text-align:center; background-color: #FFFFFF; color: #2D2D2D; font-style:italic;">
168
+ Code Tasks
169
+ </td>
170
+ </tr>
171
+ <tr>
172
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">HumanEval</td>
173
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">pass@1 [StarCoder Prompt]</td>
174
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">34.6</td>
175
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">35.61</td>
176
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">68.08</td>
177
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">68.26</td>
178
+ </tr>
179
+ <tr>
180
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">HumanEval</td>
181
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">pass@1</td>
182
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">32</td>
183
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">34</td>
184
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">60</td>
185
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">59</td>
186
+ </tr>
187
+ <tr>
188
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">HumanEval+</td>
189
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">pass@1</td>
190
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">29</td>
191
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">29</td>
192
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">57</td>
193
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">56</td>
194
+ </tr>
195
+ <tr>
196
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MBPP</td>
197
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">pass@1</td>
198
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">45</td>
199
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">17</td>
200
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">72</td>
201
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">65</td>
202
+ </tr>
203
+ <tr>
204
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MBPP+</td>
205
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">pass@1</td>
206
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">38</td>
207
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">16</td>
208
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">60</td>
209
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">54</td>
210
+ </tr>
211
+ <tr>
212
+ <td colspan="6" style="text-align:center; background-color: #FFFFFF; color: #2D2D2D; font-style:italic;">
213
+ Multilingual Tasks
214
+ </td>
215
+ </tr>
216
+ <tr>
217
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MMMLU</td>
218
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">5-shot</td>
219
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">30.93</td>
220
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">31.02</td>
221
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">46.73</td>
222
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">48.55</td>
223
+ </tr>
224
+ <tr>
225
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">INCLUDE</td>
226
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">5-shot</td>
227
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">27.32</td>
228
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">29.26</td>
229
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">42.6</td>
230
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">43.8</td>
231
+ </tr>
232
+ <tr>
233
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MGSM</td>
234
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">8-shot</td>
235
+ <td style="text-align:right; background-color: #DAE8FF; color: #2D2D2D;">13.92</td>
236
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">15.12</td>
237
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">46.96</td>
238
+ <td style="text-align:right; background-color: #FFFFFF; color: #2D2D2D;">41.52</td>
239
+ </tr>
240
+ </tbody></table>
241
+
242
+ <table>
243
+ <caption><b>Multilingual benchmarks and the included languages:</b></caption>
244
+ <thead>
245
+ <tr>
246
+ <th style="text-align:left; background-color: #001d6c; color: white;">Benchmarks</th>
247
+ <th style="text-align:left; background-color: #001d6c; color: white;"># Langs</th>
248
+ <th style="text-align:center; background-color: #001d6c; color: white;">Languages</th>
249
+ </tr>
250
+ </thead>
251
+ <tbody>
252
+ <tr>
253
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MMMLU</td>
254
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">11</td>
255
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">ar, de, en, es, fr, ja, ko, pt, zh, bn, hi</td>
256
+ </tr>
257
+ <tr>
258
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">INCLUDE</td>
259
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">14</td>
260
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">hi, bn, ta, te, ar, de, es, fr, it, ja, ko, nl, pt, zh</td>
261
+ </tr>
262
+ <tr>
263
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">MGSM</td>
264
+ <td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">5</td>
265
+ <td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">en, es, fr, ja, zh</td>
266
+ </tr>
267
+ </tbody>
268
+ </table>
269
+
270
+ **Model Architecture:**
271
+ <!-- TO DO: #DAE8FF -->
272
+ Granite-4.0-350M-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are grouped-query attention (GQA), MLPs with SwiGLU activation, RMSNorm, and shared input/output embeddings.
273
+
274
+ <table>
275
+ <thead>
276
+ <tr>
277
+ <th style="text-align:left; background-color: #001d6c; color: white;">Model</th>
278
+ <th style="text-align:center; background-color: #001d6c; color: white;">350M Dense</th>
279
+ <th style="text-align:center; background-color: #001d6c; color: white;">H 350M Dense</th>
280
+ <th style="text-align:center; background-color: #001d6c; color: white;">1B Dense</th>
281
+ <th style="text-align:center; background-color: #001d6c; color: white;">H 1B Dense</th>
282
+ </tr></thead>
283
+ <tbody>
284
+ <tr>
285
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Embedding size</td>
286
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">1024</td>
287
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">768</td>
288
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2048</td>
289
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1536</td>
290
+ </tr>
291
+ <tr>
292
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of layers</td>
293
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">28 attention</td>
294
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4 attention / 28 Mamba2</td>
295
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">40 attention</td>
296
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4 attention / 36 Mamba2</td>
297
+ </tr>
298
+ <tr>
299
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Attention head size</td>
300
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">64</td>
301
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">64</td>
302
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128</td>
303
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128</td>
304
+ </tr>
305
+ <tr>
306
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of attention heads</td>
307
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">16</td>
308
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12</td>
309
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">16</td>
310
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">12</td>
311
+ </tr>
312
+ <tr>
313
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of KV heads</td>
314
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">4</td>
315
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4</td>
316
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4</td>
317
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4</td>
318
+ </tr>
319
+ <tr>
320
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Mamba2 state size</td>
321
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
322
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128</td>
323
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
324
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128</td>
325
+ </tr>
326
+ <tr>
327
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Number of Mamba2 heads</td>
328
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
329
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">48</td>
330
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
331
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">48</td>
332
+ </tr>
333
+
334
+ <tr>
335
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP / Shared expert hidden size</td>
336
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">2048</td>
337
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2048</td>
338
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4096</td>
339
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">4096</td>
340
+ </tr>
341
+ <tr>
342
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Num. Experts</td>
343
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
344
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
345
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
346
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
347
+ </tr>
348
+ <tr>
349
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Num. active Experts</td>
350
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
351
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
352
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
353
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
354
+ </tr>
355
+ <tr>
356
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Expert hidden size</td>
357
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
358
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
359
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
360
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">-</td>
361
+ </tr>
362
+ <tr>
363
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">MLP activation</td>
364
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">SwiGLU</td>
365
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
366
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
367
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">SwiGLU</td>
368
+ </tr>
369
+
370
+ <tr>
371
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Sequence length</td>
372
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">32K</td>
373
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">32K</td>
374
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
375
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">128K</td>
376
+ </tr>
377
+ <tr>
378
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">Position embedding</td>
379
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">RoPE</td>
380
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">NoPE</td>
381
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">RoPE</td>
382
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">NoPE</td>
383
+ </tr>
384
+ <tr>
385
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Parameters</td>
386
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">350M</td>
387
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">340M</td>
388
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1.6B</td>
389
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1.5B</td>
390
+ </tr>
391
+ <tr>
392
+ <td style="text-align:left; background-color: #FFFFFF; color: black;"># Active parameters</td>
393
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">350M</td>
394
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">340M</td>
395
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1.6B</td>
396
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">1.5B</td>
397
+ </tr>
398
+ </tbody></table>
399
+
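The GQA configuration above can be illustrated with a small sketch. The head counts (16 attention heads, 4 KV heads for the 350M Dense model) come from the table; the mapping itself is the standard grouped-query scheme, in which consecutive query heads share one key/value head.

```python
# Grouped-Query Attention (GQA) head mapping, using the 350M Dense
# values from the architecture table above.
NUM_Q_HEADS = 16
NUM_KV_HEADS = 4
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # 4 query heads per KV head

def kv_head_for(query_head: int) -> int:
    # Query heads 0-3 use KV head 0, heads 4-7 use KV head 1, and so on.
    return query_head // GROUP_SIZE

mapping = {q: kv_head_for(q) for q in range(NUM_Q_HEADS)}
print(mapping)
```

Relative to full multi-head attention, sharing KV heads this way shrinks the KV cache by the group size (4x here) at any given sequence length.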
400
+
401
+ **Training Data:** This model is trained on a mix of open source and proprietary data following a four-stage training strategy.
402
+
403
+ <table>
404
+ <thead>
405
+ <tr>
406
+ <th style="text-align:left; background-color: #001d6c; color: white;">Stage</th>
407
+ <th style="text-align:left; background-color: #001d6c; color: white;">Characteristics</th>
408
+ <th style="text-align:center; background-color: #001d6c; color: white;">350M Dense</th>
409
+ <th style="text-align:center; background-color: #001d6c; color: white;">H 350M Dense</th>
410
+ <th style="text-align:center; background-color: #001d6c; color: white;">1B Dense</th>
411
+ <th style="text-align:center; background-color: #001d6c; color: white;">H 1B Dense</th>
412
+ </tr></thead>
413
+ <tbody>
414
+ <tr>
415
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">I</td>
416
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">General mixture of training data, warmup, and power scheduler for learning rate.</td>
417
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10</td>
418
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10</td>
419
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10</td>
420
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">10</td>
421
+ </tr>
422
+ <tr>
423
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">II</td>
424
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">General mixture of training data with higher percentages of code and math with power scheduler for learning rate.</td>
425
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
426
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
427
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
428
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
429
+ </tr>
430
+ <tr>
431
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">III</td>
432
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">High quality training data, exponential decay of learning rate.</td>
433
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
434
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
435
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
436
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">2</td>
437
+ </tr>
438
+ <tr>
439
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">IV</td>
440
+ <td style="text-align:left; background-color: #FFFFFF; color: black;">High quality training data, linear decay to zero for learning rate.</td>
441
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.5</td>
442
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.5</td>
443
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.5</td>
444
+ <td style="text-align:center; background-color: #FFFFFF; color: black;">0.5</td>
445
+ </tr>
446
+ </tbody></table>
447
+
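The per-stage token budgets in the table (in trillions of tokens) can be sanity-checked against the roughly 15 trillion total quoted in the model summary:

```python
# Token budget per training stage, in trillions, from the table above.
stage_tokens = {"I": 10, "II": 2, "III": 2, "IV": 0.5}
total = sum(stage_tokens.values())
print(total)  # 14.5 trillion, i.e. approximately 15T as stated in the summary
```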
448
+ **Infrastructure:**
449
+ We trained the Granite 4.0 Nano Language Models on an NVIDIA GB200 NVL72 cluster hosted by CoreWeave. Intra-rack communication occurs via the 72-GPU NVLink domain, and a non-blocking, full Fat-Tree NDR 400 Gb/s InfiniBand network provides inter-rack communication. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
450
+
451
+ **Ethical Considerations and Limitations:**
452
+ The use of Large Language Models involves risks and ethical considerations that users must be aware of, including but not limited to: bias and fairness, misinformation, and autonomous decision-making. Granite-4.0-350M-Base is no exception. Although the model is suited for multiple generative AI tasks, it has not undergone any safety alignment; therefore, it may produce problematic outputs. Additionally, it remains uncertain whether smaller models are more susceptible to hallucination by copying text verbatim from the training dataset, owing to their reduced size and memorization capacity. This is an active area of research, and we anticipate more rigorous exploration, comprehension, and mitigation in this domain. Regarding ethics, a latent risk associated with all Large Language Models is their malicious use. We urge the community to use Granite-4.0-350M-Base with ethical intentions and in a responsible way.
453
+
454
+ **Resources**
455
+ - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
456
+ - 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
457
+ - 💡 Learn about the latest Granite learning resources: https://github.com/ibm-granite-community/
config.json ADDED
@@ -0,0 +1,113 @@
+ {
+ "architectures": [
+ "GraniteMoeHybridForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "attention_multiplier": 0.015625,
+ "bos_token_id": 100257,
+ "torch_dtype": "bfloat16",
+ "embedding_multiplier": 12,
+ "eos_token_id": 100257,
+ "hidden_act": "silu",
+ "hidden_size": 1024,
+ "init_method": "mup",
+ "initializer_range": 0.1,
+ "intermediate_size": 2048,
+ "layer_types": [
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention",
+ "attention"
+ ],
+ "logits_scaling": 4,
+ "mamba_chunk_size": 256,
+ "mamba_conv_bias": true,
+ "mamba_d_conv": 4,
+ "mamba_d_head": 16,
+ "mamba_d_state": 256,
+ "mamba_expand": 2,
+ "mamba_n_groups": 1,
+ "mamba_n_heads": 128,
+ "mamba_proj_bias": false,
+ "max_position_embeddings": 32768,
+ "model_type": "granitemoehybrid",
+ "normalization_function": "rmsnorm",
+ "num_attention_heads": 16,
+ "num_experts_per_tok": 0,
+ "num_hidden_layers": 28,
+ "num_key_value_heads": 4,
+ "num_local_experts": 0,
+ "output_router_logits": false,
+ "pad_token_id": 100256,
+ "position_embedding_type": "rope",
+ "quantization_config": {
+ "_load_in_4bit": true,
+ "_load_in_8bit": false,
+ "bnb_4bit_compute_dtype": "bfloat16",
+ "bnb_4bit_quant_storage": "uint8",
+ "bnb_4bit_quant_type": "nf4",
+ "bnb_4bit_use_double_quant": true,
+ "llm_int8_enable_fp32_cpu_offload": false,
+ "llm_int8_has_fp16_weight": false,
+ "llm_int8_skip_modules": [
+ "embed_tokens",
+ "embedding",
+ "lm_head",
+ "multi_modal_projector",
+ "merger",
+ "modality_projection",
+ "router",
+ "visual",
+ "vision_tower",
+ "mamba",
+ "model.layers.1.self_attn",
+ "model.layers.1.shared_mlp",
+ "model.layers.3.shared_mlp",
+ "model.layers.0.self_attn",
+ "model.layers.2.shared_mlp",
+ "model.layers.2.self_attn",
+ "model.layers.27.shared_mlp",
+ "model.layers.27.shared_mlp.input_linear"
+ ],
+ "llm_int8_threshold": 6.0,
+ "load_in_4bit": true,
+ "load_in_8bit": false,
+ "quant_method": "bitsandbytes"
+ },
+ "residual_multiplier": 0.263,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": null,
+ "rope_theta": 10000000,
+ "router_aux_loss_coef": 0.01,
+ "shared_intermediate_size": 2048,
+ "tie_word_embeddings": true,
+ "transformers_version": "4.57.1",
+ "unsloth_fixed": true,
+ "use_cache": true,
+ "vocab_size": 100352
+ }
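A note on the config values above: `"init_method": "mup"` together with `"attention_multiplier": 0.015625` suggests muP-style scaling, since 0.015625 is exactly `1 / head_dim` (with `head_dim = hidden_size / num_attention_heads = 64`) rather than the conventional `1 / sqrt(head_dim)`. The sketch below only sanity-checks this arithmetic from the values in this config; it makes no claim about the modeling code itself.

```python
# Sanity-check the scaling constants from this repo's config.json.
# Values below are copied verbatim from the config shown above.
config = {
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
    "attention_multiplier": 0.015625,
}

# Per-head dimension: 1024 / 16 = 64.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# muP-style attention scale: 1/head_dim (= 1/64 = 0.015625),
# not the usual 1/sqrt(head_dim).
assert config["attention_multiplier"] == 1 / head_dim

# Grouped-query attention: 16 query heads share 4 KV heads,
# i.e. 4 query heads per KV head.
queries_per_kv = config["num_attention_heads"] // config["num_key_value_heads"]
print("head_dim:", head_dim, "| queries per KV head:", queries_per_kv)
```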
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 100257,
+ "eos_token_id": 100257,
+ "max_length": 32768,
+ "pad_token_id": 100256,
+ "transformers_version": "4.57.1"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:603d6a462a76b95ed0e665baf73528f853d2cc82cf1a79084d49f7c2341b95de
+ size 383702443
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|unk|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,783 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "100256": {
+ "content": "<|pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100257": {
+ "content": "<|end_of_text|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100258": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100259": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100260": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100261": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100262": {
+ "content": "<|filename|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100263": {
+ "content": "<|reponame|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100264": {
+ "content": "<|start_of_role|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100265": {
+ "content": "<|end_of_role|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100266": {
+ "content": "<|unused_1|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100267": {
+ "content": "<|start_of_plugin|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100268": {
+ "content": "<|end_of_plugin|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100269": {
+ "content": "<|unk|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100270": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100271": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100272": {
+ "content": "<tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100273": {
+ "content": "</tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100274": {
+ "content": "<think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100275": {
+ "content": "</think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100276": {
+ "content": "<think_on>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100277": {
+ "content": "<think_off>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100278": {
+ "content": "<schema>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100279": {
+ "content": "</schema>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100280": {
+ "content": "<tools>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100281": {
+ "content": "</tools>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100282": {
+ "content": "<documents>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100283": {
+ "content": "</documents>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100284": {
+ "content": "<|unused_15|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100285": {
+ "content": "<|unused_16|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100286": {
+ "content": "<|unused_17|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100287": {
+ "content": "<|unused_18|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100288": {
+ "content": "<|unused_19|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100289": {
+ "content": "<|unused_20|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100290": {
+ "content": "<|unused_21|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100291": {
+ "content": "<|unused_22|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100292": {
+ "content": "<|unused_23|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100293": {
+ "content": "<|unused_24|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100294": {
+ "content": "<|unused_25|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100295": {
+ "content": "<|unused_26|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100296": {
+ "content": "<|unused_27|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100297": {
+ "content": "<|unused_28|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100298": {
+ "content": "<|unused_29|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100299": {
+ "content": "<|unused_30|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100300": {
+ "content": "<|unused_31|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100301": {
+ "content": "<|unused_32|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100302": {
+ "content": "<|unused_33|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100303": {
+ "content": "<|unused_34|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100304": {
+ "content": "<|unused_35|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100305": {
+ "content": "<|unused_36|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100306": {
+ "content": "<|unused_37|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100307": {
+ "content": "<|unused_38|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100308": {
+ "content": "<|unused_39|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100309": {
+ "content": "<|unused_40|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100310": {
+ "content": "<|unused_41|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100311": {
+ "content": "<|unused_42|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100312": {
+ "content": "<|unused_43|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100313": {
+ "content": "<|unused_44|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100314": {
+ "content": "<|unused_45|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100315": {
+ "content": "<|unused_46|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100316": {
+ "content": "<|unused_47|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100317": {
+ "content": "<|unused_48|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100318": {
+ "content": "<|unused_49|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100319": {
+ "content": "<|unused_50|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100320": {
+ "content": "<|unused_51|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100321": {
+ "content": "<|unused_52|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100322": {
+ "content": "<|unused_53|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100323": {
+ "content": "<|unused_54|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100324": {
+ "content": "<|unused_55|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100325": {
+ "content": "<|unused_56|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100326": {
+ "content": "<|unused_57|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100327": {
+ "content": "<|unused_58|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100328": {
+ "content": "<|unused_59|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100329": {
+ "content": "<|unused_60|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100330": {
+ "content": "<|unused_61|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100331": {
+ "content": "<|unused_62|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100332": {
+ "content": "<|unused_63|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100333": {
+ "content": "<|unused_64|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100334": {
+ "content": "<|unused_65|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100335": {
+ "content": "<|unused_66|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100336": {
+ "content": "<|unused_67|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100337": {
+ "content": "<|unused_68|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100338": {
+ "content": "<|unused_69|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100339": {
+ "content": "<|unused_70|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100340": {
+ "content": "<|unused_71|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100341": {
+ "content": "<|unused_72|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100342": {
+ "content": "<|unused_73|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100343": {
+ "content": "<|unused_74|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100344": {
+ "content": "<|unused_75|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100345": {
+ "content": "<|unused_76|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100346": {
+ "content": "<|unused_77|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100347": {
+ "content": "<|unused_78|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100348": {
+ "content": "<|unused_79|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100349": {
+ "content": "<|unused_80|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100350": {
+ "content": "<|unused_81|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100351": {
+ "content": "<|unused_82|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|end_of_text|>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|end_of_text|>",
+ "extra_special_tokens": {},
+ "model_max_length": 32768,
+ "pad_token": "<|pad|>",
+ "padding_side": "left",
+ "tokenizer_class": "GPT2Tokenizer",
+ "unk_token": "<|unk|>"
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff
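The tokenizer config above defines dedicated role-marker tokens (`<|start_of_role|>`, `<|end_of_role|>`, `<|end_of_text|>`) that Granite-style chat prompts are built from. The authoritative layout is the `chat_template` shipped inside tokenizer_config.json (use `tokenizer.apply_chat_template` in practice); the sketch below is only an illustrative approximation of how those tokens compose a prompt.

```python
# Illustrative sketch of Granite-style prompt assembly from the special
# tokens defined in this repo's tokenizer_config.json. The tokenizer's own
# chat_template is authoritative; this layout is an assumption for
# demonstration only.
SOR, EOR, EOT = "<|start_of_role|>", "<|end_of_role|>", "<|end_of_text|>"

def render(messages):
    """Render a list of {role, content} dicts into a single prompt string."""
    out = []
    for m in messages:
        out.append(f"{SOR}{m['role']}{EOR}{m['content']}{EOT}")
    # Leave the assistant turn open so the model generates the reply.
    out.append(f"{SOR}assistant{EOR}")
    return "\n".join(out)

prompt = render([{"role": "user", "content": "Hello!"}])
print(prompt)
```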