spooknik

ArthurZ (HF Staff) committed 24 days ago

Commit 9e152c2 · verified · 0 parent(s)

Duplicate from google/flan-t5-xxl

Co-authored-by: Arthur Zucker <[email protected]>

Files changed (32)

.gitattributes +33 -0
README.md +276 -0
config.json +28 -0
flax_model-00001-of-00005.msgpack +3 -0
flax_model-00002-of-00005.msgpack +3 -0
flax_model-00003-of-00005.msgpack +3 -0
flax_model-00004-of-00005.msgpack +3 -0
flax_model-00005-of-00005.msgpack +3 -0
flax_model.msgpack.index.json +565 -0
generation_config.json +7 -0
model-00001-of-00005.safetensors +3 -0
model-00002-of-00005.safetensors +3 -0
model-00003-of-00005.safetensors +3 -0
model-00004-of-00005.safetensors +3 -0
model-00005-of-00005.safetensors +3 -0
model.safetensors.index.json +567 -0
pytorch_model-00001-of-00005.bin +3 -0
pytorch_model-00002-of-00005.bin +3 -0
pytorch_model-00003-of-00005.bin +3 -0
pytorch_model-00004-of-00005.bin +3 -0
pytorch_model-00005-of-00005.bin +3 -0
pytorch_model.bin.index.json +567 -0
special_tokens_map.json +107 -0
spiece.model +3 -0
tf_model-00001-of-00005.h5 +3 -0
tf_model-00002-of-00005.h5 +3 -0
tf_model-00003-of-00005.h5 +3 -0
tf_model-00004-of-00005.h5 +3 -0
tf_model-00005-of-00005.h5 +3 -0
tf_model.h5.index.json +565 -0
tokenizer.json +0 -0
tokenizer_config.json +113 -0

.gitattributes ADDED

	@@ -0,0 +1,33 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
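
The patterns above route matching files through Git LFS, which is consistent with the weight shards in this commit appearing as three-line pointer diffs (`+3 -0`) while the JSON files appear in full. As a rough sketch, the matching can be approximated in Python with `fnmatch`. This is not Git's actual attribute matcher (path patterns such as `saved_model/**/*` follow different rules), only a subset of the patterns is included, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the patterns from the .gitattributes diff above.
LFS_PATTERNS = ["*.bin", "*.h5", "*.model", "*.msgpack", "*.safetensors"]


def is_lfs_tracked(filename: str) -> bool:
    """Approximate check: does the file name match any LFS pattern?"""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


# A few file names taken from the "Files changed" list of this commit.
files = [
    "config.json",
    "flax_model-00001-of-00005.msgpack",
    "flax_model.msgpack.index.json",
    "model-00001-of-00005.safetensors",
    "spiece.model",
]

for name in files:
    print(f"{name}: {'LFS pointer' if is_lfs_tracked(name) else 'regular file'}")
```

Under these assumptions, the index files (`*.index.json`) are not matched, because a pattern such as `*.msgpack` has to match the whole file name.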

README.md ADDED

	@@ -0,0 +1,276 @@

+---
+language:
+- en
+- fr
+- ro
+- de
+- multilingual
+widget:
+- text: "Translate to German:  My name is Arthur"
+  example_title: "Translation"
+- text: "Please answer the following question. Who is going to be the next Ballon d'or?"
+  example_title: "Question Answering"
+- text: "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering."
+  example_title: "Logical reasoning"
+- text: "Please answer the following question. What is the boiling point of Nitrogen?"
+  example_title: "Scientific knowledge"
+- text: "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?"
+  example_title: "Yes/no question"
+- text: "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?"
+  example_title: "Reasoning task"
+- text: "Q: ( False or not False or False ) is? A: Let's think step by step"
+  example_title: "Boolean Expressions"
+- text: "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?"
+  example_title: "Math reasoning"
+- text: "Premise:  At my age you will probably have learnt one lesson. Hypothesis:  It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?"
+  example_title: "Premise and hypothesis"
+tags:
+- text2text-generation
+datasets:
+- svakulenk0/qrecc
+- taskmaster2
+- djaym7/wiki_dialog
+- deepmind/code_contests
+- lambada
+- gsm8k
+- aqua_rat
+- esnli
+- quasc
+- qed
+license: apache-2.0
+---
+# Model Card for FLAN-T5 XXL
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg"
+alt="drawing" width="600"/>
+#  Table of Contents
+0. [TL;DR](#tldr)
+1. [Model Details](#model-details)
+2. [Usage](#usage)
+3. [Uses](#uses)
+4. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
+5. [Training Details](#training-details)
+6. [Evaluation](#evaluation)
+7. [Environmental Impact](#environmental-impact)
+8. [Citation](#citation)
+# TL;DR
+If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks, also covering more languages.
+As mentioned in the first few lines of the abstract:
+>  Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
+**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copied from the [T5 model card](https://huggingface.co/t5-large).
+# Model Details
+## Model Description
+- **Model type:** Language model
+- **Language(s) (NLP):** English, German, French
+- **License:** Apache 2.0
+- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)
+- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)
+- **Resources for more information:**
+  - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)
+  - [GitHub Repo](https://github.com/google-research/t5x)
+  - [Hugging Face FLAN-T5 Docs (Similar to T5) ](https://huggingface.co/docs/transformers/model_doc/t5)
+# Usage
+Find below some example scripts on how to use the model in `transformers`:
+## Using the PyTorch model
+### Running the model on a CPU
+<details>
+<summary> Click to expand </summary>
+```python
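+from transformers import T5Tokenizer, T5ForConditionalGeneration
+# Download (or load from cache) the tokenizer and the model weights.
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl")
+# Tokenize the instruction prompt into PyTorch tensors.
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+# Generate with the default generation settings and decode the first sequence.
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))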
+```
+</details>
+### Running the model on a GPU
+<details>
+<summary> Click to expand </summary>
+```python
+# pip install accelerate
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto")
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+</details>
+### Running the model on a GPU using different precisions
+#### FP16
+<details>
+<summary> Click to expand </summary>
+```python
+# pip install accelerate
+import torch
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", torch_dtype=torch.float16)
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+</details>
+#### INT8
+<details>
+<summary> Click to expand </summary>
+```python
+# pip install bitsandbytes accelerate
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", load_in_8bit=True)
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+</details>
+# Uses
+## Direct Use and Downstream Use
+The authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that:
+> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
+See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+## Out-of-Scope Use
+More information needed.
+# Bias, Risks, and Limitations
+The information in this section is copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):
+> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.
+## Ethical considerations and risks
+> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.
+## Known Limitations
+> Flan-T5 has not been tested in real world applications.
+## Sensitive Use
+> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
+# Training Details
+## Training Data
+The model was trained on a mixture of tasks that includes those described in the table below (from the original paper, figure 2):
+![table.png](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan_t5_tasks.png)
+## Training Procedure
+According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
+> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.
+The model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+# Evaluation
+## Testing Data, Factors & Metrics
+The authors evaluated the model on various tasks (1836 in total) covering several languages. See the table below for some quantitative evaluation:
+![image.png](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan_t5_evals_lang.png)
+For full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).
+## Results
+For full results for FLAN-T5-XXL, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.
+# Environmental Impact
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** Google Cloud TPU Pods (TPU v3 or TPU v4), number of chips ≥ 4.
+- **Hours used:** More information needed
+- **Cloud Provider:** GCP
+- **Compute Region:** More information needed
+- **Carbon Emitted:** More information needed
+# Citation
+**BibTeX:**
+```bibtex
+@misc{https://doi.org/10.48550/arxiv.2210.11416,
+  doi = {10.48550/ARXIV.2210.11416},
+  url = {https://arxiv.org/abs/2210.11416},
+  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
+  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences},
+  title = {Scaling Instruction-Finetuned Language Models},
+  publisher = {arXiv},
+  year = {2022},
+  copyright = {Creative Commons Attribution 4.0 International}
+}
+```

config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "architectures": [
+    "T5ForConditionalGeneration"
+  ],
+  "d_ff": 10240,
+  "d_kv": 64,
+  "d_model": 4096,
+  "decoder_start_token_id": 0,
+  "dropout_rate": 0.1,
+  "eos_token_id": 1,
+  "feed_forward_proj": "gated-gelu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "layer_norm_epsilon": 1e-06,
+  "model_type": "t5",
+  "num_decoder_layers": 24,
+  "num_heads": 64,
+  "num_layers": 24,
+  "output_past": true,
+  "pad_token_id": 0,
+  "relative_attention_max_distance": 128,
+  "relative_attention_num_buckets": 32,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.24.0.dev0",
+  "use_cache": true,
+  "vocab_size": 32128
+}
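
The values in this config are enough to estimate the parameter count. The sketch below assumes the usual T5 v1.1 layout: attention and feed-forward projections without biases, a single weight vector per layer norm, one relative attention bias table per stack (stored in block 0), and separate input and output embeddings because `tie_word_embeddings` is `false`. The function name `count_t5_parameters` is hypothetical, and the layout is an assumption rather than something stated in this commit.

```python
# Values copied from the config.json diff above.
config = {
    "d_model": 4096,
    "d_ff": 10240,
    "d_kv": 64,
    "num_heads": 64,
    "num_layers": 24,
    "num_decoder_layers": 24,
    "relative_attention_num_buckets": 32,
    "vocab_size": 32128,
}


def count_t5_parameters(cfg: dict) -> int:
    """Estimate the parameter count under the assumed T5 v1.1 layout."""
    d_model, d_ff = cfg["d_model"], cfg["d_ff"]
    inner = cfg["num_heads"] * cfg["d_kv"]

    attention = 4 * d_model * inner    # q, k, v, o projections, no biases
    feed_forward = 3 * d_model * d_ff  # wi_0, wi_1, wo (gated-gelu)
    layer_norm = d_model               # one weight vector, no bias

    encoder_block = attention + feed_forward + 2 * layer_norm
    decoder_block = 2 * attention + feed_forward + 3 * layer_norm

    # One relative attention bias table and one final layer norm per stack.
    rel_bias = cfg["relative_attention_num_buckets"] * cfg["num_heads"]
    encoder = cfg["num_layers"] * encoder_block + rel_bias + layer_norm
    decoder = cfg["num_decoder_layers"] * decoder_block + rel_bias + layer_norm

    # Untied embeddings: the shared input embedding plus a separate lm_head.
    embeddings = 2 * cfg["vocab_size"] * d_model
    return encoder + decoder + embeddings


n_params = count_t5_parameters(config)
print(n_params)      # 11135332352
print(n_params * 4)  # 44541329408 bytes at 4 bytes per float32 parameter
```

Under these assumptions the float32 byte count equals the `total_size` value recorded in `flax_model.msgpack.index.json` in this commit (44541329408).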

flax_model-00001-of-00005.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:75f4ae8c4595f4c347def1730867bcf9c600e4628293a721c905c959eba777a5
+size 9989141334

flax_model-00002-of-00005.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c8a0e1e619531c843be9dbf480412ccba3d15f1c55c37a06bc93a33dc65641ac
+size 9932567678

flax_model-00003-of-00005.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a398b96321a9d4b36f44383662709341da0e7bb106392f5cdd2dd6d0510a218f
+size 9999701724

flax_model-00004-of-00005.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7b7827124570344fdd6dae1873b130228b3b2e576baf6c1a2d710928caed3f6a
+size 9999701629

flax_model-00005-of-00005.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec9af3100f6c3286a133aecb7307e6668bec2c9d79ea2b0ae2f03676fb22d099
+size 4620241312
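
Each of the five diffs above adds a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer and adding up the shard sizes follows; the helper name `parse_lfs_pointer` is hypothetical and the parser handles only the three fields shown here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


# Pointer content copied from the flax_model-00005-of-00005.msgpack diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ec9af3100f6c3286a133aecb7307e6668bec2c9d79ea2b0ae2f03676fb22d099
size 4620241312
"""

info = parse_lfs_pointer(pointer)
print(info["version"])  # https://git-lfs.github.com/spec/v1
print(info["size"])     # 4620241312

# Sizes of the five Flax shards, copied from the pointer diffs above.
shard_sizes = [9989141334, 9932567678, 9999701724, 9999701629, 4620241312]
print(sum(shard_sizes))  # 44541353677
```

The shards add up to about 44.5 GB, which is 24269 bytes more than the `total_size` of 44541329408 in the Flax index file; the difference is presumably serialization overhead, although the commit does not say so.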

flax_model.msgpack.index.json ADDED

	@@ -0,0 +1,565 @@

+{
+  "metadata": {
+    "total_size": 44541329408
+  },
+  "weight_map": {
+    "decoder/block/0/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/relative_attention_bias/embedding": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "decoder/block/0/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/0/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/1/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/10/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/10/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/10/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/10/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/10/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/11/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/12/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/13/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/14/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/15/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/16/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/17/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/18/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/0/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/k/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/o/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/v/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/1/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/2/DenseReluDense/wo/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/19/layer/2/layer_norm/weight": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/2/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/q/kernel": "flax_model-00004-of-00005.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/0/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/q/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/1/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/2/DenseReluDense/wo/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/20/layer/2/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/q/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/0/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/q/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/1/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/2/DenseReluDense/wo/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/21/layer/2/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/q/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/0/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/q/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/1/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/2/DenseReluDense/wo/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/22/layer/2/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/q/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/0/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/k/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/o/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/q/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/v/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/1/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/2/DenseReluDense/wo/kernel": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/23/layer/2/layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/3/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/4/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/5/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/6/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/7/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/8/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/0/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/k/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/o/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/q/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/v/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/1/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/2/DenseReluDense/wo/kernel": "flax_model-00003-of-00005.msgpack",
+    "decoder/block/9/layer/2/layer_norm/weight": "flax_model-00003-of-00005.msgpack",
+    "decoder/final_layer_norm/weight": "flax_model-00005-of-00005.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/relative_attention_bias/embedding": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/0/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/1/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/10/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/11/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/12/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/12/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/12/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/12/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/12/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/13/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/14/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/15/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/16/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/17/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/18/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/19/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/2/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/20/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/21/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/22/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/0/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/1/DenseReluDense/wo/kernel": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/23/layer/1/layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/3/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/4/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/5/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/6/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/7/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/8/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/0/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00005.msgpack",
+    "encoder/block/9/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
+    "encoder/final_layer_norm/weight": "flax_model-00002-of-00005.msgpack",
+    "lm_head/kernel": "flax_model-00005-of-00005.msgpack",
+    "shared/embedding": "flax_model-00001-of-00005.msgpack"
+  }
+}
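
A checkpoint index like the one above maps each parameter name to the shard file that holds it. As a minimal sketch (not the loader any library actually uses), the map can be inverted to see which parameters a given shard contains. `WEIGHT_MAP` holds only a handful of entries copied from the diff, and `group_by_shard` is a hypothetical helper name.

```python
from collections import defaultdict

# A few entries copied from the weight_map above; the real map is much longer.
WEIGHT_MAP = {
    "encoder/block/9/layer/1/layer_norm/weight": "flax_model-00001-of-00005.msgpack",
    "encoder/final_layer_norm/weight": "flax_model-00002-of-00005.msgpack",
    "lm_head/kernel": "flax_model-00005-of-00005.msgpack",
    "shared/embedding": "flax_model-00001-of-00005.msgpack",
}


def group_by_shard(weight_map):
    """Invert a {parameter name: shard file} map into {shard file: [names]}."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    # Sort the names so the result does not depend on input order.
    return {shard: sorted(names) for shard, names in shards.items()}


by_shard = group_by_shard(WEIGHT_MAP)
for shard in sorted(by_shard):
    print(shard, len(by_shard[shard]))
```

Grouping this way shows which shard files would need to be fetched to read a particular subset of parameters.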

generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "decoder_start_token_id": 0,
+  "eos_token_id": 1,
+  "pad_token_id": 0,
+  "transformers_version": "4.27.0.dev0"
+}
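
The file above is plain JSON, so its token IDs can be inspected without loading the model. The sketch below uses only the standard library and the values shown in the diff; `summarize_generation_config` is a hypothetical helper, and the point it illustrates is simply that `decoder_start_token_id` and `pad_token_id` hold the same value here.

```python
import json

# Contents copied from generation_config.json as shown in the diff.
GENERATION_CONFIG = """
{
  "_from_model_config": true,
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.27.0.dev0"
}
"""


def summarize_generation_config(text):
    """Parse the JSON text and report how the special token IDs relate."""
    cfg = json.loads(text)
    return {
        # True when decoding would begin with the same ID used for padding.
        "starts_with_pad": cfg["decoder_start_token_id"] == cfg["pad_token_id"],
        "eos_token_id": cfg["eos_token_id"],
    }


summary = summarize_generation_config(GENERATION_CONFIG)
print(summary)
```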

model-00001-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2399615e60e983d1e647248b15bfad3c0ccd9473a0e387dbe03284fcaacf21c2
+size 9452262278

model-00002-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d2459d60251e60af6314ae4785fb0bea32febb1989434c816de84c0fdae2d88
+size 9597007440

model-00003-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60a2c0b6f284a55df23ee68ce3065631368d7a6a35e5443b5cd98151f145fd67
+size 9955647022

model-00004-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f0cbe2b9a51ab64c93d16993e79bf81cc39f8692f89122cc68fa3b824c2b3cf
+size 9999712776

model-00005-of-00005.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0001f5f48ce5bf0e7e997c57ce5170ce0c0b0d667ee2a72d01d9aa26f40b7b5
+size 6063154240
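
Each of the five safetensors entries above is a Git LFS pointer: three `key value` lines giving the spec version, the object hash, and the size in bytes. As a rough sketch, assuming that three-line layout, the pointers can be parsed and their sizes summed to estimate the download. `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied from the diff.

```python
# Pointer bodies copied from the diff above, keyed by shard file name.
POINTERS = {
    "model-00001-of-00005.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:2399615e60e983d1e647248b15bfad3c0ccd9473a0e387dbe03284fcaacf21c2\n"
        "size 9452262278\n"
    ),
    "model-00002-of-00005.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:4d2459d60251e60af6314ae4785fb0bea32febb1989434c816de84c0fdae2d88\n"
        "size 9597007440\n"
    ),
    "model-00003-of-00005.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:60a2c0b6f284a55df23ee68ce3065631368d7a6a35e5443b5cd98151f145fd67\n"
        "size 9955647022\n"
    ),
    "model-00004-of-00005.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:5f0cbe2b9a51ab64c93d16993e79bf81cc39f8692f89122cc68fa3b824c2b3cf\n"
        "size 9999712776\n"
    ),
    "model-00005-of-00005.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:b0001f5f48ce5bf0e7e997c57ce5170ce0c0b0d667ee2a72d01d9aa26f40b7b5\n"
        "size 6063154240\n"
    ),
}


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


parsed = {name: parse_lfs_pointer(text) for name, text in POINTERS.items()}
total_bytes = sum(p["size"] for p in parsed.values())
print(f"{len(parsed)} shards, {total_bytes} bytes ({total_bytes / 1e9:.2f} GB)")
```

This sums the file sizes declared by the pointers; it is not necessarily equal to the `total_size` recorded in the index metadata.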

model.safetensors.index.json ADDED

	@@ -0,0 +1,567 @@

+{
+    "metadata": {
+        "total_size": 45594099712
+    },
+    "weight_map": {
+        "decoder.block.0.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.0.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.1.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.10.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.11.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.12.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.13.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.14.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.15.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.16.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.17.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.18.layer.2.DenseReluDense.wi_1.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.18.layer.2.DenseReluDense.wo.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.18.layer.2.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.0.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.1.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.2.DenseReluDense.wi_0.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.2.DenseReluDense.wi_1.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.2.DenseReluDense.wo.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.19.layer.2.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.2.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.0.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.1.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.2.DenseReluDense.wi_0.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.2.DenseReluDense.wi_1.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.2.DenseReluDense.wo.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.20.layer.2.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.0.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.1.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.2.DenseReluDense.wi_0.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.2.DenseReluDense.wi_1.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.2.DenseReluDense.wo.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.21.layer.2.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.0.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.1.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.2.DenseReluDense.wi_0.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.2.DenseReluDense.wi_1.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.2.DenseReluDense.wo.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.22.layer.2.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.0.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.k.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.o.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.q.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.v.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.1.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.2.DenseReluDense.wi_0.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.2.DenseReluDense.wi_1.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.2.DenseReluDense.wo.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.23.layer.2.layer_norm.weight": "model-00005-of-00005.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.3.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.4.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.5.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.6.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.7.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.0.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.k.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.o.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.v.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.1.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.2.DenseReluDense.wi_0.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.2.DenseReluDense.wi_1.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.2.DenseReluDense.wo.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.8.layer.2.layer_norm.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.0.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.k.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.o.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.q.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.v.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.1.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.2.DenseReluDense.wi_0.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.2.DenseReluDense.wi_1.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.2.DenseReluDense.wo.weight": "model-00004-of-00005.safetensors",
+        "decoder.block.9.layer.2.layer_norm.weight": "model-00004-of-00005.safetensors",
+        "decoder.embed_tokens.weight": "model-00003-of-00005.safetensors",
+        "decoder.final_layer_norm.weight": "model-00005-of-00005.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.10.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.11.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.11.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.11.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.12.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.13.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.14.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.15.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.16.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.17.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.18.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.19.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.20.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.21.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.22.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.k.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.o.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.q.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.v.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.0.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.1.DenseReluDense.wo.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.23.layer.1.layer_norm.weight": "model-00002-of-00005.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.1.DenseReluDense.wo.weight": "model-00001-of-00005.safetensors",
+        "encoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00005.safetensors",
+        "encoder.embed_tokens.weight": "model-00001-of-00005.safetensors",
+        "encoder.final_layer_norm.weight": "model-00002-of-00005.safetensors",
+        "lm_head.weight": "model-00005-of-00005.safetensors",
+        "shared.weight": "model-00001-of-00005.safetensors"
+    }
+}
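Reading note: the `weight_map` in this index maps each tensor name to the shard file that stores it, and a single block can straddle two shards (for example, `decoder.block.9.layer.0.SelfAttention.q.weight` is in shard 3 while the rest of block 9 is in shard 4). A minimal sketch of inverting that map to see what each shard holds, assuming only the JSON layout shown in this diff; the helper name `group_by_shard` is hypothetical and not part of any library:

```python
import json
from collections import defaultdict


def group_by_shard(index_text):
    """Invert an index's weight_map: {shard filename: [tensor names]}."""
    index = json.loads(index_text)
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Small excerpt built from entries that appear in the index above.
example = json.dumps({
    "weight_map": {
        "decoder.block.9.layer.0.SelfAttention.k.weight": "model-00004-of-00005.safetensors",
        "decoder.block.9.layer.0.SelfAttention.q.weight": "model-00003-of-00005.safetensors",
        "lm_head.weight": "model-00005-of-00005.safetensors",
    }
})

grouped = group_by_shard(example)
for shard_file in sorted(grouped):
    print(shard_file, len(grouped[shard_file]))
```

To run this against the real file, pass it the contents of `model.safetensors.index.json` instead of the excerpt.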

pytorch_model-00001-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e59f5b92cddafc7352b236b6cb5d1eac010712cae9ed23be81caf701255e719
+size 9452284099

pytorch_model-00002-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:685334dbb04020ebe63f1d10c6d1401d8fc11dc650fd83582311dc8a3d7c9f1f
+size 9597030149

pytorch_model-00003-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:25fbb7a31f026dece0c7db7fb997f4b658c45ef36e748d653dccac4b1625759d
+size 9955673595

pytorch_model-00004-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fe2d4605b8d1dd52feb8963df9d2b377d6a4dcbdcc8a41cdd49022214cfcaeb5
+size 9999740781

pytorch_model-00005-of-00005.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ae0643b508fcd5cbe22447d588c12ddb81e04f451acc92e63ed0ecdfa4fc7080
+size 6063168980
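
Reading note: each `.bin` entry above is a three-line Git LFS pointer (`version`, `oid`, `size`), not the weights themselves. A minimal sketch of parsing one such pointer, assuming only the three-field layout visible in this diff; `parse_lfs_pointer` is a hypothetical helper, not a Git LFS API:

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer for pytorch_model-00005-of-00005.bin, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ae0643b508fcd5cbe22447d588c12ddb81e04f451acc92e63ed0ecdfa4fc7080
size 6063168980
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte count of the stored object, so it can be compared with a downloaded shard's size as a quick sanity check before hashing the file.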

pytorch_model.bin.index.json ADDED

	@@ -0,0 +1,567 @@

+{
+  "metadata": {
+    "total_size": 45594099712
+  },
+  "weight_map": {
+    "decoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.0.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.1.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.10.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.11.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.12.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.13.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.14.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.15.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.16.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.17.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.18.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.18.layer.2.DenseReluDense.wo.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.18.layer.2.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.0.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.1.EncDecAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.1.EncDecAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.1.EncDecAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.1.EncDecAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.1.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.2.DenseReluDense.wo.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.19.layer.2.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.2.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.0.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.1.EncDecAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.1.EncDecAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.1.EncDecAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.1.EncDecAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.1.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.2.DenseReluDense.wo.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.20.layer.2.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.0.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.1.EncDecAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.1.EncDecAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.1.EncDecAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.1.EncDecAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.1.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.2.DenseReluDense.wo.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.21.layer.2.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.0.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.1.EncDecAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.1.EncDecAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.1.EncDecAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.1.EncDecAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.1.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.2.DenseReluDense.wo.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.22.layer.2.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.0.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.1.EncDecAttention.k.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.1.EncDecAttention.o.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.1.EncDecAttention.q.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.1.EncDecAttention.v.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.1.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.2.DenseReluDense.wo.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.23.layer.2.layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "decoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.3.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.4.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.5.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.6.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.7.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.0.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.1.EncDecAttention.k.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.1.EncDecAttention.o.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.1.EncDecAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.1.EncDecAttention.v.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.1.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.2.DenseReluDense.wo.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.8.layer.2.layer_norm.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.0.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.1.EncDecAttention.k.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.1.EncDecAttention.o.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.1.EncDecAttention.q.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.1.EncDecAttention.v.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.1.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.2.DenseReluDense.wo.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.block.9.layer.2.layer_norm.weight": "pytorch_model-00004-of-00005.bin",
+    "decoder.embed_tokens.weight": "pytorch_model-00003-of-00005.bin",
+    "decoder.final_layer_norm.weight": "pytorch_model-00005-of-00005.bin",
+    "encoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.0.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.1.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.10.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.11.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.11.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.11.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.12.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.13.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.14.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.15.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.16.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.17.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.18.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.19.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.2.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.20.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.21.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.22.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.0.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.1.DenseReluDense.wo.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.23.layer.1.layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "encoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.3.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.4.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.5.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.6.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.7.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.8.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.0.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.block.9.layer.1.layer_norm.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.embed_tokens.weight": "pytorch_model-00001-of-00005.bin",
+    "encoder.final_layer_norm.weight": "pytorch_model-00002-of-00005.bin",
+    "lm_head.weight": "pytorch_model-00005-of-00005.bin",
+    "shared.weight": "pytorch_model-00001-of-00005.bin"
+  }
+}
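
The `weight_map` above assigns each tensor name to the shard file that stores it. A minimal sketch of how such an index could be inverted to list tensors per shard, assuming the index has already been loaded as a dict. The helper name `tensors_by_shard` and the four-entry excerpt are illustrative and not part of the repository:

```python
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict:
    """Invert a sharded-checkpoint index: shard file name -> sorted tensor names."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Excerpt of pytorch_model.bin.index.json, copied from the entries above.
index_excerpt = {
    "weight_map": {
        "encoder.embed_tokens.weight": "pytorch_model-00001-of-00005.bin",
        "encoder.final_layer_norm.weight": "pytorch_model-00002-of-00005.bin",
        "lm_head.weight": "pytorch_model-00005-of-00005.bin",
        "shared.weight": "pytorch_model-00001-of-00005.bin",
    }
}

by_shard = tensors_by_shard(index_excerpt)
for shard, names in sorted(by_shard.items()):
    print(shard, names)
```

Grouping by shard this way shows which files must be fetched to obtain a given subset of tensors, for example only the encoder blocks.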

special_tokens_map.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}
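
The 100 entries in `additional_special_tokens` follow a single naming pattern, `<extra_id_0>` through `<extra_id_99>`. A small sketch that regenerates the same mapping for comparison against the file. The variable names are illustrative only:

```python
# Rebuild the sentinel-token list that special_tokens_map.json spells out in full.
sentinel_tokens = [f"<extra_id_{i}>" for i in range(100)]

special_tokens_map = {
    "additional_special_tokens": sentinel_tokens,
    "eos_token": "</s>",
    "pad_token": "<pad>",
    "unk_token": "<unk>",
}

print(len(special_tokens_map["additional_special_tokens"]))
```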

spiece.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
+size 791656
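
The three added lines are a Git LFS pointer, not the file contents: a spec version, a content hash, and the byte size of the real object. A minimal parsing sketch, assuming the pointer text has been read without the leading `+` diff markers. The function name `parse_lfs_pointer` is hypothetical:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for spiece.model as shown in the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
size 791656
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

The same parser applies to the `tf_model-0000N-of-00005.h5` pointers that follow, since they share the three-line layout.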

tf_model-00001-of-00005.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9b26cdb4f6a728ad7d7ac8ab74100ceb9988f27119a4c58e96c781baa9cb9f6a
+size 9989362864

tf_model-00002-of-00005.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f22f1007bc7f7df18cd01258ca980ed570f3ef9c0d0d2e7dfaf214693e92b28c
+size 9932810536

tf_model-00003-of-00005.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2f8c1b2e49c147718deeaf0562bc46f9ca1bca52e38f66fe25c6271a67c3b92b
+size 9999968760

tf_model-00004-of-00005.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0096154c641dba8fdc3d87fabbdf2346693b45d81711ca9913a13b4734c44540
+size 9999964640

tf_model-00005-of-00005.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:020b93b1bb89dbbdf61fec1184c342e78ec06294c52708efda40a7fee0edbae8
+size 4620350264

tf_model.h5.index.json ADDED

	@@ -0,0 +1,565 @@

+{
+  "metadata": {
+    "total_size": 44541329408
+  },
+  "weight_map": {
+    "shared/shared/embeddings:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._0/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._1/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._10/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._11/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._12/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._13/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._14/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._15/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._16/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._17/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._18/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._0/SelfAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._0/SelfAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._0/SelfAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._0/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._1/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._19/layer_._2/layer_norm/weight:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._2/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._0/SelfAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._0/SelfAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._0/SelfAttention/q/kernel:0": "tf_model-00004-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._0/SelfAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._0/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._1/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._20/layer_._2/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._0/SelfAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._0/SelfAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._0/SelfAttention/q/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._0/SelfAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._0/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._1/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._21/layer_._2/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._0/SelfAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._0/SelfAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._0/SelfAttention/q/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._0/SelfAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._0/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._1/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._22/layer_._2/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._0/SelfAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._0/SelfAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._0/SelfAttention/q/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._0/SelfAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._0/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._1/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._23/layer_._2/layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._3/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._4/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._5/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._6/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._7/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._8/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._0/SelfAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._0/SelfAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._0/SelfAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._0/SelfAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._0/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._1/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/block_._9/layer_._2/layer_norm/weight:0": "tf_model-00003-of-00005.h5",
+    "tft5_for_conditional_generation_1/decoder/final_layer_norm/weight:0": "tf_model-00005-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._0/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._1/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._10/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._11/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._13/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._14/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._15/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._16/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._17/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._18/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._19/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._2/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._20/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._21/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._22/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._23/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._3/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._4/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._5/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._6/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._7/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._8/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/block_._9/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00005.h5",
+    "tft5_for_conditional_generation_1/encoder/final_layer_norm/weight:0": "tf_model-00002-of-00005.h5",
+    "tft5_for_conditional_generation_1/lm_head/kernel:0": "tf_model-00005-of-00005.h5"
+  }
+}
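
The index above maps each TensorFlow weight name to the shard file that stores it. A block can be split across shards: `encoder/block_._12` has its `k`, `q` and `v` kernels in `tf_model-00001-of-00005.h5`, while its `o` kernel and layer norms are in `tf_model-00002-of-00005.h5`. The sketch below shows one way to inspect such a mapping. It assumes the entries sit under a top-level `weight_map` key (not visible in this part of the diff), and `group_by_shard` and `shards_for_prefix` are hypothetical helper names, not part of any library.

```python
import json
from collections import defaultdict

# Three entries copied from the index above. The enclosing "weight_map"
# key is an assumption; it is not visible in this part of the diff.
INDEX_EXCERPT = json.loads("""
{
  "weight_map": {
    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00005.h5",
    "tft5_for_conditional_generation_1/encoder/block_._12/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00005.h5",
    "tft5_for_conditional_generation_1/lm_head/kernel:0": "tf_model-00005-of-00005.h5"
  }
}
""")


def group_by_shard(index):
    """Return {shard file: sorted weight names} for a sharded index (hypothetical helper)."""
    shards = defaultdict(list)
    for weight_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(weight_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def shards_for_prefix(index, prefix):
    """Return the sorted shard files holding any weight whose name starts with prefix."""
    return sorted(
        {shard for name, shard in index["weight_map"].items() if name.startswith(prefix)}
    )


# Print how many weights from the excerpt each shard holds.
for shard_file, names in group_by_shard(INDEX_EXCERPT).items():
    print(shard_file, len(names))
```

To run this against the full file, replace `INDEX_EXCERPT` with `json.load(open("tf_model.h5.index.json"))`, which assumes the file has been downloaded locally.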

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,113 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "model_max_length": 512,
+  "name_or_path": "google/t5-v1_1-small",
+  "pad_token": "<pad>",
+  "sp_model_kwargs": {},
+  "special_tokens_map_file": "/home/arthur_huggingface_co/.cache/huggingface/hub/models--google--t5-v1_1-small/snapshots/fb7e6cba609f7bab11c614294bc04f82f613c7b1/special_tokens_map.json",
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>"
+}
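
In the config above, `additional_special_tokens` lists `<extra_id_0>` through `<extra_id_99>` and `extra_ids` is 100, so the list and the count agree. A minimal sketch of that consistency check follows. It works on a plain dict shaped like the config; `sentinel_tokens` and `sentinels_match` are hypothetical helper names, and the three-token config is a made-up small example rather than the file's contents.

```python
def sentinel_tokens(extra_ids):
    """Build the sentinel token names <extra_id_0> .. <extra_id_{extra_ids - 1}>."""
    return [f"<extra_id_{i}>" for i in range(extra_ids)]


def sentinels_match(config):
    """Check that additional_special_tokens is exactly the list implied by extra_ids."""
    expected = sentinel_tokens(config["extra_ids"])
    return config.get("additional_special_tokens", []) == expected


# Small illustrative config (not the real file): three sentinels, count of 3.
small_config = {
    "extra_ids": 3,
    "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>"],
}

print(sentinels_match(small_config))
```

The same check can be applied to the real file by loading `tokenizer_config.json` with `json.load` and passing the resulting dict to `sentinels_match`.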