Remove table and add direct link to DPO model
README.md
@@ -49,13 +49,9 @@ This model card provides an overview of our **TFree-HAT-Pretrained-7B-Base** mod
 
 The model is based on our Hierarchical Autoregressive Transformer (HAT) architecture, originally described in our [paper](https://arxiv.org/abs/2501.10322). This novel architecture integrates character-level encoding and decoding with a word-level backbone, allowing for improved text compression (fewer sequence positions), stronger performance in the languages it has been trained on, potentially higher robustness to prompt changes, and improved adaptability to new languages & domains via fine-tuning.
 
-The model was pre-trained in English & German on carefully curated data in compliance with applicable EU and national regulations, including copyright and data privacy laws. It shows strong proficiency in German, while also beating Llama 3.1 on many benchmarks in English.
+The model was pre-trained in English & German and adapted to a maximum context length of 32900 words on carefully curated data in compliance with applicable EU and national regulations, including copyright and data privacy laws. It shows strong proficiency in German, while also beating Llama 3.1 on many benchmarks in English.
 
-
-
-| Model Name | Description |
-| --- | --- |
-| `TFree-HAT-Pretrained-7B-Base` | [Link](https://huggingface.co/Aleph-Alpha/TFree-HAT-Pretrained-7B-Base) - pre-trained for English and German, adapted to a maximum context length of 32900 words |
+A model post-trained and direct-preference-optimized for English & German, starting from this base model, can be found at this [link](https://huggingface.co/Aleph-Alpha/llama-tfree-hat-pretrained-7b-dpo).
 
 # Model Access
 
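For readers coming from this commit, a minimal loading sketch for the linked checkpoints may help. This is an illustration only: it assumes the standard `transformers` AutoClass path and that the HAT repositories ship custom modeling code (hence `trust_remote_code=True`); the "Model Access" section of the model card remains the authoritative reference.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the commit; swap in
# "Aleph-Alpha/TFree-HAT-Pretrained-7B-Base" for the base model.
repo_id = "Aleph-Alpha/llama-tfree-hat-pretrained-7b-dpo"

# trust_remote_code is an assumption: HAT is a custom architecture,
# so the repo is expected to provide its own modeling/tokenization code.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # 7B weights in bf16 (~14 GB)
    device_map="auto",           # place weights on available GPU(s)
)

# German prompt, since the model targets English & German.
prompt = "Erkläre kurz, was ein Tokenizer macht."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```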