Improve paper link display title
This PR improves the model card by updating the display title of the paper link in the content to the paper's full official title, "TPTT: Transforming Pretrained Transformers into Titans", while keeping the existing Hugging Face Papers URL. This change improves clarity and consistency with the paper's official documentation.
No other changes are made to the metadata, badges (including the arXiv link, per the guidelines), or usage examples, as they are already well documented.
README.md CHANGED

@@ -1,15 +1,15 @@
 ---
-language: en
-license: apache-2.0
-library_name: transformers
-tags:
-- tptt
-- peft
-- trust_remote_code
-pipeline_tag: text-generation
 base_model: allenai/OLMoE-1B-7B-0924
 datasets:
 - yahma/alpaca-cleaned
+language: en
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
+tags:
+- tptt
+- peft
+- trust_remote_code
 ---
 
 # Titans-v2-OLMoE-1B-7B-0924
@@ -34,20 +34,20 @@ datasets:
 
 Titanesque version of `allenai/OLMoE-1B-7B-0924` with parallel linearized attention (TPTT 😊) and PEFT.
 
-The architecture was presented in the paper [TPTT](https://huggingface.co/papers/2506.17671).
+The architecture was presented in the paper [TPTT: Transforming Pretrained Transformers into Titans](https://huggingface.co/papers/2506.17671).
 
 
 ## Model list
 
 Classic model parameter with LiZA injection :
 
-| Subfolder
-|
-| delta_rule
-| delta_rule_gelu | 8192 (default) | 0.5
-| delta_product
-| delta_product_r
-| delta_product_c
+| Subfolder | Max Self Attn Length | Mag Weight | Cross Gate | Max Chunk Size | Bidirectional | LoRA | Description |
+|---|---|---|---|---|---|---|---|
+| delta_rule | 8192 (default) | 0.5 | False | 64 | False | Yes | Parallel linearized attention with delta_rule operator |
+| delta_rule_gelu | 8192 (default) | 0.5 | False | 64 | False | Yes | Non-linear operator with gelu activation |
+| delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with derivative trick |
+| delta_product_r | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with rotative trick |
+| delta_product_c | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with combined trick |
 
 ## Usage
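For context on what the Model list table describes, here is a minimal loading sketch, not part of this PR: the repository id is a placeholder (the card's actual repo id is not shown in this diff), the subfolder names come from the table above, only standard `transformers` arguments (`subfolder`, `trust_remote_code`) are used, and the tokenizer is assumed to be the unchanged base-model tokenizer.

```python
# Hypothetical usage sketch (not part of this PR). Assumptions are marked inline.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<org>/Titans-v2-OLMoE-1B-7B-0924"  # placeholder: actual repo id not shown in this diff

# Each row of the "Model list" table corresponds to a subfolder in the repo,
# e.g. delta_rule, delta_rule_gelu, delta_product, delta_product_r, delta_product_c.
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="delta_rule",   # pick any variant from the table
    trust_remote_code=True,   # the card is tagged trust_remote_code
)

# Assumption: the tokenizer of the base model is reused unchanged.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924")

inputs = tokenizer("Explain linearized attention in one sentence.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The card's actual `## Usage` section lies outside this diff hunk and is left untouched by the PR.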