nielsr (HF Staff) committed
Commit dc7563d · verified · Parent(s): 5cf78c6

Improve paper link display title


This PR updates the display title of the paper link in the model card content to the paper's full official title, "TPTT: Transforming Pretrained Transformers into Titans", while keeping the existing Hugging Face Papers URL. The change improves clarity and consistency with the paper's official documentation.

No other changes are made to the metadata, badges (including the arXiv link, per guidelines), or usage examples, as these are already well documented.

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -1,15 +1,15 @@
  ---
- language: en
- license: apache-2.0
- library_name: transformers
- tags:
- - tptt
- - peft
- - trust_remote_code
- pipeline_tag: text-generation
  base_model: allenai/OLMoE-1B-7B-0924
  datasets:
  - yahma/alpaca-cleaned
+ language: en
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-generation
+ tags:
+ - tptt
+ - peft
+ - trust_remote_code
  ---

  # Titans-v2-OLMoE-1B-7B-0924
@@ -34,20 +34,20 @@ datasets:

  Titanesque version of `allenai/OLMoE-1B-7B-0924` with parallel linearized attention (TPTT 😊) and PEFT.

- The architecture was presented in the paper [TPTT](https://huggingface.co/papers/2506.17671).
+ The architecture was presented in the paper [TPTT: Transforming Pretrained Transformers into Titans](https://huggingface.co/papers/2506.17671).


  ## Model list

  Classic model parameter with LiZA injection :

- | Subfolder | Max Self Attn Length | Mag Weight | Cross Gate | Max Chunk Size | Bidirectional | LoRA | Description |
- |-------------------------------|----------------------|------------|------------|----------------|---------------|------|-------------------------------------------------------|
- | delta_rule | 8192 (default) | 0.5 | False | 64 | False | Yes | Parallel linearized attention with delta_rule operator|
- | delta_rule_gelu | 8192 (default) | 0.5 | False | 64 | False | Yes | Non-linear operator with gelu activation |
- | delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with derivative trick |
- | delta_product_r | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with rotative trick |
- | delta_product_c | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with combined trick |
+ | Subfolder | Max Self Attn Length | Mag Weight | Cross Gate | Max Chunk Size | Bidirectional | LoRA | Description |
+ |---|---|---|---|---|---|---|---|
+ | delta_rule | 8192 (default) | 0.5 | False | 64 | False | Yes | Parallel linearized attention with delta_rule operator|
+ | delta_rule_gelu | 8192 (default) | 0.5 | False | 64 | False | Yes | Non-linear operator with gelu activation |
+ | delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with derivative trick |
+ | delta_product_r | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with rotative trick |
+ | delta_product_c | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with combined trick |

  ## Usage
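For readers landing on this commit without the rest of the card: the `## Usage` section is not part of this diff, but the metadata above (`library_name: transformers`, the `trust_remote_code` tag, `pipeline_tag: text-generation`) together with the subfolder table implies a loading pattern roughly like the sketch below. The repository id, prompt, and generation settings are illustrative assumptions, not taken from the card.

```python
# Minimal sketch (assumptions noted in comments), not the card's official usage example.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/Titans-v2-OLMoE-1B-7B-0924"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="delta_rule",   # any subfolder from the model list table above
    trust_remote_code=True,   # the card tags trust_remote_code, so custom model code is expected
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)  # generation settings are illustrative
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```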