ffurfaro and nielsr (HF Staff) committed
Commit a39b094 (verified)
1 Parent(s): 5cf78c6

Improve paper link display title (#1)


- Improve paper link display title (dc7563d4f6f7cf9e69d799ae2ced2d104ea98a96)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  README.md (+16 -16)
README.md CHANGED
@@ -1,15 +1,15 @@
  ---
- language: en
- license: apache-2.0
- library_name: transformers
- tags:
- - tptt
- - peft
- - trust_remote_code
- pipeline_tag: text-generation
  base_model: allenai/OLMoE-1B-7B-0924
  datasets:
  - yahma/alpaca-cleaned
+ language: en
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-generation
+ tags:
+ - tptt
+ - peft
+ - trust_remote_code
  ---

  # Titans-v2-OLMoE-1B-7B-0924
@@ -34,20 +34,20 @@ datasets:

  Titanesque version of `allenai/OLMoE-1B-7B-0924` with parallel linearized attention (TPTT 😊) and PEFT.

- The architecture was presented in the paper [TPTT](https://huggingface.co/papers/2506.17671).
+ The architecture was presented in the paper [TPTT: Transforming Pretrained Transformers into Titans](https://huggingface.co/papers/2506.17671).


  ## Model list

  Classic model parameter with LiZA injection :

- | Subfolder | Max Self Attn Length | Mag Weight | Cross Gate | Max Chunk Size | Bidirectional | LoRA | Description |
- |-------------------------------|----------------------|------------|------------|----------------|---------------|------|-------------------------------------------------------|
- | delta_rule | 8192 (default) | 0.5 | False | 64 | False | Yes | Parallel linearized attention with delta_rule operator|
- | delta_rule_gelu | 8192 (default) | 0.5 | False | 64 | False | Yes | Non-linear operator with gelu activation |
- | delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with derivative trick |
- | delta_product_r | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with rotative trick |
- | delta_product_c | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with combined trick |
+ | Subfolder | Max Self Attn Length | Mag Weight | Cross Gate | Max Chunk Size | Bidirectional | LoRA | Description |
+ |---|---|---|---|---|---|---|---|
+ | delta_rule | 8192 (default) | 0.5 | False | 64 | False | Yes | Parallel linearized attention with delta_rule operator|
+ | delta_rule_gelu | 8192 (default) | 0.5 | False | 64 | False | Yes | Non-linear operator with gelu activation |
+ | delta_product | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with derivative trick |
+ | delta_product_r | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with rotative trick |
+ | delta_product_c | 8192 (default) | 0.5 | False | 64 | False | Yes | Second order operator with combined trick |

  ## Usage
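For context, a minimal sketch of how a card like this is typically loaded with transformers: the repository id `ffurfaro/Titans-v2-OLMoE-1B-7B-0924` and the `subfolder="delta_rule"` choice are assumptions taken from the card title and the "Model list" table above, not from this commit.

```python
# Hypothetical usage sketch for the model card edited in this commit.
# repo_id and subfolder are assumptions based on the card title and table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ffurfaro/Titans-v2-OLMoE-1B-7B-0924"  # assumed Hub repository id

# Tokenizer comes from the declared base model.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924")

model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="delta_rule",   # one of the variants listed in the table
    trust_remote_code=True,   # card is tagged trust_remote_code
)

inputs = tokenizer("Tell me about gravity.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```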