Retentive Network: A Successor to Transformer for Large Language Models (paper, arXiv:2307.08621)
This is an Indonesian RetNet model trained on the Liputan6 dataset, using the tokenizer from IndoBERT. It reaches a final validation loss of 3.4936 on the evaluation set (last row of the training results table).
It demonstrates training and recurrent inference with a retentive network (https://arxiv.org/pdf/2307.08621.pdf). The code uses Sehyun Choi's implementation of the retentive network (https://github.com/syncdoth/RetNet).
The model is intended to demonstrate training and recurrent inference, at O(1) cost per generated token with respect to sequence length, using a retentive network on Indonesian text.
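To illustrate why recurrent inference costs O(1) per token, here is a minimal pure-Python sketch of single-head retention in its parallel and recurrent forms. It is a simplification, not the code from the syncdoth/RetNet repository: it leaves out the position rotation, multi-scale decay, and normalization described in the paper, and the function names, the toy dimensions, and the decay value `gamma = 0.9` are assumptions made for illustration.

```python
import random

def retention_parallel(q, k, v, gamma):
    """Parallel form: o_n = sum over m <= n of gamma**(n - m) * (q_n . k_m) * v_m."""
    d_v = len(v[0])
    out = []
    for n in range(len(q)):
        row = [0.0] * d_v
        for m in range(n + 1):  # causal: only positions m <= n contribute
            score = sum(a * b for a, b in zip(q[n], k[m])) * gamma ** (n - m)
            for j in range(d_v):
                row[j] += score * v[m][j]
        out.append(row)
    return out

def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, then o_n = q_n S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    # The state has a fixed size (d_k x d_v), independent of sequence length.
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = [[gamma * state[i][j] + k_n[i] * v_n[j] for j in range(d_v)]
                 for i in range(d_k)]
        out.append([sum(q_n[i] * state[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out

# Toy inputs: 6 tokens, key/query dimension 4, value dimension 3 (arbitrary).
random.seed(0)
def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

q, k, v = rand_matrix(6, 4), rand_matrix(6, 4), rand_matrix(6, 3)
gamma = 0.9  # hypothetical decay value

out_parallel = retention_parallel(q, k, v, gamma)
out_recurrent = retention_recurrent(q, k, v, gamma)
max_diff = max(abs(a - b)
               for row_p, row_r in zip(out_parallel, out_recurrent)
               for a, b in zip(row_p, row_r))
print(max_diff < 1e-9)
```

The two forms agree up to floating-point error. The recurrent form only carries the fixed-size `state` from one token to the next, so the work per generated token does not grow with the number of tokens already seen.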
The training and validation sets come from the Liputan6 dataset, as provided by NusaCrowd.
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 4.5053 | 0.17 | 1000 | 4.5145 |
| 4.1281 | 0.34 | 2000 | 4.1702 |
| 3.9452 | 0.52 | 3000 | 4.0094 |
| 3.8302 | 0.69 | 4000 | 3.8972 |
| 3.6955 | 0.86 | 5000 | 3.8144 |
| 3.589 | 1.03 | 6000 | 3.7600 |
| 3.5279 | 1.21 | 7000 | 3.7088 |
| 3.4598 | 1.38 | 8000 | 3.6670 |
| 3.4445 | 1.55 | 9000 | 3.6259 |
| 3.4098 | 1.72 | 10000 | 3.5904 |
| 3.3455 | 1.9 | 11000 | 3.5610 |
| 3.2306 | 2.07 | 12000 | 3.5406 |
| 3.261 | 2.24 | 13000 | 3.5216 |
| 3.2204 | 2.41 | 14000 | 3.5111 |
| 3.2321 | 2.59 | 15000 | 3.5001 |
| 3.2514 | 2.76 | 16000 | 3.4941 |
| 3.233 | 2.93 | 17000 | 3.4936 |
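
The validation losses above can be read as perplexities. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which this card does not state explicitly; if the loss were measured differently, the conversion would not hold.

```python
import math

def loss_to_perplexity(loss_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)

# Validation loss at step 17000, the last row of the table.
final_validation_loss = 3.4936
final_perplexity = loss_to_perplexity(final_validation_loss)
print(f"{final_perplexity:.1f}")  # prints 32.9
```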