AdityaNarayan committed on
Commit 668dab7 · verified · 1 Parent(s): 3dcceba

updated README.MD

Files changed (1)
1. README.md (+9 -10)
README.md CHANGED
````diff
@@ -49,8 +49,7 @@ target_modules: # Applied to all linear layers
 ```
 
 ### Training Hyperparameters
-- **Epochs**: 2.3
-- **Steps**: 550
+- **Epochs**: 3
 - **Batch Size**: 2 per device (16 effective with gradient accumulation)
 - **Learning Rate**: 5e-5 (cosine schedule)
 - **Max Context**: 8,192 tokens
@@ -59,14 +58,14 @@ target_modules: # Applied to all linear layers
 
 ### Training Results
 ```
-"final_train_loss": 0.2793,
-"final_eval_loss": 0.3765236437320709,
-"final_train_perplexity": 1.322203945559979,
-"final_eval_perplexity": 1.457209992899547,
-"final_token_accuracy": 0.9227368004620076,
-"initial_loss": 1.6654,
-"initial_perplexity": 5.2877879419709135,
-"initial_accuracy": 0.6416946474462748
+"final_train_loss": 0.2641,
+"final_eval_loss": 0.37574875354766846,
+"final_train_perplexity": 1.3022584156313823,
+"final_eval_perplexity": 1.4560812525608204,
+"final_token_accuracy": 0.9259863365441561,
+"initial_loss": 1.6648,
+"initial_perplexity": 5.284616220817229,
+"initial_accuracy": 0.6015806214883923
 ```
 
 ## 🚀 Usage
````
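
For reference, below is a minimal sketch of how the updated hyperparameters could be expressed with 🤗 Transformers `TrainingArguments`. The values (3 epochs, batch size 2 per device with 16 effective, 5e-5 learning rate on a cosine schedule) come from the diff above; the `output_dir`, the gradient-accumulation factor of 8 (which assumes a single device), and the use of `TrainingArguments` itself are illustrative assumptions, not this repository's actual training script. The 8,192-token max context would normally be enforced via the tokenizer or SFT configuration rather than `TrainingArguments`.

```python
# Sketch only: hyperparameter values are copied from the README diff above; the
# trainer wiring (TrainingArguments, single-device assumption, output_dir) is
# illustrative and not taken from this repository.
import math

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",     # hypothetical path
    num_train_epochs=3,             # "Epochs: 3" (was 2.3 epochs / 550 steps)
    per_device_train_batch_size=2,  # "Batch Size: 2 per device"
    gradient_accumulation_steps=8,  # assumes one device: 2 x 8 = 16 effective
    learning_rate=5e-5,             # "Learning Rate: 5e-5"
    lr_scheduler_type="cosine",     # "(cosine schedule)"
)

# The reported perplexities are exp(loss) of the corresponding losses:
print(math.exp(0.2641))   # ≈ 1.3022 -> "final_train_perplexity"
print(math.exp(1.6648))   # ≈ 5.2846 -> "initial_perplexity"
```

The exp(loss) check also shows why the new `final_train_perplexity` drops in step with `final_train_loss` in this commit.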