AdityaNarayan committed on
Commit fe12de4 · verified · 1 Parent(s): 668dab7

updated README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -50,11 +50,10 @@ target_modules: # Applied to all linear layers
 
 ### Training Hyperparameters
 - **Epochs**: 3
-- **Batch Size**: 2 per device (16 effective with gradient accumulation)
 - **Learning Rate**: 5e-5 (cosine schedule)
 - **Max Context**: 8,192 tokens
-- **Hardware**: 2x NVIDIA H200 (80GB each)
-- **Training Time**: ~4 hours (2,355 steps)
+- **Hardware**: 4 x NVIDIA H200
+
 
 ### Training Results
 ```