AdityaNarayan/KAT-Dev-72B-Exp-CPT-LoRA-Adapter-HyperSwitch

Text Generation

AdityaNarayan committed 9 days ago

Commit fe12de4 · verified · 1 parent: 668dab7

updated README.md

Files changed (1)

README.md +2 -3

README.md CHANGED

@@ -50,11 +50,10 @@ target_modules:         # Applied to all linear layers
 ### Training Hyperparameters
 - **Epochs**: 3
-- **Batch Size**: 2 per device (16 effective with gradient accumulation)
 - **Learning Rate**: 5e-5 (cosine schedule)
 - **Max Context**: 8,192 tokens
-- **Hardware**: 2x NVIDIA H200 (80GB each)
-- **Training Time**: ~4 hours (2,355 steps)
+- **Hardware**: 4 x NVIDIA H200
 ### Training Results
 ```