updated README.md
README.md
@@ -50,11 +50,10 @@ target_modules: # Applied to all linear layers
 
 ### Training Hyperparameters
 - **Epochs**: 3
-- **Batch Size**: 2 per device (16 effective with gradient accumulation)
 - **Learning Rate**: 5e-5 (cosine schedule)
 - **Max Context**: 8,192 tokens
-- **Hardware**:
-
+- **Hardware**: 4 x NVIDIA H200
+
 
 ### Training Results
 ```