Training in progress, epoch 2, checkpoint

Files changed (15)

checkpoint-31138/README.md +202 -0
checkpoint-31138/adapter_config.json +32 -0
checkpoint-31138/adapter_model.safetensors +3 -0
checkpoint-31138/optimizer.pt +3 -0
checkpoint-31138/rng_state_0.pth +3 -0
checkpoint-31138/rng_state_1.pth +3 -0
checkpoint-31138/rng_state_2.pth +3 -0
checkpoint-31138/rng_state_3.pth +3 -0
checkpoint-31138/rng_state_4.pth +3 -0
checkpoint-31138/rng_state_5.pth +3 -0
checkpoint-31138/rng_state_6.pth +3 -0
checkpoint-31138/rng_state_7.pth +3 -0
checkpoint-31138/scheduler.pt +3 -0
checkpoint-31138/trainer_state.json +539 -0
checkpoint-31138/training_args.bin +3 -0

checkpoint-31138/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: meta-llama/Llama-3.2-1B
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0

checkpoint-31138/adapter_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "meta-llama/Llama-3.2-1B",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
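As a rough sanity check on the config above, the following sketch estimates the LoRA scaling factor and the number of trainable adapter parameters. The values of `r`, `lora_alpha`, and `target_modules` come from `adapter_config.json`; the base-model shapes (`HIDDEN_SIZE`, `NUM_LAYERS`, `NUM_KV_HEADS`, `HEAD_DIM`) are assumptions about Llama-3.2-1B that are not stated anywhere in this commit.

```python
# Values taken from adapter_config.json in this commit.
R = 8
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "v_proj"]

# ASSUMPTION: base-model shapes for meta-llama/Llama-3.2-1B.
# These are not part of the commit; adjust them if they differ.
HIDDEN_SIZE = 2048
NUM_LAYERS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 64

# (in_features, out_features) of each targeted projection.
PROJ_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),
}


def lora_param_count(r, shapes, modules, num_layers):
    """Count LoRA parameters: each module adds A (r x in) and B (out x r)."""
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in modules)
    return per_layer * num_layers


# With use_rslora=false, the adapter update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / R
n_params = lora_param_count(R, PROJ_SHAPES, TARGET_MODULES, NUM_LAYERS)
fp32_bytes = n_params * 4  # 4 bytes per parameter if stored as float32

print(f"scaling={scaling}, params={n_params}, fp32_bytes={fp32_bytes}")
```

Under these assumptions the float32 payload would be 3,407,872 bytes, which is close to the 3,416,264 bytes recorded for `adapter_model.safetensors`; the small remainder would plausibly be the file header, though that is an inference rather than something the commit confirms.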

checkpoint-31138/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa1da4a1ea4c65c7018d2dba34a9f45021358b2ca448155a9f625353d5df95b7
+size 3416264
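The binary files in this commit are stored as Git LFS pointers: three lines giving the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of parsing such a pointer and checking downloaded bytes against it is shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the leading `+` diff markers are assumed to have been stripped first.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, info):
    """Return True if raw bytes match the pointer's size and SHA-256 digest."""
    if info["hash_algo"] != "sha256":
        raise ValueError("only sha256 pointers are handled in this sketch")
    return (
        len(data) == info["size"]
        and hashlib.sha256(data).hexdigest() == info["digest"]
    )


# Pointer content copied from adapter_model.safetensors in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:aa1da4a1ea4c65c7018d2dba34a9f45021358b2ca448155a9f625353d5df95b7
size 3416264
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

To verify a real download, one would read the file as bytes and pass it to `matches_pointer` together with the parsed pointer.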

checkpoint-31138/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88abefbc31ce4184da3e73a7ba375d5c80daca3182a0cac8d2da5cd07f3ea150
+size 6869818

checkpoint-31138/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18949a21310584311bca4b98e55d8e9f5f59a73d33edf84a6fdc1e1d687e28dd
+size 15984

checkpoint-31138/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8db58c6faa0e31311536cf3d63b7d73dfbff983882efd3ff2b853f1f831948f4
+size 15984

checkpoint-31138/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:acc593c4a9fac955d67c7a3f2999e9bbd57829191dfafd08fb9dce42b924d39e
+size 15984

checkpoint-31138/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:400c57f0e645a400b42b30dc65eb1c2b2b3d8673999465459a383128b2740687
+size 15984

checkpoint-31138/rng_state_4.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a8396edc2032974eb3d4e16af74ace08fb2ec9dfbe76da61a6775ecc614eb7bb
+size 15984

checkpoint-31138/rng_state_5.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:501d5d285e27825eaaa16954c7dfb6dd28d0d399b0b7ebb1c141d279e78393c5
+size 15984

checkpoint-31138/rng_state_6.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d079c7d7409603b273376fc3c32ed4a2c33cdd27f03acf06a7911694cba29d30
+size 15984

checkpoint-31138/rng_state_7.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fff6e3259982eb102af91af320b78802ccb9148e573613e5fbe4e0950a6e0478
+size 15984

checkpoint-31138/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:843bd60e5e01ef6e93ca51854fca28d4dc45c7a5934d89b58dd0dad8aa6fcd96
+size 1064

checkpoint-31138/trainer_state.json ADDED

	@@ -0,0 +1,539 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.0,
+  "eval_steps": 3114,
+  "global_step": 31138,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03211510052026463,
+      "grad_norm": 1.2270241975784302,
+      "learning_rate": 0.0007978718393388571,
+      "loss": 0.4499,
+      "step": 500
+    },
+    {
+      "epoch": 0.06423020104052926,
+      "grad_norm": 1.1903154850006104,
+      "learning_rate": 0.0007957308326375062,
+      "loss": 0.3541,
+      "step": 1000
+    },
+    {
+      "epoch": 0.09634530156079389,
+      "grad_norm": 1.1337592601776123,
+      "learning_rate": 0.0007935898259361552,
+      "loss": 0.3336,
+      "step": 1500
+    },
+    {
+      "epoch": 0.12846040208105852,
+      "grad_norm": 0.9932805895805359,
+      "learning_rate": 0.0007914488192348043,
+      "loss": 0.3187,
+      "step": 2000
+    },
+    {
+      "epoch": 0.16057550260132314,
+      "grad_norm": 1.4273182153701782,
+      "learning_rate": 0.0007893120945468559,
+      "loss": 0.3094,
+      "step": 2500
+    },
+    {
+      "epoch": 0.19269060312158778,
+      "grad_norm": 1.296076774597168,
+      "learning_rate": 0.000787171087845505,
+      "loss": 0.306,
+      "step": 3000
+    },
+    {
+      "epoch": 0.2000128460402081,
+      "eval_loss": 0.3016555607318878,
+      "eval_runtime": 4.134,
+      "eval_samples_per_second": 120.948,
+      "eval_steps_per_second": 7.741,
+      "step": 3114
+    },
+    {
+      "epoch": 0.2248057036418524,
+      "grad_norm": 1.1079399585723877,
+      "learning_rate": 0.0007850343631575566,
+      "loss": 0.2993,
+      "step": 3500
+    },
+    {
+      "epoch": 0.25692080416211704,
+      "grad_norm": 1.5939998626708984,
+      "learning_rate": 0.0007828933564562057,
+      "loss": 0.2985,
+      "step": 4000
+    },
+    {
+      "epoch": 0.28903590468238166,
+      "grad_norm": 1.4033524990081787,
+      "learning_rate": 0.0007807523497548548,
+      "loss": 0.2959,
+      "step": 4500
+    },
+    {
+      "epoch": 0.3211510052026463,
+      "grad_norm": 1.9331912994384766,
+      "learning_rate": 0.0007786113430535038,
+      "loss": 0.2881,
+      "step": 5000
+    },
+    {
+      "epoch": 0.3532661057229109,
+      "grad_norm": 1.112199068069458,
+      "learning_rate": 0.0007764703363521528,
+      "loss": 0.2874,
+      "step": 5500
+    },
+    {
+      "epoch": 0.38538120624317557,
+      "grad_norm": 0.9604991674423218,
+      "learning_rate": 0.0007743293296508018,
+      "loss": 0.2867,
+      "step": 6000
+    },
+    {
+      "epoch": 0.4000256920804162,
+      "eval_loss": 0.27945849299430847,
+      "eval_runtime": 4.1149,
+      "eval_samples_per_second": 121.509,
+      "eval_steps_per_second": 7.777,
+      "step": 6228
+    },
+    {
+      "epoch": 0.4174963067634402,
+      "grad_norm": 1.542017936706543,
+      "learning_rate": 0.0007721883229494509,
+      "loss": 0.2842,
+      "step": 6500
+    },
+    {
+      "epoch": 0.4496114072837048,
+      "grad_norm": 0.9913608431816101,
+      "learning_rate": 0.0007700473162480999,
+      "loss": 0.2805,
+      "step": 7000
+    },
+    {
+      "epoch": 0.4817265078039694,
+      "grad_norm": 1.0861561298370361,
+      "learning_rate": 0.000767906309546749,
+      "loss": 0.2795,
+      "step": 7500
+    },
+    {
+      "epoch": 0.5138416083242341,
+      "grad_norm": 1.408018946647644,
+      "learning_rate": 0.0007657695848588007,
+      "loss": 0.2791,
+      "step": 8000
+    },
+    {
+      "epoch": 0.5459567088444987,
+      "grad_norm": 0.935492992401123,
+      "learning_rate": 0.0007636285781574497,
+      "loss": 0.2768,
+      "step": 8500
+    },
+    {
+      "epoch": 0.5780718093647633,
+      "grad_norm": 0.974107027053833,
+      "learning_rate": 0.0007614875714560987,
+      "loss": 0.2775,
+      "step": 9000
+    },
+    {
+      "epoch": 0.6000385381206244,
+      "eval_loss": 0.26882508397102356,
+      "eval_runtime": 4.0584,
+      "eval_samples_per_second": 123.2,
+      "eval_steps_per_second": 7.885,
+      "step": 9342
+    },
+    {
+      "epoch": 0.610186909885028,
+      "grad_norm": 1.00677490234375,
+      "learning_rate": 0.0007593465647547478,
+      "loss": 0.2719,
+      "step": 9500
+    },
+    {
+      "epoch": 0.6423020104052926,
+      "grad_norm": 1.3076094388961792,
+      "learning_rate": 0.0007572098400667994,
+      "loss": 0.2733,
+      "step": 10000
+    },
+    {
+      "epoch": 0.6744171109255572,
+      "grad_norm": 1.332555890083313,
+      "learning_rate": 0.0007550688333654485,
+      "loss": 0.2738,
+      "step": 10500
+    },
+    {
+      "epoch": 0.7065322114458218,
+      "grad_norm": 1.065308928489685,
+      "learning_rate": 0.0007529278266640976,
+      "loss": 0.2697,
+      "step": 11000
+    },
+    {
+      "epoch": 0.7386473119660865,
+      "grad_norm": 1.1714705228805542,
+      "learning_rate": 0.0007507868199627465,
+      "loss": 0.2718,
+      "step": 11500
+    },
+    {
+      "epoch": 0.7707624124863511,
+      "grad_norm": 1.545327067375183,
+      "learning_rate": 0.0007486500952747983,
+      "loss": 0.2701,
+      "step": 12000
+    },
+    {
+      "epoch": 0.8000513841608324,
+      "eval_loss": 0.25833237171173096,
+      "eval_runtime": 3.9621,
+      "eval_samples_per_second": 126.196,
+      "eval_steps_per_second": 8.077,
+      "step": 12456
+    },
+    {
+      "epoch": 0.8028775130066157,
+      "grad_norm": 0.8509078621864319,
+      "learning_rate": 0.0007465090885734473,
+      "loss": 0.2702,
+      "step": 12500
+    },
+    {
+      "epoch": 0.8349926135268804,
+      "grad_norm": 1.0517570972442627,
+      "learning_rate": 0.0007443680818720963,
+      "loss": 0.2684,
+      "step": 13000
+    },
+    {
+      "epoch": 0.8671077140471449,
+      "grad_norm": 1.0709669589996338,
+      "learning_rate": 0.000742231357184148,
+      "loss": 0.2674,
+      "step": 13500
+    },
+    {
+      "epoch": 0.8992228145674096,
+      "grad_norm": 1.1354570388793945,
+      "learning_rate": 0.0007400903504827971,
+      "loss": 0.269,
+      "step": 14000
+    },
+    {
+      "epoch": 0.9313379150876743,
+      "grad_norm": 1.5820938348770142,
+      "learning_rate": 0.000737949343781446,
+      "loss": 0.2665,
+      "step": 14500
+    },
+    {
+      "epoch": 0.9634530156079388,
+      "grad_norm": 1.3303571939468384,
+      "learning_rate": 0.0007358083370800951,
+      "loss": 0.2644,
+      "step": 15000
+    },
+    {
+      "epoch": 0.9955681161282035,
+      "grad_norm": 1.0390913486480713,
+      "learning_rate": 0.0007336716123921468,
+      "loss": 0.268,
+      "step": 15500
+    },
+    {
+      "epoch": 1.0000642302010405,
+      "eval_loss": 0.25299617648124695,
+      "eval_runtime": 3.8751,
+      "eval_samples_per_second": 129.03,
+      "eval_steps_per_second": 8.258,
+      "step": 15570
+    },
+    {
+      "epoch": 1.0276832166484682,
+      "grad_norm": 1.2327704429626465,
+      "learning_rate": 0.0007315306056907958,
+      "loss": 0.263,
+      "step": 16000
+    },
+    {
+      "epoch": 1.0597983171687329,
+      "grad_norm": 0.9403806924819946,
+      "learning_rate": 0.0007293895989894448,
+      "loss": 0.2633,
+      "step": 16500
+    },
+    {
+      "epoch": 1.0919134176889973,
+      "grad_norm": 1.1138664484024048,
+      "learning_rate": 0.0007272485922880939,
+      "loss": 0.2608,
+      "step": 17000
+    },
+    {
+      "epoch": 1.124028518209262,
+      "grad_norm": 1.1546539068222046,
+      "learning_rate": 0.000725107585586743,
+      "loss": 0.2569,
+      "step": 17500
+    },
+    {
+      "epoch": 1.1561436187295266,
+      "grad_norm": 1.0123635530471802,
+      "learning_rate": 0.0007229665788853919,
+      "loss": 0.2596,
+      "step": 18000
+    },
+    {
+      "epoch": 1.1882587192497913,
+      "grad_norm": 1.1647980213165283,
+      "learning_rate": 0.000720825572184041,
+      "loss": 0.2609,
+      "step": 18500
+    },
+    {
+      "epoch": 1.2000770762412487,
+      "eval_loss": 0.2469949871301651,
+      "eval_runtime": 3.796,
+      "eval_samples_per_second": 131.719,
+      "eval_steps_per_second": 8.43,
+      "step": 18684
+    },
+    {
+      "epoch": 1.2203738197700558,
+      "grad_norm": 1.2368906736373901,
+      "learning_rate": 0.00071868456548269,
+      "loss": 0.2597,
+      "step": 19000
+    },
+    {
+      "epoch": 1.2524889202903204,
+      "grad_norm": 0.9881177544593811,
+      "learning_rate": 0.000716543558781339,
+      "loss": 0.2563,
+      "step": 19500
+    },
+    {
+      "epoch": 1.2846040208105851,
+      "grad_norm": 0.9961882829666138,
+      "learning_rate": 0.0007144068340933907,
+      "loss": 0.2563,
+      "step": 20000
+    },
+    {
+      "epoch": 1.3167191213308498,
+      "grad_norm": 1.6545355319976807,
+      "learning_rate": 0.0007122658273920398,
+      "loss": 0.2566,
+      "step": 20500
+    },
+    {
+      "epoch": 1.3488342218511145,
+      "grad_norm": 1.2175770998001099,
+      "learning_rate": 0.0007101248206906887,
+      "loss": 0.251,
+      "step": 21000
+    },
+    {
+      "epoch": 1.3809493223713791,
+      "grad_norm": 1.2942149639129639,
+      "learning_rate": 0.0007079838139893379,
+      "loss": 0.2549,
+      "step": 21500
+    },
+    {
+      "epoch": 1.4000899222814567,
+      "eval_loss": 0.24247248470783234,
+      "eval_runtime": 4.0127,
+      "eval_samples_per_second": 124.605,
+      "eval_steps_per_second": 7.975,
+      "step": 21798
+    },
+    {
+      "epoch": 1.4130644228916436,
+      "grad_norm": 0.9972023963928223,
+      "learning_rate": 0.0007058470893013895,
+      "loss": 0.2532,
+      "step": 22000
+    },
+    {
+      "epoch": 1.4451795234119083,
+      "grad_norm": 2.422755479812622,
+      "learning_rate": 0.0007037060826000386,
+      "loss": 0.2525,
+      "step": 22500
+    },
+    {
+      "epoch": 1.477294623932173,
+      "grad_norm": 1.0350821018218994,
+      "learning_rate": 0.0007015650758986876,
+      "loss": 0.2528,
+      "step": 23000
+    },
+    {
+      "epoch": 1.5094097244524374,
+      "grad_norm": 0.9712342023849487,
+      "learning_rate": 0.0006994240691973367,
+      "loss": 0.2498,
+      "step": 23500
+    },
+    {
+      "epoch": 1.541524824972702,
+      "grad_norm": 1.0698814392089844,
+      "learning_rate": 0.0006972830624959856,
+      "loss": 0.2511,
+      "step": 24000
+    },
+    {
+      "epoch": 1.5736399254929667,
+      "grad_norm": 1.0637270212173462,
+      "learning_rate": 0.0006951463378080374,
+      "loss": 0.2542,
+      "step": 24500
+    },
+    {
+      "epoch": 1.6001027683216649,
+      "eval_loss": 0.23835672438144684,
+      "eval_runtime": 3.8011,
+      "eval_samples_per_second": 131.543,
+      "eval_steps_per_second": 8.419,
+      "step": 24912
+    },
+    {
+      "epoch": 1.6057550260132314,
+      "grad_norm": 1.08571195602417,
+      "learning_rate": 0.0006930053311066865,
+      "loss": 0.2531,
+      "step": 25000
+    },
+    {
+      "epoch": 1.637870126533496,
+      "grad_norm": 0.9403467774391174,
+      "learning_rate": 0.0006908643244053354,
+      "loss": 0.2522,
+      "step": 25500
+    },
+    {
+      "epoch": 1.6699852270537607,
+      "grad_norm": 0.8324321508407593,
+      "learning_rate": 0.0006887233177039845,
+      "loss": 0.2493,
+      "step": 26000
+    },
+    {
+      "epoch": 1.7021003275740254,
+      "grad_norm": 0.9599499702453613,
+      "learning_rate": 0.0006865908750294389,
+      "loss": 0.2485,
+      "step": 26500
+    },
+    {
+      "epoch": 1.73421542809429,
+      "grad_norm": 1.369850754737854,
+      "learning_rate": 0.0006844498683280879,
+      "loss": 0.25,
+      "step": 27000
+    },
+    {
+      "epoch": 1.7663305286145545,
+      "grad_norm": 1.042289137840271,
+      "learning_rate": 0.0006823088616267369,
+      "loss": 0.2449,
+      "step": 27500
+    },
+    {
+      "epoch": 1.7984456291348192,
+      "grad_norm": 1.2191327810287476,
+      "learning_rate": 0.000680167854925386,
+      "loss": 0.2489,
+      "step": 28000
+    },
+    {
+      "epoch": 1.8001156143618728,
+      "eval_loss": 0.23773300647735596,
+      "eval_runtime": 4.0318,
+      "eval_samples_per_second": 124.015,
+      "eval_steps_per_second": 7.937,
+      "step": 28026
+    },
+    {
+      "epoch": 1.8305607296550839,
+      "grad_norm": 0.9970433712005615,
+      "learning_rate": 0.0006780268482240349,
+      "loss": 0.2517,
+      "step": 28500
+    },
+    {
+      "epoch": 1.8626758301753483,
+      "grad_norm": 1.0307445526123047,
+      "learning_rate": 0.000675885841522684,
+      "loss": 0.2462,
+      "step": 29000
+    },
+    {
+      "epoch": 1.894790930695613,
+      "grad_norm": 1.1497652530670166,
+      "learning_rate": 0.000673744834821333,
+      "loss": 0.2494,
+      "step": 29500
+    },
+    {
+      "epoch": 1.9269060312158777,
+      "grad_norm": 0.8870740532875061,
+      "learning_rate": 0.000671603828119982,
+      "loss": 0.2459,
+      "step": 30000
+    },
+    {
+      "epoch": 1.9590211317361423,
+      "grad_norm": 1.0110082626342773,
+      "learning_rate": 0.0006694671034320337,
+      "loss": 0.2466,
+      "step": 30500
+    },
+    {
+      "epoch": 1.991136232256407,
+      "grad_norm": 0.9974693655967712,
+      "learning_rate": 0.0006673303787440855,
+      "loss": 0.2469,
+      "step": 31000
+    }
+  ],
+  "logging_steps": 500,
+  "max_steps": 186828,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 12,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 6.920739022847345e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
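The numbers in `trainer_state.json` can be cross-checked with a few lines of arithmetic. The sketch below uses values copied from the file above to derive steps per epoch, overall progress, and the best evaluation loss so far. It also compares two logged learning rates against a linear decay from `8e-4` to zero over `max_steps`; both that peak value (`ASSUMED_PEAK_LR`) and the linear schedule are assumptions inferred from the log, since the commit does not show the scheduler settings.

```python
# Values copied from trainer_state.json in this commit.
state = {
    "epoch": 2.0,
    "global_step": 31138,
    "max_steps": 186828,
    "num_train_epochs": 12,
    "eval_steps": 3114,
}

# (step, eval_loss) pairs from log_history.
eval_history = [
    (3114, 0.3016555607318878),
    (6228, 0.27945849299430847),
    (9342, 0.26882508397102356),
    (12456, 0.25833237171173096),
    (15570, 0.25299617648124695),
    (18684, 0.2469949871301651),
    (21798, 0.24247248470783234),
    (24912, 0.23835672438144684),
    (28026, 0.23773300647735596),
]

# (step, learning_rate) pairs from the first and last training log entries.
lr_samples = [
    (500, 0.0007978718393388571),
    (31000, 0.0006673303787440855),
]

steps_per_epoch = state["global_step"] / state["epoch"]
progress = state["global_step"] / state["max_steps"]
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])

# ASSUMPTION: linear decay to zero from a peak of 8e-4 with no warmup.
ASSUMED_PEAK_LR = 8e-4


def linear_decay_lr(step, peak_lr, max_steps):
    """Learning rate under a linear decay from peak_lr at step 0 to 0 at max_steps."""
    return peak_lr * (1.0 - step / max_steps)


lr_errors = [
    abs(linear_decay_lr(step, ASSUMED_PEAK_LR, state["max_steps"]) - lr)
    for step, lr in lr_samples
]

print(f"steps/epoch={steps_per_epoch:.0f}, progress={progress:.1%}")
print(f"best eval_loss={best_loss:.4f} at step {best_step}")
print(f"max |lr - linear decay| = {max(lr_errors):.2e}")
```

The derived figures line up with the file: 15,569 steps per epoch times 12 epochs gives the recorded `max_steps` of 186,828, and this checkpoint sits one sixth of the way through the run. The logged learning rates track the assumed schedule closely but not exactly, which may reflect a small offset between optimizer steps and scheduler steps.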

checkpoint-31138/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:77551947c299edd07d7d00ea85fe8647fd097b250d8db1832ddb7bb553f9105f
+size 5496